[Pandoc](https://pandoc.org/) can convert pretty much any document format into any other. The version shipped with Solus in early 2023 is `2.5`, which lacks a huge number of features, such as RTF and GitHub-Flavored Markdown support.

# Usage

Check out the [list of examples](https://pandoc.org/demos.html) on the official site.

```sh
pandoc -s infile.html --to gfm -o outfile.md
```

The `-s` flag is short for `--standalone`: it tells `pandoc` to produce a complete document with a proper header and footer rather than a fragment. It does not mark the source file; input files are plain positional arguments.

If `-o` isn't specified, the converted document is written to stdout.

`--to` specifies the exact output format in case the file extension is misleading or not precise enough. It is particularly useful for selecting GitHub-Flavored Markdown (`gfm`), as shown above. `--from` does the same for the input document, and is worth setting whenever `pandoc` guesses the input format wrong, because its error messages are not good.

# Limitations

## Cannot Convert *From* PDF

Pandoc cannot convert *from* PDF, but it does a great job converting **to** PDF. To extract content from a [[PDF]], use [[Poppler]] instead:

```sh
pdftohtml document.pdf
```

For further conversions from [[HTML]], use [[html2md by suntong - doc converter|html2md]] or similar:

```sh
html2md document.html
```

# Tips

## Math Unicode Characters with LaTeX

- When `pandoc` uses a [[LaTeX]] backend, it is possible to include files as part of the preamble.
- When [[LaTeX]] encounters a [[Unicode Standard|Unicode]] character that it doesn't know about, it fails, because it needs a way to map the character to a shape.
- Mapping Unicode characters to [[LaTeX]] requires some additional definitions, pointing either to existing [[LaTeX]] shapes or to bespoke shapes from specific fonts.
- [[Elmar Zander]] created a [[LaTeX]] package which maps many math-related Unicode characters to existing shapes.

Given the above, we can load Zander's package to alleviate issues with some common characters that [[LaTeX]] doesn't know about.
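
The "additional definitions" in the list above can also be written by hand. As a minimal sketch, not a replacement for Zander's package, the snippet below creates a small preamble file that maps three code points to existing math macros with `\DeclareUnicodeCharacter`. The file name `unicode-header.tex` and the choice of characters are illustrative assumptions, and the approach assumes the `pdflatex` engine.

```sh
# Write a small LaTeX preamble fragment (hypothetical file name).
# The quoted 'EOF' stops the shell from interpreting the backslashes.
cat > unicode-header.tex <<'EOF'
\DeclareUnicodeCharacter{2264}{\ensuremath{\leq}}
\DeclareUnicodeCharacter{2265}{\ensuremath{\geq}}
\DeclareUnicodeCharacter{2260}{\ensuremath{\neq}}
EOF
```

A file like this can be handed to `pandoc` with `--include-in-header=unicode-header.tex`. Each line maps one code point (U+2264, U+2265, U+2260) to a macro that LaTeX already knows how to draw.
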
The following command converts an ordinary [[HTML]] web page into a [[PDF]] using the `utf8math` package:

```sh
pandoc input.html -o output.pdf --resource-path="$HOME/Source/Clones/utf8math/" --include-in-header="utf8math.sty"
```

## Fixing Broken Tables

When Pandoc renders a Markdown table, the relative column widths are determined by the number of dashes (`-`) in the line separating the header from the body. Most reasonable Markdown systems use a fixed-width font, where a dash is the same width as any other character, so this creates a huge problem when the table is rendered in a variable-width font. Too often, the table columns are too narrow and the text overlaps.

Instead of fixing this, pandoc decided to introduce its own Markdown table format, which doesn't suffer from this problem. This leaves standard Markdown tables broken and working Pandoc tables incompatible with other Markdown parsers.

You're left with the choice to either force your Markdown editor to add a bunch of extra dashes between the header and body of the table, or else rewrite all of your tables in a format which no other system can preview and render. Utterly bizarre and frustrating.

- https://github.com/Wandmalfarbe/pandoc-latex-template/issues/160
- https://pandoc.org/MANUAL.html#tables

## Combining Multiple Documents

```sh
pandoc -s -f markdown -t pdf -o ~/Documents/OUTFILE.pdf *.md
```

Multiple input files can be listed in order and they will all be combined into one output document. With a glob like `*.md`, the order is whatever the shell expands it to.

- https://unix.stackexchange.com/questions/77207/how-to-compile-a-selection-of-markdown-documents

## Disable Word-Wrap

When converting to HTML or Markdown, `pandoc` hard-wraps the output by default, inserting line breaks at a fixed column (72 unless changed with `--columns`). This can be disabled with `--wrap=none`.

## Converting Manpages to Markdown

`pandoc` supports this out of the box. The only tricks are knowing to use `zcat` if the man pages are compressed, and using the [[#Disable Word-Wrap]] option above to get usable output.

```sh
zcat /usr/share/man/man4/console_codes.4.gz | pandoc --wrap=none --from man --to gfm -o console_codes.md
```

There are some alternatives:

- https://github.com/phillbush/man2md (awk)
- https://github.com/mle86/man-to-md (perl)

# Alternatives

Depending on the goal, specialized commands like [[html2md by suntong - doc converter|html2md]] or [[html2text by jaytaylor - doc converter|html2text]] might produce better or faster results.