[Pandoc](https://pandoc.org/) can convert pretty much any document format into any other. The version shipped with Solus in early 2023 is `2.5`, which lacks a huge number of features, such as RTF and GitHub-flavored Markdown support.

# Usage

Check out the [list of examples](https://pandoc.org/demos.html) on the official site.

```sh
pandoc -s infile.html --to gfm -o outfile.md
```

The `-s` (`--standalone`) parameter tells Pandoc to produce a complete document, with header and footer, rather than a fragment; input files are given as plain arguments. It will often work without it, but sometimes the result is incomplete or it gets confused, and its error messages are not good, so just use `-s`.

If `-o` isn't specified, it will write the converted document to stdout.

The `--to` parameter specifies the exact output document format in case the file extensions are misleading or not precise enough. It is particularly useful for specifying GitHub Flavored Markdown (`gfm`) as shown above. There is also a `--from` parameter which does the same for the input document.

# Limitations

Cannot convert *from* PDF, but does a great job converting *to* PDF.

# Tips

## Math Unicode Characters with LaTeX

- When `pandoc` uses a [[LaTeX]] backend, it is possible to include files as part of the preamble.
- When [[LaTeX]] encounters a [[Unicode Standard|Unicode]] character that it doesn't know about, it fails, because it needs a way to map the data to a shape.
- Mapping Unicode characters to [[LaTeX]] requires some additional definitions, either to existing [[LaTeX]] shapes or to bespoke shapes using specific fonts.
- [[Elmar Zander]] created a [[LaTeX]] package which maps many math-related Unicode characters to existing shapes.

Using the above context, we can load Zander's package to help alleviate issues around some common characters that [[LaTeX]] doesn't know about.

In this example, we are converting a normal [[HTML]] web page into a [[PDF]] using the `utf8math` package.

```sh
pandoc input.html -o output.pdf --resource-path="$HOME/Source/Clones/utf8math/" --include-in-header="utf8math.sty"
```

## Fixing Broken Tables

Rendered Markdown table widths in Pandoc are determined by the number of dashes in the separator row between the header and the body of the table. Since most reasonable Markdown systems use a fixed-width font, where the dashes are the same width as other characters, this creates a huge problem when the tables are rendered in a variable-width font. Too often, the table columns are too narrow and the text overlaps.

Instead of fixing this, Pandoc decided to introduce its own Markdown table format, which doesn't suffer from this problem. This leaves standard Markdown tables broken in Pandoc, and working Pandoc tables incompatible with everything else. Utterly bizarre and frustrating.

So, either force your Markdown editor to add a bunch of extra dashes between the header and body of the table, or else rewrite all of your tables in a format which no other system can preview and render.

- https://github.com/Wandmalfarbe/pandoc-latex-template/issues/160
- https://pandoc.org/MANUAL.html#tables

## Combining Multiple Documents

```sh
pandoc -s -f markdown -t pdf -o ~/Documents/OUTFILE.pdf *.md
```

Multiple input files can be listed in order and they will all be combined. A glob like `*.md` is expanded by the shell in sorted filename order.

- https://unix.stackexchange.com/questions/77207/how-to-compile-a-selection-of-markdown-documents

# Alternatives

Depending on what the goal is, specialized commands like [[html2md by suntong - doc converter|html2md]] or [[html2text by jaytaylor - doc converter]] might work better.
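
The dash-padding workaround from Fixing Broken Tables can be scripted rather than done by hand. The following is a minimal sketch, not something Pandoc provides: `widen_separators` and its default factor of 4 are hypothetical, chosen only for illustration. It repeats every run of dashes in table separator rows, which keeps the columns in the same proportion while lengthening the row. Whether that actually changes the rendered widths likely depends on the Pandoc version and its `--columns` setting, so treat it as a starting point.

```sh
# Hypothetical helper: lengthen the dash runs in Markdown table separator rows.
# Reads Markdown on stdin and writes the result to stdout.
widen_separators() {
  factor="${1:-4}"   # how many times longer each dash run becomes (assumed default)
  awk -v factor="$factor" '
    # A separator row holds only pipes, dashes, colons and spaces,
    # with at least one pipe and at least one dash.
    index($0, "|") > 0 && /^[ |:-]*-[ |:-]*$/ {
      out = ""
      line = $0
      # Copy the row piece by piece, repeating each run of dashes.
      while (match(line, /-+/)) {
        out = out substr(line, 1, RSTART - 1)
        for (i = 0; i < RLENGTH * factor; i++) out = out "-"
        line = substr(line, RSTART + RLENGTH)
      }
      print out line
      next
    }
    { print }   # every other line passes through unchanged
  '
}
```

Something like `widen_separators 4 < notes.md | pandoc -f markdown -o notes.pdf` would then feed the padded tables to Pandoc; `notes.md` and `notes.pdf` are placeholder names. Alignment colons are preserved, and lines without a pipe (such as a `---` rule) are left alone.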