Simplify and reorganize Lua filter introduction #9106
base: main
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,34 +6,19 @@ date: 'January 10, 2020' | |
title: Pandoc Lua Filters | ||
--- | ||
|
||
# Introduction | ||
Create custom outputs with pandoc's embedded Lua engine. | ||
|
||
Pandoc has long supported filters, which allow the pandoc | ||
abstract syntax tree (AST) to be manipulated between the parsing | ||
and the writing phase. [Traditional pandoc | ||
filters](https://pandoc.org/filters.html) accept a JSON | ||
representation of the pandoc AST and produce an altered JSON | ||
representation of the AST. They may be written in any | ||
programming language, and invoked from pandoc using the | ||
`--filter` option. | ||
|
||
Although traditional filters are very flexible, they have a | ||
couple of disadvantages. First, there is some overhead in | ||
writing JSON to stdout and reading it from stdin (twice, once on | ||
each side of the filter). Second, whether a filter will work | ||
will depend on details of the user's environment. A filter may | ||
require an interpreter for a certain programming language to be | ||
available, as well as a library for manipulating the pandoc AST | ||
in JSON form. One cannot simply provide a filter that can be | ||
used by anyone who has a certain version of the pandoc | ||
executable. | ||
|
||
Starting with version 2.0, pandoc makes it possible to write | ||
filters in Lua without any external dependencies at all. A Lua | ||
## Introduction | ||
|
||
With Lua filters, you can write Pandoc filters without any | ||
external dependencies. Besides the simpler set-up, Lua filters are | ||
generally faster and can access utility functions to manipulate | ||
document elements. | ||
|
||
Since Pandoc 2.0, the pandoc executable has a built-in Lua | ||
interpreter (version 5.4) and a Lua library for creating pandoc | ||
filters is built into the pandoc executable. Pandoc data types | ||
are marshaled to Lua directly, avoiding the overhead of writing | ||
JSON to stdout and reading it from stdin. | ||
filters. Pandoc data types are marshaled to Lua directly, avoiding | ||
the overhead of writing JSON to stdout and reading it from stdin. | ||
|
||
Here is an example of a Lua filter that converts strong emphasis | ||
to small caps: | ||
|
@@ -62,17 +47,31 @@ replace it with a SmallCaps element with the same content. | |
To run it, save it in a file, say `smallcaps.lua`, and invoke | ||
pandoc with `--lua-filter=smallcaps.lua`. | ||
|
||
## Why Lua filters over JSON? | ||
|
||
[JSON filters](https://pandoc.org/filters.html) accept a JSON | ||
representation of the pandoc AST and produce an altered JSON | ||
representation of the AST. They may be written in any programming | ||
language, and invoked from pandoc using the `--filter` option. | ||
|
||
However, JSON filters have limitations: | ||
|
||
- Writing JSON to stdout and reading it from stdin (twice, once | ||
on each side of the filter) is inefficient. | ||
this line should be indented so it lines up to the list content.
also, I think the parenthetical comment could be removed |
||
- External dependencies vary between users, and universal JSON | ||
filters are not possible. | ||
Comment on lines +61 to +62
I don't think it will be clear to readers what is meant by "universal JSON filters" or why dependency variation is important. I think the original text on this was clearer. |
||
|
||
Here's a quick performance comparison, converting the pandoc | ||
manual (MANUAL.txt) to HTML, with versions of the same JSON | ||
filter written in compiled Haskell (`smallcaps`) and interpreted | ||
Python (`smallcaps.py`): | ||
|
||
Command Time | ||
--------------------------------------- ------- | ||
`pandoc` 1.01s | ||
`pandoc --filter ./smallcaps` 1.36s | ||
`pandoc --filter ./smallcaps.py` 1.40s | ||
`pandoc --lua-filter ./smallcaps.lua` 1.03s | ||
manual (MANUAL.txt) to HTML, with versions of the same JSON filter | ||
written in compiled Haskell (`smallcaps`) and interpreted Python | ||
(`smallcaps.py`): | ||
space at end of line |
||
|
||
Command Time | ||
--------------------------------------- ------- | ||
`pandoc` 1.01s | ||
`pandoc --filter ./smallcaps` 1.36s | ||
`pandoc --filter ./smallcaps.py` 1.40s | ||
`pandoc --lua-filter ./smallcaps.lua` 1.03s | ||
|
||
Comment on lines +69 to 75
It looks like this is indented 4 spaces and thus a code block instead of a table? Why this change? |
||
As you can see, the Lua filter avoids the substantial overhead | ||
associated with marshaling to and from JSON over a pipe. | ||
|
faster than what? Since the text mentioning JSON filters was removed, this is no longer clear.
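
For context on the comparison in question: the diff describes JSON filters as programs that accept a JSON representation of the pandoc AST and produce an altered JSON representation. A minimal Python sketch of that mechanism, doing the same strong-emphasis-to-small-caps conversion, might look like the following. This is not the `smallcaps.py` used for the timings (that file is not shown in this diff). The names `strong_to_smallcaps` and `run_filter` are hypothetical, and the sketch assumes pandoc's JSON encoding in which each element is an object with a `t` (type) field and a `c` (content) field.

```python
import json
import sys


def strong_to_smallcaps(node):
    """Recursively rename Strong elements to SmallCaps, keeping their content.

    Assumption: AST elements are JSON objects of the form {"t": ..., "c": ...}.
    """
    if isinstance(node, list):
        return [strong_to_smallcaps(child) for child in node]
    if isinstance(node, dict):
        # Convert children first, then relabel this node if it is Strong.
        converted = {key: strong_to_smallcaps(value) for key, value in node.items()}
        if converted.get("t") == "Strong":
            converted["t"] = "SmallCaps"
        return converted
    # Strings, numbers, booleans and None pass through unchanged.
    return node


def run_filter(in_stream, out_stream):
    """Read a JSON AST from in_stream and write the altered AST to out_stream."""
    document = json.load(in_stream)
    json.dump(strong_to_smallcaps(document), out_stream)
```

A script built on this sketch would end with `run_filter(sys.stdin, sys.stdout)` and be passed to pandoc with the `--filter` option. The round trip through stdin and stdout is the JSON serialization step that the diff says Lua filters avoid.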