Module gutenberg_post_parser::parser
The Gutenberg post parser.
The Gutenberg post parser is built from parser combinators: it provides
multiple small parsers that are combined into larger ones. They are based
on the nom project. Each parser receives an input and produces an output
of type IResult.
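To make the combinator idea concrete, here is a minimal sketch that uses only the standard library. It is not this crate's code and not nom's API: `ParseResult`, `ParseError`, `tag` and `pair` are hypothetical names, and `ParseResult` is a simplified stand-in for `IResult`. The sketch assumes that a successful parse yields the remaining input together with the produced output.

```rust
/// Simplified stand-in for `IResult` (an assumption, not nom's type):
/// on success, the remaining input and the produced output.
type ParseResult<'a, O> = Result<(&'a [u8], O), ParseError>;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ParseError {
    /// The input did not start with the expected bytes.
    Mismatch,
}

/// Recognize a fixed byte sequence at the start of `input`.
fn tag<'a>(expected: &'static [u8], input: &'a [u8]) -> ParseResult<'a, &'a [u8]> {
    if input.starts_with(expected) {
        let (matched, rest) = input.split_at(expected.len());
        Ok((rest, matched))
    } else {
        Err(ParseError::Mismatch)
    }
}

/// Combine two parsers: run `first`, then run `second` on what remains.
fn pair<'a, A, B>(
    first: impl Fn(&'a [u8]) -> ParseResult<'a, A>,
    second: impl Fn(&'a [u8]) -> ParseResult<'a, B>,
    input: &'a [u8],
) -> ParseResult<'a, (A, B)> {
    let (rest, a) = first(input)?;
    let (rest, b) = second(rest)?;
    Ok((rest, (a, b)))
}
```

For instance, `pair(|i| tag(b"<!--", i), |i| tag(b"wp:", i), input)` recognizes the start of an opening tag and hands back whatever follows it as the remaining input.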
The parsers are written mostly with Rust macros. Don't be surprised! To learn more, consult the documentation. Nonetheless, a grammar is maintained in EBNF notation below.
Grammar
This section describes the Gutenberg post grammar using the Extended Backus-Naur Form (EBNF) metasyntax notation.
block_list
This rule is the axiom of the grammar.
block_list =
{ block | phrase } ;
block
A balanced block has an opening and a closing tag. Their names must be identical, i.e. the respective namespaces and names must match. A void block is an “auto-closing” block.
A balanced block can have children, while a void block cannot.
block =
block_balanced | block_void ;
block_balanced =
"<!--", [ wss ], "wp:", block_name, wss, block_attributes, [ wss ], "-->",
block_list,
"<!--", [ wss ], "/wp:", block_name, [ wss ], "-->" ;
block_void =
"<!--", [ wss ], "wp:", block_name, wss, block_attributes, [ wss ], "/-->" ;
block_name
A block name is a pair composed of a namespace and a name. The
namespace is optional and defaults to core.
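The rule described here, together with the `block_name` grammar of this section, can be sketched by hand with the standard library alone. This is an illustration and not the crate's implementation: the signatures and the `Option` return type are assumptions, and only the function names are borrowed from the grammar. Each function returns the recognized value followed by the remaining input.

```rust
/// Recognize `block_name_part = letter, { letter | digit }`.
/// Returns the matched part and the remaining input, or `None`.
fn block_name_part(input: &str) -> Option<(&str, &str)> {
    let bytes = input.as_bytes();
    // The first byte must be a lowercase ASCII letter.
    if !bytes.first().map_or(false, |b| b.is_ascii_lowercase()) {
        return None;
    }
    // Then comes any run of lowercase letters or digits.
    let end = bytes
        .iter()
        .position(|b| !(b.is_ascii_lowercase() || b.is_ascii_digit()))
        .unwrap_or(bytes.len());
    Some((&input[..end], &input[end..]))
}

/// Recognize `block_name`, returning `((namespace, name), rest)`.
/// A missing namespace defaults to "core".
fn block_name(input: &str) -> Option<((&str, &str), &str)> {
    let (first, rest) = block_name_part(input)?;
    // Try the namespaced form first: part, "/", part.
    if let Some(after_slash) = rest.strip_prefix('/') {
        if let Some((name, rest)) = block_name_part(after_slash) {
            return Some(((first, name), rest));
        }
    }
    // Otherwise fall back to a core block name.
    Some((("core", first), rest))
}
```

Trying the namespaced alternative before the core one matters: with the opposite order, `acme/gallery` would be read as the core block `acme` followed by unparsed input.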
block_name =
namespaced_block_name | core_block_name ;
namespaced_block_name =
block_name_part, "/", block_name_part ;
core_block_name =
block_name_part ;
block_name_part =
letter, { letter | digit } ;
letter =
"a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" |
"l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" |
"w" | "x" | "y" | "z" ;
digit =
"0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
block_attributes
Block attributes must be a valid JSON object, as defined in
RFC 7159. They must therefore start with { and end with }:
block_attributes =
? RFC 7159, JSON, Section 4. Objects ? ;
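One practical consequence is that a parser has to find where the object ends before it can look for the closing `-->` or `/-->`. The sketch below shows one way to locate the outermost `{ ... }` while ignoring braces inside JSON strings. It is a hypothetical helper (`object_span` is not part of this crate) and it does not validate the JSON itself. A real implementation would hand the span to a JSON parser.

```rust
/// Return the slice covering the outermost `{ ... }` at the start of
/// `input`, plus the remaining input. Does NOT validate the JSON inside.
fn object_span(input: &str) -> Option<(&str, &str)> {
    if !input.starts_with('{') {
        return None;
    }
    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for (i, b) in input.bytes().enumerate() {
        if in_string {
            // Inside a string, only an unescaped quote ends it.
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue;
        }
        match b {
            b'"' => in_string = true,
            b'{' => depth += 1,
            b'}' => {
                depth -= 1;
                if depth == 0 {
                    // `}` is a single ASCII byte, so `i + 1` is a valid boundary.
                    return Some((&input[..=i], &input[i + 1..]));
                }
            }
            _ => {}
        }
    }
    None // the braces never balanced
}
```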
phrase
A phrase is anything that is not a block.
phrase =
anything - "<!--" ;
anything =
? any bytes ? ;
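As a rough illustration of this rule, the sketch below takes every byte up to the next `<!--` as one phrase. It assumes the simplest reading of the grammar, in which a phrase ends wherever `<!--` begins. The crate's actual `phrase` parser may behave differently, for example when a comment turns out not to be a block. The signature is hypothetical.

```rust
/// Recognize a phrase: all bytes before the next `<!--`.
/// Returns `(phrase, rest)`, or `None` if the phrase would be empty.
fn phrase(input: &[u8]) -> Option<(&[u8], &[u8])> {
    const OPEN: &[u8] = b"<!--";
    // Index of the first `<!--`, or the end of the input if there is none.
    let end = input
        .windows(OPEN.len())
        .position(|w| w == OPEN)
        .unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some(input.split_at(end))
    }
}
```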
wss
Whitespace is shortened to ws, and whitespaces is shortened to wss.
wss =
ws, { ws } ;
ws =
" " (* U+0020 *)
| "\n" (* U+000A *)
| "\r" (* U+000D *)
| "\t" (* U+0009 *) ;
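The `wss` rule requires at least one whitespace character and accepts only the four listed above. A minimal hand-written sketch follows. The signature is an assumption for illustration and not the signature of the crate's `whitespaces` function.

```rust
/// Recognize `wss = ws, { ws }`: one or more of space, LF, CR, tab.
/// Returns `(whitespace, rest)`, or `None` if there is no leading whitespace.
fn whitespaces(input: &str) -> Option<(&str, &str)> {
    // Byte index of the first character that is not grammar whitespace.
    let end = input
        .find(|c: char| !matches!(c, ' ' | '\n' | '\r' | '\t'))
        .unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some(input.split_at(end))
    }
}
```

Other Unicode whitespace, such as U+00A0, is deliberately not accepted here because the grammar does not list it.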
Functions
| block | Recognize a block. |
| block_attributes | Recognize block attributes. |
| block_list | Axiom of the grammar: Recognize a list of blocks. |
| block_name | Recognize a fully-qualified block name. |
| block_name_part | Recognize a block name part. |
| core_block_name | Recognize a globally-namespaced block name. |
| namespaced_block_name | Recognize a namespaced block name. |
| phrase | Recognize a phrase. |
| whitespaces | Recognize whitespaces. |