Module gutenberg_post_parser::parser
The Gutenberg post parser.
The Gutenberg post parser is built from parser combinators: it provides
multiple small parsers that are combined into larger ones. They are based
on the nom project. Each parser receives an input and produces an output
of kind IResult.
The parsers are written largely with Rust macros, so don't be surprised! To learn more, consult the documentation. Nonetheless, a grammar is maintained in EBNF notation below.
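As an illustration of the combinator idea, here is a minimal hand-rolled sketch that uses only the standard library. It is not the crate's code and does not use nom: `ParseResult`, `ParseError`, `literal`, and `pair` are hypothetical names, and `ParseResult` is only loosely modeled on the "remaining input, then output" shape of an `IResult`.

```rust
/// Hypothetical stand-in for an `IResult`-like type: on success, the
/// remaining input comes first, followed by the produced output.
type ParseResult<'a, O> = Result<(&'a [u8], O), ParseError>;

#[derive(Debug, PartialEq)]
enum ParseError {
    /// The input did not start with what the parser expected.
    Mismatch,
}

/// Build a parser that recognizes the literal `expected` at the start
/// of the input and returns the matched bytes.
fn literal<'a>(expected: &'static [u8]) -> impl Fn(&'a [u8]) -> ParseResult<'a, &'a [u8]> {
    move |input: &'a [u8]| {
        if input.starts_with(expected) {
            let (matched, rest) = input.split_at(expected.len());
            Ok((rest, matched))
        } else {
            Err(ParseError::Mismatch)
        }
    }
}

/// Combinator: run `first`, then run `second` on what `first` left over.
fn pair<'a, A, B>(
    first: impl Fn(&'a [u8]) -> ParseResult<'a, A>,
    second: impl Fn(&'a [u8]) -> ParseResult<'a, B>,
) -> impl Fn(&'a [u8]) -> ParseResult<'a, (A, B)> {
    move |input: &'a [u8]| {
        let (rest, a) = first(input)?;
        let (rest, b) = second(rest)?;
        Ok((rest, (a, b)))
    }
}
```

For instance, `pair(literal(b"<!--"), literal(b"wp:"))` yields a parser that accepts `<!--wp:` and hands back whatever follows as the remaining input.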
Grammar
This section describes the Gutenberg post grammar using the Extended Backus-Naur Form (EBNF) metasyntax notation.
block_list
This rule is the axiom of the grammar.
block_list =
{ block | phrase } ;
block
A balanced block has an opening and a closing tag. Their names must be identical, i.e. the respective namespaces and names must match. A void block is an “auto-closing” block.
A balanced block can have children, while a void block cannot.
block =
block_balanced | block_void ;
block_balanced =
"<!--", [ wss ], "wp:", block_name, wss, block_attributes, [ wss ], "-->",
block_list,
"<!--", [ wss ], "/wp:", block_name, [ wss ], "-->" ;
block_void =
"<!--", [ wss ], "wp:", block_name, wss, block_attributes, [ wss ], "/-->" ;
block_name
A block name is a pair composed of a namespace and a name. The
namespace is optional and defaults to core.
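The namespace default can be sketched as follows, following the `block_name` rules of this section. This is a standalone illustration on `&str` and not the crate's implementation: the function names `name_part` and `name` and the `Option`-based return type are assumptions made for brevity.

```rust
/// Recognize `block_name_part`: one lowercase ASCII letter followed by
/// any number of lowercase ASCII letters or digits.
/// Returns `(matched, rest)`, or `None` if the input does not start
/// with a letter.
fn name_part(input: &str) -> Option<(&str, &str)> {
    let mut chars = input.char_indices();
    match chars.next() {
        Some((_, c)) if c.is_ascii_lowercase() => {}
        _ => return None,
    }
    // Find the first character that can no longer belong to the part.
    let end = chars
        .find(|&(_, c)| !(c.is_ascii_lowercase() || c.is_ascii_digit()))
        .map(|(i, _)| i)
        .unwrap_or(input.len());
    Some((&input[..end], &input[end..]))
}

/// Recognize `block_name` and return `((namespace, name), rest)`.
/// The namespaced form is tried first; otherwise the namespace
/// defaults to `core`.
fn name(input: &str) -> Option<((&str, &str), &str)> {
    let (first, rest) = name_part(input)?;
    if let Some(after_slash) = rest.strip_prefix('/') {
        if let Some((second, rest)) = name_part(after_slash) {
            return Some(((first, second), rest));
        }
    }
    Some((("core", first), rest))
}
```

With this sketch, `name("paragraph -->")` produces the pair `("core", "paragraph")`, while `name("acme/gallery2 {")` keeps the explicit namespace `acme`.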
block_name =
namespaced_block_name | core_block_name ;
namespaced_block_name =
block_name_part, "/", block_name_part ;
core_block_name =
block_name_part ;
block_name_part =
letter, { letter | digit } ;
letter =
"a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" |
"l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" |
"w" | "x" | "y" | "z" ;
digit =
"0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
block_attributes
Block attributes must be a valid JSON object, as defined in
RFC 7159. They must therefore start with { and end with }:
block_attributes =
? RFC 7159, JSON, Section 4. Objects ? ;
phrase
A phrase is anything that is not a block.
phrase =
anything - "<!--" ;
anything =
? any bytes ? ;
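One way to read the `phrase` rule is "consume bytes until the next `<!--`". The sketch below takes that reading. It is an assumption about the rule rather than the crate's behaviour, which may treat a `<!--` that does not open a block differently. The name `take_phrase` is hypothetical.

```rust
/// Split the input into `(phrase, rest)`, where `phrase` is the longest
/// non-empty prefix that contains no `<!--`. Returns `None` when the
/// input is empty or starts with `<!--`.
fn take_phrase(input: &[u8]) -> Option<(&[u8], &[u8])> {
    const OPEN: &[u8] = b"<!--";
    // Position of the first `<!--`, or the end of the input if absent.
    let end = input
        .windows(OPEN.len())
        .position(|window| window == OPEN)
        .unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some(input.split_at(end))
    }
}
```

Returning `None` on an empty match keeps a repetition such as `{ block | phrase }` from looping forever on input that starts with `<!--`.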
wss
Whitespace is shortened to ws, and whitespaces is shortened to wss.
wss =
ws, { ws } ;
ws =
" " (* U+0020 *)
| "\n" (* U+000A *)
| "\r" (* U+000D *)
| "\t" (* U+0009 *) ;
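The `wss` rule ("one or more `ws`") can be sketched with the standard library alone. The names `is_ws` and `take_wss` are hypothetical and the code is an illustration, not the crate's `whitespaces` function.

```rust
/// `ws`: space, line feed, carriage return, or horizontal tab.
fn is_ws(byte: u8) -> bool {
    matches!(byte, b' ' | b'\n' | b'\r' | b'\t')
}

/// `wss`: split the input into `(whitespaces, rest)`. At least one
/// whitespace byte is required, so `None` is returned otherwise.
fn take_wss(input: &[u8]) -> Option<(&[u8], &[u8])> {
    // Index of the first byte that is not whitespace.
    let end = input
        .iter()
        .position(|&byte| !is_ws(byte))
        .unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some(input.split_at(end))
    }
}
```

The optional form `[ wss ]` used in the block rules then corresponds to falling back to the untouched input when `take_wss` returns `None`.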
Functions
block | Recognize a block.
block_attributes | Recognize block attributes.
block_list | Axiom of the grammar: Recognize a list of blocks.
block_name | Recognize a fully-qualified block name.
block_name_part | Recognize a block name part.
core_block_name | Recognize a globally-namespaced block name.
namespaced_block_name | Recognize a namespaced block name.
phrase | Recognize a phrase.
whitespaces | Recognize whitespaces.