Module gutenberg_post_parser::parser

The Gutenberg post parser.

The Gutenberg post parser is a parser combinator. It thus provides multiple parsers, also known as combinators, all based on the nom project. Each parser receives an input and produces an output of type IResult.

Writing the parsers relies heavily on Rust macros. Don't be surprised! To learn more, consult the documentation. A grammar is nonetheless maintained below in EBNF notation.

Grammar

This section describes the Gutenberg post grammar using the Extended Backus-Naur Form (EBNF) metasyntax notation.

block_list

This rule is the axiom of the grammar.

block_list =
    { block | phrase } ;

block

A balanced block has an opening and a closing tag. Their names must be identical, i.e. the respective namespaces and names must match. A void block is an “auto-closing” block.

A balanced block can have children, while a void block cannot.
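
To make the balanced/void distinction concrete, here is a minimal sketch of the tree one might expect for a small document. The `Node` type and `example_tree` function are hypothetical and written for illustration only; they are not taken from this crate's API, and the attributes are kept as raw JSON text.

```rust
/// Hypothetical tree shape for illustration; not this crate's actual AST.
#[derive(Debug, PartialEq)]
enum Node<'a> {
    Block {
        /// `(namespace, name)` pair.
        name: (&'a str, &'a str),
        /// Raw JSON object text, left unparsed.
        attributes: &'a str,
        /// Always empty for a void block.
        children: Vec<Node<'a>>,
    },
    Phrase(&'a str),
}

/// Tree one might expect for the input:
///
/// `<!-- wp:columns {"count": 2} --><!-- wp:acme/spacer {} /--><p>Hello</p><!-- /wp:columns -->`
fn example_tree() -> Vec<Node<'static>> {
    vec![Node::Block {
        // No namespace in the input, so it defaults to `core`.
        name: ("core", "columns"),
        attributes: r#"{"count": 2}"#,
        children: vec![
            // Void block: closed by `/-->`, so it has no children.
            Node::Block {
                name: ("acme", "spacer"),
                attributes: "{}",
                children: vec![],
            },
            // Anything that is not a block is a phrase.
            Node::Phrase("<p>Hello</p>"),
        ],
    }]
}
```

The balanced `columns` block owns both the void `acme/spacer` block and the trailing phrase, while the void block's `children` stays empty.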

block =
    block_balanced | block_void ;

block_balanced =
    "<!--", [ wss ], "wp:", block_name, wss, block_attributes, [ wss ], "-->",
    block_list,
    "<!--", [ wss ], "/wp:", block_name, [ wss ], "-->" ;

block_void =
    "<!--", [ wss ], "wp:", block_name, wss, block_attributes, [ wss ], "/-->" ;

block_name

A block name is a pair composed of a namespace and a name. The namespace is optional and defaults to core.
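
As a rough illustration of the namespace default, the sketch below splits a block name into a `(namespace, name)` pair using only the standard library. `split_block_name` is a hypothetical helper, not one of this module's functions; it assumes the `block_name_part` rule of this section (a lowercase ASCII letter followed by lowercase letters or digits).

```rust
/// Split a block name into `(namespace, name)`.
/// Hypothetical helper for illustration; not part of this crate's API.
fn split_block_name(input: &str) -> Option<(&str, &str)> {
    // `block_name_part = letter, { letter | digit }`, lowercase ASCII only.
    fn is_part(s: &str) -> bool {
        let mut bytes = s.bytes();
        matches!(bytes.next(), Some(b'a'..=b'z'))
            && bytes.all(|b| b.is_ascii_lowercase() || b.is_ascii_digit())
    }

    let (namespace, name) = match input.split_once('/') {
        // `namespaced_block_name`: both parts are given.
        Some((namespace, name)) => (namespace, name),
        // `core_block_name`: no namespace, so default to `core`.
        None => ("core", input),
    };

    if is_part(namespace) && is_part(name) {
        Some((namespace, name))
    } else {
        None
    }
}
```

With this sketch, `split_block_name("paragraph")` yields the pair `("core", "paragraph")`, and a name starting with a digit or containing an uppercase letter is rejected.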

block_name =
    namespaced_block_name | core_block_name ;

namespaced_block_name =
    block_name_part, "/", block_name_part ;

core_block_name =
    block_name_part ;

block_name_part =
    letter, { letter | digit } ;

letter =
    "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" |
    "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" |
    "w" | "x" | "y" | "z" ;

digit =
    "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;

block_attributes

Block attributes must be a valid JSON object, as defined in RFC 7159. They must therefore start with { and end with }:

block_attributes =
    ? RFC 7159, JSON, Section 4. Objects ? ;
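
One way to find where such an object ends, without parsing the JSON, is to track brace depth while skipping over string contents. The sketch below does only that: it is not a JSON validator and is not necessarily how this crate recognizes attributes. The name `take_json_object` is hypothetical, and the `(rest, output)` order of the returned pair mirrors the convention of nom's `IResult`.

```rust
/// Return `(rest, object)` if `input` starts with a brace-balanced object.
/// Hypothetical sketch: it balances braces but does not validate the JSON.
fn take_json_object(input: &[u8]) -> Option<(&[u8], &[u8])> {
    // An object must start with `{`.
    if input.first() != Some(&b'{') {
        return None;
    }

    let mut depth = 0usize;
    let mut in_string = false;
    let mut escaped = false;

    for (i, &b) in input.iter().enumerate() {
        if in_string {
            // Inside a string, braces are plain characters.
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue;
        }
        match b {
            b'"' => in_string = true,
            b'{' => depth += 1,
            b'}' => {
                depth -= 1;
                if depth == 0 {
                    // The outermost object ends at index `i`.
                    return Some((&input[i + 1..], &input[..=i]));
                }
            }
            _ => {}
        }
    }

    // The input ended before the object was closed.
    None
}
```

A `}` inside a string value does not close the object, which is why the string state has to be tracked at all.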

phrase

A phrase is anything that is not a block.

phrase =
    anything - "<!--" ;

anything =
    ? any bytes ? ;
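
The rule can be read as "consume bytes until the next `<!--`". A minimal sketch of that reading follows; `take_phrase` is a hypothetical name, and treating every `<!--` as the end of a phrase is an assumption made here, since the actual parser may handle a comment opener that does not start a valid block differently.

```rust
/// Return `(rest, phrase)`, where `phrase` is the non-empty run of bytes
/// before the first `<!--`. Hypothetical sketch, not this crate's parser.
fn take_phrase(input: &[u8]) -> Option<(&[u8], &[u8])> {
    const OPENER: &[u8] = b"<!--";

    // Index of the first `<!--`, or the end of the input if there is none.
    let end = input
        .windows(OPENER.len())
        .position(|window| window == OPENER)
        .unwrap_or(input.len());

    if end == 0 {
        // Empty input, or the input starts with `<!--`: no phrase here.
        None
    } else {
        Some((&input[end..], &input[..end]))
    }
}
```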

wss

Whitespace is shortened to ws, and whitespaces is shortened to wss.

wss =
    ws, { ws } ;

ws =
    " "  (* U+0020 *)
  | "\n" (* U+000A *)
  | "\r" (* U+000D *)
  | "\t" (* U+0009 *) ;
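
The `wss` rule, one or more of the four `ws` bytes, can be sketched with the standard library alone. `take_wss` is a hypothetical name used for illustration; it returns a `(rest, output)` pair in the same order as nom's `IResult`.

```rust
/// Return `(rest, whitespaces)` if `input` starts with at least one `ws`.
/// Hypothetical sketch, not this crate's `whitespaces` function.
fn take_wss(input: &[u8]) -> Option<(&[u8], &[u8])> {
    // Index of the first byte that is not one of the four `ws` bytes.
    let end = input
        .iter()
        .position(|&b| !matches!(b, b' ' | b'\n' | b'\r' | b'\t'))
        .unwrap_or(input.len());

    if end == 0 {
        // `wss = ws, { ws }` requires at least one whitespace byte.
        None
    } else {
        Some((&input[end..], &input[..end]))
    }
}
```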

Functions

block

Recognize a block.

block_attributes

Recognize block attributes.

block_list

Axiom of the grammar: Recognize a list of blocks.

block_name

Recognize a fully-qualified block name.

block_name_part

Recognize a block name part.

core_block_name

Recognize a globally-namespaced block name.

namespaced_block_name

Recognize a namespaced block name.

phrase

Recognize a phrase.

whitespaces

Recognize whitespaces.