Input -> parsed (1-to-1)

Each subsection shows the original input snippet and the exact parsed output from the example.

Inline span (explicit close)

Showing how tags are parsed and annotated.

Parse settings: default, aka:

ParserConfig(
   recognized_tags=['cite'],
   unknown_mode='strip',
   per_tag_recovery={'cite': retro_line},
   trim_punctuation=true,
   autoclose_on_any_tag=true,
   autoclose_on_same_tag=true,
   case_sensitive_tags=false
)

Input

We shipped <cite id=1>last week</cite>.

Rendered Output

We shipped last week.

Technical detail: the cite tag is recognized and closed normally, so the annotation applies only to the inner span.
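
The explicit-close case can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not the parser's actual implementation: parse_explicit_spans and the (start, end, attributes) tuple layout are hypothetical names chosen for this example.

```python
import re

# Hypothetical helper, not the library's API. Matches <cite ...>inner</cite>
# case-insensitively, mirroring case_sensitive_tags=false in the config above.
CITE_RE = re.compile(r"<cite\b([^>]*)>(.*?)</cite>", re.IGNORECASE | re.DOTALL)

def parse_explicit_spans(text):
    """Return (rendered_text, spans); spans index into the rendered text."""
    parts, spans = [], []
    pos = 0        # read position in the input
    out_len = 0    # length of the rendered text built so far
    for m in CITE_RE.finditer(text):
        before = text[pos:m.start()]
        parts.append(before)
        out_len += len(before)
        inner = m.group(2)
        # The annotation covers only the inner text, not the tags themselves.
        spans.append((out_len, out_len + len(inner), m.group(1).strip()))
        parts.append(inner)
        out_len += len(inner)
        pos = m.end()
    parts.append(text[pos:])
    return "".join(parts), spans

rendered, spans = parse_explicit_spans("We shipped <cite id=1>last week</cite>.")
print(rendered)
print(spans)
```

Recording offsets against the rendered text rather than the input keeps the annotation valid after the tags have been removed.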

Tag annotations

Tag annotations support quoted attributes, boolean attributes, and multiple attributes.

Parse settings: tags can carry multiple attributes, including boolean attributes (no value), with or without quotes, aka:

ParserConfig(
   recognized_tags=['tag'],
   unknown_mode='strip',
   per_tag_recovery={},
   trim_punctuation=true,
   autoclose_on_any_tag=true,
   autoclose_on_same_tag=true,
   case_sensitive_tags=true
)

Input

Words can have <tag a=1 b='two' c d="4">multiple attributes</tag>. Word can also have <tag 59=42 9000>number as attribute</tag>. Attribute <tag no=quote>without</tag> quotation mark works, and there will be best-effort to auto-close <tag att='one two three>un-closed quotation marks</tag>. Unrecognized tags are <unknown foo=bar>auto dropped</unknown> with the default config.

Rendered Output

Words can have multiple attributes. Word can also have number as attribute. Attribute without quotation mark works, and there will be best-effort to auto-close un-closed quotation marks. Unrecognized tags are auto dropped with the default config.

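Technical detail: attribute values may be unquoted, single-quoted, or double-quoted, and an attribute with no value (c, 9000) is treated as boolean. An unterminated quote is closed on a best-effort basis. <unknown> is not in recognized_tags, so with unknown_mode='strip' the tag is dropped and its inner text is kept.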
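
The attribute forms used in the input above can be approximated with a single regular expression. The sketch below is an assumption about how such parsing might look, not the parser's real code: parse_attrs is a hypothetical name, boolean attributes are represented as True, and an unclosed quote simply takes the rest of the attribute string as its value.

```python
import re

# Hypothetical attribute grammar, written for illustration only.
ATTR_RE = re.compile(
    r"""([^\s=]+)              # attribute name (may be numeric, e.g. 59)
        (?:=(?:"([^"]*)"       # double-quoted value
             |'([^']*)'        # single-quoted value
             |(['"])(.*)$      # unclosed quote: best effort, take the rest
             |(\S+)            # bare, unquoted value
        ))?                    # no "=value" at all means a boolean attribute
    """,
    re.VERBOSE,
)

def parse_attrs(raw):
    """Parse the text between the tag name and '>' into a dict."""
    attrs = {}
    for m in ATTR_RE.finditer(raw):
        name, dq, sq, _quote, rest, bare = m.groups()
        if dq is not None:
            value = dq
        elif sq is not None:
            value = sq
        elif rest is not None:
            value = rest
        elif bare is not None:
            value = bare
        else:
            value = True  # attribute present without a value
        attrs[name] = value
    return attrs

print(parse_attrs("a=1 b='two' c d=\"4\""))
print(parse_attrs("59=42 9000"))
print(parse_attrs("att='one two three"))
```

Trying the closed-quote alternatives before the unclosed-quote fallback means well-formed input is never affected by the recovery rule.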

Retroactive cite (unclosed + auto-close)

Showing how unclosed tags are handled.

Parse settings: default, aka:

ParserConfig(
   recognized_tags=['cite'],
   unknown_mode='strip',
   per_tag_recovery={'cite': retro_line},
   trim_punctuation=true,
   autoclose_on_any_tag=true,
   autoclose_on_same_tag=true,
   case_sensitive_tags=false
)

Input

We shipped last week <cite id=1>. More info <note>soon.

Rendered Output

We shipped last week . More info soon.

Technical detail: cite is configured with retro_line, so the unclosed tag attaches backward on the same line, covering the text from the start of the line up to the tag. The following <note> is never closed explicitly, so auto-close is triggered when it reaches the end of the line.
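
The backward attachment can be pictured as a small span computation. The function below is a sketch under stated assumptions, not the parser's implementation: retro_line_span is a hypothetical name, lines are assumed to be separated by "\n", and no punctuation or whitespace trimming is applied.

```python
def retro_line_span(text, tag_start):
    """Span covered by an unclosed tag recovered with a retro_line-style rule.

    Assumption for this sketch: the span runs from the start of the line
    containing the tag up to the position where the tag begins.
    """
    # rfind returns -1 when there is no earlier newline, so +1 yields 0.
    line_start = text.rfind("\n", 0, tag_start) + 1
    return line_start, tag_start

single = "We shipped last week <cite id=1>. More info soon."
start, end = retro_line_span(single, single.index("<cite"))
print(repr(single[start:end]))

multi = "First line.\nWe shipped <cite id=1>"
start2, end2 = retro_line_span(multi, multi.index("<cite"))
print(repr(multi[start2:end2]))
```

The second call shows that the span stops at the preceding line break and does not reach into earlier lines.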

Forward token (per-tag strategy)

Setting individual tag recovery strategies.

Parse settings: risk recovery strategy set to forward_next_token, aka:

ParserConfig(
   recognized_tags=['mytag', 'risk'],
   unknown_mode='strip',
   per_tag_recovery={'mytag': retro_line, 'risk': forward_next_token},
   trim_punctuation=true,
   autoclose_on_any_tag=true,
   autoclose_on_same_tag=true,
   case_sensitive_tags=true
)

Input

Risks: <mytag level=high> load tests are late. <risk level=low>Docs slipping.

Rendered Output

Risks: load tests are late. Docs slipping.

Technical detail: risk is configured with forward_next_token, so only the next token is annotated for the second tag. Whereas mytag uses retro_line, so it attaches backward to the start of the line.
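
For contrast with retro_line, a forward strategy looks ahead instead of behind. The following is a minimal sketch, assuming a token is a run of non-whitespace characters; forward_next_token_span is a hypothetical helper, and the real parser may define tokens differently.

```python
import re

TOKEN_RE = re.compile(r"\S+")  # assumption: tokens are whitespace-delimited

def forward_next_token_span(text, tag_end):
    """Span of the first token after an unclosed tag, or None if none follows."""
    m = TOKEN_RE.search(text, tag_end)
    return (m.start(), m.end()) if m else None

snippet = "<risk level=low>Docs slipping."
tag_end = snippet.index(">") + 1
span = forward_next_token_span(snippet, tag_end)
print(span, repr(snippet[span[0]:span[1]]))
```

Under this assumption only "Docs" is annotated, while "slipping." is left as plain text.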

Self-closing markers

Using self-closing tags will emit zero-width markers.

Parse settings: default, aka:

ParserConfig(
   recognized_tags=['todo'],
   unknown_mode='strip',
   per_tag_recovery={},
   trim_punctuation=true,
   autoclose_on_any_tag=true,
   autoclose_on_same_tag=true,
   case_sensitive_tags=true
)

Input

Todo list: <todo id=7/>finish rollout <todo/> update docs.

Rendered Output

Todo list: _finish rollout _ update docs.

Technical detail: Self-closing `<todo/>` tags emit zero-width markers at their positions instead of annotating a span.
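
A zero-width marker can be represented as a single offset into the rendered text. The sketch below illustrates that idea only: extract_markers and the (offset, attributes) tuple are hypothetical, and unlike the rendered output shown above it does not insert a visible placeholder character at each marker position.

```python
import re

# Hypothetical pattern for self-closing <todo .../> tags.
SELF_CLOSING_RE = re.compile(r"<todo\b([^>]*?)/>")

def extract_markers(text):
    """Remove self-closing tags and return (rendered_text, markers)."""
    parts, markers = [], []
    pos = 0        # read position in the input
    out_len = 0    # length of the rendered text built so far
    for m in SELF_CLOSING_RE.finditer(text):
        chunk = text[pos:m.start()]
        parts.append(chunk)
        out_len += len(chunk)
        # A marker has a position but no width: it covers no characters.
        markers.append((out_len, m.group(1).strip()))
        pos = m.end()
    parts.append(text[pos:])
    return "".join(parts), markers

rendered_todo, markers = extract_markers(
    "Todo list: <todo id=7/>finish rollout <todo/> update docs."
)
print(repr(rendered_todo))
print(markers)
```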