Concepts¶
KindaXML is designed to annotate text without building a deep DOM. The parser emits a flat list of text segments and optional markers, recovering from common model mistakes.
Core ideas¶
- Annotation-first: tags annotate text spans rather than create nested trees.
- Deterministic recovery: malformed tags still map to predictable spans.
- Shallow structure: open tags auto-close when the next tag starts to avoid runaway nesting.
Output model¶
The parser returns:
text: tag-stripped textsegments: ordered runs of text, each with zero or more annotationsmarkers: zero-width annotations for self-closing tags
Example (conceptual):
[
{"text": "We shipped ", "annotations": []},
{"text": "last week", "annotations": [{"tag": "cite", "attrs": {"id": "1"}}]},
{"text": ".", "annotations": []}
]
Runnable example¶
Run:
cargo run -q --example basic
Expected output (excerpt):
=== Inline span ===
Original text:
We shipped <cite id="1">last week</cite>.
Parsed text:
We shipped last week.
Segments:
- 'We shipped '
- 'last week' (cite [id="1"])
- '.'
CI tip: compare the excerpt above with the matching section in the command output to verify the annotation model.