kinogaki-codecs
kinogaki-codecs — audit issues
Audited 2026-06-20. Repo: libraries/kinogaki-codecs. Criteria: completeness · tests · separation of concerns · verb-named functions · file size (<1000 lines) · organization.
Health summary
A genuinely solid, well-factored codec library: six codecs (json, markdown, html, svg, text, blob) plus a bundle convention, all sharing a single document model, with a dependency-free JSON/XML/HTML parser stack and 28 tests / 234 checks that build clean (-Wall -Wextra) and pass. The most material problems are documentation drift (the README still presents five of the six codecs as future work) and a handful of honest-but-untested gaps: markdown nested lists are claimed in the header but silently unsupported, and the document/vector decoders never report errors (the ParseError* is ignored and decode always "succeeds"). No file approaches the size limit, there are no committed build artifacts, and separation of concerns is clean apart from two small copy-pasted helpers.
Issues
[HIGH] completeness — markdown nested lists are claimed but silently unsupported
- Where:
src/MarkdownCodec.cpp:220,src/MarkdownCodec.cpp:232; header claiminclude/kinogaki/codecs/MarkdownCodec.h:3 - Problem: The header advertises "ordered/unordered lists incl. nesting", but list parsing is gated on
indentOf(L[i]) == 0. An indented (nested) list marker never opens a sublist — its lines are absorbed into the parent item as continuation text, so- a\n - b\nloses the nested list entirely with no error. The claimed feature is not implemented, and no test exercises nesting. - Fix: Either implement indented sublist recognition (recurse
parseBlockson the de-indented item body — the item content already stripscontentCol, so a marker at the item's indent should start a nested list), or remove "incl. nesting" from the header and add an explicit "nested lists are flattened" note plus a test pinning the actual behaviour.
[MEDIUM] completeness — document/vector decoders ignore ParseError and never fail
- Where:
src/MarkdownCodec.cpp:335,src/HtmlCodec.cpp:398,src/SvgCodec.cpp:194 - Problem: All three
decode()signatures take aParseError*(per theCodecseam) but drop it on the floor — the parameter is unnamed and unused. They always return a Document, so malformed/truncated input (an unterminated<svg, a fence that never closes, garbage tags) yields a degraded document with no signal to the caller. OnlyJsonCodec::decodepopulatesParseError. This is the "error paths that silently swallow failures" smell from criterion 1. - Fix: Decide and document the contract. If these parsers are intentionally total (HTML/MD are lenient by design), say so in each header and consider not accepting a
ParseError*you can't fill; otherwise detect at least gross failures (unterminated raw/code/quote spans, no root element for SVG) and report them.
[MEDIUM] organization — README documents only json; the other five codecs are presented as future work
- Where:
README.md:9-28(esp.README.md:28) - Problem: The README's "Codecs" section describes only the JSON codec, then closes with "(More codecs — text, blob, csv, … — will join here…)". In fact markdown, html, svg, text, blob codecs and the whole bundle/filesystem convention are implemented, tested, and shipping. A reader (or agent) trusting the README would not know they exist. CANONICAL.md is accurate and current, which makes the README the stale artifact.
- Fix: Rewrite the Codecs section to list all shipped codecs (json, markdown, html, svg, text, blob) and the bundle convention, each one line, pointing at CANONICAL.md for the schema contract and the per-format round-trip/loss rules.
[MEDIUM] tests — no malformed-input tests for markdown/html/svg; lossy round-trips asserted but not pinned
- Where:
tests/markdown_codec_test.cpp,tests/html_codec_test.cpp,tests/svg_codec_test.cpp - Problem: Only JSON has negative/garbage tests (
tests/json_codec_test.cpp:89,:105). The document/vector codecs have no malformed-input cases (unterminated tags, unclosed fences/blockquotes, stray</close>, deeply nested input, a non-/svgroot for SVG encode is tested but decode robustness is not). CANONICAL.md promises a tabulated cross-format degradation table where "each row is a test", but the mdencodeof doc-v1-only constructs (definitionList → bold term, figure → italic caption, note → blockquote, rawHtml verbatim —src/MarkdownCodec.cpp:311-328) has no test asserting the degraded output. UTF-8 multibyte / entity edge cases beyond&/<are untested in html. - Fix: Add malformed-input cases per codec (assert they don't crash and produce a defined result), and add the html→md degradation tests CANONICAL.md commits to (one per doc-v1 construct).
[LOW] separation — safeSegment duplicated across two codecs
- Where:
src/JsonCodec.cpp:35andsrc/Bundle.cpp:50 - Problem: Two near-identical "filename/key → safe path segment" routines. JsonCodec's fails closed on an unsafe key (correct for untrusted JSON); Bundle's sanitises and disambiguates. The shared concept (what makes a valid
[A-Za-z0-9_]segment) is reimplemented, so a future change to the path charset must be made in two places. - Fix: Factor the "is this a representable segment" predicate into one shared helper (a small
internal/header), keeping the two distinct policies (reject vs. sanitise) as thin wrappers over it.
[LOW] separation — entity/escape helpers duplicated between HtmlCodec and SvgCodec
- Where:
src/HtmlCodec.cpp:18-64andsrc/SvgCodec.cpp:24-60 - Problem:
appendUtf8,decodeEntities,escapeText,escapeAttrare copy-pasted verbatim between the two markup codecs.SvgCodec.cpp:23acknowledges this ("kept local so the codec is self-contained"), so it is a conscious tradeoff, but it is still duplicated XML-escaping logic that can drift. - Fix: Optional — extract a shared
XmlTexthelper header. If self-containment is preferred, leave a cross-reference comment in both files so a fix to one is mirrored.
[LOW] completeness — JSON \u surrogate pairs not combined (documented, untested)
- Where:
src/Json.cpp:196 - Problem: The string parser decodes each
\uXXXXindependently and never combines a high/low surrogate pair into the astral code point; a comment admits "surrogate pairs aren't combined here; lone units encode as-is". Emoji and other supplementary-plane characters written as surrogate pairs decode to invalid UTF-8. No test covers\uescapes at all (the escape test attests/json_codec_test.cpp:66uses only\n\t\"\\). - Fix: Combine surrogate pairs (
0xD800–0xDBFFfollowed by0xDC00–0xDFFF) into the 4-byte UTF-8 form, and add a😀-style round-trip test. If out of scope for v1, keep the comment but add a test pinning the current behaviour so it's a known, not a surprise.
[LOW] naming — function names are clean
- Where:
src/(all files) - Problem: None material. Free functions read as verbs (
escapeTo,numberTo,dumpTo,parseInline,parseBlocks,renderBlocks,buildChildren,writeElement,materializeFile,addFile,addFolder,graft). A few helpers are predicate-/noun-shaped —attr,lower,root,standalone,safeSegment,docCodec,svgCodec,entries,fileName,leavesOf,childrenOf— but these are accessors/queries/factories where the noun form reads naturally and matches the surrounding codebase style;boolpredicates are appropriatelyis*/blank/contains. Not worth churning. - Fix: None required.
[LOW] filesize — clean
- Where:
src/ - Problem: Largest source file is
src/HtmlCodec.cppat 413 lines; next issrc/MarkdownCodec.cppat 348. Nothing exceeds 800, let alone 1000. - Fix: None required.
[LOW] organization — no committed build artifacts; layout is clean
- Where: repo root
- Problem: None.
build/and.DS_Storeare git-ignored (.gitignore), no.o/.a/_siteis tracked, headers mirror sources, tests mirror codecs, bothbuild.sh(clang loop) and CMake are present and consistent. The only orphan-adjacent note is the stale README (covered above). - Fix: None required.