site — audit issues

Audited 2026-06-20. Repo: software/site. Criteria: completeness · tests · separation of concerns · verb-named functions · file size (<1000 lines) · organization.

Health summary

kinogaki-site is a tight, readable ~285-line builder that does one thing well: it composes an ordered page list (template.prisma) with per-page /document docs and derives the chrome. The code is clean, well-commented, and the function names mostly read as verbs. The two real concerns are an HTML-injection surface (the rawHtml escape hatch and link/image hrefs flow to disk unsanitized) and a complete absence of tests, even though the core logic (slugify, groupCodeTabs, escaping, and heading derivation) is pure and trivially testable. Two derivations (heading ids and the TOC) scan only top-level blocks, so headings nested in notes, quotes, or figures silently lose their anchors. A build that fails partway also leaves the earlier pages on disk.

Issues

[HIGH] tests — no tests exist for highly testable pure logic

Where: repo root (git ls-files shows only .gitignore, README.md, build.sh, src/{Site.cpp,Site.h,main.cpp}); no test/tests dir, no test target in build.sh.
Problem: Confirmed: the component has zero tests. Yet most of its logic is pure string→string transformation that is ideal to unit-test, and the rendering powers every docs subdomain — a silent regression ships everywhere at once.
Fix: Add a tests/ dir + a test target in build.sh. Concrete cases to cover:
- slugify (Site.cpp:62): mixed case → lowercase; runs of punctuation/spaces collapse to a single -; leading/trailing dashes stripped; non-ASCII / all-punctuation input → empty (and that empty ids are then skipped); "Hello, World!" → hello-world.
- groupCodeTabs (Site.cpp:97): adjacent cpp+python pair → one .codegroup with C++ pane first regardless of source order; python-then-cpp also orders C++ first; two cpp blocks (la == lb) NOT grouped; cpp + non-code (js) NOT grouped; cpp followed by prose then python NOT grouped (only whitespace allowed between); a single trailing block left untouched; an unterminated <pre><code class="language- (no closing " or </code></pre>) handled without hang/overrun; body with no code blocks passes through verbatim.
- esc (Site.cpp:42) and HtmlCodec escapeText/escapeAttr: &,<,>," mapped; ordering (no double-escaping of &).
- deriveHeadingIds / headings (Site.cpp:75, 85): id derived only when absent; existing id preserved; only level-2 headings enter the TOC; empty-slug headings skipped.
- plainText (Site.cpp:53): concatenates nested text+code leaves.
- End-to-end buildSite: a tiny template + two docs → assert prev/next links, active-state, group ordering by first appearance, and the missing-doc / empty-pages / bad-template error returns (each returns 1).
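
The slugify cases above could be pinned down with plain asserts. The sketch below is illustrative only: the slugify shown is a stand-in written to match the contract listed in the bullet, not the code at Site.cpp:62, and testSlugify is a hypothetical name. A real suite would link Site.cpp and drop the stand-in.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>

// Stand-in for slugify (Site.cpp:62). NOT the repo's implementation: it only
// encodes the contract listed above so the cases have something to run against.
std::string slugify(std::string_view text) {
    std::string slug;
    bool pendingDash = false;
    for (unsigned char c : text) {
        if (c < 0x80 && std::isalnum(c)) {
            // Emit at most one dash per separator run, and never a leading one.
            if (pendingDash && !slug.empty()) slug += '-';
            pendingDash = false;
            slug += static_cast<char>(std::tolower(c));
        } else {
            pendingDash = true;  // punctuation, whitespace, non-ASCII bytes
        }
    }
    return slug;  // a trailing separator run never emits a dash
}

// Hypothetical test entry point covering the slugify bullet.
void testSlugify() {
    assert(slugify("Hello, World!") == "hello-world");
    assert(slugify("MixedCASE") == "mixedcase");
    assert(slugify("a  --  b") == "a-b");
    assert(slugify("--lead and trail--") == "lead-and-trail");
    assert(slugify("!!!").empty());                       // all punctuation
    assert(slugify("\xE6\x97\xA5\xE6\x9C\xAC").empty());  // non-ASCII only
}
```

Calling testSlugify() from a small test main wired into build.sh would be enough to start; the groupCodeTabs and buildSite cases need the real sources and are not sketched here.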

[HIGH] completeness — rawHtml and link/image hrefs reach disk unsanitized (HTML/JS injection)

Where: HtmlCodec.cpp:389-390 (rawHtml → out += getStr(...) verbatim); HtmlCodec.cpp:326-327 (link href, image src via escapeAttr only); consumed at Site.cpp:210 (out << groupCodeTabs(*body)).
Problem: The body HTML is inlined into each page with no sanitization. A /document doc containing a rawHtml node injects arbitrary markup/<script> into every page built from it; a link with href="javascript:…" survives escapeAttr (which only handles &"<) and yields a clickable script URL. Docs are authored/converted (incl. via the MCP server per the README), so content is not always fully trusted. esc (Site.cpp:42) guards only the renderer's own chrome strings, not document body. Note this is partly a kinogaki-codecs concern, but site is where untrusted body becomes a published file.
Fix: Decide and document the trust boundary. If docs are trusted, say so explicitly in README/header (rawHtml is intentionally an "escape hatch"). If not, scrub javascript:/data: (non-image) URL schemes in link/image and gate or sanitize rawHtml (allowlist svg/math only, which is the codec's stated intent at HtmlCodec.cpp:150). At minimum add a comment at Site.cpp:210 noting the body is emitted unsanitized.
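
If the untrusted-docs route is chosen, the URL scrub could look roughly like the sketch below. isSafeHref is a hypothetical helper (it exists in neither Site.cpp nor HtmlCodec), and it uses an allowlist of http, https and mailto rather than the javascript:/data: blocklist named above, which is a stricter assumption. It would also reject data: image URLs, so those would need a separate rule.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical helper, not part of Site.cpp or HtmlCodec.
// Returns true when `url` is a relative reference or uses an allowlisted scheme.
bool isSafeHref(std::string_view url) {
    // Browsers drop tabs/newlines inside a URL and ignore leading control
    // characters and spaces, so normalise the same way before inspecting it.
    std::string cleaned;
    for (unsigned char c : url) {
        if (c == '\t' || c == '\n' || c == '\r') continue;
        cleaned += static_cast<char>(std::tolower(c));
    }
    std::string_view rest(cleaned);
    while (!rest.empty() && static_cast<unsigned char>(rest.front()) <= 0x20) {
        rest.remove_prefix(1);
    }

    // A scheme is present only if ':' appears before any '/', '?' or '#'.
    const std::size_t stop = rest.find_first_of(":/?#");
    if (stop == std::string_view::npos || rest[stop] != ':') {
        return true;  // relative reference such as "page.html#intro"
    }

    static constexpr std::array<std::string_view, 3> allowed{"http", "https", "mailto"};
    const std::string_view scheme = rest.substr(0, stop);
    return std::find(allowed.begin(), allowed.end(), scheme) != allowed.end();
}
```

An allowlist fails closed: an unfamiliar scheme is rejected by default, whereas a blocklist has to anticipate every dangerous one.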

[MEDIUM] completeness — heading-id derivation and TOC only scan top-level blocks

Where: deriveHeadingIds (Site.cpp:75-81) and headings (Site.cpp:85-91) both iterate orderedChildren(doc, documentRoot()) only.
Problem: HtmlCodec nests headings inside note, blockquote, figure, and list items (HtmlCodec.cpp parses blocks recursively). A level-2 heading inside, e.g., a <div class="note"> gets no derived id and never appears in the on-this-page TOC, so its anchor link is dead and the IntersectionObserver spy (Site.cpp:231, observes .content h2[id]) ignores it. Silent, content-dependent.
Fix: Recurse into child blocks when deriving ids and collecting headings (or document that only top-level headings are TOC-eligible by design). A shared recursive walk would serve both functions.
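
A shared recursive walk might be shaped like the sketch below. Block and its fields are hypothetical stand-ins, since the real code iterates orderedChildren(doc, ...) over the kinogaki-codecs document model. walkBlocks and collectLevel2Headings are illustrative names.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the document model; field names are assumptions.
struct Block {
    std::string kind;             // "heading", "note", "blockquote", ...
    int level = 0;                // heading level, 0 for non-headings
    std::string text;             // flattened heading text
    std::vector<Block> children;  // nested blocks, in document order
};

// Depth-first, document-order visit of `block` and everything nested in it.
template <typename Visit>
void walkBlocks(Block& block, Visit&& visit) {
    visit(block);
    for (Block& child : block.children) walkBlocks(child, visit);
}

// TOC collection expressed on top of the walk; id derivation could reuse
// walkBlocks with a visitor that assigns an id when one is absent.
std::vector<std::string> collectLevel2Headings(Block& root) {
    std::vector<std::string> found;
    walkBlocks(root, [&](Block& b) {
        if (b.kind == "heading" && b.level == 2) found.push_back(b.text);
    });
    return found;
}
```

Because the visitor receives a mutable reference, the same traversal can serve both the id-assigning pass and the read-only TOC pass.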

[MEDIUM] completeness — partial writes leave a half-built site on first error; some failures unreported

Where: buildSite loop returns 1 on the first bad/missing doc (Site.cpp:170, 172, 175, 234) after already writing earlier pages; asset/css copy ignores ec (Site.cpp:240-243).
Problem: A failure on page N leaves pages 0..N-1 written to outDir with no rollback and a nonzero exit — a deploy script that ignores exit code (or a re-run) can publish a partial site. The style.css/assets copy swallows its std::error_code silently: a missing or unreadable style.css (Site.cpp:240, guarded by if (auto css = ...)) and a failed assets recursive copy (Site.cpp:242-243) produce no warning, so an unstyled site builds "successfully".
Fix: Either validate/parse all docs before writing any output, or write to a temp dir and swap on full success. Warn (at least to stderr) when style.css is absent or the assets copy sets ec.
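
The temp-dir option could be sketched with std::filesystem as follows. buildThenSwap and the buildInto callback are hypothetical: the callback stands in for the existing page loop and returns 0 on success, as buildSite does. The sketch assumes outDir is a dedicated output directory whose parent is writable.

```cpp
#include <filesystem>
#include <functional>
#include <iostream>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical wrapper: build into a sibling staging dir, then replace outDir
// only when the whole build succeeded. On failure outDir is left untouched.
int buildThenSwap(const fs::path& outDir,
                  const std::function<int(const fs::path&)>& buildInto) {
    // Drop a trailing separator so the staging dir is a sibling of outDir.
    fs::path target = outDir.lexically_normal();
    if (!target.has_filename()) target = target.parent_path();
    fs::path staging = target;
    staging += ".staging";

    std::error_code ec;
    fs::remove_all(staging, ec);  // clear leftovers from an interrupted run
    ec.clear();
    fs::create_directories(staging, ec);
    if (ec) {
        std::cerr << "cannot create " << staging << ": " << ec.message() << '\n';
        return 1;
    }

    if (int rc = buildInto(staging); rc != 0) {
        fs::remove_all(staging, ec);  // discard the partial build
        return rc;
    }

    // Two steps, so not strictly atomic: there is a brief window with no outDir.
    fs::remove_all(target, ec);
    if (ec) {
        std::cerr << "cannot clear " << target << ": " << ec.message() << '\n';
        return 1;
    }
    fs::rename(staging, target, ec);
    if (ec) {
        std::cerr << "cannot publish " << target << ": " << ec.message() << '\n';
        return 1;
    }
    return 0;
}
```

The same stderr reporting pattern would cover the style.css and assets copies: check ec after each call and print ec.message() rather than dropping it.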

separation of concerns — page-chrome templating is inlined in the buildSite loop

Where: Site.cpp (249 lines): file I/O (readFile/writeFile), pure transforms (slugify, plainText, groupCodeTabs, esc), model derivation (deriveHeadingIds, headings), and the large inline HTML-templating block in buildSite (Site.cpp:177-231) are all together.
Problem: Not a god file at this size, and the layering vs. kinogaki-codecs (which owns Document parsing + HtmlCodec body rendering) is genuinely clean. But the page-chrome HTML is a 50-line raw-string ostringstream wall embedded mid-loop, mixing structure, derivation, and presentation; it is also the hardest part to test because it is not a separable function.
Fix: Optional. Extracting the chrome into a renderPage(const Page&, body, toc, prev, next) -> std::string free function would isolate the template from the build loop and make it unit-testable (supports the tests finding above). Pure transforms could move to a Render.{h,cpp} if the file grows.
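
As a rough illustration of that extraction, the sketch below pulls a much-reduced chrome into a free function. Everything here is assumed: the Page fields, the markup, and the prev/next class names are invented for the example, and the toc parameter is left out for brevity. Only the .content class and the renderPage/escapeChrome names come from this audit.

```cpp
#include <sstream>
#include <string>
#include <string_view>

// Hypothetical shape; the real Page type in Site.cpp may carry other fields.
struct Page {
    std::string title;
    std::string href;
};

// Escapes chrome strings only; the body is passed through as already-rendered HTML.
std::string escapeChrome(std::string_view s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default:  out += c;
        }
    }
    return out;
}

// Pure function: same inputs, same string, no file I/O. prev/next may be null.
std::string renderPage(const Page& page, std::string_view body,
                       const Page* prev, const Page* next) {
    std::ostringstream out;
    out << "<title>" << escapeChrome(page.title) << "</title>\n";
    out << "<main class=\"content\">" << body << "</main>\n";
    if (prev) {
        out << "<a class=\"prev\" href=\"" << escapeChrome(prev->href) << "\">"
            << escapeChrome(prev->title) << "</a>\n";
    }
    if (next) {
        out << "<a class=\"next\" href=\"" << escapeChrome(next->href) << "\">"
            << escapeChrome(next->title) << "</a>\n";
    }
    return out.str();
}
```

With the template isolated like this, prev/next and escaping behavior can be asserted on the returned string without touching the filesystem.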

[LOW] naming — clean

Where: Site.cpp.
Problem: None blocking. Functions read as verbs or are acceptable: readFile, writeFile, buildSite, slugify, deriveHeadingIds, groupCodeTabs. esc (Site.cpp:42), href (Site.cpp:50), plainText (Site.cpp:53), and headings (Site.cpp:85) are noun/abbreviation-named functions that return a value rather than reading as verbs — minor. (Page/Head are types, exempt.)
Fix: If touched, rename to verbs: esc→escapeChrome, href→hrefFor, plainText→flattenText/collectText, headings→collectHeadings. Low priority.

[LOW] filesize — clean

Where: all sources.
Problem: No file exceeds the thresholds. src/Site.cpp 249, src/Site.h 19, src/main.cpp 17 lines. Nothing over 800, nothing over 1000.
Fix: None.

[LOW] organization — stale binary in untracked build/ dir; README lacks layout and tests sections

Where: build/ directory (contains kinogaki-site and a stale prisma-site binary) on disk; .gitignore = build/; README.md "Build".
Problem: The build/ dir holds binaries including a prisma-site artifact from the pre-rename era (Prism→kinogaki); it is correctly gitignored and NOT tracked (git ls-files confirms only source is committed), so this is hygiene, not a committed-artifact problem. The README documents build + usage well but has no Layout/structure section and (relatedly) nothing on testing — appropriate to add once tests exist. The stale prisma-site binary is dead/orphaned and should be removed from the working tree.
Fix: rm build/prisma-site (and optionally clean build/ between renames). Add a short "Tests" section to README once a test target lands.
