site — audit issues
Audited 2026-06-20. Repo: software/site. Criteria: completeness · tests · separation of concerns · verb-named functions · file size (<1000 lines) · organization.
Health summary
kinogaki-site is a tight, readable ~285-line builder that does one thing well: compose an ordered page list (template.prisma) with per-page /document docs and derive the chrome. The code is clean, well-commented, and the function names mostly read as verbs. The two real concerns are an HTML-injection surface (the rawHtml escape hatch and link/image hrefs flow to disk unsanitized) and a complete absence of tests for logic — slugify, groupCodeTabs, escaping, and heading derivation — that is pure and trivially testable. A couple of derivations (heading-id derivation and the TOC) only scan top-level blocks, so headings nested in notes/quotes/figures silently lose their anchors.
Issues
[HIGH] tests — no tests exist for highly testable pure logic
- Where: repo root (git ls-files shows only .gitignore, README.md, build.sh, src/{Site.cpp,Site.h,main.cpp}); no test/tests dir, no test target in build.sh.
- Problem: Confirmed: the component has zero tests. Yet most of its logic is pure string→string transformation that is ideal to unit-test, and the rendering powers every docs subdomain, so a silent regression ships everywhere at once.
- Fix: Add a tests/ dir + a test target in build.sh. Concrete cases to cover:
  - slugify (Site.cpp:62): mixed case → lowercase; runs of punctuation/spaces collapse to a single -; leading/trailing dashes stripped; non-ASCII / all-punctuation input → empty (and that empty ids are then skipped); "Hello, World!" → hello-world.
  - groupCodeTabs (Site.cpp:97): adjacent cpp+python pair → one .codegroup with C++ pane first regardless of source order; python-then-cpp also orders C++ first; two cpp blocks (la == lb) NOT grouped; cpp + non-code (js) NOT grouped; cpp followed by prose then python NOT grouped (only whitespace allowed between); a single trailing block left untouched; an unterminated <pre><code class="language- (no closing " or </code></pre>) handled without hang/overrun; body with no code blocks passes through verbatim.
  - esc (Site.cpp:42) and HtmlCodec escapeText/escapeAttr: &, <, >, " mapped; ordering (no double-escaping of &).
  - deriveHeadingIds/headings (Site.cpp:75, 85): id derived only when absent; existing id preserved; only level-2 headings enter the TOC; empty-slug headings skipped.
  - plainText (Site.cpp:53): concatenates nested text + code leaves.
  - End-to-end buildSite: a tiny template + two docs → assert prev/next links, active-state, group ordering by first appearance, and the missing-doc / empty-pages / bad-template error returns (each returns 1).
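A minimal sketch of what the slugify cases could look like as plain assert-based tests. The slugify body here is a hypothetical stand-in written only from the behaviour listed above, so that the example compiles on its own; a real tests/ target would link against the function in Site.cpp instead, and the real implementation may differ in detail. testSlugify is likewise an illustrative name.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for slugify (Site.cpp:62), reconstructed from the
// audit's description only. Real tests should call the function in Site.cpp.
std::string slugify(const std::string& text) {
    std::string slug;
    bool pendingDash = false;  // a separator run was seen since the last kept char
    for (unsigned char c : text) {
        const bool isUpper = c >= 'A' && c <= 'Z';
        const bool isAlnum = isUpper || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
        if (!isAlnum) {
            pendingDash = true;  // punctuation, spaces, and non-ASCII bytes all separate
            continue;
        }
        if (pendingDash && !slug.empty()) slug += '-';  // collapse the run; never lead with '-'
        pendingDash = false;
        slug += static_cast<char>(isUpper ? c - 'A' + 'a' : c);
    }
    return slug;  // a trailing separator run is dropped because no char follows it
}

// One assertion per case from the list above.
void testSlugify() {
    assert(slugify("Hello, World!") == "hello-world");
    assert(slugify("MiXeD Case") == "mixed-case");
    assert(slugify("a -- b") == "a-b");
    assert(slugify("  --trim--  ") == "trim");
    assert(slugify("!!!").empty());
    assert(slugify("\xE6\x97\xA5\xE6\x9C\xAC").empty());  // non-ASCII bytes only
}
```

The same shape (one function per unit, one assert per listed case) would extend to groupCodeTabs, esc, and plainText without needing a test framework.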
[HIGH] completeness — rawHtml and link/image hrefs reach disk unsanitized (HTML/JS injection)
- Where: HtmlCodec.cpp:389-390 (rawHtml → out += getStr(...) verbatim); HtmlCodec.cpp:326-327 (link href, image src via escapeAttr only); consumed at Site.cpp:210 (out << groupCodeTabs(*body)).
- Problem: The body HTML is inlined into each page with no sanitization. A /document doc containing a rawHtml node injects arbitrary markup/<script> into every page built from it; a link with href="javascript:…" survives escapeAttr (which only handles &, ", <) and yields a clickable script URL. Docs are authored/converted (incl. via the MCP server per the README), so content is not always fully trusted. esc (Site.cpp:42) guards only the renderer's own chrome strings, not document body. Note this is partly a kinogaki-codecs concern, but site is where untrusted body becomes a published file.
- Fix: Decide and document the trust boundary. If docs are trusted, say so explicitly in README/header (rawHtml is intentionally an "escape hatch"). If not, scrub javascript:/data: (non-image) URL schemes in link/image and gate or sanitize rawHtml (allowlist svg/math only, which is the codec's stated intent at HtmlCodec.cpp:150). At minimum add a comment at Site.cpp:210 noting the body is emitted unsanitized.
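A minimal sketch of the URL-scheme scrub, assuming an allowlist of http, https, and mailto plus data:image/ for image sources. The audit does not name an allowlist, so that choice is an assumption, and isSafeUrl is a hypothetical helper rather than an existing HtmlCodec or Site.cpp function.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not in HtmlCodec or Site.cpp).
// Returns true when `url` is relative or uses an allowlisted scheme.
bool isSafeUrl(const std::string& url, bool isImage) {
    // Normalise for the check only: drop whitespace and control characters
    // (browsers tolerate some of them inside a scheme) and lowercase the rest.
    std::string s;
    for (unsigned char c : url) {
        if (c > 0x20 && c != 0x7f) s += static_cast<char>(std::tolower(c));
    }

    // No colon, or a path/query/fragment delimiter before it: a relative URL.
    const std::size_t colon = s.find(':');
    const std::size_t delim = s.find_first_of("/?#");
    if (colon == std::string::npos) return true;
    if (delim != std::string::npos && delim < colon) return true;

    // Assumed allowlist; everything else (javascript:, vbscript:, ...) is rejected.
    const std::string scheme = s.substr(0, colon);
    if (scheme == "http" || scheme == "https" || scheme == "mailto") return true;
    if (isImage && s.rfind("data:image/", 0) == 0) return true;
    return false;
}
```

An allowlist is used instead of a javascript: denylist so that unknown schemes fail closed. A caller would drop or neutralise the href/src when the check returns false; what to emit in that case is left open here.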
[MEDIUM] completeness — heading-id derivation and TOC only scan top-level blocks
- Where: deriveHeadingIds (Site.cpp:75-81) and headings (Site.cpp:85-91) both iterate orderedChildren(doc, documentRoot()) only.
- Problem: HtmlCodec nests headings inside note, blockquote, figure, and list items (HtmlCodec.cpp parses blocks recursively). A level-2 heading inside, e.g., a <div class="note"> gets no derived id and never appears in the on-this-page TOC, so its anchor link is dead and the IntersectionObserver spy (Site.cpp:231, observes .content h2[id]) ignores it. Silent, content-dependent.
- Fix: Recurse into child blocks when deriving ids and collecting headings (or document that only top-level headings are TOC-eligible by design). A shared recursive walk would serve both functions.
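A sketch of the recursive walk on a simplified block tree. Block and TocEntry are hypothetical types invented for this example; the real Document type and its orderedChildren accessor in kinogaki-codecs are not reproduced here, so the real walk would recurse through that accessor instead of a children vector.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified block tree standing in for the real Document.
struct Block {
    std::string type;             // "heading", "note", "blockquote", "paragraph", ...
    int level = 0;                // heading level; 0 for non-headings
    std::string id;               // may be empty
    std::string text;
    std::vector<Block> children;  // nested blocks, in document order
};

// Hypothetical TOC row.
struct TocEntry {
    std::string id;
    std::string text;
};

// Depth-first, document-order walk: level-2 headings with an id are collected
// wherever they sit, including inside notes, quotes, figures, and list items.
void collectHeadings(const Block& block, std::vector<TocEntry>& out) {
    if (block.type == "heading" && block.level == 2 && !block.id.empty()) {
        out.push_back({block.id, block.text});
    }
    for (const Block& child : block.children) {
        collectHeadings(child, out);
    }
}
```

Id derivation could reuse the same traversal with a mutable Block& and an assignment in place of the push_back, which is what makes a shared walk attractive.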
[MEDIUM] completeness — partial writes leave a half-built site on first error; some failures unreported
- Where: buildSite loop returns 1 on the first bad/missing doc (Site.cpp:170, 172, 175, 234) after already writing earlier pages; asset/css copy ignores ec (Site.cpp:240-243).
- Problem: A failure on page N leaves pages 0..N-1 written to outDir with no rollback and a nonzero exit; a deploy script that ignores exit code (or a re-run) can publish a partial site. The style.css/assets copy swallows its std::error_code silently: a missing or unreadable style.css (Site.cpp:240, guarded by if (auto css = ...)) and a failed assets recursive copy (Site.cpp:242-243) produce no warning, so an unstyled site builds "successfully".
- Fix: Either validate/parse all docs before writing any output, or write to a temp dir and swap on full success. Warn (at least to stderr) when style.css is absent or the assets copy sets ec.
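A sketch of the temp-dir-and-swap option using std::filesystem (C++17). buildAtomically and copyOrWarn are hypothetical names, and the build callback stands in for the existing page-writing loop, assumed to return 0 on success as buildSite does. The final swap is a remove followed by a rename, so it is not strictly atomic; it only guarantees that a failed build never touches the published directory.

```cpp
#include <filesystem>
#include <functional>
#include <iostream>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical wrapper: runs `build` against a staging dir and publishes the
// result to outDir only when the build returns 0.
int buildAtomically(const fs::path& outDir,
                    const std::function<int(const fs::path&)>& build) {
    // Drop a trailing separator so staging is a sibling of the target, not a child.
    const fs::path target = outDir.has_filename() ? outDir : outDir.parent_path();
    fs::path staging = target;
    staging += ".staging";

    std::error_code ec;
    fs::remove_all(staging, ec);  // clear leftovers from an earlier failed run
    fs::create_directories(staging, ec);
    if (ec) {
        std::cerr << "error: cannot create " << staging << ": " << ec.message() << '\n';
        return 1;
    }

    if (int rc = build(staging); rc != 0) {
        fs::remove_all(staging, ec);  // discard partial output; target is untouched
        return rc;
    }

    fs::remove_all(target, ec);  // brief window with no target directory
    fs::rename(staging, target, ec);
    if (ec) {
        std::cerr << "error: cannot publish " << target << ": " << ec.message() << '\n';
        return 1;
    }
    return 0;
}

// Hypothetical helper: copy a file or directory tree and report failure on
// stderr instead of swallowing the error_code.
bool copyOrWarn(const fs::path& from, const fs::path& to) {
    std::error_code ec;
    fs::copy(from, to,
             fs::copy_options::recursive | fs::copy_options::overwrite_existing, ec);
    if (ec) {
        std::cerr << "warning: copy " << from << " -> " << to
                  << " failed: " << ec.message() << '\n';
    }
    return !ec;
}
```

Keeping the staging directory next to the target means both normally live on the same filesystem, where a directory rename is cheap.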
[LOW] separation — render, templating, and file I/O share one translation unit
- Where: Site.cpp (249 lines): file I/O (readFile/writeFile), pure transforms (slugify, plainText, groupCodeTabs, esc), model derivation (deriveHeadingIds, headings), and the large inline HTML-templating block in buildSite (Site.cpp:177-231) are all together.
- Problem: Not a god file at this size, and the layering vs. kinogaki-codecs (which owns Document parsing + HtmlCodec body rendering) is genuinely clean. But the page-chrome HTML is a 50-line raw-string ostringstream wall embedded mid-loop, mixing structure, derivation, and presentation; it is also the hardest part to test because it is not a separable function.
- Fix: Optional. Extracting the chrome into a renderPage(const Page&, body, toc, prev, next) -> std::string free function would isolate the template from the build loop and make it unit-testable (supports the tests finding above). Pure transforms could move to a Render.{h,cpp} if the file grows.
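A rough sketch of the extraction, to show the shape and not the actual chrome. PageView and its fields are hypothetical (the real Page type in Site.h has its own members), the markup is a placeholder for the real template at Site.cpp:177-231, and escapeChrome is an illustrative escaper under the name suggested in the naming finding.

```cpp
#include <sstream>
#include <string>

// Hypothetical inputs for one page; everything is precomputed by the build loop.
struct PageView {
    std::string title;
    std::string bodyHtml;  // already rendered by the codec, emitted as-is
    std::string prevHref;  // empty when there is no previous page
    std::string nextHref;  // empty when there is no next page
};

// Escapes renderer-owned strings for text and attribute contexts. Each input
// character is mapped exactly once, so output is never re-escaped.
std::string escapeChrome(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default:  out += c;
        }
    }
    return out;
}

// Pure function: no file I/O and no access to the page list, so it can be
// called directly from a test.
std::string renderPage(const PageView& page) {
    std::ostringstream out;
    out << "<!doctype html>\n"
        << "<title>" << escapeChrome(page.title) << "</title>\n"
        << "<main class=\"content\">" << page.bodyHtml << "</main>\n";
    if (!page.prevHref.empty()) {
        out << "<a class=\"prev\" href=\"" << escapeChrome(page.prevHref) << "\">Previous</a>\n";
    }
    if (!page.nextHref.empty()) {
        out << "<a class=\"next\" href=\"" << escapeChrome(page.nextHref) << "\">Next</a>\n";
    }
    return out.str();
}
```

With this split the build loop keeps ordering, prev/next derivation, and writeFile, while the template becomes a string-in, string-out function.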
[LOW] naming — clean
- Where: Site.cpp.
- Problem: None blocking. Functions read as verbs or are acceptable: readFile, writeFile, buildSite, slugify, deriveHeadingIds, groupCodeTabs. esc (Site.cpp:42), href (Site.cpp:50), plainText (Site.cpp:53), and headings (Site.cpp:85) are noun/abbreviation-named functions that return a value rather than reading as verbs; this is minor. (Page/Head are types, exempt.)
- Fix: If touched, rename to verbs: esc → escapeChrome, href → hrefFor, plainText → flattenText/collectText, headings → collectHeadings. Low priority.
[LOW] filesize — clean
- Where: all sources.
- Problem: No file exceeds the thresholds. src/Site.cpp 249, src/Site.h 19, src/main.cpp 17 lines. Nothing over 800, nothing over 1000.
- Fix: None.
[LOW] organization — committed build/ dir present in tree, thin README build section
- Where: build/ directory (contains kinogaki-site and a stale prisma-site binary) on disk; .gitignore contains build/; README.md "Build".
- Problem: The build/ dir holds binaries including a prisma-site artifact from the pre-rename era (Prism→kinogaki); it is correctly gitignored and NOT tracked (git ls-files confirms only source is committed), so this is hygiene, not a committed-artifact problem. The README documents build + usage well but has no Layout/structure section and (relatedly) nothing on testing, which is appropriate to add once tests exist. The stale prisma-site binary is dead/orphaned and should be removed from the working tree.
- Fix: rm build/prisma-site (and optionally clean build/ between renames). Add a short "Tests" section to README once a test target lands.