Be Civic — Privacy and PII Protection
Canonical system specifications for the Be Civic project.
This sub-spec covers every privacy and PII protection mechanism in Be Civic: the trust boundary between consumer and server (§8.1), the submission contract (§8.2), the receiving-end ingestion pipeline and its validation steps (§8.3), the Cloudflare reference implementation and its security details (§8.4), the NER-on-commit held-for-review path (§8.5), the incident response for PII that slips through all gates (§8.6), the consumer-side state contract and its 16-axis profile.json catalogue (§8.7), retention and deletion semantics for local state (§8.8), the document-content-discard rule (§8.9), and the anonymous-by-construction structural reinforcement rules (§8.10).
PII protection in Be Civic is structural, not promissory. The schema-level field bans (§6.2 in schemas.md), the scrub rules file (§6.8 in schemas.md), and the mechanisms in this section together form a layered defence. For the promotion thresholds and rollback mechanics that also interact with IP-hash salting, see lifecycle.md.
8. PII protection
8.1 Trust boundary
The submitting (consumer-side) agent is the only entity that knows what is identifying for the user it serves. The receiving end has no user context and cannot perform context-aware scrub. Consequently:
- Primary scrub: consumer-side, including any LLM-based judgment
- Defence in depth: receiving-side, deterministic only (regex + NER, no LLM)
This placement of the LLM gate at the consumer end (and only there) is non-negotiable. The receiving end never runs an LLM on submission content — eliminating prompt-injection surface, API key dependencies, and per-submission cost.
PII is structurally prevented from reaching the corpus by:
- Schema-level ban on identity-shaped fields (per §6.2 (see schemas.md))
- Hard length caps on free-text fields (per §6.2 (see schemas.md))
- Three-stage scrub: consumer pre-flight, Worker hard-gate, NER on commit (per §6.8 (see schemas.md))
- Salted hashed per-IP correlation only (daily-rotating salt for rate limits; per-proposal salt for state-machine bookkeeping); no plaintext IP storage (per §3 (see architecture.md) principle 4)
- No request-body logging (per §3 (see architecture.md) principle 4)
8.2 Submission contract — global, versioned
The submission contract is a global document at docs/submission-contract-v<N>.mdx. Every skill carries submission_contract_version in frontmatter pointing at the version the consuming agent must follow when submitting. The contract content lives in the contract file, which is being authored in parallel to this rework; this section describes the contract's role and structure.
Contract role. The contract is the single source of truth for what a consumer AI must do at session start, before submitting any of the four submission types, and after submitting. It is canonical — skills do not paraphrase. Per-skill overrides are permitted but must be additive, not replacement.
Contract structure (per the parallel rewrite):
- Session start (one-off framed message; opt-out semantics; conversation-language detection)
- Capability self-classification (against §6.7 (see schemas.md) tiers; recommend stepping up if below; advice-only mode otherwise)
- Pre-flight validation (consumer-side scrub: regex + LLM contextual; cross-ref script via `tool_execution` when capable; rules checklist when not)
- Submission type sections (one per type: observation, skill_amendment, skill_draft, validation; covering when to submit which, schema details, reference assembly)
- Alpha / beta UX (banner copy; first-validator transparency wording per G.8 / G.9; falling back to previous stable on rejection)
- Cancellation (DELETE within 24h; multi-device gap acknowledged)
- Submissions log: project-local at `<output_dir>/.be-civic/submissions.jsonl` when the agent is writing files for the task, else `<USER_DATA_DIR>/be-civic/submissions.jsonl` as fallback (user can review and cancel)
- Capability-mismatch and filesystem-less behaviour (advice-only mode)
- Language handling (skills read in English; user-facing prose in conversation language; citations resolved per G.13 multilingual rules; commune correspondence language is the user-choice exception)
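The location rule for consumer-side files (the submissions log here, and the cancel-retry queue in §8.3) can be sketched as a small helper. This is a minimal illustration, not contract text: the function name `localStatePath` and its parameters are hypothetical, and how an agent learns its output directory or user data directory is left to the caller.

```typescript
// Hypothetical helper: resolve where a consumer-side Be Civic file lives.
// `outputDir` is non-null only when the agent is writing files for the task;
// otherwise the per-user data directory is the fallback.
function localStatePath(
  outputDir: string | null,
  userDataDir: string,
  relativeFile: string
): string {
  if (outputDir !== null && outputDir.length > 0) {
    // Project-local location: <output_dir>/.be-civic/<file>
    return outputDir + "/.be-civic/" + relativeFile;
  }
  // Fallback location: <USER_DATA_DIR>/be-civic/<file>
  return userDataDir + "/be-civic/" + relativeFile;
}
```

Called with `"submissions.jsonl"` it yields the submissions log path; called with `"cancel-retry/<id>.json"` it yields the retry-queue path described under Cancellation.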
Alpha / beta UX excerpts (canonical wording lives in the contract; reproduced here so the spec is self-contained):
When loading an alpha skill that has a previous stable (G.9):
"Note: I'm using an alpha version of this skill — meaning a recent change is still being validated. Your session helps validate it. If anything goes wrong with the new content, I'll fall back to the previous stable version (last verified [date])."
When loading a brand-new alpha skill with no previous stable (G.8):
"This skill is brand-new and unvalidated — your session is among the first to use it. We'll proceed with low confidence and I'll flag anything that doesn't match what you experience. If something fails, we have nothing to fall back to except checking with the relevant authority directly."
The agent files higher-grade observations and a validation event at session end on brand-new alpha.
8.3 Receiving-end ingestion pipeline
The receiving end is not GitHub Issues. Submissions go to a staging service that holds them privately for 24 hours before committing. This avoids requiring GitHub accounts and gives genuine cancellation semantics.
Endpoint table (post-2026-05-15 taxonomy normalization):
| Endpoint | Purpose | Required capabilities (consumer must self-declare) |
|---|---|---|
| `POST /api/feedback` | Recommended primary: polymorphic envelope; submit one or more items in a single request | union of per-item types' capabilities |
| `POST /api/concerns` | Submit a concern (per-type escape hatch; renamed from /api/observations) | multi_turn, structured_output |
| `POST /api/amendments` | Submit an amendment (now covers skill / volatile_value / reference / path / path_source via target_type; renamed and unified from /api/skill-amendments + /api/path-amendments) | multi_turn, structured_output (+ web_fetch, tool_execution for target_type=skill / path / path_source) |
| `POST /api/drafts` | Submit a draft (now covers skill + path via target_type; renamed and unified from /api/skill-drafts + /api/path-drafts) | multi_turn, structured_output, web_fetch, tool_execution, file_read |
| `POST /api/validations` | Submit a validation (polymorphic over all six target_type values; absorbs the prior /api/path-validations) | multi_turn, structured_output (+ web_fetch, tool_execution for non-observation target_types) |
| `POST /api/feedback-channel` | Submit a feedback (new free-text channel; operator-private triage in v1) | multi_turn, structured_output |
| `POST /api/ratings` | Submit a rating (sprint 2026-W23 Lock A; opt-in three-axis stars) | multi_turn, structured_output |
| `POST /api/analytics` | Submit analytics (opt-in session lifecycle telemetry) | multi_turn, structured_output |
| `DELETE /<type>/{id}` | Cancel a staged submission | bearer cancel_token |
| `GET /<type>/{id}` | Status query (no body content) | none |
| `GET /api/feedback/sessions/<session_id>` | List committed items under that session_id (anonymous read; recovery key per S61 reversal) | none |
| `GET /api/skills/<id>/concerns` | RESTful alias for `GET /api/concerns?skill=<id>` (renamed from /api/skills/&lt;id&gt;/observations) | none |
Legacy routes removed (pre-launch hard cutover). POST /api/observations, POST /api/skill-amendments, POST /api/skill-drafts, POST /api/path-amendments, POST /api/path-drafts, POST /api/path-validations are deleted in the same PR that adds the new ones. No aliases, no 30-day grace. Pre-launch context: there is no installed base of agents calling the legacy routes.
8.3.0a Recommended primary: POST /api/feedback
POST /api/feedback is the recommended primary submission surface. It accepts a polymorphic envelope carrying one or more items across the five feedback types + rating in a single request, allowing agents to batch a session's submissions into one round-trip rather than separate per-type posts. The per-type endpoints above remain operational (escape hatches); the envelope is the primary tool.
Schema URL: https://becivic.be/schemas/feedback-envelope.schema.json.
Envelope shape. Top-level fields:
- `schema_version: 1`
- `session_id`: agent-chosen UUIDv7 (`ses_<uuid>`); also the recovery key for `GET /api/feedback/sessions/<id>`. Per the 2026-05-15 S61 reversal, `session_id` is the recovery key end-to-end; the recovery_token component is dropped.
- `submitted_at`: ISO-8601 envelope-level timestamp (per-item `submitted_at` is permitted and overrides for that item)
- `submitting_agent`, `submission_contract_version`, `declared_capabilities`: moved up from per-item; declared once per envelope
- `mode: "validate" | "stage"`: controls the per-item pipeline (see below)
- `items[]`: array of per-item submissions; each item carries a `type` discriminator (`concern` | `amendment` | `validation` | `draft` | `feedback` | `rating`) and the type-specific body. `context` is per-item, not envelope-level. (Pre-2026-05-15 the discriminator enum was `observation | validation | skill_amendment | skill_draft`; rename per the taxonomy normalization. `analytics` is NOT in the envelope; analytics has its own dedicated endpoint.)
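The envelope fields listed above can be written down as TypeScript types. This is an illustrative sketch only; the normative definition is the JSON Schema at the URL above. The value types of `submitting_agent` and `submission_contract_version`, the helper `effectiveSubmittedAt`, and every value in `exampleEnvelope` are assumptions made for the example.

```typescript
type ItemType = "concern" | "amendment" | "validation" | "draft" | "feedback" | "rating";

interface EnvelopeItem {
  type: ItemType;                       // discriminator
  submitted_at?: string;                // optional per-item override
  context?: { [key: string]: unknown }; // context is per-item, not envelope-level
  [field: string]: unknown;             // type-specific body, elided here
}

interface FeedbackEnvelope {
  schema_version: 1;
  session_id: string;                   // "ses_<uuid>", agent-chosen UUIDv7
  submitted_at: string;                 // ISO-8601, envelope-level
  submitting_agent: string;             // assumed to be a string
  submission_contract_version: number;  // assumed to be the contract's <N>
  declared_capabilities: string[];
  mode: "validate" | "stage";
  items: EnvelopeItem[];
}

// A per-item submitted_at overrides the envelope-level timestamp for that item.
function effectiveSubmittedAt(envelope: FeedbackEnvelope, idx: number): string {
  const item = envelope.items[idx];
  return item.submitted_at !== undefined ? item.submitted_at : envelope.submitted_at;
}

// Hypothetical example values; the ids are placeholders, not real id formats.
const exampleEnvelope: FeedbackEnvelope = {
  schema_version: 1,
  session_id: "ses_018f4e2a-7b3c-7d10-9a55-3c2f1e0d4b6a",
  submitted_at: "2026-06-01T12:00:00Z",
  submitting_agent: "example-agent/1.0",
  submission_contract_version: 1,
  declared_capabilities: ["multi_turn", "structured_output"],
  mode: "validate",
  items: [
    { type: "concern", concern_id: "example-concern-1" },
    { type: "rating", rating_id: "example-rating-1", submitted_at: "2026-06-01T12:05:00Z" },
  ],
};
```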
Two-call pattern (mode). The contract is a deliberate two-call flow:
- Validate first (`mode: "validate"`): runs the full validation pipeline per item (schema, identity-field guard, capability tier, regex scrub, cross-ref against canonical state) but does not stage. Each item returns `{idx, type, ok, status: "validated", would_stage_for: <commit_eta>}` on pass or `{idx, type, ok: false, status: "rejected", error, schema_pointer, missing}` on fail.
- Stage (`mode: "stage"`): runs the full pipeline including stage / commit. Per-item response is the same shape as the per-type endpoints: `{idx, type, ok, status: "staged", id, cancel_token, commit_eta}` for `concern` / `amendment` / `draft` / `feedback` / `rating`; `{idx, type, ok, status: "applied", id, applied_at}` for `validation` (which writes directly to D1, per §6.2.3 (see schemas.md)); `{idx, type, ok, status: "duplicate"}` on idempotent re-POST.
?dry_run=1 query alias. ?dry_run=1 is a backwards-compat alias for mode: "validate". If both query and body are present, the body's mode wins.
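The precedence between the query alias and the body field can be sketched as follows. The function name `resolveMode` is hypothetical, and the sketch deliberately returns `undefined` when neither source specifies a mode, since this section does not state a default.

```typescript
type EnvelopeMode = "validate" | "stage";

// bodyMode: the envelope's `mode` field, if present.
// dryRunQuery: the raw value of the `dry_run` query parameter, or null if absent.
function resolveMode(
  bodyMode: EnvelopeMode | undefined,
  dryRunQuery: string | null
): EnvelopeMode | undefined {
  if (bodyMode !== undefined) return bodyMode;   // the body's mode wins
  if (dryRunQuery === "1") return "validate";    // ?dry_run=1 is an alias for validate
  return undefined;                              // no default is assumed here
}
```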
Per-item independence. The HTTP response is always 200 with {results: [{idx, type, ok, status, ...}]} whenever the envelope itself is well-formed — even when individual items failed schema, identity, capability, scrub, or cross-ref. Each item is processed independently; one item's rejection does not abort the others. Envelope-level 4xx is reserved for: malformed JSON (400), missing top-level envelope fields (400 with error: "schema_fail", missing: <field>), failed auth, or rate limit (429). Empty items: [] is permitted and returns {results: []} with HTTP 200.
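A minimal sketch of per-item independence in validate mode, assuming a caller-supplied `checkItem` function that stands in for the whole per-item pipeline and returns a rejection category or `null`. The names `validateItems`, `ItemOutcome`, and `checkItem` are hypothetical; the point is only that one rejected item never aborts the others and the envelope-level status stays 200.

```typescript
interface ItemOutcome {
  idx: number;
  type: string;
  ok: boolean;
  status: "validated" | "rejected";
  error?: string; // rejection category only, never the matched content
}

function validateItems(
  items: Array<{ type: string }>,
  checkItem: (item: { type: string }) => string | null
): { httpStatus: 200; body: { results: ItemOutcome[] } } {
  const results: ItemOutcome[] = [];
  for (let idx = 0; idx < items.length; idx++) {
    const item = items[idx];
    const category = checkItem(item); // each item is checked on its own
    if (category === null) {
      results.push({ idx: idx, type: item.type, ok: true, status: "validated" });
    } else {
      results.push({ idx: idx, type: item.type, ok: false, status: "rejected", error: category });
    }
  }
  // A well-formed envelope always yields 200, even if every item was rejected.
  return { httpStatus: 200, body: { results: results } };
}
```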
Per-item idempotency. Per-item dedup keys off the type-specific id field (concern_id, validation_id, amendment_id, draft_id with proposed_id, feedback_id, rating_id). Resubmitting the same envelope (or any envelope containing an item with a previously-seen id) returns {idx, type, ok: true, status: "duplicate"} for that item — no double-stage, no double-D1-INSERT. Idempotency is per-item.
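Per-item dedup can be illustrated with an in-memory record standing in for the real KV / D1 lookups. This is a sketch under that assumption; `ID_FIELD`, `dedupStatus`, and the `seen` store are hypothetical names, and the `proposed_id` handling for drafts is omitted.

```typescript
// Which field carries the dedup key for each item type.
const ID_FIELD: { [type: string]: string } = {
  concern: "concern_id",
  validation: "validation_id",
  amendment: "amendment_id",
  draft: "draft_id",
  feedback: "feedback_id",
  rating: "rating_id",
};

// `seen` stands in for the staging KV and recently-committed cache.
function dedupStatus(
  seen: { [key: string]: boolean },
  item: { [field: string]: unknown }
): "first_seen" | "duplicate" | "missing_id" {
  const type = String(item["type"]);
  if (!Object.prototype.hasOwnProperty.call(ID_FIELD, type)) return "missing_id";
  const id = item[ID_FIELD[type]];
  if (typeof id !== "string" || id.length === 0) return "missing_id";
  const key = type + ":" + id;
  if (seen[key] === true) return "duplicate"; // no double-stage, no double INSERT
  seen[key] = true;
  return "first_seen";
}
```

Because the key is derived per item, resubmitting a whole envelope and submitting a new envelope that happens to contain one previously seen item behave the same way for that item.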
Pre-launch hard cutover. No backward-compatibility shim for the pre-amendment items[].type enum (observation | skill_amendment | skill_draft | path_amendment | path_draft); the dispatcher's ITEM_TYPES set is rewired to the new 6-type enum (concern, amendment, validation, draft, feedback, rating) in the same PR that lands the new schemas. Legacy item types fail schema_fail at the gate after cutover.
Validation pipeline at submission (in the Worker):
- Parse JSON from request body
- Schema validation against the appropriate schema (`concern` / `amendment` / `draft` / `validation` / `feedback-channel` / `rating`)
- Identity-field ban check: reject if any identity-shaped field is present (§6.2 (see schemas.md), defensive even if not declared in the schema)
- Capability check: declared capabilities must include all required for the endpoint per §6.7 (see schemas.md); reject 4xx on miss
- Regex scrub: apply every rule in `tools/scrub/regex-rules.json` to every string field; reject 4xx on any hit
- Cross-reference validation against canonical state (Worker fetches required canonical resources from latest `main` and queries D1 catalogues as needed):
  - `context.commune` (when non-null) resolves to an entry in `data/communes.json` (concerns). The field is now optional; concerns not bound to a specific commune leave it null.
  - `target_id` resolves against the appropriate target per `target_type` (see §6.2 (see schemas.md) resolution table): `skill` → `skills/<skill_id>/canonical.md` on `main`; `volatile_value` / `reference` → D1 rows; `path` / `path_source` → `bc-docs/paths/index.json` catalogue (lookup keys per §6.12.7: `<path_id>` for `path`, `<path_id>:<source_id>` for `path_source`); `observation` (validation only) → D1 `concerns` row.
  - `skill_graph` carve-out: when `type=concern` AND `target_type=skill_graph`, the resolver short-circuits with `{ok: true, resolved_to: "skill_graph_assertion"}`; target_id MAY be empty or a proposed kebab-case skill_id that need not resolve. Cross-ref rejects all other target_types whose target_id fails to resolve.
  - `context.applies_to_match` keys are a subset of the referenced skill's `applies_to` keys
  - For `amendment` (target_type=skill), the per-`amendment_subtype` checks (per §6.2.2 (see schemas.md)): `body`: `body_diff` parses as unified diff and applies cleanly against the target skill's current canonical body (`skill_commit` drift check); `frontmatter`: `frontmatter_change.field_path` resolves to a valid field in the target skill's frontmatter schema and `proposed_value` matches the declared type.
  - For `amendment` (target_type=path | path_source), the per-`amendment_subtype` checks: `field_edit`: `field_path` resolves to a valid field in the target path / source schema; `source_add`: the `source_add` object validates against `path-source.schema.json` with the matching per-`source_class` template branches per §6.12.3.
  - For `draft`: cross-ref script (`validate-cross-refs.ts`) runs as backstop on the proposed frontmatter and `requires` graph (target_type=skill) or path / source schema (target_type=path); tag uids are left empty by the consumer and filled by PR-CI on the resulting PR (§6.11 (see schemas.md)). The `proposed_id` MUST NOT already exist as a live artefact (the inverse of the standard "must exist" rule).
  - `cohort_anchor` Worker-stamp. Between step 6 (cross-ref) and step 7 (timing): for `concern` / `amendment` / `validation` with `target_type ∈ {skill, path}`, the Worker reads the current `version:` from the targeted canonical and writes `cohort_anchor: <target_id>@<version>` onto the staged row. Agents never carry this field; the schema rejects agent-supplied `cohort_anchor` as `additionalProperty`. Per C1.
- Self-validation prevention (validations only): Worker fetches the target artefact's submitter-IP-hash (using the per-artefact salt for the artefact's table); reject 4xx if it matches the validator's IP hash (per G.7). For `target_type='observation'`, the lookup is against the concern's own submitter-IP-hash (the upvoter-of-own-concern case). For `target_type='path_source'` the per-artefact salt is scoped to the path, not the individual source row; the lookup key is `<path_id>` extracted from the `<path_id>:<source_id>` target_id (see §6.2.3 in `schemas.md` for rationale and the path-creator-salt KV pattern)
- Per-IP rate limit check (per G.6):
  - `validation` submissions: 10/IP/day
  - `validation` with `injection_flag: true`: 2/IP/day
  - All submission types combined: 50/IP/day
  - Above threshold → 429 with `Retry-After` header
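The regex-scrub step in the pipeline above can be sketched as a walk over every string in the payload. This is a minimal illustration: the `ScrubRule` shape and the example rule are assumptions (the real rule format is defined by `tools/scrub/regex-rules.json`, per §6.8 in schemas.md), and `scrubPayload` is a hypothetical name. The result names the rule category only and never echoes the matched substring.

```typescript
// Assumed rule shape for illustration; not the actual regex-rules.json schema.
interface ScrubRule {
  category: string;
  pattern: string;
  flags?: string;
}

// Collect every string value in a JSON-like payload, at any depth.
function collectStrings(value: unknown, out: string[]): void {
  if (typeof value === "string") {
    out.push(value);
  } else if (Array.isArray(value)) {
    for (const element of value) collectStrings(element, out);
  } else if (value !== null && typeof value === "object") {
    const obj = value as { [key: string]: unknown };
    for (const key of Object.keys(obj)) collectStrings(obj[key], out);
  }
}

type ScrubResult = { ok: true } | { ok: false; error: string };

function scrubPayload(payload: unknown, rules: ScrubRule[]): ScrubResult {
  const strings: string[] = [];
  collectStrings(payload, strings);
  for (const rule of rules) {
    const re = new RegExp(rule.pattern, rule.flags);
    for (const s of strings) {
      re.lastIndex = 0; // keep `g`/`y` flagged rules stateless between strings
      if (re.test(s)) {
        // Report the category only; the matched text is never returned or logged.
        return { ok: false, error: rule.category };
      }
    }
  }
  return { ok: true };
}
```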
On submission pass:
- Dedup check. If `<type>_id` already exists in KV (or in the recently-committed cache, retained 48h), return the existing record (idempotent re-POST). On dedup, bind to the original submitter's IP-hash; if mismatch, return 409 with `{error: "duplicate_id_different_submitter"}` (per A.1 default).
- Otherwise: generate `cancel_token` (32 random bytes, base64url, 43 chars, no padding). Store `{payload, submitted_at, commit_eta = max(submitted_at, received_at) + 24h, cancel_token_hash, submitter_ip_hash}` in KV (per A.8 server-stamped `received_at`). Reject if `submitted_at` is >1h ahead of server clock or >7d behind.
- Write a primary KV entry keyed by `<type>_id` (TTL ~48h) and a commit-eta index entry keyed by `commit:<commit_eta_iso>:<type>:<id>`. The cron job scans the index rather than relying on TTL expiry.
- Return `{<type>_id, cancel_token, commit_eta, staging_window_hours: 24}` to the consumer.
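The timing rule and the key and token shapes above can be checked with a short sketch. It assumes the caller supplies the server receive time and 32 bytes from a cryptographically secure random source; the function names and the `timestamp_out_of_range` category are hypothetical.

```typescript
const HOUR_MS = 60 * 60 * 1000;

type EtaResult = { ok: true; commitEta: string } | { ok: false; error: string };

// commit_eta = max(submitted_at, received_at) + 24h, with the clock-skew guard.
function computeCommitEta(submittedAtIso: string, receivedAtMs: number): EtaResult {
  const submittedMs = Date.parse(submittedAtIso);
  if (isNaN(submittedMs)) return { ok: false, error: "schema_fail" };
  const tooFarAhead = submittedMs > receivedAtMs + HOUR_MS;
  const tooFarBehind = submittedMs < receivedAtMs - 7 * 24 * HOUR_MS;
  if (tooFarAhead || tooFarBehind) return { ok: false, error: "timestamp_out_of_range" };
  const etaMs = Math.max(submittedMs, receivedAtMs) + 24 * HOUR_MS;
  return { ok: true, commitEta: new Date(etaMs).toISOString() };
}

// Index key scanned by the cron job: commit:<commit_eta_iso>:<type>:<id>
function commitIndexKey(commitEtaIso: string, type: string, id: string): string {
  return "commit:" + commitEtaIso + ":" + type + ":" + id;
}

const B64URL_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Unpadded base64url; 32 input bytes always produce 43 characters.
function base64urlNoPad(bytes: number[]): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    out += B64URL_ALPHABET.charAt(b0 >> 2);
    out += B64URL_ALPHABET.charAt(((b0 & 3) << 4) | (b1 >> 4));
    if (i + 1 < bytes.length) out += B64URL_ALPHABET.charAt(((b1 & 15) << 2) | (b2 >> 6));
    if (i + 2 < bytes.length) out += B64URL_ALPHABET.charAt(b2 & 63);
  }
  return out;
}
```

Taking the maximum of the two timestamps means a backdated `submitted_at` cannot shorten the 24-hour cancellation window.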
On submission fail:
- Return 4xx with `{error: <category>, schema_pointer: <if applicable>}`, naming the category and never echoing the matched substring
- Increment per-IP rejection counter
- No data persisted in the staging KV
- Worker logging discipline: the Worker MUST NOT log request bodies or rejection-detail substrings. Permitted log fields at INFO: `<type>_id`, rejection category, response status, request duration. Plaintext IP is NEVER logged; rate-limit counters key on `sha256(ip + daily_salt)` (per G.14, principles 4 and 5)
Cancellation:
- `DELETE /<type>/{id}` with `Authorization: Bearer <cancel_token>`: the Worker hashes the supplied token and compares it against the stored hash in constant time; on match, deletes both KV entries; returns `{cancelled: true}`
- Token mismatch returns 401 (not 404, so the response does not reveal whether the id exists)
- KV partition / unreachable returns 503 with `{error: 'staging_unavailable', retry_after}`
- Cancellation is irreversible
- Consumer DELETE retry policy: queue at `<output_dir>/.be-civic/cancel-retry/<id>.json` when the agent is writing files for the task, else `<USER_DATA_DIR>/be-civic/cancel-retry/<id>.json` as fallback; exponential backoff (60s start, 1h ceiling, full jitter); hard deadline at `commit_eta - 60s`
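The consumer retry schedule above can be sketched as a pure function. It uses the common "full jitter" formulation (a uniform delay between zero and the capped exponential bound); the function name, the zero-based `attempt` counter, the injectable `random` source, and the choice to clamp the last delay to the deadline are all assumptions for illustration.

```typescript
const RETRY_BASE_MS = 60 * 1000;        // 60s start
const RETRY_CEILING_MS = 60 * 60 * 1000; // 1h ceiling
const DEADLINE_MARGIN_MS = 60 * 1000;    // stop at commit_eta - 60s

// Returns the delay before the next DELETE attempt, or null once the
// hard deadline has passed and retrying is pointless.
function nextCancelRetryDelayMs(
  attempt: number,            // 0 for the first retry
  nowMs: number,
  commitEtaMs: number,
  random: () => number = Math.random
): number | null {
  const deadlineMs = commitEtaMs - DEADLINE_MARGIN_MS;
  if (nowMs >= deadlineMs) return null;
  const bound = Math.min(RETRY_CEILING_MS, RETRY_BASE_MS * Math.pow(2, attempt));
  const delay = Math.floor(random() * bound); // full jitter: uniform in [0, bound)
  return Math.min(delay, deadlineMs - nowMs); // never sleep past the deadline
}
```

Injecting `random` keeps the schedule testable; production callers can rely on the default.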
Status query (GET /<type>/{id}, optional):
- If primary KV entry exists: `{state: "staged", commit_eta}` (no body content)
- If primary KV gone but recently-committed cache hits: `{state: "committed", committed_at}`
- For NER-held submissions: `{state: "ner_held_for_review"}`, and on resolution: `{state: "released" | "released_after_edit" | "discarded"}` (per G.14, principle 1)
- For artefacts pushed into quarantine by a validation's `injection_flag`: `{state: "quarantined", target_type, target_id}` (validations only; relevant when a validation triggered quarantine of its target)
- Otherwise 404
Status writeback (PATCH /<type>/{id}/status):
Used by the ner-on-commit GitHub Action and by maintainer-resolution tooling to update a submission's lifecycle state in KV so the status endpoint reflects the held/released/discarded outcome.
- Auth: `Authorization: Bearer <installation_token>` where the token is minted by the Be Civic GitHub App. The Worker validates the token by calling `GET https://api.github.com/app` and confirming the returned `id` matches `GITHUB_APP_ID`. No new secret required.
- Body: `{"state": "<target>"}` (JSON, `Content-Type: application/json`).
- Allowed transitions (all others rejected with 400 `invalid_transition`):
  - `staged` → `ner_held_for_review` (NER flags a staged submission)
  - `committed` → `ner_held_for_review` (NER flags a just-committed submission)
  - `ner_held_for_review` → `released` (maintainer: false positive)
  - `ner_held_for_review` → `discarded` (maintainer: real PII)
- The endpoint MUST NOT allow `staged` → `committed` (commit is cron-driven only).
- On transition to `ner_held_for_review`: the primary KV record in STAGED_SUBMISSIONS is re-PUT without `expirationTtl`, making it permanent. This ensures `GET /<type>/{id}` returns the held state after the original 48h staging window expires.
- On `committed` → `ner_held_for_review`: a synthetic StagedRecord is created in STAGED_SUBMISSIONS (permanent) from the COMMITTED_CACHE entry, since the original staged record no longer exists.
- On `released` / `discarded`: the state field is updated on the permanent record.
- Response: `{state: "<new_state>", previous: "<old_state>"}` with HTTP 200.
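The transition table above lends itself to a small allow-list check. This sketch covers only the four transitions listed; the names `SubmissionState`, `ALLOWED_TRANSITIONS`, and `applyTransition` are hypothetical, and the KV side effects (re-PUT without TTL, synthetic record creation) are deliberately left out.

```typescript
type SubmissionState =
  | "staged"
  | "committed"
  | "ner_held_for_review"
  | "released"
  | "discarded";

// Every permitted (previous, target) pair; anything else is invalid.
const ALLOWED_TRANSITIONS: Array<[SubmissionState, SubmissionState]> = [
  ["staged", "ner_held_for_review"],
  ["committed", "ner_held_for_review"],
  ["ner_held_for_review", "released"],
  ["ner_held_for_review", "discarded"],
];

type TransitionResponse =
  | { httpStatus: 200; body: { state: SubmissionState; previous: SubmissionState } }
  | { httpStatus: 400; body: { error: "invalid_transition" } };

function applyTransition(previous: SubmissionState, target: SubmissionState): TransitionResponse {
  for (const pair of ALLOWED_TRANSITIONS) {
    if (pair[0] === previous && pair[1] === target) {
      return { httpStatus: 200, body: { state: target, previous: previous } };
    }
  }
  // Includes staged -> committed, which only the cron job may perform.
  return { httpStatus: 400, body: { error: "invalid_transition" } };
}
```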
Commit / PR job (scheduled Worker, every 5 minutes):
- Read commit-eta index for entries where `commit_eta <= now`
- For each ready entry: read primary entry; populate Worker-set fields (`validated_at`, `regex_passes`, `cohort_anchor`); route per submission type:
  - Concerns: INSERT into D1 `concerns` table (renamed from `observations` per the 2026-05-15 amendment); D1 auto-assigns `con-NNNNN` uid; concern becomes visible via `<Observations>` aggregator on the next renderer build (or via fetch-time resolution if rendered on demand). The aggregator walks the skill body's `<VV>` / `<Ref>` / `<Path>` inline tags and surfaces concerns against the skill itself AND every catalogue / path / source uid the body cites (§6.10 (see schemas.md)).
  - Amendments, `target_type=skill`: open a PR against `main` applying the change to `skills/<target_id>/canonical.md`. PR-CI runs validators and orchestrates uid assignment for any newly-introduced `<VV>` or `<Ref>` tags. Auto-merge on green.
  - Amendments, `target_type=path | path_source`: open a PR against `main` applying the change to `bc-docs/paths/index.json`. PR-CI runs validators (`path.schema.json` / `path-source.schema.json`, source-class template check). Auto-merge on green.
  - Amendments, `target_type=volatile_value | reference`: D1 INSERT-with-supersede directly; NO PR. The state-machine cron reads the threshold table (§9.2 (see lifecycle.md)) and either supersedes the prior row or rolls the amendment back. Fast-path semantics preserved.
  - Drafts, `target_type=skill`: open a PR creating `skills/<proposed_id>/canonical.md` at `status: alpha`. PR-CI runs validators; the maintainer reviews and merges (S31).
  - Drafts, `target_type=path`: open a PR inserting the new entry into `bc-docs/paths/index.json` under `paths.<proposed_id>` at `status: alpha`. PR-CI runs validators; the maintainer reviews and merges (parallel rule to S31).
  - Validations: INSERT into D1 `validations` table immediately on submission (no 24h staging; see §6.2.3 (see schemas.md)). The state machine queries D1 aggregates on its next tick.
  - Feedback: INSERT into D1 `feedback_channel` table at commit time. No PR; no public surface; the operator reads the triage queue out-of-band.
  - Ratings: INSERT into D1 `ratings` table at commit time. Aggregates surface in `<CohortStats>` on the skill canonical (for `target_type=skill`) or in the operator-private analytics surface (for `target_type=agent_protocol | session`). See §6.2.7 (see schemas.md).
- On commit / PR open success: write `{<type>_id, committed_at_or_pr_opened_at}` to the recently-committed cache (48h TTL); delete primary entry; delete commit-eta index entry.
- On commit / PR-open failure: leave the entry in place; the next run picks it up. Stamp `last_commit_attempt`. Retry indefinitely; alarm fires if commits stuck >24h beyond `commit_eta`.
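The per-type routing in the commit job can be summarised as a lookup. This is a sketch of the decision only, with hypothetical names (`CommitRoute`, `routeCommit`); it performs no D1 or GitHub calls. Validations return `null` here because they are written at submission time and never reach the cron job.

```typescript
type CommitRoute =
  | "d1_insert"                 // concerns, feedback, ratings
  | "d1_insert_with_supersede"  // volatile_value / reference amendments, no PR
  | "pr_auto_merge"             // skill / path / path_source amendments
  | "pr_maintainer_review";     // skill / path drafts

function routeCommit(type: string, targetType?: string): CommitRoute | null {
  switch (type) {
    case "concern":
    case "feedback":
    case "rating":
      return "d1_insert";
    case "amendment":
      if (targetType === "skill" || targetType === "path" || targetType === "path_source") {
        return "pr_auto_merge";
      }
      if (targetType === "volatile_value" || targetType === "reference") {
        return "d1_insert_with_supersede";
      }
      return null;
    case "draft":
      return targetType === "skill" || targetType === "path" ? "pr_maintainer_review" : null;
    default:
      // Includes "validation": inserted into D1 on submission, not staged.
      return null;
  }
}
```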
Concurrency. Cloudflare KV is eventually consistent globally, but the scheduled Worker runs in a single region; race conditions across Worker instances are unlikely. Idempotent commit semantics (dedup at submission, recently-committed cache at GET) make duplicate commits a non-issue.
KV TTL note. Cloudflare KV TTL is "soft" — entries are eventually purged but not at exactly TTL time. The commit-eta index pattern above does not rely on TTL expiry for triggering commits.
Bulk read of canonical files is via Git / GitHub Contents API on the relevant directory; the Worker's HTTP API is submission-only and does not expose listing.
8.4 Cloudflare reference implementation
The v1 staging service runs as a Cloudflare Worker (source in bc-infra/api/) for the four POST endpoints, plus a separate scheduled Worker (source in bc-infra/tools/staging-worker/) for the commit cron job. Both share a single Cloudflare account, D1 database, and GitHub App. The becivic.be/ apex is served by the renderer Worker (bc-infra/site/renderer/, per §20 (see website.md)); the apex router (bc-infra/site/router-worker.js) routes /api/* to the staging Worker and everything else to the renderer.
Renderer integration. The renderer pulls from the bc-docs source tree at deploy time, builds dist/, and is bound as Cloudflare Workers Static Assets:
- `becivic.be/`: marketing landing rendered from `bc-docs/index.mdx`
- `becivic.be/agents`: agent overview from `bc-docs/agents.mdx` (~40 lines after S52 implementation); per-endpoint pages at `/agents/submit/*`; machine-readable manifest at `/agents/manifest.json` (per §13.1 (see architecture.md), G.12)
- `becivic.be/skills/<id>`: skill bodies served from `skills/<id>/canonical.md`. One canonical URL per skill across all `status` values; the `status` frontmatter field drives an in-page banner (§6.1 (see schemas.md)) when the skill is at `draft`, `alpha`, or `beta`.
- `becivic.be/docs/submission-contract-v<N>`: submission contract page
- `becivic.be/llms.txt`, `becivic.be/llms-full.txt`: emitted by the renderer build pipeline; `docs.json.description` injects "AI agents: read /agents before anything else." (per G.4b)
- `mcp.becivic.be`: separate MCP Worker (§23 (see protocol.md)) exposing the API surface as ~6 intent-oriented tools
Skill pages carry noindex: true whenever status ≠ stable: the renderer build injects it into the rendered HTML head based on the frontmatter status field, so search engines index only stable content.
Cloudflare carries four roles: (a) the renderer Worker at bc-infra/site/renderer/ (Workers Static Assets binding) serves all human-facing paths; (b) the apex router Worker at bc-infra/site/router-worker.js path-routes between the renderer (default) and the staging Worker (/api/*); (c) the staging service Worker at bc-infra/api/ handles the four POST endpoints, the DELETE, and the GET status, plus D1 access for catalogues and signals; (d) the scheduled Worker at bc-infra/tools/staging-worker/ handles the commit cron job. The MCP Worker at mcp.becivic.be is independently routed via DNS subdomain.
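The apex router's path split reduces to a prefix test. The sketch below shows only that decision; the function name `routeApex` and the string labels are hypothetical, and the actual `router-worker.js` (including how it forwards to each Worker) is not reproduced here.

```typescript
// Decide which Worker should handle a request to the becivic.be apex.
function routeApex(pathname: string): "staging" | "renderer" {
  // /api/* goes to the staging Worker; everything else to the renderer.
  return pathname.indexOf("/api/") === 0 ? "staging" : "renderer";
}
```

Matching on the `/api/` prefix with its trailing slash keeps unrelated paths that merely begin with the same letters on the renderer.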
Repo layout for serving:
site/ # Cloudflare Workers Static Assets — marketing landing + apex router
├── index.html # bespoke marketing landing (served at /)
├── style.css
├── fonts/ # self-hosted Manrope woff2 (700, 800), latin subset
├── logo/ # light.svg, dark.svg
├── favicon.svg
├── router-worker.js # routes /api/* to staging Worker; everything else to renderer Worker
└── wrangler.toml # Workers Static Assets binding + routes config
api/ # Cloudflare Worker — staging service for /api/*
├── worker.ts # entry point
├── wrangler.toml
└── routes/
    ├── observations/
    │   ├── index.ts           # POST handler
    │   └── [id]/
    │       ├── index.ts       # DELETE handler
    │       └── status.ts      # GET status
    ├── skill-amendments/{...} # same pattern
    ├── skill-drafts/{...}
    └── validations/{...}
tools/staging-worker/ # scheduled Worker for the commit job
├── worker.ts # cron handler
├── wrangler.toml
└── README.md
Amendment materialisation (fetch-then-materialise). When the cron commit job processes a skill_amendment record, it fetches the canonical skill body from GitHub (skills/<target_skill_id>/canonical.md via the Contents API with the installation token) before calling buildCommitTarget. The fetched content is passed as canonical_content, enabling buildCommitTarget to materialise the post-amendment proposal.md as a full renderable skill file (frontmatter + applied body), identical in shape to skill_draft proposals. The .meta.json sidecar preserves the original amendment payload (body_diff / frontmatter_change / references_change) for audit. If the canonical file is not found (404), the cron loop logs a structured error (canonical_not_found) and leaves the staged record untouched for operator investigation — no destructive deletion. Transient fetch failures are treated identically: the record stays in place and the next cron tick retries.
Workers Static Assets and the staging Worker auto-deploy from GitHub on push to main that touches site/ or api/ respectively. Scheduled Worker deploys via GitHub Action on push to tools/staging-worker/.
Secrets:
- Cloudflare-side: `GITHUB_APP_ID`, `GITHUB_APP_PRIVATE_KEY`, `GITHUB_APP_INSTALLATION_ID`
- GitHub-side (Worker deploy): `CLOUDFLARE_API_TOKEN`, `CLOUDFLARE_ACCOUNT_ID`
The GitHub App has Contents: Read & Write permission only; installed once on the repo by the maintainer; private key in Cloudflare secrets. The Worker mints short-lived installation tokens (1h TTL) and commits via the GitHub API.
Rate-limit thresholds (initial values, tunable):
- All submission types combined: 50/IP/day (per G.6)
- `validation` submissions: 10/IP/day
- `validation` with `injection_flag: true`: 2/IP/day
- Per-IP burst (any type): 60 submissions / hour rolling
- Worker-global submission rate: 1000 / hour (trip-wire for "something is wrong")
Counters live in KV namespace RATE_LIMITS, keyed by sha256(ip + daily_salt), with rolling-window TTL (per A.9 default).
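A sketch of the per-IP threshold check against those counters, assuming the counts have already been read from `RATE_LIMITS` for the hashed key and that a request is rejected once the relevant count has reached its limit. The interface, constant, and function names are hypothetical, the Worker-global trip-wire is omitted, and the `Retry-After` value is left to the caller.

```typescript
// Counts already recorded for one hashed IP key (hypothetical shape).
interface IpCounters {
  validationsToday: number;
  injectionFlaggedToday: number;
  allToday: number;
  allLastHour: number;
}

// Initial thresholds from this section; tunable.
const RATE_LIMITS_INITIAL = {
  validationsPerDay: 10,
  injectionFlaggedPerDay: 2,
  allPerDay: 50,
  burstPerHour: 60,
};

function checkRateLimit(
  counters: IpCounters,
  incoming: { type: string; injection_flag?: boolean }
): { allowed: true } | { allowed: false; httpStatus: 429 } {
  const limits = RATE_LIMITS_INITIAL;
  let blocked =
    counters.allToday >= limits.allPerDay ||
    counters.allLastHour >= limits.burstPerHour;
  if (incoming.type === "validation") {
    blocked = blocked || counters.validationsToday >= limits.validationsPerDay;
    if (incoming.injection_flag === true) {
      blocked = blocked || counters.injectionFlaggedToday >= limits.injectionFlaggedPerDay;
    }
  }
  return blocked ? { allowed: false, httpStatus: 429 } : { allowed: true };
}
```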
URL structure:
https://becivic.be/ <- marketing landing (renderer; index.mdx)
https://becivic.be/agents <- agent entry overview (renderer; agents.mdx; ~40 lines + manifest.json + per-endpoint pages — §13.1)
https://becivic.be/agents/manifest.json <- machine-readable agent capability + endpoint manifest
https://becivic.be/agents/submit/<type> <- per-endpoint reference page (renderer)
https://becivic.be/skills/<id> <- skill body at its current status (renderer; canonical.md; banner inferred from `status` frontmatter when not stable)
https://becivic.be/docs/submission-contract-v<N> <- contract (renderer)
https://becivic.be/llms.txt <- renderer build emits
https://becivic.be/llms-full.txt <- renderer build emits
https://becivic.be/api/observations <- Cloudflare; POST submit observation
https://becivic.be/api/observations/<id> <- Cloudflare; DELETE / GET status
https://becivic.be/api/skill-amendments <- Cloudflare; POST
https://becivic.be/api/skill-drafts <- Cloudflare; POST
https://becivic.be/api/validations <- Cloudflare; POST
https://becivic.be/scrub-rules.json <- canonical regex-rules.json (renderer-served)
https://becivic.be/communes.json <- canonical data/communes.json
https://mcp.becivic.be/ <- MCP server (§23); ~6 intent-oriented tools
The skills index entry for each skill carries the commit field (git short SHA) for build-time reproducibility. Concerns no longer require this; the read-side discovery endpoint is GET /api/skills/<id>/concerns (renamed from /observations); GET /api/skills/<id>/history (already shipped) returns the commit timeline.
Substrate-agnosticism preserved. The protocol in §8.3 specifies the interface, not the implementation. Anyone running their own be-civic fork can swap the Worker for any other backend that implements the same endpoints; the rendering layer is replaceable by any static-site generator that supports content versioning and the Be Civic MDX subset. Skills reference the staging URL via a single declared constant in the contract.
8.4b Internal endpoint: artefact-stats (distinct-IP counts)
GET /api/_internal/artefact-stats?target_type=<t>&target_id=<id> — returns the number of distinct validator IPs recorded for an artefact, along with how many of those carried an injection flag.
- Auth: GitHub App installation token (same path as the status-writeback endpoint, §8.5). No new secret.
- Query parameters: `target_type ∈ {skill, volatile_value, reference, observation, path, path_source}`; `target_id` matches the appropriate id format (`<kebab>` for skills + paths, `<prefix>-NNNNN` for catalogue rows + concerns, `<path_id>:<source_id>` for path sources). Enum extended to include `path` and `path_source` per the 2026-05-15 amendment.
- Response (200): `{ "distinct_ips": <number>, "distinct_ips_with_injection_flag": <number> }`
- Data source: D1 `validations` table aggregated on `(target_type, target_id)` with the per-artefact-salted IP hashes for distinct counting. Per-artefact salt ensures the hash is stable across the artefact's lifetime but unlinkable to any other artefact or the daily rate-limit salt.
- Consumed by: `tools/scripts/state-machine-tick.ts` (the state-machine bot). If D1 is unreachable, the script logs a structured warning to stderr and falls back to the local per-row distinct count (conservative: never false-promotes).
8.5 Commit-side defense in depth (NER) — held-for-review path
Cloudflare Workers don't run Python/spaCy, so Presidio NER doesn't fit at the Worker layer. NER runs as a separate GitHub Action triggered on commits touching submission paths.
Per G.14 principle 1, NER detection is not auto-revert. It routes to the same human-review queue that handles injection-flag quarantines (§G.6). One review path, two flag types.
- The NER step runs in two contexts: (a) at D1 INSERT for concerns, validations, and ratings (the Worker invokes the NER service before commit; on flag, the row is moved into a held-for-review table); (b) at PR-CI for draft and amendment submissions with target_type ∈ {skill, path} (the action runs Presidio on the changed prose; on flag, PR-CI fails the PR with a "held for review" status, and the operator-review queue picks it up).
- Re-validates each newly-added record: schema + Presidio NER (multilingual FR/NL/DE/EN) on every freeform string field
- Flags PERSON entities; ORG/LOC are allowed (public entities). URL fields validated as URLs; URLs in submissions matched against the canonical allowlist (schemas/source-classes.json primary or secondary tier)
- On NER fail: the just-committed file is not auto-reverted. Instead:
  - The submission is moved to a held-for-review/<type>/<id>/ directory where <type> is one of the new feedback type names (concerns, amendments, validations, drafts, feedbacks, ratings) and <id> is the id-prefix-stripped uid (e.g. held-for-review/concerns/00873/ for a concern uid con-00873; renamed from the pre-amendment held-for-review/observations/<obs_uid>/ shape). The file is still in main and still public, but it is flagged in docs.json so the renderer surfaces a "held for review" banner; agents that fetch via API see state: "ner_held_for_review" from the status endpoint.
  - Status writeback: the ner-on-commit workflow calls PATCH /api/<type>/{id}/status with {"state": "held_for_review"} using a GitHub App installation token (minted from STAGING_APP_* secrets). This flips the KV record's state to ner_held_for_review and re-PUTs it without TTL, ensuring the status endpoint returns the held state indefinitely (see §8.3 status writeback).
  - An entry is appended to runtime/ner-review-queue.log (gitignored Action-side; uploaded as a Workflow Artifact, 90-day retention)
  - A maintainer-review issue is opened with a structured payload (no PII echoed in the issue title; the file path is enough to locate it)
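The held-for-review directory naming can be illustrated with a small helper. This is a sketch under one stated assumption: the id prefix is everything before the first hyphen of the uid (the spec only shows the con-00873 example). The function name is hypothetical.

```python
HELD_TYPES = {"concerns", "amendments", "validations", "drafts", "feedbacks", "ratings"}


def held_for_review_path(type_plural: str, uid: str) -> str:
    """Build the held-for-review directory for a flagged submission."""
    if type_plural not in HELD_TYPES:
        raise ValueError(f"unknown feedback type: {type_plural!r}")
    # Assumption: the prefix ends at the first hyphen ("con-00873" -> "00873").
    _prefix, separator, stripped_id = uid.partition("-")
    if not separator or not stripped_id:
        raise ValueError(f"uid has no strippable prefix: {uid!r}")
    return f"held-for-review/{type_plural}/{stripped_id}/"
```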
Maintainer review outcomes (each triggers a PATCH /api/<type>/{id}/status call):
- Released: false positive (e.g., a Belgian person-shaped commune name like "Saint-Gilles"). File moved back to its canonical path; status flips to released via PATCH with {"state": "released"}.
- Released after edit: minor PII present that can be scrubbed cleanly without changing meaning. Maintainer edits, commits, status flips to released_after_edit. (Note: released_after_edit transition is deferred to a follow-up; currently treated as released.)
- Discarded: real PII. File deleted from main; status flips to discarded via PATCH with {"state": "discarded"}. Submitter learns via status endpoint. Public corpus is unaffected.
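The three review outcomes map onto status-writeback payloads roughly as below. The mapping table and function name are illustrative; the only behaviour taken from the text is that released_after_edit is currently treated as released.

```python
# Outcome chosen by the maintainer -> state sent in the PATCH body.
# "released_after_edit" is deferred, so it currently collapses to "released".
REVIEW_OUTCOME_STATE = {
    "released": "released",
    "released_after_edit": "released",
    "discarded": "discarded",
}


def review_status_payload(outcome: str) -> dict:
    """Return the JSON body for PATCH /api/<type>/{id}/status."""
    if outcome not in REVIEW_OUTCOME_STATE:
        raise ValueError(f"unknown review outcome: {outcome!r}")
    return {"state": REVIEW_OUTCOME_STATE[outcome]}
```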
This makes the privacy claim structural, not promissory: PII never reaches the public corpus without human eyes when NER flags. (Per G.14.)
Counter-note: race window. A few seconds may elapse between the commit landing on main and the Action moving it to held-for-review/. This window is small and the file is still flagged proposed (no consumer would discover it in time). The renderer build only runs after the Action settles. Documented in docs/threat-model.md as an accepted residual risk; see also G.14 implications.
8.6 Incident response — PII slipped through despite all gates
If PII is later discovered in a committed file (made it past consumer-side scrub, Worker regex, NER on commit, AND maintainer review of NER-held), the only correct response is destructive history rewrite via git filter-repo. This is documented in docs/retraction-protocol.md, requires explicit maintainer acknowledgement, and is disruptive (force-push to main, all consumers must refresh). It is not an automated path. Pre-emption (the four scrub layers + maintainer review) is the protection; rewrite is the last resort.
For issues the consumer detects during the staging window, the consumer should call DELETE /<type>/{id} with the cancel_token. No incident response is needed in that case, because the submission never reaches the repo.
8.7 Consumer-side state contract
8.7.1 Design principle
Consumer-side state is the mechanism by which Be Civic provides continuity across sessions without requiring any server-side user store. The privacy property is structural: because no state is server-held, there is no central store that could be subpoenaed, breached, or repurposed. The tradeoff is that state portability and backup are the customer's responsibility, not Be Civic's.
This section defines what MAY be stored locally, in what shape, and under what constraints. The harness implementation obligations (how to read, write, and scaffold this layout) belong to the C4/§15c amendment and are not repeated here.
What counts as consumer-side state. A storage location qualifies as consumer-side state for the purposes of §3 (see architecture.md) principle 11 only if all of the following hold:
- The customer can read the full contents directly with standard tools (text editor, cat, file browser), without invoking Be Civic or any vendor-mediated UI.
- The customer can delete the contents unilaterally as a single artifact (rm the file or directory), without friction and without intervention from Be Civic or any third party.
- The harness agent can both READ and WRITE the contents in-session. Read-only access from the agent's side is insufficient: the customer-side state contract requires the agent to update profile.json and memory/ files in place during a session, and the customer's expectation is that those updates persist.
A host filesystem directory (~/.be-civic/) satisfies all three clauses. Vendor-managed key-value stores [e.g., Project Memory in Anthropic platforms] do NOT satisfy clause 1 (the customer cannot inspect the store as a single artifact) and do NOT satisfy clause 2 (deletion is mediated by the vendor UI and may retain residual traces). Read-only file surfaces [e.g., free-tier Chat in Anthropic platforms, with the Be Civic Project installed] satisfy clauses 1 and 2 but FAIL clause 3 (the agent cannot write back to the files).
For v1, only T2 and T3 (Claude Desktop Cowork tab with ~/.be-civic/ as a connected folder, per C4 amendment §24.4 (see architecture.md)) provide qualifying consumer-side state. T0 and T1 are stateless from the customer's perspective and the harness MUST NOT promise cross-session memory at those tiers.
Paths-related state and the three-clause test. State derived from path traversal — for example, a customer's requires_paths-derived progress markers, or a record of which sources succeeded or failed for the customer — satisfies all three clauses of the test above: the customer can read the files with a text editor, delete them unilaterally, and the agent both reads and writes them in-session. Paths-related state lives in the existing memory/procedure_progress_<id>.md files (for progress within an active procedure that required path traversal) and in profile.json extensions (for persistent routing outcomes such as region). No new file types are required for paths state.
First-contact framing. When a customer shares a document with the harness at any capability tier, the harness MUST convey the following substance at the point of document intake (the exact phrasing may be adapted to the conversational context, but the substance MUST be preserved in full):
"If you share a document with me, I'll read it to find the parts that matter for your case — things like which commune issued it, what type of permit, and the months relevant to your timeline. The document file itself stays where you put it; I don't take a copy. What I do save into your profile is the categorical pieces I extracted (region, permit type, residence-period months), nothing more. Your profile lives on your computer and you can inspect or delete it whenever."
This wording avoids platform-specific disclosure (no reference to "cloud" or specific vendor infrastructure) in accordance with decision D26. Agent-platform privacy policies handle their own layer.
8.7.2 File layout (host filesystem available, T2+)
Concrete file layout depends on the harness. Under the Cowork plugin (V1+, per architecture.md §3 principle 13), <USER_DATA_DIR> resolves to a <user-picked-parent>/BeCivic/ folder under a user-picked parent path; the BeCivic root carries shared state (profile.json, MEMORY.md, privacy-attachment.md, .be-civic/ hidden subdirectory for system state) and per-procedure subfolders carry per-procedure state including a per-project CLAUDE.md. Detailed layout is documented in cowork-plugin.md §2.9.
Sibling harnesses (e.g. a future ChatGPT-app harness in chatgpt-app.md) have their own filesystem-or-not story; each harness spec is authoritative for its own on-disk shape. The universal privacy guarantees in §8.7.4 (profile schema) and §8.8 (memory cap rules) apply to whatever on-disk layout each harness adopts.
The legacy flat <USER_DATA_DIR>/ layout below is retained as the degraded-mode fallback for harnesses without a plugin (T0 paste-prompt sampler, etc.) and for reference. When the harness probe confirms that a host filesystem is writable but no plugin is active, all persistent customer state lives under <USER_DATA_DIR> (see §8.7.3 for path resolution; the default on POSIX systems is ~/.be-civic/). The directory MUST NOT be created without the customer's explicit consent at the first session. Once created, the layout is:
<USER_DATA_DIR>/
├── profile.json                      # enum routing fields only; see §8.7.4
├── memory/
│   ├── MEMORY.md                     # index, one line per entry, 200-line / 8KB cap
│   ├── customer_context.md           # customer's self-reported civic situation (narrative)
│   ├── procedure_progress_<id>.md    # one file per active procedure; see §8.8
│   ├── decision_log_<topic>.md       # decisions the customer has taken during sessions
│   ├── document_reference_<id>.md    # extracted routing fields from customer documents; see §8.9
│   ├── path_history_<id>.md          # optional; one file per traversed path; see §8.7.2.2
│   └── archive/                      # completed procedures past the active window; see §8.8
├── skills-cache/                     # local cache of fetched skills; see §8.7.2.1 below
│   └── <skill-id>/
│       └── SKILL.md
├── sessions/
│   └── <session-id>/                 # per-session ephemeral state; deleted on session close
│       ├── facts.json                # structured facts surfaced during this session
│       ├── dossier-draft.md          # working draft of any document the customer is assembling
│       └── observations-buffer.jsonl # submission items buffered for this session
├── submissions.jsonl                 # cumulative receipt log of all submitted items
└── analytics-outbox.jsonl            # offline queue for analytics events; flushed at next session preamble
.gitignore note. Any harness writing to a project-scoped <USER_DATA_DIR> inside a git-tracked directory MUST add the directory name to the nearest .gitignore. This is a non-optional invariant: the harness MUST verify the ignore entry exists before writing the first file. On non-git systems the check is skipped.
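A minimal sketch of the ignore-entry check, written as a pure function over the .gitignore text so it can be verified before any file is written. Locating the nearest .gitignore and detecting whether the directory is git-tracked are left out; the function name is hypothetical.

```python
def with_ignore_entry(gitignore_text: str, dir_name: str) -> str:
    """Return .gitignore content that is guaranteed to ignore dir_name.

    The text is returned unchanged when an equivalent entry already exists.
    """
    entry = dir_name.strip("/") + "/"
    # Accept the common spellings of the same directory rule.
    equivalent = {entry, entry.rstrip("/"), "/" + entry, "/" + entry.rstrip("/")}
    if any(line.strip() in equivalent for line in gitignore_text.splitlines()):
        return gitignore_text
    # Keep the new entry on its own line even if the file lacks a final newline.
    separator = "" if gitignore_text == "" or gitignore_text.endswith("\n") else "\n"
    return f"{gitignore_text}{separator}{entry}\n"
```

A harness would write the returned text back to the .gitignore and only then create the first file under the data directory.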
8.7.2.1 skills-cache/
The skills-cache/ directory holds Be Civic skills the harness has fetched at runtime — typically via the §24.4.1 (see architecture.md) degradation chain fallback to web-fetch when MCP and HTTP API are both unreachable, or proactive caching of skills the customer is likely to need across sessions. Each cached skill lives at skills-cache/<skill-id>/SKILL.md carrying the canonical body and frontmatter exactly as published at becivic.be/skills/<skill-id>.
For a cached skill to be loaded by the consumer agent as an actual skill (Skill-tool routable, not just scratch markdown), it MUST be installed at the agent platform's skill-discovery path. The agent platform's skill-discovery path is platform-specific (the path used by Claude Desktop differs between macOS, Windows, and Claude Code). The harness MAY use a platform-aware symlink from the platform's skill-discovery path to <USER_DATA_DIR>/skills-cache/<skill-id>/ so that updates to the cached copy are visible without re-installation. Platform-specific paths are documented in bc-docs/CLAUDE.md "Skill loading paths" and updated as Claude Desktop versions evolve.
Cache invalidation. Cached skills carry a cached_at timestamp in a sidecar file (skills-cache/<skill-id>/.cached-at). The harness MAY refresh the cache on a session-start basis or on detection of an observation rejecting a value the cached skill claims; the refresh source is becivic.be/skills/<skill-id> over HTTPS. The harness MUST NOT serve cached content older than 30 days without re-fetching at least once.
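The 30-day ceiling can be checked with a few lines. The sketch assumes the .cached-at sidecar holds a single ISO 8601 timestamp with a UTC offset; the spec does not state the sidecar's format, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_CACHE_AGE = timedelta(days=30)


def must_refetch(cached_at_iso: str, now: datetime) -> bool:
    """True when the cached skill is older than 30 days and must be re-fetched."""
    # Assumption: the sidecar stores e.g. "2026-05-01T00:00:00+00:00".
    cached_at = datetime.fromisoformat(cached_at_iso)
    return now - cached_at > MAX_CACHE_AGE
```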
Customer-side state qualification. The skills-cache/ directory satisfies the §8.7.1 three-clause customer-side state test: the customer can read each cached skill body with a text editor, can delete the directory unilaterally, and the agent both reads and writes it (read during procedure routing; write during cache refresh).
8.7.2.2 path_history/
The memory/ directory MAY carry an optional path_history_<id>.md file for each path the customer has traversed, where <id> is the path's catalogue ID (for example path_history_certificat-residence-historique.md). Each file records:
- Which source was attempted, in order, and the outcome for each attempt (success, failed, skipped-ineligible, declined-by-customer).
- The ISO 8601 date (YYYY-MM-DD) of the successful attempt, if one occurred.
- Whether the customer retains the delivered file: yes, no, or unknown.
The file is plain markdown, written in the customer's preferred language, and is intended to be readable by the customer without assistance. Example frontmatter:
---
type: path_history
path_id: certificat-residence-historique
last_traversed: 2026-05-12
---
Path history files satisfy the §8.7.1 three-clause test. They MUST NOT record the document's content, its file name, or any field prohibited under §8.9.3. They are not required; if the harness does not write them, no behaviour is broken.
8.7.3 <USER_DATA_DIR> resolution
<USER_DATA_DIR> is resolved at harness initialisation, in platform-aware order. The harness uses the first option that applies:
- macOS: ~/Library/Application Support/be-civic/ (the platform-conventional per-user data directory).
- Windows: %LOCALAPPDATA%\be-civic\ (typically C:\Users\<user>\AppData\Local\be-civic\), the platform-conventional per-user data directory for non-roaming application state.
- Linux / XDG-compliant: $XDG_DATA_HOME/be-civic/ when $XDG_DATA_HOME is set; otherwise ~/.local/share/be-civic/ when ~/.local/share/ exists and is writable.
- Fallback (all platforms): ~/.be-civic/ (POSIX-style home-directory dotfile, used when no platform-conventional path applies or is writable).
The resolution order MUST be applied uniformly by all conforming harness implementations. The harness logs the resolved path to <USER_DATA_DIR>/.location so that subsequent sessions on the same machine reuse the same directory even if the resolution rules would otherwise pick a different one. Cross-platform note: Claude Desktop is available on macOS and Windows as of the round-7 cutover; both must be supported for v1. Verification of the platform-specific Cowork connected-folder default is tracked as an open question (see cowork-plugin.md §4).
The resolved path is documented to the customer at session start in plain language: "I'll keep your notes at <path>. You can find them there if you want to inspect or back them up."
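The resolution order can be sketched as a pure function. The platform strings follow Python's sys.platform values, and the parent_is_writable callback stands in for a real filesystem probe; both are assumptions for illustration, as is applying the writability check to every platform-conventional candidate. Reuse of a previously recorded .location file is not modelled.

```python
from typing import Callable, Mapping


def resolve_user_data_dir(
    platform: str,
    env: Mapping[str, str],
    home: str,
    parent_is_writable: Callable[[str], bool],
) -> str:
    """Pick the first applicable location, falling back to ~/.be-civic/."""
    if platform == "darwin":
        parent = f"{home}/Library/Application Support"
        candidate = f"{parent}/be-civic/"
    elif platform == "win32":
        parent = env.get("LOCALAPPDATA", "")
        candidate = f"{parent}\\be-civic\\"
    else:
        # Linux / XDG: prefer $XDG_DATA_HOME, else ~/.local/share.
        parent = env.get("XDG_DATA_HOME") or f"{home}/.local/share"
        candidate = f"{parent.rstrip('/')}/be-civic/"
    if parent and parent_is_writable(parent):
        return candidate
    return f"{home}/.be-civic/"
```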
8.7.4 profile.json — shape and constraints
profile.json holds the routing fields that allow the harness to skip repeat questions across sessions. Every field is categorical or boolean. No field MAY hold a value that is or could derive from a real identifier. The complete set of fields for v1:
| Field | Type | Description |
|---|---|---|
| region | enum | Flanders / Wallonia / Brussels-Capital / German-speaking-community |
| commune_nis5 | string (5 digits) | NIS5 commune code only; no commune name, no address |
| administration_language | enum | NL / FR / DE, constrained to the commune's official languages. Filters by region per the form's pill-filter map (D26). When region is not-in-belgium-yet (D29), all three values are accepted and the form hint adapts. |
| conversation_language | string (free text, ≤32 chars) | The language the user wants the agent to communicate in. Free text (not enum) per D27: any language the agent can speak works (English, French, Tagalog, Slovenian, etc.). Agent pre-fills detected language at runtime; user may override on the onboarding form. Renamed from / replaces the prior other_languages list. The legacy other_languages[0] shape is migrated to conversation_language on first write under V1 schema. |
| civic_status | enum | single / cohabitant-legal / married / divorced / widowed |
| nationality_status | enum | BE / EU / non-EU / multiple |
| residency_status | enum | registered / registering / EU-citizen / non-EU-permit / asylum / undocumented |
| residency_history | list of objects | Each object: {start, end, visa_type, permit_type, country_of_last_residence}; periods only, no document numbers. Dates are YYYY-MM strings (month-bucket precision); see "Date precision" below. |
| dependents | object | {minor_children_count, adult_dependents_count, spouse_abroad: bool}; counts and booleans only |
| employment_history | list of objects | Each object: {start, end, type, days_per_week, total_days_estimate} where type ∈ {FT, PT, self-employed, student, unemployed, retired}; no employer names, no ONSS numbers. Dates are YYYY-MM. |
| education_history | list of objects | Each object: {start, end, level, country_of_institution}; no institution names, no diploma numbers. Dates are YYYY-MM. |
| document_inventory | object of mixed types | has_id_card (enum: yes / not-yet-waiting / no / not-sure), plus booleans has_residence_card / has_work_permit / has_NN / has_passport_BE / has_passport_other, plus validity_end_<doc> as YYYY-MM for each document the customer holds. No document numbers, no copies, no exact expiry day. See has_id_card row below for the rename and rationale. |
| has_id_card (inside document_inventory) | enum | yes / not-yet-waiting / no / not-sure (D22, D23). Renamed from has_eID: the prior eID-vs-residence-card distinction is dropped because all Belgian-issued chip cards (eID and residence card) are functionally equivalent for itsme/identity purposes; the agent disambiguates card-type-specific path-source eligibility at path-traversal time, not at onboarding (D52). |
| browser_driving_preference | enum | drive-by-default / ask-each-time / never-drive; honoured at path-traversal time per architecture.md §24.9 (Chrome MCP handoff vs AUQ vs markdown-link). New field per D8. |
| consent | object (typed namespace) | Extensibility hook for consent metadata. The schema declares the namespace but specific keys are operational and vary by phase. Alpha-phase keys (e.g. alpha_bundle, signed_at, version) are documented in cowork-plugin.md §3.8. Post-alpha keys for granular per-stream opt-out will be documented when that posture lands. Consumers MUST tolerate unknown consent.* keys; the namespace is intentionally permissive. |
| active_procedures | list of skill IDs | Procedure-skill IDs currently in flight; cross-references into memory/procedure_progress_*.md. The list contains ALL ongoing procedures, not just the currently-focused one; the harness holds state for each in memory simultaneously and routes by customer cue. |
| transitions_in_progress | list of enum values | marriage-planned / divorce / address-change and equivalents |
has_id_card migration. Existing profile.json files carrying document_inventory.has_eID (boolean) are migrated on first read under V1 schema: has_eID: true → has_id_card: "yes"; has_eID: false → has_id_card: "no". There is no path to "not-yet-waiting" or "not-sure" from legacy data; those values originate only from V1+ onboarding forms.
other_languages → conversation_language migration. The legacy other_languages ordered list is superseded by the free-text conversation_language field. On first write under V1 schema, other_languages[0] (the prior harness communications language slot) is migrated to conversation_language; the remaining entries are dropped (they were not load-bearing under any v1 routing decision). Agents MUST tolerate legacy other_languages on read but MUST NOT write that field under V1.
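The two legacy migrations can be sketched together as a single pure function over a parsed profile. This is illustrative only: the spec triggers one migration on first read and the other on first write, a distinction the sketch does not model, and it does not enforce the 32-character cap on conversation_language. The function name is hypothetical.

```python
import copy


def migrate_legacy_profile(profile: dict) -> dict:
    """Return a migrated copy; the input dict is left untouched."""
    migrated = copy.deepcopy(profile)

    # has_eID (boolean) -> has_id_card ("yes" / "no"); no other values are derivable.
    inventory = migrated.get("document_inventory", {})
    if "has_eID" in inventory:
        value = "yes" if inventory.pop("has_eID") else "no"
        inventory.setdefault("has_id_card", value)

    # other_languages[0] -> conversation_language; remaining entries are dropped.
    legacy_languages = migrated.pop("other_languages", None)
    if legacy_languages and "conversation_language" not in migrated:
        migrated["conversation_language"] = legacy_languages[0]

    return migrated
```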
Date precision. Every date field in profile.json is encoded as a YYYY-MM string (month-bucket precision). Day-level precision is not stored for any field. This applies to all of residency_history, employment_history, education_history, and every validity_end_<doc> field in document_inventory. The constraint exists for two reasons: (1) month-bucket precision is sufficient for every routing decision the harness makes at v1; (2) day-level precision narrows the de-anonymisation surface materially when combined with other fields (commune, employer-type-by-period, residence-permit-type-by-period). The harness MUST round customer-provided exact dates to month-bucket form before writing to profile.json. The harness MAY hold day-level precision in <USER_DATA_DIR>/sessions/<session-id>/facts.json for the duration of an active session (where it is needed for deadline reminders), but MUST NOT carry day-level precision into persistent state.
This rule REVERSES the 2026-05-11 operator override that permitted exact expiry dates. The v1 posture is intentionally tighter than the longer-term posture: as the customer-side state contract matures and additional safeguards land (encrypted-at-rest options, additional scrub layers, in-document tagging), v1.1+ MAY relax precision for specific fields where a customer-precision use case is demonstrated. The v1 default is YYYY-MM uniformly.
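Month-bucket rounding before a write to profile.json might look like the sketch below. It reads "round" as truncation to the containing month, which is an interpretation rather than something the text spells out; the function name is hypothetical.

```python
from datetime import date


def to_month_bucket(value: str) -> str:
    """Accept 'YYYY-MM-DD' or 'YYYY-MM' and return 'YYYY-MM'."""
    if len(value) == 7:
        # Already month precision; append a day so the value can be validated.
        value = value + "-01"
    parsed = date.fromisoformat(value)  # raises ValueError on malformed input
    return f"{parsed.year:04d}-{parsed.month:02d}"
```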
The design decisions record (2026-05-11, Cluster 7) identifies 14 named fields above. Two additional structural positions complete the 16-axis catalogue: profile_schema_version (string, schema version sentinel, written on first create and on schema upgrade) and last_updated_at (ISO 8601 timestamp, written on every write, for staleness detection). These two metadata fields are not routing axes and carry no identifying information; they are non-optional on every conforming profile.json.
What MUST NOT appear in profile.json:
- Any national identifier or derivative (NISS, NN, eID chip data, social security number, foreign tax ID)
- Any document number (passport number, residence card number, work permit number)
- Any name (given name, family name, alias)
- Any date of birth, place of birth, or biometric data
- Any full postal address (commune and region category are the finest granularity permitted)
- Any photograph or image reference
- Any narrative field (narrative content lives in memory/customer_context.md and related files)
The constraint is structural: any proposed v2 field that COULD hold identity in any realistic population of inputs MUST be rejected at the schema layer, not by policy alone.
profile.json MUST be valid against bc-docs/schemas/profile.schema.json on every write. The schema enforces the field-level constraints listed above. A harness that writes to profile.json without validating against the schema is non-conformant.
8.7.5 memory/ shape
MEMORY.md is the index: one line per memory entry, at most 200 lines, at most 8KB. On T2/T3 (Cowork tab in Claude Desktop), MEMORY.md is read at session start via explicit skill instructions. On T4 (Claude Code, or any environment that supports skill-frontmatter hooks), MEMORY.md is injected into context via the UserPromptSubmit hook on every turn. Cowork tab hook support is an open question (see §18); if confirmed, Cowork at T3 can be upgraded to T4 without re-architecting memory/ shape.
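A harness could guard the index caps before each write with a check like the following. The sketch assumes "8KB" means 8192 bytes of UTF-8; the spec does not say whether it means 8000 or 8192.

```python
MAX_LINES = 200
MAX_BYTES = 8 * 1024  # assumption: "8KB" read as 8192 bytes


def within_memory_cap(index_text: str) -> bool:
    """True when MEMORY.md content respects both the line cap and the size cap."""
    line_count = len(index_text.splitlines())
    byte_count = len(index_text.encode("utf-8"))
    return line_count <= MAX_LINES and byte_count <= MAX_BYTES
```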
Per-topic files carry YAML frontmatter with at minimum name, description, and type. Permitted type values:
- customer_context: customer's self-reported situation and background; free narrative
- procedure_progress: current state of a specific active procedure (step, outstanding documents, next action); one file per procedure_progress_<id>.md
- decision_log: decisions the customer has made with Be Civic's assistance (for example, "chose path B for language exam waiver")
- document_reference: routing fields extracted from a customer-supplied document; see §8.9 for content constraints
No type outside this list is valid in v1. New types require a Tier B amendment.
8.7.6 sessions/ directory
The sessions/<session-id>/ directory is ephemeral: it is created at session open and MUST be deleted at session close after the submission buffer has been flushed (or on session_outcome: abandoned_inferred submission for orphaned sessions, per §8.8.3). No session directory persists across session boundaries. This is non-negotiable: session state is never accumulated across sessions; only the extracted routing fields and narrative summaries in profile.json and memory/ carry forward.
8.7.7 submissions.jsonl and analytics-outbox.jsonl
submissions.jsonl is an append-only receipt log. Each line is a JSON object recording a submitted item: {submitted_at, session_id, type, id, cancel_token, commit_eta, status}. The type field carries one of the 2026-05-15 feedback type values (concern | amendment | validation | draft | feedback | rating); the id field is the matching <type>_id (concern_id / amendment_id / validation_id / draft_id / feedback_id / rating_id) — renamed from the pre-amendment observation_id / skill_amendment_id / skill_draft_id shape. The session_id field is retained (S61 reversal — the cluster-2 amendment had proposed dropping session_id from this log in favour of a recovery_token; the reversal restores the original shape). The harness appends a line on every successful mode: "stage" response. The customer can read this file to review and cancel pending submissions. The file is the customer's own record; Be Civic does not hold a copy.
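One receipt line could be produced as sketched below. The field list and type values come from the paragraph above; the function name and the strict "exactly these keys" check are illustrative choices, not part of the contract.

```python
import json

RECEIPT_FIELDS = (
    "submitted_at", "session_id", "type", "id",
    "cancel_token", "commit_eta", "status",
)
FEEDBACK_TYPES = {"concern", "amendment", "validation", "draft", "feedback", "rating"}


def receipt_line(receipt: dict) -> str:
    """Serialise one receipt as a single JSONL line (newline-terminated)."""
    if set(receipt) != set(RECEIPT_FIELDS):
        raise ValueError("receipt must carry exactly the documented fields")
    if receipt["type"] not in FEEDBACK_TYPES:
        raise ValueError(f"unknown feedback type: {receipt['type']!r}")
    return json.dumps(receipt, ensure_ascii=False) + "\n"
```

The harness would append the returned line to submissions.jsonl after each successful mode: "stage" response.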
analytics-outbox.jsonl is an offline queue for analytics events that could not be submitted during a session (network unavailable, scrub-rules fetch failed). Each line is an analytics event in the shape defined by POST /api/analytics. The harness MUST attempt to flush the outbox at the next session preamble before generating new events for that session. Flushing is a deterministic code path: no LLM involvement. Events in the outbox are discarded after 30 days without successful submission.
8.8 Retention and deletion semantics
8.8.1 Active procedure files
procedure_progress_<id>.md MUST be retained as long as the procedure is active (that is, <id> appears in profile.json active_procedures), or for 90 days after the file was last written, whichever is shorter. "Last written" is the file's mtime; the harness MUST NOT backdate mtimes.
When a procedure completes (the customer reaches a confirmed terminal step) or when the 90-day inactivity window expires, the harness MUST move the file to memory/archive/<id>.md and remove the procedure's ID from active_procedures in profile.json.
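The archive trigger reduces to a two-condition check, sketched here with hypothetical names. The last_written argument is the file's mtime as described in the section above.

```python
from datetime import datetime, timedelta

INACTIVITY_WINDOW = timedelta(days=90)


def should_archive(completed: bool, last_written: datetime, now: datetime) -> bool:
    """True when the procedure finished or has been idle for more than 90 days."""
    return completed or (now - last_written) > INACTIVITY_WINDOW
```

When this returns True, the harness moves the file to memory/archive/ and removes the id from active_procedures in the same step.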
8.8.2 Archived procedure files
Files in memory/archive/ are retained for one year from their archive date (recorded in the file's frontmatter as archived_at). After one year, the harness SHOULD delete them. The harness MUST surface a deletion warning to the customer at session start if any archived file is within 30 days of its one-year mark, so the customer can export the content before it is removed.
8.8.3 Session buffers
sessions/<session-id>/ is deleted on session close after the submission buffer has been flushed to POST /api/feedback (or after the customer has explicitly declined submission). An orphaned session directory (no session-close event received, directory age greater than 72 hours) is cleaned up by the harness at the next session preamble. Before deleting an orphaned session directory, the harness MUST submit session_outcome: abandoned_inferred to POST /api/analytics if analytics opt-in is active.
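The ordering constraint for orphaned sessions (report first, delete second) can be expressed as a small planning function. The step strings and function name are illustrative; a real harness would perform the actions rather than return labels.

```python
from datetime import datetime, timedelta

ORPHAN_AGE = timedelta(hours=72)


def orphan_cleanup_plan(
    created_at: datetime,
    now: datetime,
    session_closed: bool,
    analytics_opt_in: bool,
) -> list[str]:
    """Return the ordered cleanup steps for a session directory, if any."""
    if session_closed or now - created_at <= ORPHAN_AGE:
        return []  # not orphaned: closed normally or still within 72 hours
    steps = []
    if analytics_opt_in:
        # Must happen before deletion, and only with analytics opt-in.
        steps.append("submit session_outcome: abandoned_inferred")
    steps.append("delete session directory")
    return steps
```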
8.8.4 Customer-initiated deletion
A customer MAY delete ~/.be-civic/ (or the equivalent resolved <USER_DATA_DIR>, per §8.7.3) at any time, by any means, with no Be Civic consequence. The next session is treated as first_contact. The harness MUST NOT prevent, warn against, or create friction around customer-initiated deletion. The harness MUST NOT attempt to re-create deleted files from server-side state (because no server-side state exists).
There is no account to deactivate, no server-side deletion request to file, and no right-to-erasure workflow needed for the local state: deletion is the customer's unilateral act.
8.9 Document-content-discard rule
8.9.1 Scope
This section applies when a customer provides Be Civic with a copy, scan, photograph, or paste of a personal document — including but not limited to: national identity cards (eID), foreign national identity documents, passports, residence permit cards (A through M), work permits, diplomas, and official correspondence.
8.9.2 What the harness MUST extract and retain
From a customer-supplied document, the harness MUST extract only the routing fields needed to determine procedure eligibility or next-step routing. Examples:
- From a residence permit card: permit_type (for example "F card"), validity_end (ISO 8601 date, for example "2028-06-15"), validity_start (ISO 8601 date, optional)
- From a passport: issuing_country, validity_end (ISO 8601 date)
- From official correspondence: issuing_authority_category (commune / CGVS / OCMW / other), subject_category (invitation / refusal / decision), deadline_date (ISO 8601 date) where the correspondence carries a deadline
Validity dates and other non-identifying dates printed on a document MAY be retained as exact dates. They are not identity-derivative: knowing a residence permit expires on a given date does not narrow the holder to a small population. Retaining the exact date enables the harness to provide proactive renewal warnings (for example, "your F card expires in 76 days; here are the renewal steps").
Extracted routing fields are written to memory/document_reference_<id>.md with provenance metadata: {source_category, extraction_date, fields_extracted: [list of field names]}.
8.9.3 What the harness MUST NOT retain
The following MUST NOT appear in profile.json, any memory/ file, sessions/, or any other persisted location:
- The document number, card number, passport number, or any other identifier printed on the document
- The customer's full name, given name, or family name as it appears on the document
- The customer's date of birth or place of birth (dates of birth are identity-derivative; combined with commune, they narrow to a small population)
- Any photograph or biometric data
- The full address as printed on the document (commune and region category are permitted)
- Any text block from the document beyond the specific fields enumerated in §8.9.2
Document validity dates (issue date, expiry date) and other non-identifying temporal fields MAY be retained per §8.9.2. Date of birth is the one date type that is identity-derivative and remains prohibited.
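The extract-and-discard rule works naturally as an allowlist: anything not named for the document category is dropped before a document_reference file is built. In the sketch below, the category keys, the function name, and the output layout are hypothetical; the field names and provenance keys are the ones listed in §8.9.2.

```python
# Allowlist per document category (category keys are illustrative).
ALLOWED_FIELDS = {
    "residence_permit_card": {"permit_type", "validity_end", "validity_start"},
    "passport": {"issuing_country", "validity_end"},
    "official_correspondence": {
        "issuing_authority_category", "subject_category", "deadline_date",
    },
}


def build_document_reference(source_category: str, extracted: dict, extraction_date: str) -> dict:
    """Keep only allowlisted routing fields and attach provenance metadata."""
    allowed = ALLOWED_FIELDS[source_category]
    kept = {name: value for name, value in extracted.items() if name in allowed}
    return {
        "fields": kept,
        "provenance": {
            "source_category": source_category,
            "extraction_date": extraction_date,
            "fields_extracted": sorted(kept),
        },
    }
```

An allowlist fails closed: a name, document number, or date of birth that the extraction step happens to surface never reaches the written file, because it was never on the list.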
8.9.4 Scrub verification
The Layer 1 consumer-side scrub (regex plus LLM contextual pass) MUST run against the memory/document_reference_<id>.md content on every write, not only on submission buffer writes. The scrub rules fetched at session start apply. If any scrub rule fires on a document reference file, the harness MUST abort the write, log a structured warning to sessions/<id>/scrub-warnings.jsonl (not to the submission buffer), and prompt the customer to confirm that the field should be omitted.
8.9.5 Original content
The original document content (the customer's paste, the OCR output, the image data) MUST NOT be written to any file in ~/.be-civic/. It exists only in the active session context and is discarded when the session ends. The harness MUST NOT store the original content in sessions/<id>/dossier-draft.md or any other session file: dossier-draft is for documents the customer is assembling for submission to authorities, not for copies of documents already held by the customer.
8.9.6 Document delivery via paths
When a path delivers a document to the customer (for example, a Brussels Tier-1 quickLink generates a PDF, or the customer downloads a residence certificate via a federal portal), the document file is stored where the customer puts it (their connected folder, their Downloads directory, or wherever they choose to save it). The harness MUST NOT relocate or copy the file. The discard rule in §8.9.3 applies to what the harness extracts from the document into customer-side state: only the categorical routing fields named by the procedure skill's inputs or the path's outputs schema are written to profile.json or memory/<id>.md. Nothing else from the document enters customer-side state, regardless of whether the harness "saw" the document content in its conversation window during the delivery session.
8.10 Anonymous-by-construction — structural reinforcement
8.10.1 No identifier derivatives
The following are unconditionally prohibited in any Be Civic system, including consumer-side local state:
- The NISS / national registration number, or any hash, truncation, or transformation of it
- Any email address hash
- Any device fingerprint or hardware identifier
- Any purpose-generated derivative of a real identifier (including partial NISS, date-of-birth-derived token, or document-number-derived token)
If a session-level correlation token is needed (for example, for linking submission buffer entries), it MUST be a randomly generated UUIDv7 with no relationship to any real identifier. session_id (ses_<UUIDv7>) satisfies this requirement. It MUST NOT be seeded from or mixed with any customer attribute.
(Per the 2026-05-15 S61 reversal, session_id is the recovery key end-to-end; the recovery_token concept proposed in the 2026-05-11 Cluster 2 amendment is dropped from the spec. The Worker echoes the agent-provided session_id back in the response body alongside concern_id / amendment_id / etc. and cancel_token; the recovery endpoint is GET /api/feedback/sessions/<session_id>. D1's validations.session_id column persists the agent-provided value; the prior-proposed recovery_token column on concerns was never created and is permanently dropped from the migration sequence.)
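As an illustration of the session_id format (ses_<UUIDv7>), the sketch below builds a UUIDv7 by hand following the RFC 9562 bit layout, so it does not depend on a runtime that ships a UUIDv7 generator. The function name new_session_id is hypothetical. The only non-random input is the creation timestamp in milliseconds, which is not a customer attribute; all remaining bits come from the operating system's secure random source.

```python
import secrets
import time
import uuid


def new_session_id(now_ms=None):
    """Return 'ses_<UUIDv7>' built from a timestamp and secure random bits only."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    rand_a = secrets.randbits(12)  # 12 random bits after the version nibble
    rand_b = secrets.randbits(62)  # 62 random bits after the variant bits

    value = (now_ms & ((1 << 48) - 1)) << 80  # bits 127..80: Unix time in ms
    value |= 0x7 << 76                        # bits 79..76: version 7
    value |= rand_a << 64                     # bits 75..64: rand_a
    value |= 0b10 << 62                       # bits 63..62: RFC variant
    value |= rand_b                           # bits 61..0: rand_b
    return f"ses_{uuid.UUID(int=value)}"
```

No customer attribute is passed in or mixed into the random bits, which is the property §8.10.1 requires of any correlation token.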
8.10.2 Categorical fields are a structural constraint, not a policy choice
The requirement that profile.json fields be categorical or boolean (§8.7.4) is not a policy choice made for compliance reasons. It is a structural constraint that ensures the profile cannot re-identify the customer even if the file is read by a third party. Any proposed v2 field that would require a precise numeric value, a name, or a date string MUST be redesigned as a categorical field before the proposal advances to Tier B amendment review.
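The categorical-or-boolean constraint lends itself to a mechanical check. The sketch below is an assumption-laden illustration: the two fields and their allowed values are hypothetical placeholders, not entries from the 16-axis catalogue in §8.7, and profile_violations is a hypothetical helper name.

```python
# Hypothetical catalogue: each field is either boolean or a closed set of categories.
PROFILE_CATALOGUE = {
    "residence_region": {"brussels", "flanders", "wallonia"},
    "has_eid": bool,
}


def profile_violations(profile, catalogue=PROFILE_CATALOGUE):
    """Return a list of human-readable problems; an empty list means the profile passes."""
    problems = []
    for field, value in profile.items():
        allowed = catalogue.get(field)
        if allowed is None:
            problems.append(f"{field}: not in catalogue")
        elif allowed is bool:
            if not isinstance(value, bool):
                problems.append(f"{field}: expected boolean")
        elif not isinstance(value, str) or value not in allowed:
            problems.append(f"{field}: value is not one of the allowed categories")
    return problems
```

Because every non-boolean field is checked against a closed set, a precise number, a name, or a date string fails without needing a dedicated detector for each.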
8.10.3 Vendor memory degrades gracefully
On T0 and T1 (no host filesystem), Be Civic MAY degrade to in-memory session state and rely on the customer's AI vendor account memory (for example, vendor key-value stores such as Project Memory in Anthropic platforms, ChatGPT memory, or an equivalent) for cross-session persistence. The harness MUST NOT write anything to vendor memory that would violate the field-level constraints of §8.7.4. The vendor memory path is a capability degradation, not a separate data regime with lower privacy standards.
8.10.4 Paths are anonymous-by-construction
The Path Directory NEVER carries customer-identifying state. A path entry is the same catalogue object for every customer; it describes a route to a document or tool, not anything about any individual who has used it. Per-customer eligibility evaluation happens at traversal time, in the harness, against the customer's local profile.json — not server-side. No customer attribute is transmitted to the catalogue server as part of path resolution. The path_history_<id>.md files (§8.7.2.2) are local-only; they are not submitted to any Be Civic endpoint and are not part of the submission protocol. A customer who has traversed fifty paths has left no identifying trace on the Path Directory beyond aggregated, salted validation submissions (§9.5 (see lifecycle.md)) that are subject to the same per-artefact-salted IP-hash anonymisation as skill validations.
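Traversal-time eligibility evaluation can be pictured as a pure function over two local inputs: the shared catalogue entries and the customer's profile. The entry shape used below (a uid plus an eligibility mapping of field to accepted categories) and the uids themselves are hypothetical, since the Path Directory schema is defined in §6.12 (see schemas.md) and not reproduced here.

```python
def eligible_paths(path_entries, profile):
    """Return the uids of catalogue entries whose requirements the local profile meets.

    Both inputs are already on the customer's machine; nothing is sent anywhere.
    """
    matches = []
    for entry in path_entries:
        requirements = entry.get("eligibility", {})
        if all(profile.get(field) in accepted for field, accepted in requirements.items()):
            matches.append(entry["uid"])
    return matches


# Hypothetical catalogue entries, identical for every customer.
catalogue = [
    {"uid": "path_a", "eligibility": {"residence_region": ["brussels"]}},
    {"uid": "path_b", "eligibility": {"residence_region": ["flanders", "wallonia"]}},
    {"uid": "path_c", "eligibility": {}},
]
local_profile = {"residence_region": "brussels", "has_eid": True}
result = eligible_paths(catalogue, local_profile)
```

The function takes no network handle and returns only uids, which mirrors the rule that no customer attribute is transmitted to the catalogue server during path resolution.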
Cross-references
Cross-doc references are inlined throughout this document in the form §X.Y (see <file>.md). The referenced sections are:
- §3 (Non-negotiable principles, including principle 11 customer-side state): see architecture.md §3
- §6.2 (Submission schemas, identity-field bans, free-text caps): see schemas.md §6.2
- §6.7 (Agent capability requirements per submission type): see schemas.md §6.7
- §6.8 (Scrub rules file): see schemas.md §6.8
- §6.11 (Catalogue UID convention, PR-CI uid assignment): see schemas.md §6.11
- §6.12 (Path Directory schema): see schemas.md §6.12
- §7 (Trust model / maintainer-review queue): see protocol.md §7
- §9 (State-machine promotion): see lifecycle.md §9
- §9.2 (Promotion thresholds): see lifecycle.md §9.2
- §9.5 (Path and path-source lifecycle): see lifecycle.md §9.5
- §11.1 (Source rot): see lifecycle.md §11.1
- §13.1 (Agent interface manifest page): see architecture.md §13.1
- §15.7 (Harness consumer obligations): see skills.md §15.7
- §15.8 (Conversation invariants, plain-language obligations): see skills.md §15.8
- §18 (Open questions, Cowork hook support): see architecture.md §18
- §20 (Website rendering / renderer Worker): see website.md §20
- §23 (MCP server): see protocol.md §23
- §24.4 (Capability tiers): see architecture.md §24.4
- §24.5 (Three-tier returning-user adaptation): see architecture.md §24.5