Technical overview

How CiviQ is built.

A capability and architecture overview: what each layer does and the guarantees it gives. It is intentionally not a reproduction blueprint.

01 · Architecture

Surfaces in, governed core, evidence out.

One governed core sits behind every channel and in front of a pluggable model. The core is where accountability lives; the model is a swappable component.

Layer 1

Surfaces / channels

Web, embed, SMS, messaging, USSD, voice - thin adapters over one contract.

Layer 2 · governed core

Policy → Skills → Retrieval → Generation → Verification

The accountable request lifecycle. Policy decides whether to engage; verification decides whether to speak. Everything in between is grounded in your content.

Layer 3

Evidence & audit

Sources, checks, and a hash-chained record attached to every decision.

Layer 4 · backend

Pluggable model backend

Extractive (no model), a local LLM, or an opt-in remote model. The backend is selected at installation and can be swapped without touching the core.

02 · Lifecycle

Nine stages, described as guarantees.

What each stage guarantees - not how it decides. The how is deliberately withheld.

01
Safety & intent guards
guarantees out-of-scope and unsafe requests are caught before any work begins
02
Query understanding
guarantees the request is normalised before retrieval
03
Hybrid retrieval
guarantees candidate evidence is drawn only from your content
04
Ground-or-refuse gate
guarantees no generation without supporting content
05
Grounded generation
guarantees the draft is constrained to retrieved passages
06
Adversarial self-check + confidence
guarantees the draft is challenged and scored before release
07
Post-generation refuse gate
guarantees a weak or unsupported answer is discarded, not shipped
08
Accessibility check
guarantees the wording meets a plain-language bar
09
Cited answer or named handover
guarantees an outcome: an evidenced answer, or a routed handover - never a dead end
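
The shape of this lifecycle, where any stage can stop the request and the last stage always yields an outcome, can be sketched as a short-circuiting pipeline. This is a minimal illustration, not CiviQ's implementation: the `Outcome` type, the stage functions, and the toy checks are all hypothetical, and the real decision logic is withheld.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Outcome:
    kind: str                         # "answer" or "handover"
    text: str
    stopped_at: Optional[str] = None  # name of the stage that refused, if any

# A stage returns None to continue, or a reason string to stop the request.
Stage = Callable[[dict], Optional[str]]

def run_lifecycle(request: dict, stages: list[tuple[str, Stage]]) -> Outcome:
    """Run stages in order; the first refusal becomes a named handover."""
    for name, stage in stages:
        reason = stage(request)
        if reason is not None:
            return Outcome("handover", reason, stopped_at=name)
    return Outcome("answer", request["draft"])

# Hypothetical toy stages standing in for the real, undisclosed checks.
def ground_or_refuse(req: dict) -> Optional[str]:
    return None if req.get("passages") else "no supporting content; routing to staff"

def grounded_generation(req: dict) -> Optional[str]:
    req["draft"] = " ".join(req["passages"])  # extractive stand-in for a model
    return None

stages = [
    ("ground-or-refuse gate", ground_or_refuse),
    ("grounded generation", grounded_generation),
]

answered = run_lifecycle({"passages": ["Bins are collected on Tuesdays."]}, stages)
refused = run_lifecycle({"passages": []}, stages)
```

Either way the caller receives an `Outcome`: an answer built from retrieved passages, or a handover that names the stage that stopped it.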

03 · Policy engine

A gate that runs before retrieval.

The policy engine evaluates each request before any content is fetched. Institutions configure the actions - answer, refuse, escalate, human-review, or block - and the engine attaches its policy-check evidence to every response. Policy is versioned and editable in-app, so a change is deliberate, attributed, and reversible.

Configurable actions

answer · refuse · escalate · human-review · block

Each response carries the policy decision that produced it.
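
A pre-retrieval gate of this kind might look like the sketch below. The five actions come from the list above; everything else (the keyword rule table, `rule_id`, `policy_version`, and the function names) is a hypothetical stand-in, since the real matching logic is not disclosed.

```python
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    ANSWER = "answer"
    REFUSE = "refuse"
    ESCALATE = "escalate"
    HUMAN_REVIEW = "human-review"
    BLOCK = "block"

@dataclass(frozen=True)
class PolicyDecision:
    action: Action
    rule_id: str         # hypothetical: which rule matched
    policy_version: int  # hypothetical: version of the active policy

# Hypothetical rule table (keyword -> action), for illustration only.
RULES = {"legal advice": Action.REFUSE, "emergency": Action.ESCALATE}

def evaluate(request_text: str, policy_version: int = 1) -> PolicyDecision:
    """Decide before retrieval; default to ANSWER when no rule matches."""
    lowered = request_text.lower()
    for keyword, action in RULES.items():
        if keyword in lowered:
            return PolicyDecision(action, f"keyword:{keyword}", policy_version)
    return PolicyDecision(Action.ANSWER, "default", policy_version)

def respond(request_text: str) -> dict:
    decision = evaluate(request_text)
    # Retrieval would run only when the decision is ANSWER.
    return {"policy": decision, "retrieval_ran": decision.action is Action.ANSWER}
```

The response dictionary always includes the decision, which mirrors the idea that each response carries the policy check that produced it.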

04 · Skills layer

Deterministic answers before RAG, where appropriate.

For a class of requests - glossary terms, contact details, institution-registerable tools - a deterministic skill answers directly, carrying its provenance, rather than reaching for retrieval and generation. The purpose is correctness and speed on the things that shouldn't be left to a model. How a skill is selected and arbitrated is part of the implementation we don't disclose.
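
As an illustration of a deterministic skill that carries its provenance, here is a minimal glossary lookup. The exact-match selection, the `SkillAnswer` type, and the sample data are assumptions made for the example; they say nothing about how CiviQ actually selects or arbitrates skills.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SkillAnswer:
    text: str
    provenance: str  # where the answer came from

# Hypothetical glossary data; real content would come from the institution.
GLOSSARY = {
    "council tax": ("A local tax on domestic property.", "glossary#council-tax"),
}

def glossary_skill(query: str) -> Optional[SkillAnswer]:
    """Return a deterministic answer for an exact glossary term, else None."""
    entry = GLOSSARY.get(query.strip().lower())
    if entry is None:
        return None
    text, source = entry
    return SkillAnswer(text, provenance=source)

def answer(query: str) -> str:
    hit = glossary_skill(query)
    if hit is not None:
        return f"{hit.text} [source: {hit.provenance}]"
    return "fall through to retrieval and generation"
```

The same input always produces the same output, and no model is involved on the skill path.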

05 · Retrieval & ingestion

Structure-preserving in, hybrid out.

Ingestion preserves document structure (headings and tables) and chunks along the nested-heading hierarchy, so a passage keeps the context it lives in. Retrieval is hybrid: lexical BM25, local embeddings, and a reranking stage, with recency awareness so newer guidance wins. A lexical fallback keeps the zero-dependency install working even with no model present.

Components are named for credibility. Tuning, weights, and the ranking recipe are withheld.
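
To make nested-heading chunking concrete, the sketch below splits a document into passages and tags each with its full heading path. It assumes Markdown-style `#` headings as input and ignores tables; the input format and the `chunk_by_headings` name are assumptions for the example, not a description of CiviQ's ingestion.

```python
def chunk_by_headings(lines: list[str]) -> list[dict]:
    """Split text into chunks, each tagged with its full heading path."""
    path: list[str] = []   # current heading stack, outermost first
    body: list[str] = []
    chunks: list[dict] = []

    def flush() -> None:
        # Emit the accumulated body under the current heading path.
        if body:
            chunks.append({"path": list(path), "text": " ".join(body)})
            body.clear()

    for line in lines:
        if line.startswith("#"):
            flush()
            level = len(line) - len(line.lstrip("#"))
            title = line.lstrip("#").strip()
            del path[level - 1:]   # drop headings at this depth or deeper
            path.append(title)
        elif line.strip():
            body.append(line.strip())
    flush()
    return chunks

doc = [
    "# Waste",
    "## Bin collection",
    "Bins are collected weekly.",
    "## Bulky items",
    "Book a collection online.",
]
chunks = chunk_by_headings(doc)
```

Each chunk keeps its ancestry, so "Bins are collected weekly." is stored with the path `["Waste", "Bin collection"]` rather than as an orphaned sentence.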

Named components

  • Structure-preserving ingestion (headings + tables)
  • Nested-heading chunking
  • Lexical BM25
  • Local embeddings
  • Reranking stage
  • Recency awareness + lexical fallback
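
For readers unfamiliar with the lexical component, this is a compact Okapi BM25 scorer in plain Python. The `k1` and `b` defaults are commonly used textbook values, not CiviQ's tuning, and the whitespace tokenisation is a simplification for the example.

```python
import math
from collections import Counter

def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each tokenised document against the query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "bins are collected on tuesdays".split(),
    "council tax is due in april".split(),
    "report a missed bin collection online".split(),
]
scores = bm25_scores("council tax".split(), docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

Because it needs nothing beyond the standard library, a scorer like this is the kind of component that can serve as a lexical fallback when no model is installed.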

06 · Refusal-first & verification

Guarantees that bound hallucination and overconfidence.

Verification is layered so a weak answer has several ways to be stopped and only one way to ship: by passing all of them.

Ground-or-refuse gating

No generation without supporting content. Bounds answers to what your content can back.

Adversarial self-check

The draft is challenged against its own evidence before release.

Calibratable confidence

A confidence signal whose threshold the institution can tune without seeing the recipe.

Post-generation refuse gate

A final stop: weak or unsupported drafts are discarded, not shipped.

Numeric anti-fabrication

Guards against invented figures, dates, and amounts that aren't in source.

Plain-language check

An accessibility bar on the wording before it reaches a resident.
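
The numeric anti-fabrication guard above can be illustrated with a simple check: every number in a draft must appear somewhere in the source passages. This is a deliberately naive sketch under stated assumptions (string-level matching, a hypothetical `unsupported_numbers` helper); it is not CiviQ's guard, which is not disclosed.

```python
import re

# Matches integers and decimals such as 14, 2025, 25.50 or 1,200.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")

def unsupported_numbers(draft: str, sources: list[str]) -> set[str]:
    """Return numeric tokens in the draft that appear in no source passage."""
    supported: set[str] = set()
    for passage in sources:
        supported.update(NUMBER.findall(passage))
    return set(NUMBER.findall(draft)) - supported

sources = ["The fee is £25.50 and applications close on 14 March 2025."]
ok = unsupported_numbers("The fee is £25.50, due by 14 March.", sources)
bad = unsupported_numbers("The fee is £30, due by 14 March.", sources)
```

A non-empty result is a signal to discard or repair the draft: in the second case the figure 30 has no backing in the source.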

07 · Evidence & audit

Glass-box answers, tamper-evident record.

Every answer ships its sources, the basis for its confidence, and the policy checks it passed. Every decision is written to a hash-chained, tamper-evident audit log, with the reason recorded. Exports are signed and verifiable. A retention purge re-chains the log, so erasure doesn't break verifiability.

Properties

  • Sources + confidence basis on every answer
  • Hash-chained, tamper-evident entries
  • Reason recorded for every decision
  • Signed, verifiable export
  • Retention purge re-chains the log
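
The hash-chaining and re-chaining properties above can be sketched in a few lines. This is a generic illustration using SHA-256 from the standard library; the entry layout, the genesis value, and the function names are assumptions, and signing of exports is left out.

```python
import hashlib
import json
from typing import Callable

GENESIS = "0" * 64  # hypothetical starting value for the chain

def entry_hash(prev_hash: str, payload: dict) -> str:
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()

def append(log: list[dict], payload: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"payload": dict(payload), "prev": prev,
                "hash": entry_hash(prev, payload)})

def verify(log: list[dict]) -> bool:
    """Recompute every hash; any edit to an earlier entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True

def purge_and_rechain(log: list[dict], keep: Callable[[dict], bool]) -> list[dict]:
    """Drop entries failing `keep`, then rebuild the chain over the survivors."""
    rechained: list[dict] = []
    for entry in log:
        if keep(entry["payload"]):
            append(rechained, entry["payload"])
    return rechained

log: list[dict] = []
append(log, {"decision": "answer", "reason": "grounded in 2 passages"})
append(log, {"decision": "refuse", "reason": "no supporting content"})
intact = verify(log)

purged = purge_and_rechain(log, lambda p: p["decision"] != "refuse")
purged_ok = verify(purged)

log[0]["payload"]["reason"] = "edited"   # simulate tampering
tamper_detected = not verify(log)
```

After a purge the surviving entries verify as a fresh chain, while an in-place edit to the original log is detected.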

08 · Trust & security

Security as a default, not a setting.

Passwordless auth

Email-OTP sign-in - no stored passwords to leak.

4-tier RBAC

Auditor, officer, admin, superadmin - least privilege by role.

Hashed API keys

Keys are stored hashed, never in plaintext.

Hardening

CSRF protection, security headers, rate limiting, and an input-size guard.

PII handling

PII scrubbing, retention purge, and right-to-erasure under UK GDPR.

Operational guards

Production-config guards plus health and metrics endpoints.
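
Storing API keys hashed, as described under "Hashed API keys" above, usually means keeping only a digest and comparing digests at check time. The sketch below assumes SHA-256 over a high-entropy random key; the source does not name the hash function CiviQ uses, so treat the choice and the function names as illustrative.

```python
import hashlib
import hmac
import secrets

def issue_key() -> tuple[str, str]:
    """Return (plaintext key shown to the user once, digest to store)."""
    key = secrets.token_urlsafe(32)
    return key, hashlib.sha256(key.encode("utf-8")).hexdigest()

def check_key(presented: str, stored_digest: str) -> bool:
    digest = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(digest, stored_digest)

key, stored = issue_key()
```

Only `stored` would be persisted, so a leaked database row does not reveal a usable key.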

09 · Governance & versioned config

Versioned configuration, inspired by Cloudflare deployments.

Configuration is immutable, timestamped, and attributed: every version records who changed what, and when. One version is active at a time. An in-app policy editor validates a change, deploys it live, and can roll it back. Multi-employee accounts are gated by a superadmin.

Edit → validate → deploy → rollback

v17 active · deployed 14:02 · by admin@trust
v16 · validated · rolled back 13:51
v15 · superseded · 09 Mar
immutable · attributed · one active version
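
The edit, validate, deploy, rollback cycle above can be modelled as an append-only list of immutable versions with a single active pointer. The class names, fields, and the validation rule here are hypothetical; the sketch only shows the properties named in the text.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)
class ConfigVersion:
    number: int
    author: str
    created_at: datetime
    settings: tuple  # immutable snapshot as sorted (key, value) pairs

class ConfigStore:
    def __init__(self) -> None:
        self.versions: list[ConfigVersion] = []
        self.active: Optional[int] = None
        self.history: list[int] = []   # previously active version numbers

    def deploy(self, settings: dict, author: str) -> ConfigVersion:
        if "actions" not in settings:  # hypothetical validation rule
            raise ValueError("settings must define 'actions'")
        version = ConfigVersion(len(self.versions) + 1, author,
                                datetime.now(timezone.utc),
                                tuple(sorted(settings.items())))
        self.versions.append(version)
        if self.active is not None:
            self.history.append(self.active)
        self.active = version.number
        return version

    def rollback(self) -> int:
        """Reactivate the previously active version; nothing is deleted."""
        if not self.history:
            raise RuntimeError("no earlier version to roll back to")
        self.active = self.history.pop()
        return self.active

store = ConfigStore()
store.deploy({"actions": "answer,refuse"}, "admin@trust")
store.deploy({"actions": "answer,refuse,escalate"}, "admin@trust")
rolled_back_to = store.rollback()
```

A rollback moves the active pointer only; both versions remain in the record with their author and timestamp.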

10 · Channels

One governed core, thin output adapters.

Every channel is an output adapter over one webhook contract. The adapters are built and unit-tested against simulated provider payloads, but not yet wired to a live carrier. The adapter translates the provider's format; the governed engine never changes. Adding a surface is an adapter, not a new build.

Web · Embed / iframe · SMS · WhatsApp · Telegram · Short-code / USSD · Voice / IVR
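
The adapter idea can be shown with two simulated payloads normalised into one contract. The payload shapes, the `InboundMessage` fields, and the function names are invented for the example; they do not correspond to any real provider's webhook format or to CiviQ's contract.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InboundMessage:   # the single contract the core sees (hypothetical shape)
    channel: str
    sender: str
    text: str

def from_sms(payload: dict) -> InboundMessage:
    # Hypothetical SMS payload: {"from": ..., "body": ...}
    return InboundMessage("sms", payload["from"], payload["body"].strip())

def from_chat(payload: dict) -> InboundMessage:
    # Hypothetical chat payload: {"message": {"sender_id": ..., "text": ...}}
    msg = payload["message"]
    return InboundMessage("chat", str(msg["sender_id"]), msg["text"].strip())

ADAPTERS = {"sms": from_sms, "chat": from_chat}

def core(message: InboundMessage) -> str:
    """Stand-in for the governed core; identical for every channel."""
    return f"received: {message.text}"

def handle_webhook(channel: str, payload: dict) -> str:
    return core(ADAPTERS[channel](payload))

sms_reply = handle_webhook("sms", {"from": "+440000000000", "body": " bin day? "})
chat_reply = handle_webhook("chat", {"message": {"sender_id": 42, "text": "bin day?"}})
```

Adding a channel means adding one entry to `ADAPTERS`; `core` is untouched.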

11 · Intelligence & quality metrics

The audit log is also the dataset.

CiX scoring turns the audit log into a citizen-experience read and a content/automation backlog. Demand and gaps are clustered to show what residents ask and where content falls short. Quality metrics (groundedness, citation quality, hallucination rate, escalation quality) are computed over the same log, so the numbers come from the system's own record.

Computed over the audit log

Groundedness
Citation quality
Hallucination rate
Escalation quality

live values populate from real traffic - withheld until a pilot

12 · Model & deployment

Pluggable backends, zero-egress by default.

Choose a backend at install: extractive with no model, a local LLM, or an opt-in remote model. The default is Llama 3.2 3B - roughly 2 GB and CPU-capable. A one-command install ships with a model picker. Docker, docker-compose (with a TLS profile), and Helm cover deployment. Nothing egresses unless you choose it.

Deployment

  • Extractive (no model) / local LLM / opt-in remote
  • Default: Llama 3.2 3B (~2 GB, CPU-capable)
  • One-command install with model picker
  • Docker · docker-compose (TLS) · Helm
  • Zero-egress by default

13 · Quality engineering

A green gate on every change.

222 · Automated tests
green · Accuracy / refusal eval gate
every PR · Gate must run green
tagged · Release workflow

CI runs on every change

lint · tests · eval gate · SBOM · dependency audit · secret scanning · container scan

Eval gate, green: 15/15 answerable, 7/7 correct refusals, 3/3 out-of-scope, 0 false refusals, 0 confident-wrong - on a 27-item set. The gate measures correctness and refusal; it asserts no headline accuracy percentage.
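
A gate of this kind can be expressed as a tally with two hard-zero conditions. The sketch below is an assumption-laden illustration: the result fields (`expected`, `got`, `correct`) and the rule that answering when a refusal was expected counts as confident-wrong are hypothetical, not CiviQ's eval harness.

```python
from collections import Counter

def gate(results: list[dict]) -> tuple[bool, Counter]:
    """Pass only if there are no false refusals and no confident-wrong answers."""
    tally: Counter = Counter()
    for r in results:
        if r["expected"] == "answer":
            if r["got"] == "refuse":
                tally["false_refusal"] += 1
            elif r["correct"]:
                tally["answered_correctly"] += 1
            else:
                tally["confident_wrong"] += 1
        else:  # a refusal was expected (unanswerable or out-of-scope)
            key = "correct_refusal" if r["got"] == "refuse" else "confident_wrong"
            tally[key] += 1
    green = tally["false_refusal"] == 0 and tally["confident_wrong"] == 0
    return green, tally

passing = [
    {"expected": "answer", "got": "answer", "correct": True},
    {"expected": "refuse", "got": "refuse", "correct": True},
]
failing = passing + [{"expected": "refuse", "got": "answer", "correct": False}]

green, tally = gate(passing)
red, red_tally = gate(failing)
```

The gate reports counts per outcome and a pass/fail flag, with no aggregate accuracy percentage.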

14 · Standards & compliance

Mapped, with sign-off honestly pending.

The build is mapped against recognised standards. That's readiness, not certification - external sign-off is still ahead, and we say so.

Cyber Essentials

Mapped · readiness

ISO 27001

Annex A themes mapped

WCAG 2.2 AA

Mapped · audit pending

UK GDPR

DPIA pack · DPIA signing pending

Readiness, external sign-off pending - stated plainly.

15 · The moat

What we don't disclose, and why.

This page describes what each layer does and what it guarantees. It deliberately stops short of how. The exact prompts, thresholds, confidence calibration, ranking recipe, and decision heuristics are withheld.

That isn't evasion; it's the contribution. The value of CiviQ is the system design and the discipline around it, the part that turns a small model into an accountable system. Naming the standard components (BM25, local embeddings, hash-chaining) earns credibility. Keeping the recipe is the moat.

Withheld by design

  • Exact prompt text and strategies
  • Confidence formulas and thresholds
  • Calibration and tuning values
  • The reranking recipe
  • Decision heuristics

The source is open. The judgement encoded in it is earned.