Reference

Accountability scoring methodology

Every accountability score on this site is computed by deterministic rules — no learned model. Below is a visual, interactive walk-through of those rules; the precise specification follows it.

The four dimensions

Four separate scores — never blended into one number

A government can be diligent but undelivering, or fund things well without fulfilling its mandates. Squashing four very different signals into one “score out of 10” hides exactly the information that matters.

Delivery

Are promises being kept?

Scored on: Government · party · body · officeholder

Government has 66 live Programme-for-Government commitments.
Coverage: 100% of those are backed by evidence.
Each commitment is weighted by significance (flagship × 3, standard × 1, minor × 0.5).

Right now: 33% of government commitments delivered (weighted)

Diligence

Do officeholders show up to vote?

Scored on: Every officeholder with in-term divisions

254 officeholders scored.
A vote is “participated in” if a ta / nil / staon vote was recorded.
Officeholders are never penalised for divisions outside their term.

Right now: 52% average participation across all officeholders

Mandate fulfilment

Are bodies fulfilling their statutory mandates?

Scored on: Every body with mandate-linked commitments

15 bodies scored.
Only counts commitments tied (via mandateId) to a statutory mandate.
Uses the same status × significance weighting as Delivery.

Right now: 31% average fulfilment across scored bodies

Fiscal stewardship

Does spend track allocation?

Scored on: Every body with at least one budget vote

19 bodies have budget votes.
Linear penalty: 0% off allocation → 1.0, 20% or more off → 0.0.
No outturn has been published yet, so coverage is 0 today: an honest “not yet verifiable”, not a zero verdict.

Right now: n/a actual-vs-allocated stewardship

Delivery dimension

How a delivery rate is built up from individual promises

Three ingredients: a status for each commitment, a significance weight, and a time-awareness rule that keeps promises-not-yet-due out of the failure column.

Step 1 · Status maps to a value 0–1

Status	Value	Meaning
delivered	1.0	Promise kept in full.
partially-delivered	0.5	Some but not all of the promise has shipped.
in-progress	0.3	Active work demonstrably under way.
stalled	0.1	Started, then went quiet.
broken	0.0	Reversed or contradicted.
abandoned	0.0	Dropped from the plan.
promised	excluded	Made but not yet due, so excluded from the delivery rate. Counts toward onTrackRate instead.

Step 2 · Significance sets the weight

Significance	Weight
flagship	× 3
standard	× 1
minor	× 0.5

A flagship promise moves the dial three times as much as a standard one, and missing a minor tweak hurts much less than missing a flagship.

Try it

Interactive delivery calculator

Adjust the statuses below to see how the delivery rate moves. The maths is weighted: value × significance, divided by the total significance weight.

Value	Weight	Contribution
0.3	× 3	0.90
0.5	× 3	1.50
1.0	× 1	1.00
0.1	× 0.5	0.05
—	—	excluded

4 counted commitments · 1 promised (excluded)

Calculated delivery rate

deliveryRate = 3.45 ÷ 7.50 = 0.460

Counted commitments are all non-promised entries. Promised ones (still in their grace period) drop into the separate on-track rate: 100%.

Diligence dimension

Did this officeholder turn up to vote?

Every recorded division during an officeholder's term either has a vote row (they participated) or doesn't (they were absent). Term scoping means nobody is punished for votes before they took office or after they left.

Worked example

Six divisions in one term

ta / nil / staon = participation (value 1) · no vote row = absent (value 0)

participationRate = 4 ÷ 6 = 0.67 (67%)

Divisions held before they took office or after they left never count. Out-of-term votes are simply not in the denominator.

Every officeholder page on the accountability ledger lists their full division-by-division record with each individual vote linked.

Mandate-fulfilment dimension

Is each body delivering on its statutory mandates?

Uses the same status × significance maths as Delivery, but only on commitments tied to a body's statutory mandates — a much narrower lens.

How the join works

Commitment → mandate → body

Only commitments whose mandateId resolves to a real mandate are included. The score belongs to the mandate's body — which can differ from the commitment's directly-responsible body.

Browse any body page on the accountability ledger to walk this join with real records.

Fiscal-stewardship dimension

Does spend track the budget it was allocated?

Once a fiscal year closes, every budget vote can be compared to its outturn. The further the spend strays from allocation — either way — the lower the stewardship value.

Try it

Variance → stewardship value

Drag the slider to see how a single vote scores against its allocation. The curve is linear from 1.0 (perfect) down to 0.0 at 20% off, in either direction.

Variance: 7%

Allocation: €500m

Direction: overspend

outturn = €535m

value = max(0, 1 − 0.07 ÷ 0.20) = 0.65

The 20% tolerance band is the single tunable parameter of this dimension. Either side of perfect — overspending or underspending — loses points equally.

Coverage

Every rate ships with its data completeness

A high rate built on thin data is a much weaker signal than the same rate built on complete data. The site never blends those together.

Why coverage is reported alongside every rate

Same headline number, very different confidence

Body A — high coverage

Rate (delivery): 78%

Coverage: 92%

78% delivery, 92% of commitments evidence-backed. Trustworthy.

Body B — low coverage

Rate (delivery): 78%

Coverage: 15%

78% delivery, but only 15% evidence-backed. The rate isn't wrong — it's just standing on very thin data.

Coverage is never folded into the rate — it sits beside it so a confident-looking number on thin data is always visible. A score with 0% coverage is rendered as not yet verifiable rather than as a zero verdict.

Snapshot as of 2026-05-23 · methodology version delivery-2.0.0

Full methodology reference (source markdown)

The source of this document is data/accountability/scoring-methodology.md. The interactive section above renders the same rules visually.

Accountability scoring methodology

Methodology version: `accountability-2.0.0` (dataset envelope). Per-dimension stamps: `delivery-2.0.0` for Delivery, `accountability-2.0.0` for the others.

This document describes the accountability scoring dimensions produced by scripts/compute-scores.mjs and written to data/accountability/scores.json.

Every score carries its own methodologyVersion. The Delivery dimension is versioned independently of the dataset envelope: a bump to Delivery does not silently restamp scores from other dimensions, and historical scores keep the methodology stamp that produced them.

There are four dimensions: Delivery, Diligence, Mandate-fulfilment and Fiscal-stewardship. Each is computed and reported on its own. They are never blended into a single composite number.

No black box, fully decomposable

Every scorer is a deterministic rules function, not a learned model. The same inputs always produce the same outputs, and the output order is stable. Every AccountabilityScore carries a breakdown[] array — one row per underlying record — so any rate can be expanded back into the exact records, weights and values that produced it. No record-level data is hidden inside an aggregate.

Every score also carries a `coverage` figure — data completeness — and it is never omitted. Coverage is reported independently of the headline rate so a high rate built on thin data is always visible.

Delivery dimension

Weighted commitment delivery for an entity. Methodology version `delivery-2.0.0`. The pure delivery-1.0.0 rate is preserved alongside the new outcome-aware rate; the gap between them is the absorbed-reform signal.

Status → delivery value

Each commitment's CommitmentStatus maps to a delivery value in [0,1]:

Status	Value
`delivered`	1.0
`partially-delivered`	0.5
`in-progress`	0.3
`stalled`	0.1
`broken`	0.0
`abandoned`	0.0
`promised`	excluded — see Time-awareness

Significance weighting

Each commitment is weighted by its significance field:

Significance	Weight
`flagship`	3
`standard`	1
`minor`	0.5

If significance is absent, the commitment defaults to `standard` (weight 1).

deliveryRate is the weight-weighted mean of the delivery values of all counted (due) commitments:

deliveryRate = Σ(value · weight) / Σ(weight)   over counted commitments
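The formula above can be sketched in TypeScript as follows. This is a minimal illustration, not the code in scripts/compute-scores.mjs: the `Commitment` shape, the constant names and the `null` result for an empty counted set are assumptions made for the example, while the status values and weights come from the tables above. The sample rows are the five rows of the interactive calculator.

```typescript
// Hypothetical record shape for illustration; real commitment files may differ.
type CommitmentStatus =
  | "delivered" | "partially-delivered" | "in-progress"
  | "stalled" | "broken" | "abandoned" | "promised";
type Significance = "flagship" | "standard" | "minor";

interface Commitment {
  status: CommitmentStatus;
  significance?: Significance; // absent is treated as "standard"
}

// Status → delivery value, from the table above ("promised" has no value).
const STATUS_VALUE: Record<Exclude<CommitmentStatus, "promised">, number> = {
  "delivered": 1.0,
  "partially-delivered": 0.5,
  "in-progress": 0.3,
  "stalled": 0.1,
  "broken": 0.0,
  "abandoned": 0.0,
};

const SIGNIFICANCE_WEIGHT: Record<Significance, number> = {
  flagship: 3,
  standard: 1,
  minor: 0.5,
};

// Weighted mean over counted (non-promised) commitments.
// Returning null when nothing is counted is an assumption of this sketch.
function deliveryRate(commitments: Commitment[]): number | null {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const c of commitments) {
    const status = c.status;
    if (status === "promised") continue; // excluded from the rate
    const weight = SIGNIFICANCE_WEIGHT[c.significance ?? "standard"];
    weightedSum += STATUS_VALUE[status] * weight;
    totalWeight += weight;
  }
  return totalWeight === 0 ? null : weightedSum / totalWeight;
}

// The five calculator rows: 3.45 / 7.5, approximately 0.46.
const calculatorRows: Commitment[] = [
  { status: "in-progress", significance: "flagship" },
  { status: "partially-delivered", significance: "flagship" },
  { status: "delivered", significance: "standard" },
  { status: "stalled", significance: "minor" },
  { status: "promised" },
];
const calculatorRate = deliveryRate(calculatorRows);
```

Because the promised row is skipped before any weight is added, it affects neither the numerator nor the denominator.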

Time-awareness

A promised commitment that is not yet due is not a failure. It is excluded from deliveryRate (its ScoreContribution has counted: false) and counted instead toward onTrackRate:

onTrackRate = (promised commitments still on track) / (all promised commitments)

A promised commitment is "on track" unless its expectedDeliveryBy date has already passed relative to the score's asOf date. With no expectedDeliveryBy it is treated as on track.
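A small sketch of the on-track rule, under stated assumptions: dates are ISO `YYYY-MM-DD` strings, a commitment due exactly on the asOf date has not yet passed, and the `PromisedCommitment` shape and function names are hypothetical.

```typescript
// Hypothetical shape; ISO "YYYY-MM-DD" date strings are an assumption.
interface PromisedCommitment {
  id: string;
  expectedDeliveryBy?: string;
}

// On track unless expectedDeliveryBy is already before asOf.
// Same-format ISO dates compare correctly as plain strings.
function isOnTrack(c: PromisedCommitment, asOf: string): boolean {
  if (c.expectedDeliveryBy === undefined) return true; // no date: on track
  return c.expectedDeliveryBy >= asOf;
}

// Share of promised commitments still on track; null if there are none
// (the null result is an assumption of this sketch).
function onTrackRate(promised: PromisedCommitment[], asOf: string): number | null {
  if (promised.length === 0) return null;
  const onTrack = promised.filter((c) => isOnTrack(c, asOf)).length;
  return onTrack / promised.length;
}
```

With three promised commitments, one undated, one due in 2027 and one that was due in 2025, a score with asOf 2026-05-23 would report two of three on track.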

Delivery coverage

coverage = (commitments backed by ≥1 high/medium-confidence evidence source)
           / (all of the entity's commitments)

Evidence sources with confidence of low (or absent) do not count.
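The coverage rule could be expressed roughly as below. The `EvidenceSource` and `EvidencedCommitment` shapes and their field names are assumptions for illustration, as is returning 0 for an entity with no commitments.

```typescript
type Confidence = "high" | "medium" | "low";

// Hypothetical shapes; the real record fields may be named differently.
interface EvidenceSource {
  url: string;
  confidence?: Confidence; // absent counts the same as "low"
}

interface EvidencedCommitment {
  id: string;
  sources: EvidenceSource[];
}

// Backed means at least one high- or medium-confidence source.
function isEvidenceBacked(c: EvidencedCommitment): boolean {
  return c.sources.some((s) => s.confidence === "high" || s.confidence === "medium");
}

// Share of all commitments (promised ones included) that are evidence-backed.
function deliveryCoverage(commitments: EvidencedCommitment[]): number {
  if (commitments.length === 0) return 0; // assumption: empty scope gives 0
  const backed = commitments.filter(isEvidenceBacked).length;
  return backed / commitments.length;
}
```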

Derived fields: numeric target and deadline

hasNumericTarget and hasDeadline are used for downstream analysis. When a commitment file does not set them they are derived in-memory from the commitment text (the committed JSON is not modified):

`hasNumericTarget` — true when the title/description contains a number followed by a unit keyword (homes, units, beds, jobs, MW, staff, …), a percentage, a monetary figure (€/$/£), or a bare 3+ digit number.

`hasDeadline` — true when expectedDeliveryBy is set, or the text mentions a target year/quarter (by 2030, by the end of 2027, Q3, mid-2026) or a relative window (within 5 years).

These are simple, transparent heuristics; they do not feed the delivery rate.
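For a sense of how such heuristics might look, here is a rough regular-expression sketch. The patterns are assumptions written for this example, not the ones used by the scorer, and the unit list contains only the keywords named above because the full list is elided in the source.

```typescript
// Only the unit keywords named above; the source list continues ("…").
const UNIT_KEYWORDS = ["homes", "units", "beds", "jobs", "MW", "staff"];

const NUMERIC_TARGET_PATTERNS: RegExp[] = [
  // a number followed by a unit keyword, e.g. "300 homes"
  new RegExp("\\d[\\d,.]*\\s*(?:" + UNIT_KEYWORDS.join("|") + ")\\b", "i"),
  /\d\s*%/,      // a percentage, e.g. "25%"
  /[€$£]\s*\d/,  // a monetary figure, e.g. "€5m"
  /\b\d{3,}\b/,  // a bare number of three or more digits
];

const DEADLINE_PATTERNS: RegExp[] = [
  /\bby\s+(?:the\s+end\s+of\s+)?\d{4}\b/i, // "by 2030", "by the end of 2027"
  /\bQ[1-4]\b/,                            // "Q3"
  /\bmid-\d{4}\b/i,                        // "mid-2026"
  /\bwithin\s+\d+\s+years?\b/i,            // "within 5 years"
];

function hasNumericTarget(text: string): boolean {
  return NUMERIC_TARGET_PATTERNS.some((p) => p.test(text));
}

// An explicit expectedDeliveryBy always counts as a deadline.
function hasDeadline(text: string, expectedDeliveryBy?: string): boolean {
  if (expectedDeliveryBy !== undefined) return true;
  return DEADLINE_PATTERNS.some((p) => p.test(text));
}
```

One consequence of the bare-number rule as described is that a four-digit year such as 2030 also counts as a numeric target.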

Delivery entities scored

One AccountabilityScore (dimension: "delivery") is emitted per:

`government` — all Programme for Government commitments, collectively.
`body` — grouped by each PfG commitment's responsibleBodyId.
`officeholder` — grouped by each PfG commitment's responsibleOfficeholderId.
`party` — grouped by each GE2024 manifesto commitment's partyId.

Delivery 2.0.0 — outcome-aware adjustment

The Delivery dimension exposes two rates side by side:

deliveryRate — pure delivery-1.0.0 value (no outcome adjustment). Preserved exactly so existing readers stay stable and historical comparison is apples-to-apples.

outcomeAdjustedRate — delivery-2.0.0 value with the rule below applied.

The gap between the two rates is the absorbed-reform signal. If a government marks many commitments as delivered while their substantive outcomeStatus is outcome-unchanged, the two rates diverge: deliveryRate rewards the paperwork, outcomeAdjustedRate discounts it. This is the "delivered vs delivered-and-worked" distinction from [docs/power-and-blockers.md](../../docs/power-and-blockers.md) section "Absorbed reform — outcome status separate from delivery status".

When the adjustment fires

Only on rows whose CommitmentStatus is delivered or partially-delivered. Every other status is untouched (the adjustment is a no-op for promised, in-progress, stalled, broken, abandoned).

Status × outcomeStatus → value table

Delivery status	outcomeStatus	Value applied to row	Note on row
`delivered`	`undefined` / `not-applicable`	1.0 (unchanged)	none
`delivered`	`outcome-improved`	1.0 (unchanged)	none
`delivered`	`outcome-unchanged`	0.5 (half value)	"outcome-unchanged: value halved per delivery-2.0.0 …"
`delivered`	`outcome-worsened`	0.0 (zero)	"outcome-worsened: value zeroed per delivery-2.0.0 …"
`delivered`	`contested`	0.5 (half value)	"contested: value halved per delivery-2.0.0 …"
`partially-delivered`	`undefined` / `not-applicable`	0.5 (unchanged)	none
`partially-delivered`	`outcome-improved`	0.5 (unchanged)	none
`partially-delivered`	`outcome-unchanged`	0.25 (half of 0.5)	"outcome-unchanged: value halved per delivery-2.0.0 …"
`partially-delivered`	`outcome-worsened`	0.0	"outcome-worsened: value zeroed per delivery-2.0.0 …"
`partially-delivered`	`contested`	0.25	"contested: value halved per delivery-2.0.0 …"
every other status	(any)	unchanged (`delivery-1.0.0` mapping)	none
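The table reduces to a small rule: halve on outcome-unchanged or contested, zero on outcome-worsened, otherwise leave the value alone. A sketch under the assumption that the caller passes in the row's pure delivery-1.0.0 value (the function and type names here are hypothetical):

```typescript
type DeliveryStatus =
  | "delivered" | "partially-delivered" | "in-progress"
  | "stalled" | "broken" | "abandoned" | "promised";
type OutcomeStatus =
  | "outcome-improved" | "outcome-unchanged" | "outcome-worsened"
  | "contested" | "not-applicable";

// pureValue is the delivery-1.0.0 value for the status (e.g. 1.0, 0.5, 0.3).
function outcomeAdjustedValue(
  status: DeliveryStatus,
  pureValue: number,
  outcomeStatus?: OutcomeStatus
): number {
  // The adjustment only fires for delivered / partially-delivered rows.
  if (status !== "delivered" && status !== "partially-delivered") {
    return pureValue;
  }
  switch (outcomeStatus) {
    case "outcome-unchanged":
    case "contested":
      return pureValue / 2; // halved
    case "outcome-worsened":
      return 0; // zeroed
    default:
      return pureValue; // undefined, not-applicable, outcome-improved
  }
}
```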

Why `undefined` is treated as "no adjustment", not as `outcome-unchanged`

Absence of outcome data is not absence of outcome. We do not assume the world stood still simply because nobody has yet sourced the outcome metric. Penalising unfilled outcome fields would incentivise leaving them blank, and would conflate "we have not measured" with "we measured no change".

Instead, the absence is surfaced honestly through coverage (the share of commitments backed by ≥1 high/medium-confidence source). The Delivery dimension keeps the same coverage definition; a separate outcome-coverage signal can be layered in additively later without changing the rates above.

not-applicable is treated the same as undefined for the same reason: an administrative commitment with no measurable outcome should not be punished for the lack of one.

Relationship between `deliveryRate` and `outcomeAdjustedRate`

deliveryRate         = Σ(pureValue   · weight) / Σ(weight)   over counted commitments
outcomeAdjustedRate  = Σ(adjustedVal · weight) / Σ(weight)   over counted commitments

Both rates use the same counted set (time-aware promised exclusion is identical) and the same significance weights. The only difference is the per-row value: outcomeAdjustedRate uses the table above, deliveryRate uses the pure delivery-1.0.0 mapping.

When no commitment in scope carries an actionable outcomeStatus (everything is undefined or not-applicable), the two rates are identical by construction. The new field activates as commitments are tagged in future PRs.

Decomposability

Every breakdown row reflects the outcome-adjusted (delivery-2.0.0) value, so outcomeAdjustedRate decomposes back to the exact records that produced it. When the adjustment fires, the row records:

outcomeStatusApplied — the outcomeStatus value the scorer read.

note — a short human-readable explanation of what changed and why (e.g. "outcome-unchanged: value halved per delivery-2.0.0 (absorbed-reform haircut)").

If a row has no outcomeStatusApplied, no adjustment was considered (the commitment had no outcomeStatus). If a row has outcomeStatusApplied but no adjustment note, the outcome status existed but did not change the value (e.g. outcome-improved or not-applicable).

The pure deliveryRate field is computed in a parallel pass over the same source data and is not decomposed in breakdown[]; the breakdown is the audit trail for the new headline (outcomeAdjustedRate). To reproduce deliveryRate from the breakdown, replace each row's adjusted value with the unadjusted mapping for its status (e.g. delivered → 1.0 regardless of outcomeStatusApplied) and re-weight.
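The reproduction step described above might look like this. The `counted`, `outcomeStatusApplied` and `note` fields are named in this document; `status`, `weight` and `value` are assumed field names for the sketch, and the helper names are hypothetical.

```typescript
// Hypothetical breakdown row; see the lead-in for which names are assumed.
interface BreakdownRow {
  status: string;
  weight: number;
  value: number; // outcome-adjusted (delivery-2.0.0) value
  counted: boolean;
  outcomeStatusApplied?: string;
  note?: string;
}

// Unadjusted delivery-1.0.0 mapping for counted statuses.
const PURE_VALUE: Record<string, number> = {
  "delivered": 1.0,
  "partially-delivered": 0.5,
  "in-progress": 0.3,
  "stalled": 0.1,
  "broken": 0.0,
  "abandoned": 0.0,
};

function weightedMean(
  rows: BreakdownRow[],
  valueOf: (r: BreakdownRow) => number
): number | null {
  let sum = 0;
  let total = 0;
  for (const r of rows) {
    if (!r.counted) continue; // promised rows stay out of both rates
    sum += valueOf(r) * r.weight;
    total += r.weight;
  }
  return total === 0 ? null : sum / total;
}

// outcomeAdjustedRate: use each row's recorded value as-is.
function adjustedRateFrom(rows: BreakdownRow[]): number | null {
  return weightedMean(rows, (r) => r.value);
}

// deliveryRate: swap in the unadjusted value for the row's status.
function pureRateFrom(rows: BreakdownRow[]): number | null {
  return weightedMean(rows, (r) => {
    const pure = PURE_VALUE[r.status];
    return pure === undefined ? r.value : pure; // unknown status: keep row value
  });
}
```

For a flagship commitment that is delivered but outcome-unchanged, plus one standard in-progress commitment, the adjusted rate is (0.5 × 3 + 0.3) ÷ 4 and the pure rate is (1.0 × 3 + 0.3) ÷ 4; the difference between them is the absorbed-reform gap.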

Diligence dimension

Per officeholder parliamentary participation. Measures how reliably an officeholder turns up to recorded votes (divisions).

Inputs

divisions.json (every recorded division) joined to member-votes.json (one row per officeholder per division they were present for). Absence is represented by the lack of a member-vote row, not by an explicit record.

Term scoping

A division counts toward an officeholder only if its date falls within one of the officeholder's terms (from/to window, an open to meaning still in office). An officeholder is never penalised for divisions held before they took office or after they left.

Participation value

For each in-term division, the officeholder's vote is one of ta, nil, staon or absent. A vote of ta/nil/staon is participation (value 1); absent (no member-vote row) is value 0.

participationRate = (divisions voted in: ta/nil/staon)
                     / (divisions held during the officeholder's term)

Every in-term division is counted; the score decomposes by division in breakdown[] (a DivisionContribution per division).
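Term scoping and the participation rate can be sketched together. The record shapes below are assumptions (the real divisions.json and member-votes.json fields may differ), dates are taken to be ISO `YYYY-MM-DD` strings, and treating term bounds as inclusive is a choice made for this example.

```typescript
// Hypothetical shapes; ISO "YYYY-MM-DD" date strings are an assumption.
interface Term { from: string; to?: string } // open `to` = still in office
interface Division { id: string; date: string }
interface MemberVote {
  divisionId: string;
  officeholderId: string;
  vote: "ta" | "nil" | "staon";
}

// Inclusive term bounds are an assumption of this sketch.
function isInTerm(date: string, terms: Term[]): boolean {
  return terms.some((t) => date >= t.from && (t.to === undefined || date <= t.to));
}

// Returns null when no division falls in term (such officeholders are not scored).
function participationRate(
  officeholderId: string,
  terms: Term[],
  divisions: Division[],
  memberVotes: MemberVote[]
): number | null {
  // Absence is the lack of a member-vote row, so only present votes are indexed.
  const votedIn: Record<string, boolean> = {};
  for (const mv of memberVotes) {
    if (mv.officeholderId === officeholderId) votedIn[mv.divisionId] = true;
  }
  const inTerm = divisions.filter((d) => isInTerm(d.date, terms));
  if (inTerm.length === 0) return null;
  const votedCount = inTerm.filter((d) => votedIn[d.id] === true).length;
  return votedCount / inTerm.length;
}
```

Out-of-term divisions are filtered out before the denominator is formed, which is what keeps them from counting against the officeholder.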

Diligence coverage

coverage = (term divisions with a member-vote row of any kind)
           / (all term divisions)

With the current data every member-vote row is a present vote, so coverage equals participationRate; the field is kept distinct so that if upstream ever records explicit absent rows, coverage and participation diverge correctly.

Diligence entities scored

One score (dimension: "diligence") per officeholder who had at least one in-term division. Officeholders whose terms cover no division are not scored.

Mandate-fulfilment dimension

Per body: how well the commitments linked to that body's statutory mandates are being delivered.

Inputs and join

commitments.json → mandateId → mandates.json → bodyId. Only PfG commitments that carry a non-null mandateId resolving to a known mandate are included. The body of the score is the mandate's bodyId, which may differ from the commitment's responsibleBodyId.

Fulfilment value

fulfilmentRate reuses the Delivery status→value mapping and significance weights exactly (see Delivery above), applied only to the body's mandate-linked commitments:

fulfilmentRate = Σ(value · weight) / Σ(weight)   over counted mandate-linked commitments

promised commitments are excluded from the rate (counted: false), consistent with the Delivery dimension.
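The join described under "Inputs and join" can be illustrated as a grouping step. The `Mandate` and `PfgCommitment` shapes are simplified assumptions, and the function name is hypothetical; the point is that commitments are grouped under the mandate's bodyId, not their own responsibleBodyId.

```typescript
// Hypothetical, simplified record shapes.
interface Mandate { id: string; bodyId: string }
interface PfgCommitment {
  id: string;
  mandateId?: string | null;
  responsibleBodyId?: string;
}

// Group mandate-linked commitments by the body that owns the mandate.
function groupByMandateBody(
  commitments: PfgCommitment[],
  mandates: Mandate[]
): Record<string, PfgCommitment[]> {
  const bodyOfMandate: Record<string, string> = {};
  for (const m of mandates) bodyOfMandate[m.id] = m.bodyId;

  const byBody: Record<string, PfgCommitment[]> = {};
  for (const c of commitments) {
    if (c.mandateId === undefined || c.mandateId === null) continue; // not mandate-linked
    const bodyId = bodyOfMandate[c.mandateId];
    if (bodyId === undefined) continue; // mandateId does not resolve to a known mandate
    const group = byBody[bodyId] || [];
    group.push(c);
    byBody[bodyId] = group;
  }
  return byBody;
}
```

Each group would then be scored with the same status and significance weighting as Delivery.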

Mandate-fulfilment coverage

coverage = (mandate-linked commitments backed by ≥1 high/medium-confidence source)
           / (all of the body's mandate-linked commitments)

mandateCount reports how many distinct mandates of the body have at least one linked commitment behind the score.

Mandate-fulfilment entities scored

One score (dimension: "mandate-fulfilment") per body that has at least one mandate-linked commitment.

Fiscal-stewardship dimension

Per body: how closely actual spend tracks the budget that was allocated.

Inputs

spending.json budget votes ({ votes, programmes, subheads }). Each BudgetVote carries a grossAllocation (always present) and, once a fiscal year closes, an outturn (actual spend — usually absent). Scores decompose by budget vote (BudgetContribution per vote).

Variance → stewardship value

For a vote with a published outturn:

variance = |grossAllocation − outturn| / grossAllocation
value    = max(0, 1 − variance / 0.20)

A vote spent exactly to allocation scores 1.0; a vote 20% or more off allocation (over or under) scores 0.0; in between the value falls linearly. The 20% tolerance band is the single tunable parameter of this dimension.

stewardshipRate is the allocation-weighted mean of the values of votes that have an outturn, so larger votes dominate the body's score:

stewardshipRate = Σ(value · allocation) / Σ(allocation)   over votes with an outturn
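As a worked sketch of the two formulas above: the `BudgetVote` shape is simplified, the function names are hypothetical, and a positive grossAllocation is assumed (the sketch does not guard against division by zero). The slider example from the walk-through, €500m allocated and €535m spent, gives a value of about 0.65.

```typescript
// Hypothetical, simplified vote shape; grossAllocation is assumed positive.
interface BudgetVote {
  id: string;
  grossAllocation: number;
  outturn?: number | null;
}

const TOLERANCE = 0.20; // the single tunable parameter of this dimension

// Linear penalty: 1.0 at zero variance, 0.0 at 20% or more off, either way.
function stewardshipValue(grossAllocation: number, outturn: number): number {
  const variance = Math.abs(grossAllocation - outturn) / grossAllocation;
  return Math.max(0, 1 - variance / TOLERANCE);
}

// Allocation-weighted mean over votes that have an outturn.
// Returns 0 when no vote has an outturn, matching the stewardshipRate: 0
// that this document reports for the current data.
function stewardshipRate(votes: BudgetVote[]): number {
  let weightedSum = 0;
  let totalAllocation = 0;
  for (const v of votes) {
    const outturn = v.outturn;
    if (outturn === undefined || outturn === null) continue; // not counted
    weightedSum += stewardshipValue(v.grossAllocation, outturn) * v.grossAllocation;
    totalAllocation += v.grossAllocation;
  }
  return totalAllocation === 0 ? 0 : weightedSum / totalAllocation;
}
```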

Honest handling of missing outturn

Most votes have no `outturn` yet. Outturn is never fabricated. A vote with no published outturn is counted: false, contributes value: 0, carries variance: null / outturn: null, and is excluded from stewardshipRate. Instead it lowers coverage, so a body whose budget is mostly unverifiable shows a low coverage rather than a misleadingly confident rate.

Fiscal-stewardship coverage

Coverage is the allocation share of the body's budget that is backed by a published outturn:

coverage = Σ(allocation of votes with an outturn) / Σ(allocation of all votes)

outturnVoteCount reports how many of the body's votes have an outturn. With the current data no outturn is published, so every fiscal-stewardship score has coverage: 0, stewardshipRate: 0 and outturnVoteCount: 0 — an honest "not yet verifiable" signal, not a zero-performance verdict.
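The coverage rule is an allocation share, which the following sketch makes concrete. The `BudgetVote` shape and the function names are assumptions for illustration.

```typescript
// Hypothetical, simplified vote shape.
interface BudgetVote {
  id: string;
  grossAllocation: number;
  outturn?: number | null;
}

function hasOutturn(v: BudgetVote): boolean {
  return v.outturn !== undefined && v.outturn !== null;
}

// Allocation share backed by a published outturn, plus the count of such votes.
function fiscalCoverage(
  votes: BudgetVote[]
): { coverage: number; outturnVoteCount: number } {
  let covered = 0;
  let total = 0;
  let outturnVoteCount = 0;
  for (const v of votes) {
    total += v.grossAllocation;
    if (hasOutturn(v)) {
      covered += v.grossAllocation;
      outturnVoteCount += 1;
    }
  }
  return { coverage: total === 0 ? 0 : covered / total, outturnVoteCount };
}
```

A body with no published outturn comes out at coverage 0 and outturnVoteCount 0, which is the "not yet verifiable" case rather than a performance verdict.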

Fiscal-stewardship entities scored

One score (dimension: "fiscal-stewardship") per body that has at least one budget vote in spending.json.

Score shape

AccountabilityScore is a discriminated union on dimension. Every member shares a common envelope (entityType, entityId, coverage, methodologyVersion, asOf); each dimension then adds its own metric fields and its own typed breakdown[]:

delivery — deliveryRate, onTrackRate, commitmentCount, ScoreContribution[]
diligence — participationRate, divisionCount, votedCount, DivisionContribution[]
mandate-fulfilment — fulfilmentRate, mandateCount, commitmentCount, ScoreContribution[]
fiscal-stewardship — stewardshipRate, voteCount, outturnVoteCount, BudgetContribution[]

The delivery member is byte-compatible with methodology version delivery-1.0.0, so existing Delivery scores and their consumers are unaffected.
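One way to picture the union is the type sketch below. The field names follow the list above; the field types (number, string) are assumptions, and breakdown rows are left as `unknown` because their fields are not specified in this section. The `rateOf` helper is hypothetical and only shows how narrowing on dimension works.

```typescript
// Common envelope shared by every score (types are assumed).
interface ScoreEnvelope {
  entityType: string;
  entityId: string;
  coverage: number;
  methodologyVersion: string;
  asOf: string;
}

interface DeliveryScore extends ScoreEnvelope {
  dimension: "delivery";
  deliveryRate: number;
  onTrackRate: number;
  commitmentCount: number;
  breakdown: unknown[]; // ScoreContribution[]
}

interface DiligenceScore extends ScoreEnvelope {
  dimension: "diligence";
  participationRate: number;
  divisionCount: number;
  votedCount: number;
  breakdown: unknown[]; // DivisionContribution[]
}

interface MandateFulfilmentScore extends ScoreEnvelope {
  dimension: "mandate-fulfilment";
  fulfilmentRate: number;
  mandateCount: number;
  commitmentCount: number;
  breakdown: unknown[]; // ScoreContribution[]
}

interface FiscalStewardshipScore extends ScoreEnvelope {
  dimension: "fiscal-stewardship";
  stewardshipRate: number;
  voteCount: number;
  outturnVoteCount: number;
  breakdown: unknown[]; // BudgetContribution[]
}

type AccountabilityScore =
  | DeliveryScore
  | DiligenceScore
  | MandateFulfilmentScore
  | FiscalStewardshipScore;

// Switching on `dimension` narrows the union to the matching member.
function rateOf(score: AccountabilityScore): number {
  switch (score.dimension) {
    case "delivery": return score.deliveryRate;
    case "diligence": return score.participationRate;
    case "mandate-fulfilment": return score.fulfilmentRate;
    case "fiscal-stewardship": return score.stewardshipRate;
  }
}
```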

Versioning

The dataset envelope is stamped accountability-2.0.0, and every score carries its own methodologyVersion (delivery-2.0.0 for Delivery, accountability-2.0.0 for the other dimensions). Bump the relevant version whenever a dimension's status/variance/participation mapping, weights, scoping rule or coverage definition changes, so historical scores remain interpretable against the rules that produced them.

Summaries layer

Methodology version: summaries-1.0.0 (independent of the scoring methodology above). Lives in data/accountability/summaries.json, regenerated by pnpm data:summaries. Read at build time and embedded into every prerendered entity page.

What a summary is

A small set of plain-language bullet points (3 to 7) that describe one entity plus three structured impact analyses:

Direct impact: one-hop relationships derived from the entity's own foreign keys (commitment.responsibleBodyId, officeholder terms[].bodyId, edges on the typed-edge layer, etc.).

Indirect impact: two-hop traversal from each direct target, deduped and ranked, capped at ten entries to keep the panel readable.

Leverage points: cross-references into the systems layer, surfacing the Meadows leverage level for every system step the entity participates in.

How bullets are produced

Two backends, depending on entity kind:

Officeholders (1202 records): template-based. Bullets are assembled deterministically from structured fields (current role, party, level, constituency, civil-service grade, term count, Diligence score, owned commitments). No model is involved.

Bills, parties, bodies, mandates, commitments, divisions: generated from the source records by a language model run locally during the build (a free-tier OpenRouter model, default google/gemma-4-31b-it:free). This pass runs once on a contributor's machine; the resulting JSON is committed and Vercel deployments read the static file. There is no per-visit model call.

Sourcing rule (anti-invention)

Every bullet must cite at least one Source whose URL appears verbatim on the entity's underlying records. The generator validates this on the way out and the data validator (pnpm data:validate) rechecks it on the way in. A bullet that cites a URL outside the entity's record set fails validation and the dataset is rejected. There is no path by which a fabricated citation can land in production.
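The check can be pictured as a verbatim set-membership test. This is a sketch only: the `CitedSource` and `SummaryBullet` shapes and both function names are hypothetical, and the actual checks in the generator and in pnpm data:validate may be structured differently.

```typescript
// Hypothetical shapes for illustration.
interface CitedSource { url: string }
interface SummaryBullet { text: string; sources: CitedSource[] }

// Collect every URL that appears on the entity's underlying records.
function allowedUrls(recordSources: CitedSource[]): Record<string, boolean> {
  const allowed: Record<string, boolean> = {};
  for (const s of recordSources) allowed[s.url] = true;
  return allowed;
}

// A bullet passes only if it cites at least one source and every cited URL
// matches a record URL verbatim (exact string equality, no normalisation).
function isBulletValid(
  bullet: SummaryBullet,
  allowed: Record<string, boolean>
): boolean {
  if (bullet.sources.length === 0) return false;
  return bullet.sources.every((s) => allowed[s.url] === true);
}
```

Because the comparison is exact, even a trailing slash added to an otherwise correct URL would fail the check in this sketch.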

Caching

The generator hashes each entity's record plus its direct-impact set. On a re-run, an entity whose hash is unchanged and whose previous summary is at the current methodology version is reused verbatim from the existing summaries.json. Changes to the records trigger a regeneration.

Source of truth

Summaries are derivative. The underlying records are the ground truth. A summary may be incomplete or out of date relative to the records it cites; in all cases, follow the [source] links on each bullet to the primary documents. If a summary contradicts the records, the records win.
