Methodology — Parliament

This page documents every formula, source, and editorial choice on the /parliament surface. Read it before you cite us. If you find an error, mail felix.sargent@gmail.com — we’ll publish a correction with date, diff, and impact.

What FPTP distortion is

Under First Past the Post, the candidate with the largest pile of votes in a constituency wins the seat — no majority required. Aggregated across hundreds of constituencies, that rule routinely converts a party’s national vote share into a seat share that looks nothing like it. The same mechanism is the subject of the council audit at /councils/methodology; differences between the two surfaces are domain-specific, not methodological. The voting method is the subject of analysis on every page here — never any individual candidate.

Parliamentary scope & coverage

The parliamentary surface currently covers the 2024 UK general election. Coverage will expand as we ingest earlier elections; each year ships as an independent data artefact under src/lib/data/parliament/<year>/ so older years can be added without touching the published numbers for newer ones (see “Reproducibility” below).

Every contest is a UK Parliament constituency election. The parliamentary audit is structurally distinct from the council audit: constituencies are single-member, the electorate is national, and boundary reviews redraw the map every few elections. Comparisons across boundary sets are flagged with the boundary-comparability-limited caveat (see §Data caveats).

Winning vote share

For a single-member constituency contest, the winning vote share is:

winning share = winning candidate’s votes ÷ total valid votes cast in the contest

Winning shares below 50% indicate that the elected candidate took less than a majority of the votes that were cast. Under FPTP this is unremarkable mechanically — the rules only require a plurality — but it’s the headline number on every audit page because it’s the most direct measure of how often the system seats a candidate whom most voters did not back.
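The formula above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the ETL's actual code: the CandidateResult shape and the winningShare name are hypothetical, and the sketch assumes that total valid votes equals the sum of the candidates' votes.

```typescript
// Hypothetical shape: one entry per candidate in a single-member contest.
interface CandidateResult {
  candidate: string;
  votes: number; // valid votes cast for this candidate
}

// winning share = winning candidate's votes ÷ total valid votes cast.
// Assumption: total valid votes is the sum of all candidates' votes.
function winningShare(candidates: CandidateResult[]): number {
  const totalValidVotes = candidates.reduce((sum, c) => sum + c.votes, 0);
  if (totalValidVotes === 0) {
    throw new Error("contest has no valid votes");
  }
  const winnerVotes = Math.max(...candidates.map((c) => c.votes));
  return winnerVotes / totalValidVotes;
}

// Invented three-way contest: the winner has 18,000 of 45,000 valid votes.
const sampleContest: CandidateResult[] = [
  { candidate: "Candidate A", votes: 18000 },
  { candidate: "Candidate B", votes: 15000 },
  { candidate: "Candidate C", votes: 12000 },
];
const sampleShare = winningShare(sampleContest);
console.log(sampleShare < 0.5); // true: a plurality, not a majority
```

In this invented contest the winner takes the seat on 40% of the vote, which is exactly the situation the headline number is meant to surface.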

Gallagher disproportionality index

The Gallagher index (sometimes called the least-squares index) is the academic standard for measuring how far an election’s seat shares diverged from its vote shares. It is computed nationally over every party that contested the election:

Gallagher = √( ½ × Σᵢ (vᵢ − sᵢ)² ) × 100

where vᵢ is party i’s share of valid national votes (as a fraction) and sᵢ is its share of total seats. The result scales 0–100; 0 is perfectly proportional, higher is more distorted. Numbers above 15 are unusual in established democracies.

Worked example

A toy three-party election awards 10 seats. Party A takes 50% of votes and 8 seats (80%); Party B takes 30% of votes and 2 seats (20%); Party C takes 20% of votes and 0 seats (0%). The squared differences are (0.50 − 0.80)² = 0.09, (0.30 − 0.20)² = 0.01, and (0.20 − 0.00)² = 0.04. Sum = 0.14; half of that = 0.07; √0.07 ≈ 0.265. Multiplied by 100, Gallagher ≈ 26.5. The 2024 UK general election scored 23.7 by this measure, close to the toy example and one of the highest values on record for a UK general election.
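The toy calculation can be checked with a short TypeScript sketch. The PartyTotals shape and the gallagherIndex name are illustrative assumptions, and the sketch does not model the Speaker handling described under the caveat tokens.

```typescript
// Hypothetical input shape: national vote and seat totals per party.
interface PartyTotals {
  party: string;
  votes: number;
  seats: number;
}

// Gallagher = sqrt( 1/2 * sum over parties of (v_i - s_i)^2 ) * 100,
// where v_i and s_i are vote and seat shares expressed as fractions.
function gallagherIndex(parties: PartyTotals[]): number {
  const totalVotes = parties.reduce((sum, p) => sum + p.votes, 0);
  const totalSeats = parties.reduce((sum, p) => sum + p.seats, 0);
  if (totalVotes === 0 || totalSeats === 0) {
    throw new Error("need at least one vote and one seat");
  }
  const sumOfSquares = parties.reduce((sum, p) => {
    const diff = p.votes / totalVotes - p.seats / totalSeats;
    return sum + diff * diff;
  }, 0);
  return Math.sqrt(sumOfSquares / 2) * 100;
}

// The toy election from the worked example (votes given as percentages).
const toyElection: PartyTotals[] = [
  { party: "A", votes: 50, seats: 8 },
  { party: "B", votes: 30, seats: 2 },
  { party: "C", votes: 20, seats: 0 },
];
const toyGallagher = gallagherIndex(toyElection);
console.log(toyGallagher.toFixed(1)); // "26.5"
```

Because the function normalises by the totals it is given, raw vote counts and percentages produce the same index.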

Minority-winner count

The minority-winner count is the number of constituencies in which the elected candidate’s winning share was strictly below 50%. Single-member contests only; multi-member historical contests are excluded.

The denominator is total seats audited. The count is precomputed by the ETL so the same value appears on every page that cites it (overview, year audit, methodology). For the 2024 general election the count was 554 of 649 seats.
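As a rough sketch of the counting rule, assuming a hypothetical AuditedContest shape with a precomputed winning share and the caveats[] field described under “Data caveats & tokens”:

```typescript
// Hypothetical row shape; the real ETL output may differ.
interface AuditedContest {
  constituency: string;
  winningShare: number; // fraction between 0 and 1
  caveats: string[]; // possibly empty, never null
}

// Tokens that exclude a contest from the minority-winner count.
const EXCLUDED_FROM_MINORITY_COUNT = ["uncontested", "multi-member-historical"];

function minorityWinnerCount(contests: AuditedContest[]): number {
  return contests.filter((contest) => {
    const excluded = contest.caveats.some(
      (token) => EXCLUDED_FROM_MINORITY_COUNT.indexOf(token) !== -1,
    );
    // Strictly below 50%: a winner on exactly 50% is not counted.
    return !excluded && contest.winningShare < 0.5;
  }).length;
}

// Invented rows for illustration only.
const sampleContests: AuditedContest[] = [
  { constituency: "Seat 1", winningShare: 0.42, caveats: [] },
  { constituency: "Seat 2", winningShare: 0.5, caveats: [] },
  { constituency: "Seat 3", winningShare: 0.61, caveats: ["boundary-comparability-limited"] },
  { constituency: "Seat 4", winningShare: 1.0, caveats: ["uncontested"] },
  { constituency: "Seat 5", winningShare: 0.3, caveats: ["multi-member-historical"] },
];
const sampleMinorityCount = minorityWinnerCount(sampleContests);
console.log(sampleMinorityCount); // 1
```

Only Seat 1 counts: Seat 2 sits exactly on 50%, Seat 3 has a majority winner, and Seats 4 and 5 are excluded by their tokens.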

Data caveats & tokens

Every contest and candidate row carries a caveats[] field (possibly empty, never null). The tokens are stable strings so downstream pages, exports, and machine consumers can branch on them without parsing prose. The canonical list:

| Token | Meaning | Effect on headline metrics |
| --- | --- | --- |
| uncontested | Single candidate on the ballot. No valid runner-up share to report. | Excluded from the minority-winner count and the low-winning-share leaderboard. |
| speaker | The Speaker of the House of Commons stood as “Speaker seeking re-election,” not under a party label. Major parties traditionally do not contest the seat. | Counted in seat totals; excluded from the disproportionality index party-by-party calculation as a Speaker-bucket row. |
| missing-turnout | Source dataset does not report electorate or valid-votes for this contest, so turnout cannot be computed. | The contest still appears with its candidate-level result; turnout-derived numbers render as “Not reported.” |
| multi-member-historical | Pre-1950 multi-member constituency: voters had multiple votes and there were multiple winners per contest. Modern single-member metrics do not generalise cleanly. | Excluded from single-member-only metrics (minority-winner count, low-winning-share leaderboard). |
| boundary-comparability-limited | The constituency’s geography changed in the most recent boundary review. A like-named seat in an earlier election may cover a different area. | Headline numbers are unaffected; the caveat surfaces in the UI to discourage casual time-series comparison. |
| source-discrepancy | Source data contains an internal inconsistency (e.g. candidate vote total doesn’t reconcile with declared turnout). The ETL takes the source row as-is and tags it. | The row remains in the dataset; downstream consumers can filter out as needed. Headline metrics include it. |
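Because the tokens are stable strings, a consumer could model them as a string-literal union and branch on them directly. The sketch below is one possible shape; the CaveatToken, ContestRow, hasCaveat, and withoutCaveat names are assumptions, not the published schema.

```typescript
// The six canonical tokens from the table, as a string-literal union.
type CaveatToken =
  | "uncontested"
  | "speaker"
  | "missing-turnout"
  | "multi-member-historical"
  | "boundary-comparability-limited"
  | "source-discrepancy";

// Hypothetical minimal row; real rows carry many more fields.
interface ContestRow {
  constituency: string;
  caveats: CaveatToken[]; // possibly empty, never null
}

function hasCaveat(row: ContestRow, token: CaveatToken): boolean {
  return row.caveats.indexOf(token) !== -1;
}

// Example downstream choice: drop rows carrying a given token.
function withoutCaveat(rows: ContestRow[], token: CaveatToken): ContestRow[] {
  return rows.filter((row) => !hasCaveat(row, token));
}

// Invented rows for illustration only.
const sampleRows: ContestRow[] = [
  { constituency: "Seat 1", caveats: [] },
  { constituency: "Seat 2", caveats: ["source-discrepancy"] },
  { constituency: "Seat 3", caveats: ["speaker", "boundary-comparability-limited"] },
];
const reconciledRows = withoutCaveat(sampleRows, "source-discrepancy");
console.log(reconciledRows.map((row) => row.constituency)); // ["Seat 1", "Seat 3"]
```

With a union type, a misspelled token fails at compile time instead of silently matching nothing.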

Source lineage & attribution

The 2024 general-election ingest is built from the House of Commons Library general election results 2024 (CBP-10009) dataset, published under the Open Parliament Licence v3.0. The raw source file lives under source-data/parliament/2024/ in the repository; the ETL never mutates it.

Source attribution travels with every published artefact — the per-year manifest.json embeds source name, URL, licence, retrieval date, publication date, ETL run timestamp, and ETL version. Per-page footers cite the same manifest.
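For orientation, a manifest with the fields listed above might be typed roughly as follows. Only etlVersion is a field name taken from this page; every other property name, and the footerCitation helper, is a hypothetical stand-in. The example values are placeholders, not real metadata.

```typescript
// Hypothetical typing of the per-year manifest.json.
// Only `etlVersion` is named on this page; the other keys are assumptions.
interface YearManifest {
  sourceName: string;
  sourceUrl: string;
  licence: string;
  retrievedOn: string;
  publishedOn: string;
  etlRunAt: string;
  etlVersion: string;
}

// Illustrative footer line built from the same manifest a page would cite.
function footerCitation(manifest: YearManifest): string {
  return (
    `Source: ${manifest.sourceName} (${manifest.licence}), ` +
    `retrieved ${manifest.retrievedOn}. ETL ${manifest.etlVersion}.`
  );
}

// Placeholder values: angle-bracketed strings stand in for real metadata.
const placeholderManifest: YearManifest = {
  sourceName: "House of Commons Library general election results 2024 (CBP-10009)",
  sourceUrl: "<source URL>",
  licence: "Open Parliament Licence v3.0",
  retrievedOn: "<retrieval date>",
  publishedOn: "<publication date>",
  etlRunAt: "<ETL run timestamp>",
  etlVersion: "<version tag>",
};
const footerLine = footerCitation(placeholderManifest);
console.log(footerLine);
```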

Reproducibility

Every published number is reproducible from the source files in this repository:

  1. Clone the repo and check out the version tag that produced the page you’re citing (the manifest etlVersion identifies it).
  2. Re-run the parliament ETL: bun run refresh-data:parliament. The script reads source-data/parliament/<year>/ and writes src/lib/data/parliament/<year>/ — same files as shipped.
  3. Diff the regenerated JSON against the committed JSON. Identical output is the reproducibility guarantee.
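Step 3 is normally a plain file diff. As a minimal sketch of the same check in code, the hypothetical helper below compares the committed and regenerated file contents as strings and reports the first line that differs. The minorityWinners key in the example is invented for illustration.

```typescript
// Returns the 1-based number of the first differing line,
// or null when the two file contents are identical.
function firstDifferingLine(committed: string, regenerated: string): number | null {
  if (committed === regenerated) {
    return null;
  }
  const committedLines = committed.split("\n");
  const regeneratedLines = regenerated.split("\n");
  const shared = Math.min(committedLines.length, regeneratedLines.length);
  for (let i = 0; i < shared; i++) {
    if (committedLines[i] !== regeneratedLines[i]) {
      return i + 1;
    }
  }
  // Every shared line matches, so one file is a prefix of the other.
  return shared + 1;
}

// Invented file contents; the key name is hypothetical.
const committedJson = '{\n  "minorityWinners": 554\n}\n';
const identicalRun = '{\n  "minorityWinners": 554\n}\n';
const driftedRun = '{\n  "minorityWinners": 553\n}\n';

console.log(firstDifferingLine(committedJson, identicalRun)); // null
console.log(firstDifferingLine(committedJson, driftedRun)); // 2
```

A null result corresponds to the identical-output guarantee; any line number means the checked-out version did not reproduce the published artefact.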

For analyst workflows (pandas, R, Excel) the same numbers ship as CSV exports under /data/parliament/<year>/; column mappings are documented at /parliament/data.

CSV column mapping

CSV downloads under /parliament/data use snake_case headers — the analyst convention in pandas, R, and Excel — distinct from the camelCase JSON served to the SvelteKit loaders. The translation is mechanical: every JSON partyDisplayName becomes a CSV party_display_name column, isWinner becomes is_winner (rendered as 1 / 0, matching the council CSV convention), caveats[] becomes a semicolon-joined string in a caveats column.
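The mechanical translation can be sketched as below. The camelToSnake and toCsvRecord names are hypothetical, the input is assumed to be a flat JSON row, and CSV quoting or escaping of cell values is deliberately left out.

```typescript
// camelCase key to snake_case header: partyDisplayName -> party_display_name.
function camelToSnake(key: string): string {
  return key.replace(/[A-Z]/g, (letter) => "_" + letter.toLowerCase());
}

// Assumption: rows are flat, with string, number, boolean, or string[] values.
type FlatValue = string | number | boolean | string[];

function toCsvRecord(row: { [key: string]: FlatValue }): { [key: string]: string } {
  const record: { [key: string]: string } = {};
  for (const key of Object.keys(row)) {
    const value = row[key];
    let cell: string;
    if (typeof value === "boolean") {
      cell = value ? "1" : "0"; // isWinner renders as 1 / 0
    } else if (Array.isArray(value)) {
      cell = value.join(";"); // caveats[] becomes a semicolon-joined string
    } else {
      cell = String(value);
    }
    record[camelToSnake(key)] = cell;
  }
  return record;
}

// Invented row for illustration only.
const sampleCsvRecord = toCsvRecord({
  partyDisplayName: "Example Party",
  isWinner: true,
  caveats: ["speaker", "boundary-comparability-limited"],
});
console.log(sampleCsvRecord);
```

For the invented row, the resulting headers are party_display_name, is_winner, and caveats, with is_winner rendered as 1 and the two caveat tokens joined by a semicolon.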

Three CSV files per ingested year:

Full column reference (every column, every type, every source field) is in docs/parliament-schema.md in the repository — the same document the ETL author and UI loaders both reference.

Errata & corrections

When a published parliamentary metric needs correcting, we document it here, in this section, with the date, the diff, and the impact of the correction.

No parliamentary errata to date. Errors are inevitable; this section exists so that corrections are first-class, not back-of-page footnotes.