DATA MODEL
Schéma Postgres complet pour politikar. Postgres 15+ requis (cible : Supabase Pro, currently Postgres 15.x). Toute évolution doit passer par une migration numérotée et idempotente dans
packages/db/migrations/.
1. Conventions
- Casse :
snake_casepartout (tables, colonnes, vues, index). - Clés primaires :
id uuid primary key default gen_random_uuid()sauf cas exceptionnel justifié. - FK :
<table_singulier>_id(par exemplepolitician_idréférencepoliticians.id). - Timestamps :
timestamptztoujours, jamaistimestampsans tz. - Tableaux :
text[]pour listes courtes énumérables,jsonbpour structures variables. - Soft delete : colonne
deleted_at timestamptz null. Filtré par défaut dans les vues_public_*. - Audit : colonnes
created_at,updated_atsur chaque table mutable, triggerset_updated_atauto. - Append-only pour
verifications: pas d'UPDATE direct, utilisation desuperseded_by_verification_idpour gérer les révisions. - Multi-pays : champ
country(ISO 3166-1 alpha-2 lowercase, par exemplefr,de,it) surpoliticians,parties,functions,regions. Index surcountrypartout. - Index naming :
ix_<table>_<col1>_<col2>ouux_<table>_<col>pour unique.
2. Extensions requises
create extension if not exists "uuid-ossp";
create extension if not exists "pgcrypto";
create extension if not exists "pg_trgm";
create extension if not exists "vector";
vector provient de pgvector (préinstallé sur Supabase Pro, version >= 0.7).
3. Tables référentiel
3.1 regions
create table regions (
id uuid primary key default gen_random_uuid(),
country text not null check (country ~ '^[a-z]{2}$'),
level text not null check (level in ('country', 'region', 'department', 'city', 'european')),
name text not null,
slug text not null,
insee_code text null,
parent_region_id uuid null references regions(id) on delete set null,
geojson jsonb null,
created_at timestamptz not null default now(),
updated_at timestamptz not null default now(),
deleted_at timestamptz null,
constraint ux_regions_country_slug unique (country, slug)
);
create index ix_regions_country_level on regions (country, level) where deleted_at is null;
create index ix_regions_parent on regions (parent_region_id) where deleted_at is null;
3.2 parties
create table parties (
id uuid primary key default gen_random_uuid(),
country text not null check (country ~ '^[a-z]{2}$'),
name text not null,
short_name text null,
slug text not null,
political_family text null check (
political_family is null or political_family in (
'far_left', 'left', 'green', 'center_left',
'center', 'center_right', 'right', 'far_right',
'regionalist', 'other'
)
),
founded_at date null,
dissolved_at date null,
parent_party_id uuid null references parties(id) on delete set null,
color_hex text null check (color_hex is null or color_hex ~ '^#[0-9a-fA-F]{6}$'),
wikidata_id text null,
created_at timestamptz not null default now(),
updated_at timestamptz not null default now(),
deleted_at timestamptz null,
constraint ux_parties_country_slug unique (country, slug)
);
create index ix_parties_country_family on parties (country, political_family) where deleted_at is null;
create index ix_parties_wikidata on parties (wikidata_id) where wikidata_id is not null;
3.3 functions
create table functions (
id uuid primary key default gen_random_uuid(),
country text not null check (country ~ '^[a-z]{2}$'),
name text not null,
slug text not null,
level text not null check (level in (
'national_executive', 'national_legislative',
'european', 'regional', 'departmental', 'municipal',
'party_leadership', 'judicial'
)),
scope_region_id uuid null references regions(id) on delete set null,
created_at timestamptz not null default now(),
updated_at timestamptz not null default now(),
deleted_at timestamptz null,
constraint ux_functions_country_slug unique (country, slug)
);
create index ix_functions_country_level on functions (country, level) where deleted_at is null;
3.4 politicians
create table politicians (
id uuid primary key default gen_random_uuid(),
country text not null check (country ~ '^[a-z]{2}$'),
full_name text not null,
given_name text null,
family_name text null,
slug text not null,
birth_date date null,
gender text null check (gender is null or gender in ('male', 'female', 'other', 'unspecified')),
current_party_id uuid null references parties(id) on delete set null,
current_function_id uuid null references functions(id) on delete set null,
current_region_id uuid null references regions(id) on delete set null,
party_history jsonb not null default '[]'::jsonb,
function_history jsonb not null default '[]'::jsonb,
photo_url text null,
wikidata_id text null,
official_urls jsonb not null default '{}'::jsonb,
social_handles jsonb not null default '{}'::jsonb,
tier text not null default 'P2' check (tier in ('P0', 'P1', 'P2')),
created_at timestamptz not null default now(),
updated_at timestamptz not null default now(),
deleted_at timestamptz null,
constraint ux_politicians_country_slug unique (country, slug)
);
create index ix_politicians_country_tier on politicians (country, tier) where deleted_at is null;
create index ix_politicians_party on politicians (current_party_id) where deleted_at is null;
create index ix_politicians_function on politicians (current_function_id) where deleted_at is null;
create index ix_politicians_region on politicians (current_region_id) where deleted_at is null;
create index ix_politicians_wikidata on politicians (wikidata_id) where wikidata_id is not null;
create index ix_politicians_name_trgm on politicians using gin (full_name gin_trgm_ops) where deleted_at is null;
tier permet de prioriser : P0 = top 50 figures nationales (cible MVP, revue humaine obligatoire), P1 = parlementaires actifs, P2 = autres.
3.5 mandates
create table mandates (
id uuid primary key default gen_random_uuid(),
politician_id uuid not null references politicians(id) on delete cascade,
function_id uuid not null references functions(id) on delete restrict,
region_id uuid null references regions(id) on delete set null,
party_at_election_id uuid null references parties(id) on delete set null,
start_date date not null,
end_date date null,
is_active boolean generated always as (end_date is null or end_date >= current_date) stored,
election_program_url text null,
election_program_text text null,
created_at timestamptz not null default now(),
updated_at timestamptz not null default now(),
deleted_at timestamptz null,
check (end_date is null or end_date >= start_date)
);
create index ix_mandates_politician_active on mandates (politician_id, is_active) where deleted_at is null;
create index ix_mandates_function on mandates (function_id) where deleted_at is null;
4. Tables ingestion et claims
4.1 sources
create table sources (
id uuid primary key default gen_random_uuid(),
source_type text not null check (source_type in (
'speech', 'interview', 'tv_debate', 'press_conference',
'tweet', 'official_statement', 'parliamentary_question',
'parliamentary_intervention', 'campaign_program',
'mandate_review', 'op_ed', 'video_clip'
)),
politician_id uuid null references politicians(id) on delete set null,
mandate_id uuid null references mandates(id) on delete set null,
occurred_at timestamptz not null,
url text null,
raw_text text null,
raw_text_sha256 text null,
raw_audio_storage_path text null,
raw_video_url text null,
transcript text null,
transcript_quality text null check (transcript_quality is null or transcript_quality in ('low', 'medium', 'high')),
language text not null default 'fr',
channel text null,
source_credibility int not null default 3 check (source_credibility between 1 and 5),
scraped_at timestamptz not null default now(),
ingested_by_run_id text null,
created_at timestamptz not null default now(),
updated_at timestamptz not null default now(),
deleted_at timestamptz null
);
create index ix_sources_politician_date on sources (politician_id, occurred_at desc) where deleted_at is null;
create index ix_sources_type_date on sources (source_type, occurred_at desc) where deleted_at is null;
create unique index ux_sources_sha256 on sources (raw_text_sha256) where raw_text_sha256 is not null and deleted_at is null;
create index ix_sources_run on sources (ingested_by_run_id) where ingested_by_run_id is not null;
raw_text_sha256 calculé à l'ingestion garantit l'idempotence (même contenu = même row). ingested_by_run_id permet de retrouver tout ce qu'un cron a ingéré.
4.2 claims
create table claims (
id uuid primary key default gen_random_uuid(),
source_id uuid not null references sources(id) on delete cascade,
politician_id uuid not null references politicians(id) on delete cascade,
claim_text text not null,
claim_normalized text not null,
claim_type text not null check (claim_type in (
'factual_assertion', 'promise', 'opinion',
'rhetorical', 'prediction', 'normative_statement'
)),
topic_tags text[] not null default '{}',
numeric_values jsonb not null default '[]'::jsonb,
named_entities jsonb not null default '[]'::jsonb,
embedding vector(1024) null,
extraction_confidence numeric(3, 2) not null check (extraction_confidence between 0 and 1),
extracted_at timestamptz not null default now(),
context_window text null,
duplicate_of_claim_id uuid null references claims(id) on delete set null,
prompt_version text not null,
llm_model text not null,
created_at timestamptz not null default now(),
updated_at timestamptz not null default now(),
deleted_at timestamptz null
);
create index ix_claims_politician_type on claims (politician_id, claim_type) where deleted_at is null and duplicate_of_claim_id is null;
create index ix_claims_source on claims (source_id) where deleted_at is null;
create index ix_claims_topic_tags on claims using gin (topic_tags) where deleted_at is null;
create index ix_claims_numeric on claims using gin (numeric_values jsonb_path_ops) where deleted_at is null;
create index ix_claims_extracted_at on claims (extracted_at desc) where deleted_at is null;
create index ix_claims_embedding on claims using ivfflat (embedding vector_cosine_ops) with (lists = 100) where deleted_at is null and embedding is not null;
claim_normalized est la version canonique (ponctuation normalisée, accents standardisés, casse standard) utilisée pour comparaison textuelle exacte. topic_tags sont contraints à un vocabulaire défini en application (économie, sécurité, immigration, santé, éducation, etc.).
4.3 verifications
create table verifications (
id uuid primary key default gen_random_uuid(),
claim_id uuid not null references claims(id) on delete cascade,
verdict text not null check (verdict in (
'true', 'mostly_true', 'mixed', 'mostly_false', 'false',
'unverifiable', 'misleading_context',
'kept', 'partially_kept', 'broken', 'in_progress', 'abandoned', 'too_early'
)),
confidence_score numeric(3, 2) not null check (confidence_score between 0 and 1),
reasoning text not null,
evidence jsonb not null default '[]'::jsonb,
data_sources_used text[] not null default '{}',
llm_model text not null,
prompt_version text not null,
human_reviewed boolean not null default false,
human_reviewer_id uuid null,
human_reviewer_note text null,
superseded_by_verification_id uuid null references verifications(id) on delete set null,
is_published boolean not null default false,
created_at timestamptz not null default now()
);
create index ix_verifications_claim_active on verifications (claim_id) where superseded_by_verification_id is null;
create index ix_verifications_verdict on verifications (verdict) where superseded_by_verification_id is null;
create index ix_verifications_published on verifications (is_published, created_at desc) where superseded_by_verification_id is null;
create index ix_verifications_data_sources on verifications using gin (data_sources_used) where superseded_by_verification_id is null;
Append-only : pas de trigger updated_at, pas d'UPDATE attendu (sauf superseded_by_verification_id qui se renseigne par une UPDATE ciblée). Toute correction crée une nouvelle row qui pointe vers l'ancienne.
evidence structure attendue :
[
{
"source_url": "https://insee.fr/...",
"source_label": "INSEE BDM série T_1234",
"quote": "Taux de chômage 7,1 % au T4 2025",
"weight": 0.9,
"kind": "data_api"
}
]
4.4 claim_review_queue
create table claim_review_queue (
id uuid primary key default gen_random_uuid(),
claim_id uuid not null unique references claims(id) on delete cascade,
queued_for_reason text not null check (queued_for_reason in (
'low_confidence', 'p0_politician', 'sensitive_topic',
'fact_checker_disagreement', 'manual_flag'
)),
status text not null default 'pending' check (status in (
'pending', 'in_review', 'approved', 'rejected', 'corrected'
)),
assigned_to_user_id uuid null,
reviewer_note text null,
created_at timestamptz not null default now(),
updated_at timestamptz not null default now(),
resolved_at timestamptz null
);
create index ix_review_queue_status_created on claim_review_queue (status, created_at) where status in ('pending', 'in_review');
4.5 press_coverage (stub MVP, scrapers post-MVP)
create table press_coverage (
id uuid primary key default gen_random_uuid(),
claim_id uuid not null references claims(id) on delete cascade,
media_outlet text not null,
media_outlet_country text null check (media_outlet_country is null or media_outlet_country ~ '^[a-z]{2}$'),
article_url text not null,
article_title text null,
published_at timestamptz null,
treatment_type text null check (treatment_type is null or treatment_type in (
'verbatim_repeat', 'critical_analysis', 'fact_check',
'ignored', 'amplified', 'contextualized'
)),
framing_score numeric(3, 2) null check (framing_score is null or framing_score between -1 and 1),
reach_estimate bigint null,
scraped_at timestamptz not null default now(),
created_at timestamptz not null default now(),
updated_at timestamptz not null default now(),
deleted_at timestamptz null,
constraint ux_press_coverage_claim_url unique (claim_id, article_url)
);
create index ix_press_coverage_outlet on press_coverage (media_outlet) where deleted_at is null;
create index ix_press_coverage_published on press_coverage (published_at desc) where deleted_at is null;
Table créée dès le MVP pour stabiliser le schéma, mais aucun scraper ne la peuplera avant le post-MVP (cf. ARCHITECTURE module presse).
5. Tables transverses
5.1 prompt_versions
create table prompt_versions (
id text primary key,
purpose text not null check (purpose in (
'claim_extraction', 'claim_classification',
'speaker_identification', 'verification_cascade',
'promise_tracking', 'reasoning_generation'
)),
version text not null,
system_prompt text not null,
tool_schema jsonb null,
llm_model text not null,
notes text null,
created_at timestamptz not null default now(),
retired_at timestamptz null,
constraint ux_prompt_versions_purpose_version unique (purpose, version)
);
id est de la forme claim_extraction@v1.0.0. Référencé par claims.prompt_version et verifications.prompt_version.
5.2 audit_log
create table audit_log (
id bigserial primary key,
occurred_at timestamptz not null default now(),
actor_type text not null check (actor_type in ('user', 'service', 'cron', 'inngest')),
actor_id text null,
action text not null,
entity_type text not null,
entity_id text not null,
payload jsonb not null default '{}'::jsonb,
correlation_id text null
);
create index ix_audit_entity on audit_log (entity_type, entity_id, occurred_at desc);
create index ix_audit_correlation on audit_log (correlation_id) where correlation_id is not null;
create index ix_audit_occurred_at on audit_log (occurred_at desc);
Pas de FK : la table doit survivre à la suppression d'entités.
6. Vues matérialisées d'agrégations
6.1 politician_stats
create materialized view politician_stats as
with verified as (
select
c.politician_id,
v.verdict,
v.confidence_score,
c.claim_type,
c.created_at as claim_created_at
from claims c
join verifications v on v.claim_id = c.id
and v.superseded_by_verification_id is null
and v.is_published = true
where c.deleted_at is null
and c.duplicate_of_claim_id is null
)
select
politician_id,
count(*) filter (where claim_type = 'factual_assertion') as factual_claims_count,
count(*) filter (where claim_type = 'factual_assertion' and verdict = 'true') as true_count,
count(*) filter (where claim_type = 'factual_assertion' and verdict = 'mostly_true') as mostly_true_count,
count(*) filter (where claim_type = 'factual_assertion' and verdict = 'mixed') as mixed_count,
count(*) filter (where claim_type = 'factual_assertion' and verdict = 'mostly_false') as mostly_false_count,
count(*) filter (where claim_type = 'factual_assertion' and verdict = 'false') as false_count,
count(*) filter (where claim_type = 'factual_assertion' and verdict = 'unverifiable') as unverifiable_count,
count(*) filter (where claim_type = 'factual_assertion' and verdict = 'misleading_context') as misleading_count,
count(*) filter (where claim_type = 'promise') as promises_made,
count(*) filter (where claim_type = 'promise' and verdict = 'kept') as promises_kept,
count(*) filter (where claim_type = 'promise' and verdict = 'partially_kept') as promises_partially_kept,
count(*) filter (where claim_type = 'promise' and verdict = 'broken') as promises_broken,
avg(confidence_score) filter (where claim_type = 'factual_assertion') as avg_confidence_factual,
now() as computed_at
from verified
group by politician_id;
create unique index ux_politician_stats_id on politician_stats (politician_id);
create index ix_politician_stats_factual_count on politician_stats (factual_claims_count desc);
6.2 party_stats, function_stats, region_stats
Mêmes patterns, agrégations par parties.id, functions.id, regions.id via les FK courantes des politicians. Définitions complètes en migration 004_aggregations.sql.
6.3 Refresh
create or replace function refresh_all_stats() returns void language plpgsql as $$
begin
refresh materialized view concurrently politician_stats;
refresh materialized view concurrently party_stats;
refresh materialized view concurrently function_stats;
refresh materialized view concurrently region_stats;
end;
$$;
Appelée par un cron Inngest horaire. Les calculs de truthfulness_score (cf. SCORING.md) ne sont pas dans la vue matérialisée mais dans une table politician_truth_scores calculée au même moment côté Python (formules complexes, bootstrap CI). Schéma :
create table politician_truth_scores (
politician_id uuid primary key references politicians(id) on delete cascade,
truth_score numeric(5, 2) null,
factual_score numeric(5, 2) null,
promise_score numeric(5, 2) null,
transparency_score numeric(5, 2) null,
ci_low numeric(5, 2) null,
ci_high numeric(5, 2) null,
topic_breakdown jsonb not null default '{}'::jsonb,
is_significant boolean not null default false,
computed_at timestamptz not null default now()
);
create index ix_truth_scores_significant_score on politician_truth_scores (is_significant, truth_score desc);
7. Vues publiques (_public_*)
Vues filtrées exposées en SELECT à anon pour la consommation par l'API publique Next.js.
create view _public_politicians as
select
id, country, full_name, slug, given_name, family_name,
birth_date, gender, current_party_id, current_function_id,
current_region_id, photo_url, wikidata_id, tier
from politicians
where deleted_at is null;
create view _public_claims as
select
c.id, c.politician_id, c.source_id, c.claim_text, c.claim_type,
c.topic_tags, c.numeric_values, c.extracted_at, c.created_at
from claims c
join verifications v on v.claim_id = c.id
and v.superseded_by_verification_id is null
and v.is_published = true
where c.deleted_at is null
and c.duplicate_of_claim_id is null;
create view _public_verifications as
select
v.id, v.claim_id, v.verdict, v.confidence_score,
v.reasoning, v.evidence, v.data_sources_used,
v.human_reviewed, v.created_at
from verifications v
where v.superseded_by_verification_id is null
and v.is_published = true;
Ces vues sont les seuls endpoints SELECT autorisés à anon via RLS. Les tables brutes restent accessibles uniquement au service role.
8. RLS policies
alter table politicians enable row level security;
alter table parties enable row level security;
alter table functions enable row level security;
alter table regions enable row level security;
alter table mandates enable row level security;
alter table sources enable row level security;
alter table claims enable row level security;
alter table verifications enable row level security;
alter table press_coverage enable row level security;
alter table claim_review_queue enable row level security;
alter table prompt_versions enable row level security;
alter table audit_log enable row level security;
alter table politician_truth_scores enable row level security;
create policy svc_full_access on politicians
for all to service_role using (true) with check (true);
-- répété pour chaque table
create policy admin_full_access on politicians
for all to authenticated
using (auth.jwt() ->> 'app_role' = 'admin')
with check (auth.jwt() ->> 'app_role' = 'admin');
-- répété
anon n'a aucune policy directe sur les tables. Lecture publique = uniquement via les vues _public_* (les vues n'ont pas RLS, c'est leur définition qui filtre).
9. Triggers utilitaires
create or replace function set_updated_at() returns trigger language plpgsql as $$
begin
new.updated_at := now();
return new;
end;
$$;
create trigger trg_politicians_updated_at before update on politicians
for each row execute function set_updated_at();
-- répété pour toutes les tables avec updated_at
10. Migrations
Numérotation NNN_description.sql. Idempotentes (if not exists). Règle : aucune migration ne supprime ni ne renomme de colonne sans migration suivante (NNN_drop_*) après backfill validé.
Plan initial :
| Migration | Contenu |
|---|---|
001_init.sql | Extensions, fonctions utilitaires, tables référentiel (regions, parties, functions, politicians, mandates) |
002_sources_claims.sql | Tables sources, claims, verifications, claim_review_queue, press_coverage |
003_transverse.sql | prompt_versions, audit_log, politician_truth_scores |
004_aggregations.sql | Vues matérialisées *_stats, fonction refresh_all_stats, indexes uniques |
005_public_views.sql | Vues _public_* |
006_rls.sql | RLS policies sur toutes les tables |
007_seed_meta.sql | Seed minimal des prompt_versions initiaux et functions standards France |
Outillage : Supabase CLI (supabase migration new, supabase db reset en local Postgres 15 dans Docker).
11. Tests d'invariants
Suite pytest côté Python qui frappe une DB Supabase locale et vérifie :
- Insertion d'un
claimsansverificationsn'apparaît pas dans_public_claims. superseded_by_verification_idrend la verification précédente invisible des vues_public_verifications.duplicate_of_claim_idexclut le claim despolitician_stats.- Soft delete d'un
politicianretire ses claims des vues publiques. - RLS bloque toute lecture
anonsur tables brutes (test via client Supabase anon key). refresh_all_stats()est idempotent (deux appels successifs donnent le même état).- Insertion d'un même
raw_text_sha256deux fois lève l'unique constraint.
12. Estimation volumétrique
| Entité | Volume MVP (an 1) | Volume cible (an 3) |
|---|---|---|
politicians | 50 | 2 000 |
parties | 30 | 100 |
regions | 200 | 5 000 |
mandates | 100 | 5 000 |
sources | 5 000 | 200 000 |
claims | 20 000 | 1 000 000 |
verifications | 25 000 | 1 500 000 |
audit_log | 100 000 | 10 000 000 |
L'index IVFFlat lists=100 reste pertinent jusqu'à ~100k claims ; au-delà, basculer en HNSW (mention en RISKS technique).
13. Questions ouvertes pour relecture
- Vocabulaire
topic_tags: faut-il une tabletopicsséparée pour normaliser, ou rester entext[]et appliquer un Pydantic enum côté application ? Position actuelle : enum côté app, plus simple, on migre si besoin. - Gestion des élections : prévoir une table
electionsou rester implicite viamandates? Position : différé en post-MVP. - Versioning des entités politiciens (changement de nom, transition) : table
politician_aliasesà prévoir ? Probablement utile dès Phase 1. - Politique géolocalisation des claims : si un politicien parle au Sénat vs sur sa circonscription, on capture où dans
sources.channel. Suffisant ? - Index
pg_trgmsurclaim_normalized: utile pour recherche texte côté admin. À ajouter en Phase 3.