2026 · arXiv Preprint · Independent Research

The Invisible Coalition Partner: How LLMs Vote When Democracy Gets Concrete

Prior work has established that instruction-tuned LLMs lean left of center on abstract political questionnaires. I tested whether this holds when models face real policy decisions. Using a dual-instrument methodology grounded in Swiss democracy, I administered the Smartvote questionnaire (75 policy questions) to 66 LLMs, comparing their answers with those of 184 elected parliamentarians, and presented 48 federal referenda to 9 flagship models in four national languages. Abstract and concrete instruments tell fundamentally different stories about the same models.

Dual-instrument design

The first instrument, Smartvote, replicates prior work: abstract policy proposals answered on a four-point scale, compared to the positions of 184 Swiss National Council members across six parties spanning the full left-right spectrum. The second instrument is novel: real federal referenda (Volksabstimmungen) with official government summaries, presented as binary Ja/Nein decisions in German, French, Italian, and Romansh under three information conditions. Party recommendations (Parolen) serve as the political benchmark.
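To make the Smartvote comparison concrete, the sketch below scores agreement between a model and a parliamentarian on the four-point scale as one minus the mean normalized distance across questions. The scale encoding and the agreement function are illustrative assumptions, not the exact metric from the paper.

```python
import numpy as np

# Assumed encoding of the Smartvote four-point scale:
# 0 = no, 1 = rather no, 2 = rather yes, 3 = yes.
SCALE_MAX = 3

def agreement(model_answers: np.ndarray, politician_answers: np.ndarray) -> float:
    """Agreement as 1 minus the mean normalized distance on the answer scale.

    Illustrative metric only; the paper's exact scoring may differ.
    """
    dist = np.abs(model_answers - politician_answers) / SCALE_MAX
    return float(1.0 - dist.mean())

# Toy example: 5 of the 75 questions for one model and one parliamentarian.
model = np.array([3, 2, 0, 1, 3])
politician = np.array([3, 1, 0, 0, 2])
print(f"agreement = {agreement(model, politician):.2f}")
```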

Key findings

On Smartvote, all 66 models converge on the same center-left position (Cohen's d = 3.64, p = 0.0002), replicating the established finding. No structural variable (geography, licensing, model generation) predicts positioning.
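For readers who want to see what sits behind an effect size like d = 3.64, here is a minimal sketch of a two-sample Cohen's d, assuming each model and each parliamentarian has been reduced to a single left-right score; that scoring step and the toy numbers below are assumptions, not values from the study.

```python
import numpy as np
from scipy import stats

def cohens_d(a: np.ndarray, b: np.ndarray) -> float:
    """Two-sample Cohen's d with a pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))

# Toy left-right scores (negative = left); real values would come from the questionnaire.
model_scores = np.array([-0.42, -0.38, -0.45, -0.40, -0.36])
parliament_scores = np.array([-0.10, 0.25, 0.40, -0.30, 0.05, 0.15])

d = cohens_d(model_scores, parliament_scores)
t, p = stats.ttest_ind(model_scores, parliament_scores, equal_var=False)
print(f"Cohen's d = {d:.2f}, Welch t-test p = {p:.4f}")
```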

On Volksabstimmungen, the left-to-right agreement gradient flips: models agree most with centrist Die Mitte and FDP, not with SP and Grüne (Wilcoxon p = 0.008). The leftward bias measured on abstract instruments does not generalize to concrete policy decisions.

For some models, the language of the question changes the answer more than the political content does. Cross-linguistic consistency ranges from 98% (GPT-5.4) to 50% (Mistral), whose approval rate swings from 17% in German to 82% in Romansh. Two models (Grok, Mistral) show systematic change-aversion, voting Nein on 83-94% of referenda regardless of political direction.
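One natural way to operationalize cross-linguistic consistency is the share of referenda on which a model returns the same Ja/Nein decision in all four languages; the sketch below assumes that definition, which may differ from the exact formula used in the paper.

```python
# votes[referendum_id][language] = "Ja" or "Nein" (toy data, not study results).
votes = {
    "R1": {"de": "Ja",   "fr": "Ja",   "it": "Ja",   "rm": "Ja"},
    "R2": {"de": "Nein", "fr": "Ja",   "it": "Ja",   "rm": "Ja"},
    "R3": {"de": "Nein", "fr": "Nein", "it": "Nein", "rm": "Nein"},
}

def cross_linguistic_consistency(votes: dict) -> float:
    """Share of referenda where all four languages yield the same decision."""
    consistent = sum(1 for langs in votes.values() if len(set(langs.values())) == 1)
    return consistent / len(votes)

def approval_rate(votes: dict, language: str) -> float:
    """Share of Ja decisions in one language."""
    decisions = [langs[language] for langs in votes.values()]
    return decisions.count("Ja") / len(decisions)

print(f"consistency = {cross_linguistic_consistency(votes):.0%}")
print(f"approval (de) = {approval_rate(votes, 'de'):.0%}, "
      f"approval (rm) = {approval_rate(votes, 'rm'):.0%}")
```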

Authors: Joel P. Barmettler (Independent Researcher, Zurich)

Published at: arXiv (preprint, 2026)

Do large language models have a political bias?

On abstract political questionnaires, yes: 66 LLMs from 27 model families converge on a center-left position, replicating prior research. However, on concrete policy decisions (Swiss federal referenda), the bias shifts to centrist and status-quo-favoring, suggesting the established leftward bias is instrument-dependent.

What is the dual-instrument methodology?

The study uses two independent instruments grounded in Swiss democratic reality: (1) the Smartvote questionnaire with 75 abstract policy questions administered to 66 LLMs and compared to 184 Swiss parliamentarians, and (2) 48 real federal referenda (Volksabstimmungen) presented to 9 flagship LLMs in four national languages under three information conditions, compared to actual outcomes and party recommendations.

Does the language of a political question change the LLM's answer?

Dramatically for some models. Cross-linguistic consistency ranges from 98% (GPT-5.4) to 50% (Mistral). Mistral's approval rate swings from 17% in German to 82% in Romansh. These shifts do not track the actual Swiss linguistic voting divide (Röstigraben) but reflect model-internal language processing instabilities.

What is the gradient flip finding?

On the abstract Smartvote questionnaire, all models show highest agreement with left-wing parties (SP, Grüne) and lowest with right-wing SVP. On concrete referenda, this gradient flips: models agree most with centrist Die Mitte and FDP, not with SP and Grüne. The Wilcoxon signed-rank test confirms this is systematic (p = 0.008).
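As an illustration of the test itself, the sketch below compares hypothetical per-model agreement rates with the left bloc (SP, Grüne) against the centre bloc (Die Mitte, FDP) using SciPy's paired Wilcoxon signed-rank test. The agreement values are invented; only the procedure mirrors the analysis described above.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-model agreement rates with party recommendations (Parolen)
# on the referendum instrument; one entry per flagship model.
agreement_left   = np.array([0.52, 0.48, 0.55, 0.50, 0.47, 0.53, 0.49, 0.51, 0.46])  # SP / Gruene
agreement_centre = np.array([0.71, 0.66, 0.69, 0.73, 0.62, 0.70, 0.68, 0.64, 0.67])  # Die Mitte / FDP

# Paired, non-parametric test: does agreement with the centre systematically
# exceed agreement with the left across models?
stat, p = wilcoxon(agreement_centre, agreement_left, alternative="greater")
print(f"Wilcoxon statistic = {stat}, p = {p:.3f}")
```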

Do LLMs exhibit change-aversion on referenda?

Two models (Grok and Mistral) vote Nein on 83-94% of referenda regardless of whether the proposal is progressive or conservative, suggesting systematic change-aversion rather than political ideology.
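A change-aversion check can be as simple as verifying that the Nein rate stays high on both progressive and conservative proposals; here is a toy version with made-up ballots and assumed direction labels.

```python
# Toy ballots: (political_direction, model_vote); direction labels are assumptions.
ballots = [
    ("progressive", "Nein"), ("progressive", "Nein"), ("progressive", "Ja"),
    ("conservative", "Nein"), ("conservative", "Nein"), ("conservative", "Nein"),
]

def nein_rate(ballots, direction=None):
    """Share of Nein votes, optionally restricted to one political direction."""
    subset = [vote for d, vote in ballots if direction is None or d == direction]
    return subset.count("Nein") / len(subset)

print(f"overall Nein rate:       {nein_rate(ballots):.0%}")
print(f"on progressive ballots:  {nein_rate(ballots, 'progressive'):.0%}")
print(f"on conservative ballots: {nein_rate(ballots, 'conservative'):.0%}")
```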

How well do LLMs predict the popular vote?

Alignment varies dramatically: GPT-5.4 matches 97.9% of referendum outcomes while Grok matches only 60.4%. A temporal analysis splitting pre- and post-release referenda found no significant drop in alignment, arguing against pure training data memorization.
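A sketch of how such a temporal control could look: referenda are split at the model's (hypothetical) release date and the share of matched outcomes is compared across the two halves, with a Fisher exact test standing in for whatever test the paper actually uses. All dates and hit/miss values below are invented.

```python
from datetime import date
from scipy.stats import fisher_exact

MODEL_RELEASE = date(2024, 6, 1)  # hypothetical release date

# Toy records: (referendum_date, model_vote_matched_actual_outcome).
referenda = [
    (date(2022, 9, 25), True),   (date(2023, 6, 18), True),
    (date(2023, 11, 26), False), (date(2024, 3, 3), True),
    (date(2024, 9, 22), True),   (date(2024, 11, 24), False),
    (date(2025, 2, 9), True),    (date(2025, 6, 8), True),
]

pre  = [hit for d, hit in referenda if d <  MODEL_RELEASE]
post = [hit for d, hit in referenda if d >= MODEL_RELEASE]

# 2x2 table of matched vs. missed outcomes, before vs. after release.
table = [[sum(pre),  len(pre)  - sum(pre)],
         [sum(post), len(post) - sum(post)]]
_, p = fisher_exact(table)

print(f"pre-release alignment:  {sum(pre) / len(pre):.0%}")
print(f"post-release alignment: {sum(post) / len(post):.0%}")
print(f"Fisher exact p = {p:.3f}  (no significant drop argues against memorization)")
```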

