Joel P. Barmettler

AI Architect & Researcher

2024 · Master's Thesis · Grade 6.0 · Published at WWW'26

ConceptFormer: Graph-native grounding of LLMs via latent concept injection

I developed ConceptFormer, a neuro-symbolic approach that grounds large language models in structured knowledge graphs without architectural modifications or textual linearization. Published at the GLOW workshop at WWW'26, the system addresses trustworthiness and factuality challenges in retrieval-augmented generation by preserving knowledge graph topology in latent space.

Architecture and concept vectors

ConceptFormer operates in the LLM embedding vector space, creating concept vectors that encapsulate the topological structure of knowledge graph nodes from the Web of Data (Wikidata). I trained the system in conjunction with a frozen LLM (GPT-2), generating a comprehensive lookup table mapping KG nodes to concept vectors. This approach avoids lossy linearization and context saturation inherent in graph textification methods.
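To make the injection mechanism concrete, here is a minimal sketch, assuming precomputed concept vectors: the vectors are prepended to the prompt's token embeddings and fed to a frozen GPT-2 via `inputs_embeds`. The variable names and the vector count are illustrative placeholders, not the actual ConceptFormer pipeline.

```python
# Sketch: prepend precomputed concept vectors to a prompt's token embeddings
# and run a frozen GPT-2 on the combined sequence (illustrative only).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()  # the LLM itself stays frozen

prompt = "The headquarters of FIFA are located in"
token_ids = tokenizer(prompt, return_tensors="pt").input_ids
token_embeds = model.transformer.wte(token_ids)          # (1, seq_len, 768)

# Hypothetical: concept vectors for the entity "FIFA", e.g. fetched from a
# precomputed lookup table; random placeholders with the right shape here.
num_concept_vectors = 5
concept_vectors = torch.randn(1, num_concept_vectors, token_embeds.size(-1))

# Inject by prepending the concept vectors to the token embeddings.
inputs_embeds = torch.cat([concept_vectors, token_embeds], dim=1)

with torch.no_grad():
    outputs = model(inputs_embeds=inputs_embeds)
next_token_logits = outputs.logits[:, -1, :]
print(tokenizer.decode(next_token_logits.argmax(-1)))
```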

Performance and efficiency

Experiments demonstrate that injecting concept vectors into GPT-2 0.1B increases factual recall (Hit@10) by up to 272% on Wikipedia sentences and 348% on synthetic sentences. Even injecting a single concept vector yields a 213% improvement over the baseline, significantly outperforming RAG with graph textification while reducing token consumption by 130x. This demonstrates that preserving topological structure in latent space surpasses textual linearization for factual grounding.
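For reference, Hit@10 checks whether the correct token appears among the model's top-10 next-token predictions. The helper below shows one common way to score such a metric; the exact evaluation protocol in the thesis may differ.

```python
import torch

def hit_at_k(logits: torch.Tensor, target_ids: torch.Tensor, k: int = 10) -> float:
    """Fraction of examples whose target token is among the top-k predictions.

    logits:     (batch, vocab_size) next-token logits
    target_ids: (batch,) ids of the correct next token
    """
    topk = logits.topk(k, dim=-1).indices                    # (batch, k)
    hits = (topk == target_ids.unsqueeze(-1)).any(dim=-1)    # (batch,)
    return hits.float().mean().item()
```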

Published at: GLOW Workshop (Graph-enhanced LLMs for trustwOrthy Web data management), The ACM Web Conference (WWW'26)

Authors: Joel Barmettler (University of Zurich), Abraham Bernstein (University of Zurich), Luca Rossetto (Dublin City University)

What is ConceptFormer?

ConceptFormer is a neuro-symbolic approach for grounding LLMs in structured knowledge graphs from the Web of Data without altering their internal structure or relying on textual representations of the graph. It operates in the LLM embedding space, creating and injecting concept vectors that encapsulate KG topological structure directly.

How does ConceptFormer improve factual recall?

ConceptFormer achieves up to 272% improvement in factual recall (Hit@10) on Wikipedia sentences and 348% on synthetic sentences when adding concept vectors to GPT-2 0.1B. Even a single concept vector injection improves recall by 213%, significantly outperforming RAG with graph textification.

How does ConceptFormer differ from traditional RAG?

Unlike RAG methods that textify knowledge graphs, which leads to lossy linearization and context saturation, ConceptFormer preserves the graph's topological structure in latent space. This approach is more effective for factuality than textual linearization while reducing token consumption by 130x.
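The efficiency argument can be made tangible with a rough back-of-the-envelope comparison: textifying even a small slice of an entity's neighborhood already costs dozens of tokens, while each injected concept vector occupies exactly one input position. The snippet below is purely illustrative; the triples and counts are made up and are not the paper's benchmark.

```python
# Sketch: compare the context cost of textified KG triples with the cost of
# injecting a few concept vectors (illustrative numbers only).
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Hypothetical textification of a small slice of an entity's neighborhood.
textified = (
    "FIFA headquarters location Zurich. "
    "FIFA instance of international sports governing body. "
    "FIFA inception 1904. FIFA president Gianni Infantino."
)
text_tokens = len(tokenizer(textified).input_ids)

num_concept_vectors = 5  # each occupies exactly one input position

print(f"textified triples: {text_tokens} tokens")
print(f"concept vectors:   {num_concept_vectors} input positions")
```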

What are concept vectors?

Concept vectors are embeddings that encapsulate the topological structure of knowledge graph nodes directly in the LLM embedding vector space. They are generated by ConceptFormer, which is trained in conjunction with a frozen LLM, and are mapped to KG nodes through a comprehensive lookup table.
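A plausible shape for such a table is sketched below with assumed dimensions: GPT-2 0.1B uses 768-dimensional embeddings, while the number of vectors per node and the example QIDs are illustrative assumptions rather than the released artifact.

```python
# Sketch: a precomputed lookup table from Wikidata QIDs to concept vectors.
import torch

EMBEDDING_DIM = 768      # GPT-2 0.1B hidden size
VECTORS_PER_NODE = 5     # assumed number of concept vectors per KG node

concept_table = {
    "Q72": torch.randn(VECTORS_PER_NODE, EMBEDDING_DIM),  # Zurich
    "Q5":  torch.randn(VECTORS_PER_NODE, EMBEDDING_DIM),  # human
}

def lookup(qid: str):
    """Return the precomputed concept vectors for a KG node, or None if absent."""
    return concept_table.get(qid)
```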

Does ConceptFormer modify the LLM architecture?

No, ConceptFormer does not alter the internal structure of pre-trained language models. It works with frozen LLMs and operates entirely in the embedding vector space, making it compatible with existing models without architectural modifications.
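In practice, "frozen" means all LLM parameters have gradients disabled and only the concept-vector generator is optimized. The sketch below uses a trivial placeholder module, not the real ConceptFormer component.

```python
# Sketch: train only the concept-vector generator while GPT-2 stays frozen.
import torch
from transformers import GPT2LMHeadModel

llm = GPT2LMHeadModel.from_pretrained("gpt2")
for param in llm.parameters():
    param.requires_grad = False   # no weight or architecture changes to the LLM

concept_former = torch.nn.Linear(768, 768)  # placeholder for the real module
optimizer = torch.optim.AdamW(concept_former.parameters(), lr=1e-4)
# Training backpropagates through the frozen LLM into concept_former only.
```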

What knowledge sources does ConceptFormer use?

ConceptFormer grounds LLMs in structured knowledge from the Web of Data, specifically leveraging knowledge graphs like Wikidata. This provides access to massive structured world knowledge while maintaining graph topology.
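For illustration, a node's 1-hop neighborhood can be pulled from Wikidata's public SPARQL endpoint as shown below; this is a generic example of accessing the Web of Data, not necessarily the thesis's actual data pipeline.

```python
# Sketch: fetch a 1-hop neighborhood of a Wikidata entity from the public
# SPARQL endpoint (illustrative only).
import requests

ENDPOINT = "https://query.wikidata.org/sparql"

query = """
SELECT ?propertyLabel ?neighborLabel WHERE {
  wd:Q72 ?p ?neighbor .                      # Q72 = Zurich
  ?property wikibase:directClaim ?p .
  ?neighbor rdfs:label ?neighborLabel . FILTER(LANG(?neighborLabel) = "en")
  ?property rdfs:label ?propertyLabel . FILTER(LANG(?propertyLabel) = "en")
}
LIMIT 25
"""

response = requests.get(
    ENDPOINT,
    params={"query": query, "format": "json"},
    headers={"User-Agent": "ConceptFormer-example/0.1"},
)
for row in response.json()["results"]["bindings"]:
    print(row["propertyLabel"]["value"], "->", row["neighborLabel"]["value"])
```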

Where was ConceptFormer published?

ConceptFormer was published at the GLOW (Graph-enhanced LLMs for trustwOrthy Web data management) workshop held as part of The ACM Web Conference (WWW'26) under the title 'ConceptFormer: Towards Graph-Native Grounding of Large Language Models via Latent Concept Injection'.


