I developed ConceptFormer, a neuro-symbolic approach that grounds large language models in structured knowledge graphs without architectural modifications or textual linearization. Published at the GLOW workshop at WWW'26, the system addresses trustworthiness and factuality challenges in retrieval-augmented generation by preserving knowledge graph topology in latent space.
ConceptFormer operates in the LLM embedding vector space, creating concept vectors that encapsulate the topological structure of knowledge graph nodes from the Web of Data (Wikidata). I trained the system in conjunction with a frozen LLM (GPT-2), generating a comprehensive lookup table mapping KG nodes to concept vectors. This approach avoids lossy linearization and context saturation inherent in graph textification methods.
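To make the idea concrete, here is a minimal sketch (not the released implementation) of how a pre-computed concept vector could be injected into a frozen GPT-2 through its input embedding space. The lookup table, the QID, and the random vector are illustrative stand-ins; the paper's exact injection mechanism and vector contents may differ.

```python
# Illustrative sketch: injecting a pre-computed concept vector into a frozen GPT-2
# via its input embeddings (no architectural changes, no textified triples).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()  # the LLM stays frozen; only ConceptFormer is trained offline

# Hypothetical lookup table: Wikidata QID -> concept vector in GPT-2's
# 768-dimensional embedding space, produced by ConceptFormer ahead of time.
concept_table = {
    "Q72": torch.randn(1, 768),  # e.g. the entity "Zurich" (random stand-in here)
}

prompt = "Zurich is located in"
token_ids = tokenizer(prompt, return_tensors="pt").input_ids
token_embeds = model.transformer.wte(token_ids)            # (1, seq_len, 768)

# Prepend the concept vector as if it were an extra pseudo-token.
concept = concept_table["Q72"].unsqueeze(0)                # (1, 1, 768)
inputs_embeds = torch.cat([concept, token_embeds], dim=1)

with torch.no_grad():
    logits = model(inputs_embeds=inputs_embeds).logits
print(tokenizer.decode(logits[0, -1].argmax(-1)))          # next-token prediction
```

Because the concept vector occupies a single embedding slot, the prompt grows by one pseudo-token per injected concept rather than by the dozens of tokens a textified neighbourhood would require.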
Experiments demonstrate that injecting concept vectors into GPT-2 0.1B increases factual recall (Hit@10) by up to 272% on Wikipedia sentences and 348% on synthetic sentences. Even injecting a single concept vector yields a 213% improvement over the baseline, significantly outperforming RAG with graph textification while reducing token consumption by 130x. These results indicate that preserving topological structure in latent space surpasses textual linearization for factual grounding.
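For readers unfamiliar with the metric, the snippet below sketches a Hit@k style check: whether the gold answer's token appears among the model's top-k next-token predictions. This is a common formulation; the paper's exact evaluation protocol may differ in detail.

```python
# Hedged sketch of a Hit@k check over next-token logits (assumed protocol).
import torch

def hit_at_k(logits: torch.Tensor, gold_token_id: int, k: int = 10) -> bool:
    """logits: (vocab_size,) next-token logits at the answer position."""
    topk_ids = torch.topk(logits, k).indices
    return gold_token_id in topk_ids.tolist()
```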
Published at: GLOW Workshop (Graph-enhanced LLMs for trustwOrthy Web data management), The ACM Web Conference (WWW'26)
Authors: Joel Barmettler (University of Zurich), Abraham Bernstein (University of Zurich), Luca Rossetto (Dublin City University)
ConceptFormer is a neuro-symbolic approach to grounding LLMs in structured knowledge graphs from the Web of Data without altering their internal structure or relying on textual input. It operates in the LLM embedding space, creating and injecting concept vectors that directly encapsulate the topological structure of the KG.
ConceptFormer achieves up to 272% improvement in factual recall (Hit@10) on Wikipedia sentences and 348% on synthetic sentences when adding concept vectors to GPT-2 0.1B. Even a single concept vector injection improves recall by 213%, significantly outperforming RAG with graph textification.
Unlike RAG methods that textify knowledge graphs, which leads to lossy linearization and context saturation, ConceptFormer preserves topological structure in latent space. This approach is more effective for factual grounding than textual linearization while reducing token consumption by 130x.
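The token-budget argument can be illustrated with a rough comparison: textifying even a small entity neighbourhood costs many prompt tokens, whereas ConceptFormer spends one embedding slot per concept vector. The triples and counts below are illustrative, not figures from the paper.

```python
# Rough illustration of the token-budget difference (numbers are illustrative).
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Hypothetical textification of a few Wikidata triples about Zurich.
textified = (
    "Zurich, country: Switzerland; capital of: Canton of Zurich; "
    "instance of: city; located in: Canton of Zurich; population: 415367."
)
print(len(tokenizer(textified).input_ids))  # dozens of tokens for a tiny neighbourhood

n_concept_slots = 1  # the same neighbourhood compressed into a single embedding slot
```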
Concept vectors are embeddings that encapsulate the topological structure of knowledge graph nodes directly in the LLM embedding vector space. They are generated by ConceptFormer, which is trained in conjunction with a frozen LLM, and are mapped to KG nodes through a comprehensive lookup table.
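The sketch below illustrates this training setup under stated assumptions: a small trainable module (a stand-in for ConceptFormer, not its actual architecture) attends over a node's neighbourhood embeddings to produce concept vectors, while the GPT-2 parameters stay frozen.

```python
# Hedged sketch of the training setup: only this module is optimised; GPT-2 is frozen.
import torch
import torch.nn as nn

class ConceptFormerSketch(nn.Module):
    """Illustrative stand-in: learned queries attend over KG neighbour embeddings."""
    def __init__(self, dim: int = 768, n_concepts: int = 1):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_concepts, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, neighbour_embeds: torch.Tensor) -> torch.Tensor:
        # neighbour_embeds: (batch, n_neighbours, dim) embeddings of the node's
        # KG neighbourhood; returns (batch, n_concepts, dim) concept vectors.
        q = self.queries.unsqueeze(0).expand(neighbour_embeds.size(0), -1, -1)
        concepts, _ = self.attn(q, neighbour_embeds, neighbour_embeds)
        return concepts

# Training (sketch): freeze all GPT-2 parameters, optimise only ConceptFormerSketch
# with a next-token language-modelling loss on sentences mentioning the entity,
# then cache the resulting concept vectors in the node -> vector lookup table.
```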
ConceptFormer does not alter the internal structure of pre-trained language models. It works with frozen LLMs and operates entirely in the embedding vector space, making it compatible with existing models without architectural modifications.
ConceptFormer grounds LLMs in structured knowledge from the Web of Data, specifically leveraging knowledge graphs like Wikidata. This provides access to massive structured world knowledge while maintaining graph topology.
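As an illustration of where the structured knowledge comes from, the query below pulls a node's one-hop neighbourhood from the public Wikidata SPARQL endpoint. This shows how such a neighbourhood could be obtained; it is not necessarily the paper's data pipeline.

```python
# Illustrative sketch: fetch the 1-hop Wikidata neighbourhood of an entity (Q72 = Zurich).
import requests

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"
query = """
SELECT ?propLabel ?neighbourLabel WHERE {
  wd:Q72 ?p ?neighbour .
  ?prop wikibase:directClaim ?p .
  FILTER(ISIRI(?neighbour))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 50
"""

resp = requests.get(
    SPARQL_ENDPOINT,
    params={"query": query, "format": "json"},
    headers={"User-Agent": "conceptformer-demo/0.1"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["propLabel"]["value"], "->", row["neighbourLabel"]["value"])
```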
ConceptFormer was published at the GLOW (Graph-enhanced LLMs for trustwOrthy Web data management) workshop held as part of The ACM Web Conference (WWW'26) under the title 'ConceptFormer: Towards Graph-Native Grounding of Large Language Models via Latent Concept Injection'.
Copyright 2026 - Joel P. Barmettler