The problems I keep coming back to all share a common thread: what does it take to make AI systems reliable enough to deploy, and what happens once they are?
Language models are fluent but factually unstable. They produce plausible text without reliable access to the knowledge that would make it true. My master's thesis addressed this from the structured side: ConceptFormer injects knowledge graph topology directly into the LLM embedding space, bypassing the lossy text serialization that retrieval-augmented generation typically requires. A single concept vector improves factual recall by 213%. Preserving graph structure in latent space outperforms converting it to text and uses 130x fewer tokens.
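The mechanism is easiest to see in toy form. The sketch below illustrates the general idea of projecting graph-derived vectors into a model's embedding space and prepending them to the prompt; it is not the ConceptFormer implementation, and all module names and dimensions are assumptions.

```python
# Toy illustration only: map a knowledge-graph node embedding into the LLM's
# embedding space and prepend it to the token embeddings, so facts reach the
# model without a text serialization step. The single linear projection and
# the dimensions used here are illustrative assumptions.
import torch
import torch.nn as nn

class ConceptInjector(nn.Module):
    def __init__(self, kg_dim: int, llm_dim: int, n_vectors: int = 1):
        super().__init__()
        self.n_vectors = n_vectors
        self.llm_dim = llm_dim
        # learned mapping from graph-embedding space to n_vectors LLM-space vectors
        self.project = nn.Linear(kg_dim, llm_dim * n_vectors)

    def forward(self, node_emb: torch.Tensor, token_embs: torch.Tensor) -> torch.Tensor:
        # node_emb: (batch, kg_dim); token_embs: (batch, seq_len, llm_dim)
        concept = self.project(node_emb).view(-1, self.n_vectors, self.llm_dim)
        # prepend the concept vector(s) to the prompt before the LLM sees it
        return torch.cat([concept, token_embs], dim=1)

injector = ConceptInjector(kg_dim=200, llm_dim=768)
fused = injector(torch.randn(2, 200), torch.randn(2, 16, 768))
print(fused.shape)  # torch.Size([2, 17, 768])
```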
At bbv, I work the unstructured side of the same problem. The Swiss AI Hub processes enterprise documents through retrieval pipelines (chunking, embedding, indexing, retrieval, reranking) to give models access to organizational knowledge they weren't trained on. The engineering is different. The question is the same: how do you give a model access to knowledge it can use faithfully?
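As a rough sketch of what those stages do, here is the pipeline in miniature, with a placeholder embedder standing in for a real embedding model, vector database, and cross-encoder reranker; the corpus and query are invented for illustration.

```python
# Minimal pipeline sketch: chunk -> embed -> index -> retrieve (rerank would follow).
# The trigram-hash "embedding" is a stand-in for a trained embedding model.
import hashlib
import math

def chunk(text: str, size: int = 60) -> list[str]:
    # naive fixed-size chunking; production systems chunk on document structure
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(piece: str, dim: int = 64) -> list[float]:
    # placeholder embedding: hashed character trigrams, L2-normalized
    vec = [0.0] * dim
    for i in range(len(piece) - 2):
        vec[int(hashlib.md5(piece[i:i + 3].encode()).hexdigest(), 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def retrieve(query: str, index: list[tuple[str, list[float]]], k: int = 1) -> list[str]:
    q = embed(query)
    scored = sorted(index, key=lambda item: -sum(a * b for a, b in zip(q, item[1])))
    return [text for text, _ in scored[:k]]

corpus = (
    "Travel expenses must be submitted within 30 days. "
    "Security reviews are required for all external vendors. "
    "Remote work requests go through the team lead."
)
index = [(c, embed(c)) for c in chunk(corpus)]                 # chunk, embed, index
print(retrieve("How do I submit travel expenses?", index))     # retrieve
```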
Structured and unstructured knowledge grounding are usually treated as separate fields. I think they converge. A production system will need graph-native representations for facts with clear relational structure and retrieval pipelines for everything else. Where that boundary sits, and how to move between the two, is what I want to research further.
→ ConceptFormer paper · Political orientation study
Most enterprise AI deployments depend on a handful of API providers. The model, the infrastructure, and the data pipeline all live behind someone else's authentication wall. For organizations handling sensitive data or operating in regulated industries, this is a strategic vulnerability.
The Swiss AI Hub is how I've approached this at bbv. It's an open-source platform built on open-weight models that organizations deploy in their own infrastructure: LiteLLM for model routing, Milvus for vector search, LlamaIndex for retrieval, and a full observability stack. No SaaS subscriptions. One command provisions the entire system.
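As a minimal sketch of how the retrieval side of such a stack wires together (assuming a local Milvus instance and LlamaIndex defaults; the actual Hub configuration, model routing through LiteLLM, and observability wiring are more involved):

```python
# Sketch only: index local documents into a self-hosted Milvus instance and
# query them through LlamaIndex. The path, URI, and embedding dimension are
# assumptions; in a sovereign setup, LlamaIndex's Settings would also point
# the LLM and embedding model at locally hosted models (e.g. via LiteLLM).
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.milvus import MilvusVectorStore

documents = SimpleDirectoryReader("./docs").load_data()          # enterprise documents
vector_store = MilvusVectorStore(uri="http://localhost:19530",   # self-hosted vector database
                                 dim=1024, overwrite=False)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
query_engine = index.as_query_engine(similarity_top_k=5)         # retrieval + answer synthesis
print(query_engine.query("What does our expense policy say about travel?"))
```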
Sovereignty is only real if you can run capable models on hardware you control. This is why I care about quantization, small specialized models, and parameter-efficient fine-tuning. BitNets reducing weights to single bits. LoRA adapters that fine-tune a fraction of the parameters. GRPO applied to low-rank matrices to train reasoning into compact models. These techniques determine whether "run it yourself" is a realistic option or a talking point. The Swiss AI Hub depends on this work, and the research community keeps delivering.
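As one concrete example, parameter-efficient fine-tuning with LoRA adapters looks roughly like this; the base model and hyperparameters are illustrative assumptions, and the point is how little of the model becomes trainable.

```python
# Sketch of LoRA-based parameter-efficient fine-tuning with Hugging Face PEFT.
# Base model, rank, and target modules are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct")  # any open-weight model

lora = LoraConfig(
    r=16,                                   # rank of the low-rank update matrices
    lora_alpha=32,                          # scaling factor for the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora)
model.print_trainable_parameters()          # typically well under 1% of the base model
```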
→ More on bbv and the Swiss AI Hub
A language model on its own is a text completion engine. What makes it useful inside an organization is the scaffolding around it: tool access, memory, planning, orchestration, access controls. This is the agentic layer, and it's where theoretical capabilities meet operational reality.
Building this layer is a large part of my work at bbv. An agent that can draft a document, query a database, send an email, and decide which of those a task requires faces challenges beyond model intelligence; they are questions of system reliability. Can it handle authentication? Does it fail gracefully? Can you trace why it made a specific decision? Can you audit it?
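A stripped-down sketch of what that reliability layer involves, independent of any particular agent framework (the tool names and stubs here are hypothetical):

```python
# Minimal sketch of a tool-dispatch layer with an allow-list and an audit trail.
# Real systems add authentication, retries, and persistent logging.
import json
import time

TOOLS = {
    "query_database": lambda args: {"rows": []},      # stub
    "draft_document": lambda args: {"draft": "..."},  # stub
}

AUDIT_LOG = []

def call_tool(name: str, args: dict, user: str) -> dict:
    if name not in TOOLS:                      # access control: only allow-listed tools
        raise PermissionError(f"tool {name!r} not permitted")
    entry = {"ts": time.time(), "user": user, "tool": name, "args": args}
    try:
        entry["result"] = TOOLS[name](args)    # execute the tool call
        return entry["result"]
    except Exception as exc:                   # fail gracefully, keep the trace
        entry["error"] = repr(exc)
        raise
    finally:
        AUDIT_LOG.append(entry)                # every decision stays traceable

call_tool("query_database", {"sql": "SELECT 1"}, user="analyst")
print(json.dumps(AUDIT_LOG, indent=2))
```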
Tools like Claude Code show what happens when this scaffolding works. A model trained to understand code becomes genuinely productive because someone built the infrastructure that lets it read files, run tests, and iterate on its output. The gap between "the model can do this in principle" and "this works in your organization" is almost entirely an infrastructure and integration problem. That's the problem I find most worth solving.
→ Podcast on AI agents · Netzwoche article on enterprise agents
My study on political orientation in language models tested 21 models against Swiss SmartVote voting recommendations. Most cluster around progressive, socially liberal positions, with agreement rates between 60% and 90% depending on the party. This isn't a normative claim about what models should believe. It's an empirical observation about what they've absorbed from training data, and it matters because these systems increasingly mediate how people access information.
The podcast covers adjacent territory. I've recorded episodes on AI in military applications and the base-rate math that makes "90% accurate" target selection catastrophic, on predictive policing feedback loops that encode the biases they claim to eliminate, on what AI does to education when institutions respond with prohibition instead of adaptation, on the labor market effects that follow when knowledge work becomes automatable.
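The base-rate point is worth making concrete. With illustrative numbers (assumptions for the sake of the arithmetic, not figures from the episode), a "90% accurate" classifier applied to a population where real targets are rare flags mostly innocent people:

```python
# Worked base-rate arithmetic; prevalence and accuracy values are assumptions.
prevalence = 0.001      # 1 in 1,000 people in the scanned population is a true target
sensitivity = 0.9       # "90% accurate": finds 90% of true targets
specificity = 0.9       # and correctly clears 90% of non-targets

true_pos = sensitivity * prevalence
false_pos = (1 - specificity) * (1 - prevalence)
precision = true_pos / (true_pos + false_pos)
print(f"share of flagged people who are actual targets: {precision:.1%}")  # ~0.9%
```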
If I'm going to spend my career making AI systems more capable and more deployable, I want to understand what that means for the people who live with them.
Knowledge grounding addresses how language models access knowledge faithfully. ConceptFormer injects knowledge graph topology into LLM embedding space, improving factual recall by 213% while using 130x fewer tokens than traditional retrieval-augmented generation. Production systems need both graph-native representations for structured facts and retrieval pipelines for unstructured knowledge.
AI sovereignty means organizations can run capable AI systems on infrastructure they control, avoiding strategic dependency on external API providers. The Swiss AI Hub demonstrates this through open-source platforms built on open-weight models deployed in organizational infrastructure, using techniques like quantization and parameter-efficient fine-tuning to make self-hosted models viable.
The agentic layer is the scaffolding around language models that makes them useful inside organizations: tool access, memory, planning, orchestration, and access controls. Real challenges aren't about model intelligence but system reliability: authentication, graceful failure, decision traceability, and auditability. The gap between theoretical capabilities and production deployment is primarily an infrastructure problem.
Testing 21 language models against Swiss SmartVote voting recommendations revealed most cluster around progressive, socially liberal positions with 60-90% agreement rates depending on the party. This empirical observation about what models absorb from training data matters because these systems increasingly mediate information access.
ConceptFormer is a method from Joel's master's thesis that grounds language models in knowledge graphs by injecting graph topology directly into LLM embedding space, bypassing lossy text serialization. A single concept vector improves factual recall by 213%, and preserving graph structure in latent space outperforms text conversion while using 130x fewer tokens. Published at WWW'26 GLOW workshop.
Copyright 2026 - Joel P. Barmettler