Joel P. Barmettler

AI Architect & Researcher

2023·Webinar

ChatGPT demystified

This webinar strips ChatGPT down to its mathematical foundations. The goal is to show that behind the apparent magic sits statistics, albeit very clever statistics.

Machine learning fundamentals

Every ML system is a mathematical function mapping input to output. The simplest case is linear regression, y = ax + b, where training means adjusting a and b to fit known data points. That is the entire conceptual core; everything else is scale and architecture.
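What "adjusting a and b to fit known data points" can look like in practice is sketched below. This is a minimal illustration, not code from the webinar: the data points, the learning rate, and the function name `fit_line` are all assumptions chosen for the example, and gradient descent on the mean squared error is just one common way to do the fitting.

```python
# Toy data generated from y = 2x + 1 (values chosen for illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = a*x + b by gradient descent on the mean squared error."""
    a, b = 0.0, 0.0  # start with an arbitrary guess
    n = len(xs)
    for _ in range(steps):
        # Prediction errors under the current parameters.
        errors = [(a * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to a and b.
        grad_a = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Nudge both parameters in the direction that reduces the error.
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b


a, b = fit_line(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}")
```

On this toy data the loop should recover values very close to a = 2 and b = 1. Training a large model follows the same pattern of "measure the error, nudge the parameters", only with vastly more parameters.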

From ML to AI

Neural networks generalize this idea. They chain many simple functions (neurons) together in layers, and they can approximate virtually any mathematical function without anyone specifying its form in advance. The network discovers the mapping from data alone.
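The idea of chaining simple functions can be made concrete with a very small sketch. The network below has two hidden neurons and computes XOR, a mapping that no single line y = ax + b can represent. The weights here are hand-picked for illustration (an assumption of this example); in a real network they would be found by training, as with a and b above. The names `relu`, `neuron`, and `tiny_network` are hypothetical.

```python
def relu(z):
    """A common nonlinearity: pass positive values through, clip negatives to 0."""
    return max(0.0, z)


def neuron(inputs, weights, bias):
    """One neuron: a weighted sum (like y = ax + b) followed by a nonlinearity."""
    return relu(sum(w * x for w, x in zip(weights, inputs)) + bias)


def tiny_network(x1, x2):
    # Hidden layer: two neurons that both look at the same two inputs.
    h1 = neuron([x1, x2], weights=[1.0, 1.0], bias=0.0)
    h2 = neuron([x1, x2], weights=[1.0, 1.0], bias=-1.0)
    # Output layer: a plain weighted sum of the hidden activations.
    return 1.0 * h1 - 2.0 * h2


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", tiny_network(x1, x2))
```

The output is 1 when exactly one input is 1 and 0 otherwise. Each neuron on its own is barely more than a linear function; the expressive power comes from stacking them with a nonlinearity in between.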

Large language model architecture

ChatGPT is a large language model whose job is to predict the most probable next token given a sequence of prior tokens. It operates over a fixed vocabulary of roughly 100,000 tokens in the GPT-3.5 tokenizer (earlier GPT-3 models used about 50,000). The key architectural innovation is the attention mechanism, which lets the model weigh which input tokens matter most for each prediction. The classic example is the German word "Bank", which can mean either a bench or a financial institution: given an input like "On a bank you can...", attention uses the surrounding words to resolve which sense is meant.
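The weighing step can be sketched as scaled dot-product attention for a single token. This is a simplified illustration under stated assumptions: the two-dimensional vectors are made up by hand (axis 0 loosely stands for "finance", axis 1 for "furniture"), the values simply reuse the keys, and a real model would instead learn high-dimensional query, key, and value vectors during training.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    highest = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    # Similarity between the query and every key, scaled by sqrt(dimension).
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The output is the attention-weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Hypothetical hand-made vectors: axis 0 ~ "finance", axis 1 ~ "furniture".
context = ["you", "can", "deposit", "money"]
keys = [[0.0, 0.0], [0.0, 0.0], [2.0, 0.0], [2.0, 0.0]]
values = keys  # reused for simplicity; real models learn separate values
bank_query = [1.0, 1.0]  # "bank" on its own is ambiguous between both senses

weights, output = attention(bank_query, keys, values)
for token, weight in zip(context, weights):
    print(f"{token:>8}: {weight:.2f}")
print("context vector for 'bank':", [round(x, 2) for x in output])
```

In this toy setup "deposit" and "money" receive most of the weight, so the resulting context vector points along the finance axis. Swapping the context for furniture-like words would pull it the other way.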

Training and data sources

The model was trained on scientific papers, books, web pages, and source code. To turn a raw language model into a chatbot, OpenAI added conversational data from Reddit and dialogues written by hand by paid annotators. The choice of which data to include directly shapes the model's behavior: including or excluding certain Reddit communities, for example, shifts how the system responds to political and social questions. Several analyses have found that ChatGPT's answers lean liberal-progressive, mirroring the prevailing culture of Silicon Valley.

Practical takeaways

ChatGPT is a mathematical function. It does not think or feel. It cannot learn during a conversation: its parameters stay fixed, so every response draws on the original training plus whatever text is in the current conversation. Its core operation is text-to-text transformation; features like PDF processing are application-layer additions. Outputs should always be verified, especially on contested topics, because the model reproduces biases encoded in its training data.

What is the fundamental mechanism behind machine learning?

Machine learning is based on mathematical functions that transform input into output. The simplest form is linear regression (y = ax + b), where parameters are optimized during training. More complex systems like neural networks use many such functions connected together.

How does ChatGPT work technically?

ChatGPT is a large language model that converts text into numerical tokens (word pieces drawn from a vocabulary of about 100,000 entries in the GPT-3.5 tokenizer) and predicts the most probable next token. It uses an attention mechanism to identify and process relevant context information.
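The two steps named here, turning text into token numbers and picking the most probable next token, can be illustrated with a deliberately crude stand-in. The sketch below simply counts which word follows which in a tiny made-up corpus. It is not how ChatGPT computes its probabilities (that is done by a neural network over sub-word tokens), and the corpus and function names are assumptions of this example.

```python
from collections import Counter, defaultdict

# A tiny made-up corpus; real models see vastly more text.
corpus = "the cat sat on the mat . the cat ate the fish .".split()

# Step 1: map every distinct word to a numerical token id.
vocab = {word: idx for idx, word in enumerate(sorted(set(corpus)))}
id_to_word = {idx: word for word, idx in vocab.items()}
ids = [vocab[word] for word in corpus]

# Step 2: count how often each token id is followed by each other token id.
following = defaultdict(Counter)
for current, nxt in zip(ids, ids[1:]):
    following[current][nxt] += 1


def next_token_probabilities(word):
    """Relative frequency of every word that followed `word` in the corpus."""
    counts = following[vocab[word]]
    total = sum(counts.values())
    return {id_to_word[token]: count / total for token, count in counts.items()}


def most_probable_next(word):
    """Pick the single most probable continuation."""
    probabilities = next_token_probabilities(word)
    return max(probabilities, key=probabilities.get)


print(next_token_probabilities("the"))
print(most_probable_next("the"))
```

Here "the" is followed by "cat" in two of four cases, so "cat" is the predicted continuation. A language model answers the same question, "what comes next?", but conditions on the whole preceding sequence rather than on one word.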

How was ChatGPT trained?

ChatGPT was trained on enormous amounts of text, including scientific papers, books, websites, and source code. In addition, conversational data from sources like Reddit and dialogues written by hand by paid workers were used.

What are the key limitations of ChatGPT?

ChatGPT cannot 'think' or 'feel'; it is a mathematical function. It cannot learn during interaction, and its responses are based solely on its original training and the text of the current conversation. Its core function is text-to-text transformation.

How does training data influence ChatGPT's responses?

Data selection during training directly influences the system's behavior. The choice of training data determines the political orientation and values reflected in responses. ChatGPT tends toward liberal-progressive positions, similar to Silicon Valley culture.

What is the attention mechanism?

The attention mechanism is a key component of modern language models that enables the system to recognize which words in the input are most important for predicting the next word. It allows context-dependent interpretation of words.

How does ChatGPT differ from simple machine learning models?

ChatGPT uses complex neural networks instead of simple linear functions. It can take context into account through its attention mechanism and was trained on enormous amounts of data, which enables it to generate human-like text.

Why should one critically evaluate ChatGPT's outputs?

ChatGPT's responses are based on training data and reflect the biases in that data. The system cannot distinguish between truth and fiction, so its outputs should be critically evaluated, especially on controversial topics.


Copyright 2026 - Joel P. Barmettler