This webinar strips ChatGPT down to its mathematical foundations. The goal is to show that behind the apparent magic sits statistics, albeit very clever statistics.
Every ML system is a mathematical function mapping input to output. The simplest case is linear regression, y = ax + b, where training means adjusting a and b to fit known data points. That is the entire conceptual core; everything else is scale and architecture.
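To make the "training means adjusting a and b" idea concrete, here is a minimal sketch that fits y = ax + b by gradient descent on the mean squared error. The data points are hypothetical (generated from y = 2x + 1), and the learning rate and iteration count are arbitrary choices, not values from the webinar.

```python
# Hypothetical data points sampled from the line y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

a, b = 0.0, 0.0   # start from arbitrary parameters
lr = 0.01         # learning rate (step size)

for _ in range(5000):
    # Gradients of the mean squared error with respect to a and b
    grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    a -= lr * grad_a  # nudge each parameter against its gradient
    b -= lr * grad_b

print(round(a, 2), round(b, 2))  # converges toward a ≈ 2, b ≈ 1
```

The same loop, scaled up to billions of parameters and run over text instead of five points, is conceptually what "training" means throughout the rest of this summary.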
Neural networks generalize this idea. They chain many simple functions (neurons) together in layers, and they can approximate virtually any mathematical function without anyone specifying its form in advance. The network discovers the mapping from data alone.
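The "chain of simple functions" can be sketched directly: each neuron is a weighted sum passed through a nonlinearity, and a layer is just a list of neurons applied to the same inputs. The weights below are hypothetical and untrained; the point is only the shape of the computation.

```python
import math

def neuron(inputs, weights, bias):
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(s)  # the nonlinearity lets chained layers bend the mapping

def layer(inputs, weight_matrix, biases):
    # One output per neuron; every neuron sees all the layer's inputs
    return [neuron(inputs, w, b) for w, b in zip(weight_matrix, biases)]

x = [0.5, -1.0]
hidden = layer(x, [[1.0, -0.5], [0.3, 0.8]], [0.0, 0.1])  # first layer: 2 neurons
output = layer(hidden, [[0.7, -1.2]], [0.05])             # second layer: 1 neuron
print(output)
```

Without the nonlinearity, stacking layers would collapse back into a single linear function; with it, enough neurons can approximate virtually any mapping, which is the universal-approximation idea the paragraph above refers to.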
ChatGPT is a large language model whose job is to predict the most probable next token given a sequence of prior tokens. It operates over a vocabulary of roughly 32,000 tokens. The key architectural innovation is the attention mechanism, which lets the model weigh which input tokens matter most for each prediction. When the input contains "At the bank you can...", attention resolves from the surrounding context whether "bank" means a riverbank or a financial institution.
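A minimal sketch of dot-product attention shows how "weighing which tokens matter" works: each query scores every input position, a softmax turns the scores into weights that sum to one, and the output is the weighted average of the input vectors. The two-dimensional vectors below are hypothetical toy embeddings, not anything from a real model.

```python
import math

def softmax(scores):
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    d = len(query)
    # Score each position by similarity of query and key, scaled by sqrt(d)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)             # how much each token matters
    # Output: weighted average of the value vectors
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [0.0, 2.0]                        # most similar to the second key
out = attention(query, keys, values)
print(out)
```

In the toy run the query aligns with the second key, so the output is pulled toward the second value vector; in a real model this is how the representation of "bank" gets pulled toward its river or finance reading by the surrounding words.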
The model was trained on scientific papers, books, web pages, and source code. To turn a raw language model into a chatbot, OpenAI added conversational data from Reddit and from dialogues manually written by paid annotators. The choice of which data to include directly shapes the model's behavior: including or excluding certain Reddit communities shifts how the system responds to political and social questions. Analyses show that ChatGPT leans liberal-progressive, reflecting Silicon Valley's prevailing culture.
ChatGPT is a mathematical function. It does not think or feel. It cannot learn during a conversation; every response draws on the original training. Its core operation is text-to-text transformation; features like PDF processing are application-layer additions. Outputs should always be verified, especially on contested topics, because the model reproduces biases encoded in its training data.
Machine learning is based on mathematical functions that transform input into output. The simplest form is linear regression (y = ax + b), where parameters are optimized during training. More complex systems like neural networks use many such functions connected together.
ChatGPT is a large language model that converts words into numerical tokens (about 32,000) and predicts the most probable next token. It uses an attention mechanism to identify and process relevant context information.
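The "predict the most probable next token" step can be sketched in a few lines: the model emits a raw score (logit) per vocabulary token, a softmax converts the scores to probabilities, and the highest-probability token wins. The four-word vocabulary and the logits are hypothetical; real vocabularies have tens of thousands of entries.

```python
import math

vocab = ["the", "cat", "sat", "mat"]
logits = [1.2, 0.3, 2.5, -0.7]  # hypothetical raw scores, one per token

# Softmax: exponentiate (shifted by the max for stability), then normalize
m = max(logits)
exps = [math.exp(l - m) for l in logits]
probs = [e / sum(exps) for e in exps]

next_token = vocab[probs.index(max(probs))]
print(next_token)  # → sat, the token with the highest score
```

Greedy argmax is only the simplest decoding strategy; deployed chatbots typically sample from these probabilities instead, which is why the same prompt can yield different answers.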
ChatGPT was trained on enormous amounts of text, including scientific papers, books, websites, and source code. In addition, conversational data from sources such as Reddit, along with dialogues manually written by paid annotators, was used.
ChatGPT cannot 'think' or 'feel': it is a mathematical function. It cannot learn during interaction; its responses are based solely on its original training. Its core function is text-to-text transformation.
Data selection during training directly influences the system's behavior. The choice of training data determines the political orientation and values reflected in responses. ChatGPT tends toward liberal-progressive positions, similar to Silicon Valley culture.
The attention mechanism is a key component of modern language models that enables the system to recognize which words in the input are most important for predicting the next word. It allows context-dependent interpretation of words.
ChatGPT uses complex neural networks instead of simple linear functions. It can understand and process context through its attention mechanism and was trained on enormous amounts of data, enabling it to generate human-like text.
ChatGPT's responses are based on its training data and reflect the biases encoded in it. The system cannot distinguish truth from fiction, so its outputs should be critically evaluated, especially on controversial topics.
Copyright 2026 - Joel P. Barmettler