📖 Reference

AI Glossary — Key Terms Explained

50+ essential AI and machine learning terms, explained simply. Whether you're just starting out or brushing up before a job interview — this is your reference.


— Terms Starting With A

AI Agent
An autonomous AI system that perceives its environment, makes decisions, and takes actions to achieve specific goals without constant human oversight. Examples range from customer service bots to fully autonomous coding assistants. Agents often combine planning, memory, and tool use.
Artificial Intelligence AI
The broad field of computer science dedicated to building systems that can perform tasks typically requiring human intelligence — including learning, reasoning, problem-solving, language understanding, and perception. Encompasses ML, deep learning, NLP, and computer vision.
Attention Mechanism
A technique that lets AI models focus on the most relevant parts of input data when generating outputs. Like highlighting key sentences in a textbook — the model learns which words or tokens matter most for the task. The backbone of modern transformer models.
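To make this concrete, here is a minimal sketch of scaled dot-product attention for a single query, in plain Python with toy 2-d vectors (real models use large matrices and many attention heads in parallel):

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    Scores each key by similarity to the query, turns the scores
    into weights with softmax, and returns the weighted average
    of the value vectors.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# The query matches the first key, so the output leans
# toward the first value vector.
out = attention(query=[1.0, 0.0],
                keys=[[1.0, 0.0], [0.0, 1.0]],
                values=[[10.0, 0.0], [0.0, 10.0]])
```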
Autoregressive Model
A model that generates sequences one step at a time, using all previously generated tokens as context for each next token. GPT-style language models are autoregressive — they generate text left-to-right, one word at a time.

— Terms Starting With B

Backpropagation
The core algorithm for training neural networks. After each prediction, the error is calculated and propagated backward through the network — adjusting each weight in proportion to how much it contributed to the mistake. Repeated thousands of times until the model improves.
BERT
Bidirectional Encoder Representations from Transformers. A language model by Google that reads text in both directions simultaneously to better grasp context. Powers Google Search's ability to understand nuanced queries. Great for classification and Q&A tasks.
Bias (ML Bias)
Systematic errors or unfair skews in AI model outputs, usually stemming from biased training data or flawed design choices. A hiring model trained on historical data may disadvantage certain demographics if past hiring was biased. Detecting and mitigating bias is a core AI ethics challenge.
Benchmark
A standardized test used to evaluate and compare AI model performance. Common benchmarks include MMLU (knowledge and reasoning), HumanEval (coding), and HELM (holistic evaluation across many language tasks). Benchmarks let researchers measure progress objectively, though models can be "over-optimized" for specific benchmarks.

— Terms Starting With C

Chain-of-Thought CoT
A prompting technique that encourages AI to "think step by step" before answering. Instead of jumping straight to a conclusion, the model reasons through intermediate steps — significantly improving accuracy on math problems, logic puzzles, and complex questions.
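A quick illustration of the difference (the question and phrasing here are made up for the example — the classic trigger phrase "Let's think step by step" is real):

```python
question = "A shirt costs $20 and is discounted 25%. What is the final price?"

# Plain prompt: the model may jump straight to an answer.
plain_prompt = question

# Chain-of-thought prompt: the added instruction elicits intermediate
# reasoning (e.g. "25% of $20 is $5, so $20 - $5 = $15") before the answer.
cot_prompt = question + "\n\nLet's think step by step."
```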
Classification
A supervised learning task where the model assigns input data to predefined categories. Binary classification: spam vs. not spam. Multi-class: identify a handwritten digit (0–9). Real examples: sentiment analysis, image labeling, disease diagnosis.
CNN
Convolutional Neural Network. A deep learning architecture designed for grid-like data (images, video). CNNs apply filters that detect edges, textures, and shapes across the image — then combine detections into higher-level features. Foundation of most computer vision systems.
Computer Vision CV
An AI field enabling computers to understand visual data — images, video, live camera feeds. Key tasks include object detection, face recognition, image segmentation, and optical character recognition (OCR). Powers self-driving cars, medical imaging, and security systems.
Context Window
The maximum number of tokens (words/characters) a model can "see" in a single prompt or conversation. Larger context windows allow models to reference earlier parts of long documents or conversations. GPT-4 Turbo supports ~128k tokens; Claude supports up to 200k.
Clustering
An unsupervised learning technique that groups similar data points together without predefined labels. K-means is a classic algorithm. Use cases: customer segmentation, anomaly detection, document grouping, and gene expression analysis.
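Here is a toy k-means on 1-D points, in plain Python, to show the core loop (real implementations work in many dimensions and handle initialization more carefully):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Tiny 1-D k-means: assign each point to its nearest centroid,
    then recompute each centroid as the mean of its cluster. Repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Two obvious groups, around 1 and around 10 — k-means finds both
# centroids without ever being told which point belongs where.
cents = kmeans([0.9, 1.0, 1.1, 9.9, 10.0, 10.1], k=2)
```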

— Terms Starting With D

Deep Learning DL
A subset of machine learning using multi-layered neural networks to learn complex patterns from large datasets. The "deep" refers to many hidden layers. Drives modern AI capabilities including speech recognition, translation, image generation, and large language models.
Diffusion Model
A generative model that creates images by learning to reverse a noise-adding process. Training: gradually add noise to images until pure static. Generation: start from noise and iteratively denoise. Powers Stable Diffusion, DALL-E 3, and Midjourney.
Dropout
A regularization technique where random neurons are temporarily "switched off" during training. Forces the network to learn more robust, distributed representations instead of relying on specific neurons. Reduces overfitting significantly.
Data Augmentation
Artificially expanding a training dataset by creating modified versions of existing data (e.g., flipping or rotating images, paraphrasing sentences). Helps models generalize better without requiring more labeled data — especially valuable when real data is scarce or expensive.

— Terms Starting With E

Embedding
A dense numerical representation (vector) of data — words, images, users, or documents — that captures semantic meaning. Words with similar meanings have similar embeddings. Embeddings power search, recommendations, and RAG systems. OpenAI's text-embedding-3 is a widely used embedding model.
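Similarity between embeddings is usually measured with cosine similarity. A minimal sketch with made-up 3-d vectors (real embeddings come from a model and have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors:
    1.0 = same direction, 0.0 = unrelated, -1.0 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors: semantically close words get similar embeddings
cat = [0.9, 0.8, 0.1]
kitten = [0.85, 0.75, 0.2]
car = [0.1, 0.2, 0.9]

sim_cat_kitten = cosine_similarity(cat, kitten)
sim_cat_car = cosine_similarity(cat, car)
```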
Epoch
One complete pass through the entire training dataset. Training typically requires multiple epochs. More epochs = more learning, but too many can cause overfitting. Early stopping monitors validation loss to halt training at the optimal point.
AI Ethics
The field examining moral principles in AI design, development, and deployment — covering fairness, transparency, privacy, accountability, and societal impact. Key questions: Who is harmed by biased outputs? Who is accountable when AI makes a mistake? How do we maintain human oversight?
Encoder-Decoder
An architecture where an encoder compresses input into a fixed representation, and a decoder generates output from that representation. Used in translation (input sentence → encoded meaning → translated sentence), image captioning, and summarization.

— Terms Starting With F

Few-Shot Learning
The ability to learn a new task from just a few examples. In LLMs, this means providing 2–5 example input-output pairs in the prompt so the model understands the desired format or task. Dramatically reduces the need for expensive fine-tuning.
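A small sketch of building a few-shot prompt for sentiment labeling (the review texts and format here are invented for illustration):

```python
def few_shot_prompt(examples, query):
    """Build a few-shot prompt: each example pair shows the model
    the desired input-output format before the real query."""
    lines = []
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

prompt = few_shot_prompt(
    [("Loved it, will buy again!", "positive"),
     ("Broke after two days.", "negative")],
    "Exactly what I hoped for.",
)
# The model sees the pattern and completes the final "Sentiment:" line.
```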
Fine-Tuning
Continuing to train a pre-trained model on a smaller, domain-specific dataset to specialize it for a particular task. Fine-tuning GPT-3 on medical data creates a medical assistant. Far more efficient than training from scratch. LoRA is a popular efficient fine-tuning technique.
Foundation Model
A large AI model trained on broad data at massive scale, designed to be adapted for many downstream tasks. GPT-4, Claude, Gemini, and DALL-E are foundation models. The term was coined by Stanford's HAI in 2021. Foundation models have fundamentally shifted how AI is developed.

— Terms Starting With G

Generative AI
AI systems that create new, original content — text, images, audio, video, or code. Unlike discriminative AI that classifies or predicts, generative AI produces novel outputs. Examples: ChatGPT (text), Midjourney (images), Suno (music), GitHub Copilot (code).
GPT
Generative Pre-trained Transformer. OpenAI's family of large language models. Pre-trained on vast internet text, then fine-tuned with human feedback (RLHF). GPT-3.5 powers basic ChatGPT; GPT-4 is the flagship. "Pre-trained" means it learned general knowledge before specializing.
Gradient Descent
The optimization algorithm that trains most neural networks. Imagine descending a mountain in fog — you take steps in the direction that slopes most steeply downward. Each "step" is a weight update proportional to the gradient of the loss function. Adam and SGD are popular variants.
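The whole idea fits in a few lines. A minimal sketch minimizing a one-variable function (real training computes gradients over millions of parameters via backpropagation):

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to walk downhill
    toward a minimum of the function."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
# The true minimum is at x = 3.
minimum = gradient_descent(grad=lambda x: 2 * (x - 3), x0=0.0)
```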
GAN
Generative Adversarial Network. Two networks compete: a Generator creates fake images, a Discriminator tries to distinguish real from fake. Training makes both better. GANs produced the first "photorealistic" AI faces (StyleGAN). Largely superseded by diffusion models for image generation.

— Terms Starting With H

Hallucination
When an AI model generates confident, plausible-sounding but factually incorrect information. A model might fabricate a citation, invent a historical event, or state incorrect statistics. RAG reduces hallucination by grounding responses in retrieved documents. A core limitation of current LLMs.
Hyperparameter
Configuration values set before training that control how a model learns — not learned from data. Key hyperparameters: learning rate (how big each update step is), batch size (how many examples per step), and number of layers. Hyperparameter tuning is a major part of ML engineering.
Human Feedback (RLHF)
Reinforcement Learning from Human Feedback. After pre-training, human raters rank model outputs; a reward model learns these preferences; the language model is fine-tuned to maximize rewards. RLHF made ChatGPT dramatically more helpful and less harmful than base GPT-3.

— Terms Starting With I

Inference
Using a trained model to make predictions on new data. Training is expensive and done once; inference happens in production, continuously. Inference optimization (quantization, batching, caching) is critical for cost and latency in deployed AI products.
In-Context Learning
A capability of large language models to learn from examples provided directly in the prompt — without updating model weights. Provide a few input-output examples and the model infers the pattern. Enables rapid prototyping of new AI tasks without fine-tuning.

— Terms Starting With L

Large Language Model LLM
A neural network with billions or trillions of parameters, trained on massive text corpora to understand and generate human language. Examples: GPT-4, Claude 3, Gemini, Llama. LLMs can answer questions, write code, summarize documents, and perform tasks they were never explicitly trained on.
Learning Rate
A hyperparameter controlling how much model weights change in each training step. Too high: training is unstable and overshoots. Too low: training is glacially slow or gets stuck. Learning rate schedules (warmup → decay) help find the sweet spot during training.
Loss Function
A mathematical function measuring how wrong a model's predictions are. The lower the loss, the better. Training minimizes the loss function via gradient descent. Cross-entropy loss is common for classification; MSE for regression. The loss function defines what "correct" means for your model.
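Both losses mentioned above are short enough to write out. A sketch in plain Python (binary cross-entropy shown; multi-class versions generalize the same idea):

```python
import math

def mse(y_true, y_pred):
    """Mean squared error, the standard regression loss."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(y_true, y_pred):
    """Binary cross-entropy: punishes confident wrong predictions
    far harder than unsure ones."""
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y_true, y_pred)) / len(y_true)

# Predicting 0.99 for a true label of 0 costs much more
# than an unsure 0.6 for the same label.
confident_wrong = cross_entropy([0], [0.99])
unsure_wrong = cross_entropy([0], [0.6])
```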
LoRA
Low-Rank Adaptation. An efficient fine-tuning technique that adds small trainable matrices to frozen model weights rather than updating the entire model. Reduces memory and compute requirements by 10–100x while achieving comparable results. The dominant method for fine-tuning LLMs on consumer hardware.

— Terms Starting With M

Machine Learning ML
A subfield of AI where systems learn from data to make predictions or decisions without being explicitly programmed for each scenario. The model finds patterns in training examples and generalizes to new inputs. Three main types: supervised, unsupervised, and reinforcement learning.
Model (AI Model)
The mathematical representation resulting from training — a set of learned parameters (weights) that transform inputs into outputs. "Deploying a model" means making these learned parameters available for inference in production. Model size is often measured in number of parameters.
Multimodal AI
AI systems that process and understand multiple input types simultaneously — text, images, audio, and video. GPT-4o and Gemini Ultra are multimodal — you can describe an image, ask about a photo of a dish, or have a voice conversation. Multimodality is the direction all major AI labs are moving.

— Terms Starting With N

NLP
Natural Language Processing. The branch of AI enabling computers to understand, interpret, and generate human language. Subtasks include tokenization, parsing, named entity recognition, sentiment analysis, translation, and question answering. Modern NLP is dominated by transformer-based LLMs.
Neural Network
A computing system loosely inspired by the brain, composed of interconnected nodes (neurons) organized in layers. Each neuron applies a weighted sum + activation function. Layers progressively learn more abstract features. Deep networks (many layers) = deep learning. The foundation of modern AI.
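A single neuron really is just "weighted sum + activation function". A minimal sketch with a sigmoid activation (the weights here are arbitrary; in a real network they are learned):

```python
import math

def neuron(inputs, weights, bias):
    """One neuron: weighted sum of inputs plus bias,
    squashed into (0, 1) by a sigmoid activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-z))

# A layer is many neurons over the same inputs; a network stacks layers.
out = neuron([0.5, -1.0], weights=[0.8, 0.2], bias=0.1)
```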

— Terms Starting With O

Overfitting
When a model learns the training data too precisely — including noise and irrelevant details — and fails to generalize to new examples. Analogy: memorizing answers vs. understanding concepts. Fix: more data, dropout, regularization, or early stopping.
Open-Source AI
AI models whose weights and architecture are publicly released. Meta's Llama 2/3, Mistral, and Falcon are major open-source LLMs. Open models can be run locally, fine-tuned privately, and deployed without API costs. Trade-off vs. proprietary models: often slightly less capable but much more flexible.

— Terms Starting With P

Parameter
A learned numerical weight inside a neural network, adjusted during training. A model with "70 billion parameters" has 70B such values. More parameters → more capacity to model complex patterns, but also higher compute and memory requirements. Efficient models aim for maximum capability per parameter.
Prompt
The text input given to an AI model to guide its output. A prompt can be a question, instruction, few-shot examples, or structured template. The quality of the prompt directly determines the quality of the output — "garbage in, garbage out" applies here. Prompting is a learnable skill.
Prompt Engineering
The practice of crafting and optimizing prompts to reliably get desired outputs from AI models. Techniques include: role-play ("You are an expert..."), chain-of-thought reasoning, few-shot examples, output formatting instructions, and temperature control. A core skill for working with LLMs effectively.
Pre-training
The initial phase of training large models on massive, general datasets (billions of web pages, books, code). Pre-training gives models broad knowledge before task-specific fine-tuning. It's extremely expensive — GPT-4 pre-training reportedly cost ~$100M. Most practitioners fine-tune pre-trained models rather than training from scratch.

— Terms Starting With Q

Quantization
Compressing model size by reducing the precision of weights (e.g., from 32-bit floats to 4-bit integers). 4-bit quantized models run at 8x less memory with minimal quality loss. Enables running large models on consumer GPUs. GGUF format (used by llama.cpp) relies heavily on quantization.
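A toy sketch of the basic idea — uniform quantization of a handful of weights onto a 4-bit grid and back (production schemes like GGUF use per-block scales and smarter rounding):

```python
def quantize(weights, bits=4):
    """Map floats onto a small integer grid (4 bits = 16 levels)."""
    levels = 2 ** bits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels
    ints = [round((w - lo) / scale) for w in weights]
    return ints, scale, lo

def dequantize(ints, scale, lo):
    """Recover approximate floats from the integer grid."""
    return [i * scale + lo for i in ints]

weights = [0.12, -0.53, 0.97, -0.08, 0.44]
ints, scale, lo = quantize(weights)
restored = dequantize(ints, scale, lo)
# Each restored weight is close to, but not exactly, the original —
# that small error is the price paid for 8x less memory.
```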
Q-Learning
A foundational reinforcement learning algorithm where an agent learns action values (Q-values) for each state-action pair. Used in game-playing AI, robotics, and recommendation systems. Deep Q-Networks (DQN) combine Q-learning with neural networks — how DeepMind's Atari-playing AI worked.

— Terms Starting With R

RAG
Retrieval-Augmented Generation. Enhances LLM responses by first retrieving relevant documents from a knowledge base, then generating a response grounded in those documents. Reduces hallucination and enables LLMs to access up-to-date or private information. Built with vector databases and embedding models.
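A heavily simplified sketch of the retrieve-then-generate flow. The retriever here ranks by word overlap purely for illustration — a real RAG system uses embeddings and a vector database — and the documents are invented:

```python
def retrieve(query, documents, top_k=2):
    """Toy retriever: rank documents by word overlap with the query.
    (Real systems use embedding similarity instead.)"""
    q_words = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_rag_prompt(query, documents):
    """Ground the LLM's answer in the retrieved context."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "The warranty period for all products is 24 months.",
    "Our office is closed on public holidays.",
    "Returns are accepted within 30 days of purchase.",
]
prompt = build_rag_prompt("How long is the warranty period?", docs)
# `prompt` would then be sent to an LLM for the final, grounded answer.
```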
Regression
A supervised learning task predicting continuous numerical outputs. Predict house price from features, forecast tomorrow's temperature, estimate customer lifetime value. Linear regression is the simplest form; neural networks handle non-linear relationships. Distinct from classification (categories vs. numbers).
Reinforcement Learning RL
Learning through interaction with an environment and receiving reward signals. No labeled data — the agent discovers what works by trial and error. How AlphaGo mastered Go, how ChatGPT was aligned with RLHF, and how robots learn to walk. Formalized as a Markov Decision Process (MDP).
RNN
Recurrent Neural Network. Designed for sequential data where order matters — processes inputs one step at a time, maintaining a hidden state. LSTM and GRU are improved variants solving the vanishing gradient problem. Largely replaced by transformers for language tasks, but still used in time series and audio.

— Terms Starting With S

Supervised Learning
Training on labeled input-output pairs where the correct answer is known. The model learns to map inputs to outputs by studying examples. Most commercial ML is supervised: image classifiers, spam filters, recommendation systems. Contrast with unsupervised learning (no labels) and RL (reward signals).
Sentiment Analysis
An NLP task determining the emotional tone of text — positive, negative, neutral, or fine-grained emotions. Used to analyze customer reviews, monitor brand reputation, and gauge public opinion on social media. Earlier sentiment models relied on LSTMs and later fine-tuned BERT; today GPT-style LLMs handle nuanced sentiment out of the box.
System Prompt
A special instruction given to an LLM before the user conversation begins, setting its persona, behavior rules, and constraints. "You are a helpful customer service agent for Acme Corp. Only answer questions about our products." System prompts shape how the model behaves throughout the entire interaction.
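In chat APIs, the system prompt is typically the first message in the conversation. An OpenAI-style message list as an illustration (the exact format varies by provider; "Acme Corp" is the example from the definition above):

```python
# The system message sets persona and rules before any user turn.
messages = [
    {"role": "system",
     "content": ("You are a helpful customer service agent for Acme Corp. "
                 "Only answer questions about our products.")},
    {"role": "user",
     "content": "Do you sell replacement parts?"},
]
# This list would be passed to a chat completion API call.
```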

— Terms Starting With T

Temperature
A parameter controlling output randomness. Temperature 0: always pick the most likely next token (deterministic). Temperature 1: sample from the model's unmodified distribution (varied, creative). Temperature >1: increasingly random and chaotic. Set it low for code and factual answers; higher for creative writing.
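Under the hood, temperature divides the model's raw scores (logits) before the softmax that turns them into probabilities. A small sketch with made-up logits:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities. Low temperature
    sharpens the distribution; high temperature flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cold = softmax_with_temperature(logits, temperature=0.2)
hot = softmax_with_temperature(logits, temperature=2.0)
# cold concentrates nearly all probability on the top token;
# hot spreads probability much more evenly across all tokens.
```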
Token
The basic unit of text an LLM processes. Roughly 1 token ≈ ¾ of a word in English. A token is not the same as a word: "running" might be 1 token, while "tokenization" might be 2–3. Model pricing, context limits, and latency are all measured in tokens. ~$0.01–$0.03 per 1,000 tokens for GPT-4.
Training Data
The examples used to teach a model. Data quality matters more than quantity for most tasks. Common issues: distribution shift (training data doesn't reflect real-world), label noise, imbalanced classes. "Data-centric AI" emphasizes improving data over tuning model architecture.
Transfer Learning
Adapting a pre-trained model to a new domain or task using less data and compute than training from scratch. The model "transfers" knowledge learned during pre-training. The dominant paradigm in modern AI — almost no one trains from scratch anymore. LoRA is a popular parameter-efficient transfer learning technique for LLMs.
Transformer
The neural network architecture introduced in "Attention Is All You Need" (Vaswani et al., 2017) that revolutionized AI. Replaces sequential processing with parallel self-attention, enabling models to process entire sequences simultaneously. Powers GPT, BERT, Claude, Gemini, and essentially all modern LLMs.

— Terms Starting With U

Underfitting
When a model is too simple to capture patterns in the data — high error on both training and test sets. Opposite of overfitting. Causes: model too small, not enough training, too much regularization. Fix: use a larger model, train longer, or reduce regularization.
Unsupervised Learning
Learning patterns in data without labeled examples. The model discovers structure on its own. Key tasks: clustering (grouping similar items), dimensionality reduction (PCA, UMAP), density estimation, and anomaly detection. Autoencoders and GANs are deep unsupervised learning architectures.

— Terms Starting With V

Vector Database
A database optimized for storing and querying high-dimensional embedding vectors. Instead of exact keyword matching, vector DBs find semantically similar items by computing cosine similarity between vectors. Powers RAG systems, semantic search, and recommendation engines. Popular options: Pinecone, Weaviate, Qdrant, pgvector.
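The core query a vector database answers is "find the k stored vectors most similar to this one". A brute-force sketch with toy vectors (real vector DBs use approximate indexes such as HNSW to do this over millions of vectors):

```python
import math

def top_k_similar(query_vec, stored, k=2):
    """Brute-force nearest-neighbor search by cosine similarity."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(y * y for y in b)))
    ranked = sorted(stored.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# Toy "document embeddings" — in practice these come from an embedding model
stored = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.3, 0.1],
    "doc_taxes": [0.0, 0.1, 0.95],
}
hits = top_k_similar([0.85, 0.2, 0.05], stored)
```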

— Terms Starting With Z

Zero-Shot Learning
Performing a task from pure instruction, with no example inputs or outputs in the prompt. Modern LLMs can often classify, translate, or reason about topics they weren't explicitly fine-tuned for. Contrast: few-shot provides 2–5 examples; zero-shot uses none. Demonstrates genuine generalization capability.

How well do you know these terms?

Take our free AI Skills Assessment quiz and find out exactly where you stand — and what to learn next.

Take Free Quiz → Read the AI Guide →
