LLM Cheatsheet: Essential Concepts and Architectures
Posted on Nov 18, 2025 in Mathematics and Computer Science
Everything you need to recall about Large Language Models (LLMs) — summarized on one page.
What is a Large Language Model (LLM)?
A Large Language Model (LLM) is a deep learning model trained on vast amounts of text data to understand, generate, and manipulate human language.
- Examples: GPT-4, Claude, LLaMA, Gemini, Mistral, Falcon
- Core Technology: The Transformer Architecture
LLM Core Components
| Component | Description |
|---|---|
| Tokenizer | Converts text into tokens (subwords, words) for numerical input. |
| Embedding | Maps tokens to vectors in high-dimensional space. |
| Transformer | Uses attention mechanisms to understand context and sequence relationships. |
| Decoder | Generates the next tokens based on context (e.g., in GPT models). |
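The tokenizer → embedding steps in the table can be sketched with a toy example. The vocabulary, embedding values, and dimension below are invented for illustration; real LLMs use learned subword vocabularies (e.g., BPE or SentencePiece) and embedding dimensions in the thousands.

```python
# Toy tokenizer -> embedding pipeline (vocabulary and vectors are made up).
import random

vocab = {"<unk>": 0, "large": 1, "language": 2, "models": 3, "are": 4, "fun": 5}

def tokenize(text: str) -> list[int]:
    """Map whitespace-separated words to integer token IDs."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

random.seed(0)
dim = 4  # embedding dimension (real models use hundreds or thousands)
embedding_table = [[random.uniform(-1, 1) for _ in range(dim)] for _ in vocab]

def embed(token_ids: list[int]) -> list[list[float]]:
    """Look up a dense vector for each token ID."""
    return [embedding_table[t] for t in token_ids]

ids = tokenize("Large language models are fun")
vectors = embed(ids)  # one 4-dimensional vector per token
```

Unknown words fall back to the `<unk>` token, which is why real tokenizers prefer subwords: they can compose rare words from known pieces instead of discarding them.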
Key Concepts in LLM Architecture
| Term | Meaning |
|---|---|
| Attention | Focuses on relevant parts of the input (via learned weights). |
| Self-Attention | Mechanism where each word attends to every other word in the sequence. |
| Positional Encoding | Adds order information to input tokens (since Transformers are permutation-invariant). |
| Parameters | Weights learned during training (e.g., GPT-3: 175B). |
| Context Window | Maximum number of tokens the model can process simultaneously (e.g., GPT-4: ~128k tokens). |
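Self-attention from the table above can be demonstrated in a few lines of pure Python. This is scaled dot-product attention with Q = K = V set to the raw token vectors for simplicity; real Transformer layers first apply learned projection matrices (W_Q, W_K, W_V), which this sketch omits.

```python
# Minimal scaled dot-product self-attention: each position becomes a
# softmax-weighted mix of every position's value vector.
import math

def softmax(xs: list[float]) -> list[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x: list[list[float]]) -> list[list[float]]:
    """x: token vectors; Q = K = V = x here (no learned projections)."""
    d = len(x[0])
    out = []
    for q in x:
        # similarity of this query against every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)  # weights sum to 1 across positions
        # weighted sum of value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = self_attention(tokens)  # same shape as input, context-mixed
```

Note that nothing here depends on token order, which is exactly why positional encodings are needed: without them, a permutation of the input rows simply permutes the output rows.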
How Large Language Models Work (Simplified)
- Input text is tokenized.
- Tokens are converted to embeddings.
- Embeddings are passed through a stack of Transformer layers (attention plus feed-forward blocks).
- The model predicts the next token (autoregressive generation).
- The process repeats until a stop condition or maximum length is reached.
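The loop described above can be sketched as follows. `fake_model` is a stand-in for a real next-token predictor (it just returns canned continuations); the point is the control flow: predict, append, repeat until a stop token or the length limit.

```python
# Sketch of the autoregressive generation loop.

def fake_model(tokens: list[str]) -> str:
    """Toy 'model': deterministic canned continuations for the demo."""
    continuations = {"Once": "upon", "upon": "a", "a": "time", "time": "<eos>"}
    return continuations.get(tokens[-1], "<eos>")

def generate(prompt: list[str], max_tokens: int = 10) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_tokens):
        next_token = fake_model(tokens)  # predict the next token
        if next_token == "<eos>":        # stop condition reached
            break
        tokens.append(next_token)        # feed the longer sequence back in
    return tokens

result = generate(["Once"])  # ['Once', 'upon', 'a', 'time']
```

Because each step re-reads the whole sequence, generation cost grows with output length, which is one reason long context windows are expensive at inference time.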
Common LLM Architectures
- Decoder-only: Used primarily for generation (e.g., GPT, LLaMA).
- Encoder-only: Used for understanding tasks like classification or search (e.g., BERT).
- Encoder-Decoder: Used for sequence-to-sequence tasks like translation or summarization (e.g., T5, BART).
LLM Training Phases
| Phase | Goal |
|---|---|
| Pretraining | Self-supervised next-token prediction on massive text datasets (web, books, code). |
| Finetuning | Task-specific tuning using labeled data (e.g., summarization, Q&A). |
| RLHF (Reinforcement Learning from Human Feedback) | Aligns model behavior with human preferences via a learned reward model. |
Important Generation Parameters
| Hyperparameter | Role |
|---|---|
| Temperature | Controls randomness (lower values → more deterministic/precise output; higher values → more diverse/creative output). |
| Top-K | The model selects the next token only from the top K most probable tokens. |
| Top-P (Nucleus Sampling) | The model samples from the smallest set of tokens whose cumulative probability reaches the threshold p (0 < p ≤ 1). |
| Max Tokens | Sets the limit on the response length generated by the model. |
| Stop Tokens | Specific tokens that, when generated, immediately end the sequence. |
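These knobs interact in one sampling step, which the sketch below makes concrete. The five-token `logits` are invented; the flow (temperature-scale → softmax → top-k cut → top-p cut → sample) mirrors how common inference stacks apply these filters, though real implementations differ in detail.

```python
# Toy next-token sampling combining temperature, top-k, and top-p.
import math
import random

def sample(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    # Temperature rescales logits before softmax (lower -> sharper).
    scaled = [l / max(temperature, 1e-8) for l in logits]
    m = max(scaled)
    probs = [math.exp(s - m) for s in scaled]
    total = sum(probs)
    probs = [p / total for p in probs]

    ranked = sorted(range(len(probs)), key=lambda i: -probs[i])
    if top_k is not None:
        ranked = ranked[:top_k]          # keep only the K most probable
    if top_p is not None:                # nucleus: smallest set with mass >= p
        kept, mass = [], 0.0
        for i in ranked:
            kept.append(i)
            mass += probs[i]
            if mass >= top_p:
                break
        ranked = kept

    kept_probs = [probs[i] for i in ranked]
    total = sum(kept_probs)              # renormalize over survivors
    return rng.choices(ranked, weights=[p / total for p in kept_probs])[0]

logits = [2.0, 1.0, 0.5, 0.1, -1.0]
token = sample(logits, temperature=0.7, top_k=3, top_p=0.9)
```

With `top_k=3`, tokens 3 and 4 can never be sampled regardless of temperature, which is the practical difference between the two filters: top-k fixes the candidate count, top-p fixes the candidate probability mass.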
Prompt Engineering Techniques
- Zero-shot Prompting: Direct task instruction without providing any examples.
- Few-shot Prompting: Providing the task instruction along with several input/output examples.
- Chain-of-Thought (CoT): Instructing the model to reason step-by-step before answering.
- ReAct: Combining Reasoning (CoT) and Action (using external tools).
- System Prompt: Setting the initial behavior, persona, and constraints for the model.
- Temperature: Lower temperature yields precise results; higher temperature yields creative results.
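The first three techniques are, at bottom, string construction. A minimal sketch, with an invented sentiment-classification task and made-up examples:

```python
# Building zero-shot, few-shot, and chain-of-thought prompts as plain strings.

system_prompt = "You are a concise sentiment classifier."

# Zero-shot: instruction only, no examples.
zero_shot = "Classify the sentiment of: 'I loved this movie.'"

# Few-shot: prepend a handful of input/output demonstrations.
examples = [
    ("The food was cold.", "negative"),
    ("What a wonderful day!", "positive"),
]
few_shot = "\n".join(f"Text: {t}\nSentiment: {s}" for t, s in examples)
few_shot += "\nText: 'I loved this movie.'\nSentiment:"

# Chain-of-thought: ask for intermediate reasoning before the answer.
cot = zero_shot + " Think step by step before giving the final label."

# The system prompt is typically sent separately, but conceptually it
# just frames everything that follows.
full_prompt = f"{system_prompt}\n\n{few_shot}"
```

The few-shot examples teach the output format as much as the task: ending on `Sentiment:` nudges the model to complete with a single label.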
LLM Customization Methods: Fine-tuning, RAG, Prompting
| Method | Primary Use Case |
|---|---|
| Prompting | Using the base model effectively via clever input design. |
| Fine-tuning | Training new model weights on specific, task-oriented data. |
| RAG (Retrieval-Augmented Generation) | Combining the LLM with external, up-to-date knowledge sources (e.g., documents, databases) via search. |
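The RAG row can be illustrated end to end with a toy retriever. The documents are invented, and bag-of-words cosine similarity stands in for the dense vector search (embeddings + vector database) that production RAG systems use.

```python
# Minimal RAG flow: retrieve the most similar document, then prepend
# it to the prompt as context.
import math
from collections import Counter

docs = [
    "The Eiffel Tower is in Paris and was completed in 1889.",
    "Python is a programming language created by Guido van Rossum.",
]

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str) -> str:
    """Return the document most similar to the query (bag-of-words)."""
    q = Counter(query.lower().split())
    return max(docs, key=lambda d: cosine(q, Counter(d.lower().split())))

def build_prompt(query: str) -> str:
    context = retrieve(query)
    return f"Context: {context}\n\nQuestion: {query}\nAnswer:"

prompt = build_prompt("Who created Python?")
```

The LLM itself is unchanged; RAG moves the knowledge problem into the prompt, which is why it is the usual answer to stale or private data without retraining.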
Understanding Memory in LLMs
- Stateless: The model has no memory of previous chat turns outside the current prompt.
- Contextual Memory: Past conversation history is included within the current prompt (limited by context window size).
- Long-term Memory: Achieved via custom extensions or tools that store and retrieve past interactions (e.g., vector databases).
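Contextual memory is literally string concatenation under a budget. A sketch, with an artificially tiny window (measured in whitespace-split words rather than real tokens) so the truncation logic is visible:

```python
# Contextual memory: replay chat history inside each new prompt,
# dropping the oldest turns once the (toy) window budget is exceeded.

MAX_TOKENS = 30  # pretend context window, counted in whitespace words

history: list[tuple[str, str]] = []  # (role, message) pairs

def add_turn(role: str, message: str) -> None:
    history.append((role, message))

def build_prompt(user_message: str) -> str:
    turns = history + [("user", user_message)]
    # Drop oldest turns until the prompt fits the window.
    while sum(len(m.split()) for _, m in turns) > MAX_TOKENS and len(turns) > 1:
        turns.pop(0)
    return "\n".join(f"{role}: {msg}" for role, msg in turns)

add_turn("user", "My name is Ada.")
add_turn("assistant", "Nice to meet you, Ada!")
prompt = build_prompt("What is my name?")  # history still fits, so Ada survives
```

Once the window overflows, the oldest turns vanish from the model's view entirely, which is precisely the failure mode that long-term memory (summarization, vector-store retrieval) is meant to patch.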
Key Limitations of LLMs
- Hallucinations: Generating confident but factually incorrect information.
- Lack of real-world awareness (unless external tools or RAG are utilized).
- Can reflect biases present in the training data.
- Costly to train and slower performance with longer context windows.
LLM Evaluation Metrics
| Metric | Description |
|---|---|
| Perplexity | Measures how well the model predicts the next token (lower is better). |
| BLEU/ROUGE | Used to measure the similarity of generated text to reference text (common in translation/summarization). |
| Accuracy/F1 Score | Standard metrics used for classification and structured prediction tasks. |
| Human Evaluation | Actual human quality rating based on relevance, coherence, and helpfulness. |
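Perplexity from the table has a one-line definition worth internalizing: the exponential of the mean negative log-likelihood of the tokens. The per-token probabilities below are invented to show the contrast.

```python
# Perplexity = exp(mean negative log-likelihood) over predicted tokens.
import math

def perplexity(token_probs: list[float]) -> float:
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

confident = perplexity([0.9, 0.8, 0.95])  # model predicted well -> low value
uncertain = perplexity([0.1, 0.2, 0.05])  # model predicted poorly -> high value
```

A useful sanity check: a model that assigns every token probability 1/N has perplexity exactly N, so perplexity reads as "the model is as confused as if it were choosing uniformly among this many options."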
Practical LLM Use Cases
- Chatbots and Writing Assistants
- Code Generation (e.g., using models like Codex or DeepSeek)
- Document Summarization and Information Extraction
- Search and Retrieval Systems
- Autonomous Agents and Workflow Automation
- Data Analysis, SQL Generation, and Mathematical Reasoning
LLM Ecosystem: Popular Tools and Frameworks
| Tool/Library | Primary Use Case |
|---|---|
| Hugging Face | Platform for hosting, sharing, and training models. |
| LangChain | Framework for connecting LLMs with tools, memory, and data sources. |
| LlamaIndex | Specialized framework for Retrieval-Augmented Generation (RAG). |
| OpenAI API | Programmatic access to models like GPT-3.5 and GPT-4. |
| Transformers (HF) | Python library for using pretrained models in PyTorch, TensorFlow, or JAX. |
| Gradio/Streamlit | Tools for quickly building interactive LLM user interfaces (UIs). |
LLM Agents: Combining LLMs with Tools
Agents utilize LLMs to make complex decisions, plan workflows, and take actions in dynamic environments.
- Agents use external tools (e.g., calculators, web search APIs).
- They follow structured plans (e.g., ReAct, Tree-of-Thought (ToT)).
- They can chain multiple steps, utilize memory, and incorporate feedback loops.
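The ReAct-style loop described above can be sketched with a scripted stand-in for the LLM. The tool name, the `Thought`/`Action`/`Final Answer` format, and the canned turns are invented for the demo; the structure (parse the model's action, run the tool, feed the observation back) is the core of the pattern.

```python
# Toy ReAct-style agent loop with one tool and a scripted 'LLM'.

def calculator(expression: str) -> str:
    # Restricted eval for the demo; a real agent would sandbox this properly.
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"calculator": calculator}

scripted_llm_turns = iter([
    "Thought: I need to compute 17 * 23.\nAction: calculator[17 * 23]",
    "Thought: I have the result.\nFinal Answer: 391",
])

def run_agent(max_steps: int = 5) -> str:
    observation = ""
    for _ in range(max_steps):
        turn = next(scripted_llm_turns)   # a real agent would call the LLM
        if "Final Answer:" in turn:
            return turn.split("Final Answer:")[1].strip()
        tool, arg = turn.split("Action: ")[1].split("[", 1)
        observation = TOOLS[tool](arg.rstrip("]"))  # fed back on the next turn
    return "gave up"

answer = run_agent()
```

The `max_steps` cap matters in practice: without it, a model that never emits a final answer loops (and bills) forever.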
Quick Recap Mnemonic: TAP-TOP-RAG
A simple way to remember key LLM concepts:
- Tokenizer → Attention → Prompts
- Temperature → Output control → Plans (ReAct)
- Retrieval → Agents → Generation