LLM Cheatsheet: Essential Concepts and Architectures

Everything you need to recall about Large Language Models (LLMs) — summarized on one page.


What is a Large Language Model (LLM)?

A Large Language Model (LLM) is a deep learning model trained on vast amounts of text data to understand, generate, and manipulate human language.

  • Examples: GPT-4, Claude, LLaMA, Gemini, Mistral, Falcon
  • Core Technology: The Transformer Architecture

LLM Core Components

  • Tokenizer: Converts text into tokens (subwords, words) for numerical input.
  • Embedding: Maps tokens to vectors in high-dimensional space.
  • Transformer: Uses attention mechanisms to understand context and sequence relationships.
  • Decoder: Generates the next tokens based on context (e.g., in GPT models).
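
A minimal sketch of the tokenizer-to-embedding path. The whitespace vocabulary and the randomly initialized PyTorch embedding table are toy assumptions for illustration; real models use learned subword vocabularies and trained embedding matrices:

```python
# Toy tokenizer -> embedding path; real models use learned subword vocabularies
# and trained embedding matrices instead of the random table below.
import torch
import torch.nn as nn

# Hypothetical toy vocabulary for illustration.
vocab = {"<unk>": 0, "large": 1, "language": 2, "models": 3, "are": 4, "fun": 5}

def tokenize(text: str) -> list[int]:
    """Map whitespace-split words to integer token IDs (unknown words -> <unk>)."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

token_ids = torch.tensor(tokenize("Large language models are fun"))

# Embedding layer: one 8-dimensional vector per vocabulary entry.
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)
vectors = embedding(token_ids)

print(token_ids.tolist())   # [1, 2, 3, 4, 5]
print(vectors.shape)        # torch.Size([5, 8])
```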

Key Concepts in LLM Architecture

  • Attention: Focuses on relevant parts of the input (via learned weights).
  • Self-Attention: Mechanism where each token attends to every other token in the sequence.
  • Positional Encoding: Adds order information to input tokens (self-attention alone has no notion of token order).
  • Parameters: Weights learned during training (e.g., GPT-3: 175B).
  • Context Window: Maximum number of tokens the model can process at once (e.g., GPT-4: ~128k tokens).
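
To make self-attention concrete, here is a single-head scaled dot-product attention sketch. The projection matrices are random stand-ins for learned weights, so this is an illustration of the mechanism rather than a trained model:

```python
# Single-head scaled dot-product self-attention with random (untrained) weights.
import torch
import torch.nn.functional as F

seq_len, d_model = 5, 16
x = torch.randn(seq_len, d_model)          # token embeddings

# Learned projections in a real model; random here for illustration.
W_q = torch.randn(d_model, d_model)
W_k = torch.randn(d_model, d_model)
W_v = torch.randn(d_model, d_model)

Q, K, V = x @ W_q, x @ W_k, x @ W_v
scores = Q @ K.T / d_model ** 0.5          # similarity of every token to every other token
weights = F.softmax(scores, dim=-1)        # each row sums to 1: how much to attend where
output = weights @ V                       # context-aware representation per token

print(weights.shape)                       # torch.Size([5, 5])
print(output.shape)                        # torch.Size([5, 16])
```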

How Large Language Models Work (Simplified)

  1. Input text is tokenized.
  2. Tokens are converted to embeddings.
  3. Embeddings are passed through a stack of Transformer layers.
  4. The model predicts the next token (autoregressive generation).
  5. The process repeats until a stop condition or maximum length is reached.
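
A sketch of this loop using the Hugging Face transformers library with the small gpt2 checkpoint and greedy decoding (assumes torch and transformers are installed; the weights download on first run):

```python
# Autoregressive generation with greedy decoding: predict, append, repeat.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("Large language models", return_tensors="pt").input_ids

for _ in range(10):                                            # stop condition: 10 new tokens
    with torch.no_grad():
        logits = model(input_ids).logits                       # (1, seq_len, vocab_size)
    next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)    # most probable next token
    input_ids = torch.cat([input_ids, next_id], dim=-1)        # feed it back as input

print(tokenizer.decode(input_ids[0]))
```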

Common LLM Architectures

  • Decoder-only: Used primarily for generation (e.g., GPT, LLaMA).
  • Encoder-only: Used for understanding tasks like classification or search (e.g., BERT).
  • Encoder-Decoder: Used for sequence-to-sequence tasks like translation or summarization (e.g., T5, BART).
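
In the transformers library, these three families roughly map to different Auto classes. The sketch below assumes the gpt2, bert-base-uncased, and t5-small checkpoints, all of which download on first use:

```python
# The three architecture families as loaded via Hugging Face Auto classes.
from transformers import (
    AutoModelForCausalLM,    # decoder-only (generation)
    AutoModel,               # encoder-only (embeddings / understanding)
    AutoModelForSeq2SeqLM,   # encoder-decoder (translation, summarization)
)

decoder_only = AutoModelForCausalLM.from_pretrained("gpt2")
encoder_only = AutoModel.from_pretrained("bert-base-uncased")
encoder_decoder = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

for name, model in [("GPT-2", decoder_only), ("BERT", encoder_only), ("T5", encoder_decoder)]:
    print(name, sum(p.numel() for p in model.parameters()), "parameters")
```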

LLM Training Phases

  • Pretraining: Self-supervised learning (next-token prediction) on massive text datasets (web, books, code).
  • Fine-tuning: Task-specific tuning using labeled data (e.g., summarization, Q&A).
  • RLHF (Reinforcement Learning from Human Feedback): Improves model behavior and alignment based on human preferences.
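
The pretraining objective is next-token prediction. A minimal sketch of that cross-entropy loss, with random logits standing in for a real model's output:

```python
# Next-token prediction loss: at position t, the model predicts the token at t + 1.
import torch
import torch.nn.functional as F

vocab_size, seq_len = 100, 6
token_ids = torch.randint(0, vocab_size, (1, seq_len))   # a toy training sequence
logits = torch.randn(1, seq_len, vocab_size)              # model predictions (random here)

# Shift logits and labels so each position is scored against the *next* token.
loss = F.cross_entropy(
    logits[:, :-1, :].reshape(-1, vocab_size),
    token_ids[:, 1:].reshape(-1),
)
print(loss.item())
```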

Important Generation Parameters

  • Temperature: Controls randomness (lower = more deterministic and precise, higher = more diverse and creative).
  • Top-K: The model samples the next token only from the K most probable tokens.
  • Top-P (Nucleus Sampling): The model samples the next token from the smallest set of tokens whose cumulative probability exceeds P.
  • Max Tokens: Sets the limit on the length of the response generated by the model.
  • Stop Tokens: Specific tokens that, when generated, immediately end the sequence.
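
A sketch of how temperature, top-k, and top-p filter the next-token distribution before sampling. The function below runs on random logits and is a simplified illustration, not any library's exact implementation:

```python
# Simplified next-token sampling with temperature, top-k, and top-p filtering.
import torch
import torch.nn.functional as F

def sample_next_token(logits, temperature=0.8, top_k=50, top_p=0.9):
    logits = logits / max(temperature, 1e-6)      # temperature: sharpen (<1) or flatten (>1)
    probs = F.softmax(logits, dim=-1)

    # Top-K: restrict to the K most probable tokens.
    topk_probs, topk_ids = probs.topk(top_k)

    # Top-P (nucleus): keep the smallest prefix whose cumulative probability reaches top_p.
    cumulative = topk_probs.cumsum(dim=-1)
    keep = (cumulative - topk_probs) < top_p      # probability mass *before* each token < p
    filtered = topk_probs * keep
    filtered = filtered / filtered.sum()          # renormalize to a valid distribution

    choice = torch.multinomial(filtered, num_samples=1)
    return topk_ids[choice].item()

logits = torch.randn(1000)                        # stand-in for a model's output logits
print(sample_next_token(logits))
```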

Prompt Engineering Techniques

  • Zero-shot Prompting: Direct task instruction without providing any examples.
  • Few-shot Prompting: Providing the task instruction along with several input/output examples.
  • Chain-of-Thought (CoT): Instructing the model to reason step-by-step before answering.
  • ReAct: Combining Reasoning (CoT) and Action (using external tools).
  • System Prompt: Setting the initial behavior, persona, and constraints for the model.
  • Temperature: Lower temperature yields precise results; higher temperature yields creative results.
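
These patterns are just differently structured input strings. A small sketch, where call_llm is a hypothetical placeholder for whichever chat or completions API you use:

```python
# Prompt patterns as plain strings; `call_llm` is a hypothetical placeholder.
zero_shot = "Classify the sentiment of this review as positive or negative:\n'I loved it.'"

few_shot = (
    "Review: 'Terrible service.' -> negative\n"
    "Review: 'Absolutely wonderful!' -> positive\n"
    "Review: 'I loved it.' ->"
)

chain_of_thought = (
    "A train travels 60 km in 1.5 hours. What is its average speed?\n"
    "Let's think step by step before giving the final answer."
)

system_prompt = "You are a concise assistant that answers in plain English."

# for prompt in (zero_shot, few_shot, chain_of_thought):
#     print(call_llm(system=system_prompt, user=prompt))   # hypothetical API call
```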

LLM Customization Methods: Fine-tuning, RAG, Prompting

  • Prompting: Using the base model effectively via careful input design.
  • Fine-tuning: Updating model weights by training on specific, task-oriented data.
  • RAG (Retrieval-Augmented Generation): Combining the LLM with external, up-to-date knowledge sources (e.g., documents, databases) via retrieval.
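
A minimal RAG sketch: retrieve relevant documents, then stuff them into the prompt. The keyword-overlap retriever below is a toy stand-in for an embedding model plus a vector database:

```python
# Toy RAG: retrieve the most relevant documents, then build an augmented prompt.
documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday to Friday, 9am to 5pm CET.",
    "Premium accounts include priority email support.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by shared words with the query (stand-in for vector search)."""
    q_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return scored[:k]

question = "When can I get a refund?"
context = "\n".join(retrieve(question, documents))

prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
print(prompt)   # this augmented prompt is what gets sent to the LLM
```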

Understanding Memory in LLMs

  • Stateless: The model has no memory of previous chat turns outside the current prompt.
  • Contextual Memory: Past conversation history is included within the current prompt (limited by context window size).
  • Long-term Memory: Achieved via custom extensions or tools that store and retrieve past interactions (e.g., vector databases).
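
A sketch of contextual memory: every past turn is replayed in the next prompt and trimmed to a crude character budget (a stand-in for a token budget). The model reply is stubbed out; a real system would send the prompt to an LLM API:

```python
# Contextual memory in miniature: replay history in each prompt, trim to a budget.
history: list[tuple[str, str]] = []   # (role, text) pairs from earlier turns
MAX_CHARS = 2000                      # crude stand-in for a token budget

def build_prompt(user_message: str) -> str:
    turns = [f"{role}: {text}" for role, text in history]
    prompt = "\n".join(turns + [f"user: {user_message}", "assistant:"])
    return prompt[-MAX_CHARS:]        # drop the oldest text if over budget

def chat(user_message: str) -> str:
    prompt = build_prompt(user_message)
    reply = f"(model reply to: {user_message!r})"   # stub for an actual LLM call
    history.append(("user", user_message))
    history.append(("assistant", reply))
    return reply

print(chat("Hi, my name is Dana."))
print(chat("What is my name?"))       # answerable only because the history is replayed
```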

Key Limitations of LLMs

  • Hallucinations: Generating confident but factually incorrect information.
  • Lack of real-world awareness (unless external tools or RAG are utilized).
  • Can reflect biases present in the training data.
  • Expensive to train and run, with slower inference as context windows grow.

LLM Evaluation Metrics

  • Perplexity: Measures how well the model predicts the next token (lower is better).
  • BLEU/ROUGE: Used to measure the similarity of generated text to reference text (common in translation/summarization).
  • Accuracy/F1 Score: Standard metrics used for classification and structured prediction tasks.
  • Human Evaluation: Human quality rating based on relevance, coherence, and helpfulness.
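
Perplexity is the exponential of the average next-token cross-entropy. A minimal sketch on random logits and labels, for illustration only:

```python
# Perplexity = exp(average next-token cross-entropy); lower is better.
import torch
import torch.nn.functional as F

vocab_size, seq_len = 100, 20
logits = torch.randn(1, seq_len, vocab_size)           # model outputs (random here)
labels = torch.randint(0, vocab_size, (1, seq_len))    # reference tokens

nll = F.cross_entropy(logits.reshape(-1, vocab_size), labels.reshape(-1))
perplexity = torch.exp(nll)
print(perplexity.item())
```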

Practical LLM Use Cases

  • Chatbots and Writing Assistants
  • Code Generation (e.g., using models like Codex or DeepSeek)
  • Document Summarization and Information Extraction
  • Search and Retrieval Systems
  • Autonomous Agents and Workflow Automation
  • Data Analysis, SQL Generation, and Mathematical Reasoning

LLM Ecosystem: Popular Tools and Frameworks

  • Hugging Face: Platform for hosting, sharing, and training LLMs.
  • LangChain: Framework for connecting LLMs with tools, memory, and data sources.
  • LlamaIndex: Specialized framework for Retrieval-Augmented Generation (RAG).
  • OpenAI API: Programmatic access to models like GPT-3.5 and GPT-4.
  • Transformers (Hugging Face): Python library for using pretrained models in PyTorch, TensorFlow, or JAX.
  • Gradio/Streamlit: Tools for quickly building interactive LLM user interfaces (UIs).
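
As a quick taste of the ecosystem, the sketch below wraps a transformers text-generation pipeline in a minimal Gradio UI (assumes transformers, torch, and gradio are installed; the gpt2 checkpoint downloads on first use):

```python
# A transformers text-generation pipeline behind a minimal Gradio web UI.
import gradio as gr
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

def complete(prompt: str) -> str:
    # Generate up to 50 new tokens continuing the prompt.
    return generator(prompt, max_new_tokens=50)[0]["generated_text"]

demo = gr.Interface(fn=complete, inputs="text", outputs="text", title="GPT-2 demo")

if __name__ == "__main__":
    demo.launch()   # serves a local web UI in the browser
```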

LLM Agents: Combining LLMs with Tools

Agents utilize LLMs to make complex decisions, plan workflows, and take actions in dynamic environments.

  • Agents use external tools (e.g., calculators, web search APIs).
  • They follow structured plans (e.g., ReAct, Tree-of-Thought (ToT)).
  • They can chain multiple steps, utilize memory, and incorporate feedback loops.
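
A ReAct-style loop in miniature: the model alternates Thought and Action steps, the program executes the named tool, and the observation is fed back into the transcript. Here call_llm is scripted so the sketch runs offline; a real agent would call an actual model:

```python
# Minimal ReAct-style agent loop with one calculator tool; call_llm is scripted.
import re

def calculator(expression: str) -> str:
    """Toy tool: evaluate basic arithmetic (restricted characters only)."""
    if not re.fullmatch(r"[0-9+\-*/(). ]+", expression):
        return "error: unsupported expression"
    return str(eval(expression))      # acceptable only because input is restricted

TOOLS = {"calculator": calculator}

def call_llm(transcript: str) -> str:
    """Hypothetical LLM call, scripted to emit Thought/Action/Final Answer steps."""
    if "Observation:" not in transcript:
        return "Thought: I should compute this.\nAction: calculator[12 * 7]"
    return "Final Answer: 84"

def run_agent(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = call_llm(transcript)                     # Reason: produce the next step
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        match = re.search(r"Action: (\w+)\[(.+)\]", step)
        if match:                                       # Act: run the named tool
            tool, argument = match.groups()
            observation = TOOLS[tool](argument)
            transcript += f"Observation: {observation}\n"
    return "No final answer within the step budget."

print(run_agent("What is 12 * 7?"))   # -> 84
```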

Quick Recap Mnemonic: TAP-TOP-RAG

A simple way to remember key LLM concepts:

  • Tokenizer → Attention → Prompts
  • Temperature → Output control → Plans (ReAct)
  • Retrieval → Agents → Generation