The Only LLM Cheat Sheet You’ll Need (June 2025 Edition)
How I choose between GPT-4o / GPT-4.1, Claude 4, Gemini 2.5, Copilot & the “classic” models
People keep asking, “Which model should I enable for my team?”
The answer flips depending on two axes:
Is the task code-heavy or not?
How much context (tokens, tooling, modality) do you actually need?
Below I walk through today’s main line-ups — including Gemini and GitHub Copilot — then finish with a copy-pastable cheat-sheet you can pin to Slack or Notion.
The Modern Coding Stack
Claude Opus 4
Anthropic’s flagship (May 2025). 200 k-token window, a plan → generate → run → fix loop baked in, and roughly 72 % on SWE-Bench Verified when tool use is allowed. The “maximum reasoning, maximum bill” option — pull it out for bugs that span ten layers of abstraction.
Claude Sonnet 4
Same 200 k context but cheaper. My default for repo-wide refactors, bulk scaffolding, and long migrations where sheer breadth beats depth.
GPT-4.1
OpenAI’s April-2025 release. 1 M-token window and 54 % on SWE-Bench Verified with the stock agent. Best OpenAI pick when “merge a CI-green PR” is the definition of done.
Gemini 2.5 Pro
2 M-token context — the current record — and about 64 % on SWE-Bench with Google’s reference agent. Ideal for monorepos or data rooms that dwarf even 4.1’s window.
GPT-4o
128 k tokens, near-real-time latency, full multimodal I/O (text + vision + voice). Feels like a senior engineer who types at 400 WPM.
Llama 3 / Code Llama (open source)
≈67 % on HumanEval and self-hostable. If your lawyers frown on cloud APIs, spin this up behind the firewall.
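If you want to compare the hosted models hands-on, each is one SDK call away. Here is a minimal Python sketch, assuming you have the provider API keys in your environment; the model-ID strings are assumptions based on current provider docs and may have rotated by the time you read this.

```python
# Hedged sketch: the same prompt against the three hosted coding flagships.
# Model IDs below are assumptions; check your provider's model list before use.
import os
from openai import OpenAI            # pip install openai
import anthropic                     # pip install anthropic
import google.generativeai as genai  # pip install google-generativeai

prompt = "Refactor this function to remove the nested loops: ..."

# OpenAI GPT-4.1 (1 M-token window)
openai_client = OpenAI()  # reads OPENAI_API_KEY
gpt_reply = openai_client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": prompt}],
).choices[0].message.content

# Anthropic Claude Sonnet 4 (200 k-token window)
claude_reply = anthropic.Anthropic().messages.create(  # reads ANTHROPIC_API_KEY
    model="claude-sonnet-4-20250514",  # assumed model ID
    max_tokens=2048,
    messages=[{"role": "user", "content": prompt}],
).content[0].text

# Google Gemini 2.5 Pro (2 M-token window)
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
gemini_reply = genai.GenerativeModel("gemini-2.5-pro").generate_content(prompt).text
```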
Two “meta” layers you should know
Gemini Code Assist
180 k free completions / month on the individual tier — enough for weekend hacks or student projects. blog.google, devops.com
GitHub Copilot
It’s not one model: Copilot routes calls to GPT-4.1, Sonnet 4, or Gemini depending on prompt + cost, then adds its own IDE & PR automation. The new “Coding Agent” can spin up a VM, run tests, and open a PR by itself. github.blog, theverge.com
Everyday Knowledge-work & Multimodal
GPT-4o — Real-time voice/vision chat; feels like FaceTime with an expert. openai.com
Gemini 2.5 Pro — Drop a 1 GB PDF or a thousand-page contract and ask “Summarise every compliance risk.” blog.google
Claude Sonnet 4 — Most steerable tone; my default for strategy docs or investor memos. anthropic.com
Perplexity AI — Web-search-first assistant with inline citations — great for fact checking.
Llama 3 local — Offline summarisation and translation when data can’t leave the subnet.
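For that air-gapped case, here is a minimal sketch of offline summarisation, assuming Ollama is installed on the box and the model has already been pulled with `ollama pull llama3`; any llama.cpp- or vLLM-backed server works the same way.

```python
# Hedged sketch: summarise a document with a locally hosted Llama 3 via Ollama.
# Nothing leaves the machine; no cloud API key is involved.
import ollama  # pip install ollama


def summarise(text: str) -> str:
    """Ask the local Llama 3 instance for a five-bullet summary."""
    response = ollama.chat(
        model="llama3",
        messages=[{
            "role": "user",
            "content": f"Summarise the following in five bullet points:\n\n{text}",
        }],
    )
    return response["message"]["content"]


if __name__ == "__main__":
    with open("contract.txt", encoding="utf-8") as f:  # hypothetical input file
        print(summarise(f.read()))
```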
Model Lineage at a Glance
OpenAI — GPT-3.5 → GPT-4 (32 k) → GPT-4o (128 k, multimodal) → o3 → GPT-4.1 (1 M).
Anthropic — Claude Instant → Claude Sonnet 4 → Claude Opus 4 (200 k, highest agentic-coding score here at ~72 % SWE-Bench Verified).
Google DeepMind — Gemini 1.x → Gemini 1.5 Pro → Gemini 2.5 Pro (2 M tokens).
Meta OSS — Code Llama → Llama 3 (70 B).
GitHub Copilot — A service layer that orchestrates several of the above and now ships a hands-free “Coding Agent”.
Cheat-Sheet — “Use X When…”
🟢 Tiny bug or regex
→ GPT-4o (fastest chat)
🟢 Repo-wide refactor (hundreds of files)
→ Claude Sonnet 4 (200 k context)
🟢 Nasty multi-step bug, need max reasoning
→ Claude Opus 4 (~72 % SWE-Bench)
🟢 CI-gated, real-world bug fix
→ GPT-4.1 (best OpenAI pass rate)
🟢 Ultra-long spec (500 k+ tokens) or data room
→ Gemini 2.5 Pro (2 M context)
🟢 Live voice / screen-share Q&A
→ GPT-4o (sub-second multimodal)
🟢 Free weekend hack
→ Gemini Code Assist free tier
🟢 Air-gapped server or strict NDA
→ Self-host Llama 3 / Code Llama
🟢 “Open a branch, run tests, push PR for me”
→ GitHub Copilot Coding Agent
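If you prefer your heuristics executable, the table above collapses into a few lines of Python. The function name, task labels, and thresholds are my own shorthand, not an official taxonomy.

```python
# The cheat-sheet as a tiny routing function: pure heuristics, no API calls.
def pick_model(
    task: str = "chat",          # e.g. "refactor", "nightmare-bug", "voice", "tiny-bug"
    context_tokens: int = 0,
    needs_ci_green: bool = False,
    air_gapped: bool = False,
    hands_free_pr: bool = False,
) -> str:
    if air_gapped:
        return "Llama 3 / Code Llama (self-hosted)"
    if hands_free_pr:
        return "GitHub Copilot Coding Agent"
    if context_tokens > 500_000:
        return "Gemini 2.5 Pro"        # only 2 M-token window on the list
    if needs_ci_green:
        return "GPT-4.1"               # best OpenAI SWE-Bench pass rate
    if task in {"refactor", "migration", "scaffolding"}:
        return "Claude Sonnet 4"       # 200 k context, cheaper than Opus
    if task in {"nightmare-bug", "multi-step-debug"}:
        return "Claude Opus 4"         # maximum reasoning, maximum bill
    return "GPT-4o"                    # fast multimodal default for everything small


# pick_model("refactor", context_tokens=150_000)  ->  "Claude Sonnet 4"
```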
Final take
Proto-builders & refactor ninjas → Claude Sonnet 4
Spin up an MVP, wipe out boilerplate, or refactor hundreds of files in one prompt.
Bug-hunters on nightmare tickets → Claude Opus 4
Its deeper reasoning and “plan → run → fix” loop catch multi-layer, cross-repo defects.
For everything else, match the model to context length, latency, cost, and data policy — then switch the moment the leaderboard flips.