Real bug-fix benchmarks matter, but this category mixes agent harness results, model-only results, and vendor-reported scores. This pass keeps the scoreboard because it is useful, but it now treats the numbers as directional rather than perfectly comparable.
★ = strongest currently published signal in this page’s scope. † = Terminal-Bench 2.0 rather than SWE-bench Verified. ‡ = vendor/model claim rather than a directly comparable agent harness result. When a product does not publish a rigorous public benchmark, the page weights workflow fit, packaging, deployment model, and current product maturity more heavily.
Entries stay in the original structure, but the copy now separates hard evidence from category judgment more clearly. Use the controls above to narrow by workflow instead of reading top-to-bottom.
These tools compete with each other, not with Claude Code or Cursor. They serve non-developers, rapid prototypers, and designers. Judged on: zero-to-deployed speed, UX, and target audience. Not benchmarked against SWE-Bench.
This matrix is intentionally coarse. It is for narrowing candidates, not settling edge-case procurement decisions. Product pages should still be checked before buying or standardizing.
| Tool (Company) | Category | Benchmark (see legend) | Free | Async | Multi-IDE | Terminal | MCP | Multi-Agent | Context | Tier |
|---|---|---|---|---|---|---|---|---|---|---|
| Claude Code (Anthropic) | Terminal/CLI | 80.84% ★ (official agent result) | No free tier | Sync-first | IDE + app | ✓ | ✓ | ✓ | Up to 1M | S |
| Cline (OSS) | Extension / OSS | 80.8% ‡ (vendor-reported) | ✓ BYOK | Sync-first | VS Code + JB + CLI | via CLI / tools | ✓ | depends on setup | Model-dependent | A |
| Codex (OpenAI) | CLI / app / web | 77.3% † (Terminal-Bench 2.0) | Included on plans | Some async features | editor + app | ✓ | partial | ✓ | Model-dependent | A |
| Cursor (Anysphere) | AI IDE | N/A public | Hobby free | Mostly sync | VS Code family | Not terminal-first | ✓ | ✓ | Project-scoped | A |
| Windsurf (Cognition) | AI IDE | 40.08% ‡ (vendor model result) | Free tier | Mostly sync | ✓ broad IDE support | Not terminal-first | ✓ | ✓ | Persistent project memory | A |
| GitHub Copilot (GitHub) | Extension | N/A public | ✓ Free tier | some agent flows | ✓ broad IDE support | CLI support | ✓ | basic-to-mid | Standard project scope | B |
| Jules (Google Labs) | Cloud / Async | N/A public | Beta free limits | ✓ async-first | GitHub-centered | CLI/API adjacent | Not a focus | parallel queues | Task scoped | B |
| Gemini CLI (Google) | Terminal/CLI | 76.2% ‡ (Gemini 3 model result) | ✓ generous entry | Sync-first | CLI + Cloud Shell | ✓ | ✓ | tool-dependent | 1M | B |
| Devin (Cognition) | Cloud / Autonomous | N/A public | Paid usage | ✓ | Cloud only | sandbox terminal | Not a core focus | ✓ | Repo/task scoped | B |
| Aider (OSS) | Terminal/CLI | N/A public | ✓ BYOK | Sync-first | terminal only | ✓ | tooling-dependent | No native multi-agent | Model-dependent | B |
| OpenCode (OSS) | Terminal/CLI | N/A public | ✓ BYOK | Sync-first | terminal + managed path | ✓ | varies | emerging | Model-dependent | B |
| Antigravity (Google) | Preview IDE | N/A ‡ (Gemini 3 context only, not a product score) | ✓ preview | evolving | preview surface | not terminal-first | evolving | unknown | Large-model context | B |
| Zed (Zed) | AI IDE | N/A public | ✓ free editor | Sync-first | cross-platform editor | not terminal-first | partial | No | Project/editor scoped | B |
| Augment Code | Extension / enterprise | N/A public | Paid plans | Mostly sync | VS Code + JetBrains | CLI + review flows | ✓ | task/orchestration | Context Engine | B |
| Tabnine | Extension / enterprise | N/A public | Enterprise-oriented | Mostly sync | broad IDE support | No | No | No | Enterprise context | C* |
| Continue (OSS) | Extension / CLI | N/A public | ✓ self-host/open | depends on setup | VS Code + JetBrains | CLI available | No native MCP focus | policy/check based | Model/self-hosted | B |
| Amazon Q Developer (AWS) | Extension | N/A public | ✓ free tier | Mostly sync | limited | not terminal-first | No | No | AWS-native context | C* |
| Junie (JetBrains) | JetBrains-native | N/A public | via AI plans | Mostly sync | JetBrains native | not terminal-first | partial | evolving | IDE/project scoped | C |
| Kilo Code (OSS) | Extension / CLI | N/A public | ✓ OSS/BYOK | Mostly sync | VS Code + JB + CLI | CLI | varies | evolving | Model-dependent | C |
| **BROWSER / VIBE BUILDER TIER** (separate comparison axis) | | | | | | | | | | |
| Bolt.new (StackBlitz) | Browser builder | N/A | ✓ free tier | Interactive | browser only | browser terminal | No | No | Prompt/app scoped | V |
| Lovable | Browser builder | N/A | ✓ free tier | Interactive | browser only | No | No | No | Prompt/app scoped | V |
| Replit | Browser IDE | N/A | ✓ free tier | long-running cloud tasks | browser only | browser terminal | No | platform features | Workspace scoped | V |
| v0 (Vercel) | Browser/UI | N/A | ✓ free tier | Interactive | browser only | No | No | No | UI/component scoped | V |
| Google Opal (Google) | Browser mini-apps | N/A | ✓ preview | Interactive | browser only | No | No | agent steps emerging | Mini-app scoped | V |
C* = can rise dramatically for compliance-driven orgs. This table is deliberately lossy: it compresses packaging, deployment, and maturity into one view so you can narrow choices fast.
These picks were rewritten to emphasize actual deployment fit instead of declaring absolute winners. Several categories now reflect tied or conditional outcomes more honestly.
Claude Code still wins the technical-heavyweight lane. The strongest reason is the simplest one: it has the clearest top-end public agent benchmark signal on this page, and its product architecture still aligns best with difficult, ambiguous, multi-file work.
The more honest conclusion is that the market no longer has a single universal winner. Cursor and Windsurf can be better daily editors. Jules can be better for async backlog throughput. Codex can be the best value if you already live inside ChatGPT. Gemini CLI is the best free terminal on-ramp. Tabnine can be the right answer when deployment policy dominates raw capability.
So the right takeaway is stack design, not tool worship. Pick one deep reasoning agent, one editor surface you actually enjoy living in, and one async or browser layer only if your workflow benefits from it.
If you must choose one flagship recommendation for hard engineering work, it is still Claude Code. The revised page simply makes the tradeoffs clearer, and it strips away a few market-share and ranking claims that were weaker than the rest of the evidence.
This pass prioritized official pricing pages, official product docs, official GitHub repositories, and first-party launch/update posts. Where a performance number came from a vendor rather than a neutral benchmark page, the wording was softened accordingly.