New ask Hacker News story: Ask HN: What benchmarks are you using to judge AI models?

April 30, 2025

Ask HN: What benchmarks are you using to judge AI models?
2 points by cowpig | 0 comments on Hacker News.
There are so many models, and so many new ones being released all the time, that I have a hard time knowing which ones to prioritize testing anecdotally. What benchmarks have you found to be especially indicative of real-world performance?

I use:

* Aider's Polyglot benchmark seems to be a decent indicator of which models are going to be good at coding: https://ift.tt/9nXxHe0
* I generally assume OpenRouter usage to be an indicator of a model's popularity, and by proxy, utility: https://ift.tt/wPHbLrs
* LLM-Stats has a lot of charts of benchmarks that I look at: https://llm-stats.com/
