New ask Hacker News story: I just spent the past 5 hours comparing LLMs

May 18, 2024

I just spent the past 5 hours comparing LLMs
5 by mlashuel | 0 comments on Hacker News.
The top two takeaways I took from this are:

1. No one should still be using GPT-3.5 or Gemini 1.0. They may get the job done, but 98% of the time one of these LLMs (Gemini 1.5 Pro, Bing Copilot, ChatGPT-4o, or Perplexity) will give you a better response.

2. The best way to consistently get the best response is to compare multiple AI chatbots and choose the best answer. This is because:

If you are only using ChatGPT-3.5, Llama 3, Claude Sonnet, or Mistral, 100% of the time there is another LLM with a better answer.
If you are only using Gemini 1.0, 96% of the time there is another LLM with a better answer.
If you are only using Perplexity, 86% of the time there is another LLM with a better answer.
If you are only using ChatGPT-4o, 73% of the time there is another LLM with a better answer.
If you are only using Gemini 1.5 Pro, 73% of the time there is another LLM with a better answer.
If you are only using Bing Copilot, 73% of the time there is another LLM with a better answer.

I personally used chatplayground.ai to compare all of them. I stopped using ChatGPT-3.5 a long time ago because chatplayground.ai gives you all the pro LLMs for the same price as GPT-4.

It is very important to note that this research was done on a very small dataset of 22 questions. The best LLM answer for each question was decided by me simply putting myself in the shoes of the person asking that question and deciding which output was the most helpful.
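For anyone who wants to repeat this kind of comparison, here is a minimal sketch of how a "percent of the time another LLM has a better answer" figure could be computed, assuming you record exactly one winning model per question. The function name, the model labels "A" through "D", and the sample winners are all hypothetical placeholders, not the data behind the numbers above.

```python
from collections import Counter

def pct_another_better(winners, models):
    """For each model, the percent of questions where some other model won.

    winners: one entry per question, naming the model judged best.
    models:  every model compared, including ones that never won.
    """
    total = len(winners)
    wins = Counter(winners)  # models missing from the counter count as 0 wins
    return {m: round(100 * (total - wins[m]) / total) for m in models}

# Hypothetical placeholder data: 4 questions, 4 models.
winners = ["A", "B", "A", "C"]
models = ["A", "B", "C", "D"]

result = pct_another_better(winners, models)
for model, pct in result.items():
    print(f"{model}: another model was better {pct}% of the time")
```

With only one winner per question and a small question set, each win moves the figure a lot: on 22 questions a single win is worth about 4.5 percentage points, so small differences between models should be read with caution.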
Perplexity is good at giving you a complete answer. Most LLMs will give you bullet-point answers; Perplexity prefers to write out a complete answer rather than just giving bullet points.

Bing Copilot is great because it cites its sources and will even recommend videos to help you.

ChatGPT-4o has really descriptive and usually longer answers, but prefers to answer in bullet points.

Gemini 1.5 Pro is great because it feels like it understands the context of your question better, thanks to its conversational tone.

Bing Copilot can be great, but 20-30% of the time the answers it gives are not even usable.

Because Bing Copilot uses sources to give you answers, the answers feel much more human-like and at times more useful than those of other LLMs that just list a bunch of basic bullet points.

There should be no reason anyone is still using ChatGPT-3.5 today.

Gemini 1.0 gives good answers, but they are not as detailed and helpful as those from Gemini 1.5 Pro.

ChatGPT-4o was able to generate really nice data tables that Gemini 1.5 Pro wasn't able to.

Models compared: ChatGPT-3.5, ChatGPT-4o, Gemini 1.0, Gemini 1.5 Pro, Bing Copilot, Claude Sonnet, Llama 3, Mixtral 8x7b, Mistral Large, Perplexity.
