New ask Hacker News story: The AI Reproducibility Crisis

The AI Reproducibility Crisis
2 by ocolegro | 2 comments on Hacker News.
I've been struggling lately to replicate recent findings in research built on top of GPT-3.5/GPT-4. This leads me to believe that a growing yet largely unnoticed issue is taking root in recent AI research. I've termed this the "AI Reproducibility Crisis". The principle is simple: if accessible private models are silently changing over time, then previous results cannot be replicated.

*Key Issues*:

1. Users have reported significant performance shifts after the May release.
2. Beyond community discussions, academic studies are showing differences in performance over time: https://ift.tt/iIOrERG
3. Previous benchmark evals appear difficult to replicate; see our effort here: https://ift.tt/IM85Dw7
4. The centralized approach of major providers amplifies these concerns and underscores the need for research autonomy.

*Proposed Solutions*:

- Lean towards open-source foundational models for transparency.
- Clearly annotate the date of model access when relying on private providers. Push these providers to supply model / inference specifiers that delineate any changes on their end.
- Advocate for continuous third-party benchmarking of LLM providers to monitor model changes. This is an initiative we're currently starting with some help from a great academic group.

This extends beyond mere performance dips or changes in GPT. It's a call for long-term transparency, scientific rigor, and sustained progress. Failing to address this issue properly will create major headwinds for long-term progress in AI research, because future work will not be able to build upon past work.
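To make the "annotate the date of model access" suggestion concrete, here is a minimal sketch of what a per-call provenance record might look like. Everything in it is an assumption made for illustration: the function names (`make_run_record`, `to_json_line`, `outputs_changed`), the field names, and the idea of hashing prompts and outputs are hypothetical and not part of any provider's API. It also makes no network calls; you would pass in whatever model identifier and output your own client returned.

```python
import hashlib
import json
from datetime import datetime, timezone


def _sha256(text):
    """Hex digest of a UTF-8 string, used to fingerprint prompts and outputs."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_run_record(model, prompt, output, params, accessed_at=None):
    """Bundle one model response with the provenance needed to interpret it later.

    `model` is whatever identifier the provider exposed at call time
    (hypothetical here); `accessed_at` defaults to the current UTC time.
    """
    accessed_at = accessed_at or datetime.now(timezone.utc)
    return {
        "model": model,
        "accessed_at": accessed_at.isoformat(),
        "params": params,
        "prompt_sha256": _sha256(prompt),
        "output_sha256": _sha256(output),
    }


def to_json_line(record):
    """Serialize a record as one JSON line, suitable for appending to a log file."""
    return json.dumps(record, sort_keys=True)


def outputs_changed(old, new):
    """Return True if two records for the same request have different outputs."""
    for key in ("model", "params", "prompt_sha256"):
        if old[key] != new[key]:
            raise ValueError(f"records describe different requests: {key} differs")
    return old["output_sha256"] != new["output_sha256"]
```

A differing hash is only a hint, not proof of a model change: sampling can produce different outputs for the same request, so a comparison like this is most meaningful with deterministic-leaning settings (for example temperature 0) and repeated over many prompts.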
