Ask HN: What's the consensus on "unit" testing LLM prompts?
2 points by thiht | 0 comments on Hacker News.
LLMs are notoriously non-deterministic, which makes it hard for us developers to trust them as a tool in a backend, where we usually expect determinism. I’m in a situation where using an LLM makes sense from a technical perspective, but I’m wondering if there are good practices for testing, beyond manual testing:

- I want to ensure my prompt does what I want 100% of the time
- I want to ensure I don’t get regressions as my prompt evolves, when I update the version of the LLM I use, or even if I switch to another LLM

The ideas I have in mind are (rough sketches of both below):

- forcing the LLM to return JSON matching a strict schema
- periodically running a fixed set of tests with my prompt and checking that I get the expected results

Are there specifics of LLM prompt testing I should be aware of? Are good practices emerging?
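For the first idea, here is a minimal sketch, assuming Pydantic v2 for schema validation; call_llm is a hypothetical stand-in for whatever client you use, and the sentiment-classification task is an invented example:

    # Minimal sketch of idea 1, assuming Pydantic v2. `call_llm` is a
    # hypothetical stand-in for your real client; the Sentiment schema is
    # an invented example task.
    from typing import Literal

    from pydantic import BaseModel, ValidationError

    class Sentiment(BaseModel):
        label: Literal["positive", "negative", "neutral"]
        confidence: float

    def call_llm(prompt: str) -> str:
        """Hypothetical wrapper around your LLM client; returns raw text."""
        raise NotImplementedError

    def classify(text: str) -> Sentiment:
        raw = call_llm(f"Classify the sentiment of {text!r}. Respond with JSON only.")
        try:
            # model_validate_json rejects output that isn't valid JSON or that
            # doesn't match the schema, so bad output fails loudly here instead
            # of leaking into the rest of the backend.
            return Sentiment.model_validate_json(raw)
        except ValidationError as err:
            raise ValueError(f"LLM returned invalid output: {err}") from err

Most major providers now also offer some form of constrained "JSON mode" or structured-output feature that enforces a schema on their side; validating again locally is cheap insurance if you switch models.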
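For the second idea, a sketch of a fixed regression suite with pytest, reusing the hypothetical classify() above; the cases are invented, and the intent is to run this on a schedule and on every prompt or model change:

    # Minimal sketch of idea 2: a fixed eval set as a pytest suite, run
    # nightly and on every prompt/model change. Cases are invented examples.
    import pytest

    CASES = [
        ("I love this product", "positive"),
        ("Terrible experience, never again", "negative"),
        ("It arrived on Tuesday", "neutral"),
    ]

    @pytest.mark.parametrize("text,expected", CASES)
    def test_prompt_regression(text, expected):
        # Assert on the structured field, not the raw text, so harmless
        # wording variation between runs or models doesn't break the test.
        assert classify(text).label == expected

One caveat: because of the non-determinism mentioned above, a common emerging practice is to run each case several times and assert a pass rate (say 9/10) rather than demanding 100%; temperature 0 plus schema-constrained output narrows the variance but doesn't eliminate it.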
