Ask HN: What are the data compression characteristics of LLMs?
3 by DietaryNonsense | 2 comments on Hacker News.
Disclaimer: I have only shallow knowledge of LLMs and of machine learning algorithms and architectures in general.

Once a model has been trained, the totality of its knowledge is presumably encoded in its weights, architecture, hyper-parameters, and so on. The size of all of this is presumably measurable in bits. Accepting that the total "useful information" encoded may come with caveats about how to effectively query the model, in principle it seems like we can measure the amount of useful information that is encoded in and retrievable from the model.

I do sense a challenge in equating the "raw" and "useful" forms of information in this context. An English, text-only Wikipedia article about "Shiitake Mushrooms" may be 30 KB, but we could imagine that not all of that needs to be encoded in an LLM that accurately captures the "useful information" about shiitake mushrooms. The LLM might be able to reproduce all the facts about shiitakes that the article contained without being able to reproduce the article itself. So in some ontologically sensitive way, the LLM performs a lossy transformation during the learning and encoding process.

I'm wondering what we know about the data storage characteristics of the useful information encoded by a given model. Is there a way to measure or estimate the amount of useful information encoded by an LLM? If some LLM is trained on Wikipedia, what is the relationship between the amount of useful information it can reliably reproduce and the size of the model relative to the source material? If the model is substantially larger than the source, can I feel metaphorically justified in likening it to both "tables and indices"? If the model is smaller than the source, can I feel justified in wrapping the whole operation in a "this is fancy compression" metaphor?
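To make the "compression" framing concrete, here is a rough sketch of the kind of measurement I have in mind: the number of bits a model needs to describe a text is the sum of -log2 p(token) under the model (its cross-entropy), which is roughly what an arithmetic coder driven by the model's next-token probabilities would achieve. Comparing that to the raw byte count gives a compression ratio. This is just an illustration, not an established procedure; the model name ("gpt2"), the sample sentence, and the use of the Hugging Face transformers API are my own assumptions.

    # Estimate how many bits a causal LM assigns to a text via its
    # negative log-likelihood, and compare with the raw UTF-8 size.
    import math
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # placeholder; any causal LM works the same way
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    text = "Shiitake mushrooms are edible fungi cultivated in East Asia."
    enc = tokenizer(text, return_tensors="pt")

    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])

    # out.loss is the mean negative log-likelihood per predicted token, in nats.
    n_predicted = enc["input_ids"].shape[1] - 1  # the first token is not predicted
    total_bits = out.loss.item() * n_predicted / math.log(2)

    raw_bits = len(text.encode("utf-8")) * 8
    print(f"model: ~{total_bits:.0f} bits, raw: {raw_bits} bits, "
          f"ratio: {raw_bits / total_bits:.1f}x")

Whether a number like that reflects "useful information" stored in the weights, rather than just the redundancy of the text itself, is exactly the part I'm unsure about.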