Ask HN: Could we skip speech to text using vector databases?
3 by andrewoodleyjr | 1 comment on Hacker News.
Nearly all voice applications translate speech to text. Yet even though most of the analysis happens on that text, we still return to the audio itself for deeper processing: we can analyze text for intent, but we need the wave patterns of the audio itself to get "real" sentiment analysis. I wonder if there is a better approach.

I am learning Spanish, and right now I translate the words I hear into English before processing them further. This multi-step process is time-consuming and delays my response. A friend said he didn't really learn English until he challenged himself to stop translating in his head and instead associated the words with the objects themselves. At that point he could skip translating and respond extremely fast, because he had essentially learned the language, the way we all do with our native tongue.

What if voice AI applications took the same approach with audio? We could skip speech-to-text for understanding: analyze the audio itself and compare it against past records in a vector database, where each record holds the audio plus its translations, intent, transcription, etc. If no similar record exists, fall back to speech-to-text (i.e., inquire). If this worked, it would over time reduce the cost and latency of voice AI applications understanding and responding. What is also interesting is that it follows the human way of learning: we need to be exposed to things, directed, and corrected for some amount of time before we understand a language.
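A minimal sketch of how that lookup-then-fallback loop might work, in Python with numpy. Everything here is illustrative: embed_audio() and transcribe() are hypothetical stand-ins for an audio-embedding model and an STT engine, the 0.92 similarity cutoff is an assumed value that would need tuning, and a plain in-memory list stands in for a real vector database (FAISS, pgvector, and similar systems do this nearest-neighbor search at scale).

import numpy as np

SIMILARITY_THRESHOLD = 0.92  # assumed cutoff; would need tuning in practice

# Each record pairs an audio embedding with what we learned about that
# utterance earlier (transcription, intent, sentiment, ...).
records = []  # list of (embedding, metadata dict)

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def understand(audio, embed_audio, transcribe):
    """Try to answer from similar past audio; fall back to speech-to-text."""
    query = embed_audio(audio)

    # Brute-force nearest-neighbor search over past records; a vector
    # database would replace this loop with an indexed lookup.
    best_meta, best_sim = None, -1.0
    for emb, meta in records:
        sim = cosine_similarity(query, emb)
        if sim > best_sim:
            best_meta, best_sim = meta, sim

    if best_meta is not None and best_sim >= SIMILARITY_THRESHOLD:
        return best_meta  # cache hit: skip transcription entirely

    # Cache miss (the "inquire" step): transcribe once, then remember
    # the result so similar audio is cheap next time.
    text = transcribe(audio)
    meta = {"text": text, "intent": None, "sentiment": None}
    records.append((query, meta))
    return meta

The interesting property is that the expensive path (transcription plus downstream analysis) only runs on unfamiliar audio, so the system gets faster and cheaper as its store of past records grows, much like the friend who stopped translating in his head.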