Ask HN: Full text search engine in JavaScript for English and Chinese?
2 by graderjs | 0 comments on Hacker News.
I'm interested in adding full text search to the product 22120, which has a web archive capability. I've investigated a few options, such as those based on Solr, but my concern is that they do not handle multiple human languages with minimal configuration. Ideally I'd like something with stemming and tokenization for multiple languages, including East Asian languages such as Chinese, Japanese and Korean, and ideally South Asian languages like Hindi and Urdu as well. Unfortunately, a great many search engines out there seem to offer stemming and tokenization only for Romance (Latin-derived) or Germanic languages.

People from everywhere like to archive the web content they browse, and that content is in many human languages, so to provide good full text search I need something that can handle them.

I considered writing something myself, such as a simple trie, but I think the rabbit hole of building a good full text search is a very long and convoluted one, so I'd prefer to plug in something that already exists. I really love what FlexSearch is doing, especially how it uses signals from context; I think that's the future. But I'm concerned about how basic its support for stemming and tokenization is, for example: https://ift.tt/3CxvwfH
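To make the tokenization concern concrete, here is a minimal sketch (not FlexSearch's or Solr's approach) that uses the built-in `Intl.Segmenter` for word segmentation and feeds a tiny inverted index. It assumes a runtime that ships `Intl.Segmenter` with full ICU data (recent Node.js or a modern browser). The names `tokenize` and `TinyIndex` are made up for illustration. It does no stemming at all, and how Chinese text gets split depends on the runtime's ICU dictionary.

```javascript
// Split text into lowercase word tokens using the runtime's segmenter.
// `locale` is an optional hint (e.g. "zh"); undefined uses the default locale.
function tokenize(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const tokens = [];
  for (const { segment, isWordLike } of segmenter.segment(text)) {
    // isWordLike is false for whitespace and punctuation segments.
    if (isWordLike) tokens.push(segment.toLowerCase());
  }
  return tokens;
}

// Hypothetical toy index: token -> set of document ids.
class TinyIndex {
  constructor() {
    this.postings = new Map();
  }

  add(id, text, locale) {
    for (const token of tokenize(text, locale)) {
      if (!this.postings.has(token)) this.postings.set(token, new Set());
      this.postings.get(token).add(id);
    }
  }

  // Return ids of documents that contain every query token (AND semantics).
  search(query, locale) {
    const tokens = tokenize(query, locale);
    if (tokens.length === 0) return [];
    let result = null;
    for (const token of tokens) {
      const ids = this.postings.get(token) ?? new Set();
      result = result === null
        ? new Set(ids)
        : new Set([...result].filter((id) => ids.has(id)));
    }
    return [...result];
  }
}

const index = new TinyIndex();
index.add(1, "Archiving web pages for offline search");
index.add(2, "我喜欢全文搜索", "zh");

console.log(index.search("offline search")); // matches document 1
console.log(index.search("web archive"));    // no match: "archive" is not "archiving"
```

The second query shows why segmentation alone is not enough: without stemming, "archive" never matches "archiving", which is the gap a per-language stemmer would have to fill.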