Samir Samat of Mohomine discusses a pretty disturbing topic: search algorithms
by Paola Di Maio
14 March 2001, 3 pm GMT
The limitations of keyword searches reflect the inherent problems of representing language, problems that have bugged computer scientists for decades.
Linguistic disciplines such as syntax, semiotics and semantics, which originated in philosophy and the study of language, are increasingly being adopted by computer science to equip mathematical modelling and artificial intelligence with the advanced qualitative and cognitive abilities typical (at best) of humans.
Next-generation search technology, however, will be more sophisticated than keyword searching and will offer a better understanding of language and meaning.
Information retrieval technologies are becoming increasingly vital as the amount of data available grows exponentially.
Delivering highly relevant, contextualized search results, together with the ability to target and profile users, is among the most precious assets an online enterprise can have. Precision (how relevant the returned results are) and recall (the breadth of the search, i.e. how many of the relevant documents are found) generally vary inversely: the higher the precision of a search, the lower its recall, and vice versa.
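To make the precision/recall trade-off concrete, here is a minimal Python sketch that scores a single query. The document IDs, the set of relevant documents and the ranked result list are all hypothetical; precision is the share of returned documents that are relevant, and recall is the share of relevant documents that were returned.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = share of retrieved documents that are relevant
    recall    = share of relevant documents that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: of the documents below, only d1-d4 are relevant.
relevant = {"d1", "d2", "d3", "d4"}
# Hypothetical ranked result list; cutting it off at different depths
# trades one measure for the other.
ranked = ["d1", "d2", "d7", "d3", "d8", "d9", "d4", "d10"]

for k in (2, 4, 8):
    p, r = precision_recall(ranked[:k], relevant)
    print(f"top {k}: precision={p:.2f} recall={r:.2f}")
# top 2: precision=1.00 recall=0.50
# top 4: precision=0.75 recall=0.75
# top 8: precision=0.50 recall=1.00
```

Returning only the top two results gives perfect precision but finds half of the relevant documents; returning all eight finds every relevant document, at the cost of half the results being noise.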
Searching by its very nature is not a binary 'yes' or 'no' exercise, says a recent paper by Mohomine, but a logical navigation through complex and infinite semantic networks.
Even Bayesian methods, a relatively modern approach to computational pattern recognition, have limitations: they require ample sample data to be analyzed, and, given the size of today's databases, returning a statistically meaningful result demands very high system resources and computing power.
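The sketch below is a textbook multinomial naive Bayes text classifier, one common form of Bayesian pattern recognition. It is not Mohomine's method, and the tiny training set and its labels are hypothetical. It hints at why such methods need ample sample data: every probability is estimated from word counts in the training examples, so a small or unrepresentative sample yields crude estimates, while classification has to loop over every class and every query word.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Train a multinomial naive Bayes model from (text, label) pairs."""
    class_counts = Counter()              # number of training documents per label
    word_counts = defaultdict(Counter)    # word frequencies per label
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest posterior log-probability."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # log P(label): the class prior, estimated from training frequencies
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # log P(word | label) with add-one (Laplace) smoothing,
            # so a word unseen in training does not zero out the score
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: four short labelled documents.
training = [
    ("stock market shares fall", "finance"),
    ("bank raises interest rates", "finance"),
    ("team wins football match", "sport"),
    ("striker scores in final match", "sport"),
]
model = train(training)
print(classify(model, "interest rates and shares"))  # finance
print(classify(model, "football final"))             # sport
```

With only four examples the model works on these queries, but any word outside its small vocabulary contributes nothing but smoothing noise; reliable estimates require far more data, and more data means more counting and more computation.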
Typical search architectures are based on crawlers, indexing and query software, while relevance is determined by keywords, and yes, by the amount of cash paid by companies to be listed at the top.
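The crawl-index-query pipeline can be sketched with an inverted index, the standard data structure behind keyword search. In this minimal Python sketch, the `pages` dictionary stands in for a crawler's output; its URLs and text are hypothetical. A query returns only the pages that contain every keyword.

```python
from collections import defaultdict


def build_index(pages):
    """Map each keyword to the set of page URLs that contain it (an inverted index)."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def keyword_query(index, query):
    """Return pages containing every query keyword (Boolean AND), sorted for stable output."""
    terms = query.lower().split()
    if not terms:
        return []
    # .get avoids inserting empty entries into the defaultdict for unknown terms
    results = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(results)


# Hypothetical crawled pages.
pages = {
    "http://example.com/a": "jaguar car dealers",
    "http://example.com/b": "jaguar habitat in the rainforest",
    "http://example.com/c": "used car prices",
}
index = build_index(pages)
print(keyword_query(index, "jaguar"))      # ['http://example.com/a', 'http://example.com/b']
print(keyword_query(index, "jaguar car"))  # ['http://example.com/a']
```

A query for "jaguar" returns both the car dealer and the rainforest page: the index records which words appear where, not which sense of a word the user has in mind.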
Contextual relevance and objective retrieval are further skewed by the forced relevance imposed on search results when payment schemes are used to prioritize certain entries over others, claims Mohomine in a research paper entitled "Desperately Seeking Search".
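How payment can force relevance is easy to see with a toy scoring rule. The sketch below assumes, purely for illustration, that an engine adds a payment-derived bonus to each page's relevance score. The article does not say how any real engine combines the two, and the `rank` function, the scores and the boost values are all hypothetical.

```python
def rank(results, paid_boost=None, weight=1.0):
    """Order results by relevance score plus an optional paid boost.

    results    -- dict mapping URL to a relevance score in [0, 1]
    paid_boost -- hypothetical dict mapping URL to a payment-derived bonus
    weight     -- how strongly payment counts relative to relevance
    """
    paid_boost = paid_boost or {}

    def score(url):
        # Hypothetical rule: final score = relevance + weight * paid bonus
        return results[url] + weight * paid_boost.get(url, 0.0)

    return sorted(results, key=score, reverse=True)


# Hypothetical relevance scores for three pages.
relevance = {"a.example": 0.9, "b.example": 0.6, "c.example": 0.3}
print(rank(relevance))                                  # ['a.example', 'b.example', 'c.example']
print(rank(relevance, paid_boost={"c.example": 1.0}))   # ['c.example', 'a.example', 'b.example']
```

With a large enough bonus, the least relevant page jumps to the top even though nothing about its content has changed.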
Other distortions, such as the three-to-six-week lag before new information posted on the web is indexed, make accurate, timely and relevant information very hard to find indeed: "Consumer search is notoriously poor at returning pertinent, useful information, especially because there's little innovation - there is no real monetization model in use for search services."
For these and other reasons, the search models offered by most engines are deeply flawed and unsustainable, said Samir Samat, co-founder and CTO of Mohomine, during a conversation a few months ago. "We are working on a search technology solution that is developing context-based spidering with high precision and recall, to allow customized extraction of information and personalized, automated information classification," he said.
