The search for quality on the web- Nepali Times

In the not-so-distant future, students will be able to graduate from high school without ever touching a book. Twenty years ago, they could graduate from high school without ever using a computer. In only a few decades, computer technology and the Internet have transformed the core principles of information, knowledge, and education.

Indeed, today you can fit more books on the hard disk of your laptop computer than in a bookstore carrying 60,000 titles. The number of Web pages on the Internet is rumored to have exceeded 500 billion, enough to fill 10 modern aircraft carriers with the equivalent number of 500-page, one-pound books.

Such analogies help us visualise the immensity of the information explosion and ratify the concerns that come with it. Web search engines are the only mechanism with which to navigate this avalanche of information, so they should not be mistaken for an optional accessory, just another button to play with or a tool to locate the nearest pizza store. Search engines are the single most powerful distribution points of knowledge, wealth, and yes, misinformation.

When we talk about web search, the first name that pops up is, of course, Google. It is not far-fetched to say that Google made the Internet what it is today. It shaped a new generation of people who are strikingly different from their parents. Baby boomers might be the best placed to appreciate this, since they experienced Rock 'n' Roll as kids and Google as parents.

Google's design was based on statistical algorithms. But search technologies that are based on statistical algorithms cannot address the quality of information, simply because high-quality information is not always popular, and popular information is not always high-quality. You can collect statistics until the cows come home but you cannot expect statistics to produce an effect beyond what they are good for.

The inefficiencies of today's search engines have created a new industry called Search Engine Optimisation, which focusses on strategies to make web pages rank high against the popularity criteria of Google-esque search engines. It is a billion-dollar industry. If you have enough money, your Web page can be ranked higher than many others that are more credible or higher quality. Since the emergence of Google, quality information has never been so vulnerable to the power of commercialism.

Information quality, molded in the shadow of web search, will determine the future of mankind, but ensuring quality will require a revolutionary approach, a technological breakthrough beyond statistics. This revolution is underway, and it is called semantic technology.

To achieve the level of dexterity in handling languages by computer algorithms, an ontology must be built. Ontology is neither a dictionary nor a thesaurus. Building an ontology encapsulating the world's knowledge may be an immense task, requiring an effort comparable to compiling a large encyclopedia and the expertise to build it, but it is feasible. Several start-up companies around the world like Hakia, Cognition Search and Lexxe have taken on this challenge. The result of these efforts remains to be seen.

But how would a semantic search engine solve the information quality problem? The answer is simple: precision. Once computers can handle natural languages with semantic precision, high-quality information will not need to become popular before it reaches the end user, unlike what is required by Web search today.

Semantic technology promises other means of assuring qualities by detecting the richness and coherence of the concepts encountered in a given text. If the text includes a phrase like "Bush killed the last bill in the Senate," does the rest of the text include coherent concepts? Or is this page a spam page that includes a bunch of popular single-liners wrapped with ads? Semantic technology can discern what it is.

Given humans' limited reading speed (200-300 words per minute) and the enormous volume of available information, effective decision-making today calls for semantic technology in every aspect of knowledge refinement. We cannot afford a future in which knowledge is at the mercy of popularity and money.

Project Syndicate

Riza Berkan is a nuclear scientist with a specialisation in artificial intelligence, fuzzy logic, and information systems. He is the founder of Hakia.