Semantic Web and Ontologies
I have been following news of the Semantic Web. I admit that I don’t fully understand all the details mainly for lack of time, but I do understand the benefits in allowing software agents to navigate and intelligently process information on the Internet. An agent is simply computer parlance for software that performs roles typically performed by people, usually requiring higher-order intelligence.
One of the components of the Semantic Web is the use of an ontology. My software uses the freely available WordNet ontology (more often referred to as a machine-readable dictionary), which I have mapped into memory in a highly compressed and optimized form. It was built by hand by a few professors at Princeton University over a period of two decades. An ontology can have a significant impact on the type of intelligence an application exhibits.
An ontology is often seen as a hierarchical classification of entities. I actually view an ontology as a network of relationships between words (hypernymy, or “is a”; meronymy, or “is a part of”; antonymy; synonymy; and so on). The WordNet ontology that I use contains several dozen different relationships between words. (The IsA relationship in WordNet, by the way, can have multiple parents.) Other ontologies exist, mainly built off of the work of WordNet. There is one being built for European languages. Microsoft has a project called MindNet, which extracts definitions automatically from its Encarta and other collections. Cycorp has a more formal ontology, Cyc, that is more suitable for AI.
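To make the network-of-relationships view concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the `Ontology` class, the relation names (`is_a`, `part_of`, `antonym`), and the sample words are all hypothetical, and nothing here reflects how WordNet actually stores its data.

```python
from collections import defaultdict


class Ontology:
    """A tiny network of typed relationships between words."""

    def __init__(self):
        # relation name -> source word -> set of target words
        self._edges = defaultdict(lambda: defaultdict(set))

    def add(self, relation, source, target):
        """Record that `source` stands in `relation` to `target`."""
        self._edges[relation][source].add(target)

    def related(self, relation, word):
        """Return the words that `word` points to under `relation`."""
        return set(self._edges.get(relation, {}).get(word, ()))


# Hypothetical toy entries, not copied from WordNet.
onto = Ontology()
onto.add("is_a", "knife", "tool")     # a word can have more than one parent
onto.add("is_a", "knife", "weapon")
onto.add("part_of", "hand", "body")
onto.add("antonym", "hot", "cold")
onto.add("antonym", "cold", "hot")    # symmetric relations are stored both ways

print(sorted(onto.related("is_a", "knife")))    # ['tool', 'weapon']
print(sorted(onto.related("part_of", "hand")))  # ['body']
```

Because each relation is just a set of labeled edges, a word with two parents needs no special handling, which is the main difference from a strict tree-shaped classification.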
An ontology is important for moving computers closer to human-like intelligence, because a computer cannot reason about real-world objects without knowing their relationships. Computers don’t really understand numbers, but we have given them the apparent ability to do so by creating a set of instructions that mimic the properties of numbers.
For those who are skeptical about the application of ontologies, real-world databases can be seen as limited forms of “ontologies,” which maintain various useful relationships between known entities and allow business data to be readily understood and used by computers.
One author that I have been reading, Clay Shirky, wants to throw cold water on the new technology with his articles “The Semantic Web, Syllogism, and Worldview” and “Ontology is Overrated: Categories, Links, and Tags.” He writes well, but I think that he suffers from a lack of vision and poor analogies. Some of his arguments include the following:
- Classification systems are flawed
- Ontologies represent a world view, both in time and place
- Ontologies are inorganic and top-down, so they are doomed to fail
- Syllogisms aren’t useful in the real world
In his article “Ontology is Overrated,” he confuses ontologies with categorization or classification systems like the Yahoo directory and the Dewey Decimal System. I think the correct analogy is that an ontology is more like a dictionary.
A Yahoo directory might group under Education subcategories like Higher Education, Statistics, Teaching and Conference. A card catalog system might place under Religion books on morality, the Bible, theology, and philosophy. The fact that a subcategory is part of a category provides very little information.
In an ontology, the relationships between words are direct and strong. Christianity IsA religion. A hand IsPartOf a body. One can actually reason with such statements even though Christianity was not a religion 2000 years ago, and somebody could be missing both arms.
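As a rough sketch of what “reasoning with such statements” can look like, the Python below chains IsA and IsPartOf facts transitively. The dictionaries `IS_A` and `PART_OF`, the helper `reachable`, and all of the sample entries are hypothetical and chosen only for illustration; a real ontology would be far larger and would need to handle exceptions such as the missing-arms case.

```python
def reachable(edges, start):
    """Return every node reachable from `start` by following `edges` transitively."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for target in edges.get(node, ()):
            if target not in seen:  # guards against cycles and repeated work
                seen.add(target)
                stack.append(target)
    return seen


# Hypothetical toy facts: each word maps to the set of its direct parents.
IS_A = {
    "Christianity": {"religion"},
    "religion": {"belief system"},
}
PART_OF = {
    "finger": {"hand"},
    "hand": {"arm"},
    "arm": {"body"},
}


def is_a(word, category):
    """True if `category` is a direct or inherited parent of `word`."""
    return category in reachable(IS_A, word)


def is_part_of(part, whole):
    """True if `part` belongs to `whole` through any chain of PART_OF links."""
    return whole in reachable(PART_OF, part)


print(is_a("Christianity", "belief system"))  # True, via "religion"
print(is_part_of("finger", "body"))           # True, via "hand" and "arm"
print(is_a("religion", "Christianity"))       # False, the relation is directed
```

Neither derived fact is stored anywhere; both follow from the direct links, which is something a bare category listing does not support.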
Clay argues that ontologies represent a particular world view, which is different for each culture and changes over time.
An ontology is fundamentally an electronic equivalent of a dictionary. Any charge leveled at ontologies applies equally to dictionaries. New concepts emerge and old concepts die out over time. Dictionaries in practice need to be updated annually, but Webster’s 1913 dictionary is still quite usable. Any ontology should have a mechanism by which it is updated over time. The same is generally true for software, since technology changes over time with new operating systems and processor architectures.
If we look at the various dictionaries available today, they are remarkably consistent. Most words rarely change their meaning over time, especially in the last couple of centuries, mainly because dictionaries have actually helped to stabilize the English language; before then, English underwent large changes, and Middle English is unrecognizable to today’s English speakers.
I would guess that anything relying on words and symbols would necessarily represent a world view. I don’t buy the “It’s not perfect, so it’s not useful” argument; even his favorite, tags, suffers from the same issues. There are also projects, such as LCS, that try to alleviate worldview issues by unifying words from different languages into a common system.
Clay argues that ontologies offer a single imposed view of the world, static in time and place, and claims that tags are superior to ontologies for those reasons. I disagree. Tags are useful for performing keyword searches, collaborative filtering, and other statistical data mining. In combination with ontologies, tags gain even more power, because related tags can be more easily combined. Tags on their own, however, are limited, because any algorithm that uses them is inherently heuristic and therefore can’t make guarantees about the results.
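Here is one way the combination of tags and an ontology might work, as a small Python sketch. Everything in it is an assumption made for illustration: the `HYPERNYM` table (a single-parent simplification), the `generalizations` and `search` helpers, and the sample photos and tags are all hypothetical.

```python
# Hypothetical single-parent hypernym table: tag -> more general term.
HYPERNYM = {
    "poodle": "dog",
    "beagle": "dog",
    "dog": "animal",
    "cat": "animal",
}


def generalizations(tag):
    """Return `tag` followed by each progressively more general term."""
    chain = [tag]
    while chain[-1] in HYPERNYM and HYPERNYM[chain[-1]] not in chain:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def search(items, query):
    """Return names of items with a tag that equals `query` or is a kind of it."""
    return sorted(
        name
        for name, tags in items.items()
        if any(query in generalizations(tag) for tag in tags)
    )


# Hypothetical user-supplied tags; nobody tagged anything "dog" directly.
photos = {
    "photo1": {"poodle"},
    "photo2": {"beagle", "park"},
    "photo3": {"cat"},
}

print(search(photos, "dog"))     # ['photo1', 'photo2']
print(search(photos, "animal"))  # ['photo1', 'photo2', 'photo3']
print(sorted(n for n, t in photos.items() if "dog" in t))  # [] with tags alone
```

A plain tag match finds nothing for “dog,” while the ontology-backed search combines “poodle” and “beagle” under it without anyone having to retag the photos.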
One more point. It’s not necessary to have one centralized ontology, for the same reason that there are multiple dictionaries, though I am sure there are some benefits in having a single authority (such as the Oxford English Dictionary). Each person carries their own mental ontology.
His last claim is that people don’t really think, and the real world doesn’t operate, in syllogisms. I disagree. People don’t use explicit Aristotelian syllogisms in everyday life, but they continuously make implicit deductions in the course of thinking and speaking. I’ll save this argument for another post.
NOTE: I will probably rewrite this post shortly to make it more rigorous. There are some claims that I make here that I want to provide stronger support for.