By: Robin Bloor. Published: 18th April 2001. Copyright © 2001
Ever since the inception of the Internet, users have tried to develop metaphors to explain the phenomenon of a three-dimensional information resource where every literate person can not only gain access but also publish. Initially, I used the metaphor of the Wild West: lawless, anarchic, untamed and thoroughly exciting and romantic. These days I prefer a Monty Python reference. Remember that final scene in 'The Meaning of Life' where the white-tie-and-tails master of ceremonies throws open the door and steps into the vastness of the universe?
Our universe itself keeps on expanding and expanding,
In all of the directions it can whiz;
As fast as it can go, at the speed of light, you know,
Twelve million miles a minute and that's the fastest speed there is.
He then proceeds to define the universe with phrases such as, 'hundred billion stars,' 'hundred thousand light years side to side,' 'sixteen thousand light years thick,' and 'our galaxy itself is one of millions of billions.' This is my visualisation of the Internet, a place mostly unexplored and expanding, but with continuing attempts to make sense of it via exploratory probes. And these attempts at creating understanding from knowledge are all words. Quantitative words, qualitative words. Linguistic forays into the unknown.
This is where we need to discuss language. It is also where we need to discuss culture, ethnography and evolution, because language, and the understanding derived from it, are rooted in all of those things. So how does one extract meaning from the culturally biased, linguistically challenging Monty Python universe called the Internet? The answer is to make situational and operational sense of language through the use of taxonomy.
What is taxonomy?
The word taxonomy derives from two Greek roots: 'taxis', meaning arrangement, and 'nomia', meaning method (from 'nomos', law). I think it's highly appropriate that 'taxis' was also used to describe the arrangement of military forces for battle, because the creation of taxonomies can sometimes feel like making war. Taxonomy is the creation of structure (arrangement) and labels (names) to help people locate relevant information. In the digitised world of the Internet, taxonomy means applying metadata labels that allow the primary data or information to be systematically managed and manipulated. These labels form a hierarchical structure which, if built correctly, allows mapping not only by matching words but also by concept and inference.
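To make 'mapping by concept and inference' a little more concrete, here is a minimal Python sketch, assuming a toy taxonomy stored as a map from each concept to its broader (parent) concept. The concept names, the documents and the functions `ancestors` and `build_index` are hypothetical illustrations, not any product's method. A document tagged with a narrow concept is also indexed under every broader concept above it, so a query for a broad concept finds documents that never name it directly.

```python
from collections import defaultdict

# Hypothetical toy taxonomy: each concept maps to its broader (parent) concept.
# Assumes a tree with no cycles.
BROADER = {
    "chemistry": "science",
    "organic chemistry": "chemistry",
    "polymers": "organic chemistry",
    "medicine": "science",
    "cardiology": "medicine",
}

def ancestors(concept):
    """Return the concept followed by every broader concept above it."""
    chain = [concept]
    while concept in BROADER:
        concept = BROADER[concept]
        chain.append(concept)
    return chain

def build_index(tagged_docs):
    """Index each document under its own tags and all of their ancestors."""
    index = defaultdict(set)
    for doc_id, tags in tagged_docs.items():
        for tag in tags:
            for concept in ancestors(tag):
                index[concept].add(doc_id)
    return index

# Hypothetical documents, each tagged with its most specific concept.
docs = {
    "doc1": ["polymers"],
    "doc2": ["cardiology"],
    "doc3": ["chemistry"],
}
index = build_index(docs)
print(sorted(index["chemistry"]))  # ['doc1', 'doc3']
print(sorted(index["science"]))    # ['doc1', 'doc2', 'doc3']
```

The inference here is deliberately simple (inheritance up a single hierarchy); the point is only that the structure of the labels, not the words in the documents, is what lets the 'polymers' document answer a question about chemistry.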
It is necessary at this point to define what we mean by mapping or maps. Not necessarily the flat, Mercator-style view of a road map, but a conceptualisation of the ways in which information can be extracted and retrieved: a discovery of the relationships that exist among the metadata describing the information, which may generally be described as 'physical' attributes or 'content' attributes. This is old news in the library world, where information has long been retrieved through 'enumerative classification', a tree structure based on the successive division of categories into classes and sub-classes. Think of the Dewey Decimal System. Professional searchers developed thesauri to provide controlled-vocabulary access to these classification systems; these were essentially indexing and searching tools. However, as powerful computing developed, the need for human-mediated generation of search terms lessened, because the computer could perform the indexing itself through full-text retrieval. The integration of classification and thesauri in an automated environment results in the construction of a taxonomy. To take the understanding of taxonomy up a notch, we can turn to an expanded notion of taxonomy: ontology. In this context, an ontology is a high-level device constructed to enable its users to understand, and navigate around, the available information. Originating in philosophy, ontology generally means the study of what exists, in order to achieve a cogent description of reality. An appropriate analogy is a knowledge map. A well-developed, linguistically intelligent, culturally sound knowledge map (aka taxonomy) would allow space invaders to make sense of this Internet universe.
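The library approach described above, a thesaurus that maps everyday words onto controlled terms plus an enumerative classification that divides subjects into classes and sub-classes, can be sketched in a few lines of Python. This is a toy under stated assumptions: the terms, the class paths, the tables `ENTRY_TERMS` and `CLASSIFICATION` and the function `classify` are all hypothetical, and real library thesauri also record broader, narrower and related-term relationships, not just synonyms.

```python
# Hypothetical enumerative classification: each preferred term has a path
# running from the broadest class down through successive subdivisions.
CLASSIFICATION = {
    "cars": ("Transport", "Road vehicles", "Cars"),
    "trucks": ("Transport", "Road vehicles", "Trucks"),
    "railways": ("Transport", "Rail", "Railways"),
}

# Hypothetical thesaurus entry terms: variant or synonym -> preferred term.
ENTRY_TERMS = {
    "automobile": "cars",
    "motor car": "cars",
    "car": "cars",
    "lorry": "trucks",
    "truck": "trucks",
    "railroad": "railways",
}

def classify(query_term):
    """Map a free-text term to its preferred term and its classification path.

    Returns (preferred_term, path); path is None if the term is unclassified.
    """
    term = query_term.strip().lower()
    preferred = ENTRY_TERMS.get(term, term)  # fall back to the term itself
    return preferred, CLASSIFICATION.get(preferred)

print(classify("Automobile"))  # ('cars', ('Transport', 'Road vehicles', 'Cars'))
print(classify("lorry"))       # ('trucks', ('Transport', 'Road vehicles', 'Trucks'))
print(classify("bicycle"))     # ('bicycle', None)
```

The controlled vocabulary means that a searcher who types 'automobile' and one who types 'motor car' land in the same class, which is the job professional searchers once did by hand with printed thesauri.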
Let's also define what I mean by 'making sense'. In this Internet infoglut world, where everyone is a professional searcher and the search process itself has been disintermediated, expectations for query results can be very low. Say you have a specific information need. You formulate a query that reflects your question and type it into one of the major search engines (Yahoo!, AltaVista and the like). You cross your fingers, click the Go button and algorithms churn, returning screen after screen of confusing hits. You laboriously wade through page after page of irrelevant information, perhaps finding the answer to your question, most likely not. You sigh and try a different search engine, with similar results. Maybe you give up and decide that your question wasn't that important anyway. What you don't do is write a letter to Yahoo! or AltaVista complaining about the lack of customer service, pointing out that your time is money and asking why they can't improve their product. You've developed a model of Internet information access that is based on low expectations. 'Making sense' of the Internet means having queries return results that are precise and highly relevant: within seconds, you have the answer to your question, if it is available. A taxonomic approach does just this. It is the intelligent, metadata-mediated layer that allows this to happen.
Here's where XML steps in. XML is important because it improves access to, and description of, the content contained within documents. The technology separates the intellectual content of a text from its presentation and makes the structure of that content explicit, which means that information from different sources can be converted into a uniform structure. XML is an enabling technology. On a construction site it would be both the building permit and the builders: it permits information aggregation and synthesis, and it performs the work itself. The taxonomy that overlays the XML layer is the architect.
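As a rough illustration of a taxonomy layered over XML, the Python sketch below parses a small XML document with the standard library's `xml.etree.ElementTree`, reads the text of specific elements and appends `<subject>` elements naming taxonomy categories. The element names, the category paths, the keyword rules in `RULES` and the `categorise` function are hypothetical. Simple keyword matching stands in for the far more sophisticated linguistic analysis that commercial categorisation tools perform; it is not a description of any particular product.

```python
import xml.etree.ElementTree as ET

# A hypothetical XML document: the markup says what each piece of text is,
# not how it should be displayed.
SOURCE = """<article>
  <title>Polymer coatings for marine vessels</title>
  <body>New polymer resins reduce hull corrosion.</body>
</article>"""

# Hypothetical taxonomy rules: category path -> trigger keywords.
RULES = {
    "Science/Chemistry/Polymers": {"polymer", "resin", "resins"},
    "Transport/Shipping": {"marine", "vessel", "vessels", "hull"},
}

def categorise(xml_text):
    """Parse the XML and append a <subject> element for each matching category."""
    root = ET.fromstring(xml_text)
    # Gather the words from the title and body elements only.
    text = " ".join(root.findtext(tag, default="") for tag in ("title", "body"))
    words = {w.strip(".,;:").lower() for w in text.split()}
    # Append one <subject> element per category whose keywords appear.
    for category, keywords in RULES.items():
        if words & keywords:
            ET.SubElement(root, "subject").text = category
    return root

root = categorise(SOURCE)
print([s.text for s in root.findall("subject")])
print(ET.tostring(root, encoding="unicode"))
```

Because the markup identifies which text is the title and which is the body, the categoriser can target those elements regardless of how the document is eventually displayed, and the added subject labels travel with the content wherever it is aggregated.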
Implementation of taxonomic approaches
Taxonomies have become an important part of knowledge management solutions. With the rise of enterprise information portals, vertical portals (vortals) and B2B exchanges, attention has re-focused on taxonomies related to specialised content areas. Semio (semio.com) has developed terminology and categorisation packages for the chemical, medical, legal, computing and military markets. It has taken industry-standard classifications and thesauri from each of these areas and developed templates that employ its core technology, Semio Tagger, to analyse and organise content from disparate sources automatically, giving users information in context. Tagger is embedded directly into a portal. The basis of Semio's software is Semio Map and Semio Taxonomy. Semio's CEO, Roger Ferguson, says: 'Cataloging information appropriately for industries that use complex terminology, like legal or medicine, often requires subject-specific expertise. Our templates package the industry-specific expertise we have acquired with our customers, as well as accepted thesauri in various industries, so that businesses can quickly implement custom categorisation solutions.'
The company intranet has often been the starting point for taxonomy solutions. As the networked enterprise has matured, businesses have found files and databases springing up like weeds. Is that piece of information on the D drive or the F? Did we put it in the sales folder or the consultants' folder? Perhaps it has been archived, or filed on a local drive at another site. These are the questions that have led to taxonomy-driven intranets within organisations.
Consider the very interesting case of Microsoft. Microsoft decided to develop a taxonomy covering all of its internal information, which included all of the content within the Microsoft firewall as well as materials licensed from outside publishers for internal distribution. The scope of the content was comprehensive. It included news, market research, internal product information, software code, business procedures, documentation, email, discussion lists, video presentations, marketing materials, educational materials, streaming media, database entries and more. The expertise for this project came from in-house personnel, with review by noted experts in the field of classification and thesaurus building. The taxonomy is currently maintained manually.
Mike Crandall, Microsoft's Knowledge Architect Manager, stated at a recent conference that there is growing evidence of the value of this taxonomy-based project. He reports that, even at this early stage, they have seen a 62 per cent reduction in the number of clicks, an average of 16 seconds saved per task and an 11 per cent increase in the task success rate.
Conclusion
My next article will discuss the way taxonomy is employed in the creation of the semantic web.
We are no longer accepting comments against this item. We suggest contacting the author directly.
28th April 2002: 'James Insydney' said:
Very Informative
25th June 2002: 'Tammeria' said:
Thanks a lot for your examples and contexts. I am currently interning at one of the largest global financial institutions. I am expanding and maintaining a taxonomy that is supposed to be cross functional along various departments. Again thanks, your article was useful.
11th November 2002: 'svoebel' said:
Very informative. Would like to know more about different XML formats as they relate to taxonomy development, and a way to automate taxonomy building without purchasing an out-of-the-box solution. Has anyone had experience implementing Semio with Verity?
20th November 2002: 'Kash' said:
Wonderful, it's splendid. I am keen to read the forthcoming article. Please keep us posted, especially on the differences between taxonomy, ontology and product models.
5th December 2003: 'Dr. U' said:
Great article. Did you publish the article described in the final two lines of this one, "the way taxonomy is employed in the creation of the semantic web"? Can't seem to locate it and would appreciate your assistance. Thnx.
The messages above were all contributed by IT-Director.com readers. Whilst we take care to remove any posts deemed inappropriate, we can take no responsibility for these comments. If you would like a comment removed please contact our editorial team.
Published by: IT Analysis Communications Ltd.
T: +44 (0)1908 880760 | F: +44 (0)1908 880761