The Semantic Web as a Large, Searchable Catalogue: A Librarian’s Perspective


Some information observers have suggested that Web 2.0 owes its rise to software applications, while a growing number of futurists expect the Semantic Web to be defined by services. How those services will be developed and used to bring order to the Web is central to our discussion of the Semantic Web. We consider the broad implications of these issues through the lens of our work as library professionals, and of the time we spend blogging about information on the Web and its evolution.

Globalization & Cultural Homogenization

Aspects of globalization will be facilitated by the Semantic Web, fusing ideas, people and lifestyles in a complex layering of economies and cultures. The Semantic Web appears closely linked to the globalization of capital, economies and culture, all of them information-intensive activities. Thomas Friedman, author of The World Is Flat, argues that we are entering what he calls Globalization 3.0, driven by ubiquitous computing, fiber optics, greater bandwidth and software: technologies that connect corporations, nations and individuals.

In the future, Semantic Web companies will not necessarily be located in high-tech enclaves like Silicon Valley, as the Web makes us less dependent on proximity to other highly skilled workers. English will likely remain the Web's lingua franca, but the Semantic Web may further "flatten" variations through the homogenization of language, and thus of cultures. Tim Berners-Lee envisions a Semantic Web of meaning in which concepts are named in a unifying global language and logic. It is a compelling vision; how can it be achieved?

A Question of Semantics

While most of us think of semantics as a linguistic concept, the Semantic Web is really a way to translate and merge languages (computer languages included) into something universal. This simple notion promotes standards for databases and brings artificial intelligence to bear on programming and on distinguishing between similar concepts. Consider how two documents, one about the actress Paris Hilton and one about a Hilton hotel in Paris, the City of Light, might be confused. A simple search for 'paris hilton' is complicated by the term's association with both the actress and the hotel, and by a lack of metadata and of indexing with what librarians call controlled terms.

Web technologies such as social tagging (on Flickr, for example) provide a simple way for Web users to describe items they use, find useful and want to share with others. However, tagging has its limitations: it has no controls or standards for grouping endless variations of similar concepts into a coherent whole. The Resource Description Framework (RDF), which expresses statements as subject-predicate-object triples whose parts are identified by URIs, connects resources in a meaningful way and is a key building block of the Semantic Web. Making connections among and between documents and ideas is something librarians do for a living.
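To make the idea of RDF triples concrete, here is a minimal, standard-library-only Python sketch of how URIs can keep the "Paris Hilton" person and hotel apart even though both carry the same label. It is an illustration under stated assumptions, not a real RDF toolkit: the `example.org` URIs, the `Person` and `Hotel` classes, and the `match` and `search` helpers are all hypothetical; only the `rdf:type` and `rdfs:label` URIs are real W3C vocabulary terms.

```python
from typing import NamedTuple

# Real W3C vocabulary URIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Hypothetical namespace for the resources and classes in this sketch.
EX = "http://example.org/"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Two distinct resources that share the same human-readable label.
graph = [
    Triple(EX + "ParisHilton_person", RDF_TYPE, EX + "Person"),
    Triple(EX + "ParisHilton_person", RDFS_LABEL, "Paris Hilton"),
    Triple(EX + "ParisHilton_hotel", RDF_TYPE, EX + "Hotel"),
    Triple(EX + "ParisHilton_hotel", RDFS_LABEL, "Paris Hilton"),
    Triple(EX + "ParisHilton_hotel", EX + "locatedIn", EX + "Paris"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    ]


def search(triples, label, wanted_type):
    """Find resources with the given label that are also of the given type."""
    labelled = {t.subject for t in match(triples, p=RDFS_LABEL, o=label)}
    typed = {t.subject for t in match(triples, p=RDF_TYPE, o=wanted_type)}
    return sorted(labelled & typed)


print(len(match(graph, p=RDFS_LABEL, o="Paris Hilton")))  # label alone: 2 hits
print(search(graph, "Paris Hilton", EX + "Hotel"))        # typed search: hotel only
```

Because each resource has its own URI and a typed description, a label search narrowed by type returns only the hotel, while a plain label match still finds both resources, which is exactly the ambiguity a keyword search for 'paris hilton' runs into. A production system would use a proper RDF store and a query language such as SPARQL rather than list scans.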

The Semantic Web as a Digital Catalogue

Put simply, a defining feature of the Semantic Web will be the organization of billions of documents, much as Melvil Dewey organized print materials when he created his decimal classification system. Librarians have long been interested in the power of better vocabularies to organize materials and to bring similar documents together on shelves and in catalogues. Further, OCLC (the Online Computer Library Center) has helped bring some order to the messiness of the Web through its Dublin Core project, a standard set of metadata elements for describing resources. In broad terms, computer science led the creation of Web 1.0 and 2.0, whereas library and information science can lead knowledge organization in Web 3.0.

In some ways, RDF technologies build on the established principles of cataloguing and classification from a print-dominated culture. However, the Semantic Web is focused on the digital: collecting, organizing and disseminating digital information using metadata (data about data). The idea is similar to the creation of the MARC (MAchine-Readable Cataloging) record, which revolutionized the description of items in a library's inventory. MARC changed the way libraries provide access to intellectual works by applying the venerated principles of cataloguing, enshrined in the bible of library cataloguing, AACR (the Anglo-American Cataloguing Rules), in a machine-readable form. We argue that the Semantic Web could become something like a large card catalogue, tying the world's libraries together into one universal database, much like the one envisioned by our profession's major thinkers.
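The link between MARC records and Semantic Web metadata can be sketched as a crosswalk: each catalogue field is mapped to a Dublin Core element and emitted as a triple about the record. The Python below is a deliberately simplified, hypothetical example. Real MARC fields carry indicators and subfields that it ignores, and the three-entry `MARC_TO_DC` table is only loosely modeled on the Library of Congress MARC-to-Dublin Core crosswalk; the record URI, the `marc_to_triples` helper and the `999` local field are invented for illustration. The `http://purl.org/dc/elements/1.1/` namespace is the real Dublin Core element namespace.

```python
# Real Dublin Core element namespace; everything else here is illustrative.
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical, simplified crosswalk: MARC tag -> Dublin Core element.
# Real MARC fields also have indicators and subfields, ignored here.
MARC_TO_DC = {
    "100": "creator",  # main entry, personal name
    "245": "title",    # title statement
    "650": "subject",  # topical subject heading
}


def marc_to_triples(record_uri, marc_fields):
    """Turn a {tag: [values]} record into (subject, predicate, object) triples."""
    triples = []
    for tag, values in marc_fields.items():
        element = MARC_TO_DC.get(tag)
        if element is None:
            continue  # skip fields this toy crosswalk does not cover
        for value in values:
            triples.append((record_uri, DC + element, value))
    return triples


# Illustrative record for a book cited in this article.
record = {
    "100": ["Friedman, Thomas L."],
    "245": ["The world is flat"],
    "650": ["Globalization", "Information technology"],
    "999": ["local note"],  # hypothetical local field with no mapping
}

triples = marc_to_triples("http://example.org/catalogue/1", record)
for triple in triples:
    print(triple)
```

Fields outside the crosswalk, like the hypothetical `999` field, are simply dropped. That mirrors a real trade-off: a simpler Dublin Core description is easier to share across the Web but loses some of the precision of the full catalogue record.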

A Double Edge of Possibility, and Privacy

The Semantic Web will be characterized by greater coherence, shared standards of description and interoperability. We like to think of the emerging Web as a more coherent space where one's needs for socialization, knowledge-sharing and leisure are easily met. However, we recognize that this poses a threat to privacy, since much of our digital activity is traceable and can be used to market products. We consider Facebook's move to let Google crawl parts of its content emblematic of this double edge: better searchability on the Web, and a diminished sense of privacy.

Further, if Web 2.0 is about social computing and interaction, the Semantic Web moves the Web into a wide-open, transparent space for all to carry out work and leisure activities. If you find the idea of someone reading your Facebook messages disconcerting, imagine that all of your personal conversations could one day be searchable and organized by others for retrieval. This social element may be too "intimate" for digital comfort, and is likely to be viewed with concern by privacy activists.


Although still conceptual, the Semantic Web is in many ways a counterweight to Web 2.0, which is best described as a disjointed place where everything is miscellaneous and which is governed by the dichotomy of the global-local dynamic. In contrast, Web 3.0 is about meaningfully bringing the miscellaneous back together after it has been fragmented into a billion pieces. Amazon has already quietly applied some Web 3.0 principles to its strategic planning of services. In 2006, it launched its Elastic Compute Cloud (EC2), which used open-source virtualization software (the Xen hypervisor) to rent out computing power as a service and to support a range of new offerings. Amazon, like many other companies, is bracing itself for the next Web wave, and it is only a matter of time before other knowledge-based companies follow suit as semantics are used to address the information overload of the uncontrolled Web.