COMP30352: Information Retrieval, Hypermedia and the Web (2008-2009)
Databases are not the only means of storing and subsequently retrieving information; in fact, databases hold only the subset of information known as "structured data". Although structured data constitutes the majority of the data that drives the operational processes of an enterprise, it is a minority of the information found in an enterprise. Documents and hypermedia are also information repositories, often referred to as semi-structured data, and they form the backbone of Digital Libraries and the Web.
Work on managing and finding electronic documents, and on structuring and navigating hypertexts, has gone on for at least a decade. Work on managing and cataloguing libraries has gone on for centuries. The Web, as a global document repository and a distributed hypermedia, makes this area of information management more important than ever. Whether a customer or another business can find my business's web pages is a matter of my business's survival in e-Commerce land.
This course unit aims to give students an understanding of the issues, and some of the solutions, in hypermedia development and design, document management and retrieval, and metadata management. The case study is the Web and the 'Semantic Web'.
The objective of the course is that students will understand the fundamental techniques for hypermedia architectures, design and usability; document management and retrieval; and metadata management. By the end of the course the student should:
Be familiar with the fundamentals of hypermedia systems, and with hypermedia design and usability methodologies, sufficient to know how to develop good web hypermedia and to judge why a web site is good or bad.
Understand the difficulty of representing and retrieving documents.
Be familiar with the classical techniques of Information Retrieval, and the additional techniques employed by Web search engines, sufficient to understand how web search engines work and how they could be improved.
Be familiar with techniques for conveying the meaning of documents or hypermedia content, for example, metadata, ontologies, thesauri, and classification taxonomies, sufficient to understand their application to the "Semantic Web".
Understand the latest W3C technologies for linking, describing and searching the Web.
Understand the relationship between IR, hypermedia and semantic models.
Assessment of Learning outcomes
Learning outcomes 1-6 are assessed by examination.
Contribution to Programme Learning Outcomes
A2, A3, A5, B1, B3
Motivation and context 
The semi-structured data landscape. An introduction to the techniques to be explored in the course: Information Retrieval, Hypermedia, Document metadata. The history of the Web.
Hypermedia architectures and models: closed hypermedia (HyperWave), open hypermedia (DLS, Microcosm), the Dexter model, AHM, HAM. Using hypermedia: browsing, navigation and orientation, paths, trails. Hypermedia design: modelling methodologies (OOHDM, RMM), link consistency, link patterns, rhetoric and context. Usability and evaluation: Nielsen's guidance directives, SUE.
Hypermedia and the Web 
Web document mark-up languages: SGML, HTML, XML, DTDs and XML Schema. Web linking technologies: XLink, XPointer.
Classical Information Retrieval 
Finding the Needle in the Haystack: how people seek information. Searching vs browsing; searching with browsing. Dynamic query reformulation. Types of users: surgical searchers, advice seekers, window shoppers. Classic IR: basic concepts, Boolean model, vector model, probabilistic model. Text operations: document pre-processing (word stemming, stop words, thesauri), document clustering. Retrieval evaluation: recall and precision, and alternatives; reference collections and their relevance. User relevance feedback: query expansion and term re-weighting. Automatic local and global analysis: query expansion based on a statistical thesaurus. Hybrid statistical and knowledge approaches: query expansion and refinement based on a similarity thesaurus and ontologies.
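As an illustration of the vector model and of recall and precision named above, the following is a minimal sketch rather than a prescribed course implementation. It assumes a toy three-document collection, a tiny stop-word list, raw term frequency multiplied by log(N/df) as the term weight, and cosine similarity for ranking. The documents, the names DOCS and STOP_WORDS, and all function names are hypothetical and chosen only for this example.

```python
import math
from collections import Counter

# Hypothetical toy collection; the texts are illustrative only.
DOCS = {
    "d1": "web search engines rank web pages",
    "d2": "libraries catalogue books and documents",
    "d3": "search engines index documents on the web",
}
STOP_WORDS = {"and", "on", "the"}  # assumed, deliberately tiny stop list


def tokenize(text):
    """Lower-case, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def tf_idf_vectors(docs):
    """Return ({doc_id: {term: weight}}, idf) with weight = tf * log(N / df)."""
    tokens = {d: tokenize(text) for d, text in docs.items()}
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokens.values() for term in set(toks))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = {
        d: {term: tf * idf[term] for term, tf in Counter(toks).items()}
        for d, toks in tokens.items()
    }
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return [(doc_id, score)] sorted by descending cosine similarity."""
    vectors, idf = tf_idf_vectors(docs)
    # Query terms unseen in the collection get weight 0.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scores = [(d, cosine(q, v)) for d, v in vectors.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    for doc_id, score in rank("web search", DOCS):
        print(doc_id, round(score, 3))
    print(precision_recall(["d1", "d2"], {"d1", "d3"}))
```

In this toy collection the query "web search" ranks d1 ahead of d3 because "web" occurs twice in d1, while d2 shares no query term and scores zero. Real systems typically add stemming, length normalisation and smoothed weighting schemes, none of which are shown here.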
Searching the Web 
How search engines use and extend IR techniques. Challenges. Search engines: centralised and distributed architectures, user interfaces, ranking, web crawling, indices, meta-searchers. Web examples include Verity, Autonomy, Lycos, Google etc. Combining searching with browsing: web directories (Yahoo); combining knowledge representation and web searching (Cyc + Lycos).
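The role of indices in a search engine can be sketched with a small inverted index, which maps each term to the set of pages that contain it, so a conjunctive (Boolean AND) query becomes a set intersection. This is a minimal sketch under stated assumptions: the page texts, the example.org URLs, and the function names are hypothetical, and a production index would also store positions, frequencies and compressed postings.

```python
from collections import defaultdict

# Hypothetical crawled pages: URL -> extracted text (illustrative only).
PAGES = {
    "http://example.org/a": "web search engines crawl the web",
    "http://example.org/b": "digital libraries and metadata",
    "http://example.org/c": "search engines rank web pages",
}


def build_inverted_index(pages):
    """Map each lower-cased term to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return index


def boolean_and(index, query):
    """Return the URLs that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # .get avoids creating empty entries for unseen terms in the defaultdict.
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


if __name__ == "__main__":
    idx = build_inverted_index(PAGES)
    print(sorted(boolean_and(idx, "web search")))
    print(sorted(boolean_and(idx, "metadata search")))
```

The intersection only decides which pages match; ordering the matches is a separate ranking step.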
The Semantic Web 
Describing document meaning: the need for shared terminologies and controlled vocabularies (esp. in e-Commerce). Knowledge structures for shared terminologies: ontologies, thesauri, classifications (e.g. Yahoo, AAT). Metadata: what is metadata, taxonomies of metadata, expressing metadata, standards (e.g. Dublin Core). Web metadata languages: Topic Maps, RDF, RDF(S). Ontology-driven web searching, browsing and hypermedia: COHSE, SHOE, OIL, Ontobroker.