
COMP30352: Information Retrieval, Hypermedia and the Web (2008-2009)

This is an archived syllabus from 2008-2009

Information Retrieval, Hypermedia and the Web
Level: 3
Credit rating: 10
Pre-requisites: COMP20312 or similar
Co-requisites: none
Duration: 11 weeks
Lectures: 22
Lecturers: Carole Goble

Sem 2 w19-26,30-33 Lecture LF15 Thu 10:00 - 11:00 xtra
Sem 2 w19-26,30-33 Lecture LF15 Fri 13:00 - 14:00 xtra
Sem 2 w19-26,30-33 Lecture UNI PL 4.206 Tue 15:00 - 17:00 -
Assessment Breakdown
Exam: 100%
Coursework: 0%
Lab: 0%


Databases are not the only means of storing and subsequently retrieving information; in fact, databases hold only the subset of information known as "structured data". Although structured data constitutes the majority of the data that drives the operational processes of an enterprise, it is a minority of the information found in an enterprise. Documents and hypermedia are also information repositories, often referred to as semi-structured data, and they form the backbone of Digital Libraries and the Web.

Work on managing and finding electronic documents, and on structuring and navigating hypertexts, has gone on for at least a decade. Work on managing and cataloguing libraries has gone on for centuries. The Web, as a global document repository and a distributed hypermedia, makes this area of information management more important than ever. Whether a customer or another business can find my business's web pages is a matter of my business's survival in e-Commerce land.

This course unit aims to give students an understanding of the issues and some solutions in hypermedia development and design, document management and retrieval, and metadata management. The case study is the Web and the 'Semantic Web'.

Learning Outcomes

The objective of the course is that students will understand the fundamental techniques for hypermedia architectures, design and usability; document management and retrieval; and metadata management. By the end of the course the student should:

1. Be familiar with the fundamentals of hypermedia systems, and hypermedia design and usability methodologies, sufficient to know how to develop good web hypermedia and why a web site is good or bad.
2. Understand the difficulty of representing and retrieving documents.
3. Be familiar with the classical techniques of Information Retrieval, and the additional techniques employed by Web search engines, sufficient to understand how web search engines work and how they could be improved.
4. Be familiar with techniques for conveying the meaning of documents or hypermedia content, for example, metadata, ontologies, thesauri, and classification taxonomies, sufficient to understand their application to the "Semantic Web".
5. Understand the latest W3C technologies for linking, describing and searching the Web.
6. Understand the relationship between IR, hypermedia and semantic models.

Assessment of Learning outcomes

Learning outcomes 1-6 are assessed by examination.

Contribution to Programme Learning Outcomes

A2, A3, A5, B1, B3


Motivation and context [2]

The semi-structured data landscape, an introduction to the techniques to be explored in the course: Information Retrieval, Hypermedia, Document metadata. The history of the Web.

Hypermedia [9]

Hypermedia architectures and models: closed hypermedia (HyperWave), open hypermedia (DLS, Microcosm), the Dexter model, AHM, HAM. Using Hypermedia: browsing, navigation and orientation, paths, trails. Hypermedia design: modelling methodologies (OOHDM, RMM), link consistency, link patterns, rhetoric and context. Usability and evaluation: Nielsen's guidance directives, SUE.

Hypermedia and the Web [2]

Web document mark-up languages: SGML, HTML, XML, DTDs and XML Schema. Web linking technologies: XLink, XPointer.

Classical Information Retrieval [5]

Finding the Needle in the Haystack: How people seek information. Searching vs browsing; searching with browsing. Dynamic query reformulation. Types of users: surgical searchers, advice seekers, window shoppers. Classic IR: basic concepts, Boolean model, vector model, probabilistic model. Text Operations: document pre-processing (word stemming, stop words, thesauri), document clustering. Retrieval Evaluation: recall and precision, and alternatives; reference collections and their relevance. User Relevance Feedback: query expansion and term re-weighting, automatic local and global analysis: query expansion based on a statistical thesaurus. Hybrid statistical and knowledge approaches: query expansion and refinement based on a similarity thesaurus and ontologies.
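As an illustration of the vector model and of recall and precision listed above, the following is a minimal sketch only, not course material. It assumes a tf * log(N / df) weighting, cosine similarity for ranking, whitespace tokenisation, and a tiny illustrative stop list with no stemming. The names `STOP_WORDS`, `tokenize`, `rank` and `precision_recall` are hypothetical.

```python
import math
from collections import Counter

# Illustrative stop list (an assumption; real stop lists are much longer).
STOP_WORDS = {"the", "a", "of", "and", "in", "is", "to"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words (no stemming)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def weight(tokens, df, n_docs):
    """Build a {term: tf * log(N / df)} vector; terms unseen in the collection are skipped."""
    return {t: tf * math.log(n_docs / df[t])
            for t, tf in Counter(tokens).items() if t in df}


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query, docs):
    """Return document indices ordered by decreasing cosine similarity to the query."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(t for tokens in tokenized for t in set(tokens))
    doc_vectors = [weight(tokens, df, len(docs)) for tokens in tokenized]
    query_vector = weight(tokenize(query), df, len(docs))
    scores = [cosine(query_vector, dv) for dv in doc_vectors]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, if a system retrieves documents 1, 2 and 3 while the relevant set is 2, 3, 4 and 5, `precision_recall([1, 2, 3], [2, 3, 4, 5])` gives a precision of 2/3 and a recall of 1/2.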

Searching the Web [1]

How search engines use and extend IR techniques. Challenges. Search engines: centralised and distributed architectures, user interfaces, ranking, web crawling, indices, meta-searchers. Web examples include Verity, Autonomy, Lycos, Google etc. Combining searching with browsing: web directories (Yahoo); combining knowledge representation and web searching (yc + Lycos).
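To make the "indices" topic concrete, here is a minimal sketch of an inverted index with a Boolean AND query over it. It is an illustration under simplifying assumptions (whitespace tokenisation, an in-memory dictionary, no ranking) and does not describe how any particular search engine is built. The function names and page ids are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(pages):
    """Map each term to the set of page ids whose text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in text.lower().split():
            index[term].add(page_id)
    return index


def boolean_and(index, query):
    """Return the ids of pages that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Look up each term's posting set; unknown terms yield an empty set.
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)
```

A query is answered by intersecting posting sets rather than scanning every page, which is the main reason an index is built ahead of query time.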

The Semantic Web [5]

Describing document meaning: The need for shared terminologies and controlled vocabularies (esp. in e-Commerce). Knowledge structures for shared terminologies: ontologies, thesauri, classifications (e.g. Yahoo, AAT). Metadata: what is metadata, taxonomies of metadata, expressing metadata, standards (e.g. Dublin Core). Web metadata languages: Topic Maps, RDF, RDF(S). Ontology-driven web searching, browsing and hypermedia: COHSE, SHOE, OIL, Ontobroker.
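As a rough sketch of what "expressing metadata" with Dublin Core and RDF can look like, the code below describes a document as subject-predicate-object triples and prints them in an N-Triples-like form. It is a simplified illustration: the document URI `http://example.org/doc/1`, the author name and the helper names `describe` and `to_ntriples` are hypothetical, only plain string literals are handled, and a real application would more likely use an RDF library.

```python
# Namespace of the Dublin Core element set (title, creator, subject, ...).
DC = "http://purl.org/dc/elements/1.1/"


def describe(subject_uri, **elements):
    """Return (subject, predicate, literal) triples, one per Dublin Core element."""
    return [(subject_uri, DC + name, value) for name, value in elements.items()]


def to_ntriples(triples):
    """Serialise triples as lines of the form <s> <p> "literal" . (plain literals only)."""
    lines = []
    for subject, predicate, literal in triples:
        # Escape backslashes and double quotes inside the literal.
        escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject}> <{predicate}> "{escaped}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    triples = describe(
        "http://example.org/doc/1",  # hypothetical document URI
        title="Hypermedia notes",
        creator="A. Author",
    )
    print(to_ntriples(triples))
```

Each triple states one property of the document, so a search or browsing tool can match on the `creator` or `title` property rather than on raw text.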