COMP38212: Topics in Advanced Information Retrieval (2010-2011)
The representation, organisation, indexing, searching and exploitation of information, both by people and by information systems and software agents, have become central to knowledge-based economies, necessitating new approaches to information-intensive knowledge work. However, conventional information and communication technologies cannot adequately address information needs, especially where these imply finding (and possibly justifying) an answer by exploiting geographically dispersed sources of structured, semi-structured and unstructured information in different types of media. Humans suffer from information overload and information overlook. We have the technology to enable distributed computing and networking, but the bulk of visible information on the Web is aimed at human readers and cannot easily be identified and exploited by systems. Information in the deep, hidden Web remains unexploitable in the general case. Social networking has blossomed but has frustrating limitations.
What are known as semantic technologies hold out the promise of improved flexible interaction of systems, improved search and the ability to offload tasks to quasi-independent software agents. In this vision, data now become smart data. The existence of smart data enables semantic search, capable of, for example, delivering precise results instead of thousands of documents to read. Smart data and semantic search enable interoperability of software systems: agents can discover resources and intercommunicate to achieve complex goals.
As these technologies are closely linked both with well-established techniques and ideas for the indexing, classification and retrieval of information, and with more recent advances in large-scale processing of unstructured text, this unit also examines contributions from the fields of information retrieval and text mining.
To provide students with an understanding of principles, issues, techniques and solutions connected with advanced information retrieval.
To enable students to gain knowledge of how advances in distributed systems and the Web, studied on other course units, relate to innovative approaches to indexing, organising, finding and exploiting information using semantic technologies.
Assessment of Learning outcomes
Examination 80% (3 questions from 5, examination at end of semester 2)
Coursework 20% (1 assignment to be handed in during semester 2, exact date to be specified)
Learning outcomes 1-8 are assessed by examination.
Learning outcome 8 is also assessed by coursework.
Academic knowledge
1. Demonstrate a requisite understanding of selected concepts, terminology and issues related to advanced information retrieval.
2. Demonstrate a requisite understanding of the fundamental techniques for document management and retrieval, and metadata management.
3. Demonstrate a requisite understanding of techniques to characterise and exploit the meaning of documents, and of the relationship between such techniques and information retrieval in a Web perspective.
4. Demonstrate a requisite understanding of relevant W3C technologies supporting advanced information retrieval.
Intellectual skills
5. Explain the general principles of advanced information retrieval and discuss the content and role of relevant related standards.
6. Explain the difficulty of representing content for accurate retrieval in relation to needs.
7. Explain how techniques for characterising the meaning of content and for semantic search are applied.
8. Discuss, critically analyse and evaluate current approaches in the field.
Subject practical skills
9. Be able to use the power of semantic technology for search, personalization and enterprise applications.
Transferable skills
10. Appreciate issues of communication of knowledge.
11. Be able to support enterprise knowledge management activities.
Information retrieval, semantic technologies and their relationship to the Web and distributed systems; structured, unstructured and semi-structured data; evolving information needs and knowledge management issues; executable knowledge, enhancing search experience, semantic applications and semantic infrastructure. 
Information Retrieval: basic concepts; indexing, classification and clustering; Boolean, vector space and probabilistic models; evaluation of IR systems; relevance and feedback, query expansion and the role of thesauri and ontologies; IR in a Web environment; question answering.
Semantic technologies: turning data into smart data: semantic metadata and ontologies; languages supporting semantic technologies; semantic interoperability; semantic search; semantic personalisation. 
Text mining: basic concepts; dealing with information overload and information overlook; going beyond document retrieval; overview of main approaches and techniques; relation to information retrieval; enabling technology for semantic solutions. 
Applications: Consideration of a range of applications demonstrating aspects of the above.
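Illustrative sketch: as a flavour of the information retrieval topics listed above, the following Python fragment ranks a toy collection with a vector space model (TF-IDF weights and cosine similarity) and evaluates a result list with precision and recall. It is a minimal sketch and not course material: the collection `DOCS`, the whitespace tokeniser and all function names are assumptions made for illustration, and real systems use more careful tokenisation, weighting and index structures.

```python
import math
from collections import Counter

# Hypothetical toy collection, invented for this illustration.
DOCS = [
    "semantic search with ontologies",
    "boolean retrieval model",
    "vector space retrieval model",
    "text mining and information overload",
]

def tokenize(text):
    """Naive tokeniser (an assumption): lower-case and split on whitespace."""
    return text.lower().split()

def build_index(docs):
    """Return one TF-IDF vector (term -> weight) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(toks).items()}
        for toks in tokenized
    ]
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query, docs):
    """Return indices of documents with non-zero similarity, best match first."""
    vectors, idf = build_index(docs)
    # Query terms unseen in the collection get weight 0.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scored = [(cosine(q, vec), i) for i, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored if score > 0.0]

def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, `rank("retrieval model", DOCS)` places the shorter document 1 ahead of document 2, because cosine similarity normalises by vector length. If only document 2 is judged relevant, `precision_recall([1, 2], [2])` gives a precision of 0.5 and a recall of 1.0.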