COMP60372: Semi-structured Data and the Web (2009-2010)

This is an archived syllabus from 2009-2010

Semi-structured Data and the Web
Level: 6
Credit rating: 15
Pre-requisites: Good familiarity with Java programming.
Co-requisites: No Co-requisites
Lecturers: Bijan Parsia, Uli Sattler
Sem 2 w19-23 Lecture 2.19 Fri 09:00 - 17:00
Assessment Breakdown
Exam: 50%
Coursework: 50%
Lab: 0%


This course unit covers various formalisms (predominantly XML) and applications for semi-structured data. Work on semi-structured data focuses on describing and querying data that comes in a format less tightly structured than that found in relational databases. Such data is dominant on the Web, from HTML pages and weblog feeds to SOAP messages and vector graphics.
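To make "less tightly structured" concrete, here is a minimal sketch using only the JDK's built-in DOM parser. The feed content, the class name IrregularFeedDemo, and the method fieldCounts are hypothetical and chosen purely for illustration; they are not taken from the course materials. The two entries carry different fields, which a fixed relational schema would not accommodate without nulls or extra tables.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class IrregularFeedDemo {
    // Hypothetical weblog feed: the entries do not share the same set of fields.
    public static final String FEED =
          "<feed>"
        + "<entry><title>First post</title><author>Ann</author></entry>"
        + "<entry><title>Second post</title><tag>xml</tag><tag>web</tag></entry>"
        + "</feed>";

    /** For each entry element, counts its descendant elements named fieldName. */
    public static int[] fieldCounts(String xml, String fieldName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList entries = doc.getElementsByTagName("entry");
            int[] counts = new int[entries.getLength()];
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                counts[i] = entry.getElementsByTagName(fieldName).getLength();
            }
            return counts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // One entry has an author and no tags; the other has two tags and no author.
        System.out.println("authors per entry: " + Arrays.toString(fieldCounts(FEED, "author")));
        System.out.println("tags per entry:    " + Arrays.toString(fieldCounts(FEED, "tag")));
    }
}
```

Both entries are well-formed XML even though neither has all the fields of the other; whether such variation is acceptable is a question for a schema, not for the parser.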

See the pitch talk (last year's) for more details.


This course unit aims to give students a good overview of the ideas and techniques behind the description and query mechanisms for semi-structured data, as well as the design and modelling issues that arise, especially in the context of the Web. We discuss the basic concepts of semi-structured data and their representation, and several major families of formalisms for semi-structured data, with a heavy emphasis on XML: schema languages for XML data (DTD and XML Schema), processing and manipulating XML data (XPath, XQuery), and some theoretical aspects of XML data processing.

Laboratory sessions will ground the abstract notions in practical cases and tools.
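As one example of the kind of practical case a schema language supports, the following sketch validates a document against a DTD using the JDK's validating DOM parser. The note vocabulary, the class name DtdValidationDemo, and the method validate are assumptions made for illustration and are not drawn from the actual lab exercises.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class DtdValidationDemo {
    // Hypothetical DTD, given as an internal subset: a note is a "to" followed by a "body".
    public static final String DOCTYPE =
          "<!DOCTYPE note ["
        + "<!ELEMENT note (to, body)>"
        + "<!ELEMENT to (#PCDATA)>"
        + "<!ELEMENT body (#PCDATA)>"
        + "]>";

    public static final String VALID = DOCTYPE + "<note><to>Ann</to><body>Hello</body></note>";
    public static final String INVALID = DOCTYPE + "<note><body>Hello</body></note>";

    /** Returns the validity and well-formedness problems found; empty if there are none. */
    public static List<String> validate(String xml) {
        final List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) {
                    // Warnings are ignored in this sketch.
                }
                @Override public void error(SAXParseException e) {
                    problems.add(e.getMessage()); // validity errors arrive here
                }
                @Override public void fatalError(SAXParseException e) throws SAXException {
                    throw e; // not well-formed: parsing cannot continue
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            problems.add(e.getMessage());
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("parser could not be set up or input not read", e);
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println("valid document problems:   " + validate(VALID));
        System.out.println("invalid document problems: " + validate(INVALID));
    }
}
```

The second document is well-formed but omits the required to element, so the parser reports a validity error rather than failing outright. This distinction between well-formedness and validity is one design choice among several possible models of error handling.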

Learning Outcomes

A student completing this course unit should:

1. have an understanding of the history, foundations, and different models of semi-structured data, (A)

2. have an understanding of XML, schema languages for XML (DTD, XML Schema, RELAX NG, Schematron), processing, querying, and manipulating XML data (DOM, SAX, XPath, XQuery), and some theoretical aspects of XML data processing, (A)

3. have an understanding of HTML (esp. HTML 5), CSS, HTML validation, (A)

4. have mastered the basic range of techniques for representing, modelling, and querying semi-structured data, and be able to use tools developed for them. (B, C and D)

Assessment of Learning outcomes

Learning outcomes (1), (2), and (3) are assessed by examination; learning outcome (4) by examination and in the laboratory.

Contribution to Programme Learning Outcomes

A2, B2, B3, C2, D3, D4


The content and order of this syllabus fluctuate, but not hugely.

This syllabus can look rather technology-oriented. While we do care about practical issues facing people working with standard technologies, we also present fundamental concepts and issues, design choices and tradeoffs, and try to develop insight into what makes a particular technology applicable in different situations.

Introduction: Semi-structured data.
XML + Namespaces: core concepts
DTDs, a simple schema language for XML documents
DOM and StAX/SAX, APIs for XML documents
XPath, a navigation query language for XML documents
XQuery, a query language for XML documents
XML Schema and RELAX NG, more expressive schema languages for XML documents
XSLT, a transformation language for XML documents
Schematron: a different sort of schema language
HTML5 and CSS, the web, structure, and style
Models of Error handling
Tree Grammars, Formal Foundations of Schema Languages
Other models of Semi-structured Data
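
As a small illustration of one topic in the list above, the following sketch evaluates an XPath expression with the JDK's javax.xml.xpath API. The books document is a hypothetical example built from the two titles and years in this unit's reading list; the class name XPathDemo and the method titlesAfter are likewise assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathDemo {
    // Hypothetical document; titles and years are taken from the reading list.
    public static final String BOOKS =
          "<books>"
        + "<book year=\"2004\"><title>XML in a nutshell</title></book>"
        + "<book year=\"2003\"><title>Learning XML</title></book>"
        + "</books>";

    /** Returns the titles of books whose year attribute is greater than the given year. */
    public static List<String> titlesAfter(String xml, int year) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Navigate from the root to each book, filter on the attribute, then step to its title.
        String expression = "/books/book[@year > " + year + "]/title";
        try {
            NodeList titles = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < titles.getLength(); i++) {
                result.add(titles.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titlesAfter(BOOKS, 2003)); // prints [XML in a nutshell]
    }
}
```

The predicate compares the year attribute numerically, so only the 2004 entry is selected. The same query written against the DOM API directly would need an explicit loop and attribute parsing, which is part of the case for a dedicated navigation language.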

Reading List

Readings are primarily given through the Blackboard module for this course. They include research papers and technical specifications (especially from the W3C).

Special resources

We will use the XML Editor, which is available in the lab. Students should be familiar with the Java command line toolchain (e.g., javac) as well as their favorite IDE (e.g., Eclipse or NetBeans).

Title: XML in a nutshell: a desktop quick reference (3rd edition)
Author: Harold, Elliotte Rusty and Means, W. Scott
ISBN: 9780596007645
Publisher: O'Reilly Media
Edition: 3rd
Year: 2004
Another book available online. Reasonably reliable reference to many XML technologies.

Title: Learning XML (2nd edition)
Author: Ray, Erik T.
ISBN: 0596004206
Publisher: O'Reilly
Edition: 2nd
Year: 2003
Students will find it useful to read through this text before the course starts. Don't believe everything you read! It is available online, for free, via the University Library Page (Safari Tech Books Online).