COMP60411: Semi-structured Data and the Web (2010-2011)

This is an archived syllabus from 2010-2011

Semi-structured Data and the Web
Level: 6
Credit rating: 15
Pre-requisites: Good familiarity with relational databases and programming
Co-requisites: No Co-requisites
Lecturers: Bijan Parsia, Uli Sattler
Sem 1 P1 Lecture 2.19 Fri 09:00 - 16:00
Sem 1 P1 Lab 2.25abcd Fri 15:00 - 17:00
Assessment Breakdown
Exam: 50%
Coursework: 50%
Lab: 0%

Themes to which this unit belongs
  • Data Management
  • Advanced Web Technologies


This course unit covers various formalisms (predominantly XML) and applications for semi-structured data. Work on semi-structured data focuses on describing and querying data whose format is less tightly structured than that found in relational databases. Such data is dominant on the Web, from HTML pages and weblog feeds to SOAP messages and vector graphics.

See the pitch talk for more details.
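
As a rough illustration of what "less tightly structured" means in practice, the following minimal Python sketch parses a small XML feed whose entries do not all carry the same fields. The feed content, the element names, and the `summarise` helper are hypothetical and are not taken from the course materials; only the standard library's `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: the two entries do not share the same set of fields.
FEED = """
<feed>
  <entry id="1">
    <title>First post</title>
    <author>Ann</author>
  </entry>
  <entry id="2">
    <title>Second post</title>
    <tag>xml</tag>
    <tag>web</tag>
  </entry>
</feed>
"""


def summarise(xml_text):
    """Return one dict per entry, tolerating missing and repeated elements."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.findall("entry"):
        rows.append({
            "id": entry.get("id"),
            "title": entry.findtext("title"),
            "author": entry.findtext("author"),  # None when the element is absent
            "tags": [t.text for t in entry.findall("tag")],  # zero or more tags
        })
    return rows


summaries = summarise(FEED)
for row in summaries:
    print(row)
```

A relational table would need a fixed column for the author and a separate table for the repeated tags; here the optional and repeated elements simply appear where they are needed, and the query code has to cope with their absence.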


This course unit aims to give students a good overview of the ideas and techniques behind
the description and query mechanisms for semi-structured data, as well as the design and
modelling issues that arise, especially in the context of the Web. We discuss the basic
concepts of semi-structured data and their representation, together with three major
families of formalisms for semi-structured data:
XML, including schema languages for XML data (DTD and XML Schema), languages for processing
and manipulating XML data (XPath, XQuery), and some theoretical aspects of XML data processing;
HTML, including HTML 5, CSS, and AJAX;
and knowledge-representation-based languages such as RDF and OWL.

Laboratory sessions will ground the abstract notions in practical cases and tools.

Programme outcome | Unit learning outcome | Assessment
G1 | Have an understanding of the history and foundations of semi-structured data and their representation. | Examination
G1 | Have an understanding of XML, schema languages for XML (DTD, XML Schema, Relax-NG, Schematron), processing, querying, and manipulating XML data (DOM, SAX, XPath, XQuery), and some theoretical aspects of XML data processing. | Examination
G1 | Have an understanding of HTML (esp. HTML 5), CSS, HTML validation, HTML processing and manipulation for Web Applications, and an understanding of the design issues involved. | Examination
G1 | Have an understanding of knowledge-representation-based languages such as RDF and OWL, their semantics, the theoretical aspects of their core inference services, and their applicability to dealing with semi-structured data. | Examination
G2 G3 G4 | Have mastered the basic range of techniques for representing, modelling, and querying semi-structured data, and be able to use tools developed for them. | Examination, Lab assessment


Introduction: Semi-structured data.
XML: core concepts
DTDs, a simple schema language for XML documents
XPath, a navigation language for XML documents
XML namespaces: a concept ignored so far
XSLT, a transformation language for XML documents
DOM and SAX, programmatic interfaces for processing XML documents
XML Schema, a more expressive schema language for XML documents
XQuery, a query language for XML documents
HTML 5: text/html vs. application/xhtml+xml
Validation of HTML 5 (including use of Schematron)
CSS and the DOM: Web Data vs. Web Documents vs. Web Applications
RDF and Linked Data
OWL: How inference can help data

Reading List

Students taking the course do not need to buy any books. However, there are some resources that a student may wish to consult:

W3C documents at

Special resources

We will use the XML Editor, Firefox and other web browsers, and Protege 4.