COMP60372: Semi-structured Data and the Web (2009-2010)
This course unit covers various formalisms (predominantly XML) and applications for semi-structured data. The study of semi-structured data focuses on describing and querying data that comes in a format less tightly structured than that found in relational databases. Such data is dominant on the Web, from HTML pages and weblog feeds to SOAP messages and vector graphics.
See the pitch talk (last year's) for more details.
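To make "less tightly structured" concrete, here is a minimal sketch in Python using only the standard library. The element names and values are invented for illustration and are not course material. The two records carry different fields, and one field repeats, which a fixed relational schema would not accommodate without extra tables or NULL columns.

```python
import xml.etree.ElementTree as ET

# Hypothetical records: element names and values are invented for illustration.
DOC = """
<people>
  <person id="p1">
    <name>Ada</name>
    <email>ada@example.org</email>
    <email>ada@example.com</email>
  </person>
  <person id="p2">
    <name>Grace</name>
    <phone>555-0100</phone>
  </person>
</people>
"""


def summarise(xml_text):
    """Return {person id: {child tag: [text values]}} for each <person>."""
    root = ET.fromstring(xml_text)
    summary = {}
    for person in root.findall("person"):
        fields = {}
        for child in person:
            # A tag may occur zero, one, or many times per record.
            fields.setdefault(child.tag, []).append(child.text)
        summary[person.get("id")] = fields
    return summary


if __name__ == "__main__":
    for pid, fields in summarise(DOC).items():
        print(pid, fields)
```

The code discovers the shape of each record from the data itself rather than from a schema declared in advance. That is the property the later schema and query languages in this unit have to cope with.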
This course unit aims to give students a good overview of the ideas and techniques behind the description and query mechanisms for semi-structured data, as well as the design and modelling issues that arise, especially in the context of the Web. We discuss the basic concepts of semi-structured data and their representation, and several major families of formalisms for semi-structured data, with a heavy emphasis on XML. This includes schema languages for XML data (DTD and XML Schema), processing and manipulating XML data (XPath, XQuery), and some theoretical aspects of XML data processing.
Laboratory sessions will ground the abstract notions in practical cases and tools.
A student completing this course unit should:
1. have an understanding of the history, foundations, and different models of semi-structured data, (A)
2. have an understanding of XML, schema languages for XML (DTD, XML Schema, Relax-NG, Schematron), processing, querying, and manipulating XML data (DOM, SAX, XPath, XQuery), and some theoretical aspects of XML data processing, (A)
3. have an understanding of HTML (esp. HTML5), CSS, and HTML validation, (A)
4. have mastered the basic range of techniques for representing, modelling, and querying semi-structured data, and be able to use tools developed for them. (B, C and D)
Assessment of Learning outcomes
Learning outcomes (1), (2), and (3) are assessed by examination; learning outcome (4) by examination and in the laboratory.
Contribution to Programme Learning Outcomes
A2, B2, B3, C2, D3, D4
The content and order of this syllabus fluctuate, but not hugely.
This syllabus can look rather technology oriented. While we do care about practical issues facing people working with standard technologies, we also present fundamental concepts and issues, design choices and tradeoffs, and try to develop insight into what makes a particular technology applicable in different situations.
Introduction: Semi-structured data.
XML + Namespaces: core concepts
DTDs, a simple schema language for XML documents
DOM and StAX/SAX, APIs for XML documents
XPath, a navigation query language for XML documents
XQuery, a query language for XML documents
XML Schema and RELAX NG, more expressive schema languages for XML documents
XSLT, a transformation language for XML documents
Schematron: a different sort of schema language
HTML5 and CSS, the web, structure, and style
Models of Error handling
Tree Grammars, Formal Foundations of Schema Languages
Other models of Semi-structured Data
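As a rough illustration of how three of the syllabus items above differ in practice, the sketch below extracts the same titles from a small document in three ways: with a tree-based DOM API, with an event-based SAX API, and with a path expression. It uses only the Python standard library. The sample feed and the function names are assumptions made for this example, and `xml.etree.ElementTree` supports only a limited subset of XPath, not the full language covered in the unit.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET
import xml.sax

# Hypothetical sample document, invented for illustration.
DOC = (
    "<feed>"
    "<entry><title>First</title></entry>"
    "<entry><title>Second</title></entry>"
    "</feed>"
)


def titles_dom(xml_text):
    """Tree-based: build the whole document in memory, then navigate it."""
    dom = xml.dom.minidom.parseString(xml_text)
    return [node.firstChild.data for node in dom.getElementsByTagName("title")]


class TitleHandler(xml.sax.ContentHandler):
    """Event-based: react to start/end tags as the parser streams past them."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the end tag.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def titles_sax(xml_text):
    handler = TitleHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles


def titles_path(xml_text):
    """Path-based: a declarative expression (ElementTree's XPath subset)."""
    root = ET.fromstring(xml_text)
    return [title.text for title in root.findall("./entry/title")]


if __name__ == "__main__":
    print(titles_dom(DOC), titles_sax(DOC), titles_path(DOC))
```

The DOM version is the simplest to write but holds the whole tree in memory. The SAX version keeps only the state it needs, at the cost of managing that state by hand. The path version states what to select and leaves the traversal to the library.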
Readings are primarily given through the Blackboard module for this course. They include research papers and technical specifications (especially from the W3C).
We will use the
Title: XML in a nutshell: a desktop quick reference (3rd edition)
Author: Harold, Elliotte Rusty and W. Scott Means
Publisher: O'Reilly Media
Another book available online. Reasonably reliable reference to many XML technologies.
Title: Learning XML (2nd edition)
Author: Ray, Erik T.
Students will find it useful to read through this text before the course starts. Don't believe everything you read! It is available online, for free, via the University Library Page (Safari Tech Books Online).