COMP60411 Semi-Structured Data and the Web syllabus 2014-2015
This course unit covers various formalisms (predominantly XML) and applications for semi-structured data. The study of semi-structured data focuses on describing and querying data that comes in a format less tightly structured than that found in relational databases. Such data is dominant on the Web, from HTML pages and weblog feeds to SOAP messages and vector graphics.
See the pitch talk for more details.
This course unit aims to give students a good overview of the ideas and techniques behind the description and query mechanisms for semi-structured data, as well as the various design and modelling issues that arise, especially in the context of the Web. We discuss the basic concepts of semi-structured data and their representation, as well as three major families of formalisms for semi-structured data:
- XML, including schema languages for XML data (DTD and XML Schema), processing and manipulating XML data (XPath, XQuery), and some theoretical aspects of XML data processing
- HTML (including HTML 5, CSS, and AJAX)
- Knowledge representation based languages such as RDF and OWL
Laboratory sessions will ground the abstract notions in practical cases and tools.
- Introduction: Semi-structured data.
- XML: core concepts
- DTDs, a simple schema language for XML documents
- XPath, a navigation language for XML documents
- XML namespaces: a concept ignored so far
- XSLT, a transformation language for XML documents
- DOM and SAX, programmatic APIs for manipulating XML documents
- XML Schema, a more expressive schema language for XML documents
- XQuery, a query language for XML documents
- HTML 5: text/html vs. application/xhtml+xml
- Validation of HTML 5 (including use of Schematron)
- CSS and the DOM: Web Data vs. Web Documents vs. Web Applications
- RDF and Linked Data
- OWL: How inference can help data
Feedback methods
Weekly coursework will be collected via Blackboard, and feedback is provided through the same mechanism.
- Assessment: written exam (2 hours)
- Lectures (20 hours)
- Practical classes & workshops (15 hours)
- Analytical skills
- Problem solving
- Written communication
|Programme outcome||Unit learning outcomes||Assessment|
|G1||Have an understanding of the history and foundations of semi-structured data and their representation.|
|G1||Have an understanding of XML, schema languages for XML (DTD, XML Schema, RELAX NG, Schematron), processing, querying, and manipulating XML data (DOM, SAX, XPath, XQuery), and some theoretical aspects of XML data processing.|
|G1||Have an understanding of HTML (esp. HTML 5), CSS, HTML validation, HTML processing and manipulation for Web Applications, and an understanding of the design issues involved.|
|G1||Have an understanding of knowledge representation based languages such as RDF and OWL, their semantics, the theoretical aspects of their core inference services, and their applicability to dealing with semi-structured data.|
|G2 G3 G4||Have mastered the basic range of techniques for representing, modelling, and querying semi-structured data, and be able to use tools developed for them.|
COMP60411 does not have a specified reading list.
Course unit materials
Links to course unit teaching materials can be found on the School of Computer Science website for current students.