COMP62421 Querying Data on the Web syllabus 2017-2018

COMP62421 Querying Data on the Web

Level 6
Credits: 15
Enrolled students: 54

Course leader: Alvaro A. A. Fernandes



Requisites

  • Pre-Requisite (Compulsory): COMP60411

Additional requirements

  • Pre-requisites

    Comparable knowledge to that provided by: COMP23111 Fundamentals of Databases

Assessment methods

  • 50% Written exam
  • 50% Coursework

Timetable

Semester   Event    Location    Day  Time           Group
Sem 1 P2   Lecture  2.19        Fri  09:00 - 16:00  -
Sem 1 P2   Lab      2.25 (A+B)  Fri  15:00 - 17:00  -

Themes to which this unit belongs

  • Data on the Web

Overview

This course unit offers an introduction to the principles and techniques underpinning query processing systems and technologies, with a particular focus on data on the Web.

More precisely, the course unit understands the notion of querying data on the Web as having two principal interpretations, both of which are covered in the course. On the one hand, querying data on the Web can be understood as querying XML and RDF data using W3C-recommended languages such as XQuery and SPARQL. Such activity is significant for the W3C-led vision of a Semantic Web to be fully realized. On the other hand, querying data on the Web can be understood as querying the various kinds of (so-called) NoSQL data, as modelled over different points in the spectrum from unstructured to fully-structured data models. Such activity plays an important role in, among others, the back-end of many Web-centred businesses (from Google to Facebook, from Amazon to eBay).

The basic design of the course unit is as follows. The first part studies classical query processing by revisiting the relational calculi and relational algebra and then delving into the classical principles and techniques for logical and physical optimization of queries over relational data, including parallelization. The second part builds on the knowledge acquired in the first and covers query processing using languages designed specifically for the Web, viz., XQuery and SPARQL. The focus is on how XQuery and SPARQL queries can be optimized and evaluated. The third and final part focusses on massively-parallel/distributed platforms for query processing, studying query processing in the context of both map-reduce engines and NoSQL engines.
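As a rough illustration of what logical optimization means in the first part, the sketch below compares two equivalent plans for the same query: one filters after the join, the other pushes the selection below the join. The relations, the attribute names and the helper functions `select` and `join` are hypothetical and are not taken from the course materials; this is a toy in-memory model, not how a real query engine is implemented.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def select(rows: List[Row], pred: Callable[[Row], bool]) -> List[Row]:
    """Relational selection: keep the rows that satisfy the predicate."""
    return [r for r in rows if pred(r)]


def join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Equi-join on `key`, implemented as a simple hash join."""
    index: Dict[object, List[Row]] = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Hypothetical toy relations, invented for this example.
students = [
    {"sid": 1, "name": "Ada"},
    {"sid": 2, "name": "Bob"},
    {"sid": 3, "name": "Cy"},
]
enrolled = [
    {"sid": 1, "unit": "COMP62421"},
    {"sid": 2, "unit": "COMP60411"},
    {"sid": 3, "unit": "COMP62421"},
]


def takes_unit(row: Row) -> bool:
    return row["unit"] == "COMP62421"


# Plan A: join everything first, then filter the joined rows.
naive = select(join(students, enrolled, "sid"), takes_unit)

# Plan B: push the selection below the join, so the join sees fewer rows.
pushed = join(students, select(enrolled, takes_unit), "sid")

# The two plans are equivalent; only the size of the intermediate result differs.
intermediate_naive = len(join(students, enrolled, "sid"))
intermediate_pushed = len(select(enrolled, takes_unit))
```

Both plans return the same rows, but the second feeds a smaller input to the join. An optimizer applies rewrites of this kind using algebraic equivalences rather than by running both plans.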

Aims

This course unit aims to endow students with knowledge and understanding of query processing technology, particularly in relation to data on the Web.

Given the changing landscape of computing towards a predominance of data-centric approaches in both scientific and industrial contexts, query processing is set to become an increasingly important activity on which organisations will compete. This course unit, therefore, aims to build upon the knowledge that students will have acquired of modelling data on the Web and to focus on studying how queries over data on the Web are expressed, optimised and evaluated. For practical reasons of infrastructure, the course unit does not linger on issues that arise when dealing with data on a very large scale. However, the challenges that the scale of the Web poses for query processing are particularly interesting and so will be discussed in the course unit.

Significant use is made of reading assignments and other activities in order to stimulate students to engage in independent information acquisition.

Note that this course unit is about query processing from a systems viewpoint (as opposed, for example, to a theoretical one, or to an application-oriented one). Therefore, it concerns itself much more with how query processing systems are built on well-accepted principles than with how they are used to support applications. It should appeal particularly to students who enjoy understanding how well-founded systems can be made to deliver advanced, challenging functionality. It is possibly less appealing to students who are more interested in how advanced technologies can be deployed, say, in businesses, in response to specific business needs, although the course unit does attempt to explain the motivations behind the technological advances it covers.

Syllabus

Part 1

[Day 1]

Introduction to the Course Unit [1]

Relational Query Processing (1 of 2)

     The Architectural Paradigm for Query Processing Systems [1]

     The Relational Model of Data [1]

     The Relational Calculi and Algebra [1]

     The SQL Language [1]

 

[Day 2]

Relational Query Processing (2 of 2)

     Logical Optimization [2]

     Physical Optimization [1]

     Classical Query Execution [1]

     Parallel Query Execution [1]

 

Part 2

[Day 3]

Query Processing Using XQuery

     Motivation for the Language [1]

     Example Capabilities [1]

     Compilation, Optimization, Evaluation [2]

     Applications [1]

 

[Day 4]

Query Processing Using SPARQL

     Motivation for the Language [1]

     Example Capabilities [1]

     Compilation, Optimization, Evaluation [2]

     Applications [1]

 

Part 3

[Day 5]

Map-Reduce for Query Processing

     The Models and Platforms [1]

     Using the Platform for Query Processing [1]

     Applications [1]

 

NoSQL Query Processing

     The Models and Platforms [1]

     Applications [1]

Teaching methods

One closed-book, five-question, 50-mark, two-hour written exam

Five weekly exercises including problem-solving lab work

Feedback methods

Coursework is assigned and lab sessions provide an opportunity for interaction. Coursework is marked offline with feedback given in writing. Lab sessions allow students to discuss the written feedback in more depth with the marker. The course unit will use the standard tools available in virtual learning environments for hints, tips, discussions, etc.

Study hours

  • Lectures (25 hours)
  • Practical classes & workshops (10 hours)

Employability skills

  • Analytical skills
  • Problem solving
  • Research
  • Written communication

Learning outcomes

Programme outcome: A1
Unit learning outcome: Have acquired knowledge of cutting-edge, research-led DBMS research.
Assessment:
  • Examination
  • Individual coursework

Programme outcomes: A2, A3
Unit learning outcome: Be able to compare and contrast the variety of approaches used in DBMS research to address the challenges raised by new software architectures, new kinds of data resource and new computational fabrics.
Assessment:
  • Examination
  • Individual coursework

Programme outcome: B1
Unit learning outcome: Be able to identify, understand and articulate the shortcomings of current DBMS research and to suggest, in broad terms, possible strategies and approaches that might be used to overcome them.
Assessment:
  • Examination
  • Individual coursework

Reading list

COMP62421 does not have a specified reading list.

Additional notes

Course unit materials

Links to course unit teaching materials can be found on the School of Computer Science website for current students.