
COMP62421 Querying Data on the Web syllabus 2018-2019

COMP62421 Querying Data on the Web

Level 6
Credits: 15
Enrolled students: 41

Course lecturers: Andre Freitas, Bijan Parsia


Requisites

  • Pre-Requisite (Compulsory): COMP60411

Additional requirements

  • The formal requirement is attendance of Modelling Data on the Web.

    However, it is strongly recommended that students have previously taken a course on the fundamentals of databases. Some of the activities (assessments) will require programming skills.

Assessment methods

  • 50% Written exam
  • 50% Coursework
Timetable
Semester   Event    Location          Day  Time           Group
Sem 1 P2   Lecture  2.19              Fri  09:00 - 15:00  -
Sem 1 P2   Lecture  2.19+2.25A+2.25B  Fri  15:00 - 16:00  -
Sem 1 P2   Lab      2.25 (A+B)        Fri  16:00 - 17:00  -
Themes to which this unit belongs
  • Data on the Web

Overview

Given the shift in computing towards data-centric and data-intensive approaches in both scientific and industrial contexts, organising and querying data is set to become a primary concern in the construction of contemporary systems. Advances in Artificial Intelligence and Data Analysis applications, and their need to process large-scale and heterogeneous data, create demand for systems that can efficiently query and operate over this data.

This course unit aims to give students a principled and critical understanding of contemporary mechanisms that support efficient access to large-scale and heterogeneous data. The course is organised around the challenges of processing different types of data on the Web (tabular, tree-shaped, graph and document-based) and covers the fundamental algorithms and data structures found “under the hood” of database systems.

 

Aims

The aim of this course is to provide the conceptual and practical foundations for building and optimizing systems that need to access large-scale and heterogeneous data.

Syllabus

[Day 1]

Introduction to the Course Unit

Relational Query Processing (1 of 2)

            The Architectural Paradigm for Query Processing Systems

            The Relational Model of Data

            The Relational Calculi and Algebra

            The SQL Language

 

[Day 2]

Relational Query Processing (2 of 2)

            Logical Optimization

            Physical Optimization

            Classical Query Execution

            Parallel Query Execution

 

Query Processing Using XQuery

            Motivation for the Language

            Example Capabilities

            Compilation, Optimization, Evaluation

            Applications

 

[Day 3]

Massively-Parallel Schemes

            Replication

            Partitioning

            Transactions

            Consistency and Consensus

 

NoSQL Databases

            Key-Value Store

            Document-based Store

            Column-based Store

The Map-Reduce Model

Query Processing with Map-Reduce

 

[Day 4]

Graph Databases

SPARQL

Query Processing Using SPARQL

            Example Capabilities

            Compilation, Optimization, Evaluation

            Applications

 

[Day 5]

Contemporary Data-Intensive Architectures and Tools

Batch Processing, Stream Processing, Lambda/Kappa Architectures

Data Streams & Event-Centric Platforms

From Query to Machine Learning Pipelines

Supporting Frameworks: Kafka, Spark, Flink

Databases of the Future: Blockchain and AI applications

Teaching methods

The course is structured into 5 full-day lectures and lab sessions. Formative and summative assessments will be performed during the lectures. Some lectures will require active student engagement in teaching and learning activities (TLAs), e.g. work-along exercises, changing activities and quizzes.

Summative assessment consists of:

  • One closed-book 2-hour written exam
  • 5 quizzes, 2 essays and 5 weekly exercises, including problem-solving lab work

Some exercises might involve lightweight programming tasks.

Feedback methods

Coursework is assigned and lab sessions provide an opportunity for interaction. Coursework is marked offline with feedback given in writing. Lab sessions allow students to discuss the written feedback in more depth with the marker. The course unit will use the standard tools available in virtual learning environments for hints, tips, discussions, etc.

Study hours

  • Lectures (25 hours)
  • Practical classes & workshops (10 hours)

Employability skills

  • Analytical skills
  • Problem solving
  • Research
  • Written communication

Learning outcomes

Programme outcome: A1
Unit learning outcome: Have acquired knowledge of cutting-edge, research-led DBMS research.
Assessment:
  • Examination
  • Individual coursework

Programme outcome: A2, A3
Unit learning outcome: Be able to compare and contrast the variety of approaches used in DBMS research to address the challenges raised by new software architectures, new kinds of data resource and new computational fabrics.
Assessment:
  • Examination
  • Individual coursework

Programme outcome: B1
Unit learning outcome: Be able to identify, understand and articulate the shortcomings of current DBMS research and to suggest, in broad terms, possible strategies and approaches that might be used to overcome them.
Assessment:
  • Examination
  • Individual coursework

Reading list

COMP62421 does not have a specified reading list.

Additional notes

Course unit materials

Links to course unit teaching materials can be found on the School of Computer Science website for current students.