COMP62421 Querying Data on the Web syllabus 2021-2022
COMP62421 Querying Data on the Web
Level 6
Credits: 15
Enrolled students: 80
Course leader: Norman Paton
Requisites
- Pre-Requisite (Compulsory): COMP60411
Additional requirements
The formal requirement is attendance of Modelling Data on the Web.
However, it is strongly recommended that students have previously attended a course on the fundamentals of databases. Some of the activities (assessments) will require programming skills.
Assessment methods
- 50% Written exam
- 50% Coursework
Semester | Event | Location | Day | Time | Group |
---|---|---|---|---|---|
Sem 1 w7-11 | Lecture | Uni Place 2.220 | Wed | 10:00 - 12:00 | - |
Sem 1 w7-11 | DROP-IN | Chemistry G.53 | Fri | 11:00 - 12:00 | - |
Sem 1 w7-11 | ONLINE Laboratory | | Wed | 14:00 - 17:00 | - |
- Data on the Web
Overview
This course unit detail provides the framework for delivery in 20/21 and may be subject to change due to any additional Covid-19 impact. Current students should see Blackboard/course unit related emails for any further updates.
As computing shifts towards data-centric and data-intensive approaches in both scientific and industrial contexts, organising and querying data is set to become a primary concern in the construction of contemporary systems. The advance of Artificial Intelligence and Data Analysis applications, and their need to process large-scale and heterogeneous data, creates demand for systems that can efficiently query and operate over this data.
This course unit aims to give students a principled and critical understanding of contemporary mechanisms that support efficient access to large-scale and heterogeneous data. The course is organised around the challenges of processing different types of data on the Web (Tabular, Tree-shaped, Graph and Document-based), and covers the fundamental algorithms and data structures found “under the hood” of database systems.
Aims
The aim of this course is to provide the conceptual and practical foundations for building and optimizing systems which require accessing large-scale and heterogeneous data.
Syllabus
[Day 1]
Introduction to the Course Unit
Relational Query Processing (1 of 2)
The Architectural Paradigm for Query Processing Systems
The Relational Model of Data
The Relational Calculi and Algebra
The SQL Language
[Day 2]
Relational Query Processing (2 of 2)
Logical Optimization
Physical Optimization
Classical Query Execution
Parallel Query Execution
Query Processing Using XQuery
Motivation for the Language
Example Capabilities
Compilation, Optimization, Evaluation
Applications
[Day 3]
Massively-Parallel Schemes
Replication
Partitioning
Transactions
Consistency and Consensus
NoSQL Databases
Key-Value Store
Document-based Store
Column-based Store
The Map-Reduce Model
Query Processing with Map-Reduce
[Day 4]
Graph Databases
SPARQL
Query Processing Using SPARQL
Example Capabilities
Compilation, Optimization, Evaluation
Applications
[Day 5]
Contemporary Data-Intensive Architectures and Tools
Batch Processing, Stream Processing, Lambda/Kappa Architectures
Data Streams & Event-Centric Platforms
From Query to Machine Learning Pipelines
Supporting Frameworks: Kafka, Spark, Flink
Databases of the Future: Blockchain and AI applications
Teaching methods
The course is structured into 5 full-day lectures and lab sessions. Formative and summative assessments will be performed during the lectures. Some lectures will require active student engagement in the TLAs (e.g. work-along exercises, changing activities, quizzes).
Summative assessment consists of:
- One closed-book exam
- Quizzes and lab work
Some exercises might involve lightweight programming tasks.
Feedback methods
Coursework is assigned and lab sessions provide an opportunity for interaction. Coursework is marked offline with feedback given in writing. Lab sessions allow students to discuss the written feedback in more depth with the marker. The course unit will use the standard tools available in virtual learning environments for hints, tips, discussions, etc.
Study hours
- Lectures (25 hours)
- Practical classes & workshops (10 hours)
Employability skills
- Analytical skills
- Problem solving
- Research
- Written communication
Learning outcomes
On successful completion of this unit, a student will be able to:
- Describe and differentiate types of databases and the query syntax they support.
- Describe and differentiate query processing approaches for different types of data (Tabular, Tree-shaped, Graph, Document-based).
- Apply and evaluate query optimization strategies.
- Explain how different algorithms and data structures affect query performance for different types of data.
- Argue, contrast and compare different architectures and query optimisation strategies.
- Demonstrate and program queries over different databases.
- Analyse a new data management situation and design the appropriate methods for it.
Reading list
Title | Author | ISBN | Publisher | Year |
---|---|---|---|---|
Learning SPARQL: querying and updating with SPARQL 1.1 | DuCharme, Bob | 9781449313616 (e-book); 1449313612 (e-book) | Sebastopol, California: O'Reilly | 2011 |
Database systems [electronic resource]: the complete book | Garcia-Molina, Hector | 9781292037301; 129203730X; 9781292024479; 129202447X | Pearson Education Limited; Dawson Books | 2014 |
Additional notes
Course unit materials
Links to course unit teaching materials can be found on the Department of Computer Science website for current students.