- Instructor: M. Tamer Özsu (Office: DC3350)
- Lecture Room: MC 2034
- Lecture Time: Monday & Wednesday 2:30-3:50pm
- Office Hour: Monday 1:00-2:00pm

Introduction to data engineering issues in data science. Data management technology objectives. Structured data management: relational database technology, database workloads (OLTP vs. OLAP). Big data issues: dealing with volume (geo-distributed, cluster parallel, and cloud-native data management), dealing with variety (data type-native systems, NoSQL database systems), dealing with velocity (streaming data management), and big data processing platforms (MapReduce, Spark). Data preparation pipeline: data acquisition, data integration (data warehouses, data lakes, data lakehouses), dataset selection, data quality and cleaning, data provenance management. Introduction to several current topics in database research, such as large language models and vector databases.
Open to Master of Data Science and Artificial Intelligence students and others without an undergraduate course on database systems (instructor approval required).

- This course is specially designed for the data science program. It is an in-person course and no accommodations are made for remote attendance. Please make arrangements to attend lectures.
- The course will use LEARN for dissemination of notes and for discussions. I have set up discussion topics for different components of the course. Please post your questions in the appropriate forum rather than sending them to me by email.
- I will be posting lecture slides on LEARN (look under Content/Course Slides). However, they may be posted shortly before lectures or sometimes even after a lecture. Some of them will be detailed, others just a skeleton, so it is important to attend lectures to get the most from them.
- There is no textbook for the course. I have started to write my notes and I'll be posting them here (look under Content/Course Notes). I make no promises about the availability of these notes for every topic; I'll write as much as I can. I may assign reading from other textbooks as appropriate.
- I intend to have guest lecturers for some topics and will update the schedule as I get them confirmed. The guest lecture material is an important component of the course.
- Some logistics:
- The lecture times are Monday & Wednesday 2:30-3:50pm.
- My office hour is on Monday 1:00-2:00pm.
- The TAs for this course are not yet identified. I will update the information on LEARN when I know more.
- The final exam schedule will be announced by the Registrar's Office in due course and I cannot change the schedule. There will be no makeup for the final. If you miss it, you will need to take it the next time the course is offered (Fall 2026).
- There will be five quizzes in the course. These will be 20-30 minute quizzes and may be done online within LEARN (I have not yet decided).
- There will be two paper reviews. The logistics of these will be revealed later.
- There will be homework assignments for you to work through the material, but these won't be marked; they are for you to review the material. I will provide solutions once the deadline for working on them has passed.
- Two 48-hour extensions per student are provided. Each may be used on one of the two paper reviews (at most one may be used per paper review). Email me and the TAs at least 24 hours before the deadline to let us know that you're using it, and why. We will adjust the deadline on LEARN.
- Students can use generative AI tools as an aid, but must write their own text in paper reviews. These will be checked using appropriate tools.

| Week | Lecture | Topic | Speaker |
|---|---|---|---|
| 1 (Jan 5) | 1 | Course introduction; Structured Data Management: Introduction to Database Systems | Tamer Özsu |
| | 2 | Introduction to Database Systems | Tamer Özsu |
| 2 (Jan 12) | 1 | Relational model of data, relational calculus & algebra | Tamer Özsu |
| | 2 | Relational algebra, SQL | Tamer Özsu |
| 3 (Jan 19) | 1 | Database Workloads (OLTP, OLAP & HTAP systems) | Anil Goel |
| | 2 | Big data: Dealing with volume | Tamer Özsu |
| 4 (Jan 26) | 1 | Big data: Dealing with volume | Tamer Özsu |
| | 2 | Big data: Dealing with variety | Tamer Özsu |
| 5 (Feb 2) | 1 | Big data: Dealing with variety | Tamer Özsu |
| | 2 | Big data: Dealing with velocity | Tamer Özsu |
| 6 (Feb 9) | 1 | Big data: Dealing with velocity | Tamer Özsu |
| | 2 | Cloud computing & cloud-native data management | |
| 7 (Feb 16) | | Reading week - no classes | |
| 8 (Feb 23) | 1 | Introduction to data preparation pipeline | Tamer Özsu |
| | 2 | Data acquisition | Tamer Özsu |
| 9 (Mar 2) | 1 | Data integration: Data warehouses | Tamer Özsu |
| | 2 | Data integration: Data lakes | |
| 10 (Mar 9) | 1 | Data integration: Data lakehouses | |
| | 2 | Data profiling | |
| 11 (Mar 16) | 1 | Data quality & data cleaning | Mostafa Milani |
| | 2 | Data quality & data cleaning | |
| 12 (Mar 23) | 1 | Data provenance | |
| | 2 | LLMs and Data Management | |
| 13 (Mar 30) | 1 | LLMs and Data Management | |
| | 2 | Vector databases | |
| | | Final Exam | |

- Paper critiques (2): 40%
- Quizzes (5): 20%
- Final: 40%