Data Engineering

1.

Subject title

Data Engineering

2.

Code

DS003

3.

Study program

Data science in computer science and engineering, Cloud Computing, IT management, Security, Cryptography and Coding, Bioinformatics, Education with ICT, Eco-informatics, Intelligent Systems, Internet Technologies and cyber security, Computer Science, Software for embedded systems, Software Engineering, Statistics and Data Analytics

4.

Organizer of the study program (unit, institute, department, division)

Faculty of Information Sciences and Computer Engineering

5.

Study cycle (first, second, third)

Second cycle

6.

Academic year / semester

5 / Winter

7.

Number of ECTS credits

6.0

8.

Instructor

Assoc. Prof. Dr. Eftim Zdravevski, Prof. Dr. Georgina Mirceva, Prof. Dr. Slobodan Kalajdziski

9.

Prerequisites for enrollment

10.

Subject goals and competencies:


Data engineering is a subfield of data science responsible for designing, building, and maintaining the data infrastructure that collects, processes, stores, and delivers data for use and analysis at scale. Students will be able to analyze and organize raw data through multiple stages of data processing and will understand the challenges that often arise in real-life production settings. They will also learn about solutions, technologies, and architectures that address these scalability and maintainability challenges in on-premise and cloud environments. This will help students recognize data trends and patterns and prepare data for prescriptive and predictive modeling and for visualization.

11.

Subject content:


Data pipelines and stages of data engineering
Data collection
Data preprocessing
Data standardization, curation, and integration
Data aggregation considerations in big data systems
Data fusion
Data visualization
Data storage in scalable big data systems
Data lakes
Lakehouse architecture
Change data capture, data versioning, and loading strategies
Scalable processing of streaming and batch data
Data engineering challenges and effective deployment strategies
Infrastructure provisioning and continuous integration and deployment of data pipelines
Data cataloging and lineage
Data governance

12.

Learning methods:


Presentations, case studies...

13.

Total available time fund

6.0 ECTS x 30 hours = 180 hours

14.

Time distribution

30 + 45 + 0 + 60 + 60 = 180 hours

15.

Forms of teaching activities

15.1.

Lectures - theoretical teaching

30 hours

15.2.

Exercises (laboratory, classroom), seminars, team work

45 hours

16.

Other forms of activities

16.1.

Project tasks

60 hours

16.2.

Independent tasks

0 hours

16.3.

Homework

60 hours

17.

Grading method

17.1.

Tests

0 points

17.2.

Seminar work / project (presentation: written and oral)

60 points

17.3.

Activities and learning

0 points

17.4.

Final exam

0 points

18.

Grading criteria (points / grade)

up to 50 points

5 (five) (F)

from 51 to 60 points

6 (six) (E)

from 61 to 70 points

7 (seven) (D)

from 71 to 80 points

8 (eight) (C)

from 81 to 90 points

9 (nine) (B)

from 91 to 100 points

10 (ten) (A)

19.

Condition for signature and taking final exam

None

20.

Language of instruction

English

21.

Quality assurance method

Internal evaluation mechanism and surveys

22.

Literature

22.1.

Mandatory literature

| No. | Author | Title | Publisher | Year |
|---|---|---|---|---|
| 6999 | Paul Crickard | Data Engineering with Python | Packt Publishing Ltd | 2020 |
| 7000 | Bill Chambers and Matei Zaharia | Spark: The Definitive Guide: Big Data Processing Made Simple | O'Reilly | 2018 |
| 7001 | Manoj Kukreja and Danil Zburivsky | Data Engineering with Apache Spark, Delta Lake, and Lakehouse: Create scalable pipelines that ingest, curate, and aggregate complex data in a timely and secure way | Packt | 2021 |

22.2.

Additional literature