Big data modeling and management

1.

Subject title

Big data modeling and management

Моделирање и управување на големи податоци

2.

Code

m23_s_055

3.

Study program

IT Management; Bioinformatics; Security, Cryptography and Coding; Education with ICT; Eco-informatics; Intelligent Systems; Internet Technologies and Cyber Security; Computer Science; Statistics and Data Analytics; Software for Embedded Systems; Software Engineering; Cloud Computing; Data Science in Computer Science and Engineering

4.

Organizer of the study program (unit, institute, department, division)

Faculty of Information Sciences and Computer Engineering

5.

Study cycle (first, second, third)

Second cycle

6.

Academic year / semester

5 / Summer

7.

Number of ECTS credits

6.0

8.

Instructor

Assoc. Prof. Dr. Eftim Zdravevski, Prof. Dr. Goran Velinov

9.

Prerequisites for enrollment

10.

Subject goals and competencies:


The course examines trends in the development of traditional relational (SQL) database management systems and data repositories, as well as the concepts of NoSQL and NewSQL systems for managing big data. It reviews how data is stored on different storage media and studies approaches to centralized and distributed storage, along with logical organization in rows, columns, graphs or documents. Methods for partitioning and indexing structured, semi-structured, unstructured and textual data are covered. Real-world approaches and solutions are examined for overcoming the challenges of modeling, managing, implementing and releasing big data systems to production. By the end of the course, students will know which systems are most suitable for a given case, what steps are needed to introduce big data systems in companies, and what challenges companies face in doing so.

11.

Subject content:


New views of data warehouses: conceptual, logical and physical models. Concepts of data lakes. Overview of big data management systems. Modeling data in big data systems; time implications for data modeling. Concepts of databases organized by columns (MonetDB, HBase, Cassandra), by key-value pairs (DynamoDB, Riak), by documents (MongoDB, CouchDB), and by graphs (Neo4j, OrientDB). Transactional and analytical databases operating in main memory. Alternative storage media. Strategies for indexing and partitioning and their impact on scalability and performance. Text indexing databases (Solr, Elasticsearch). Integrating different data sources. Planning of development, capacity and infrastructure. Systems and tools for analyzing static big data, such as Spark, Spark SQL, Hive, Pig, Tez and newer ones. Techniques for handling data streams. Systems and tools for analyzing dynamic big data, such as Spark Streaming, Storm, Oozie, Sqoop, Flink and newer ones. Managing the release of systems: expectations, assumptions, risks, team building. Strategies and scenarios for migration, security and backup of big data.

12.

Learning methods:


Lectures supported by slide presentations, interactive lectures, exercises (using equipment and software packages), teamwork, case studies, invited guest lecturers, independent preparation and defense of a project assignment and seminar paper, learning in an electronic environment (forums, consultations).

13.

Total available time fund

6.0 ECTS x 30 hours = 180 hours

14.

Time distribution

30 + 30 + 30 + 45 + 45 = 180 hours

15.

Forms of teaching activities

15.1.

Lectures - theoretical teaching

30 hours

15.2.

Exercises (laboratory, classroom), seminars, team work

30 hours

16.

Other forms of activities

16.1.

Project tasks

45 hours

16.2.

Independent tasks

30 hours

16.3.

Homework

45 hours

17.

Grading method

17.1.

Tests

30 points

17.2.

Seminar work / project (presentation: written and oral)

45 points

17.3.

Activities and learning

20 points

17.4.

Final exam

0 points

18.

Grading criteria (points / grade)

up to 50 points

5 (five) (F)

from 51 to 60 points

6 (six) (E)

from 61 to 70 points

7 (seven) (D)

from 71 to 80 points

8 (eight) (C)

from 81 to 90 points

9 (nine) (B)

from 91 to 100 points

10 (ten) (A)

19.

Condition for signature and taking final exam

Completed activities 15.1 and 15.2

20.

Language of instruction

Macedonian and

21.

Quality assurance method

Internal evaluation mechanism and surveys

22.

Literature

22.1.

Mandatory literature

No. | Author | Title | Publisher | Year
6534 | Franz Faerber, Alfons Kemper, Per-Åke Larson, Justin Levandoski, Thomas Neumann and Andrew Pavlo | Main Memory Database Systems, Foundations and Trends in Databases | Now Publishers | 2017
6535 | Daniel Abadi, Peter Boncz, Stavros Harizopoulos, Stratos Idreos and Samuel Madden | The Design and Implementation of Modern Column-Oriented Database Systems | Now Publishers | 2013
6536 | Fausto Pedro García Márquez and Benjamin Lev | Big Data Management | Springer | 2017
6537 | Francesco Corea | Big Data Analytics: A Management Perspective | Springer | 2016
6538 | Mohammad Moshirpour, Behrouz Far and Reda Alhajj | Highlighting the Importance of Big Data Management and Analysis for Various Applications | Springer | 2018
6539 | Sherif Sakr and Mohamed Gaber | Large Scale and Big Data: Processing and Management | CRC Press | 2014
6540 | Shivnath Babu and Herodotos Herodotou | Massively Parallel Databases and MapReduce Systems | Now Publishers | 2013

22.2.

Additional literature
