Big Data and Distributed Data Analytics

1.

Subject title

Big Data and Distributed Data Analytics

Аналитика на големи податоци и дистрибуирани податоци

2.

Code

m23_s_211

3.

Study program

Bioinformatics, Security, Cryptography and Coding, Education with ICT, Intelligent Systems, Internet Technologies and cyber security, Computer Science, Software for embedded systems, Software Engineering, Statistics and Data Analytics, Cloud Computing, Data science in computer science and engineering, IT management, Eco-informatics

4.

Organizer of the study program (unit, institute, department, division)

Faculty of Information Sciences and Computer Engineering

5.

Study cycle (first, second, third)

Second cycle

6.

Academic year / semester

5 / Summer

7.

Number of ECTS credits

6.0

8.

Instructor

Assoc. Prof. Dr. Milos Jovanovik, Prof. Dr. Vangel Ajanovski

9.

Prerequisites for enrollment

10.

Subject goals and competencies:


The purpose of the course is to introduce the concepts of big data and the process of analyzing it: from distributed mass storage, through distributed mass processing (live during collection or after collection), to the analysis of the processing results in order to support decision-making, business improvement, and the improvement of flows and processes.

Competencies the student is expected to acquire after completing the subject:
- Know techniques and methods for mass distributed storage of big data
- Know techniques and methods for mass distributed preparation of big data for further processing
- Know techniques and methods for mass distributed processing and analysis of big data
- Apply the acquired knowledge in a specific real-world project for storing, processing, and analyzing distributed and big data
- Enable future architects to design distributed data management solutions
- Enable software engineers to design cloud-based software solutions built on distributed databases
- Present the fundamental principles and techniques to future researchers in the field, giving them a basis for future independent research work

11.

Subject content:


Topics covered within this subject:
- Introduction to big data. The need for and value of big data. Big data from social networks.
- Modeling big data and statistical processing of big data.
- Searching and mining big data.
- Scientific applications with big data.
- Privacy, integrity, and protection of big data.
- Introduction to distributed data processing.
- Tools, algorithms, and programming techniques for processing big data, such as HDFS, MapReduce, ZooKeeper, HBase, and others.
- Design and architecture of distributed data and distributed database systems.
- Query processing in a distributed environment.
- Distributed concurrency control and the concepts of eventual consistency.
- Managing distributed databases.
- Streaming data and cloud computing.
- NoSQL data management for big data. Graph analytics.

12.

Learning methods:


- Lectures and exercises with example-based discussions and analysis of various available examples
- Computer-assisted learning
- E-learning and distance learning
- Group research and development
- Use of relevant software tools
- Development and defense of a project

13.

Total available time fund

6.0 ECTS x 30 hours = 180 hours

14.

Time distribution

30 + 30 + 15 + 90 + 15 = 180 hours

15.

Forms of teaching activities

15.1.

Lectures - theoretical teaching

30 hours

15.2.

Exercises (laboratory, classroom), seminars, team work

30 hours

16.

Other forms of activities

16.1.

Project tasks

90 hours

16.2.

Independent tasks

15 hours

16.3.

Homework

15 hours

17.

Grading method

17.1.

Tests

0 points

17.2.

Seminar work / project (presentation: written and oral)

90 points

17.3.

Activities and learning

30 points

17.4.

Final exam

15 points

18.

Grading criteria (points / grade)

up to 50 points

5 (five) (F)

from 51 to 60 points

6 (six) (E)

from 61 to 70 points

7 (seven) (D)

from 71 to 80 points

8 (eight) (C)

from 81 to 90 points

9 (nine) (B)

from 91 to 100 points

10 (ten) (A)

19.

Condition for signature and taking final exam

50% of the activities and an initial version of the project

20.

Language of instruction

Macedonian, English

21.

Quality assurance method

Internal evaluation mechanism and surveys

22.

Literature

22.1.

Mandatory literature

No.

Author

Title

Publisher

Year

7873

Viktor Mayer-Schönberger, Kenneth Cukier

Big Data: A Revolution That Will Transform How We Live, Work, and Think

Eamon Dolan/Houghton Mifflin Harcourt; Reprint edition

2014

7874

Eric Siegel

Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, 2nd ed.

John Wiley & Sons

2016

7875

M. Tamer Özsu, Patrick Valduriez

Principles of Distributed Database Systems, 4th ed.

Springer

2020

7876

Saeed K. Rahimi, Frank S. Haug

Distributed Database Management Systems: A Practical Approach

Wiley-IEEE Computer Society

2010

7877

Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman

Mining of Massive Datasets, 3rd ed.

Cambridge University Press

2020

7878

A selection of significant and current research papers from the field, provided in printed or electronic form

0

7879

Electronic documentation from the websites of the vendors of the systems used in the activities

0

22.2.

Additional literature

No.

Author

Title

Publisher

Year