Big Data analytics

Chung-Ming Own


Description: The course develops skills in analyzing large amounts of data to solve scientific and technological problems related to the doctoral student's dissertation research. The doctoral student acquires practical skills in solving experimental and theoretical problems in the field of big data analytics. These skills make it possible to present the results of the dissertation research in scientific journals and research reports as an executable module implementing the computational model of the experimental data, attached to the article.

Number of credits: 5

Prerequisites:

  • Introduction to Data Mining methods

Course Workload:

Types of classes                                          Hours
Lectures                                                  15
Practical works
Laboratory works                                          30
SAWTG (Student Autonomous Work under Teacher Guidance)    75
SAW (Student autonomous work)                             30
Form of final control                                     Exam
Final assessment method                                   Exam

Component: Component by selection

Cycle: Profiling disciplines

Goal
  • Acquisition of practical skills in conducting scientific research using modern data analysis technologies. Development of big data analysis skills for a wide range of applications, including analysis of corporate data, financial data from global data warehouse markets, data storage and processing modeling, and forecasting of complex indicators.
Learning outcome: knowledge and understanding
  • Understand the theory and foundations of storing, processing and analyzing big data, as well as advanced tools for collecting, storing, transferring and visualizing big data.
Learning outcome: applying knowledge and understanding
  • Be able to process and analyze large amounts of data using modern software.
Learning outcome: formation of judgments
  • Be able to independently apply methods and means of knowledge, learning and self-control; be aware of the prospects for intellectual, cultural, moral, physical and professional self-development and self-improvement; be able to critically assess one's own strengths and weaknesses.
Learning outcome: communicative abilities
  • Carry out communications in the professional sphere and in society as a whole, including in a foreign language; analyze existing technical documentation and develop new documentation independently; clearly state and defend the results of complex engineering activities in the field of IT.
Learning outcome: learning skills or learning abilities
  • Be ready to change social, economic and professional roles, to be geographically and socially mobile in a changing environment, and to continue learning independently.
Teaching methods

The following educational technologies are used in training sessions:
  • Technology of research activities
  • Technology of educational and research activities
  • Communication technologies (discussions, press conferences, brainstorming, educational debates, etc.)
  • Information and communication technologies (including remote ones)

Assessment of the student's knowledge

The teacher conducts all forms of ongoing assessment and determines each student's current performance twice during the academic period. Ratings 1 and 2 are based on the outcomes of this ongoing assessment. The student's learning achievements are assessed on a 100-point scale, and the final grades P1 and P2 are calculated as the average of the ongoing performance evaluations. The teacher evaluates the student's work throughout the academic period according to the assignment submission schedule for the discipline. The assessment system may combine written and oral, group and individual formats.

Period           Type of task         Total
Rating 1         Laboratory work 1    0-100
                 Laboratory work 2
                 Laboratory work 3
Rating 2         Laboratory work 4    0-100
                 Laboratory work 5
                 Laboratory work 6
Total control    Exam                 0-100
Evaluation policy for learning outcomes by work type
Type of task    90-100       70-89    50-69           0-49
                Excellent    Good     Satisfactory    Unsatisfactory
Evaluation form

The student's final grade in the course is calculated on a 100-point grading scale. It includes:

  • 40% of the examination result;
  • 60% of current control result.

The final grade is calculated by the formula:

$$FG = 0.6 \cdot \frac{MT_1 + MT_2}{2} + 0.4 \cdot E$$

where MT1 and MT2 are the numerical equivalents of the grades for Midterm 1 and Midterm 2, and E is the numerical equivalent of the exam grade.
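
The weighting described above (60% current control, taken as the mean of the two midterm ratings, and 40% exam) can be sketched in Python as follows. This is only an illustration: the function name `final_grade` and the range check are assumptions and are not part of the syllabus.

```python
def final_grade(mt1: float, mt2: float, exam: float) -> float:
    """Return the final grade on the 100-point scale.

    mt1, mt2 -- Midterm 1 and Midterm 2 ratings (0-100)
    exam     -- exam grade (0-100)
    """
    # Assumption: every input is already on the 100-point scale.
    for name, value in (("mt1", mt1), ("mt2", mt2), ("exam", exam)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")

    # Current control is the mean of the two midterm ratings (60% weight).
    current_control = (mt1 + mt2) / 2
    # The exam contributes the remaining 40%.
    return 0.6 * current_control + 0.4 * exam
```

For example, ratings of 80 and 90 with an exam grade of 70 give 0.6 × 85 + 0.4 × 70 = 79 points.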

Final alphabetical grade and its equivalent in points:

The letter grading system for students' academic achievements, corresponding to the numerical equivalent on a four-point scale:

Alphabetical grade    Numerical value    Points (%)    Traditional grade
A                     4.0                95-100        Excellent
A-                    3.67               90-94         Excellent
B+                    3.33               85-89         Good
B                     3.0                80-84         Good
B-                    2.67               75-79         Good
C+                    2.33               70-74         Good
C                     2.0                65-69         Satisfactory
C-                    1.67               60-64         Satisfactory
D+                    1.33               55-59         Satisfactory
D                     1.0                50-54         Satisfactory
FX                    0.5                25-49         Unsatisfactory
F                     0                  0-24          Unsatisfactory
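
As an illustration, the grading scale above can be applied with a small Python lookup. The names `GRADE_SCALE` and `letter_grade` are hypothetical. Two assumptions are made here: each traditional grade label applies to every row down to the next label, and a fractional score falls into the band whose lower bound it reaches (the syllabus does not state a rounding rule).

```python
# (lower bound in points, letter, four-point value, traditional grade),
# ordered from the highest band to the lowest.
GRADE_SCALE = [
    (95, "A", 4.0, "Excellent"),
    (90, "A-", 3.67, "Excellent"),
    (85, "B+", 3.33, "Good"),
    (80, "B", 3.0, "Good"),
    (75, "B-", 2.67, "Good"),
    (70, "C+", 2.33, "Good"),
    (65, "C", 2.0, "Satisfactory"),
    (60, "C-", 1.67, "Satisfactory"),
    (55, "D+", 1.33, "Satisfactory"),
    (50, "D", 1.0, "Satisfactory"),
    (25, "FX", 0.5, "Unsatisfactory"),
    (0, "F", 0.0, "Unsatisfactory"),
]


def letter_grade(points: float) -> tuple[str, float, str]:
    """Map a 0-100 score to (letter, four-point value, traditional grade)."""
    if not 0 <= points <= 100:
        raise ValueError(f"points must be between 0 and 100, got {points}")
    # Assumption: a score belongs to the first band whose lower bound it reaches.
    for lower, letter, value, traditional in GRADE_SCALE:
        if points >= lower:
            return letter, value, traditional
    # Not reachable for valid input, because the last band starts at 0.
    raise AssertionError("grade scale does not cover the score")
```

With this lookup, a final score of 79 points falls in the 75-79 band and maps to B- (2.67, Good).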
Topics of lectures
  • Introduction to data science and big data
  • Main tasks and methods of big data analysis
  • Data Storage and Management for Big Data
  • Big Data Storage Technologies
  • Distributed Computing and Parallel Processing
  • Architecture of the Hadoop big data analysis and processing ecosystem
  • Machine Learning for Big Data
  • Big Data Visualization
  • Cloud Computing and Big Data
  • Ethical aspects of using Big Data
  • NoSQL databases
  • Graph databases
  • Apache Spark Technology in Data Processing
  • Future Trends in Big Data Analytics
Key reading
  • Hadley Wickham, Garrett Grolemund. R for Data Science, O'Reilly, 2021
  • Wes McKinney. Python for Data Analysis, O'Reilly, 2022
  • Jules S. Damji, Brooke Wenig, Tathagata Das, Denny Lee. Learning Spark, 2nd Edition, O'Reilly, 2020
  • Apache Hadoop, url: https://hadoop.apache.org/
  • Apache Spark, url: https://spark.apache.org/
  • François Chollet. Deep Learning with Python, Manning Publications, 2021
  • Fundamentals of Data Visualization, url: https://clauswilke.com/dataviz/
  • Getting Started: Graph Database | Neo4j, url: https://medium.com/data-science/getting-started-graph-database-neo4j-df6ebc9ccb5b
Further reading
  • Ambuj Agrawal. No-Code Artificial Intelligence, BPB Online, 2023
  • Mark Watson. Ambuj Agrawal. Practical Python Artificial Intelligence Programming, 2023