Applied tasks of data analysis

Baklanova Olga Evgenyevna


Description: The course is aimed at solving applied problems from various fields of data analysis: text analysis and information retrieval, collaborative filtering and recommender systems, business analytics, and time series forecasting. During the course, the student acquires practical skills in extracting features from heterogeneous data, evaluating the quality of developed models, and applying various machine learning algorithms.

Number of credits: 10

Prerequisites:

  • Algorithmization and programming technologies

Course Workload:

Type of class                                             Hours
Lectures                                                  30
Practical works                                           30
Laboratory works                                          60
SAWTG (Student Autonomous Work under Teacher Guidance)    30
SAW (Student Autonomous Work)                             150
Form of final control                                     Exam
Final assessment method

Component: University component

Cycle: Profiling disciplines

Goal
  • The discipline "Applied problems of data analysis" is offered within the Data Science course and pursues two goals. The first is to teach applied statistics as it relates to implementing machine learning and analyzing results. This includes analyzing and comparing models, interpreting machine learning models, and planning and analyzing A/B tests. The second is to give experience in solving complex problems with machine learning methods, namely natural language processing, image and signal processing, time series forecasting, and building recommender systems.
Objective
  • formation of skills in the use of statistical methods in the analysis of linear models
  • formation of skills for interpreting machine learning models
  • formation of skills for statistical testing of hypotheses
  • formation of skills for implementing A/B testing
  • formation of skills in statistical forecasting methods
  • formation of forecasting skills using machine learning
  • acquisition of practical skills in building recommender systems
  • acquisition of practical skills in solving computer vision problems
  • formation of skills in audio signal analysis, including neural network approaches
  • formation of natural language analysis skills
Learning outcome: knowledge and understanding
  • Be proficient in time series forecasting methods
  • Know the main quality metrics of recommender systems
  • Master the basic methods of text analysis
Learning outcome: applying knowledge and understanding
  • Be able to interpret machine learning models
  • Be able to plan and analyze the results of A/B tests
  • Know how to build recommender systems
  • Be able to solve the main tasks of image and signal analysis
  • Be able to solve complex problems of text analysis (machine translation, etc.)
Learning outcome: formation of judgments
  • Formation of judgments about solving complex problems of text analysis (machine translation, etc.)
  • Formation of judgments about the construction of recommender systems
  • Formation of judgments about interpreting machine learning models
Learning outcome: communicative abilities
  • development and improvement of students' communicative abilities
  • development of skills to participate in a constructive dialogue about the role and importance of artificial intelligence systems in the modern world and about the various directions within artificial intelligence systems
Learning outcome: learning skills or learning abilities
  • formation of skills in artificial intelligence systems for carrying out research work
  • developing skills for building recommender systems
  • the ability to contribute, within academic and professional contexts, to technological, social or cultural development in the interest of building a knowledge society
Teaching methods

- lectures and online lectures, laboratory classes using slides and other multimedia tools.

Assessment of the student's knowledge

The teacher conducts all forms of ongoing assessment and determines each student's current performance twice during each academic period. Ratings 1 and 2 are derived from the outcomes of this ongoing assessment. The student's learning achievements are assessed on a 100-point scale, and the final ratings P1 and P2 are calculated as the average of the student's ongoing performance evaluations. The teacher evaluates the student's work throughout the academic period according to the assignment submission schedule for the discipline. The assessment system may combine written and oral, group and individual formats.

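Period           Type of task                            Total
Rating 1         Individual homework assignment No. 1    0-100
                 Individual homework assignment No. 2
                 Individual homework assignment No. 3
Rating 2         Individual homework assignment No. 4    0-100
                 Individual homework assignment No. 5
                 Individual homework assignment No. 6
Final control    Exam                                    0-100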
Evaluation policy for learning outcomes by type of work
Type of task    90-100       70-89    50-69           0-49
                Excellent    Good     Satisfactory    Unsatisfactory
Evaluation form

The student's final grade in the course is calculated on a 100-point grading scale and consists of:

  • 40% of the examination result;
  • 60% of current control result.

The final grade is calculated by the formula:

$$FG = 0.6 \cdot \frac{MT_1 + MT_2}{2} + 0.4 \cdot E$$

where MT1 and MT2 are the digital equivalents of the grades for Midterm 1 and Midterm 2;

E is the digital equivalent of the exam grade.
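
As a worked illustration, the calculation can be sketched in Python. This is a minimal sketch, not an official calculator: it assumes the formula is read as 0.6 times the mean of the two midterm grades plus 0.4 times the exam grade (consistent with the 60% / 40% split stated above), and the function name `final_grade` and the sample scores are hypothetical.

```python
def final_grade(mt1: float, mt2: float, exam: float) -> float:
    """Return FG = 0.6 * (MT1 + MT2) / 2 + 0.4 * E on the 100-point scale.

    Assumption: all three inputs are already expressed on the 0-100 scale.
    """
    for name, value in (("mt1", mt1), ("mt2", mt2), ("exam", exam)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    current_control = (mt1 + mt2) / 2  # mean of the two midterm ratings
    return 0.6 * current_control + 0.4 * exam


if __name__ == "__main__":
    # Hypothetical scores: Midterm 1 = 80, Midterm 2 = 90, exam = 70.
    print(round(final_grade(80, 90, 70), 2))
```

With these sample scores the current control part is 0.6 × 85 = 51 and the exam part is 0.4 × 70 = 28, so the final grade is 79 points.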

Final alphabetical grade and its equivalent in points:

The letter grading system for students' academic achievements, corresponding to the numerical equivalent on a four-point scale:

Alphabetical grade    Numerical value    Points (%)    Traditional grade
A                     4.0                95-100        Excellent
A-                    3.67               90-94         Excellent
B+                    3.33               85-89         Good
B                     3.0                80-84         Good
B-                    2.67               75-79         Good
C+                    2.33               70-74         Good
C                     2.0                65-69         Satisfactory
C-                    1.67               60-64         Satisfactory
D+                    1.33               55-59         Satisfactory
D                     1.0                50-54         Satisfactory
FX                    0.5                25-49         Unsatisfactory
F                     0                  0-24          Unsatisfactory
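
The scale above can be expressed as a simple lookup. The following Python sketch is illustrative only. It assumes that each traditional grade applies to the rows grouped under it (Excellent for A and A-, Good for B+ through C+, Satisfactory for C through D, Unsatisfactory for FX and F), and that a fractional score falls into the band whose lower bound it reaches, which the table does not specify. The names `GRADE_SCALE` and `letter_grade` are hypothetical.

```python
# (lower bound in points, letter, numerical value, traditional grade),
# ordered from the highest band to the lowest.
GRADE_SCALE = [
    (95, "A", 4.0, "Excellent"),
    (90, "A-", 3.67, "Excellent"),
    (85, "B+", 3.33, "Good"),
    (80, "B", 3.0, "Good"),
    (75, "B-", 2.67, "Good"),
    (70, "C+", 2.33, "Good"),
    (65, "C", 2.0, "Satisfactory"),
    (60, "C-", 1.67, "Satisfactory"),
    (55, "D+", 1.33, "Satisfactory"),
    (50, "D", 1.0, "Satisfactory"),
    (25, "FX", 0.5, "Unsatisfactory"),
    (0, "F", 0.0, "Unsatisfactory"),
]


def letter_grade(points: float) -> tuple[str, float, str]:
    """Map a 0-100 score to (letter, numerical value, traditional grade)."""
    if not 0 <= points <= 100:
        raise ValueError(f"points must be between 0 and 100, got {points}")
    # Assumption: a score belongs to the first band whose lower bound it reaches.
    for lower, letter, value, traditional in GRADE_SCALE:
        if points >= lower:
            return letter, value, traditional
    raise AssertionError("unreachable: the scale covers 0-100")


if __name__ == "__main__":
    print(letter_grade(79))
```

For example, a final grade of 79 points falls in the 75-79 band and maps to B- (2.67, Good).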
Topics of lectures
  • Methods for building models of complex systems
  • Criteria for selecting analytical platforms and Data Mining packages
  • Main stages of modeling
  • Data analysis methodology
  • Data Mining, KDD, and the relationships between them
  • Analytical reporting and multidimensional data representation
  • Data warehouse
  • The most widely used algorithms for each type of task
  • Stages of data preparation
  • Formulating hypotheses
  • Methods for collecting and systematizing facts
  • Methods of expert assessment for identifying the most significant factors
  • Concepts of partial and complex processing
  • Quality analysis of the resulting models
  • Main stages of deploying data analysis systems
Key reading
  • Mkhitaryan, V. S. (ed.), et al. Анализ данных [Data Analysis]: textbook for universities. Moscow: Yurait Publishing, 2020. 490 pp. (Higher Education). ISBN 978-5-534-00616-2. Electronic text // Yurait Electronic Library System [website]. URL: https://urait.ru/bcode/450166 (accessed 06.04.2022).
  • McKinney, Wes. Python и анализ данных [Python for Data Analysis] / translated by A. Slinkin. 2nd ed. Saratov: Profobrazovanie, 2019. 482 pp. ISBN 978-5-4488-0046-7. Electronic text // IPR BOOKS Electronic Library System [website]. URL: http://www.iprbookshop.ru/88752.html (accessed 06.04.2022). Access: authorized users only.
  • Barsegyan, A. A., Kupriyanov, M. S., et al. Технологии анализа данных: Data Mining, Visual Mining, Text Mining, OLAP [Data Analysis Technologies: Data Mining, Visual Mining, Text Mining, OLAP]: textbook. 2nd ed., revised and expanded. St. Petersburg: BHV-Petersburg, 2017.
Further reading
  • Shnareva, G. V., Ponomareva, Zh. G. Анализ данных [Data Analysis]: study guide. Simferopol: University of Economics and Management, 2019. 129 pp. ISBN 2227-8397. Electronic text // IPR BOOKS Electronic Library System [website]. URL: http://www.iprbookshop.ru/89482.html (accessed 06.04.2022). Access: authorized users only.
  • Melnichenko, A. S. Математическая статистика и анализ данных [Mathematical Statistics and Data Analysis]: study guide. Moscow: MISiS Publishing House, 2018. 45 pp. ISBN 978-5-906953-62-9. Electronic text // IPR BOOKS Electronic Library System [website]. URL: http://www.iprbookshop.ru/78563.html (accessed 06.04.2022). Access: authorized users only.