Data mining
Description: Data mining refers to a set of techniques designed to efficiently find interesting pieces of information or knowledge in large amounts of data. Association rules, for instance, are a class of patterns that tell which products tend to be purchased together. In this course master's students explore how this interdisciplinary field brings together techniques from databases, statistics, machine learning, and information retrieval. They will discuss the main data mining methods currently used, including data warehousing and data cleaning, clustering, classification, association rule mining, query flocks, text indexing and searching algorithms, how search engines rank pages, and recent techniques for web mining. Designing algorithms for these tasks is difficult because the input data sets are very large and the tasks may be very complex. One of the main focuses in the field is the integration of these algorithms with relational databases and the mining of information from semi-structured data, and master's students will examine the additional complications that arise in this case.
Number of credits: 6
Prerequisites:
- Expert and Intelligent Systems
Course Workload:
Types of classes | hours |
---|---|
Lectures | 15 |
Practical works | |
Laboratory works | 30 |
SAWTG (Student Autonomous Work under Teacher Guidance) | 45 |
SAW (Student autonomous work) | 90 |
Form of final control | Exam |
Final assessment method |
Component: Component by selection
Cycle: Base disciplines
Goal
- Goals of the course: to familiarize students with the basic concepts and methods of data mining, to develop skills in using the latest data mining software for solving practical problems, and to give them experience in self-study and research
Objective
- understand algorithms and methods of data mining
- develop data mining programs and applications
- program using available data mining tools and general purpose languages
- understand analysis, metrics, visualization and navigation of data mining results
- learn how to use a few commercial data mining tools
Learning outcome: knowledge and understanding
- explain the basic principles of the primary data mining techniques
Learning outcome: applying knowledge and understanding
- be able to choose effective methods for solving applied problems using Data Mining technology in the field of business intelligence and research
- design data mining models and databases to use data mining technologies as part of larger systems
Learning outcome: formation of judgments
- the ability to devise non-standard approaches to solving problems and to search for new, original ideas and design techniques using Data Mining technology in the field of business analytics and research
Learning outcome: communicative abilities
- the ability to read and translate IT literature and to work with data mining software applications that have an English-language interface
Learning outcome: learning skills or learning abilities
- skills in acquiring new knowledge in the field of professional and continuing education
Teaching methods
- Technology of research activities
- Technology of educational and research activities
- Communication technologies (discussions, press conference, brainstorming, educational debates, etc.)
- Information and communication (including remote) technologies
Assessment of the student's knowledge
The teacher oversees various tasks related to ongoing assessment and determines students' current performance twice during each academic period. Ratings 1 and 2 are formed from the outcomes of this ongoing assessment. The student's learning achievements are assessed on a 100-point scale, and the final grades P1 and P2 are calculated as the average of the ongoing performance evaluations. The teacher evaluates the student's work throughout the academic period in line with the assignment submission schedule for the discipline. The assessment system may combine written and oral, group and individual formats.
Period | Type of task | Total |
---|---|---|
Rating 1 | Assignment 1, Assignment 2, Assignment 3, Midterm 1 | 0-100 |
Rating 2 | Assignment 4, Assignment 5, Assignment 6, Midterm 2 | 0-100 |
Total control | Exam | 0-100 |
Evaluation policy for learning outcomes by type of task
Type of task | 90-100 (Excellent) | 70-89 (Good) | 50-69 (Satisfactory) | 0-49 (Unsatisfactory) |
---|---|---|---|---|
Evaluation form
The student's final grade in the course is calculated on a 100-point grading scale and comprises:
- 40% of the examination result;
- 60% of current control result.
The final grade is calculated by the formula:
$$FG = 0.6 \cdot \frac{MT_1 + MT_2}{2} + 0.4 \cdot E$$
where $MT_1$ and $MT_2$ are the numerical equivalents of the grades for Midterm 1 and Midterm 2, and $E$ is the numerical equivalent of the exam grade.
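As a worked illustration of the formula above, the following minimal Python sketch computes the final grade. The function name `final_grade` and the sample scores are hypothetical and chosen only for this example; all three inputs are assumed to be on the 100-point scale.

```python
def final_grade(mt1: float, mt2: float, exam: float) -> float:
    """Return FG = 0.6 * (MT1 + MT2) / 2 + 0.4 * E on the 100-point scale."""
    for score in (mt1, mt2, exam):
        if not 0 <= score <= 100:
            raise ValueError("scores must lie between 0 and 100")
    current_control = (mt1 + mt2) / 2  # average of the two midterm grades
    return 0.6 * current_control + 0.4 * exam


# Hypothetical scores: MT1 = 80, MT2 = 90, E = 70
print(final_grade(80, 90, 70))
```

With these sample scores the current control average is 85, so the final grade is 0.6 × 85 + 0.4 × 70 = 79 points.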
Final alphabetical grade and its equivalent in points:
The letter grading system for students' academic achievements, corresponding to the numerical equivalent on a four-point scale:
Alphabetical grade | Numerical value | Points (%) | Traditional grade |
---|---|---|---|
A | 4.0 | 95-100 | Excellent |
A- | 3.67 | 90-94 | Excellent |
B+ | 3.33 | 85-89 | Good |
B | 3.0 | 80-84 | Good |
B- | 2.67 | 75-79 | Good |
C+ | 2.33 | 70-74 | Good |
C | 2.0 | 65-69 | Satisfactory |
C- | 1.67 | 60-64 | Satisfactory |
D+ | 1.33 | 55-59 | Satisfactory |
D | 1.0 | 50-54 | Satisfactory |
FX | 0.5 | 25-49 | Unsatisfactory |
F | 0 | 0-24 | Unsatisfactory |
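The table above can be read as a simple lookup from points to a letter grade. The sketch below is one possible way to encode it in Python, not an official implementation. The names `GRADE_BANDS` and `letter_grade` are hypothetical. It makes two assumptions: rows with a blank traditional grade take the value from the nearest filled row above, and a fractional score belongs to the band whose lower bound it has reached (the source does not state a rounding rule).

```python
# (lower bound in points, letter, numerical value, traditional grade),
# taken from the table above and ordered from highest to lowest band.
GRADE_BANDS = [
    (95, "A", 4.0, "Excellent"),
    (90, "A-", 3.67, "Excellent"),
    (85, "B+", 3.33, "Good"),
    (80, "B", 3.0, "Good"),
    (75, "B-", 2.67, "Good"),
    (70, "C+", 2.33, "Good"),
    (65, "C", 2.0, "Satisfactory"),
    (60, "C-", 1.67, "Satisfactory"),
    (55, "D+", 1.33, "Satisfactory"),
    (50, "D", 1.0, "Satisfactory"),
    (25, "FX", 0.5, "Unsatisfactory"),
    (0, "F", 0.0, "Unsatisfactory"),
]


def letter_grade(points: float) -> tuple[str, float, str]:
    """Map a 0-100 score to (letter, numerical value, traditional grade)."""
    if not 0 <= points <= 100:
        raise ValueError("points must lie between 0 and 100")
    # Assumption: the first band whose lower bound is reached applies.
    for lower, letter, value, traditional in GRADE_BANDS:
        if points >= lower:
            return letter, value, traditional
    raise AssertionError("unreachable: the last band starts at 0")


# Hypothetical final grade of 79 points
print(letter_grade(79))
```

A final grade of 79 points falls in the 75-79 band, which the table lists as B- with a numerical value of 2.67.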
Topics of lectures
- Introduction and Math Foundations: The KDD process and methodology
- Data Warehousing: Data Warehousing Concepts Revisited
- OLAP: Multidimensional data model
- Knowledge Representation: Data, Information, Knowledge; What is data
- Data Preparation for Knowledge Discovery
- Machine Learning and Classification
- Decision Trees: Decision Tree Induction
- Neural Networks: Biological inspiration
- Clustering: Basic concepts, Clustering Algorithms
- Association rule mining
- Visualization: Visualization techniques
- Text Mining: Text Mining tasks
- Stages of the Data Mining process
- Machine Learning and Data Mining Summary
Key reading
- Jiawei Han, Micheline Kamber, Jian Pei. Data Mining: Concepts and Techniques, 3rd Edition
- Graham J. Williams, Simeon J. Data Mining: Theory, Methodology, Techniques and Applications, Springer, Australia, 2007. P. 140
Further reading
- Sumathi S., Sivanandam S. Introduction to Data Mining and its Applications, Springer-Verlag Berlin Heidelberg, 2006. P. 835