Natural Language Processing

Instructor: Zhomartkyzy Gulnaz

Description: The course is dedicated to the fundamentals of Natural Language Processing (NLP): from text preprocessing and language models to vector representations, sentiment analysis, and machine translation technologies. It covers key methods of classification, dimensionality reduction, and the development of efficient NLP systems.

Number of credits: 6

Prerequisites:

  • Software Engineering

Course Workload:

Type of class                                            Hours
Lectures                                                 30
Practical works
Laboratory works                                         30
SAWTG (Student Autonomous Work under Teacher Guidance)   30
SAW (Student Autonomous Work)                            90
Form of final control                                    Exam
Final assessment method

Component: University component

Cycle: Profiling disciplines

Goal
  • To develop students' theoretical knowledge and practical skills in Natural Language Processing (NLP) needed to design and apply algorithms, methods, and models for automatic text analysis, and to teach them to use modern NLP tools and models to solve practical tasks.
Objective
  • To study the basic concepts, methods, and technologies of text and speech data processing.
  • To develop skills in analyzing and preprocessing text corpora and in assessing the quality of NLP models.
Learning outcome: knowledge and understanding
  • Theoretical knowledge and practical skills in the field of natural language processing (NLP).
Learning outcome: applying knowledge and understanding
  • Ability to process and analyze large amounts of data using modern software.
Learning outcome: formation of judgments
  • The ability to independently apply methods and tools of cognition, learning, and self-control; to recognize the prospects for intellectual, cultural, moral, physical, and professional self-development and self-improvement; and to critically assess one's own strengths and weaknesses.
Learning outcome: communicative abilities
  • Carry out communication in the professional sphere and in society as a whole, including in a foreign language; analyze existing technical documentation and develop it independently; clearly present and defend the results of complex engineering work in the field of IT.
Assessment of the student's knowledge

The teacher monitors the tasks that make up ongoing assessment and evaluates students' current performance twice during each academic period. Ratings 1 and 2 are based on the results of this ongoing assessment. Students' learning achievements are assessed on a 100-point scale, and the grades P1 and P2 are calculated as the average of the ongoing assessment scores. The teacher evaluates the student's work throughout the academic period according to the assignment submission schedule for the course. Assessment may combine written and oral, group and individual formats.

Period          Type of task         Total
Rating 1        Laboratory work 1    0-100
                Laboratory work 2
                Laboratory work 3
                Laboratory work 4
Rating 2        Laboratory work 5    0-100
                Laboratory work 6
                Laboratory work 7
                Laboratory work 8
Final control   Exam                 0-100
Grading policy for learning outcomes by type of task
Points   90-100      70-89   50-69          0-49
Grade    Excellent   Good    Satisfactory   Unsatisfactory
Evaluation form

The student's final grade for the course is calculated on a 100-point scale and consists of:

  • 40% of the examination result;
  • 60% of current control result.

The final grade is calculated by the formula:

$$FG = 0.6 \cdot \frac{MT_1 + MT_2}{2} + 0.4 \cdot E$$

where MT1 and MT2 are the numerical equivalents of the Midterm 1 and Midterm 2 grades;

E is the numerical equivalent of the exam grade.
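As a worked illustration of the final-grade formula, here is a minimal Python sketch. It assumes that MT1, MT2 and E are all already expressed on the 0-100 scale. The function name `final_grade` and the sample scores are hypothetical, and no rounding is applied because the syllabus does not specify a rounding rule.

```python
def final_grade(mt1: float, mt2: float, exam: float) -> float:
    """Return FG = 0.6 * (MT1 + MT2) / 2 + 0.4 * E.

    Hypothetical helper: all three inputs are assumed to be on the 0-100 scale,
    and the result is not rounded.
    """
    for score in (mt1, mt2, exam):
        if not 0 <= score <= 100:
            raise ValueError(f"score must be in [0, 100], got {score}")
    midterm_average = (mt1 + mt2) / 2  # current control: mean of the two ratings
    return 0.6 * midterm_average + 0.4 * exam  # 60% current control, 40% exam


# Hypothetical scores: MT1 = 80, MT2 = 90, E = 70
print(final_grade(80, 90, 70))
```

For MT1 = 80, MT2 = 90 and E = 70, the midterm average is 85, so FG = 0.6 · 85 + 0.4 · 70 = 51 + 28 = 79 points.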

Final letter grade and its equivalent in points:

The letter grading system for students' academic achievements, with the corresponding numerical values on a four-point scale:

Letter grade   Numerical value   Points (%)   Traditional grade
A              4.0               95-100       Excellent
A-             3.67              90-94        Excellent
B+             3.33              85-89        Good
B              3.0               80-84        Good
B-             2.67              75-79        Good
C+             2.33              70-74        Good
C              2.0               65-69        Satisfactory
C-             1.67              60-64        Satisfactory
D+             1.33              55-59        Satisfactory
D              1.0               50-54        Satisfactory
FX             0.5               25-49        Unsatisfactory
F              0                 0-24         Unsatisfactory
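The letter-grade table can also be read as a lookup from points to grade. The Python sketch below is a hypothetical helper, not an official tool. It assumes that a score is not rounded and falls into the highest band whose lower bound it reaches. Under that rule, 89.5 maps to B+.

```python
# (minimum points, letter grade, numerical value, traditional grade), highest band first,
# transcribed from the letter grading table in this syllabus.
GRADE_BANDS = [
    (95, "A", 4.0, "Excellent"),
    (90, "A-", 3.67, "Excellent"),
    (85, "B+", 3.33, "Good"),
    (80, "B", 3.0, "Good"),
    (75, "B-", 2.67, "Good"),
    (70, "C+", 2.33, "Good"),
    (65, "C", 2.0, "Satisfactory"),
    (60, "C-", 1.67, "Satisfactory"),
    (55, "D+", 1.33, "Satisfactory"),
    (50, "D", 1.0, "Satisfactory"),
    (25, "FX", 0.5, "Unsatisfactory"),
    (0, "F", 0.0, "Unsatisfactory"),
]


def letter_grade(points: float) -> tuple[str, float, str]:
    """Return (letter, numerical value, traditional grade) for a score in [0, 100].

    Assumption: scores are not rounded; a score belongs to the highest band
    whose lower bound it reaches.
    """
    if not 0 <= points <= 100:
        raise ValueError(f"points must be in [0, 100], got {points}")
    for minimum, letter, value, traditional in GRADE_BANDS:
        if points >= minimum:
            return letter, value, traditional
    # Unreachable: the last band starts at 0 and points >= 0 was checked above.
    raise AssertionError("no grade band matched")


print(letter_grade(79))  # ('B-', 2.67, 'Good')
```

With this rule, a final grade of 79 points maps to B- (2.67, Good).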
Topics of lectures
  • Introduction to NLP technology
  • Text pre-processing techniques
  • Part-of-speech tagging
  • Term frequency and weighting
  • Word vector representation methods in NLP
  • Feature extraction based on n-grams
  • Methods for reducing the dimensionality of the feature space
  • Sentiment analysis using logistic regression
  • Sentiment analysis of texts using the Naïve Bayes classifier
  • Similarity measures and dimensionality reduction in NLP: Euclidean distance, cosine similarity, and PCA
  • Part-of-speech tagging
  • Architecture of the CBOW model
  • Neural networks and recurrent models in text processing
Key reading
  • Sunil Patel. Getting Started with Deep Learning for Natural Language Processing, BPB Publications, ISBN: 978-93-89898-11-8, 2021.
  • Ekaterina Kochmar. Getting Started with Natural Language Processing, Manning Publications Co., ISBN: 9781617296765, 2022.
  • Course materials: https://www.deeplearning.ai/
  • Francesco Mosconi. Zero to Deep Learning, 2019.
  • Hobson Lane. Natural Language Processing in Action, 2020.
Further reading
  • Thushan Ganegedara. Natural Language Processing with TensorFlow, ISBN: 978-1-83864-135-1, 2022.