Module Detailed Content Hours
1.0 1.1 Introduction to NLP 03
Origin & History of NLP; Language, Knowledge and Grammar in
language processing; Stages in NLP; Ambiguities and their types in English
and Indian regional languages; Challenges of NLP; Applications of
NLP.
1.2 Self-Learning topics: Various types of tools for regional-language preprocessing
and other functionalities. (Refer Chapter 1)
2.0 2.1 Word Level Analysis 09
Basic Terms: Tokenization, Stemming, Lemmatization; Survey of
English Morphology, Inflectional Morphology, Derivational
Morphology; Regular expressions and their types;
Morphological Models: Dictionary lookup, finite state morphology;
Morphological parsing with FST (Finite State Transducer); Lexicon-free
FST: Porter Stemmer algorithm; N-grams and their variations: Bigram,
Trigram; Simple (Unsmoothed) N-grams;
N-gram Sensitivity to the Training Corpus; Unknown Words: Open
versus closed vocabulary tasks; Evaluating N-grams: Perplexity;
Smoothing: Laplace Smoothing, Good-Turing Discounting;
2.2 Self-Learning topics: Noisy channel models, various edit distance measures,
Advanced Issues in Language Modelling. (Refer Chapter 2)
3.0 3.1 Syntax analysis 10
Part-of-Speech (POS) tagging; Tag set for English (Penn Treebank);
Difficulties/Challenges in POS tagging; Rule-based, Stochastic and
Transformation-based tagging; Generative Model: Hidden Markov
Model (HMM, Viterbi) for POS tagging;
Issues in HMM POS tagging; Discriminative Model: Maximum Entropy
model, Conditional Random Field (CRF); Parsers: Top-down and
Bottom-up; Modelling constituency; Bottom-up Parser: CYK, PCFG
(Probabilistic Context-Free Grammar), Shift-Reduce Parser; Top-down
Parser: Earley Parser, Predictive Parser.
3.2 Self-Learning topics: Evaluating parsers, Parser-based language
modelling, POS treebanks for regional languages. (Refer Chapter 3)
4.0 4.1 Semantic Analysis 07
Introduction, meaning representation; Lexical Semantics; Corpus study;
Study of various language dictionaries like WordNet, BabelNet;
Relations among lexemes & their senses – Homonymy, Polysemy,
Synonymy, Hyponymy; Semantic Ambiguity; Word Sense
Disambiguation (WSD); Knowledge-based approach (Lesk's Algorithm),
Supervised (Naïve Bayes, Decision List); Introduction to Semi-supervised
method (Yarowsky); Unsupervised (HyperLex).
4.2 Self-Learning topics: Dictionaries for regional languages, Distributional
Semantics, Topic Models. (Refer Chapter 4)
5.0 5.1 Pragmatic & Discourse Processing 05
Discourse: Reference Resolution, Reference Phenomena, Syntactic &
Semantic constraints on coherence; Anaphora Resolution using the Hobbs and
Centering Algorithms.
5.2 Self-Learning topics: Discourse segmentation, Coreference resolution.
(Refer Chapter 5)
6.0 6.1 Applications of NLP 05
Case studies (preferably in a regional language) on: Machine translation;
Text Summarization; Sentiment analysis; Information retrieval; Question
Answering system.
6.2 Self-Learning topics: NLP applications based on deep neural networks,
such as LSTM networks, recurrent neural networks, etc.
(Refer Chapter 6)