NPTEL Natural Language Processing Week 9 Assignment Answers 2024
1. Which of the following is/are true?
Options:
- a) Topic modelling discovers the hidden themes that pervade the collection
- b) Topic modelling is a generative model
- c) The Dirichlet hyperparameter β is used to represent document-topic density
- d) None of the above
✅ Answer: a, b
Explanation:
Topic modelling uncovers the latent themes in a collection (a) and is formulated as a generative model, e.g. LDA (b). Option (c) is false: β controls the topic-word distribution, while the document-topic density is controlled by α.
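The generative story behind LDA can be sketched in a few lines: topics are drawn as word distributions (with β), each document draws a topic mixture (with α), and each word picks a topic and then a word from it. The vocabulary size, document length, and hyperparameter values below are illustrative assumptions for a toy run.

```python
import numpy as np

# Toy sketch of LDA's generative process (sizes and hyperparameters assumed).
rng = np.random.default_rng(0)
n_topics, vocab_size, doc_len = 3, 10, 20
alpha, beta = 0.5, 0.1  # document-topic and topic-word concentrations

# Each topic is a distribution over the vocabulary, drawn with beta.
topic_word = rng.dirichlet([beta] * vocab_size, size=n_topics)

# A document draws its topic mixture with alpha, then words topic-by-topic.
doc_topic = rng.dirichlet([alpha] * n_topics)
words = []
for _ in range(doc_len):
    z = rng.choice(n_topics, p=doc_topic)        # pick a topic for this word
    w = rng.choice(vocab_size, p=topic_word[z])  # pick a word from that topic
    words.append(w)
print(words)
```

Fitting LDA inverts this process: given only the words, it infers the hidden topic assignments and mixtures.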
2. Which of the following is/are true?
Options:
- a) The Dirichlet is an exponential-family distribution on the simplex (positive and negative vectors that sum to one)
- b) Correlated Topic Model (CTM) predicts better via correlated topics
- c) LDA provides a better fit than CTM
- d) CTM draws topic distributions from a logistic normal
✅ Answer: b, d
Explanation:
CTM draws topic proportions from a logistic normal (d), which allows topics to be correlated and yields better predictive performance than LDA when topics are interrelated (b). Option (a) is false because the simplex contains only non-negative vectors, and (c) is false because CTM, not LDA, gives the better fit.
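The logistic normal draw can be sketched directly: sample from a multivariate normal whose covariance encodes topic correlation, then softmax onto the simplex. The covariance values below are illustrative assumptions, not a fitted model; the point is that correlation injected between two topics survives the mapping to proportions.

```python
import numpy as np

# Sketch: CTM's logistic normal draw. Unlike the Dirichlet, the covariance
# matrix lets topics co-occur; numbers here are assumed for illustration.
rng = np.random.default_rng(1)
mu = np.zeros(3)
sigma = np.array([[1.0, 0.8, 0.0],   # topics 0 and 1 positively correlated
                  [0.8, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

eta = rng.multivariate_normal(mu, sigma, size=5000)
theta = np.exp(eta) / np.exp(eta).sum(axis=1, keepdims=True)  # softmax -> simplex

# The correlated pair of topics ends up more correlated in proportion space
# than the uncorrelated pair.
corr_01 = np.corrcoef(theta[:, 0], theta[:, 1])[0, 1]
corr_02 = np.corrcoef(theta[:, 0], theta[:, 2])[0, 1]
print(corr_01, corr_02)
```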
3. You have a topic model with α = 0.89 and β = 0.04. To get a sparser word distribution and a denser topic distribution, what should the values of α and β be?
Options:
- a) Both α and β should be decreased
- b) Both α and β should be increased
- c) α should be decreased, but β should be increased
- d) α should be increased, but β should be decreased
✅ Answer: d
Explanation:
A larger α makes each document's topic mixture more even (a denser document-topic distribution), while a smaller β concentrates each topic's probability mass on fewer words (a sparser topic-word distribution).
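The effect of the concentration parameter can be checked numerically: sampling from a Dirichlet with a small concentration yields low-entropy (sparse) vectors, while a larger concentration yields higher-entropy (denser) ones. The dimensions and sample counts below are arbitrary choices for the demonstration; the two concentration values echo the β and α in the question.

```python
import numpy as np

# Sketch: smaller Dirichlet concentration -> sparser samples (mass on few
# components); larger -> denser, more even samples. Measured via entropy.
rng = np.random.default_rng(2)

def mean_entropy(conc, dim=10, n=2000):
    samples = rng.dirichlet([conc] * dim, size=n)
    p = np.clip(samples, 1e-12, 1.0)  # guard log(0)
    return float(np.mean(-(p * np.log(p)).sum(axis=1)))

sparse = mean_entropy(0.04)  # like a small beta: peaked word distributions
dense = mean_entropy(0.89)   # like a larger alpha: more even topic mixtures
print(sparse, dense)
```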
4. Which of the following is/are false about LDA's assumptions?
Options:
- a) LDA assumes that the order of documents matters
- b) LDA is not appropriate for corpora that span hundreds of years
- c) LDA assumes that documents are a mixture of topics and topics are a mixture of words
- d) LDA can decide on the number of topics by itself.
✅ Answer: a, d
Explanation:
LDA treats documents as exchangeable bags of words, so neither document order nor word order matters (a is false), and the number of topics must be specified in advance rather than inferred by the model (d is false).
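The bag-of-words assumption can be made concrete: a document is reduced to its word counts, so shuffling the words changes nothing about what LDA sees. The toy document below is an assumed example.

```python
import numpy as np
from collections import Counter

# Sketch: under LDA's bag-of-words assumption, a document is just its word
# counts, so reordering the words leaves the model's input unchanged.
doc = ["topics", "are", "mixtures", "of", "words", "and", "documents",
       "are", "mixtures", "of", "topics"]
shuffled = list(doc)
np.random.default_rng(3).shuffle(shuffled)

print(Counter(doc) == Counter(shuffled))  # identical input to LDA
```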
5. Which of the following is/are true about Relational Topic Model (RTM)?
Options:
- a) RTM uses the same latent topic assignments to generate document content
- b) Link function uses linear regression
- c) Covariates are constructed by the Hadamard product
- d) Link probability depends on topic assignments that generated words
✅ Answer: a, c, d
Explanation:
RTM jointly models document content and link structure. The link covariates are built as the Hadamard (element-wise) product of the two documents' topic vectors (c), and the link probability depends on the same topic assignments that generated the words (a, d). The link function is a non-linear (e.g. sigmoid or exponential) function of these covariates, not linear regression (b).
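A minimal sketch of the link step, assuming a sigmoid link and made-up topic vectors and weights (the function name, η, and ν values are illustrative, not RTM's fitted parameters): documents dominated by the same topic get a large Hadamard product and hence a higher link probability.

```python
import numpy as np

# Sketch of RTM-style link prediction: covariates are the Hadamard product of
# two documents' mean topic-assignment vectors, passed through a sigmoid link.
def link_probability(zbar_d, zbar_e, eta, nu):
    covariates = zbar_d * zbar_e  # Hadamard (element-wise) product
    return 1.0 / (1.0 + np.exp(-(eta @ covariates + nu)))

zbar_1 = np.array([0.7, 0.2, 0.1])  # mostly topic 0
zbar_2 = np.array([0.6, 0.3, 0.1])  # also mostly topic 0
zbar_3 = np.array([0.1, 0.1, 0.8])  # mostly topic 2
eta = np.array([2.0, 2.0, 2.0])     # per-topic link weights (assumed)
nu = -1.0                           # intercept (assumed)

similar = link_probability(zbar_1, zbar_2, eta, nu)
different = link_probability(zbar_1, zbar_3, eta, nu)
print(similar, different)
```

Topically similar documents share mass in the same components, so their Hadamard product, and thus their link probability, is larger.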
6. Classically, topic models are introduced in the text analysis community for ___________ topic discovery.
Options:
- a) Unsupervised
- b) Supervised
- c) Semi-automated
- d) None of the above
✅ Answer: a
Explanation:
Topic models like LDA are unsupervised, discovering topics without labeled data.
7. Which of the following is/are false about Gibbs Sampling?
Options:
- a) Gibbs sampling is a form of Markov chain Monte Carlo (MCMC)
- b) Sampling is done sequentially until convergence
- c) It cannot estimate the posterior directly
- d) It is a variational method
✅ Answer: c, d
Explanation:
Gibbs sampling is a standard MCMC technique, not a variational method (d is false), and it does estimate the posterior directly by drawing samples from it (c is false). Options (a) and (b) are true: it is a form of MCMC in which variables are sampled sequentially from their conditionals until convergence.
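The sequential-sampling idea can be illustrated on a small target that is not LDA-specific: a bivariate normal with correlation ρ (an assumed toy target), where each variable is repeatedly redrawn from its conditional given the other. After a burn-in, the chain's samples recover the target's correlation.

```python
import numpy as np

# Sketch: Gibbs sampling as MCMC. Target (assumed for illustration): a
# bivariate standard normal with correlation rho. Each step samples one
# variable from its conditional given the current value of the other.
rng = np.random.default_rng(4)
rho = 0.8
n_samples, burn_in = 20000, 1000

x, y = 0.0, 0.0
samples = []
for i in range(n_samples + burn_in):
    # Conditionals of the bivariate normal: x|y ~ N(rho*y, 1 - rho^2), etc.
    x = rng.normal(rho * y, np.sqrt(1 - rho ** 2))
    y = rng.normal(rho * x, np.sqrt(1 - rho ** 2))
    if i >= burn_in:
        samples.append((x, y))

samples = np.array(samples)
est = np.corrcoef(samples[:, 0], samples[:, 1])[0, 1]
print(est)  # close to rho
```

In collapsed Gibbs sampling for LDA, the same pattern applies: each word's topic assignment is resampled in turn from its conditional given all other assignments.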
8.
✅ Answer: a