Natural Language Processing Week 7 NPTEL Assignment Answers 2025

1. Suppose you have a raw text corpus and you compute a word co-occurrence matrix from it. Which of the following algorithms can you utilize to learn word representations? (Choose all that apply)

Options:

  • a. CBOW
  • b. SVM
  • c. PCA
  • d. Bagging

Answer: a, c

Explanation:

  • CBOW (Continuous Bag of Words) uses a context window to predict the center word and thereby learns dense embeddings; it is trained directly on the text corpus.
  • PCA (Principal Component Analysis) can be applied to the co-occurrence matrix for dimensionality reduction, yielding low-dimensional word embeddings (see the sketch after this list).
  • SVM is a classifier, not used for learning word embeddings.
  • Bagging is an ensemble method in supervised learning, not applicable here.
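As a concrete illustration of the PCA route, here is a minimal sketch that builds a small co-occurrence matrix and projects it onto its top principal components. The toy corpus, window size, and embedding dimension are illustrative assumptions, not part of the assignment.

```python
import numpy as np

# Toy corpus and vocabulary (illustrative only)
corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "sat", "on", "the", "rug"]]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Build a symmetric co-occurrence matrix with a window of 1
C = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 1), min(len(sent), i + 2)):
            if j != i:
                C[idx[w], idx[sent[j]]] += 1

# PCA via SVD: center the columns, then project onto the top-k components
k = 2
C_centered = C - C.mean(axis=0)
U, S, Vt = np.linalg.svd(C_centered, full_matrices=False)
embeddings = C_centered @ Vt[:k].T   # each row is a k-dimensional word vector

print({w: embeddings[idx[w]].round(3) for w in vocab})
```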

2. What is the method for solving word analogy questions of the form "given A, B, and D, find C such that A:B::C:D" using word vectors?

Options:

  • a. V_c = V_a + (V_o − V_a), then use cosine similarity
  • b. V_c = V_a + (V_a − V_s), then dictionary lookup
  • c. V_c = V_a + (V_b − V_a), then use cosine similarity to find the closest word
  • d. V_c = V_a + (V_a − V_s), then dictionary lookup
  • e. None of the above

Answer: c

Explanation:
The formula used is: V_c = V_d + (V_b − V_a)

This is interpreted as "D is to C as B is to A". We compute V_c and then look up its nearest neighbor in the vocabulary using cosine similarity.
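A minimal sketch of this procedure, using tiny hand-made 2-D vectors (the words and coordinates are purely illustrative): compute V_c = V_d + (V_b − V_a) and return the vocabulary word whose vector is closest by cosine similarity, excluding the query words.

```python
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def solve_analogy(vecs, a, b, d):
    """Given words A, B, D, find C such that A:B::C:D,
    using V_c = V_d + (V_b - V_a) and cosine similarity."""
    target = vecs[d] + (vecs[b] - vecs[a])
    candidates = {w: cosine(target, v) for w, v in vecs.items()
                  if w not in {a, b, d}}
    return max(candidates, key=candidates.get)

# Hypothetical toy embeddings, chosen so the arithmetic works out
vecs = {
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([1.0, 1.0]),
    "king":  np.array([3.0, 0.0]),
    "queen": np.array([3.0, 1.0]),
    "apple": np.array([0.0, 2.0]),
}

# man : king :: C : woman  ->  V_c = V_woman + (V_king - V_man) = [3, 1]
print(solve_analogy(vecs, a="man", b="king", d="woman"))  # queen
```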


3. What is the value of PMI(w₁, w₂) given:

  • C(w₁) = 250,
  • C(w₂) = 1000,
  • C(w₁, w₂) = 160,
  • N = 100000 (total documents)?
    Use base-2 logarithm.

Options:

  • a. 4
  • b. 5
  • c. 6
  • d. 5.64

Answer: c

Explanation:
PMI formula:

PMI(w₁, w₂) = log₂( P(w₁, w₂) / (P(w₁) · P(w₂)) )
            = log₂( (160/100000) / ((250/100000) · (1000/100000)) )
            = log₂(160 / 2.5)
            = log₂(64)
            = 6
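The same arithmetic as a quick check in Python (the variable names are just for this sketch):

```python
from math import log2

# Counts from the question (document frequencies)
c_w1, c_w2, c_w1w2, N = 250, 1000, 160, 100_000

# PMI(w1, w2) = log2( P(w1, w2) / (P(w1) * P(w2)) )
p_w1, p_w2, p_joint = c_w1 / N, c_w2 / N, c_w1w2 / N
pmi = log2(p_joint / (p_w1 * p_w2))
print(pmi)  # 6.0
```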


4. Given two binary word vectors:

  • w₁ = [1 0 1 0 1 0 1 0 1 0]
  • w₂ = [0 0 1 1 1 1 1 1 0 0]

Compute Dice and Jaccard similarity.

Options:

  • a. 6/11, 3/8
  • b. 10/11, 5/6
  • c. 4/9, 2/7
  • d. 5/9, 5/8

Answer: a

Explanation:

  • Intersection (common 1s): 3 positions
  • |A| = 5, |B| = 6
  • Dice similarity = 2·3 / (5 + 6) = 6/11
  • Jaccard similarity = 3 / (5 + 6 − 3) = 3/8
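A quick numeric check of both values, as a minimal NumPy sketch:

```python
import numpy as np

w1 = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0])
w2 = np.array([0, 0, 1, 1, 1, 1, 1, 1, 0, 0])

intersection = np.sum(w1 & w2)         # 3 common 1s
size_a, size_b = w1.sum(), w2.sum()    # |A| = 5, |B| = 6

dice = 2 * intersection / (size_a + size_b)                 # 6/11
jaccard = intersection / (size_a + size_b - intersection)   # 3/8
print(dice, jaccard)  # 0.5454..., 0.375
```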

5. Compute the KL divergences D(p ‖ q) and D(q ‖ p) between p and q, given:

  • p = [0.20, 0.75, 0.50]
  • q = [0.90, 0.10, 0.25]

Options:

  • a. 4.704, 1.720
  • b. 1.692, 0.553
  • c. 2.246, 1.412
  • d. 3.213, 2.426

Answer: c

Explanation:
KL divergence: D_KL(p ‖ q) = Σᵢ pᵢ · log₂(pᵢ / qᵢ)
Computing each term and summing gives D_KL(p ‖ q) ≈ 2.246; swapping the arguments gives D_KL(q ‖ p) ≈ 1.412, which matches option c.
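A short sketch that reproduces both numbers; the helper function name is just for this example:

```python
from math import log2

p = [0.20, 0.75, 0.50]
q = [0.90, 0.10, 0.25]

def kl(p, q):
    """D_KL(p || q) with base-2 logarithm."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q))

print(round(kl(p, q), 3))  # ~2.246  (D(p || q))
print(round(kl(q, p), 3))  # ~1.412  (D(q || p))
```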


6.

Answer: c


7. Which of the following types of relations can be captured by word2vec (CBOW or Skip-gram)?

Options:

  • a. Analogy (A:B::C:?)
  • b. Antonymy
  • c. Polysemy
  • d. All of the above

Answer: a

Explanation:

  • Word2Vec models can capture semantic analogies through vector arithmetic (e.g., king − man + woman ≈ queen).
  • Antonymy and polysemy are not directly captured: Word2Vec assigns a single vector per word form, so it cannot separate different contextual senses, and antonyms often occur in very similar contexts and therefore receive similar vectors.
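For completeness, here is a minimal sketch of such an analogy query with gensim's Word2Vec (assuming gensim 4.x is installed; the tiny corpus is illustrative only and far too small to give reliable analogies in practice):

```python
from gensim.models import Word2Vec

# Tiny illustrative corpus; real analogy behaviour needs a large corpus
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "man", "walks", "in", "the", "garden"],
    ["the", "woman", "walks", "in", "the", "garden"],
]

# sg=0 selects CBOW, sg=1 selects Skip-gram
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0, epochs=100)

# Analogy via vector arithmetic: king - man + woman ~ queen
print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```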