NPTEL Natural Language Processing Week 7 Assignment Answers 2024
1. Suppose you have a raw text corpus and you compute word co-occurrence matrix from there. Which of the following algorithm(s) can you utilize to learn word representations? (Choose all that apply)
Options:
- a. CBOW
- b. SVM
- c. PCA
- d. Bagging
Answer: ✅ a, c
Explanation:
- CBOW (Continuous Bag of Words) learns dense embeddings by using a context window to predict the center word, and it is trained directly on a raw text corpus.
- PCA (Principal Component Analysis) can be applied to the co-occurrence matrix for dimensionality reduction, yielding low-dimensional word embeddings (see the sketch after this list).
- SVM is a classifier and is not used for learning word embeddings.
- Bagging is an ensemble method for supervised learning and is not applicable here.
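A minimal sketch of the PCA route, assuming a toy two-sentence corpus and scikit-learn's PCA (both are illustrative choices, not part of the question):

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy corpus (illustrative only)
corpus = ["the cat sat on the mat", "the dog sat on the log"]
tokens = [sentence.split() for sentence in corpus]
vocab = sorted({w for sent in tokens for w in sent})
index = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence counts within a +/-2 word window
window = 2
cooc = np.zeros((len(vocab), len(vocab)))
for sent in tokens:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                cooc[index[w], index[sent[j]]] += 1

# PCA reduces each word's co-occurrence row to a dense 2-dimensional embedding
embeddings = PCA(n_components=2).fit_transform(cooc)
print(vocab[0], embeddings[0])
```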
2. What is the method for solving word analogy questions of the form "given A, B, and D, find C such that A:B::C:D" using word vectors?
Options:
- a. V_c = V_a + (V_o − V_a), then use cosine similarity
- b. V_c = V_a + (V_a − V_s), then dictionary lookup
- c. V_c = V_a + (V_b − V_a), then use cosine similarity to find the closest word
- d. V_c = V_a + (V_a − V_s), then dictionary lookup
- e. None of the above
Answer: ✅ c
Explanation:
From A:B::C:D we expect V_b − V_a ≈ V_d − V_c, so V_c = V_d + (V_a − V_b).
We compute V_c this way and then return the vocabulary word whose vector is closest to it under cosine similarity, which is the approach described in option (c).
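A minimal sketch of this lookup, using made-up toy vectors in place of real pre-trained embeddings:

```python
import numpy as np

# Hypothetical toy embeddings (stand-ins for real word vectors)
vectors = {
    "king":  np.array([0.8, 0.6, 0.1]),
    "man":   np.array([0.7, 0.1, 0.2]),
    "queen": np.array([0.3, 0.9, 0.1]),
    "woman": np.array([0.2, 0.4, 0.2]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def solve_for_c(a, b, d):
    """Given A, B and D in A:B::C:D, estimate V_c = V_d + (V_a - V_b)
    and return the vocabulary word closest to it by cosine similarity."""
    target = vectors[d] + vectors[a] - vectors[b]
    candidates = [w for w in vectors if w not in {a, b, d}]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# man : woman :: C : queen  ->  expected C is "king"
print(solve_for_c("man", "woman", "queen"))
```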
3. What is the value of PMI(W₁, W₂) given:
- C(w₁) = 250,
- C(w₂) = 1000,
- C(w₁, w₂) = 160,
- N = 100000 (total documents)?
Use base 2 logarithm.
Options:
- a. 4
- b. 5
- c. 6
- d. 5.64
Answer: c
Explanation:
PMI formula: PMI(w₁, w₂) = log₂( P(w₁, w₂) / (P(w₁) · P(w₂)) ) = log₂( (160/100000) / ((250/100000) · (1000/100000)) ) = log₂(160/2.5) = log₂(64) = 6
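The same arithmetic as a quick Python check (the counts and N are taken from the question; normalising each count by N is the assumption used above):

```python
import math

N = 100_000
c_w1, c_w2, c_w1_w2 = 250, 1000, 160

# PMI(w1, w2) = log2( P(w1, w2) / (P(w1) * P(w2)) )
pmi = math.log2((c_w1_w2 / N) / ((c_w1 / N) * (c_w2 / N)))
print(pmi)  # 6.0
```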
4. Given two binary word vectors:
- w₁ = [1 0 1 0 1 0 1 0 1 0]
- w₂ = [0 0 1 1 1 1 1 1 0 0]
Compute Dice and Jaccard similarity.
Options:
- a. 6/11, 3/8
- b. 10/11, 5/6
- c. 4/9, 2/7
- d. 5/9, 5/8
Answer: a
Explanation:
- Intersection (common 1s): 3 positions
- |A| = 5, |B| = 6
- Dice similarity = (2 · 3) / (5 + 6) = 6/11
- Jaccard similarity = 3 / (5 + 6 − 3) = 3/8 (both values are verified in the snippet below)
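A quick Python check of both values:

```python
w1 = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
w2 = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]

intersection = sum(a & b for a, b in zip(w1, w2))   # common 1s -> 3
size1, size2 = sum(w1), sum(w2)                     # 5 and 6

dice = 2 * intersection / (size1 + size2)                 # 6/11
jaccard = intersection / (size1 + size2 - intersection)   # 3/8
print(dice, jaccard)  # 0.5454... 0.375
```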
5. Compute the KL divergence between p and q in both directions, given:
- p = [0.20, 0.75, 0.50]
- q = [0.90, 0.10, 0.25]
Options:
- a. 4.704, 1.720
- b. 1.692, 0.553
- c. 2.246, 1.412
- d. 3.213, 2.426
Answer: c
Explanation:
KL divergence formula: D_KL(p || q) = Σᵢ pᵢ · log₂(pᵢ / qᵢ)
Computing each term and summing gives D_KL(p || q) ≈ 2.246 and D_KL(q || p) ≈ 1.412, which matches option (c).
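A quick Python check of both directions:

```python
import math

p = [0.20, 0.75, 0.50]
q = [0.90, 0.10, 0.25]

def kl_divergence(p, q):
    """D_KL(p || q) with a base-2 logarithm."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))

print(round(kl_divergence(p, q), 3))  # 2.246
print(round(kl_divergence(q, p), 3))  # 1.412
```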
6.
Answer: c
7. Which of the following type of relations can be captured by word2vec (CBOW or Skipgram)?
Options:
- a. Analogy (A:B::C:?)
- b. Antonymy
- c. Polysemy
- d. All of the above
Answer: a
Explanation:
- Word2Vec models capture semantic analogies through vector arithmetic (e.g., king − man + woman ≈ queen).
- Antonymy and polysemy are not directly captured: each word form gets a single vector regardless of sense, and antonyms tend to occur in very similar contexts, so their vectors often end up close together.