1. In a corpus, you found that the word at rank 4 has a frequency of 600. What is the best guess for the rank of a word with frequency 300?
- 2
- 4
- 8
- 6
Answer:- 8
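Zipf's law says frequency is roughly inversely proportional to rank, so f × r stays approximately constant across a corpus. A minimal Python sketch, assuming that relation holds exactly (the helper name `zipf_rank` is hypothetical):

```python
def zipf_rank(known_freq: float, known_rank: float, target_freq: float) -> float:
    """Estimate a word's rank from its frequency, assuming Zipf's law f * r = const."""
    k = known_freq * known_rank  # the Zipf constant implied by the known word
    return k / target_freq

# Word at rank 4 has frequency 600; estimate the rank of a word with frequency 300.
print(zipf_rank(600, 4, 300))  # 8.0
```

Halving the frequency doubles the estimated rank, which is why 8 is the best guess.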
2. In the sentence “In Kolkata I took my hat off. But I can’t put it back on.”, the total numbers of word tokens and word types are:
- 14, 13
- 13, 14
- 15, 14
- 14, 15
Answer:- 14, 13 (ignoring punctuation and treating “can’t” as a single token; “I” occurs twice)
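The count depends on the tokenization convention. A small Python sketch, assuming punctuation is dropped, contractions stay whole, and types are compared case-sensitively (this regex is one illustrative choice, not a standard tokenizer):

```python
import re

sentence = "In Kolkata I took my hat off. But I can't put it back on."

# Keep runs of letters and apostrophes: punctuation is discarded
# and "can't" survives as a single token.
tokens = re.findall(r"[A-Za-z']+", sentence)
types = set(tokens)  # distinct word forms; "I" appears twice but counts once

print(len(tokens), len(types))  # 14 13
```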
3. Let the ranks of two words, w1 and w2, in a corpus be 1600 and 400, respectively. Let m1 and m2 be the numbers of meanings of w1 and w2, respectively. The ratio m1 : m2 would approximately be
- 1:4
- 4:1
- 1:2
- 2:1
Answer:- 1:2
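Zipf's law of meaning says the number of meanings grows roughly with the square root of frequency (m ∝ √f); combined with f ∝ 1/r this gives m ∝ 1/√r. A minimal Python sketch, assuming both proportionalities hold exactly (the helper name `meaning_ratio` is hypothetical):

```python
import math

def meaning_ratio(rank1: float, rank2: float) -> float:
    """Return m1/m2, assuming m ∝ sqrt(f) and f ∝ 1/r, i.e. m ∝ 1/sqrt(r)."""
    return math.sqrt(rank2 / rank1)

print(meaning_ratio(1600, 400))  # 0.5, i.e. m1 : m2 = 1 : 2
```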
4. What is the valid range of the type-token ratio (TTR) of any non-empty text corpus?
- TTR ∈ (0, 1] (excluding zero)
- TTR ∈ [0, 1]
- TTR ∈ [-1, 1]
- TTR ∈ [0, +∞) (any non-negative number)
Answer:- TTR ∈ (0, 1] (excluding zero)
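TTR is the number of distinct types divided by the number of tokens. A short Python sketch (the function name `type_token_ratio` and the toy word lists are illustrative):

```python
def type_token_ratio(tokens: list[str]) -> float:
    """TTR = distinct types / tokens; defined only for a non-empty token list."""
    if not tokens:
        raise ValueError("TTR is undefined for an empty corpus")
    return len(set(tokens)) / len(tokens)

# A non-empty corpus has at least one type, and types can never outnumber
# tokens, so the result always lies in (0, 1].
print(type_token_ratio(["the", "cat", "saw", "the", "dog"]))  # 0.8
print(type_token_ratio(["a", "b", "c"]))                      # 1.0
```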
5. If the first corpus has TTR1 = 0.025 and the second corpus has TTR2 = 0.25, where TTR1 and TTR2 represent the type/token ratios of the first and second corpus respectively, then
- The first corpus has a greater tendency to use different words.
- The second corpus has a greater tendency to use different words.
- Both a and b
- None of these
Answer:- The second corpus has a greater tendency to use different words.
6. Which of the following is/are true for the English language?
- Lemmatization works only on inflectional morphemes and stemming works only on derivational morphemes.
- The outputs of lemmatization and stemming for the same word might differ.
- The output of lemmatization is always a real word.
- The output of stemming is always a real word.
Answer:- b. The outputs of lemmatization and stemming for the same word might differ. c. The output of lemmatization is always a real word.
7. What is an advantage of the Porter stemmer over a full morphological parser?
- The stemmer is better justified from a theoretical point of view
- The output of a stemmer is always a valid word
- The stemmer does not require a detailed lexicon to implement
- None of the above
Answer:- The stemmer does not require a detailed lexicon to implement.
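The contrast in questions 6 and 7 can be illustrated with a toy comparison. This Python sketch uses a deliberately crude suffix stripper (not the Porter algorithm) and a lookup-table lemmatizer whose three-entry lexicon is hypothetical; it only shows that a rule-based stemmer needs no lexicon and can emit non-words, while a lexicon-backed lemmatizer returns dictionary forms:

```python
# Crude suffix stripper (NOT Porter): needs only a suffix list, no lexicon.
SUFFIXES = ["ies", "ing", "ed", "s"]  # checked in order, so "ies" wins over "s"

def crude_stem(word: str) -> str:
    for suffix in SUFFIXES:
        # Require a few characters to remain so very short words are left alone.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# A lemmatizer needs a lexicon; this hypothetical table stands in for one.
LEXICON = {"ponies": "pony", "running": "run", "are": "be"}

def lookup_lemma(word: str) -> str:
    return LEXICON.get(word, word)

for w in ["ponies", "running", "are"]:
    print(w, crude_stem(w), lookup_lemma(w))
# ponies pon pony
# running runn run
# are are be
```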
8. Which of the following are instances of stemming (as per the Porter stemmer)?
- are -> be
- plays -> play
- saw -> s
- university -> univers
Answer:- b. plays -> play; d. university -> univers
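Porter's rules only rewrite suffixes, which is why "are -> be" (a suppletive form that needs a lexicon) and "saw -> s" are not stemming outputs. The Python sketch below implements only Step 1a of the Porter stemmer (SSES -> SS, IES -> I, SS -> SS, S -> ""); the later steps that turn "university" into "univers" are not included:

```python
def porter_step_1a(word: str) -> str:
    """Step 1a of the Porter stemmer only; later steps are not implemented."""
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # ponies -> poni
    if word.endswith("ss"):
        return word       # caress -> caress
    if word.endswith("s"):
        return word[:-1]  # plays -> play
    return word

for w in ["plays", "are", "saw"]:
    print(w, "->", porter_step_1a(w))
# plays -> play
# are -> are
# saw -> saw
```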
9. What is natural language processing good for?
- Summarizing blocks of text
- Automatically generating keywords
- Identifying the type of entity extracted
- All of the above
Answer:- All of the above
10. What is the number of unique words (vocabulary size) in a document where the total number of words is 12000, K = 3.71, and β = 0.69?
- 2421
- 3367
- 5123
- 1529
Answer:- 2421
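The parameters K and β are those of Heaps' law, V = K·N^β, where N is the number of tokens and V the vocabulary size. A minimal Python sketch of the arithmetic (the helper name `heaps_vocabulary` is hypothetical):

```python
def heaps_vocabulary(n_tokens: int, k: float, beta: float) -> float:
    """Heaps' law: estimated vocabulary size V = K * N**beta."""
    return k * n_tokens ** beta

v = heaps_vocabulary(12000, k=3.71, beta=0.69)
print(round(v))  # 2421
```

Here 12000^0.69 ≈ 652.58, and 3.71 × 652.58 ≈ 2421.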