NPTEL Data Analytics with Python Week 12 Assignment Answers 2024
Q1. Which clustering algorithm works well when the shape of the clusters is hyper-spherical?
a. K means
b. Agglomerative Hierarchical clustering
c. Divisive Hierarchical clustering
d. All of the above
Answer: d
Explanation:
All three algorithms (K-means, Agglomerative hierarchical, and Divisive hierarchical clustering) rely on distance-based similarity and perform well when clusters are hyper-spherical, with K-means typically being the most effective of the three. Hence, "All of the above" is the appropriate choice.
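As a minimal sketch of why K-means suits spherical clusters, the snippet below (assuming scikit-learn and NumPy are installed; the blob locations and sizes are illustrative, not from the assignment) clusters two well-separated Gaussian blobs:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two spherical (Gaussian) blobs centered at (0, 0) and (10, 10)
blob_a = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
blob_b = rng.normal(loc=10.0, scale=1.0, size=(50, 2))
X = np.vstack([blob_a, blob_b])

# K-means with 2 clusters recovers the two blobs cleanly
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_
print(labels[:50], labels[50:])  # each half gets a single, distinct label
```

Because K-means assigns points to the nearest centroid under Euclidean distance, it implicitly assumes roughly spherical, similarly sized clusters, which is exactly the setting the question describes.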
Q2. In a decision tree, an internal node represents –
a. A test on an attribute
b. An outcome of the test
c. Entire sample population
d. Holds a class label
Answer: a
Explanation:
An internal node in a decision tree splits the data based on a condition (i.e., a test on an attribute). Leaves hold class labels, not internal nodes.
Q3. Choose the correct statement about the CART model
a. CART is an unsupervised learning technique
b. CART is a supervised learning technique
c. CART adopts a greedy approach
d. Both b. and c.
Answer: d
Explanation:
CART (Classification and Regression Trees) is a supervised learning technique and it uses a greedy algorithm to build the tree — it makes the best split at each step without revisiting previous decisions.
Q4. Which library is used to build the decision tree model?
a. Decision tree classifier
b. DecisionTreeClassifier
c. Decision_Tree_Classifier
d. Decision_tree_model
Answer: b
Explanation:
DecisionTreeClassifier is a class in the sklearn.tree module of scikit-learn and is used to build decision tree models for classification.
Q5. State True or False: Gini Index enforces the resulting tree to have multiway splits
a. True
b. False
Answer: b
Explanation:
Gini Index doesn’t enforce multiway splits. It simply measures the impurity; whether the split is binary or multiway depends on the algorithm’s implementation and attribute type.
Q6. Chance nodes are represented by ________
a. Disks
b. Squares
c. Circles
d. Triangles
Answer: c
Explanation:
In decision trees or decision analysis diagrams, chance nodes are typically represented using circles, indicating uncertainty or probabilistic outcomes.
Q7. _______ is the measure of uncertainty of a random variable, it characterizes the impurity of an arbitrary collection of examples.
a. Information Gain
b. Gini Index
c. Entropy
d. None of the above
Answer: c
Explanation:
Entropy quantifies the uncertainty or impurity in data: higher entropy means more disorder. It is the impurity measure used in the ID3 decision tree algorithm.
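The standard formula is H = -Σ pᵢ log₂ pᵢ over the class proportions pᵢ. A minimal sketch (the helper function name is ours, not from the assignment):

```python
from math import log2

def entropy(probs):
    """Entropy in bits of a discrete probability distribution."""
    # Terms with p = 0 contribute nothing (lim p*log2(p) = 0 as p -> 0)
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # maximally impure 50/50 split -> 1.0 bit
print(entropy([1.0]))       # pure node (one class only)    -> 0.0
```

This matches the intuition in the explanation above: a 50/50 class mix is maximally uncertain, while a node containing a single class has zero entropy.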
Q8. End Nodes are represented by ____________
a. Disks
b. Squares
c. Circles
d. Triangles
Answer: b
Explanation:
In decision trees, end nodes (also called leaf nodes or terminal nodes) are typically represented using squares.
Q9. Decision tree learners may create biased trees if some classes dominate. What’s the solution for it?
a. Balance the dataset prior to fitting
b. Imbalance the dataset prior to fitting
c. Balance the dataset after fitting
d. None of the above
Answer: a
Explanation:
Balancing the dataset before training helps prevent bias in decision trees toward the majority class, ensuring fair representation.
Q10. Suppose your target variable is the price of a house, to be predicted using a decision tree. What type of tree do you need to predict the target variable?
a. Classification tree
b. Regression Tree
c. Clustering tree
d. Dimensionality reduction tree
Answer: b
Explanation:
Since “price” is a continuous variable, a Regression Tree is required, which predicts numeric values instead of class labels.
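A minimal sketch using scikit-learn's DecisionTreeRegressor for a continuous target (the area/price numbers below are illustrative, not real housing data):

```python
from sklearn.tree import DecisionTreeRegressor

# Toy data: feature is area in square feet; target is a continuous price
X = [[500], [800], [1000], [1500], [2000], [2500]]
y = [50_000, 80_000, 100_000, 150_000, 200_000, 250_000]

# A regression tree predicts numeric values by averaging the targets
# in each leaf, rather than assigning a class label
reg = DecisionTreeRegressor(random_state=0).fit(X, y)
print(reg.predict([[1000]]))  # -> [100000.]
```

With an unrestricted depth on this tiny dataset, each training point lands in its own leaf, so the prediction for a seen area is the exact training price; in practice, `max_depth` or `min_samples_leaf` would be set to control overfitting.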