NPTEL Data Science for Engineers Week 8 Assignment Answers 2024
1. According to the built model, the within cluster sum of squares for each cluster is _____________ (the order of values in each option could be different):
- a. 8.316061 11.952463 16.212213 19.922437
- b. 7.453059 12.158682 13.212213 21.158766
- c. 8.316061 13.952463 15.212213 19.922437
- d. None of the above
✅ Answer :- a
✏️ Explanation: These values represent the within-cluster sum of squares (WCSS), which measures the compactness of each cluster. Option (a) matches the WCSS values from the built model.
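As a rough illustration of how a per-cluster WCSS is obtained, the sketch below sums the squared Euclidean distances from each point to its own cluster centroid. The function name `wcss_per_cluster` and the toy data are assumptions made for this example; they are not the assignment's dataset or model.

```python
def wcss_per_cluster(points, labels):
    """Return {cluster_label: within-cluster sum of squares}."""
    # Group the points by their cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    result = {}
    for lab, members in clusters.items():
        dim = len(members[0])
        # Centroid = coordinate-wise mean of the cluster's members.
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        # WCSS = sum of squared distances from each member to the centroid.
        result[lab] = sum(
            (m[d] - centroid[d]) ** 2 for m in members for d in range(dim)
        )
    return result


# Toy data (hypothetical): two tight groups of two points each.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0), (10.0, 14.0)]
labels = [0, 0, 1, 1]
print(wcss_per_cluster(points, labels))  # {0: 2.0, 1: 8.0}
```

A smaller value means the cluster's points sit closer to their centroid.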
2. According to the built model, the size of each cluster is ______________ (the order of values in each option could be different):
- a. 13 13 7 14
- b. 11 18 25 24
- c. 8 13 16 13
- d. None of the above
✅ Answer :- c
✏️ Explanation: Cluster sizes indicate how many data points belong to each cluster. Option (c) matches the correct distribution in the model.
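Cluster sizes are simply label counts. A minimal sketch, assuming a hypothetical list of cluster labels (one per observation), could look like this:

```python
from collections import Counter

# Hypothetical cluster labels for ten observations (not the assignment's data).
labels = [2, 0, 1, 1, 0, 2, 1, 3, 1, 0]

# Count how many observations carry each label.
sizes = Counter(labels)
print(sorted(sizes.items()))  # [(0, 3), (1, 4), (2, 2), (3, 1)]

# Every observation belongs to exactly one cluster, so the sizes sum to n.
assert sum(sizes.values()) == len(labels)
```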
3. The Between Cluster Sum-of-Squares (BCSS) value of the built K-means model is ______________ (Choose the appropriate range)
- a. 100 – 200
- b. 200 – 300
- c. 300 – 350
- d. None of the above
✅ Answer :- a
✏️ Explanation: BCSS measures the separation between clusters: it is the size-weighted sum of squared distances from each cluster centroid to the overall mean of the data. For the built model, this value falls in the 100–200 range.
4. The Total Sum-of-Squares value of the built k-means model is _________ (Choose the appropriate range)
- a. 100 – 200
- b. 200 – 300
- c. 300 – 350
- d. None of the above
✅ Answer :- a
✏️ Explanation: TSS = total WCSS + BCSS. The four WCSS values from Question 1 sum to about 56.40, and adding the model's BCSS gives a TSS that also lies in the 100–200 range.
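The decomposition TSS = WCSS + BCSS can be checked numerically. The sketch below is an illustration on hypothetical toy data, not the assignment's model; the helper names `mean`, `sq_dist` and `sum_of_squares` are assumptions introduced here.

```python
def mean(points):
    """Coordinate-wise mean of a list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sum_of_squares(points, labels):
    """Return (total WCSS, BCSS, TSS) for a labelled set of points."""
    grand_mean = mean(points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    wcss = bcss = 0.0
    for members in clusters.values():
        c = mean(members)
        # Within: squared distances of members to their own centroid.
        wcss += sum(sq_dist(m, c) for m in members)
        # Between: cluster size times squared distance of centroid to grand mean.
        bcss += len(members) * sq_dist(c, grand_mean)

    # Total: squared distances of all points to the grand mean.
    tss = sum(sq_dist(p, grand_mean) for p in points)
    return wcss, bcss, tss


points = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0), (10.0, 14.0)]
labels = [0, 0, 1, 1]
wcss, bcss, tss = sum_of_squares(points, labels)
print(wcss, bcss, tss)  # 10.0 225.0 235.0
```

In this toy case the within and between parts add up exactly to the total, which is the identity the explanation relies on.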
5. Which of the following statements is INCORRECT about the KNN algorithm?
- a. KNN works ONLY for binary classification problems
- b. If k=1, then the algorithm is simply called the nearest neighbour algorithm
- c. Number of neighbours (K) will influence classification output
- d. None of the above
✅ Answer :- a
✏️ Explanation: KNN can be used for multi-class classification as well, not just binary. Hence, statement (a) is incorrect.
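To make the multi-class point concrete, here is a minimal majority-vote KNN sketch with three classes. The function `knn_predict` and the training points are hypothetical, written only for this illustration.

```python
from collections import Counter
import math


def knn_predict(train, query, k):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    # Sort training items by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Count the labels of the k closest items and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set with three classes.
train = [
    ((1.0, 1.0), "red"), ((1.5, 2.0), "red"),
    ((5.0, 5.0), "green"), ((6.0, 5.5), "green"),
    ((9.0, 1.0), "blue"), ((8.5, 2.0), "blue"),
]
print(knn_predict(train, (5.5, 5.0), k=3))  # green
print(knn_predict(train, (8.0, 1.5), k=1))  # blue
```

Nothing in the voting step depends on there being only two labels, and with `k=1` the prediction is just the label of the single nearest neighbour.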
6. The K-means clustering algorithm clusters the data points based on:
- a. Dependent and independent variables
- b. The eigen values
- c. Distance between the points and a cluster centre
- d. None of the above
✅ Answer :- c
✏️ Explanation: K-means uses Euclidean distance to assign points to the nearest cluster center.
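The assignment step described above can be sketched as follows. The function `assign_to_nearest` and the sample points and centres are assumptions made for illustration.

```python
import math


def assign_to_nearest(points, centres):
    """Return, for each point, the index of its nearest centre (Euclidean distance)."""
    return [
        min(range(len(centres)), key=lambda j: math.dist(p, centres[j]))
        for p in points
    ]


# Hypothetical points and two candidate cluster centres.
points = [(1.0, 1.0), (1.2, 0.8), (8.0, 9.0), (9.0, 8.5)]
centres = [(1.0, 1.0), (8.5, 9.0)]
print(assign_to_nearest(points, centres))  # [0, 0, 1, 1]
```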
7. The method / metric which is NOT useful to determine the optimal number of clusters in unsupervised clustering algorithms is
- a. Scatter plot
- b. Elbow method
- c. Dendrogram
- d. None of the above
✅ Answer :- a
✏️ Explanation: A scatter plot visualizes the data but provides no criterion for choosing the number of clusters. The elbow method (total WCSS plotted against k) and a dendrogram (deciding where to cut the tree) both do.
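A rough sketch of the elbow method: run k-means for several values of k and look for the point where the total WCSS stops dropping sharply. The basic k-means below uses farthest-point initialisation, which is a choice made here only to keep the example deterministic; the function names and the toy data (three well-separated groups) are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_wcss(points, k, iterations=20):
    """Run a basic k-means and return the total within-cluster sum of squares."""
    # Farthest-point initialisation (an assumption, chosen for determinism).
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centres)))

    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        groups = [[] for _ in centres]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centres[j]))
            groups[nearest].append(p)
        # Update step: move each centre to the mean of its group
        # (an empty group keeps its previous centre).
        centres = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centres)
        ]

    # Total WCSS: squared distance from each point to its nearest final centre.
    return sum(min(sq_dist(p, c) for c in centres) for p in points)


# Toy data (hypothetical): three well-separated groups of four points.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (5, 10), (5, 11), (6, 10), (6, 11),
]
wcss_by_k = {k: kmeans_wcss(points, k) for k in range(1, 6)}
for k, w in wcss_by_k.items():
    print(k, round(w, 2))
```

On this data the total WCSS should fall steeply up to k = 3 and only marginally afterwards, so the "elbow" sits at three clusters.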
8. The unsupervised learning algorithm which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest centroid is
- a. Hierarchical clustering
- b. K-means clustering
- c. KNN
- d. None of the above
✅ Answer :- b
✏️ Explanation: K-means clustering partitions data based on proximity to centroids, making it the correct answer.