Data Analytics with Python Week 11 NPTEL Assignment Answers 2025

NPTEL Data Analytics with Python Week 11 Assignment Answers 2024

Q1. Which library is used for calculating distance measures in clustering using Python?

a. distance_matrix
b. scipy.spatial
c. scipy_spatial
d. distance.matrix

Answer: b

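Explanation: scipy.spatial is SciPy's module for spatial computations. Its submodule scipy.spatial.distance provides distance metrics such as Euclidean, Manhattan (cityblock) and Minkowski, plus helpers like pdist and squareform for building distance matrices. All of these are commonly used in clustering.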
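A minimal sketch of computing common clustering distances with scipy.spatial.distance. The three 2-D points are made-up illustration data, not taken from the assignment.

```python
import numpy as np
from scipy.spatial import distance

# Hypothetical example data: three 2-D objects
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])

# Distances between a single pair of objects
euclid = distance.euclidean(X[0], X[1])       # sqrt(3^2 + 4^2) = 5.0
manhattan = distance.cityblock(X[0], X[1])    # |3| + |4| = 7.0
minkow = distance.minkowski(X[0], X[1], p=3)  # (3^3 + 4^3)^(1/3)

# Condensed vector of all pairwise distances, then the full n x n matrix
condensed = distance.pdist(X, metric="euclidean")
D = distance.squareform(condensed)

print(euclid, manhattan, minkow)
print(D)
```

The condensed form from pdist stores each pair once; squareform expands it into the symmetric matrix that many clustering routines expect.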

Q2. Formula for dissimilarity computation between two objects for categorical variables is –
Here p is the total number of categorical variables and m is the number of variables on which the two objects match.

a. D(i, j) = (p – m) / p
b. D(i, j) = (p – m) / m
c. D(i, j) = (m – p) / p
d. D(i, j) = (m – p) / m

Answer: a

Explanation: The correct formula for dissimilarity in categorical variables is D(i, j) = (p – m) / p, where p is the total number of variables and m is the number of matching variables between objects i and j.
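The formula can be sketched in a few lines of Python. The function name categorical_dissimilarity and the two example objects are hypothetical, chosen only to illustrate p and m.

```python
def categorical_dissimilarity(obj_i, obj_j):
    """Simple-matching dissimilarity d(i, j) = (p - m) / p.

    p: number of categorical variables describing each object
    m: number of variables on which the two objects take the same value
    """
    if len(obj_i) != len(obj_j) or len(obj_i) == 0:
        raise ValueError("objects must have the same, non-zero number of variables")
    p = len(obj_i)
    m = sum(a == b for a, b in zip(obj_i, obj_j))
    return (p - m) / p


# Hypothetical objects described by p = 3 categorical variables
a = ("red", "small", "round")
b = ("red", "large", "round")
print(categorical_dissimilarity(a, b))  # p = 3, m = 2, so (3 - 2) / 3
```

Identical objects give 0 and objects that match on no variable give 1, so the measure always stays in [0, 1].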

Q3. For a dataset with 7 objects and an interval-scaled variable f = (1, 2, 3, 4, 5, 8, 50), containing an outlier, which is true?

a. Std deviation (std_f) and mean absolute deviation (s_f) are equally affected
b. Mean absolute deviation (s_f) is more affected by the outlier
c. Std deviation (std_f) is less affected by the outlier
d. Std deviation (std_f) is more affected by the outlier

Answer: d

Explanation: Standard deviation squares each deviation from the mean, so the single large deviation of the outlier 50 dominates the result. The mean absolute deviation uses absolute values without squaring, so it is less affected by the outlier.
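A small sketch that computes both measures for the variable f from the question, with and without the outlier 50. It assumes the population standard deviation (dividing by n) so that both measures average over the same n objects.

```python
import numpy as np

def spread_measures(values):
    """Return (standard deviation, mean absolute deviation) about the mean."""
    x = np.asarray(values, dtype=float)
    mean = x.mean()
    std_f = np.sqrt(((x - mean) ** 2).mean())  # population std: deviations are squared
    s_f = np.abs(x - mean).mean()             # mean absolute deviation: deviations are not squared
    return std_f, s_f

with_outlier = [1, 2, 3, 4, 5, 8, 50]
without_outlier = [1, 2, 3, 4, 5, 8]

std_w, mad_w = spread_measures(with_outlier)
std_wo, mad_wo = spread_measures(without_outlier)

print(f"with 50:    std_f = {std_w:.2f}, s_f = {mad_w:.2f}")
print(f"without 50: std_f = {std_wo:.2f}, s_f = {mad_wo:.2f}")
print(f"growth factor: std_f x{std_w / std_wo:.2f}, s_f x{mad_w / mad_wo:.2f}")
```

With the outlier, std_f comes out at roughly 16.3 against roughly 11.3 for s_f, and std_f also grows by the larger factor when 50 is added.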

Q4. Select the correct statement about the standardization in clustering.

a. Standardizing the data always gives inefficient result while making clusters
b. Standardizing the data is always beneficial during clustering analysis
c. The variables having an absolute value may not be efficient after standardization during clustering
d. Outliers cannot be detected by standardized data

Answer: c

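Explanation: Standardization rescales every variable to a common scale (mean 0, unit spread), so the original units and magnitudes are lost. When those absolute values are meaningful for the analysis, clustering on standardized data can be less effective. Standardization is therefore neither always harmful (a) nor always beneficial (b), and outliers remain detectable as values with large standardized scores (d).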
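A sketch of z-score standardization using the mean absolute deviation s_f, i.e. z = (x - m_f) / s_f. The helper name standardize and the two small variables are hypothetical; the last list is the variable f from Q3.

```python
import numpy as np

def standardize(values):
    """z-score using the mean absolute deviation s_f: z = (x - m_f) / s_f."""
    x = np.asarray(values, dtype=float)
    m_f = x.mean()
    s_f = np.abs(x - m_f).mean()
    return (x - m_f) / s_f

# 1) Absolute magnitude is lost: these two hypothetical variables standardize identically
small = standardize([1, 2, 3])
large = standardize([101, 102, 103])
print(np.allclose(small, large))  # True

# 2) Outliers remain visible: the value 50 from Q3 gets a large z-score
z = standardize([1, 2, 3, 4, 5, 8, 50])
print(np.round(z, 2))
```

The outlier receives z = 3.5 while every other value stays within plus or minus 1, which is why statement d does not hold.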


Q5. Which of the following can act as possible termination conditions in K-Means?

1. For a fixed number of iterations.
2. Assignment of observations to clusters does not change between iterations.
3. Centroids do not change between successive iterations.
4. Terminate when RSS falls below a threshold.

a. 1, 3 and 4
b. 1, 2, 3 and 4
c. 2 and 3
d. None of these

Answer: b

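Explanation: All four are valid stopping criteria for K-Means: running a fixed number of iterations, no change in cluster assignments, stable centroids, or the RSS (residual sum of squares, the total squared distance of each observation to its centroid) falling below a chosen threshold.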
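A from-scratch NumPy sketch of Lloyd-style K-Means that checks all four conditions and reports which one stopped it. This is not scikit-learn's implementation; the function name, the parameter names max_iter and rss_threshold, and the six data points are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, max_iter=100, rss_threshold=None, seed=0):
    """Plain Lloyd-style K-Means that reports which termination condition fired."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # k distinct points as initial centroids
    labels = None
    for _ in range(max_iter):  # condition 1: fixed number of iterations
        # Assignment step: index of the nearest centroid for every observation
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Condition 2: no observation changed cluster
        if labels is not None and np.array_equal(new_labels, labels):
            return centroids, labels, "assignments unchanged"
        labels = new_labels
        # Update step: each centroid moves to the mean of its members (kept if the cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Condition 3: centroids did not move
        if np.allclose(new_centroids, centroids):
            return new_centroids, labels, "centroids unchanged"
        centroids = new_centroids
        # Condition 4: RSS (sum of squared distances to the assigned centroid) below a threshold
        rss = float(((X - centroids[labels]) ** 2).sum())
        if rss_threshold is not None and rss < rss_threshold:
            return centroids, labels, "RSS below threshold"
    return centroids, labels, "max iterations reached"


# Hypothetical data: two well-separated groups of three points
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
for kwargs in ({}, {"max_iter": 1}, {"rss_threshold": 1e9}):
    _, labels, reason = kmeans(X, k=2, **kwargs)
    print(kwargs, labels, reason)
```

In practice the conditions are usually combined, as here: an iteration cap guards against slow convergence, while the assignment, centroid or RSS checks stop early once the clustering has settled.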

Q6. In the figure, if you draw a horizontal line at y = 2, what will be the number of clusters formed?

a. 1
b. 2
c. 3
d. 4

Answer: b

Explanation: Cutting a dendrogram with a horizontal line at a given height (here y = 2) leaves one cluster for each vertical line the cut intersects. In the figure for this question, the line at y = 2 crosses two vertical lines, so 2 clusters are formed.
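The question's figure is not reproduced here, so this sketch uses hypothetical 1-D data to show the same idea with SciPy: fcluster with criterion="distance" cuts the dendrogram built by linkage at height t, which corresponds to drawing a horizontal line at that height.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical 1-D data: two tight pairs that are far apart from each other
X = np.array([[0.0], [0.5], [5.0], [5.5]])
Z = linkage(X, method="average")  # agglomerative merge history (the dendrogram)

for height in (0.25, 2.0, 10.0):
    # criterion="distance" cuts the tree at this height, like drawing a horizontal line
    labels = fcluster(Z, t=height, criterion="distance")
    print(f"cut at y = {height}: {len(set(labels))} clusters")
```

Here the pairs merge at height 0.5 and the two pairs merge at 5.0, so a cut below 0.5 gives 4 clusters, a cut at 2 gives 2, and a cut above 5 gives 1.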

Q7. Which type of clustering uses a merging approach?

a. Partitional
b. Naive Bayes
c. Hierarchical
d. None of the above

Answer: c

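Explanation: Hierarchical clustering in its agglomerative (bottom-up) form is a merging approach: each data point starts in its own cluster, and the two closest clusters are merged at each step until one cluster (or the desired number of clusters) remains. Partitional methods such as K-Means instead refine a fixed number of clusters, and Naive Bayes is a classifier, not a clustering method.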
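A deliberately naive sketch of the agglomerative merging loop, using single-link distances. The function name agglomerative and the sample points are hypothetical; real libraries such as scipy.cluster.hierarchy use far more efficient algorithms.

```python
import numpy as np

def agglomerative(points, n_clusters=1):
    """Bottom-up (agglomerative) clustering with single-link distances.

    Starts with every point in its own cluster and repeatedly merges the two
    closest clusters until n_clusters remain. Returns the clusters and the merge log.
    """
    points = np.asarray(points, dtype=float)
    clusters = [[i] for i in range(len(points))]  # each point starts as its own cluster
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single link: distance between the closest pair of members
                d = min(np.linalg.norm(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], float(d)))
        clusters[a] = clusters[a] + clusters[b]  # merge b into a (new list, log entries untouched)
        del clusters[b]                          # b > a, so index a is still valid
    return clusters, merges


clusters, merges = agglomerative([[0.0], [0.5], [5.0], [5.5]], n_clusters=1)
for left, right, d in merges:
    print(f"merge {left} + {right} at distance {d}")
```

The merge log is exactly the information a dendrogram draws: which clusters were joined and at what height.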

Q8. True or False: Hierarchical clustering should primarily be used for exploration.

True
False

Answer: True

Explanation: Hierarchical clustering helps understand the nested structure of data and is often used for exploratory data analysis, especially using dendrograms.

Q9. True or False: For finding dissimilarity between clusters in hierarchical clustering, average-link is the only metric used.

True
False

Answer: False

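Explanation: Several linkage criteria are used to measure dissimilarity between clusters in hierarchical clustering, including single-link (closest pair of points), complete-link (farthest pair of points), average-link (mean of all pairwise distances) and Ward's method.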
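A short sketch comparing three linkage criteria with SciPy's linkage function on the same hypothetical data. Column 2 of the linkage matrix holds the merge distance, so the last row shows the dissimilarity each criterion assigns to the final pair of clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical data: two pairs of points on a line
X = np.array([[0.0], [0.5], [5.0], [5.5]])

final_merge = {}
for method in ("single", "complete", "average"):
    Z = linkage(X, method=method)
    # Column 2 of the linkage matrix holds the merge distance; the last row is the final merge
    final_merge[method] = Z[-1, 2]
    print(f"{method:>8}: final merge at {Z[-1, 2]}")
```

For the clusters {0, 0.5} and {5, 5.5}, single link uses the closest pair (4.5), complete link the farthest pair (5.5) and average link the mean of all four cross-pair distances (5.0).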

Q10. Two variables V1 and V2 are used for clustering with k = 3. Which of the following statements is true?

1. If V1 and V2 have a correlation of 1, the cluster centroids will be in a straight line.
2. If V1 and V2 have a correlation of 0, the cluster centroids will be in a straight line.

a. 1 only
b. 2 only
c. 1 and 2
d. None of the above

Answer: a

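Explanation: With a correlation of 1, V2 is an exact linear function of V1, so every observation lies on one straight line. Each centroid is the mean of observations on that line and therefore also lies on it. With a correlation of 0 there is no linear relationship between V1 and V2, so the centroids are in general scattered in the 2-D plane rather than collinear.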
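A sketch of the argument with synthetic data. To keep it short it does not run K-Means; it simply splits the points into 3 groups by their V1 order (an assumption standing in for the K-Means partition) and checks whether the 3 group centroids are collinear. Since any centroid is a mean of points on the line, the result for correlation 1 does not depend on how the points are partitioned. The helper names and the line V2 = 2*V1 + 1 are hypothetical.

```python
import numpy as np

def centroids_of_groups(X, n_groups=3):
    """Split points into n_groups by their V1 order and return each group's centroid."""
    order = np.argsort(X[:, 0])
    return np.array([X[g].mean(axis=0) for g in np.array_split(order, n_groups)])

def collinear(C, tol=1e-9):
    """True if three 2-D points lie on one straight line (2-D cross product ~ 0)."""
    u, v = C[1] - C[0], C[2] - C[0]
    return abs(u[0] * v[1] - u[1] * v[0]) < tol

rng = np.random.default_rng(0)
v1 = rng.normal(size=60)

# Correlation = 1: V2 is an exact linear function of V1 (hypothetical line V2 = 2*V1 + 1)
X_corr1 = np.column_stack([v1, 2 * v1 + 1])
# Roughly uncorrelated: V2 drawn independently of V1
X_corr0 = np.column_stack([v1, rng.normal(size=60)])

for name, X in (("corr = 1", X_corr1), ("corr ~ 0", X_corr0)):
    r = np.corrcoef(X[:, 0], X[:, 1])[0, 1]
    print(name, round(r, 3), collinear(centroids_of_groups(X)))
```

The independently drawn V2 only has a sample correlation close to 0, not exactly 0, but it illustrates why statement 2 does not hold in general.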
