Data Mining Week 1 NPTEL Assignment Answers 2025

1. The earliest step in the data mining process is usually?

a) Visualization
b) Preprocessing
c) Modelling
d) Deployment

Explanation:
Among the listed options, preprocessing comes first. Raw data must be cleaned, transformed, and organized into a usable format before it can be modelled, visualized, or deployed.


2. Which of the following is an example of a continuous attribute?

a) Height of a person
b) Name of a person
c) Gender of a person
d) None of the above

Explanation:
Height is measured on a continuous scale and can take any numeric value within a range, making it a continuous attribute.


3. Friendship structure of users in a social networking site can be considered as an example of:

a) Record data
b) Ordered data
c) Graph data
d) None of the above

Explanation:
Social network connections can be represented as nodes (users) and edges (friendship), forming a graph data structure.
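
As a rough illustration of the node-and-edge view, the sketch below stores a small friendship network as an adjacency list. The user names and the `build_graph` helper are hypothetical and only serve the example.

```python
from collections import defaultdict

# Hypothetical friendships; each pair is an undirected edge between two users.
friendships = [("asha", "bilal"), ("asha", "chen"), ("bilal", "chen"), ("chen", "dev")]

def build_graph(edges):
    """Return an adjacency list: user -> set of friends."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)  # friendship is mutual, so store both directions
    return dict(graph)

graph = build_graph(friendships)
print(sorted(graph["chen"]))  # friends of chen
```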


4. The name of a person can be considered an attribute of which type?

a) Nominal
b) Ordinal
c) Interval
d) Ratio

Explanation:
Names are nominal attributes, as they represent categories without any inherent order.


5. A store sells 15 items. The maximum possible number of candidate 2-itemsets is:

a) 120
b) 105
c) 150
d) 2

Explanation:
A candidate 2-itemset is an unordered pair of distinct items, so the count is the number of ways to choose 2 items out of 15:
C(15, 2) = 15 × 14 / 2 = 105
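
The count can be double-checked in two ways, as sketched below: with the closed-form formula and by enumerating every pair. The item names are placeholders, since the question does not name the items.

```python
from itertools import combinations
from math import comb

n_items = 15
items = [f"item{i}" for i in range(1, n_items + 1)]  # placeholder item names

# Closed form: C(15, 2)
count_formula = comb(n_items, 2)

# Brute force: enumerate every unordered pair of distinct items
pairs = list(combinations(items, 2))

print(count_formula, len(pairs))
```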


6. If a record data matrix has a reduced number of rows after a transformation, the transformation has performed:

a) Data Sampling
b) Dimensionality Reduction
c) Noise Cleaning
d) Discretization

Explanation:
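In a record data matrix, rows are records and columns are attributes. Data sampling keeps only a subset of the records, so the matrix ends up with fewer rows while the columns stay the same, which makes a) Data Sampling the answer. Dimensionality reduction removes or combines attributes (columns), and discretization changes the values inside a column; neither is defined by a drop in the number of rows.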
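
The difference shows up directly in the shape of the matrix. The sketch below uses a made-up 6 × 3 matrix; dropping a column is used only as a crude stand-in for dimensionality reduction, and the `shape` helper is hypothetical.

```python
import random

# Hypothetical record data: 6 rows (records) x 3 columns (attributes).
data = [[r * 10 + c for c in range(3)] for r in range(6)]

def shape(matrix):
    """Return (number of rows, number of columns)."""
    return (len(matrix), len(matrix[0]) if matrix else 0)

# Data sampling: keep a random subset of rows, all columns intact.
rng = random.Random(0)  # fixed seed so the run is repeatable
sampled = rng.sample(data, k=3)

# Crude stand-in for dimensionality reduction: fewer columns, all rows intact.
reduced = [row[:2] for row in data]

print(shape(data), shape(sampled), shape(reduced))
```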


7. Taking transaction ID as a market basket, support for each itemset {e}, {b,d}, and {b,d,e} is:

a) 0.8, 0.2, 0.2
b) 0.3, 0.3, 0.4
c) 0.25, 0.25, 0.5
d) 1, 0, 0

Explanation:
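Support is the number of transactions containing the itemset divided by the total number of transactions. The transaction table is not reproduced here, so the counts cannot be rechecked, but the options can still be narrowed down. Every transaction that contains {b,d,e} also contains {e} and {b,d}, so Support({b,d,e}) can never exceed either of the other two values. That rules out b and c. Option d would leave Confidence({b,d} → {e}) in question 8 undefined (0 / 0), which leaves a:

  • Support({e}) = 0.8
  • Support({b,d}) = 0.2
  • Support({b,d,e}) = 0.2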
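
A minimal sketch of the support calculation follows. The baskets are hypothetical and are not the assignment's table; the `support` helper is likewise only illustrative. The final check reflects the fact that an itemset can never be more frequent than any of its subsets.

```python
def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in baskets if itemset <= set(basket))
    return hits / len(baskets)

# Hypothetical baskets, not the assignment's table.
baskets = [
    {"a", "b", "d", "e"},
    {"b", "c", "e"},
    {"a", "d", "e"},
    {"b", "d", "e"},
    {"a", "c"},
]

s_e = support({"e"}, baskets)
s_bd = support({"b", "d"}, baskets)
s_bde = support({"b", "d", "e"}, baskets)
print(s_e, s_bd, s_bde)

# A superset can never be more frequent than any of its subsets.
assert s_bde <= min(s_e, s_bd)
```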

8. Based on the results in (7), the confidences of the association rules {b,d} → {e} and {e} → {b,d} are:

a) 0.5, 0.5
b) 1, 0.25
c) 0.25, 1
d) 0.75, 0.25

Explanation:

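Confidence of X → Y is Support(X ∪ Y) / Support(X), so it always lies between 0 and 1. Using the supports from option a of question 7 (0.8, 0.2, 0.2):

  • Confidence({b,d} → {e}) = Support({b,d,e}) / Support({b,d}) = 0.2 / 0.2 = 1
  • Confidence({e} → {b,d}) = Support({b,d,e}) / Support({e}) = 0.2 / 0.8 = 0.25

This matches option b.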
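
The confidence formula can be sketched as below, assuming the supports 0.8, 0.2, and 0.2 from option a of question 7. The `confidence` helper is hypothetical, and exact fractions are used to avoid floating-point rounding. Because the support of the combined itemset never exceeds the support of the antecedent, the result never exceeds 1.

```python
from fractions import Fraction

def confidence(support_union, support_antecedent):
    """conf(X -> Y) = support(X ∪ Y) / support(X)."""
    if support_antecedent == 0:
        raise ValueError("confidence is undefined when the antecedent never occurs")
    return support_union / support_antecedent

# Assumed supports (option a of question 7), kept as exact fractions.
s_e, s_bd, s_bde = Fraction(8, 10), Fraction(2, 10), Fraction(2, 10)

conf_bd_to_e = confidence(s_bde, s_bd)  # {b,d} -> {e}
conf_e_to_bd = confidence(s_bde, s_e)   # {e} -> {b,d}
print(float(conf_bd_to_e), float(conf_e_to_bd))
```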

9. Repeat (7) by taking customer ID as the market basket. An item is treated as 1 if it appears in at least one transaction done by the customer, and 0 otherwise. The supports of itemsets {e}, {b,d}, and {b,d,e} are:

a) 0.3, 0.5, 0.2
b) 0.8, 1, 0.2
c) 1, 0.2, 0.8
d) 0.8, 1, 0.8

Explanation:
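Each customer's transactions are merged into a single basket, so support is now the fraction of customers whose merged basket contains the itemset. Any basket that contains {b,d,e} also contains {e} and {b,d}, so Support({b,d,e}) cannot exceed either of the other two values, which rules out c. Of the remaining options, only d produces rule confidences that appear among the choices in question 10:
Support({e}) = 0.8
Support({b,d}) = 1
Support({b,d,e}) = 0.8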
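
The effect of switching the basket from transaction to customer can be sketched as follows. The rows are hypothetical and are not the assignment's table; `customer_baskets` and `support` are illustrative helpers. Merging a customer's transactions can only add items to a basket, so an itemset whose items were bought in separate transactions may become frequent at the customer level.

```python
from collections import defaultdict

# Hypothetical (customer_id, transaction_id, items) rows, not the assignment's table.
rows = [
    (1, "t1", {"a", "e"}),
    (1, "t2", {"b", "d"}),
    (2, "t3", {"b", "c"}),
    (2, "t4", {"d"}),
    (3, "t5", {"a", "c"}),
]

def customer_baskets(rows):
    """Merge each customer's transactions into one basket (set union)."""
    merged = defaultdict(set)
    for customer_id, _txn_id, items in rows:
        merged[customer_id] |= items
    return list(merged.values())

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for b in baskets if itemset <= b) / len(baskets)

txn_baskets = [items for _cid, _tid, items in rows]
cust_baskets = customer_baskets(rows)

print(support({"b", "d"}, txn_baskets))   # per transaction
print(support({"b", "d"}, cust_baskets))  # per customer
```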


10. Based on the results in (9), the confidences of the association rules {b,d} → {e} and {e} → {b,d} are:

a) 0.8, 1
b) 1, 0.8
c) 0.25, 1
d) 1, 0.25

Explanation:

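Using the supports from option d of question 9 (0.8, 1, 0.8):

  • Confidence({b,d} → {e}) = Support({b,d,e}) / Support({b,d}) = 0.8 / 1 = 0.8
  • Confidence({e} → {b,d}) = Support({b,d,e}) / Support({e}) = 0.8 / 0.8 = 1

This matches option a.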