NPTEL Introduction to Machine Learning Week 5 Assignment Answers 2024
1. Consider a feedforward neural network that performs regression on a p-dimensional input to produce a scalar output. It has m hidden layers, and each of these layers has k hidden units. What is the total number of trainable parameters in the network? Ignore the bias terms.
- pk + mk²
- pk + mk² + k
- pk + (m−1)k² + k
- p² + (m−1)pk + k
- p² + (m−1)pk + k²
Answer :- c
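
The count behind option c can be checked with a short sketch. It assumes a fully connected network in which every layer feeds only the next one; the function name `count_weights` and the sizes used are illustrative and not part of the assignment.

```python
def count_weights(p: int, m: int, k: int) -> int:
    """Count weights, ignoring biases, in a fully connected network with
    p inputs, m hidden layers of k units each, and one scalar output."""
    layer_sizes = [p] + [k] * m + [1]
    # Each pair of consecutive layers contributes fan_in * fan_out weights.
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


p, m, k = 10, 3, 5  # arbitrary illustrative sizes
total = count_weights(p, m, k)
closed_form = p * k + (m - 1) * k**2 + k  # expression in option c
print(total, closed_form)
```

The three terms correspond to the input-to-first-hidden weights (pk), the m−1 hidden-to-hidden weight matrices (k² each), and the final hidden-to-output weights (k).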
2.

Answer :- b
3.

Answer :- b, d
4. Which of the following statement(s) about the initialization of neural network weights is/are true?
- Two different initializations of the same network could converge to different minima.
- For a given initialization, gradient descent will converge to the same minimum irrespective of the learning rate.
- The weights should be initialized to a constant value.
- The initial values of the weights should be sampled from a probability distribution.
Answer :- a, d
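
Statement a can be illustrated with a toy stand-in for a neural network loss. The sketch below assumes a one-dimensional non-convex function, f(x) = x⁴ − 3x² + x, chosen only because it has two local minima; the function, learning rate, and starting points are assumptions for illustration and do not come from the assignment.

```python
def grad(x: float) -> float:
    # Derivative of the toy loss f(x) = x**4 - 3*x**2 + x (two local minima).
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0: float, lr: float = 0.01, steps: int = 2000) -> float:
    """Plain gradient descent from the starting point x0."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


left = gradient_descent(-2.0)   # starts left of the central hump
right = gradient_descent(2.0)   # starts right of the central hump
print(left, right)  # the two runs settle in different minima
```

Starting on opposite sides of the central maximum sends the two runs to different minima, which is the one-dimensional analogue of two weight initializations ending in different solutions.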
7. Consider a Bernoulli distribution with p=0.7 (true value of the parameter). We draw samples from this distribution and compute an MAP estimate of p by assuming a prior distribution over p. Which of the following statement(s) is/are true?
- If the prior is Beta(2,6), we will likely require fewer samples for converging to the true value than if the prior is Beta(6,2).
- If the prior is Beta(6,2), we will likely require fewer samples for converging to the true value than if the prior is Beta(2,6).
- With a prior of Beta(2,100), the estimate will never converge to the true value, regardless of the number of samples used.
- With a prior of U(0,0.5) (i.e., the uniform distribution between 0 and 0.5), the estimate will never converge to the true value, regardless of the number of samples used.
Answer :- b, d
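
The reasoning behind b and d can be sketched numerically. The code below assumes an idealized sample in which exactly 70% of the n draws are ones, and uses the standard posterior-mode formula for a Beta(a, b) prior, (h + a − 1) / (n + a + b − 2); the helper names `map_beta` and `map_uniform` are hypothetical.

```python
def map_beta(heads: int, n: int, a: float, b: float) -> float:
    # Mode of the Beta(a + heads, b + n - heads) posterior
    # (valid when both posterior parameters exceed 1).
    return (heads + a - 1) / (n + a + b - 2)


def map_uniform(heads: int, n: int, lo: float = 0.0, hi: float = 0.5) -> float:
    # With a uniform prior on [lo, hi], the posterior is the likelihood
    # restricted to that interval. The Bernoulli likelihood peaks at heads / n,
    # so the MAP estimate is that peak clipped to the prior's support.
    return min(max(heads / n, lo), hi)


TRUE_P = 0.7
for n in (10, 100, 10_000):
    heads = round(TRUE_P * n)  # idealized sample: exactly 70% ones
    print(
        n,
        round(map_beta(heads, n, 6, 2), 4),    # prior mass near the true value
        round(map_beta(heads, n, 2, 6), 4),    # prior mass far from the true value
        round(map_beta(heads, n, 2, 100), 4),  # strong but nonzero everywhere
        map_uniform(heads, n),                 # zero prior mass at 0.7
    )
```

Under these assumptions the Beta(6,2) estimate starts closer to 0.7 than the Beta(2,6) estimate, the Beta(2,100) estimate is slow but still approaches 0.7 as n grows, and the U(0,0.5) estimate stays at 0.5 because the prior assigns zero density to the true value.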
8. Which of the following statement(s) about parameter estimation techniques is/are true?
- To obtain a distribution over the predicted values for a new data point, we need to compute an integral over the parameter space.
- The MAP estimate of the parameter gives a point prediction for a new data point.
- The MLE of a parameter gives a distribution of predicted values for a new data point.
- We need a point estimate of the parameter to compute a distribution of the predicted values for a new data point.
Answer :- a, b
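
The contrast between statements a and b can be sketched for a Bernoulli model. The example assumes a Beta(6,2) prior and 7 ones in 10 draws (illustrative numbers, not from the assignment), so the posterior is Beta(13,5). The full predictive probability integrates over the parameter, while the MAP estimate is plugged in as a single point.

```python
from math import exp, lgamma, log


def beta_pdf(p: float, a: float, b: float) -> float:
    # Density of Beta(a, b) at p, computed in log space for stability.
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(p) + (b - 1) * log(1 - p))


def predictive_prob_one(a_post: float, b_post: float, grid: int = 10_000) -> float:
    # Midpoint-rule approximation of the integral over [0, 1] of
    # p * posterior(p), i.e. P(next draw = 1 | data).
    width = 1.0 / grid
    total = 0.0
    for i in range(grid):
        p = (i + 0.5) * width
        total += p * beta_pdf(p, a_post, b_post)
    return total * width


a_post, b_post = 6 + 7, 2 + 3  # Beta(6,2) prior, 7 ones and 3 zeros observed
predictive = predictive_prob_one(a_post, b_post)       # integrates over p
map_plugin = (a_post - 1) / (a_post + b_post - 2)      # single point estimate
print(predictive, map_plugin)
```

For this conjugate pair the integral has the closed form a_post / (a_post + b_post), so the numerical value can be checked against 13/18; the MAP plug-in gives 0.75 instead, which shows that the two approaches generally disagree.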