Introduction to Machine Learning Week 12 NPTEL Assignment Answers 2025

1. Statement 1: Empirical error is always greater than generalisation error.
Statement 2: Training data and test data have different underlying (true) distributions.
Choose the correct option:

  • Statement 1 is true. Statement 2 is true. Statement 2 is the correct reason for statement 1.
  • Statement 1 is true. Statement 2 is true. Statement 2 is not the correct reason for statement 1.
  • Statement 1 is true. Statement 2 is false.
  • Both statements are false.
Answer :- b
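
For reference on question 1, the two quantities being compared have standard textbook definitions (these are general definitions, not text from the assignment):

```latex
% Empirical (training) error: average loss of hypothesis h on the n training samples
\hat{R}_n(h) = \frac{1}{n} \sum_{i=1}^{n} \ell\big(h(x_i), y_i\big)

% Generalisation (true) error: expected loss under the underlying data distribution D
R(h) = \mathbb{E}_{(x,y) \sim D}\big[\, \ell\big(h(x), y\big) \,\big]
```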

3. Which of the following is/are the shortcomings of TD Learning that Q-learning resolves?

  • TD learning cannot provide values for (state, action) pairs, limiting the ability to extract an optimal policy directly
  • TD learning requires knowledge of the reward and transition functions, which is not always available
  • TD learning is computationally expensive and slow compared to Q-learning
  • TD learning often suffers from high variance in value estimation, leading to unstable learning
  • TD learning cannot handle environments with continuous state and action spaces effectively
Answer :- a, d
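
To see why option (a) matters in question 3, here is a minimal sketch (my own illustrative Python, not from the assignment) contrasting the TD(0) state-value update with the Q-learning update; only the Q-table lets you pick a greedy action without knowing the transition function.

```python
# Rough sketch contrasting the two updates; alpha and gamma are
# illustrative values, not taken from the assignment.
alpha, gamma = 0.1, 0.9

def td0_update(V, s, r, s_next):
    # TD(0) stores only state values V(s). To act greedily later you still
    # need the transition function T(s, a) to know where each action leads.
    V[s] += alpha * (r + gamma * V[s_next] - V[s])

def q_learning_update(Q, s, a, r, s_next, actions):
    # Q-learning stores a value per (state, action) pair, so the greedy
    # policy is simply argmax_a Q(s, a), with no model of T required.
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```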

5. The VC dimension of a pair of squares is:

  • 3
  • 4
  • 5
  • 6
Answer :- a
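
For question 5, recall the standard definition behind the question (a general definition, not assignment text): a hypothesis class H shatters a point set S if it can realise every possible labelling of S, and

```latex
\mathrm{VCdim}(H) = \max\{\, |S| : H \text{ shatters } S \,\}
```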

6. What is V(X4) after one application of the given formula?

  • 1
  • 0.9
  • 0.81
  • 0
Answer :- b

7. What is V(X1) after one application of the given formula?

  • -1
  • -0.9
  • -0.81
  • 0
Answer :- d

8. What is V(X1) after V converges?

  • 0.54
  • -0.9
  • 0.63
  • 0
Answer :- d
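
Questions 6 to 8 refer to a value-update formula and a small set of states (X1 to X4) that are not reproduced in this post, so the numbers above cannot be re-derived here. As a hedged illustration only: updates of this kind are usually a Bellman-style backup, and the option values 0.9 and 0.81 (= 0.9^2) are consistent with a discount factor of gamma = 0.9; both the form of the update and the value of gamma are assumptions in the sketch below.

```python
# Illustrative only: the actual formula, rewards, and transitions for
# X1..X4 are given in the assignment and are not reproduced here.
gamma = 0.9  # assumed discount factor, suggested by the 0.9 / 0.81 options

def bellman_backup(V, rewards, successors, s):
    """One application of a deterministic backup:
    V(s) <- max over next states s' of [ r(s, s') + gamma * V(s') ]."""
    return max(rewards[(s, s_next)] + gamma * V[s_next]
               for s_next in successors[s])
```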

10. In games like Chess or Ludo, the transition function T is known to us. But what about Counter-Strike, Mortal Kombat, or Super Mario? In games where we do not know T, we can only query the game simulator with the current state and an action, and it returns the next state. This means we cannot directly take an argmax or argmin of V(T(S,a)) over actions, so learning the value function V alone is not sufficient to construct a policy. Which of these could we do to overcome this? (more than one may apply)

Assume there exists a method to do each option. You have to judge whether doing it solves the stated problem.

  • Directly learn the policy
  • Learn a different function which stores value for state-action pairs (instead of only state like V does)
  • Learn T along with V
  • Run a random agent repeatedly till it wins. Use this as the winning policy
Answer :- a, b
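
For question 10, both correct options avoid ever needing T explicitly. Below is a minimal sketch of option (b): learn Q(s, a) from simulator queries alone, then read off the policy with an argmax over actions. The simulator interface step(s, a) -> (next_state, reward, done), the state/action lists, and all hyperparameters are assumptions made for this example, not part of the assignment.

```python
import random

def q_learning(step, states, actions, episodes=500, max_steps=200,
               alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = random.choice(states)
        for _ in range(max_steps):
            # epsilon-greedy: explore sometimes, otherwise act greedily on Q
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a2: Q[(s, a2)])
            s_next, r, done = step(s, a)          # simulator query only, no T
            best_next = max(Q[(s_next, a2)] for a2 in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
            if done:
                break
    # Policy extraction needs no model of T: argmax over the learned Q-table.
    return {s: max(actions, key=lambda a2: Q[(s, a2)]) for s in states}
```

Option (a), directly learning the policy, likewise never requires T; learning T alongside V (option c) or reusing one lucky random rollout (option d) does not reliably solve the stated problem.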