Reinforcement Learning Week 5 NPTEL Assignment Answers 2025

Need help with this week’s assignment? Get detailed and trusted solutions for Reinforcement Learning Week 5 NPTEL Assignment Answers. Our expert-curated answers help you solve your assignments faster while strengthening your conceptual understanding.

✅ Subject: Reinforcement Learning
📅 Week: 5
🎯 Session: NPTEL 2025 July-October
🔗 Course Link: Click Here
🔍 Reliability: Verified and expert-reviewed answers
📌 Trusted By: 5000+ Students

For complete and in-depth solutions to all weekly assignments, check out 👉 NPTEL Reinforcement Learning Week 5 Assignment Answers

🚀 Stay ahead in your NPTEL journey with fresh, updated solutions every week!

NPTEL Reinforcement Learning Week 5 Assignment Answers 2025

1. In policy iteration, which of the following is/are true of the Policy Evaluation (PE) and Policy Improvement (PI) steps?

  • The values of states that are returned by PE may fluctuate between high and low values as the algorithm runs.
  • PE returns the fixed point of the Bellman operator L^πn.
  • PI can randomly select any greedy policy for a given value function vn.
  • Policy iteration always converges for a finite MDP.
Answer : See Answers
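
To make the PE/PI interplay concrete, here is a minimal policy iteration sketch in Python on a made-up finite MDP. The transition tensor P, reward matrix R, and discount gamma below are hypothetical placeholders and are not part of the question; the structure of the two steps is the point.

```python
import numpy as np

# Hypothetical finite MDP, purely for illustration.
n_states, n_actions, gamma = 3, 2, 0.9
P = np.random.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = np.random.rand(n_states, n_actions)                                 # R[s, a]

def policy_evaluation(pi, tol=1e-10):
    """PE: iterate v <- L^pi v until convergence, i.e. return the fixed point of L^pi."""
    v = np.zeros(n_states)
    while True:
        v_new = np.array([R[s, pi[s]] + gamma * P[s, pi[s]] @ v for s in range(n_states)])
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new

def policy_improvement(v):
    """PI: pick a greedy policy w.r.t. v (any tie-breaking rule is acceptable)."""
    return np.argmax(R + gamma * P @ v, axis=1)

pi = np.zeros(n_states, dtype=int)
while True:                             # terminates for a finite MDP
    v = policy_evaluation(pi)           # PE step
    new_pi = policy_improvement(v)      # PI step
    if np.array_equal(new_pi, pi):
        break
    pi = new_pi
```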

2. Consider the Monte Carlo approach for policy evaluation. Suppose the states are S1, S2, S3, S4, S5, S6 and a terminal state. You sample one trajectory as follows: S1 → S5 → S3 → S6 → terminal state. Which of the following states can be updated from this sample?

  • S1
  • S2
  • S6
  • S4
Answer :
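
For intuition, the sketch below applies a first-visit Monte Carlo update to exactly the sampled trajectory from the question. The per-step rewards and the discount are hypothetical placeholders (the question does not give them); the takeaway is that only states actually visited in the trajectory can be updated, and only after the episode has terminated, since the return needs the full tail of the trajectory.

```python
# First-visit Monte Carlo update on the trajectory S1 -> S5 -> S3 -> S6 -> terminal.
gamma = 0.9
episode = [("S1", 1.0), ("S5", 0.0), ("S3", 2.0), ("S6", -1.0)]  # (state, reward on leaving it)

V, counts = {}, {}

# compute returns backwards once the episode is over
G, tagged = 0.0, []
for state, reward in reversed(episode):
    G = reward + gamma * G
    tagged.append((state, G))
tagged.reverse()

# first-visit rule: each state is updated at most once per episode
seen = set()
for state, G in tagged:
    if state in seen:
        continue
    seen.add(state)
    counts[state] = counts.get(state, 0) + 1
    V[state] = V.get(state, 0.0) + (G - V.get(state, 0.0)) / counts[state]

print(sorted(V))   # ['S1', 'S3', 'S5', 'S6'] -- S2 and S4 cannot be updated from this sample
```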

3. Which of the following statements are true with regards to Monte Carlo value approximation methods?

  • To evaluate a policy using these methods, a subset of trajectories in which all states are encountered at least once are enough to update all state-values.
  • Monte-Carlo value function approximation methods need knowledge of the full model.
  • Monte-Carlo methods update state-value estimates only at the end of an episode.
  • Monte-Carlo methods can only be used for episodic tasks.
Answer :

4. In every-visit Monte Carlo methods, multiple samples for one state are obtained from a single trajectory. Which of the following is true?

  • There is an increase in bias of the estimates.
  • There is an increase in variance of the estimates.
  • It does not affect the bias or variance of estimates.
  • Both bias and variance of the estimates increase.
Answer :
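
A tiny sketch of the every-visit bookkeeping (the trajectory and rewards are hypothetical) shows how a single episode can contribute several correlated return samples for the same state:

```python
# Every-visit Monte Carlo on one hypothetical trajectory in which S1 occurs twice.
gamma = 1.0
episode = [("S1", 0.0), ("S2", 1.0), ("S1", 0.0), ("S3", 2.0)]  # (state, reward)

returns = {}
G, tagged = 0.0, []
for state, reward in reversed(episode):   # accumulate returns from the back
    G = reward + gamma * G
    tagged.append((state, G))
for state, G in reversed(tagged):         # keep every visit, not just the first
    returns.setdefault(state, []).append(G)

print(returns["S1"])   # [3.0, 2.0] -- two return samples obtained from one trajectory
```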

5. Which of the following statements are FALSE about solving MDPs using dynamic programming?

  • If the state space is large or computation power is limited, it is preferred to update only those states that are seen in the trajectories.
  • Knowledge of transition probabilities is not necessary for solving MDPs using dynamic programming.
  • Methods that update only a subset of states at a time guarantee performance equal to or better than classic DP.
Answer : See Answers
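
For contrast with Monte Carlo, here is a sketch of an in-place (asynchronous) DP backup over a subset of states on a hypothetical MDP. Even a partial sweep of this kind still uses the transition probabilities, which is why DP cannot do without the model:

```python
import numpy as np

# Hypothetical MDP and a hypothetical "visited" subset, e.g. states seen in recent trajectories.
n_states, n_actions, gamma = 5, 2, 0.95
P = np.random.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = np.random.rand(n_states, n_actions)                                 # R[s, a]

v = np.zeros(n_states)
visited = [0, 2, 3]                           # back up only this subset, in place
for s in visited:
    v[s] = np.max(R[s] + gamma * P[s] @ v)    # full-width Bellman optimality backup at s
```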

6. Select the correct statements about Generalized Policy Iteration (GPI).

  • GPI lets policy evaluation and policy improvement interact with each other regardless of the details of the two processes.
  • Before convergence, the policy evaluation step will usually cause the policy to no longer be greedy with respect to the updated value function.
  • GPI converges only when a policy has been found which is greedy with respect to its own value function.
  • The policy found by GPI at convergence will be optimal but value function will not be optimal.
Answer :
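
A rough sketch of GPI with a truncated (single-sweep) evaluation step, again on a hypothetical MDP, shows the two processes interacting regardless of how fully each is carried out; the process has converged only once the policy is greedy with respect to its own value function:

```python
import numpy as np

# Hypothetical MDP, purely for illustration of the GPI loop.
n_states, n_actions, gamma = 3, 2, 0.9
P = np.random.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = np.random.rand(n_states, n_actions)

v = np.zeros(n_states)
pi = np.zeros(n_states, dtype=int)
for _ in range(500):
    # truncated policy evaluation: one sweep of v towards v^pi
    v = np.array([R[s, pi[s]] + gamma * P[s, pi[s]] @ v for s in range(n_states)])
    # policy improvement: make pi greedy with respect to the updated v
    pi = np.argmax(R + gamma * P @ v, axis=1)
```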

7. What is meant by “off-policy” Monte Carlo value function evaluation?

  • The policy being evaluated is the same as the policy used to generate samples.
  • The policy being evaluated is different from the policy used to generate samples.
  • The policy being learnt is different from the policy used to generate samples.
Answer :
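
As a rough illustration, the sketch below re-weights a return generated by a behaviour policy b so that it estimates the value under a different target policy π (ordinary importance sampling). The policies, episode, and numbers are hypothetical placeholders:

```python
# Off-policy Monte Carlo evaluation via ordinary importance sampling.
gamma = 1.0
pi = {"S1": {"a1": 0.9, "a2": 0.1}}   # target policy pi(action | state)
b  = {"S1": {"a1": 0.5, "a2": 0.5}}   # behaviour policy that generated the data

episode = [("S1", "a1", 1.0)]          # one sampled (state, action, reward) step

G, rho = 0.0, 1.0
for state, action, reward in reversed(episode):
    G = reward + gamma * G
    rho *= pi[state][action] / b[state][action]   # importance-sampling ratio

print(rho * G)   # 1.8: this weighted return feeds the estimate of v_pi(S1)
```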

8. For both the value iteration and policy iteration algorithms we obtain a sequence of vectors over the iterations, say v1, v2, …, vn for value iteration and v′1, v′2, …, v′n for policy iteration. Which of the following statements are true?

  • For all vi ∈ {v1, v2, …, vn}, there exists a policy for which vi is a fixed point.
  • For all v′i ∈ {v′1, v′2, …, v′n}, there exists a policy for which v′i is a fixed point.
  • For all vi ∈ {v1, v2, …, vn}, there may not exist a policy for which vi is a fixed point.
  • For all v′i ∈ {v′1, v′2, …, v′n}, there may not exist a policy for which v′i is a fixed point.
Answer : See Answers
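
One way to see the difference numerically is to check the Bellman residual. The sketch below (toy MDP with a fixed random seed, all values hypothetical) runs a few value-iteration sweeps and then asks whether the resulting iterate is the fixed point of L^π for any deterministic policy; policy iteration's evaluation output, by construction, satisfies v = L^π v exactly for the policy just evaluated.

```python
import numpy as np
from itertools import product

# Hypothetical toy MDP, purely for illustration.
np.random.seed(0)
n_states, n_actions, gamma = 3, 2, 0.9
P = np.random.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = np.random.rand(n_states, n_actions)                                 # R[s, a]

def bellman_residual(v, pi):
    """max_s |(L^pi v)(s) - v(s)|; zero exactly when v is the fixed point of L^pi."""
    lv = np.array([R[s, pi[s]] + gamma * P[s, pi[s]] @ v for s in range(n_states)])
    return np.max(np.abs(lv - v))

v = np.zeros(n_states)
for _ in range(3):                                # a few value-iteration sweeps
    v = np.max(R + gamma * P @ v, axis=1)

# residual of this iterate under every deterministic policy
residuals = [bellman_residual(v, list(pi)) for pi in product(range(n_actions), repeat=n_states)]
print(min(residuals))   # typically strictly positive: v is not any policy's value function
```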