Reinforcement Learning Week 5 NPTEL Assignment Answers 2025

Need help with this week’s assignment? Get detailed and trusted solutions for Reinforcement Learning Week 5 NPTEL Assignment Answers. Our expert-curated answers help you solve your assignments faster while strengthening your conceptual understanding.

✅ Subject: Reinforcement Learning
📅 Week: 5
🎯 Session: NPTEL 2025 July-October
🔗 Course Link: Click Here
🔍 Reliability: Verified and expert-reviewed answers
📌 Trusted By: 5000+ Students

For complete and in-depth solutions to all weekly assignments, check out 👉 NPTEL Reinforcement Learning Week 5 Assignment Answers

🚀 Stay ahead in your NPTEL journey with fresh, updated solutions every week!

NPTEL Reinforcement Learning Week 5 Assignment Answers 2025

1. In policy iteration, which of the following is/are true of the Policy Evaluation (PE) and Policy Improvement (PI) steps?

  • The values of states that are returned by PE may fluctuate between high and low values as the algorithm runs.
  • PE returns the fixed point of the Bellman operator L^πn.
  • PI can randomly select any greedy policy for a given value function vn.
  • Policy iteration always converges for a finite MDP.
Answer : See Answers
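
To make the PE/PI interplay concrete, here is a minimal policy iteration sketch in Python on a made-up finite MDP. The transition tensor P, reward matrix R, and discount gamma below are hypothetical placeholders and are not part of the question; the structure of the two steps is the point.

```python
import numpy as np

# Hypothetical finite MDP, purely for illustration.
n_states, n_actions, gamma = 3, 2, 0.9
P = np.random.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = np.random.rand(n_states, n_actions)                                 # R[s, a]

def policy_evaluation(pi, tol=1e-10):
    """PE: iterate v <- L^pi v until convergence, i.e. return the fixed point of L^pi."""
    v = np.zeros(n_states)
    while True:
        v_new = np.array([R[s, pi[s]] + gamma * P[s, pi[s]] @ v for s in range(n_states)])
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new

def policy_improvement(v):
    """PI: pick a greedy policy w.r.t. v (any tie-breaking rule is acceptable)."""
    return np.argmax(R + gamma * P @ v, axis=1)

pi = np.zeros(n_states, dtype=int)
while True:                             # terminates for a finite MDP
    v = policy_evaluation(pi)           # PE step
    new_pi = policy_improvement(v)      # PI step
    if np.array_equal(new_pi, pi):
        break
    pi = new_pi
```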

2. Consider the Monte Carlo approach for policy evaluation. Suppose the states are S1, S2, S3, S4, S5, S6 and a terminal state. You sample one trajectory as follows: S1 → S5 → S3 → S6 → terminal state. Which of the following states can be updated from this sample?

  • S1
  • S2
  • S6
  • S4
Answer :
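
For intuition, the sketch below applies a first-visit Monte Carlo update to exactly the sampled trajectory from the question. The per-step rewards and the discount are hypothetical placeholders (the question does not give them); the takeaway is that only states actually visited in the trajectory can be updated, and only after the episode has terminated, since the return needs the full tail of the trajectory.

```python
# First-visit Monte Carlo update on the trajectory S1 -> S5 -> S3 -> S6 -> terminal.
gamma = 0.9
episode = [("S1", 1.0), ("S5", 0.0), ("S3", 2.0), ("S6", -1.0)]  # (state, reward on leaving it)

V, counts = {}, {}

# compute returns backwards once the episode is over
G, tagged = 0.0, []
for state, reward in reversed(episode):
    G = reward + gamma * G
    tagged.append((state, G))
tagged.reverse()

# first-visit rule: each state is updated at most once per episode
seen = set()
for state, G in tagged:
    if state in seen:
        continue
    seen.add(state)
    counts[state] = counts.get(state, 0) + 1
    V[state] = V.get(state, 0.0) + (G - V.get(state, 0.0)) / counts[state]

print(sorted(V))   # ['S1', 'S3', 'S5', 'S6'] -- S2 and S4 cannot be updated from this sample
```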

3. Which of the following statements are true with regards to Monte Carlo value approximation methods?

  • To evaluate a policy using these methods, a subset of trajectories in which all states are encountered at least once are enough to update all state-values.
  • Monte-Carlo value function approximation methods need knowledge of the full model.
  • Monte-Carlo methods update state-value estimates only at the end of an episode.
  • Monte-Carlo methods can only be used for episodic tasks.
Answer :

4. In every-visit Monte Carlo methods, multiple samples for one state are obtained from a single trajectory. Which of the following is true?

  • There is an increase in bias of the estimates.
  • There is an increase in variance of the estimates.
  • It does not affect the bias or variance of estimates.
  • Both bias and variance of the estimates increase.
Answer :
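
A tiny sketch of the every-visit bookkeeping (the trajectory and rewards are hypothetical) shows how a single episode can contribute several correlated return samples for the same state:

```python
# Every-visit Monte Carlo on one hypothetical trajectory in which S1 occurs twice.
gamma = 1.0
episode = [("S1", 0.0), ("S2", 1.0), ("S1", 0.0), ("S3", 2.0)]  # (state, reward)

returns = {}
G, tagged = 0.0, []
for state, reward in reversed(episode):   # accumulate returns from the back
    G = reward + gamma * G
    tagged.append((state, G))
for state, G in reversed(tagged):         # keep every visit, not just the first
    returns.setdefault(state, []).append(G)

print(returns["S1"])   # [3.0, 2.0] -- two return samples obtained from one trajectory
```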

5. Which of the following statements are FALSE about solving MDPs using dynamic programming?

  • If the state space is large or computation power is limited, it is preferred to update only those states that are seen in the trajectories.
  • Knowledge of transition probabilities is not necessary for solving MDPs using dynamic programming.
  • Methods that update only a subset of states at a time guarantee performance equal to or better than classic DP.
Answer : See Answers
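
For contrast with Monte Carlo, here is a sketch of an in-place (asynchronous) DP backup over a subset of states on a hypothetical MDP. Even a partial sweep of this kind still uses the transition probabilities, which is why DP cannot do without the model:

```python
import numpy as np

# Hypothetical MDP and a hypothetical "visited" subset, e.g. states seen in recent trajectories.
n_states, n_actions, gamma = 5, 2, 0.95
P = np.random.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = np.random.rand(n_states, n_actions)                                 # R[s, a]

v = np.zeros(n_states)
visited = [0, 2, 3]                           # back up only this subset, in place
for s in visited:
    v[s] = np.max(R[s] + gamma * P[s] @ v)    # full-width Bellman optimality backup at s
```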

6. Select the correct statements about Generalized Policy Iteration (GPI).

  • GPI lets policy evaluation and policy improvement interact with each other regardless of the details of the two processes.
  • Before convergence, the policy evaluation step will usually cause the policy to no longer be greedy with respect to the updated value function.
  • GPI converges only when a policy has been found which is greedy with respect to its own value function.
  • The policy found by GPI at convergence will be optimal but value function will not be optimal.
Answer :
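
A rough sketch of GPI with a truncated (single-sweep) evaluation step, again on a hypothetical MDP, shows the two processes interacting regardless of how fully each is carried out; the process has converged only once the policy is greedy with respect to its own value function:

```python
import numpy as np

# Hypothetical MDP, purely for illustration of the GPI loop.
n_states, n_actions, gamma = 3, 2, 0.9
P = np.random.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = np.random.rand(n_states, n_actions)

v = np.zeros(n_states)
pi = np.zeros(n_states, dtype=int)
for _ in range(500):
    # truncated policy evaluation: one sweep of v towards v^pi
    v = np.array([R[s, pi[s]] + gamma * P[s, pi[s]] @ v for s in range(n_states)])
    # policy improvement: make pi greedy with respect to the updated v
    pi = np.argmax(R + gamma * P @ v, axis=1)
```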

7. What is meant by “off-policy” Monte Carlo value function evaluation?

  • The policy being evaluated is the same as the policy used to generate samples.
  • The policy being evaluated is different from the policy used to generate samples.
  • The policy being learnt is different from the policy used to generate samples.
Answer :
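
As a rough illustration, the sketch below re-weights a return generated by a behaviour policy b so that it estimates the value under a different target policy π (ordinary importance sampling). The policies, episode, and numbers are hypothetical placeholders:

```python
# Off-policy Monte Carlo evaluation via ordinary importance sampling.
gamma = 1.0
pi = {"S1": {"a1": 0.9, "a2": 0.1}}   # target policy pi(action | state)
b  = {"S1": {"a1": 0.5, "a2": 0.5}}   # behaviour policy that generated the data

episode = [("S1", "a1", 1.0)]          # one sampled (state, action, reward) step

G, rho = 0.0, 1.0
for state, action, reward in reversed(episode):
    G = reward + gamma * G
    rho *= pi[state][action] / b[state][action]   # importance-sampling ratio

print(rho * G)   # 1.8: this weighted return feeds the estimate of v_pi(S1)
```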

8. For both the value iteration and policy iteration algorithms we obtain a sequence of vectors over the iterations, say v1, v2, …, vn for value iteration and v′1, v′2, …, v′n for policy iteration. Which of the following statements are true?

  • For all vi ∈ {v1, v2, …, vn}, there exists a policy for which vi is a fixed point.
  • For all v′i ∈ {v′1, v′2, …, v′n}, there exists a policy for which v′i is a fixed point.
  • For all vi ∈ {v1, v2, …, vn}, there may not exist a policy for which vi is a fixed point.
  • For all v′i ∈ {v′1, v′2, …, v′n}, there may not exist a policy for which v′i is a fixed point.
Answer : See Answers
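
One way to see the difference numerically is to check the Bellman residual. The sketch below (toy MDP with a fixed random seed, all values hypothetical) runs a few value-iteration sweeps and then asks whether the resulting iterate is the fixed point of L^π for any deterministic policy; policy iteration's evaluation output, by construction, satisfies v = L^π v exactly for the policy just evaluated.

```python
import numpy as np
from itertools import product

# Hypothetical toy MDP, purely for illustration.
np.random.seed(0)
n_states, n_actions, gamma = 3, 2, 0.9
P = np.random.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = np.random.rand(n_states, n_actions)                                 # R[s, a]

def bellman_residual(v, pi):
    """max_s |(L^pi v)(s) - v(s)|; zero exactly when v is the fixed point of L^pi."""
    lv = np.array([R[s, pi[s]] + gamma * P[s, pi[s]] @ v for s in range(n_states)])
    return np.max(np.abs(lv - v))

v = np.zeros(n_states)
for _ in range(3):                                # a few value-iteration sweeps
    v = np.max(R + gamma * P @ v, axis=1)

# residual of this iterate under every deterministic policy
residuals = [bellman_residual(v, list(pi)) for pi in product(range(n_actions), repeat=n_states)]
print(min(residuals))   # typically strictly positive: v is not any policy's value function
```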