Reinforcement Learning Week 7 NPTEL Assignment Answers 2025

Need help with this week’s assignment? Get detailed, trusted solutions for the NPTEL Reinforcement Learning Week 7 assignment. Our expert-curated answers help you complete your assignments faster while deepening your conceptual clarity.

✅ Subject: Reinforcement Learning
📅 Week: 7
🎯 Session: NPTEL 2025 July-October
🔗 Course Link: Click Here
🔍 Reliability: Verified and expert-reviewed answers
📌 Trusted By: 5000+ Students

For complete and in-depth solutions to all weekly assignments, check out 👉 NPTEL Reinforcement Learning Week 7 Assignment Answers

🚀 Stay ahead in your NPTEL journey with fresh, updated solutions every week!

NPTEL Reinforcement Learning Week 7 Assignment Answers 2025

1. Which of the following is the corrected n-step truncated return?

  • $R_{t+n} + \gamma^n V_t(s_{t+n})$
  • $R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \ldots + \gamma^{n-1} R_{t+n} + \gamma^n V_t(s_{t+n})$
  • $\gamma R_{t+1} + \gamma^2 R_{t+2} + \gamma^3 R_{t+3} + \ldots + \gamma^n R_{t+n} + \gamma^{n+1} V_t(s_{t+n})$
  • None of the above.
Answer: See Answers
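
For background, the corrected n-step truncated return sums the first n discounted rewards and then corrects (bootstraps) with the current value estimate of the state reached after n steps. A minimal sketch of that computation, with illustrative names:

```python
def n_step_return(rewards, v_end, gamma, n):
    """Corrected n-step truncated return:
    R_{t+1} + gamma R_{t+2} + ... + gamma^(n-1) R_{t+n} + gamma^n V_t(s_{t+n}).
    `rewards` holds R_{t+1} .. R_{t+n}; `v_end` is V_t(s_{t+n})."""
    g = sum(gamma**k * r for k, r in enumerate(rewards[:n]))
    return g + gamma**n * v_end

# Example: n = 3, gamma = 0.9, rewards 1, 0, 2, bootstrap value 5
print(n_step_return([1.0, 0.0, 2.0], v_end=5.0, gamma=0.9, n=3))  # 6.265
```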

2. Suppose that in a particular problem, the agent keeps going back to the same state in a loop. What is the maximum value that can be taken by the eligibility trace of such a state if we consider accumulating traces with λ=0.5 and γ=0.5?

  • 0.5
  • 5
  • 1.33
  • 3
Answer:
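
One way to reason about this: with accumulating traces, a state’s trace decays by a factor γλ every step and is incremented by 1 on each visit. A state revisited on every step therefore approaches the geometric sum 1 + γλ + (γλ)² + … = 1/(1 − γλ). A quick numeric check of that limit:

```python
gamma, lam = 0.5, 0.5
e = 0.0
for _ in range(50):          # the agent revisits the same state every step
    e = gamma * lam * e + 1  # accumulating-trace update on each visit
print(e)                     # approaches 1 / (1 - gamma * lam) = 1.333...
```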

3. Consider the TD(λ) algorithm. Which of these is true when λ=1 and γ=1?

  • The method behaves like a Monte Carlo method for an undiscounted, episodic task.
  • The values of all states are updated by the TD error in each episode
  • Eligibility traces do not decay with time
  • None of the above
Answer:
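
For intuition, writing out the accumulating-trace recursion makes the role of γλ explicit; when γλ = 1 the decay factor disappears entirely:

```latex
% Accumulating-trace recursion; with \gamma\lambda = 1 there is no decay:
e_t(s) = \gamma\lambda \, e_{t-1}(s) + \mathbf{1}[s_t = s]
       = e_{t-1}(s) + \mathbf{1}[s_t = s] \quad (\gamma\lambda = 1)
```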

4. In solving the control problem, suppose that, at the start of an episode, the first action taken is not an optimal action according to the current policy. Would an update be made corresponding to this action and the subsequent reward received in Watkins’s Q(λ) algorithm?

  • Yes
  • No
Answer:
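
As a refresher on the mechanics involved: in Watkins’s Q(λ), every step triggers a TD-error update, while an exploratory (non-greedy) action cuts the eligibility traces so that later rewards are not credited backwards past it. A minimal sketch of one step, assuming Q-values and traces live in plain dictionaries (names illustrative, not from the course):

```python
def watkins_q_lambda_step(Q, e, s, a, r, s_next, a_next, alpha, gamma, lam):
    """One step of Watkins's Q(lambda), sketched.
    Q: dict state -> {action: value}; e: dict (state, action) -> trace."""
    a_star = max(Q[s_next], key=Q[s_next].get)       # greedy action in s_next
    delta = r + gamma * Q[s_next][a_star] - Q[s][a]  # TD error for this step
    e[(s, a)] = e.get((s, a), 0.0) + 1.0             # accumulate the trace
    for (si, ai), ei in e.items():                   # an update is always made,
        Q[si][ai] += alpha * delta * ei              # even for a non-greedy action
    if a_next == a_star:
        for key in e:
            e[key] *= gamma * lam                    # greedy next action: decay traces
    else:
        e.clear()                                    # exploratory action: cut traces
```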

5. Given the following sequence of states observed from the beginning of an episode,

            $s_2, s_1, s_3, s_1, s_3, s_2, s_1, s_6$

What is the eligibility value $e_7(s_1)$ of state $s_1$ at time step 7, given trace decay parameter λ, discount rate γ, and initial value $e_0(s_1) = 0$, when accumulating traces are used?

  • $\gamma^7\lambda^7$
  • $(\gamma\lambda)^6 + (\gamma\lambda)^4 + \gamma\lambda$
  • $\gamma\left((\gamma\lambda)^4 + (\gamma\lambda)^6 + (\gamma\lambda)^2\right)$
  • None of these
Answer:
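
Rather than hand-expanding the recursion, an expression like this can be checked symbolically, for example with sympy (a sketch; the sequence is indexed as s₀ … s₇ so that e₀(s₁) = 0 holds):

```python
import sympy as sp

g, l = sp.symbols('gamma lambda', positive=True)
seq = ['s2', 's1', 's3', 's1', 's3', 's2', 's1', 's6']  # s_0 ... s_7

e = sp.Integer(0)
for s in seq:
    e = g * l * e + (1 if s == 's1' else 0)  # accumulating-trace update
print(sp.expand(e))  # one (gamma*lambda)^k term per past visit of s1
```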

6. For the above question, what is the eligibility value if replacing traces are used?

  • $\gamma^7\lambda^7$
  • $\gamma^6\lambda^6$
  • $\gamma\lambda + \gamma^4\lambda^4 + \gamma^6\lambda^6$
  • $\gamma\lambda$
Answer: See Answers
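
Continuing the sympy sketch from question 5: replacing traces reset the trace to 1 on a visit instead of incrementing it, so only the most recent visit of s₁ survives in e₇(s₁):

```python
e = sp.Integer(0)
for s in seq:
    e = sp.Integer(1) if s == 's1' else g * l * e  # replacing-trace update
print(sp.expand(e))  # decay since the last visit of s1 at t = 6
```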

7. State True or False:
The idea in Sarsa(λ) is to apply the TD(λ) prediction method to just the states rather than to state-action pairs.

  • True
  • False
Answer:

8. Assertion: Eligibility traces provide a way to implement the Monte Carlo algorithm in an incremental fashion.
Reason: The λ-return can be set to the Monte Carlo return, which can be implemented with eligibility traces.

  • Assertion and Reason are both true and Reason is a correct explanation of Assertion
  • Assertion and Reason are both true and Reason is not a correct explanation of Assertion
  • Assertion is true and Reason is false
  • Both Assertion and Reason are false
Answer:
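
For reference, the λ-return that the Reason refers to is the standard mixture of n-step returns; in the episodic case, pushing λ → 1 leaves all the weight on the full Monte Carlo return:

```latex
G_t^{\lambda} = (1 - \lambda) \sum_{n=1}^{\infty} \lambda^{\,n-1} G_t^{(n)},
\qquad
G_t^{\lambda = 1} = G_t \quad \text{(the Monte Carlo return)}
```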

9.

Answer:

10. Considering episodic tasks and for λ ∈ (0, 1), is it true that the one-step return always gets assigned the maximum weight in the λ-return?

  • Yes
  • No
Answer: See Answers
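
When weighing this, note that the n-step return carries weight (1 − λ)λⁿ⁻¹ in the λ-return, but in an episodic task all weight beyond termination lumps onto the final (full) return as λ^(T−t−1), which can dominate. A quick check with illustrative numbers:

```python
lam, steps_left = 0.9, 3  # lambda, and steps remaining until the episode ends
weights = [(1 - lam) * lam**(n - 1) for n in range(1, steps_left)]
weights.append(lam**(steps_left - 1))  # residual weight on the full return
print(weights, sum(weights))  # ~[0.1, 0.09, 0.81], sums to 1: the full return dominates
```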