Need help with this week’s assignment? Get detailed, trusted solutions to the NPTEL Reinforcement Learning Week 4 assignment. Our expert-curated answers help you finish your assignment faster while deepening your conceptual clarity.
✅ Subject: Reinforcement Learning
📅 Week: 4
🎯 Session: NPTEL 2025 July-October
🔗 Course Link: Click Here
🔍 Reliability: Verified and expert-reviewed answers
📌 Trusted By: 5000+ Students
For complete and in-depth solutions to all weekly assignments, check out 👉 NPTEL Reinforcement Learning Week 4 Assignment Answers
🚀 Stay ahead in your NPTEL journey with fresh, updated solutions every week!
NPTEL Reinforcement Learning Week 4 Assignment Answers 2025
1. State True/False
The state transition graph for any MDP is a directed acyclic graph.
- True
- False
Answer : See Answers
2. Consider the following statements:
(i) The optimal policy of an MDP is unique.
(ii) We can determine an optimal policy for an MDP using only the optimal value function (v∗), without accessing the MDP parameters.
(iii) We can determine an optimal policy for a given MDP using only the optimal q-value function (q∗), without accessing the MDP parameters.
Which of these statements are false?
- Only (ii)
- Only (iii)
- Only (i), (ii)
- Only (i), (iii)
- Only (ii), (iii)
Answer :
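Regarding statements (ii) and (iii) above, here is a minimal NumPy sketch of the difference (the MDP below is a toy, randomly generated example; all names and numbers are illustrative assumptions, not from the assignment): a greedy policy can be read directly off q∗, whereas extracting a policy from v∗ requires a one-step lookahead through the transition and reward model.

```python
import numpy as np

# Hypothetical toy MDP (illustrative numbers only).
# Shapes: P[s, a, s'] = transition probabilities, R[s, a] = expected rewards.
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.normal(size=(n_states, n_actions))

# Suppose q_star holds the optimal action-value function (placeholder array here).
q_star = rng.normal(size=(n_states, n_actions))

# From q*: the greedy policy needs no MDP parameters at all.
policy_from_q = q_star.argmax(axis=1)

# From v*: a one-step lookahead is required, which uses P and R (the MDP parameters).
v_star = q_star.max(axis=1)
policy_from_v = (R + gamma * P @ v_star).argmax(axis=1)
```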
3. Which of the following statements are true for a finite MDP? (Select all that apply).
- The Bellman equation of a value function of a finite MDP defines a contraction in a Banach space (using the max norm).
- If 0 ≤ γ < 1, then the eigenvalues of γPπ are less than 1.
- We call a normed vector space 'complete' if Cauchy sequences exist in that vector space.
- The sequence defined by vn = rπ + γPπ vn−1 is a Cauchy sequence in a Banach space (using the max norm). (Pπ is a stochastic matrix.)
Answer :
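Relating to the last statement in the list above, here is a small sketch of the iteration vn = rπ + γPπ vn−1 (the policy-specific matrix and rewards below are made-up assumptions): under the max norm, each successive gap shrinks by at least a factor of γ, which is the contraction property behind the Cauchy-sequence claim.

```python
import numpy as np

# Sketch of the iteration v_n = r_pi + gamma * P_pi @ v_{n-1} on an assumed toy policy-MDP.
gamma = 0.9
P_pi = np.array([[0.5, 0.5, 0.0],   # row-stochastic matrix for a fixed policy pi
                 [0.1, 0.6, 0.3],
                 [0.2, 0.2, 0.6]])
r_pi = np.array([1.0, -0.5, 2.0])

v = np.zeros(3)
prev_gap = None
for n in range(20):
    v_next = r_pi + gamma * P_pi @ v
    gap = np.max(np.abs(v_next - v))            # max-norm distance between successive iterates
    if prev_gap is not None:
        assert gap <= gamma * prev_gap + 1e-12  # contraction: the gap shrinks by at least gamma
    prev_gap, v = gap, v_next

print(v)  # approaches the unique fixed point v_pi = (I - gamma * P_pi)^(-1) r_pi
```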
4. Which of the following is a benefit of using RL algorithms for solving MDPs?
- They do not require the state of the agent for solving an MDP.
- They do not require the action taken by the agent for solving an MDP.
- They do not require the state transition probability matrix for solving an MDP.
- They do not require the reward signal for solving an MDP.
Answer :
5. Consider the following equations:

Which of the above are correct?
- Only (i)
- Only (i), (ii)
- Only (ii), (iii)
- Only (i), (iii)
- (i), (ii), (iii)
Answer :
6. What is true about the γ (discount factor) in reinforcement learning?
- Discount factor can be any real number
- The value of γ cannot affect the optimal policy
- The lower the value of γ, the more myopic the agent gets, i.e. the agent maximises rewards that it receives over a shorter horizon
Answer : See Answers
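A quick illustration (toy reward stream, not part of the question) of why a lower γ makes the agent myopic: the discounted return is dominated by roughly the first 1/(1 − γ) steps.

```python
# Assumed example: a constant reward of 1 at every step for 50 steps.
rewards = [1.0] * 50

for gamma in (0.1, 0.5, 0.99):
    ret = sum(gamma ** t * r for t, r in enumerate(rewards))
    # Effective horizon is roughly 1 / (1 - gamma): short when gamma is small.
    print(f"gamma={gamma:<4}  discounted return = {ret:.2f}  (~1/(1-gamma) = {1/(1-gamma):.1f})")
```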
7. Consider the following statements for a finite MDP (I is an identity matrix with dimensions |S|×|S|, where S is the set of all states, and Pπ is a stochastic matrix):
(i) An MDP with stochastic rewards may not have a deterministic optimal policy.
(ii) There can be multiple optimal stochastic policies.
(iii) If 0 ≤ γ < 1, then the rank of the matrix I − γPπ is equal to |S|.
(iv) If 0 ≤ γ < 1, then the rank of the matrix I − γPπ is less than |S|.
Which of the above statements are true?
- Only (ii), (iii)
- Only (ii), (iv)
- Only (i), (iii)
- Only (i), (ii), (iii)
Answer :
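As a sanity check for statements (iii)/(iv), one can verify numerically (with an arbitrary stochastic matrix; the numbers below are assumptions) that for 0 ≤ γ < 1 the matrix I − γPπ is invertible, i.e. has rank |S|.

```python
import numpy as np

# For 0 <= gamma < 1 the eigenvalues of gamma * P_pi lie strictly inside the unit circle,
# so I - gamma * P_pi is invertible and its rank equals |S|.
gamma = 0.9
P_pi = np.array([[0.2, 0.8, 0.0],   # assumed row-stochastic matrix
                 [0.5, 0.0, 0.5],
                 [0.3, 0.3, 0.4]])
M = np.eye(3) - gamma * P_pi

print(np.linalg.matrix_rank(M))                          # 3, i.e. full rank |S|
print(np.max(np.abs(np.linalg.eigvals(gamma * P_pi))))   # strictly less than 1
```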
8. Consider an MDP with 3 states A, B, C. At each state we can go to either of the other two states, i.e. if we are in state A then we can perform 2 actions, going to state B or C. The rewards for the transitions are r(A,B) = −3 (reward if we go from A to B), r(B,A) = −1, r(B,C) = 8, r(C,B) = 4, r(A,C) = 0, r(C,A) = 5, and the discount factor is 0.9. Find the fixed point of the value function for the policy π(A) = B (if we are in state A we choose the action to go to B), π(B) = C, π(C) = A. vπ([A, B, C]) = ? (round to 1 decimal place)
- [20.6, 21.8, 17.6]
- [30.4, 44.2, 32.4]
- [30.4, 37.2, 32.4]
- [21.6, 21.8, 17.6]
Answer :
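A short sketch of how such a fixed point can be computed: for a deterministic policy, write down Pπ and rπ and solve the linear system v = rπ + γPπ v directly (equivalently v = (I − γPπ)⁻¹ rπ). The sketch below uses the transitions and rewards stated in the question.

```python
import numpy as np

# Policy from the question: A -> B, B -> C, C -> A, with gamma = 0.9.
gamma = 0.9
states = ["A", "B", "C"]
P_pi = np.array([[0, 1, 0],    # pi(A) = go to B
                 [0, 0, 1],    # pi(B) = go to C
                 [1, 0, 0]])   # pi(C) = go to A
r_pi = np.array([-3.0,         # r(A, B)
                  8.0,         # r(B, C)
                  5.0])        # r(C, A)

# Fixed point of v = r_pi + gamma * P_pi @ v.
v_pi = np.linalg.solve(np.eye(3) - gamma * P_pi, r_pi)
print(dict(zip(states, np.round(v_pi, 1))))
```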
9. Which of the following is not a valid norm function? (x is a D-dimensional vector)

Answer :
10. For an operator L, which of the following properties must be satisfied by x for it to be a fixed point for L? (Multi-Correct)
- Lx = x
- L²x = x
- ∀λ > 0, Lx = λx
- None of the above
Answer : See Answers


