Deep Learning – IIT Ropar Week 4 NPTEL Assignment Answers 2025

Need help with this week’s assignment? Get detailed and trusted solutions for Deep Learning – IIT Ropar Week 4 NPTEL Assignment Answers. Our expert-curated answers help you solve your assignments faster while deepening your conceptual clarity.

✅ Subject: Deep Learning – IIT Ropar
📅 Week: 4
🎯 Session: NPTEL 2025 July-October
🔗 Course Link: Click Here
🔍 Reliability: Verified and expert-reviewed answers
📌 Trusted By: 5000+ Students

For complete and in-depth solutions to all weekly assignments, check out 👉 NPTEL Deep Learning – IIT Ropar Week 4 Assignment Answers

🚀 Stay ahead in your NPTEL journey with fresh, updated solutions every week!

NPTEL Deep Learning – IIT Ropar Week 4 Assignment Answers 2025

1. You are training a neural network on a dataset with 5 million samples using Mini-Batch Gradient Descent. The mini-batch size is 500, and each parameter update takes 100 milliseconds. How many seconds will it take to complete 5 epochs of training?

  • 5,000
  • 10,000
  • 2,500
  • 50,000
Answer : See Answers
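For questions of this type, the counting below is a minimal sketch, assuming one parameter update per mini-batch and a dataset size that divides evenly by the batch size:

```python
# Sketch: total training time with mini-batch gradient descent,
# assuming one parameter update per mini-batch.
n_samples = 5_000_000      # dataset size from the question
batch_size = 500           # mini-batch size
epochs = 5
seconds_per_update = 0.1   # 100 ms per parameter update

updates_per_epoch = n_samples // batch_size      # updates in one epoch
total_updates = updates_per_epoch * epochs       # updates over all epochs
total_seconds = total_updates * seconds_per_update
print(updates_per_epoch, total_updates, total_seconds)
```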

2. You are comparing training times using different gradient descent algorithms on a dataset with 1,000,000 data points. Each parameter update takes 2 milliseconds. How many milliseconds longer will Stochastic Gradient Descent take compared to Vanilla (Batch) Gradient Descent to complete 2 epochs?

  • 4,000,004 ms
  • 4,000,000 ms
  • 3,999,994 ms
  • 3,999,996 ms
Answer :
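As a rough sketch of the comparison, assuming SGD performs one update per sample and full-batch gradient descent performs one update per epoch, each costing the same 2 ms:

```python
# Sketch: update counts for SGD (batch size 1) vs. full-batch gradient descent.
n_samples = 1_000_000
epochs = 2
ms_per_update = 2

sgd_updates = n_samples * epochs   # one update per sample, per epoch
batch_updates = 1 * epochs         # one update per epoch
extra_ms = (sgd_updates - batch_updates) * ms_per_update
print(sgd_updates, batch_updates, extra_ms)
```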

3. What is the most practical benefit of using smaller batch sizes on constrained devices?

  • Reduces computation time significantly
  • Increases training accuracy
  • Minimizes memory usage and reduces overhead
  • Allows larger models to be trained
Answer :

4. You reduce the batch size from 4,000 to 1,000. What happens to the number of weight updates per epoch?

  • Doubles
  • Quadruples
  • Remains constant
  • Halves
Answer :
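The ratio below does not depend on the dataset size; the value of n_samples is purely illustrative (it is not given in the question):

```python
# Sketch: effect of reducing the batch size on updates per epoch,
# assuming the dataset divides evenly by both batch sizes.
n_samples = 100_000                 # illustrative value, not from the question
old_updates = n_samples // 4_000    # updates per epoch at batch size 4,000
new_updates = n_samples // 1_000    # updates per epoch at batch size 1,000
print(new_updates / old_updates)    # factor by which updates increase
```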

5. Which of the following statements are true about mini-batch gradient descent?

  • It offers a compromise between computation and accuracy
  • It prevents gradient vanishing completely
  • It allows parallelism in training
  • It can still be prone to overfitting
Answer :

6. What could be the reason for slow learning in this scenario?

  • Large learning rate
  • Very small gradients
  • Very high momentum
  • Incorrect label noise
Answer :

7. What optimizer would help improve learning in this situation?

  • Vanilla Gradient Descent
  • Mini-Batch SGD
  • Momentum-based Gradient Descent
  • Adam with bias correction and adaptive learning rate
Answer :

8. If the above model takes 300 steps per epoch, then after 5 epochs the number of weight updates is ______________
Fill in the blank: _________

Answer : See Answers

9. Which of the following helps in handling small gradients?

  • Reducing learning rate
  • Using Adagrad
  • Using adaptive optimizers like Adam
  • Using large batch sizes
Answer :

10. What are the advantages of using momentum?

  • Faster convergence
  • Larger steps in the wrong direction
  • Helps escape shallow minima
  • Avoids oscillation in steep slopes
Answer :

11. Which of the following would not help in this scenario?

  • Switch to adaptive gradient descent
  • Add momentum
  • Reduce learning rate
  • Normalize input data
Answer :

12. If the learning rate η = 0.01, the momentum coefficient γ = 0.9, the current gradient at step t is ∇w_t = 0.2, and the previous update was 0.1, then what is the value of the new update?

  • 0.11
  • 0.092
  • 0.091
  • 0.12
Answer :
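A minimal sketch of the momentum update, assuming the common formulation update_t = γ · update_{t−1} + η · ∇w_t (conventions differ slightly across textbooks):

```python
# Sketch: one step of momentum-based gradient descent, assuming
# update_t = gamma * update_{t-1} + eta * grad_t.
gamma = 0.9        # momentum coefficient
eta = 0.01         # learning rate
prev_update = 0.1  # update_{t-1}
grad_t = 0.2       # gradient at step t

new_update = gamma * prev_update + eta * grad_t
print(new_update)  # the weight is then moved by -new_update
```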

13. A data scientist is analyzing how momentum affects learning speed over time. She initializes a weight w_0 = 1.0 and uses the following parameters in momentum-based gradient descent:
Momentum coefficient γ = 0.8, learning rate η = 0.05, and initial update update_0 = 0. She receives gradients over 3 consecutive iterations: ∇w_1 = −0.5, ∇w_2 = −0.2, and ∇w_3 = −0.3.
What is the value of the update at time t = 3?

  • 0.0172
  • −0.0172
  • 0.0216
  • −0.009
Answer :

14. What are the benefits of using mini-batch gradient descent over full-batch gradient descent?

  • Less memory usage
  • More frequent weight updates
  • Higher computational cost
  • Better generalization
Answer :

15. What is a likely cause of oscillations?

  • Too low learning rate
  • Batch size too small
  • Too high learning rate
  • No dropout
Answer : See Answers

16. Which technique helps reduce oscillations?

  • Momentum
  • Adagrad
  • Weight decay
  • None of the above
Answer :

17. Which optimizer allows you to peek ahead before computing the gradient?

  • Adam
  • Vanilla SGD
  • Nesterov Accelerated Gradient
  • Adagrad
Answer :

18. What happens if momentum is set to 1?

  • Model stops updating
  • Model overshoots and diverges
  • Model converges quickly
  • No effect
Answer :

19. Which of the following are advantages of mini-batch gradient descent over SGD?

  • Reduces variance of updates
  • Requires fewer epochs
  • Faster convergence
  • More computation per update
Answer :

20. What does the line search algorithm aim to optimize at every step of training?

  • Batch size
  • The cost function value along the gradient direction
  • Momentum term
  • Validation accuracy
Answer :

21. What is the key computational disadvantage of applying line search in every update?

  • May overfit the data
  • Many more computations in each step
  • Doesn’t converge
  • Reduces gradient magnitude
Answer : See Answers

22. Which of the following schedules typically require setting two hyperparameters?

  • Exponential decay
  • 1/t decay
  • Constant learning rate
  • Step decay
Answer :

23. Exponential decay adjusts the learning rate using which formula?

Answer :
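The formula options for this question are not reproduced above. For reference, a commonly used form of exponential decay is η_t = η₀ · e^(−k·t); the sketch below uses illustrative values for η₀ and k:

```python
import math

# Sketch: exponential learning-rate decay, eta_t = eta_0 * exp(-k * t).
eta_0 = 0.1   # initial learning rate (illustrative)
k = 0.05      # decay rate (illustrative)

for t in range(5):
    print(t, round(eta_0 * math.exp(-k * t), 4))
```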

24. Learning rate decay is typically used to:

  • Fine-tune the model toward the end of training
  • Avoid oscillation near minima
  • Eliminate the need for momentum
  • Control the impact of noisy gradients
Answer :

25. In step decay, the learning rate changes at _______________ intervals.

  • Predefined
  • One
  • Random
  • None of the above
Answer :

26. If you have 100,000 samples and batch size is 10,000, how many parameter updates happen in one epoch?

  • 10
  • 100
  • 1000
  • 1
Answer :

27. If N = 60,000 and batch size B = 5,000, the number of weight updates per epoch = ______________.

Answer :
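Questions 26 and 27 both reduce to the same counting; a minimal sketch, assuming one parameter update per mini-batch and a dataset size that divides evenly by the batch size:

```python
# Sketch: parameter updates in one epoch of mini-batch gradient descent.
def updates_per_epoch(n_samples: int, batch_size: int) -> int:
    # One update per mini-batch; assumes n_samples divides evenly by batch_size.
    return n_samples // batch_size

print(updates_per_epoch(100_000, 10_000))  # question 26
print(updates_per_epoch(60_000, 5_000))    # question 27
```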

28. Suppose you’re using Nesterov Accelerated Gradient and are at time step t. The current gradient at the look-ahead position is ∇w_look = 0.3, the previous velocity (update) is update_{t−1} = 0.2, and the hyperparameters are γ = 0.8 and η = 0.05.
What is the value of the current update update_t?

  • 0.175
  • 0.195
  • 0.18
  • 0.31
Answer :

29. You’re optimizing a neural network with NAG. At iteration t, you have: current weight w_t = 1.0, previous update update_{t−1} = 0.25, γ = 0.9, η = 0.01, and gradient at the look-ahead position ∇w_look = −0.5. What is the new weight after applying the update?

  • 0.78
  • 0.775
  • 0.79
  • 0.77
Answer : See Answers
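Questions 28 and 29 use the same Nesterov update; a minimal sketch, assuming the common formulation update_t = γ · update_{t−1} + η · ∇w_look followed by w_{t+1} = w_t − update_t (other formulations exist):

```python
# Sketch: one step of Nesterov Accelerated Gradient (NAG), assuming
# update_t = gamma * update_{t-1} + eta * grad_lookahead and
# w_{t+1} = w_t - update_t.
def nag_step(w_t, prev_update, grad_lookahead, gamma, eta):
    update_t = gamma * prev_update + eta * grad_lookahead
    w_next = w_t - update_t
    return update_t, w_next

# Question 28 asks only for the update value (w_t is a dummy here).
u28, _ = nag_step(w_t=0.0, prev_update=0.2, grad_lookahead=0.3, gamma=0.8, eta=0.05)
print(u28)

# Question 29 asks for the weight after the update.
u29, w29 = nag_step(w_t=1.0, prev_update=0.25, grad_lookahead=-0.5, gamma=0.9, eta=0.01)
print(u29, w29)
```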