Need help with this week’s assignment? Get detailed and trusted solutions for Deep Learning – IIT Ropar Week 4 NPTEL Assignment Answers. Our expert-curated answers help you solve your assignments faster while strengthening your conceptual understanding.
✅ Subject: Deep Learning – IIT Ropar
📅 Week: 4
🎯 Session: NPTEL 2025 July-October
🔗 Course Link: Click Here
🔍 Reliability: Verified and expert-reviewed answers
📌 Trusted By: 5000+ Students
For complete and in-depth solutions to all weekly assignments, check out 👉 NPTEL Deep Learning – IIT Ropar Week 4 Assignment Answers
🚀 Stay ahead in your NPTEL journey with fresh, updated solutions every week!
NPTEL Deep Learning – IIT Ropar Week 4 Assignment Answers 2025
1. You are training a neural network on a dataset with 5 million samples using Mini-Batch Gradient Descent. The mini-batch size is 500, and each parameter update takes 100 milliseconds. How many seconds will it take to complete 5 epochs of training?
- 5,000
- 10,000
- 2,500
- 50,000
Answer : See Answers
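To see where a figure like this comes from, here is a minimal sketch (not an official solution) of the usual bookkeeping: in mini-batch gradient descent the number of parameter updates per epoch is the dataset size divided by the batch size, and the total time is updates × epochs × time per update. The numbers below are taken from the question; everything else is illustrative Python.

```python
# Sketch: estimating mini-batch training time from the question's numbers
samples = 5_000_000         # dataset size
batch_size = 500            # mini-batch size
epochs = 5
seconds_per_update = 0.100  # 100 ms per parameter update

updates_per_epoch = samples // batch_size       # one update per mini-batch
total_updates = updates_per_epoch * epochs
total_seconds = total_updates * seconds_per_update
print(updates_per_epoch, total_updates, total_seconds)
```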
2. You are comparing training times using different gradient descent algorithms on a dataset with 1,000,000 data points. Each parameter update takes 2 milliseconds. How many milliseconds longer will Stochastic Gradient Descent take compared to Vanilla (Batch) Gradient Descent to complete 2 epochs?
- 4,000,004 ms
- 4,000,000 ms
- 3,999,994 ms
- 3,999,996 ms
Answer :
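This comparison hinges on how many parameter updates each variant makes per epoch: SGD updates once per data point, while vanilla (batch) gradient descent updates once per epoch over the full dataset. A small sketch of that bookkeeping, using only the numbers given in the question:

```python
# Sketch: update counts for SGD vs. vanilla (batch) gradient descent
n_points = 1_000_000
epochs = 2
ms_per_update = 2

sgd_updates = n_points * epochs   # one update per data point, per epoch
batch_updates = 1 * epochs        # one update per epoch over the full batch
extra_ms = (sgd_updates - batch_updates) * ms_per_update
print(extra_ms)
```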
3. What is the most practical benefit of using smaller batch sizes on constrained devices?
- Reduces computation time significantly
- Increases training accuracy
- Minimizes memory usage and reduces overhead
- Allows larger models to be trained
Answer :
4. You reduce the batch size from 4,000 to 1,000. What happens to the number of weight updates per epoch?
- Doubles
- Quadruples
- Remains constant
- Halves
Answer :
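The quick way to reason about this one: updates per epoch is N / B, so shrinking the batch size by some factor multiplies the number of updates by that same factor. A tiny sketch (the dataset size cancels out, so the value of N below is a hypothetical placeholder):

```python
# Sketch: how the update count scales when the batch size shrinks
n = 100_000                            # hypothetical dataset size, not given in the question
old_batch, new_batch = 4_000, 1_000
print(n // old_batch, n // new_batch)  # updates per epoch before and after
print(old_batch / new_batch)           # the scaling factor between the two
```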
5. Which of the following statements are true about mini-batch gradient descent?
- It offers a compromise between computation and accuracy
- It prevents gradient vanishing completely
- It allows parallelism in training
- It can still be prone to overfitting
Answer :
6. What could be the reason for slow learning in this scenario?
- Large Learning rate
- Very small gradients
- Very high momentum
- Incorrect label noise
Answer :
7. What optimizer would help improve learning in this situation?
- Vanilla Gradient Descent
- Mini-Batch SGD
- Momentum based Gradient Descent
- Adam with bias correction and adaptive learning rate
Answer :
8. If the above model takes 300 steps per epoch, then after 5 epochs the number of weight updates is ______________
Fill in the blank: _________
Answer : See Answers
9. Which of the following helps in handling small gradients?
- Reducing learning rate
- Using Adagrad
- Using adaptive optimizers like Adam
- Using large batch sizes
Answer :
10. What are the advantages of using momentum?
- Faster convergence
- Larger steps in the wrong direction
- Helps escape shallow minima
- Avoids oscillation in steep slopes
Answer :
11. Which of the following would not help in this scenario?
- Switch to adaptive gradient descent
- Add momentum
- Reduce learning rate
- Normalize input data
Answer :
12. If the learning rate η = 0.01, the momentum coefficient γ = 0.9, the current gradient at step t is ∇w_t = 0.2, and the previous update was 0.1, what is the value of the new update?
- 0.11
- 0.092
- 0.091
- 0.12
Answer :
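Questions 12 and 13 both rest on the momentum update rule, typically written as update_t = γ · update_{t−1} + η · ∇w_t, with the weight then moved as w_{t+1} = w_t − update_t (sign conventions vary between textbooks, so follow the one from the lectures). A one-step sketch with the numbers from question 12:

```python
# Sketch: one step of momentum-based gradient descent (numbers from question 12)
gamma, eta = 0.9, 0.01
prev_update, grad = 0.1, 0.2
new_update = gamma * prev_update + eta * grad
print(new_update)
```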
13. A data scientist is analyzing how momentum affects learning speed over time. She initializes a weight w_0 = 1.0 and uses the following parameters in momentum-based gradient descent:
Momentum coefficient γ = 0.8, learning rate η = 0.05, and initial update update_0 = 0. She receives gradients over three consecutive iterations: ∇w_1 = −0.5, ∇w_2 = −0.2, and ∇w_3 = −0.3.
What is the value of the update at time t = 3?
- 0.0172
- −0.0172
- 0.0216
- −0.009
Answer :
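The same recurrence can be rolled out over several iterations in a short loop. This is only a sketch of the recurrence itself, filled in with the gradients listed in question 13; work it out on paper with whatever sign convention the course uses:

```python
# Sketch: rolling out momentum updates over several iterations (question 13 setup)
gamma, eta = 0.8, 0.05
update = 0.0                     # update_0
w = 1.0                          # w_0
for grad in (-0.5, -0.2, -0.3):  # gradients at t = 1, 2, 3
    update = gamma * update + eta * grad
    w = w - update
print(update, w)
```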
14. What are the benefits of using mini-batch gradient descent over full-batch gradient descent?
- Less memory usage
- More frequent weight updates
- Higher computational cost
- Better generalization
Answer :
15. What is a likely cause of oscillations?
- Too low learning rate
- Batch size too small
- Too high learning rate
- No dropout
Answer : See Answers
16. Which technique helps reduce oscillations?
- Momentum
- Adagrad
- Weight decay
- None of the above
Answer :
17. Which optimizer allows you to peek ahead before computing the gradient?
- Adam
- Vanilla SGD
- Nesterov Accelerated Gradient
- Adagrad
Answer :
18. What happens if momentum is set to 1?
- Model stops updating
- Model overshoots and diverges
- Model converges quickly
- No effect
Answer :
19. Which of the following are advantages of mini-batch gradient descent over SGD?
- Reduces variance of updates
- Requires fewer epochs
- Faster convergence
- More computation per update
Answer :
20. What does the line search algorithm aim to optimize at every step of training?
- Batch size
- The cost function value along the gradient direction
- Momentum term
- Validation accuracy
Answer :
21. What is the key computational disadvantage of applying line search in every update?
- May overfit the data
- Many more computations in each step.
- Doesn’t converge
- Reduces gradient magnitude
Answer : See Answers
22. Which of the following learning rate schedules typically require setting two hyperparameters?
- Exponential decay
- 1/t decay
- Constant learning rate
- Step decay
Answer :
23. Exponential decay adjusts the learning rate using which formula?

Answer :
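For background, exponential decay is commonly written as η_t = η₀ · e^(−kt), with the initial learning rate η₀ and the decay rate k as its two hyperparameters. A minimal sketch contrasting it with step decay (all constants below, such as eta0, k, drop, and every, are illustrative assumptions, not values from the quiz):

```python
import math

# Sketch: two common learning-rate decay schedules
def exponential_decay(eta0, k, t):
    # eta_t = eta0 * exp(-k * t)
    return eta0 * math.exp(-k * t)

def step_decay(eta0, drop, every, t):
    # Multiply the rate by `drop` at predefined intervals of `every` epochs
    return eta0 * (drop ** (t // every))

print(exponential_decay(eta0=0.1, k=0.05, t=20))
print(step_decay(eta0=0.1, drop=0.5, every=10, t=20))
```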
24. Learning rate decay is typically used to:
- Fine-tune the model toward the end of training
- Avoid oscillation near minima
- Eliminate the need for momentum
- Control the impact of noisy gradients
Answer :
25. In step decay, the learning rate changes at _______________ intervals.
- Predefined
- One
- Random
- None of the above
Answer :
26. If you have 100,000 samples and batch size is 10,000, how many parameter updates happen in one epoch?
- 10
- 100
- 1000
- 1
Answer :
27. If N = 60,000 and batch size B = 5,000, the number of weight updates per epoch = ______________.
Answer :
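Questions 26 and 27 reduce to the same bookkeeping as question 1: one parameter update per mini-batch, so updates per epoch = N / B. A minimal sketch, assuming N divides evenly by B as it does in both questions:

```python
# Sketch: parameter updates per epoch = dataset size / batch size
def updates_per_epoch(n_samples, batch_size):
    return n_samples // batch_size  # one update per mini-batch

print(updates_per_epoch(100_000, 10_000))  # question 26
print(updates_per_epoch(60_000, 5_000))    # question 27
```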
28. Suppose you’re using Nesterov Accelerated Gradient and are at time step t. The current gradient at the look-ahead position is ∇w_look = 0.3, the previous velocity (update) is update_{t−1} = 0.2, and the hyperparameters are γ = 0.8 and η = 0.05.
What is the value of the current update update_t?
- 0.175
- 0.195
- 0.18
- 0.31
Answer :
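A common formulation of Nesterov Accelerated Gradient (and the one these numbers appear to assume) builds the velocity from the gradient evaluated at the look-ahead point: update_t = γ · update_{t−1} + η · ∇w_look. A one-step sketch, treating the look-ahead gradient as already computed since the question supplies it:

```python
# Sketch: one NAG velocity update using the gradient at the look-ahead point
gamma, eta = 0.8, 0.05
prev_update, grad_lookahead = 0.2, 0.3
update_t = gamma * prev_update + eta * grad_lookahead
print(update_t)
```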
29. You’re optimizing a neural network with NAG. At iteration t, you have: current weight w_t = 1.0, previous update update_{t−1} = 0.25, γ = 0.9, η = 0.01, and gradient at the look-ahead position ∇w_look = −0.5. What is the value of the updated weight w_{t+1}?
- 0.78
- 0.775
- 0.79
- 0.77
Answer : See Answers
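Question 29 extends the same rule by one more step: after forming update_t from the look-ahead gradient, the weight moves as w_{t+1} = w_t − update_t. A short sketch with the quantities listed in the question, under the same assumed NAG formulation as above:

```python
# Sketch: NAG velocity update followed by the weight step (question 29 setup)
gamma, eta = 0.9, 0.01
w_t, prev_update = 1.0, 0.25
grad_lookahead = -0.5

update_t = gamma * prev_update + eta * grad_lookahead
w_next = w_t - update_t
print(update_t, w_next)
```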


