Need help with this week’s assignment? Get detailed and trusted solutions for Deep Learning – IIT Ropar Week 4 NPTEL Assignment Answers. Our expert-curated answers help you solve your assignments faster while strengthening your conceptual understanding.
✅ Subject: Deep Learning – IIT Ropar
📅 Week: 4
🎯 Session: NPTEL 2025 July-October
🔗 Course Link: Click Here
🔍 Reliability: Verified and expert-reviewed answers
📌 Trusted By: 5000+ Students
For complete and in-depth solutions to all weekly assignments, check out 👉 NPTEL Deep Learning – IIT Ropar Week 4 Assignment Answers
🚀 Stay ahead in your NPTEL journey with fresh, updated solutions every week!
NPTEL Deep Learning – IIT Ropar Week 4 Assignment Answers 2025
1. You are training a neural network on a dataset with 5 million samples using Mini-Batch Gradient Descent. The mini-batch size is 500, and each parameter update takes 100 milliseconds. How many seconds will it take to complete 5 epochs of training?
- 5,000
- 10,000
- 2,500
- 50,000
Answer : See Answers
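To see where a figure like this comes from, here is a minimal sketch (not an official solution) of the usual bookkeeping: in mini-batch gradient descent the number of parameter updates per epoch is the dataset size divided by the batch size, and the total time is updates × epochs × time per update. The numbers below are taken from the question; everything else is illustrative Python.

```python
# Sketch: estimating mini-batch training time from the question's numbers
samples = 5_000_000         # dataset size
batch_size = 500            # mini-batch size
epochs = 5
seconds_per_update = 0.100  # 100 ms per parameter update

updates_per_epoch = samples // batch_size       # one update per mini-batch
total_updates = updates_per_epoch * epochs
total_seconds = total_updates * seconds_per_update
print(updates_per_epoch, total_updates, total_seconds)
```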
2. You are comparing training times using different gradient descent algorithms on a dataset with 1,000,000 data points. Each parameter update takes 2 milliseconds. How many milliseconds longer will Stochastic Gradient Descent take compared to Vanilla (Batch) Gradient Descent to complete 2 epochs?
- 4,000,004 ms
- 4,000,000 ms
- 3,999,994 ms
- 3,999,996 ms
Answer :
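This comparison hinges on how many parameter updates each variant makes per epoch: SGD updates once per data point, while vanilla (batch) gradient descent updates once per epoch over the full dataset. A small sketch of that bookkeeping, using only the numbers given in the question:

```python
# Sketch: update counts for SGD vs. vanilla (batch) gradient descent
n_points = 1_000_000
epochs = 2
ms_per_update = 2

sgd_updates = n_points * epochs   # one update per data point, per epoch
batch_updates = 1 * epochs        # one update per epoch over the full batch
extra_ms = (sgd_updates - batch_updates) * ms_per_update
print(extra_ms)
```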
3. What is the most practical benefit of using smaller batch sizes on constrained devices?
- Reduces computation time significantly
- Increases training accuracy
- Minimizes memory usage and reduces overhead
- Allows larger models to be trained
Answer :
4. You reduce the batch size from 4,000 to 1,000. What happens to the number of weight updates per epoch?
- Doubles
- Quadruples
- Remains constant
- Halves
Answer :
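The quick way to reason about this one: updates per epoch is N / B, so shrinking the batch size by some factor multiplies the number of updates by that same factor. A tiny sketch (the dataset size cancels out, so the value of N below is a hypothetical placeholder):

```python
# Sketch: how the update count scales when the batch size shrinks
n = 100_000                            # hypothetical dataset size, not given in the question
old_batch, new_batch = 4_000, 1_000
print(n // old_batch, n // new_batch)  # updates per epoch before and after
print(old_batch / new_batch)           # the scaling factor between the two
```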
5. Which of the following statements are true about mini-batch gradient descent?
- It offers a compromise between computation and accuracy
- It prevents gradient vanishing completely
- It allows parallelism in training
- It can still be prone to overfitting
Answer :
6. What could be the reason for slow learning in this scenario?
- Large Learning rate
- Very small gradients
- Very high momentum
- Incorrect label noise
Answer :
7. What optimizer would help improve learning in this situation?
- Vanilla Gradient Descent
- Mini-Batch SGD
- Momentum based Gradient Descent
- Adam with bias correction and adaptive learning rate
Answer :
8. If the above model takes 300 steps per epoch, then after 5 epochs the number of weight updates is ______________
Fill in the blank: _________
Answer : See Answers
9. Which of the following helps in handling small gradients?
- Reducing learning rate
- Using Adagrad
- Using adaptive optimizers like Adam
- Using large batch sizes
Answer :
10. What are the advantages of using momentum?
- Faster convergence
- Larger steps in the wrong direction
- Helps escape shallow minima
- Avoids oscillation in steep slopes
Answer :
11. Which of the following would not help in this scenario?
- Switch to adaptive gradient descent
- Add momentum
- Reduce learning rate
- Normalize input data
Answer :
12. If the learning rate η = 0.01, the momentum coefficient γ = 0.9, the current gradient at step t is ∇w_t = 0.2, and the previous update was 0.1, what is the value of the new update?
- 0.11
- 0.092
- 0.091
- 0.12
Answer :
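Questions 12 and 13 both rest on the momentum update rule, typically written as update_t = γ · update_{t−1} + η · ∇w_t, with the weight then moved as w_{t+1} = w_t − update_t (sign conventions vary between textbooks, so follow the one from the lectures). A one-step sketch with the numbers from question 12:

```python
# Sketch: one step of momentum-based gradient descent (numbers from question 12)
gamma, eta = 0.9, 0.01
prev_update, grad = 0.1, 0.2
new_update = gamma * prev_update + eta * grad
print(new_update)
```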
13. A data scientist is analyzing how momentum affects learning speed over time. She initializes a weight w_0 = 1.0 and uses the following parameters in momentum-based gradient descent:
Momentum coefficient γ = 0.8, learning rate η = 0.05, and initial update update_0 = 0. She receives gradients over three consecutive iterations: ∇w_1 = −0.5, ∇w_2 = −0.2, and ∇w_3 = −0.3.
What is the value of the update at time t = 3?
- 0.0172
- −0.0172
- 0.0216
- −0.009
Answer :
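The same recurrence can be rolled out over several iterations in a short loop. This is only a sketch of the recurrence itself, filled in with the gradients listed in question 13; work it out on paper with whatever sign convention the course uses:

```python
# Sketch: rolling out momentum updates over several iterations (question 13 setup)
gamma, eta = 0.8, 0.05
update = 0.0                     # update_0
w = 1.0                          # w_0
for grad in (-0.5, -0.2, -0.3):  # gradients at t = 1, 2, 3
    update = gamma * update + eta * grad
    w = w - update
print(update, w)
```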
14. What are the benefits of using mini-batch gradient descent over full-batch gradient descent?
- Less memory usage
- More frequent weight updates
- Higher computational cost
- Better generalization
Answer :
15. What is a likely cause of oscillations?
- Too low learning rate
- Batch size too small
- Too high learning rate
- No dropout
Answer : See Answers
16. Which technique helps reduce oscillations?
- Momentum
- Adagrad
- Weight decay
- None of the above
Answer :
17. Which optimizer allows you to peek ahead before computing the gradient?
- Adam
- Vanilla SGD
- Nesterov Accelerated Gradient
- Adagrad
Answer :
18. What happens if momentum is set to 1?
- Model stops updating
- Model overshoots and diverges
- Model converges quickly
- No effect
Answer :
19. Which of the following are advantages of mini-batch gradient descent over SGD?
- Reduces variance of updates
- Requires fewer epochs
- Faster convergence
- More computation per update
Answer :
20. What does the line search algorithm aim to optimize at every step of training?
- Batch size
- The cost function value along the gradient direction
- Momentum term
- Validation accuracy
Answer :
21. What is the key computational disadvantage of applying line search in every update?
- May overfit the data
- Many more computations in each step.
- Doesn’t converge
- Reduces gradient magnitude
Answer : See Answers
22. Which of the following learning rate schedules typically require setting two hyperparameters?
- Exponential decay
- 1/t decay
- Constant learning rate
- Step decay
Answer :
23. Exponential decay adjusts the learning rate using which formula?

Answer :
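For background, exponential decay is commonly written as η_t = η₀ · e^(−kt), with the initial learning rate η₀ and the decay rate k as its two hyperparameters. A minimal sketch contrasting it with step decay (all constants below, such as eta0, k, drop, and every, are illustrative assumptions, not values from the quiz):

```python
import math

# Sketch: two common learning-rate decay schedules
def exponential_decay(eta0, k, t):
    # eta_t = eta0 * exp(-k * t)
    return eta0 * math.exp(-k * t)

def step_decay(eta0, drop, every, t):
    # Multiply the rate by `drop` at predefined intervals of `every` epochs
    return eta0 * (drop ** (t // every))

print(exponential_decay(eta0=0.1, k=0.05, t=20))
print(step_decay(eta0=0.1, drop=0.5, every=10, t=20))
```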
24. Learning rate decay is typically used to:
- Fine-tune the model toward the end of training
- Avoid oscillation near minima
- Eliminate the need for momentum
- Control the impact of noisy gradients
Answer :
25. In step decay, the learning rate changes at _______________ intervals.
- Predefined
- One
- Random
- None of the above
Answer :
26. If you have 100,000 samples and batch size is 10,000, how many parameter updates happen in one epoch?
- 10
- 100
- 1000
- 1
Answer :
27. If N = 60,000 and batch size B = 5,000, the number of weight updates per epoch = ______________.
Answer :
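Questions 26 and 27 reduce to the same bookkeeping as question 1: one parameter update per mini-batch, so updates per epoch = N / B. A minimal sketch, assuming N divides evenly by B as it does in both questions:

```python
# Sketch: parameter updates per epoch = dataset size / batch size
def updates_per_epoch(n_samples, batch_size):
    return n_samples // batch_size  # one update per mini-batch

print(updates_per_epoch(100_000, 10_000))  # question 26
print(updates_per_epoch(60_000, 5_000))    # question 27
```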
28. Suppose you’re using Nesterov Accelerated Gradient and are at time step t. The current gradient at the look-ahead position is ∇w_look = 0.3, the previous velocity (update) is update_{t−1} = 0.2, and the hyperparameters are γ = 0.8 and η = 0.05.
What is the value of the current update update_t?
- 0.175
- 0.195
- 0.18
- 0.31
Answer :
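A common formulation of Nesterov Accelerated Gradient (and the one these numbers appear to assume) builds the velocity from the gradient evaluated at the look-ahead point: update_t = γ · update_{t−1} + η · ∇w_look. A one-step sketch, treating the look-ahead gradient as already computed since the question supplies it:

```python
# Sketch: one NAG velocity update using the gradient at the look-ahead point
gamma, eta = 0.8, 0.05
prev_update, grad_lookahead = 0.2, 0.3
update_t = gamma * prev_update + eta * grad_lookahead
print(update_t)
```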
29. You’re optimizing a neural network with NAG. At iteration t, you have: current weight w_t = 1.0, previous update update_{t−1} = 0.25, γ = 0.9, η = 0.01, and gradient at the look-ahead position ∇w_look = −0.5. What is the value of the updated weight w_{t+1}?
- 0.78
- 0.775
- 0.79
- 0.77
Answer : See Answers
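Question 29 extends the same rule by one more step: after forming update_t from the look-ahead gradient, the weight moves as w_{t+1} = w_t − update_t. A short sketch with the quantities listed in the question, under the same assumed NAG formulation as above:

```python
# Sketch: NAG velocity update followed by the weight step (question 29 setup)
gamma, eta = 0.9, 0.01
w_t, prev_update = 1.0, 0.25
grad_lookahead = -0.5

update_t = gamma * prev_update + eta * grad_lookahead
w_next = w_t - update_t
print(update_t, w_next)
```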


