Need help with this week’s assignment? Get detailed and trusted solutions for Deep Learning – IIT Ropar Week 2 NPTEL Assignment Answers. Our expert-curated answers help you solve your assignments faster while deepening your conceptual clarity.
✅ Subject: Deep Learning – IIT Ropar
📅 Week: 2
🎯 Session: NPTEL 2025 July-October
🔗 Course Link: Click Here
🔍 Reliability: Verified and expert-reviewed answers
📌 Trusted By: 5000+ Students
For complete and in-depth solutions to all weekly assignments, check out 👉 NPTEL Deep Learning – IIT Ropar Week 2 NPTEL Assignment Answers
🚀 Stay ahead in your NPTEL journey with fresh, updated solutions every week!
NPTEL Deep Learning – IIT Ropar Week 2 Assignment Answers 2025
1. Consider a single perceptron shown below, with w = 1 and b = −0.5. The perceptron uses a step activation function defined as

Predict the output for input values 0.51 and 0.49.
- 1, 0
- 0, 1
- 1, 1
- 0, 0
Answer : See Answers
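The arithmetic here is easy to check by hand or in code. Assuming the common convention that the step function fires when wx + b ≥ 0 (the exact definition is given in the assignment figure), a minimal sketch:

```python
def step(z):
    # Assumed convention: fire when the pre-activation is non-negative
    return 1 if z >= 0 else 0

def perceptron(x, w=1.0, b=-0.5):
    # Single perceptron: step(w*x + b)
    return step(w * x + b)

# Inputs 0.51 and 0.49 straddle the threshold at x = 0.5
outputs = (perceptron(0.51), perceptron(0.49))  # (1, 0)
```

With this convention, 0.51 gives a pre-activation of +0.01 (fires) and 0.49 gives −0.01 (does not fire).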
2. You are given a Boolean function that is not linearly separable. Which of the following is true regarding its representation using a perceptron-based network?
- It can be represented using a single-layer perceptron if you increase the number of perceptrons.
- It requires at least one hidden layer in the network.
- It cannot be represented by any feedforward neural network.
- It can only be represented by a network with more than 2ⁿ perceptrons.
Answer :
3. As n increases, representing all Boolean functions using a 2-layer perceptron becomes impractical due to:
- Increase in training data size
- Exponential increase in required hidden layer neurons
- Limitation in backpropagation algorithm
- Decrease in classification accuracy
Answer :
4. You are designing neural networks to represent Boolean functions. Consider the capabilities of single-layer and multi-layer perceptrons.
Which of the following statements are true?
- A single-layer perceptron can represent all linearly separable Boolean functions.
- XOR requires at least one hidden layer to be represented.
- A network with 2ⁿ hidden neurons and one output neuron can represent all Boolean functions over n inputs.
- A single-layer perceptron can represent the XOR function if the learning rate is set appropriately.
Answer :
5. You are given a neural network with 2 inputs, a hidden layer with 4 perceptrons, and one output neuron. The hidden neurons are designed to fire for specific input patterns like {–1, +1}, etc.
Which of the following are true about such a network?
- It can represent linearly non-separable functions like XOR.
- The network uses hidden neurons to convert a non-linearly separable function into linearly separable subproblems.
- Removing any one hidden neuron will not affect the network’s ability to represent XOR.
- This network must use sigmoid activation in the hidden layer to implement XOR.
Answer :
6. You are designing a spam filter using a perceptron. Some input features (like the presence of the word “FREE”) are not linearly separable from others. Which architecture is most appropriate for learning from such data?
- Single-layer perceptron with more training data
- Multi-layer perceptron with hidden neurons
- Removing the non-linearly separable features
- Output layer with more neurons
Answer :
7. You are given an arbitrary Boolean function defined over 4 binary inputs. Which of the following neural network architectures is guaranteed to represent this function?
- One perceptron
- A network with 4 hidden neurons
- A network with 16 hidden neurons and one output perceptron
- A network with 5 output neurons
Answer : See Answers
8. For a single input with x = 1.5, w = 2, and b = −1, compute the output of the sigmoid neuron up to 2 decimal places.
Fill the blank: ______________
Answer :
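This kind of blank can be verified by evaluating σ(wx + b) directly; a minimal Python sketch:

```python
import math

def sigmoid(z):
    # Standard logistic function: 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

# Pre-activation for x = 1.5, w = 2, b = -1
z = 2 * 1.5 + (-1)              # z = 2.0
output = round(sigmoid(z), 2)   # 0.88
```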
9.

- Upwards
- Leftwards
- Downwards
- Rightwards
Answer :
10. Which of the following statements are true?
I. Logistic function is smooth and continuous
II. Logistic function is differentiable.
- Only Statement I is true
- Only Statement II is true
- Both statements I and II are true
- None of the above
Answer :
11. Which of the following statements are true about learning algorithms?
I. Learning algorithms always maximize a loss function
II. Learning algorithms learn parameters from data
- Only Statement I is true
- Only Statement II is true
- Both statements I and II are true
- None of the above
Answer :
12. Consider a neural network with 12 input features, a hidden layer with 8 neurons, and a single output neuron. All layers are fully connected, and biases are included in both the hidden and output layers.
How many gradients must be computed during backpropagation?
- 101
- 110
- 105
- 113
Answer :
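Backpropagation computes one gradient per trainable parameter (every weight and every bias), so the count reduces to a layer-by-layer parameter tally; a quick sketch:

```python
# Fully connected network: 12 inputs -> 8 hidden neurons -> 1 output,
# with biases in the hidden and output layers
hidden_weights = 12 * 8   # 96
hidden_biases = 8
output_weights = 8 * 1    # 8
output_bias = 1

total_gradients = hidden_weights + hidden_biases + output_weights + output_bias
# total_gradients == 113
```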
13. You are evaluating a regression model on a dataset of 3 points. The actual target values and predicted outputs from your model are given below.


What is the MSE for this model on the given dataset?
- 1.00
- 0.67
- 0.33
- 2.00
Answer :
14.

Answer :
15.

Answer : See Answers
16. You are comparing two models for different function learning tasks:
Model A: A multilayer network of perceptrons
Model B: A multilayer network of sigmoid neurons
Task 1: Learn a Boolean function (like XOR)
Task 2: Learn a continuous function (like sin(x))
Which of the following statements is most appropriate?
- Model A can represent both tasks with high precision
- Model A is better for Task 1, Model B is better for Task 2
- Model B can approximate both Task 1 and Task 2 outputs, but not represent Task 1 exactly
- Both models are equivalent in their representation abilities
Answer :
17. A neural network is trained to predict customer churn based on multiple features: age, contract duration, and monthly charges. After training, you observe that the weight associated with the monthly charges feature is close to zero, while the others have larger magnitudes.
What is the most reasonable inference?
- Monthly charges had missing values in training data
- Monthly charges were not normalized correctly
- Monthly charges may not have contributed significantly to the model’s prediction
- The learning rate was too high for that feature
Answer :
18. You are building a neural network-based fraud detection system. A sigmoid neuron receives three inputs:
x1: transaction amount
x2: number of transactions in last hour
x3: time of transaction
After training, the learned weights are:
w1 = 3.2, w2 = 0.05, w3 = −0.02
Assume all input features have been scaled to a similar range (for example, between 0 and 1).
Which of the following is the most reasonable conclusion?
- The time of transaction is the most important feature
- number of transactions in last hour is the most important feature
- The transaction amount is a highly influential feature
- The sigmoid neuron is not functioning properly
Answer :
19. You are optimizing a function f(x) = x² − x + 2 using gradient descent. Let the learning rate be η = 0.01, and the value of x at a step t be xₜ. Which of the following gives the correct value of x at step t+1 after one update using gradient descent?
- xₜ₊₁ = xₜ − 0.01(2xₜ − 1)
- xₜ₊₁ = xₜ + 0.01(2xₜ)
- xₜ₊₁ = xₜ − (2xₜ − 1)
- xₜ₊₁ = xₜ − 0.01(xₜ − 1)
Answer :
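All of these options are instances of the general rule xₜ₊₁ = xₜ − η·f′(xₜ). A small sketch of one update for this f (the starting point 2.0 is an arbitrary choice for illustration):

```python
def grad_f(x):
    # Derivative of f(x) = x^2 - x + 2
    return 2 * x - 1

def gd_step(x, eta=0.01):
    # One gradient-descent update: x_{t+1} = x_t - eta * f'(x_t)
    return x - eta * grad_f(x)

x_next = gd_step(2.0)  # 2.0 - 0.01 * (4 - 1) = 1.97
```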
20. Let f(x) = x³ − 4x + 1. You are using gradient descent with learning rate η = 0.1.
What is the correct update rule for x at step t+1, given that xₜ is the current value?
- xₜ₊₁ = xₜ − 0.1·(3xₜ² − 4)
- xₜ₊₁ = xₜ − 0.1·(3xₜ² + 4)
- xₜ₊₁ = xₜ + 0.1·(3xₜ² − 4)
- xₜ₊₁ = xₜ + 0.1·(3xₜ² + 4)
Answer :
21. In a temperature calibration model, the function f(T, x) = T² + 5x + 20 models the system deviation, where T is the temperature input and x is a sensor setting. Suppose gradient descent with a learning rate of 1 is used to minimize the deviation. The process starts at (T, x) = (0, 0).
What will be the value of T after 10 iterations?
- 50
- -10
- 5
- 0
Answer : See Answers
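Because ∂f/∂T = 2T is zero at T = 0, T never receives a nonzero update. A short simulation confirms this (assuming simultaneous gradient-descent updates of T and x):

```python
def run_gd(steps=10, eta=1.0):
    # Minimize f(T, x) = T^2 + 5x + 20 starting from (0, 0)
    T, x = 0.0, 0.0
    for _ in range(steps):
        dT = 2 * T    # partial derivative w.r.t. T
        dx = 5        # partial derivative w.r.t. x
        T, x = T - eta * dT, x - eta * dx
    return T, x

T_final, x_final = run_gd()
# T stays at 0.0 throughout; x decreases by 5 per step
```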
22.

Answer :
23.

Answer :
24. You train a logistic regression model for spam classification with labels 1 (spam) and 0 (not spam). After training, the model has learned a weight vector such that
wᵀx = 2.5
Which of the following can be correctly inferred about the model’s prediction?
- The predicted probability of class 1 is greater than 0.5
- The predicted label is 1
- The predicted label is 0
- The value of wᵀx is irrelevant to the prediction
Answer :
25. You are designing a binary classifier using logistic regression. The model has learned the weight vector w = [−3, 4], and no bias term is used.
If a new point x = [1, 1] is evaluated, what will be the model output and prediction?
- The predicted label is 1
- The predicted label is 0
- The model output cannot be determined without a bias term
- The model output is undefined for input [1, 1]
Answer :
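Assuming the standard logistic decision rule (predict 1 when σ(wᵀx) > 0.5, equivalently when wᵀx > 0), the computation is direct; a small sketch:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w = [-3, 4]
x = [1, 1]
z = sum(wi * xi for wi, xi in zip(w, x))  # w^T x = -3 + 4 = 1
prob = sigmoid(z)                          # ~0.73
label = 1 if prob > 0.5 else 0
```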
26.

Answer :
27.


Based on the curve, what can you infer about the parameters w and b?
- w is close to 0 and b is large
- w is large and b is small
- w is large and b is large
- w is small and b is negative
Answer :
28.

Which of the following changes would make the curve transition more sharply (closer to a step function)?
- Increase b
- Increase w
- Decrease w
- Set b=0
Answer :
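The slope of σ(wx + b) at its transition point scales with w, which can be seen numerically; a small sketch comparing the output jump across x = 0 for a small and a large weight:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b=0.0):
    return sigmoid(w * x + b)

# Output change over a small interval around the transition point x = 0
small_w_jump = neuron(0.1, w=1) - neuron(-0.1, w=1)
large_w_jump = neuron(0.1, w=50) - neuron(-0.1, w=50)
# With w = 50 the output jumps almost from 0 to 1 -> nearly a step function
```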
29. Why is Sum of Squared Errors (SSE) considered better than Sum of Errors (SE) in many learning scenarios?
- SSE ensures that positive and negative errors do not cancel each other out
- SSE magnifies larger errors, making the model more sensitive to outliers
- The derivative of SSE with respect to prediction is simple and continuous
- SSE always leads to better accuracy than SE
- Sum of errors can be zero even when individual predictions are wrong
Answer :
30. Statement I: Any linearly separable function can be represented using a single-layer perceptron.
Statement II: A single sigmoid neuron can approximate any Boolean function with zero error.
Which of the above statements is/are correct?
- Only I
- Only II
- Both I and II
- None
Answer :
31. You are given a multi-layer perceptron with one hidden layer consisting of 8 perceptrons and a single output neuron. Each perceptron in the hidden layer outputs either 0 or 1 based on its input.
Which of the following statements is true about the function capacity of this network?
- The network is capable of implementing 2⁸ Boolean functions
- The network is capable of implementing 2⁶⁴ Boolean functions
- The output neuron receives a continuous-valued input
- Each hidden neuron produces 64 possible outputs
Answer : See Answers
NPTEL Deep Learning – IIT Ropar Week 2 Assignment Answers 2024
1. How many boolean functions can be designed for 3 inputs?
8
16
256
64
Answer :- 256
There are 2^(2^n) Boolean functions on n inputs, since each of the 2^n truth-table rows can independently map to 0 or 1. For 3 inputs: 2^(2^3) = 2^8 = 256.
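The same count can be sketched in Python (the function name is illustrative):

```python
def num_boolean_functions(n):
    # A truth table over n inputs has 2^n rows; each row can map to 0 or 1,
    # giving 2^(2^n) distinct Boolean functions.
    return 2 ** (2 ** n)
```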
2. How many weights does a neural network have if it consists of an input layer with 2 neurons, three hidden layers each with 4 neurons, and an output layer with 2 neurons? Assume there are no bias terms in the network.
48
36
50
44
Answer :- 48
Layer 1 (input to hidden): 2×4 = 8
Layer 2 (hidden to hidden): 4×4 = 16
Layer 3 (hidden to hidden): 4×4 = 16
Layer 4 (hidden to output): 4×2 = 8
Total = 8 + 16 + 16 + 8 = 48
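The layer-by-layer tally generalizes to any fully connected stack without biases; a small sketch:

```python
def count_weights(layer_sizes):
    # Weight count between consecutive fully connected layers (no biases):
    # each pair (a, b) of adjacent layer sizes contributes a*b weights.
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# 2 inputs -> three hidden layers of 4 -> 2 outputs
total = count_weights([2, 4, 4, 4, 2])  # 8 + 16 + 16 + 8 = 48
```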
3. A function f(x) is approximated using 250 tower functions. What is the minimum number of neurons required to construct the network that approximates the function?
250
249
251
500
750
501
Answer :- 501
Each tower function needs 2 neurons. So, 250 × 2 + 1 (final output neuron) = 501 neurons.
4. Given the following input values to a sigmoid neuron: x1: 0.72, x2: 0.49, x3: 0.08, x4: 0.53, and x5: 0.27, what labels will the sigmoid neuron predict for these inputs? (Answer in sequence from x1 to x5).
[0, 1, 1, 1, 1]
[1, 0, 0, 1, 0]
[0, 1, 0, 1, 0]
[1, 1, 0, 1, 0]
Answer :- [1, 0, 0, 1, 0]
A sigmoid neuron predicts 1 when its output exceeds 0.5 and 0 otherwise. With the weights and bias given in the question (which effectively threshold the raw input at 0.5), the inputs above 0.5 (x1 = 0.72 and x4 = 0.53) are labeled 1 and the rest 0, giving [1, 0, 0, 1, 0].
5. How many boolean functions can be designed for 4 inputs?
65,536
8
256
64
Answer :- 65,536
Using the formula 2^(2^n), for 4 inputs: 2^(2^4) = 2^16 = 65,536.
6. We have a function that we want to approximate using 150 rectangles (towers). How many neurons are required to construct the required network?
301
451
150
500
Answer :- 301
Each tower needs 2 neurons. 150 towers × 2 = 300, plus 1 final output neuron = 301.
7. What happens to the output of the sigmoid function as |x| becomes very large for input x? Select all relevant options.
The output approaches 0.5
The output approaches 1
The output oscillates between 0 and 1
The output approaches 0
Answer :- The output approaches 1, The output approaches 0
As x → +∞, sigmoid(x) → 1; as x → −∞, sigmoid(x) → 0.
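The saturation at both extremes is easy to confirm numerically; a minimal sketch:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Saturation at the extremes: large positive inputs approach 1,
# large negative inputs approach 0
near_one = sigmoid(20)    # ~1.0
near_zero = sigmoid(-20)  # ~0.0
```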
8. We have a classification problem with labels 0 and 1. We train a logistic model and find out that ω₀ learned by our model is -17. We are to predict the label of a new test point x using this trained model. If ωᵀx = 1, which of the following statements is True?
We cannot make any prediction as the value of ωᵀx does not make sense
The label of the test point is 0
The label of the test point is 1
We cannot make any prediction as we do not know the value of x
Answer :- The label of the test point is 0
z = ω₀ + ωᵀx = -17 + 1 = -16 → sigmoid(-16) ≈ 0 → label = 0
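The same computation in code (treating ωᵀx = 1 as the bias-free part, as in the explanation above):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w0 = -17           # learned bias
wx = 1             # given value of w^T x (excluding the bias)
z = w0 + wx        # -16
prob = sigmoid(z)  # ~1e-7, far below 0.5
label = 1 if prob > 0.5 else 0
```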
10. Suppose we have a function f(x₁, x₂) = x₁² + 3x₂ + 25 which we want to minimize using the gradient descent algorithm. We initialize (x₁, x₂) = (0,0). What will be the value of x₁ after ten updates in the gradient descent process? (Let η be 1)
0
-3
−4.5
−3
Answer :- 0
The partial derivative ∂f/∂x₁ = 2x₁ → at x₁ = 0, derivative is 0 → x₁ remains 0 throughout all updates.
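A short simulation confirms that x₁ never leaves 0 while x₂ drifts downward; a sketch assuming simultaneous updates:

```python
def run_gd(steps=10, eta=1.0):
    # Minimize f(x1, x2) = x1^2 + 3*x2 + 25 starting from (0, 0)
    x1, x2 = 0.0, 0.0
    for _ in range(steps):
        d1 = 2 * x1   # partial derivative w.r.t. x1
        d2 = 3        # partial derivative w.r.t. x2
        x1, x2 = x1 - eta * d1, x2 - eta * d2
    return x1, x2

x1_final, x2_final = run_gd()
# x1's gradient is 0 at the start, so x1 stays at 0 for all updates
```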


