---
id: 52fbe7351e
question: Where does the number of input features to the first Linear layer after
  a CNN/Flatten come from, and how can I determine it reliably?
sort_order: 26
---

The number of input features to the first Linear layer (its `in_features`) is the number of elements per sample in the tensor produced by the CNN/pooling layers after Flatten. In practice, there are several reliable ways to determine this without manually deriving the dimensions of every layer.

### 1) Use a model summary utility (recommended)
- torchinfo (formerly torch-summary) can show the output shape of each layer, ending with the Flatten, so you can read off the number of features.

```python
# Install (in a notebook) and import torchinfo
!pip install torchinfo
from torchinfo import summary

input_size = (1, 3, 150, 150)  # batch size, channels, height, width
model = CNN()  # your model, as defined in your code
summary(model, input_size=input_size)
```

Output will include an entry for the Flatten layer with its output shape, e.g. `(1, 10000)`, which indicates that `in_features` should be 10000 for the next Linear layer.
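For orientation, here is an abbreviated sketch of what the summary table can look like for the dimensions used in this answer (illustrative only; exact formatting and values depend on your torchinfo version and architecture):

```
Layer (type:depth-idx)    Output Shape       Param #
=====================================================
Conv2d: 1-1               [1, 16, 77, 77]    208
MaxPool2d: 1-2            [1, 16, 25, 25]    --
Flatten: 1-3              [1, 10000]         --
=====================================================
```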
> Note: If you prefer the older torchsummary, you can use it similarly, but be mindful that its `input_size` excludes the batch dimension:

```python
from torchsummary import summary
summary(model, input_size=(3, 150, 150))
```

### 2) Forward a dummy input through the CNN up to Flatten
Create a model that runs the CNN portion and returns the flattened features, then inspect the feature dimension.

```python
import torch

model = CNN()  # as defined in your example; its forward should end at the Flatten
# Create a dummy input matching the CNN's expected input
dummy_input = torch.randn(1, 3, 150, 150)
# Forward through the CNN (including Flatten)
out = model(dummy_input)
# in_features for the first Linear layer
in_features_calc = out.size(1)
print(in_features_calc)  # e.g., 10000
```

This yields the exact value you should use for `nn.Linear(in_features_calc, ...)`.

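For example, a minimal sketch of wiring the measured size into the classifier head (`num_classes` here is a hypothetical placeholder for your task):

```python
import torch.nn as nn

num_classes = 10  # hypothetical; set this to your number of classes
fc = nn.Linear(in_features_calc, num_classes)  # in_features measured above
```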
### 3) Use a LazyLinear layer for automatic in_features inference
If you want to avoid computing the exact dimension, you can use a lazy linear layer for the first fully connected layer:
```python
import torch
import torch.nn as nn

class CNNWithLazy(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=(2, 2), stride=2, padding=2)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2d((3, 3))
        self.fc = nn.LazyLinear(out_features=10)  # 10 output classes; in_features inferred

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = torch.flatten(x, 1)  # flatten all but the batch dimension
        x = self.fc(x)
        return x
```

> The `nn.LazyLinear` will infer the required `in_features` from the input tensor size during the first forward pass.
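A quick sketch of seeing that inference happen (assuming the 3x150x150 input used throughout; after the first forward pass the lazy layer materializes as a regular `Linear`):

```python
model = CNNWithLazy()
_ = model(torch.randn(1, 3, 150, 150))  # first forward pass infers in_features
print(model.fc)  # e.g., Linear(in_features=10000, out_features=10, bias=True)
```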
### 4) Reason about shapes with a concrete example
Consider a CNN where:
- Input: 3 channels, 150x150 image
- After Conv2d(in_channels=3, out_channels=16, kernel_size=(2,2), stride=2, padding=2): each side becomes floor((150 + 2*2 - 2) / 2) + 1 = 77, i.e. [N, 16, 77, 77]
- After ReLU and MaxPool2d((3,3)) (default stride equals the kernel size): each side becomes floor((77 - 3) / 3) + 1 = 25, i.e. [N, 16, 25, 25]
- Flatten yields 16 * 25 * 25 = 10000 features, as the sketch below confirms

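The same arithmetic as a small sketch in code (`out_size` is a hypothetical helper implementing the standard convolution/pooling output-size formula):

```python
def out_size(size, kernel, stride, padding=0):
    # floor((size + 2*padding - kernel) / stride) + 1
    return (size + 2 * padding - kernel) // stride + 1

side = out_size(150, kernel=2, stride=2, padding=2)  # after Conv2d: 77
side = out_size(side, kernel=3, stride=3)            # after MaxPool2d((3, 3)): 25
print(16 * side * side)                              # 10000 flattened features
```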
```python
# Example confirmation
# Suppose after the CNN layers you get a flattened vector of length 10000
num_features = 10000
self.fc = nn.Linear(num_features, num_classes)  # inside your module's __init__
```

In this scenario, the next linear layer should be defined with `in_features = 10000`.

### Practical example with the CNN from the question
Given:
- Final pooled feature map size: [N, 16, 25, 25]
- Flatten to [N, 16*25*25] = [N, 10000]

Therefore, the first linear layer should be defined as:
```python
self.fc1 = nn.Linear(10000, num_classes)  # inside your module's __init__
```

6198

99+
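Putting it all together, a self-contained sketch of a module matching the dimensions assumed above (the class name, `num_classes` default, and the final shape check are illustrative):

```python
import torch
import torch.nn as nn

class ExampleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=(2, 2), stride=2, padding=2)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2d((3, 3))
        self.fc1 = nn.Linear(16 * 25 * 25, num_classes)  # 10000 in_features

    def forward(self, x):
        x = self.maxpool(self.relu(self.conv1(x)))
        x = torch.flatten(x, 1)  # flatten all but the batch dimension
        return self.fc1(x)

# Shape check: a 3x150x150 input flows through without a size mismatch
assert ExampleCNN()(torch.randn(1, 3, 150, 150)).shape == (1, 10)
```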
### Summary
100+
- The direct geometric/product approach is accurate but can be error-prone for complex architectures.
101+
- Use `torchinfo.summary` or a forward pass with a dummy input to read off or compute the flattened feature size.
102+
- Alternatively, use `nn.LazyLinear` to infer `in_features` automatically.
103+
- Always ensure your chosen `in_features` matches the actual tensor size just before the first `nn.Linear` to avoid shape mismatches.
