From f8201eac7ec098f1c7efd460079ea618ef905bb7 Mon Sep 17 00:00:00 2001
From: FAQ Bot <faq-bot@datatalks.club>
Date: Tue, 2 Dec 2025 08:05:06 +0000
Subject: [PATCH] UPDATE: Where does the number of input features to the first
 Linear layer after

---
 ...the-number-of-conv2d-layers-params-come.md | 106 ++++++++++++------
 1 file changed, 74 insertions(+), 32 deletions(-)

diff --git a/_questions/machine-learning-zoomcamp/module-8/026_52fbe7351e_where-does-the-number-of-conv2d-layers-params-come.md b/_questions/machine-learning-zoomcamp/module-8/026_52fbe7351e_where-does-the-number-of-conv2d-layers-params-come.md
index 32c5d618..0d25002c 100644
--- a/_questions/machine-learning-zoomcamp/module-8/026_52fbe7351e_where-does-the-number-of-conv2d-layers-params-come.md
+++ b/_questions/machine-learning-zoomcamp/module-8/026_52fbe7351e_where-does-the-number-of-conv2d-layers-params-come.md
@@ -1,61 +1,103 @@
 ---
 id: 52fbe7351e
-question: Where does the number of Conv2d layer’s params come from? Where does the
-  number of 'features' we get after the Flatten layer come from?
+question: Where does the number of input features to the first Linear layer after
+  a CNN/Flatten come from, and how can I determine it reliably?
 sort_order: 26
 ---
 
-Let's say we define our Conv2d layer like this:
+The number of input features to the first Linear layer (its `in_features` argument) is the per-sample size of the tensor that comes out of the convolution/pooling layers after Flatten. There are several reliable ways to determine it without deriving the dimensions of every layer by hand.
+
+### 1) Use a model summary utility (recommended)
+- torchinfo (formerly torch-summary) can show the output shape of each layer, ending with the Flatten, so you can read off the number of features.
 
 ```python
- tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(150, 150, 3))
-```
+# Install (the leading "!" is notebook syntax; use plain `pip install torchinfo` in a shell) and import
+!pip install torchinfo
+from torchinfo import summary
 
-This means our input image is RGB (3 channels, 150 by 150 pixels), the kernel is 3x3, and the number of filters (layer’s width) is 32.
+input_size = (1, 3, 150, 150)  # batch size, channels, height, width
+model = CNN()
+summary(model, input_size=input_size)
+```
 
-If we check `model.summary()` we will get this:
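+The output includes a row for the Flatten layer with its output shape, e.g. `(1, 10000)`, which means `in_features` should be 10000 for the next Linear layer. This row appears only if the model uses an `nn.Flatten` module; a functional `torch.flatten` call is not listed, so in that case multiply the channel, height, and width values of the last listed layer.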
 
+> Note: If you prefer the older torchsummary, you can use it similarly, but its `input_size` must omit the batch dimension (channels, height, width only).
 ```
-_________________________________________________________________
-Layer (type)                Output Shape              Param #
-=================================================================
-conv2d (Conv2D)             (None, 148, 148, 32)      896
+from torchsummary import summary
+summary(model, input_size=(3, 150, 150))
 ```
 
-So where do 896 params come from? It’s computed like this:
+### 2) Forward a dummy input through the CNN up to Flatten
+Run a dummy input through the convolutional part of the model only (up to and including Flatten, without any Linear layer), then inspect the feature dimension.
 
 ```python
-(3*3*3 + 1) * 32
+import torch
+# Assumes this CNN stops at Flatten (conv/pool layers only, no Linear layer yet)
+model = CNN()
+dummy_input = torch.randn(1, 3, 150, 150)  # batch size, channels, height, width
+with torch.no_grad():  # no gradients needed for a shape check
+    out = model(dummy_input)
+in_features_calc = out.size(1)  # in_features for the first Linear layer
+print(in_features_calc)  # e.g., 10000
 ```
 
-This results in 896:
+This yields the exact value you should use for `nn.Linear(in_features_calc, ...)`.
 
-- 3x3 kernel
-- 3 channels RGB
-- +1 for bias
-- 32 filters
+### 3) Use a LazyLinear layer for automatic in_features inference
+If you want to avoid computing the exact dimension, you can use a lazy linear layer for the first fully connected layer:
 
+```python
+import torch
+import torch.nn as nn
+class CNNWithLazy(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.conv1 = nn.Conv2d(3, 16, kernel_size=(2, 2), stride=2, padding=2)
+        self.relu = nn.ReLU()
+        self.maxpool = nn.MaxPool2d((3, 3))
+        self.fc = nn.LazyLinear(out_features=10)  # 10 output classes; in_features is inferred on the first forward pass
+    def forward(self, x):
+        x = self.conv1(x)
+        x = self.relu(x)
+        x = self.maxpool(x)
+        x = torch.flatten(x, 1)  # keep the batch dimension, flatten the rest
+        x = self.fc(x)
+        return x
+```
 
-Number of 'Features' after the Flatten Layer
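+> `nn.LazyLinear` infers the required `in_features` from the input tensor during the first forward pass. Until then its weights are uninitialized, so run one forward pass (for example with a dummy batch) before inspecting the layer's parameters.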
 
-For our homework, `model.summary()` for the last MaxPooling2d and Flatten layers looked like this:
+### 4) Reason about shapes with a concrete example
+Consider a CNN where:
+- Input: 3 channels, 150x150 image
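+- After Conv2d(in_channels=3, out_channels=16, kernel_size=(2,2), stride=2, padding=2): each spatial side becomes floor((150 + 2*2 - 2) / 2) + 1 = 77, so the shape is [N, 16, 77, 77]
+- After ReLU (shape unchanged) and MaxPool2d((3,3)), whose stride defaults to the kernel size: floor((77 - 3) / 3) + 1 = 25, so the final feature map is [N, 16, 25, 25]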
+- Flatten yields 16 * 25 * 25 = 10000 features
 
-```
-_________________________________________________________________
-Layer (type)                Output Shape              Param #
-=================================================================
-max_pooling2d_3       (None, 7, 7, 128)         0
-flatten (Flatten)           (None, 6272)              0
+```python
+# Example confirmation: 16 channels * 25 * 25 spatial positions
+num_features = 16 * 25 * 25  # = 10000
+# Inside your model's __init__ (num_classes is set by your task):
+self.fc = nn.Linear(num_features, num_classes)
 ```
 
-So where do 6272 vectors come from? It’s computed like this:
+In this scenario, the next linear layer should be defined with `in_features = 10000`.
 
+### Practical example with the CNN above
+Given:
+- Final pooled feature map size: [N, 16, 25, 25]
+- Flatten to [N, 16*25*25] = [N, 10000]
+
+Therefore, the first linear layer should be defined as:
 ```python
-7*7*128
+self.fc = nn.Linear(10000, num_classes)
 ```
 
-This results in 6272:
-
-- 7x7 "image shape" after several convolutions and poolings
-- 128 filters
+Note: If your input size or layer parameters differ, recompute the Flatten output using any of the methods above. The key idea: `in_features` equals the number of elements per sample in the flattened feature tensor, so it does not depend on the batch size.
 
+### Summary
+- Computing the size by hand (channels * height * width of the last feature map) is exact but error-prone for complex architectures.
+- Use `torchinfo.summary` or a forward pass with a dummy input to read off or compute the flattened feature size.
+- Alternatively, use `nn.LazyLinear` to infer `in_features` automatically.
+- Always ensure your chosen `in_features` matches the actual tensor size just before the first `nn.Linear` to avoid shape mismatches.