When doing structured pruning, we sometimes need to apply the same mask before or after different modules when they share the same input or output space. Say we are trying to cut off some dimensions of the model: we would apply masks before all modules that take a vector in $\mathbb{R}^{\mathrm{dim\_model}}$ as input, and after all modules that produce a vector in $\mathbb{R}^{\mathrm{dim\_model}}$ as output.
However, we cannot directly use code like `if module.dim_in == dim_model:`, because other parameters may happen to equal `dim_model` by accident (e.g., a feed-forward width that was set to the same integer), and some linear layers take one of those parameters as `dim_in`. So, with the current approach, the only solution I can come up with is to explicitly list, for each model, all the modules whose input or output space has dimension `dim_model`, which makes the code long, hard to maintain, and hard to extend to future models.
So I suggest the following solution: for all modules, the initialization parameters that describe the dimension of linear spaces used by the module itself or by its submodules can be either of type `int` or of a newly created type `SpaceDim` (maybe there's a better name). If such an init parameter is an `int`, the module wraps it in a `SpaceDim` instance while initializing itself, and passes that instance on to its submodules as init parameters where necessary. The parameter thus becomes traceable: we can write `if module.dim_in is model.dim_model:` (an identity check), because now every linear space that takes a vector in $\mathbb{R}^{\mathrm{dim\_model}}$ as input has its `dim_in` be the very same instance as `model.dim_model`.
In detail, as far as I can see, the modified code could look like the following:
- Newly created class:
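A minimal sketch of what the class could look like; only the `.num` attribute is actually assumed by the `__init__` snippets below:

```python
class SpaceDim:
    """A shared handle for the dimension of one linear space.

    Modules whose inputs or outputs live in the same space hold the same
    SpaceDim instance, so spaces can be matched by identity rather than by
    comparing raw integers.
    """

    def __init__(self, num: int):
        self.num = num  # the underlying dimension, used for tensor shapes

    def __repr__(self):
        return f"SpaceDim({self.num})"
```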
- For `__init__` of most of the modules:
```python
def __init__(self,
             dim_xx,
             ...
             ):
    if isinstance(dim_xx, int):
        self.dim_xx = SpaceDim(dim_xx)
    else:
        self.dim_xx = dim_xx
    ...
    self.sub = SubModuleClass(self.dim_xx, ...)
    ...
```
- For modules with tensors as their parameters directly (let's take `Linear` as an example):
```python
def __init__(self,
             dim_in,
             dim_out,
             ...
             ):
    if isinstance(dim_in, int):
        self.dim_in = SpaceDim(dim_in)
    else:
        self.dim_in = dim_in
    ...
    self.weight = ...(torch.empty((self.dim_out.num, self.dim_in.num), ...), ...)
    ...
```
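With this in place, finding every module whose input lives in $\mathbb{R}^{\mathrm{dim\_model}}$ reduces to one identity check over the module tree. A minimal sketch, assuming the `dim_in` / `dim_model` attribute names from the proposal above; `named_modules()` is standard `torch.nn.Module` API, while `register_input_mask` is a hypothetical masking hook:

```python
import torch.nn as nn

def modules_with_input_space(model: nn.Module, space: SpaceDim):
    """Yield (name, module) pairs whose input space is exactly `space`."""
    for name, module in model.named_modules():
        # Identity check: modules sharing a space hold the same SpaceDim
        # instance, so accidental numeric equality cannot cause false matches.
        if getattr(module, "dim_in", None) is space:
            yield name, module

# Pruning code can then apply one mask consistently, e.g.:
# for name, module in modules_with_input_space(model, model.dim_model):
#     register_input_mask(module, mask)  # hypothetical masking hook
```

Using identity (`is`) rather than `==` also prevents any leftover plain-`int` dimensions from matching by value.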