Hello, I inserted your glore_unit into ResNet-101.
However, I find that the BatchNorm2d operation always outputs a NaN-valued tensor.
So I checked the initialization of BatchNorm2d and tried two settings:
- weight: 1.0 and bias: 0
- weight: 0.0 and bias: 0
However, neither setting works. Do you have any suggestions? Thanks very much~
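For reference, the two settings above can be written as follows in PyTorch (a minimal sketch; the channel count `64` is just a placeholder):

```python
import torch.nn as nn

bn = nn.BatchNorm2d(64)

# Setting 1: weight = 1.0, bias = 0 (this is also PyTorch's default affine init)
nn.init.constant_(bn.weight, 1.0)
nn.init.constant_(bn.bias, 0.0)

# Setting 2: weight = 0.0, bias = 0 (the BN branch initially outputs all zeros)
nn.init.constant_(bn.weight, 0.0)
nn.init.constant_(bn.bias, 0.0)
```

Note that since BN first normalizes its input before applying the affine transform, a NaN output usually means the tensor *entering* the BN layer already contains NaN/Inf values, so the affine init alone may not be the cause.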
------------------------------- Append ------------------------------------
I find that in glore_unit there is no BN layer between the conv. layers.
I also find that this leads to value explosion in my experiments after the matrix multiply.
So does this strategy have a special purpose? I think it carries potential risks when training the network.
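To illustrate the explosion I mean, here is a small NumPy sketch (not the actual glore_unit code) showing how repeated matrix multiplies without any normalization in between blow up activation magnitudes, while a BN-like re-standardization keeps them bounded:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64))

unnormalized = x.copy()
normalized = x.copy()
for _ in range(10):
    # Random projection as a stand-in for the conv/matmul steps in the unit.
    w = rng.standard_normal((64, 64))
    unnormalized = unnormalized @ w
    y = normalized @ w
    # Toy stand-in for a BN layer: re-standardize the activations.
    normalized = (y - y.mean()) / (y.std() + 1e-5)

print(np.abs(unnormalized).max())  # grows by roughly sqrt(64) per step
print(np.abs(normalized).max())    # stays O(1)
```

With unit-variance weights, each multiply scales the activation standard deviation by about sqrt(64), so after a few layers the values overflow float32 range, which then surfaces as NaN after normalization.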