Convolutional Neural Networks (CNNs)
A convolutional neural network (CNN, or ConvNet) is a class of deep neural networks, most commonly applied to analyzing visual imagery. They are also known as shift-invariant or space-invariant artificial neural networks, based on their shared-weights architecture and translation-invariance characteristics.
Resources
- https://github.com/kjw0612/awesome-deep-vision
- https://en.wikipedia.org/wiki/Convolutional_neural_network
- CNNs chapter in d2l.ai
- Convolutional Neural Networks cheatsheet
- http://cs231n.github.io/convolutional-networks/
- http://cs231n.github.io/understanding-cnn/
- Deep Convolutional Neural Networks as Models of the Visual System: Q&A | Grace W. Lindsay
Convolutions
- Understanding convolutions
- An Introduction to different Types of Convolutions in DL
- A Comprehensive Introduction to Different Types of Convolutions in Deep Learning | by Kunlun Bai | Towards Data Science
- An Introduction to different Types of Convolutions in Deep Learning | by Paul-Louis Pröve | Towards Data Science
- Depthwise separable convolution
- Intuitive understanding of 1D, 2D, and 3D convolutions in convolutional neural networks - Stack Overflow
- https://medium.com/mlreview/a-guide-to-receptive-field-arithmetic-for-convolutional-neural-networks-e0f514068807
- Convolutions Over Volumes (channels)
1x1 convolutions
- 1x1 convolutions: https://d2l.ai/chapter_convolutional-neural-networks/channels.html#times-1-convolutional-layer
- Networks in Networks and 1x1 Convolutions
- One by One [ 1 x 1 ] Convolution - counter-intuitively useful – Aaditya Prakash (Adi) – Machine Learning
- 1x1 Convolutions: Demystified by Anwesh Marwade | Towards Data Science
- 1X1 Convolution, CNN, CV, Neural Networks | Analytics Vidhya
- A Gentle Introduction to 1x1 Convolutions to Manage Model Complexity - MachineLearningMastery.com
- A convolutional layer with a 1×1 filter can be used at any point in a CNN to control the number of feature maps. It is often referred to as a projection operation or projection layer, or even a feature-map or channel-pooling layer (a minimal sketch follows).
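A minimal sketch, assuming TensorFlow/Keras and an illustrative feature-map shape, of how a 1×1 convolution projects the channel dimension while leaving the spatial layout untouched:

```python
import tensorflow as tf

# Hypothetical feature map: batch of 8, 32x32 spatial grid, 256 channels
x = tf.random.normal((8, 32, 32, 256))

# A 1x1 convolution acts per pixel across channels: it projects the
# 256 input feature maps down to 64 without touching the spatial layout.
proj = tf.keras.layers.Conv2D(filters=64, kernel_size=1, activation="relu")

y = proj(x)
print(y.shape)  # (8, 32, 32, 64) -- same spatial size, fewer channels
```

This is the same projection trick Inception-style networks use to reduce the channel count before expensive large-kernel convolutions.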
Human pose estimation and activity recognition
- https://en.wikipedia.org/wiki/Activity_recognition
- https://machinelearningmastery.com/deep-learning-models-for-human-activity-recognition/
- https://github.com/cbsudux/awesome-human-pose-estimation
- https://www.learnopencv.com/deep-learning-based-human-pose-estimation-using-opencv-cpp-python/
Code
- #CODE Modern Convolutional Neural Network Architectures
- AlexNet, VGG, GoogLeNet, ResNet, ResNeXt, Xception, MobileNet, EfficientNet, RegNet, ConvMixer, ConvNeXt
- #CODE Keras Layers (for TensorFlow 2.x)
- #CODE Model Zoo - Discover open source deep learning code and pretrained models
- #CODE https://github.com/microsoft/computervision-recipes
Channel/visual attention
- #CODE Visual-attention-tf
- Pixel Attention
- Channel Attention (CBAM)
- Efficient Channel Attention
- #CODE Convolution Variants
- Attention Augmented (AA) Convolution Layer
- Mixed Depthwise Convolution Layer
- Drop Block
- Efficient Channel Attention (ECA) Layer
- Convolutional Block Attention Module (CBAM) Layer
References
- #PAPER A guide to convolution arithmetic for deep learning (Dumoulin, 2016)
- #PAPER Xception: Deep Learning with Depthwise Separable Convolutions (Chollet 2017)
- #PAPER Deformable Convolutional Networks (Dai 2017)
- #PAPER Deformable ConvNets v2: More Deformable, Better Results (Zhu 2018)
- #PAPER 3D Depthwise Convolution: Reducing Model Parameters in 3D Vision Tasks (Ye 2018)
- #PAPER Making Convolutional Networks Shift-Invariant Again (Zhang 2019)
- #PAPER A Survey of the Recent Architectures of Deep Convolutional Neural Networks (Khan 2020)
- #PAPER Revisiting Spatial Invariance with Low-Rank Local Connectivity (Elsayed 2020)
- #THESIS/PHD Multi-modal Medical Image Processing with Applications in Hybrid X-ray/Magnetic Resonance Imaging (Stimpel 2021)
- #PAPER Learning to Resize Images for Computer Vision Tasks (Talebi 2021)
- #PAPER Non-deep Networks (Goyal 2021)
- #CODE https://paperswithcode.com/paper/non-deep-networks-1?from=n19
- use parallel subnetworks instead of stacking one layer after another. This helps effectively reduce depth while maintaining high performance
- #PAPER ConvNeXt: A ConvNet for the 2020s (Liu 2022)
- #CODE https://github.com/facebookresearch/ConvNeXt
- #CODE https://github.com/bamps53/convnext-tf/
- #CODE https://github.com/sayakpaul/ConvNeXt-TF
- Paper explained:
- https://twitter.com/papers_daily/status/1481937771732566021
- ConvNeXt essentially takes a ResNet and gradually "modernizes" it to discover which components contribute to performance gains. ConvNeXt applies several tricks such as larger kernels, layer norm, fewer activation functions and separate downsampling layers, to name a few (a rough sketch of such a block follows this reference list).
- These results show that hybrid models are promising and that different components can still be optimized further and composed more effectively to improve the overall model on a wide range of vision tasks.
- #PAPER Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs (Ding 2022)
- #PAPER More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using Sparsity (Liu 2022)
- #CODE https://github.com/vita-group/slak
- explore the possibility of training extreme convolutions larger than 31×31 and test whether the performance gap can be eliminated by strategically enlarging convolutions
- propose Sparse Large Kernel Network (SLaK), a pure CNN architecture equipped with 51×51 kernels that can perform on par with or better than state-of-the-art hierarchical Transformers and modern ConvNet architectures like ConvNeXt and RepLKNet, on ImageNet classification as well as typical downstream tasks
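To make the ConvNeXt "modernization" tricks above concrete (large depthwise kernel, LayerNorm instead of BatchNorm, a single GELU inside an inverted bottleneck), here is a rough sketch of a ConvNeXt-style residual block, assuming TensorFlow/Keras; the shapes and `dim` value are illustrative and this is not the official facebookresearch implementation:

```python
import tensorflow as tf
from tensorflow.keras import layers

def convnext_style_block(x, dim):
    """Sketch of a ConvNeXt-style block: 7x7 depthwise conv -> LayerNorm ->
    pointwise expansion -> single GELU -> pointwise projection -> residual."""
    shortcut = x
    # Large-kernel depthwise convolution for spatial mixing
    x = layers.DepthwiseConv2D(kernel_size=7, padding="same")(x)
    # LayerNorm instead of BatchNorm
    x = layers.LayerNormalization(epsilon=1e-6)(x)
    # Inverted bottleneck: expand channels 4x, one GELU, project back to dim
    x = layers.Dense(4 * dim, activation="gelu")(x)
    x = layers.Dense(dim)(x)
    return layers.Add()([shortcut, x])

inputs = tf.keras.Input((56, 56, 96))            # illustrative feature-map shape
outputs = convnext_style_block(inputs, dim=96)
model = tf.keras.Model(inputs, outputs)
model.summary()
```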
Sequence (time series) modelling
- #PAPER An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling (Bai 2018)
- Temporal convolutional networks (TCN)
- #CODE https://github.com/philipperemy/keras-tcn
- Implementing Temporal Convolutional Networks
- The most important component of TCNs is the dilated causal convolution. "Causal" simply means a filter at time step t can only see inputs that are no later than t. The point of using dilated convolutions is to achieve a larger receptive field with fewer parameters and fewer layers.
- A residual block stacks two dilated causal convolution layers together, and the result of the final convolution is added back to the inputs to obtain the output of the block (a minimal sketch of such a block follows this list).
- Temporal convolutional networks for sequence modeling
- #PAPER Convolutions Are All You Need (For Classifying Character Sequences) (Wood-doughty 2018)
- #PAPER InceptionTime: Finding AlexNet for time series classification (Fawaz 2021)
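A minimal sketch of the TCN residual block described above, assuming TensorFlow/Keras; the filter counts, kernel size and dilation rate are illustrative assumptions rather than the reference keras-tcn implementation:

```python
import tensorflow as tf
from tensorflow.keras import layers

def tcn_residual_block(x, filters, kernel_size=3, dilation_rate=2):
    """TCN-style residual block: two dilated causal convolutions whose result
    is added back to the (possibly projected) input."""
    shortcut = x
    # "Causal" padding: the filter at time step t only sees inputs <= t.
    x = layers.Conv1D(filters, kernel_size, padding="causal",
                      dilation_rate=dilation_rate, activation="relu")(x)
    x = layers.Conv1D(filters, kernel_size, padding="causal",
                      dilation_rate=dilation_rate, activation="relu")(x)
    # A 1x1 convolution matches the channel count so the residual add is valid.
    if shortcut.shape[-1] != filters:
        shortcut = layers.Conv1D(filters, 1)(shortcut)
    return layers.Add()([shortcut, x])

inputs = tf.keras.Input((128, 16))     # (time steps, features), illustrative
outputs = tcn_residual_block(inputs, filters=32)
model = tf.keras.Model(inputs, outputs)
```

Stacking such blocks with exponentially increasing dilation rates (1, 2, 4, ...) is what gives a TCN its large receptive field with few layers.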
Object classification, image recognition
See AI/Computer Vision/Object classification, image recognition
Semantic segmentation
See AI/Computer Vision/Semantic segmentation
Object detection
See AI/Computer Vision/Object detection
Video segmentation and prediction
See AI/Computer Vision/Video segmentation and prediction
Image and video captioning
See AI/Computer Vision/Image and video captioning
Image-to-image translation
See AI/Computer Vision/Image-to-image translation
Super-resolution
See AI/Computer Vision/Super-resolution#Supervised CNN-based
Inpainting
See AI/Computer Vision/Inpainting and restoration#CNN-based
Background subtraction, foreground detection
See AI/Computer Vision/Background subtraction#CNN based
Edge detection
- #PAPER DeepEdge: A Multi-Scale Bifurcated Deep Network for Top-Down Contour Detection
- #PAPER DeepContour: A Deep Convolutional Feature Learned by Positive-Sharing Loss for Contour Detection
Human pose estimation and activity recognition
- #PAPER Human activity recognition with smartphone sensors using deep learning neural networks (Ann Ronao 2016)
- #PAPER Convolutional pose machines (Wei 2016)
- #PAPER Fast Human Pose Estimation (Zhang 2019)
Motion detection, tracking
Deconvolution
Visual/Channel attention and Saliency
See AI/XAI#Explainability methods for Neural Networks
- #PAPER Squeeze-and-Excitation Networks, SENets (Hu 2017)
- Features can incorporate global context
- Since SENet provides channel attention only through a dedicated global feature descriptor, in this case Global Average Pooling (GAP), information is lost and the attention is point-wise: all spatial positions of a feature map are weighted uniformly, with no discrimination between important or class-deterministic pixels and those belonging to the background or carrying no useful information.
- This justifies coupling channel attention with spatial attention; one of the prime examples is CBAM (published at ECCV 2018). A rough sketch of an SE-style channel-attention block is included at the end of this section.
- #CODE https://github.com/hujie-frank/SENet
- #CODE https://github.com/yoheikikuta/senet-keras
- https://blog.paperspace.com/channel-attention-squeeze-and-excitation-networks/
- https://programmerclick.com/article/4934219785/
- https://pyimagesearch.com/2022/05/30/attending-to-channels-using-keras-and-tensorflow/
- #PAPER CBAM: Convolutional Block Attention Module (Woo 2018)
- #PAPER ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks (Wang 2020)
- this paper proposes an Efficient Channel Attention (ECA) module, which involves only a handful of parameters while bringing a clear performance gain
- proposes a local cross-channel interaction strategy without dimensionality reduction, which can be efficiently implemented via a 1D convolution (see the sketch at the end of this section)
- #PAPER See { #srwithpixelattention} in Super-resolution
- #PAPER Attention Mechanisms in Computer Vision: A Survey (Guo 2021)
- #PAPER Visual Attention Network (Guo 2022)
- #CODE https://paperswithcode.com/paper/visual-attention-network?from=n26
- This work presents an approach that decomposes a large-kernel convolution operation to capture long-range relationships. After obtaining the long-range relationships, it estimates the importance of each point and generates an attention map
- #PAPER Attention Map-Guided Visual Explanations for Deep Neural Networks (An 2022)
- attention-map-guided visual explanations for deep neural networks, employing an attention mechanism to find the most important region of an input image
- The Grad-CAM method is used to extract the feature map for deep neural networks, and then the attention mechanism is used to extract the high-level attention maps
- Inspired by the CBAM technique
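To make the two channel-attention mechanisms discussed above concrete, here is a rough TensorFlow/Keras sketch of an SE-style block (GAP, bottleneck MLP, sigmoid rescaling) and an ECA-style block (GAP followed by a 1D convolution across channels, with no dimensionality reduction). The reduction ratio, kernel size and input shape are illustrative assumptions, not the authors' reference code:

```python
import tensorflow as tf
from tensorflow.keras import layers

def se_block(x, reduction=16):
    """Squeeze-and-Excitation: squeeze spatial dims with GAP, excite with a
    bottleneck MLP, then rescale each channel of the input."""
    channels = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)                  # (B, C)
    s = layers.Dense(channels // reduction, activation="relu")(s)
    s = layers.Dense(channels, activation="sigmoid")(s)     # per-channel weights
    s = layers.Reshape((1, 1, channels))(s)
    return layers.Multiply()([x, s])

def eca_block(x, kernel_size=3):
    """ECA: channel attention via a 1D convolution over the pooled channel
    descriptor -- local cross-channel interaction, no dimensionality reduction."""
    channels = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)                  # (B, C)
    s = layers.Reshape((channels, 1))(s)                    # channels as a sequence
    s = layers.Conv1D(1, kernel_size, padding="same",
                      activation="sigmoid")(s)              # (B, C, 1)
    s = layers.Reshape((1, 1, channels))(s)
    return layers.Multiply()([x, s])

inputs = tf.keras.Input((32, 32, 64))      # illustrative feature map
outputs = eca_block(se_block(inputs))
model = tf.keras.Model(inputs, outputs)
```

Both blocks return a tensor with the same shape as their input, so they can be dropped after any convolutional stage of an existing CNN.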