Deep CV
Deep learning (DL) is used in digital image processing to solve difficult problems (e.g. image colorization, classification, segmentation and detection). DL methods such as convolutional neural networks (CNNs) improve prediction performance mainly by exploiting large datasets and plentiful computing resources, and they have pushed the boundaries of what was possible. Problems once assumed to be unsolvable are now solved with super-human accuracy (e.g. image classification). Since being reignited by Krizhevsky, Sutskever and Hinton in 2012, DL has dominated the domain thanks to substantially better performance than traditional methods.
Resources
- https://github.com/kjw0612/awesome-deep-vision
- https://github.com/timzhang642/3D-Machine-Learning
- https://medium.com/@taposhdr/medical-image-analysis-with-deep-learning-i-23d518abf531
- http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/
Applications
See:
- AI/Computer Vision/Background subtraction
- AI/Computer Vision/Image and video captioning
- AI/Computer Vision/Image-to-image translation
- AI/Computer Vision/Inpainting and restoration
- AI/Computer Vision/Object classification, image recognition
- AI/Computer Vision/Object detection
- AI/Computer Vision/Semantic segmentation
- AI/Computer Vision/Super-resolution
- AI/Computer Vision/Video Frame Interpolation
- AI/Computer Vision/Video segmentation and prediction
Code
- #CODE ChainerCV: a Library for Computer Vision in Deep Learning
- http://chainercv.readthedocs.io/en/stable/
- #CODE Vision - The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision
- #CODE Scenic - A Jax Library for Computer Vision Research and Beyond
- https://www.marktechpost.com/2021/10/30/google-research-introduces-scenic-an-open-source-jax-library-for-computer-vision-research/
- codebase with a focus on research around attention-based models for computer vision
- #PAPER SCENIC: A JAX Library for Computer Vision Research and Beyond (2021)
- #CODE Pytorch-image-models
- PyTorch image models, scripts, pretrained weights -- ResNet, ResNeXT, EfficientNet, EfficientNetV2, NFNet, Vision Transformer, MixNet, MobileNet-V3/V2, RegNet, DPN, CSPNet, and more
- https://rwightman.github.io/pytorch-image-models/
- #CODE Imgaug. Image augmentation for machine learning experiments
- #CODE Openface. Free and open source face recognition with deep neural networks
References
- #PAPER #REVIEW Deep Learning for Computer Vision: A Brief Review (Voulodimos 2017)
- #PAPER Deep Learning vs. Traditional Computer Vision (O'Mahony 2019)
- #PAPER Deep learning encodes robust discriminative neuroimaging representations to outperform standard machine learning (Abrol 2021)
- #PAPER #REVIEW Deep learning-enabled medical computer vision (Esteva 2021)
- #PAPER Involution: Inverting the Inherence of Convolution for Visual Recognition, a brand new neural operator (Li 2021)
- #CODE https://github.com/d-li14/involution
- #CODE https://github.com/PrivateMaRyan/keras-involution2Ds
- Paper explained
- https://keras.io/examples/vision/involution/
- Involution: Inverting the Inherence of Convolution for Visual Recognition
- involution is a general-purpose neural primitive that is versatile for a spectrum of deep learning models on different vision tasks
- involution bridges convolution and self-attention in design, while being more efficient and effective than convolution, simpler than self-attention in form
- the proposed involution operator could be leveraged as fundamental bricks to build the new generation of neural networks for visual recognition, powering different deep learning models on several prevalent benchmarks
- #PAPER Unifying Nonlocal Blocks for Neural Networks (Zhu 2021)
- #PAPER X-volution: On the unification of convolution and self-attention (Chen 2021)
- #PAPER Bivolution: A Static and Dynamic Coupled Filter (Hu 2022)
- #PAPER Convolution of Convolution: Let Kernels Spatially Collaborate (Zhao 2022)
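
The involution entry above (Li 2021) describes an operator that sits between convolution and self-attention. As a rough illustration only, the sketch below shows the core idea in plain Python: the kernel is generated from the feature vector at each pixel (so it varies across space) and is then shared by all channels. It is a simplification, not the authors' implementation: it assumes a single group, a single linear map `weight` as the kernel generator (the paper uses a small bottleneck network), zero padding and stride 1. The names `involution2d` and `weight` are hypothetical.

```python
def involution2d(x, weight, kernel_size=3):
    """Simplified involution (one group, linear kernel generator).

    x:      feature map as nested lists with shape [C][H][W]
    weight: kernel-generator matrix with shape [K*K][C] (hypothetical name)
    Returns a feature map with the same shape as x.
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    if len(weight) != kernel_size * kernel_size:
        raise ValueError("weight must have kernel_size**2 rows")
    pad = kernel_size // 2
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]

    for i in range(height):
        for j in range(width):
            # Spatial-specific: the K*K kernel is computed from this pixel's
            # own feature vector, so every location gets a different kernel.
            pixel = [x[c][i][j] for c in range(channels)]
            kernel = [sum(w * p for w, p in zip(row, pixel)) for row in weight]

            # Channel-agnostic: the same kernel is applied to every channel.
            for c in range(channels):
                acc = 0.0
                for di in range(kernel_size):
                    for dj in range(kernel_size):
                        ii, jj = i + di - pad, j + dj - pad
                        # Positions outside the map count as zero (zero padding).
                        if 0 <= ii < height and 0 <= jj < width:
                            acc += kernel[di * kernel_size + dj] * x[c][ii][jj]
                out[c][i][j] = acc
    return out


# Usage: a 2-channel 2x2 map; the generator copies channel 1 into the
# centre tap of the kernel and leaves the other eight taps at zero.
feature_map = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
generator = [[0.0, 0.0]] * 4 + [[0.0, 1.0]] + [[0.0, 0.0]] * 4
result = involution2d(feature_map, generator)
```

This contrasts with a standard convolution, where the kernel is the same at every location but differs per channel. Here the roles are inverted.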