Deep Learning (DL)
Deep learning (DL), also known as deep structured learning, is part of a broader family of AI/ML methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised. DL uses large neural networks with many layers of processing units, taking advantage of advances in computing power and improved training techniques to learn complex patterns in large amounts of data.
Resources
- https://github.com/ChristosChristofidis/awesome-deep-learning
- https://github.com/endymecy/awesome-deeplearning-resources
- https://en.wikipedia.org/wiki/Deep_learning
- Deep Learning Curriculum
- https://jalammar.github.io/visual-interactive-guide-basics-neural-networks/
- A Quick Introduction to Neural Networks
- Deep Neural Nets: 33 years ago and 33 years from now (Andrej Karpathy)
- Deep learning's diminishing returns (Thompson)
- Deep Learning Is Hitting a Wall
- A Brief History of Neural Nets and Deep Learning (2020)
- Time Benchmark of models
- A Recipe for Training Neural Networks
- Computer Scientists Prove Why Bigger Neural Networks Do Better
- No, We Don't Have to Choose Batch Sizes As Powers Of 2
DL news aggregators
Cheatsheets
When to use and not to use deep learning
- When and When Not to Use Deep Learning
- You can probably use deep learning even if your data isn't that big
- When not to use deep learning
- Using ANNs on small data - Deep Learning vs. Xgboost
- The limitations of deep learning
Books
- #BOOK Deep Learning Interviews: Hundreds of fully solved job interview questions from a wide range of key topics in AI (Kashani 2022)
- #BOOK The Principles of DL Theory: An Effective Theory Approach to Understanding Neural Networks (Roberts 2022)
- #BOOK Deep Learning Book (Goodfellow, 2016 MIT)
- The Deep Learning textbook is a resource intended to help students and practitioners enter the field of machine learning in general and deep learning in particular
- #BOOK Dive into Deep Learning (Zhang)
- An interactive deep learning book for students, engineers, and researchers. Uses MXNet/Gluon, PyTorch and TensorFlow
- Jupyter notebooks for each section
Talks
- #TALK The Future of Sparsity in Deep Learning (Trevor Gale, Phd student Stanford, 2021)
- #TALK Deep Learning (Yoshua Bengio, MLSS 2020):
- #TALK Deep Learning Hardware: Past, Present, and Future (Yann LeCun, ISSCC 2019)
- #TALK Deep Learning and the Future of Artificial Intelligence (Yann LeCun, 2018)
- #TALK AI Breakthroughs & Obstacles to Progress, Mathematical and Otherwise (Yann LeCun, 2018)
- #TALK François Chollet at France is AI 2017: Deep Learning: current limits and future perspectives (Chollet 2017)
- #TALK Power & Limits of Deep Learning (Yann Lecun, 2017)
- #TALK The Deep End of Deep Learning (Hugo Larochelle, TEDxBoston 2016)
- #TALK How deep neural networks work (Brandon Rohrer)
- Simple explanations of DL basics and nice graphics
Courses
- #COURSE nn-zero-to-hero (Karpathy)
- #COURSE Introduction to Deep Learning (COMP0090, UCL)
- #COURSE Full Stack Deep Learning
- #COURSE Deep Learning (NYU)
- #COURSE Deep Learning (CS230, Stanford)
- #COURSE Tensorflow for Deep Learning Research (CS20SI, Stanford)
- #COURSE DeepMind x UCL | Deep Learning Lecture Series 2020
- #COURSE Introduction to Deep Learning (6.S191, MIT)
- #COURSE MIT Deep Learning and Artificial Intelligence Lectures
- #COURSE Intro to Neural Networks and Machine Learning (CSC 321, UToronto)
- #COURSE Deep Learning nanodegree (Udacity)
- #COURSE Deep Learning with PyTorch: Zero to GANs (Jovian)
- #COURSE Fast AI - Practical Deep Learning For Coders
- Deep Learning for Coders with fastai and PyTorch: AI Applications Without a PhD - the book and the course
- https://github.com/fastai/fastbook
- #COURSE CS 152: Neural Networks (Harvey Mudd college)
- #COURSE Deep Learning course (U Paris-Saclay)
- #COURSE Introduction to Machine Learning and Neural Networks (Uniandes)
- #COURSE Deep learning specialization (deeplearning.ai, Coursera, Andrew Ng)
- #COURSE Neural Networks (U Sherbrooke)
- #COURSE The Neural Aesthetic (ITP-NYU)
Code
State of ML frameworks:
- TensorFlow, PyTorch, and JAX: Choosing a deep learning framework
- #CODE Ivy - Convert Machine Learning Code Between Frameworks
- #CODE Huggingface - Build, train and deploy state of the art models powered by the reference open source in ML
- #CODE Openvino - open-source toolkit for optimizing and deploying AI inference
- #CODE Triton - language and compiler for writing highly efficient custom Deep-Learning primitives
- https://openai.com/blog/triton/
- https://www.infoq.com/news/2021/08/openAI-triton/
- Triton uses Python as its base. The developer writes code in Python using Triton's libraries, which is then JIT-compiled to run on the GPU. This allows integration with the rest of the Python ecosystem, currently the biggest destination for developing machine-learning solutions (see the minimal kernel sketch at the end of this section)
- #CODE Oneflow - OneFlow is a performance-centered and open-source deep learning framework
- #CODE Paddle (Baidu) - PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice
- #CODE Chainer - Chainer is a Python-based deep learning framework aiming at flexibility
- #CODE PySyft - PySyft is a Python library for secure and private Deep Learning
- PySyft decouples private data from model training, using Federated Learning, Differential Privacy, and Encrypted Computation (like Multi-Party Computation (MPC) and Homomorphic Encryption (HE)) within the main Deep Learning frameworks like PyTorch and TensorFlow (see the illustrative federated-averaging sketch at the end of this section).
- #PAPER A generic framework for privacy preserving deep learning
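To make the Triton description above concrete, here is a minimal vector-add kernel sketch in the style of the official Triton tutorial (the kernel name, block size and grid are illustrative choices, not recommendations):

```python
# Minimal Triton sketch: a vector-add kernel JIT-compiled from Python to the GPU.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                       # which block this program handles
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                       # guard against out-of-bounds accesses
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)                    # one program instance per 1024-element block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)   # compiled on first call, then launched on the GPU
    return out
```

PySyft's API changes frequently, so rather than quote it, here is an illustrative federated-averaging loop in plain PyTorch that shows the idea of keeping private data on each client and sharing only model weights (the function names are hypothetical; this is not PySyft's actual interface):

```python
# Illustrative federated averaging (not PySyft's API): clients train locally on
# private data and only the resulting weights are averaged on the server.
import copy
import torch
import torch.nn as nn

def local_update(global_model, loader, epochs=1, lr=0.01):
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:                           # private data never leaves the client
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model.state_dict()

def federated_average(global_model, client_loaders):
    states = [local_update(global_model, dl) for dl in client_loaders]
    avg = {k: torch.stack([s[k].float() for s in states]).mean(0) for k in states[0]}
    global_model.load_state_dict(avg)                 # load_state_dict casts back to original dtypes
    return global_model
```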
References
- #PAPER Deep learning in NNs: An overview (Schmidhuber 2015)
- #PAPER Deep learning (LeCun 2015)
- #PAPER Deep Neural Decision Forests (Kontschieder 2016)
- #PAPER On the Origin of Deep Learning (Wang 2017)
- #PAPER Representation Learning on Large and Small Data (Chou 2017)
- #PAPER Deep Learning in Neural Networks: An Overview (Schmidhuber, 2018)
- #PAPER Deep Learning as a Mixed Convex-Combinatorial Optimization Problem (Friesen 2018)
- #PAPER Using Deep Neural Networks for Inverse Problems in Imaging: Beyond Analytical Methods (Lucas, 2018)
- #PAPER Neural Tangent Kernel: Convergence and Generalization in Neural Networks (Jacot 2018)
- #PAPER Neural circuit policies enabling auditable autonomy (Lechner 2020)
- #PAPER Implicitly Defined Layers in Neural Networks (Zhang 2020)
- #PAPER A Mathematical Principle of Deep Learning: Learn the Geodesic Curve in the Wasserstein Space (Gai 2021)
- #PAPER Why is AI hard and Physics simple? (Roberts 2021)
- #PAPER Deep Learning for AI (By Yoshua Bengio, Yann Lecun, Geoffrey Hinton, Turing lecture, 2021)
- #PAPER Self-Tuning for Data-Efficient Deep Learning (Wang 2021)
- #PAPER Controlling Neural Networks with Rule Representations (Seo 2021)
- #PAPER Deep physical neural networks trained with backpropagation (Wright 2022)
- #PAPER Ensemble deep learning: A review (Ganaie 2022)
- #PAPER projUNN: efficient method for training deep networks with unitary matrices (Kiani 2022)
- #PAPER LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification (Girish 2022)
Generalization
- http://www.inference.vc/everything-that-works-works-because-its-bayesian-2/
- #PAPER Understanding deep learning requires re-thinking generalization (Zhang 2016)
- #PAPER A Closer Look at Memorization in Deep Networks (Arpit 2017)
- #PAPER Deep nets don't learn via memorization (Krueger 2017)
- #PAPER Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior (Martin 2017)
- #PAPER Ablation Studies in Artificial Neural Networks (Meyes 2019)
- #PAPER Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning (Allen-Zhu 2020)
- #PAPER The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers (Nakkiran 2021)
- #PAPER Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data (Martin 2021)
- #PAPER Stochastic Training is Not Necessary for Generalization (Geiping 2021)
- #PAPER Underspecification Presents Challenges for Credibility in Modern Machine Learning (D'Amour 2021)
- #PAPER Learning in High Dimension Always Amounts to Extrapolation (Balestriero 2021)
- In order for NNs to succeed at solving a task, they have to operate in the "extrapolation" regime! But not all of them generalise as well as others. So this opens up new questions about the relationship between this specific notion of extrapolation and generalisation more generally.
- #PAPER Incorporating Symmetry into Deep Dynamics Models for Improved Generalization (Wang 2021)
- #PAPER Grokking - Generalization beyond overfitting on small algorithmic datasets (Power 2022)
Regularization
- In general, techniques aimed at reducing overfitting and improving generalization
- Overfit and underfit
- Regularization techniques for training deep neural networks
- https://towardsdatascience.com/regularization-in-deep-learning-l1-l2-and-dropout-377e75acc036
- https://machinelearningmastery.com/how-to-reduce-overfitting-in-deep-learning-with-weight-regularization/
- https://medium.com/intelligentmachines/convolutional-neural-network-and-regularization-techniques-with-tensorflow-and-keras-5a09e6e65dc7
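As a concrete illustration of the weight-regularization and dropout links above, a minimal Keras sketch (the layer sizes, dropout rate and the 1e-4 L2 penalty are arbitrary placeholders, not recommendations):

```python
# Minimal Keras sketch: L2 weight penalty plus dropout as regularizers.
import tensorflow as tf
from tensorflow.keras import layers, regularizers

model = tf.keras.Sequential([
    layers.Dense(128, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),   # penalize large weights
    layers.Dropout(0.5),                                      # randomly zero 50% of units (training only)
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```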
Data augmentation
See AI/Supervised Learning/Data augmentation
Dropout
- https://pgaleone.eu/deep-learning/regularization/2017/01/10/anaysis-of-dropout/
- 12 Main Dropout Methods: Mathematical and Visual Explanation for DNNs, CNNs, and RNNs
- #PAPER Dropout: A Simple Way to Prevent Neural Networks from Overfitting (Srivastava 2014)
- #PAPER Efficient Object Localization Using Convolutional Networks (Tompson 2015)
- Proposed spatial dropout
- #PAPER Analysis on the Dropout Effect in Convolutional Neural Networks (Park 2017)
- #PAPER Effective and Efficient Dropout for Deep Convolutional Neural Networks (Cai 2020)
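Standard dropout zeroes individual activations, while the spatial dropout proposed in Tompson 2015 (above) zeroes entire feature maps; a minimal Keras comparison (the tensor shape is illustrative):

```python
# Dropout masks activations elementwise; SpatialDropout2D drops whole feature maps.
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((4, 32, 32, 16))                         # (batch, height, width, channels)
elementwise = layers.Dropout(0.3)(x, training=True)           # independent mask per activation
per_channel = layers.SpatialDropout2D(0.3)(x, training=True)  # one mask shared across each channel
```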
Stochastic depth
- #PAPER Deep Networks with Stochastic Depth (Huang 2016)
- Stochastic depth is a regularization technique that randomly drops a set of layers during training. During inference, all layers are kept. It is similar to Dropout, but it operates on a block of layers rather than on individual nodes within a layer
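A minimal PyTorch sketch of the idea, wrapping an arbitrary residual block (this is an illustration of the mechanism, not the paper's exact implementation):

```python
# Stochastic depth: during training, the residual branch is skipped with
# probability 1 - survival_prob; at inference it is kept and scaled.
import torch
import torch.nn as nn

class StochasticDepthBlock(nn.Module):
    def __init__(self, block: nn.Module, survival_prob: float = 0.8):
        super().__init__()
        self.block = block
        self.survival_prob = survival_prob

    def forward(self, x):
        if self.training:
            if torch.rand(1).item() < self.survival_prob:
                return x + self.block(x)                      # block survives this pass
            return x                                          # block dropped entirely
        return x + self.survival_prob * self.block(x)         # expected contribution at inference
```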
Normalization
- Normalization techniques also reduce generalization error, providing some regularization
- Normalization Techniques in Deep Neural Networks
- Different Types of Normalization in Tensorflow
- Normalization in Deep Learning
- https://sebastianraschka.com/faq/docs/scale-training-test.html
- Data normalization/standardization can be used as an alternative (before training) to synch batchnorm (multi-gpu training)
- Spectral normalization
- #PAPER #REVIEW Normalization Techniques in Training DNNs: Methodology, Analysis and Application (Huang 2020)
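For orientation, a minimal PyTorch sketch of how the common normalization layers differ in the axes over which statistics are computed (the shapes are illustrative):

```python
# The main normalization layers differ in which axes the mean/variance are taken over.
import torch
import torch.nn as nn

x = torch.randn(8, 32, 28, 28)         # (batch, channels, height, width)
bn = nn.BatchNorm2d(32)(x)             # per-channel stats over (batch, H, W)
ln = nn.LayerNorm([32, 28, 28])(x)     # per-sample stats over (C, H, W)
inorm = nn.InstanceNorm2d(32)(x)       # per-sample, per-channel stats over (H, W)
gn = nn.GroupNorm(8, 32)(x)            # per-sample stats over groups of channels
```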
BatchNorm
- #PAPER Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (Ioffe 2015)
- #TALK https://www.youtube.com/watch?v=ZOabsYbmBRM&feature=youtu.be
- http://stackoverflow.com/questions/34716454/where-do-i-call-the-batchnormalization-function-in-keras
- Slower convergence w/o BN, BN can be applied on top of standardization
- Synch BatchNorm appears in TF 2.2, for multi-gpu training
- #PAPER Rethinking the Usage of Batch Normalization and Dropout (Chen 2019)
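On the Stack Overflow question above about where to call BatchNormalization in Keras: a common (but not mandatory) convention is to place it between the linear transform and the activation, as in this minimal sketch:

```python
# Common Keras convention: Dense/Conv -> BatchNormalization -> activation.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Dense(128, use_bias=False),   # bias is redundant when followed by BatchNorm
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.Dense(10, activation="softmax"),
])
```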
Activations
- Fundamentals of Deep Learning - Activation Functions and When to Use Them?
- https://ml-cheatsheet.readthedocs.io/en/latest/activation_functions.html
- What are the advantages of ReLU over sigmoid function in deep neural networks?
- Two additional major benefits of ReLUs are sparsity and a reduced likelihood of vanishing gradient
- ReLU and Softmax Activation Functions
- The softmax function squashes the outputs of each unit to be between 0 and 1, just like a sigmoid function. But it also divides each output such that the total sum of the outputs is equal to 1
- The output of the softmax function is equivalent to a categorical probability distribution, it tells you the probability that any of the classes are true
- Sigmoid and SoftMax Functions in 5 minutes
- Sigmoid is used for binary classification methods where we only have 2 classes, while SoftMax applies to multiclass problems. In fact, the SoftMax function is an extension of the Sigmoid function
- #PAPER ImageNet Classification with Deep Convolutional Neural Networks (Krizhevsky 2012)
- #PAPER Universal activation function for machine learning (Yuen 2021)
- #PAPER #REVIEW Activation Functions in Deep Learning: A Comprehensive Survey and Benchmark (Dubey 2022)
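A small numeric sketch of the sigmoid/softmax points above (the logits are made up):

```python
# Softmax turns a logit vector into a categorical distribution that sums to 1;
# sigmoid maps each logit independently into (0, 1).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())                         # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
print(sigmoid(logits))                              # elementwise, does not sum to 1
print(softmax(logits), softmax(logits).sum())       # probabilities summing to 1.0
```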
Loss functions
- Cross entropy
- Perceptual loss, image reconstruction
- https://arxiv.org/pdf/1511.06409.pdf (Learning to Generate Images With Perceptual Similarity Metrics)
- #PAPER Loss Functions for Image Restoration with Neural Networks (Zhao 2018)
- https://medium.com/@sanari85/rediscovery-of-ssim-index-in-image-reconstruction-ssim-as-a-loss-function-a1ffef7d2be
- Three different metrics are commonly used to compare methods: DSSIM, MSE, and MAE. Structural dissimilarity (DSSIM) is an image distance metric that corresponds better to human perception than MAE or RMSE. Mean Squared Error (MSE) measures the average of the squared differences between estimated and actual values. Mean Absolute Error (MAE) is the average absolute difference between corresponding pixels. https://arxiv.org/abs/2001.05372 (see the sketch at the end of this list)
- Deep learning image enhancement insights on loss function engineering
- Mean squared logarithmic error
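A minimal TensorFlow sketch of the MSE/MAE/SSIM comparison discussed above (the images are random placeholders; using 1 - SSIM is one common way to turn the similarity index into a loss):

```python
# MSE and MAE are pixelwise errors; SSIM is a perceptual similarity index,
# so 1 - SSIM can serve as a reconstruction loss.
import tensorflow as tf

y_true = tf.random.uniform((4, 64, 64, 1))          # illustrative image batch in [0, 1]
y_pred = tf.clip_by_value(y_true + tf.random.normal(y_true.shape, stddev=0.05), 0.0, 1.0)

mse = tf.reduce_mean(tf.square(y_true - y_pred))
mae = tf.reduce_mean(tf.abs(y_true - y_pred))
ssim_loss = 1.0 - tf.reduce_mean(tf.image.ssim(y_true, y_pred, max_val=1.0))
```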
Optimizers and backpropagation
- How to use Learning Curves to Diagnose Machine Learning Model Performance
- https://www.quora.com/Intuitively-how-does-mini-batch-size-affect-the-performance-of-stochastic-gradient-descent
- Keras optimizers
- Adam
- An overview of gradient descent optimization algorithms (2016)
- https://hackernoon.com/some-state-of-the-art-optimizers-in-neural-networks-a3c2ba5a5643
- https://www.jeremyjordan.me/neural-networks-training/
- http://colah.github.io/posts/2015-08-Backprop/
- Back-propagation - Math Simplified
- https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
- https://venturebeat.com/2020/12/16/at-neurips-2020-researchers-proposed-faster-more-efficient-alternatives-to-backpropagation/amp/
- #PAPER On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima (Keskar 2017)
- #PAPER Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour (Goyal 2018)
- #PAPER Decoupled Weight Decay Regularization (Loshchilov 2018)
- #PAPER Deep Double Descent: Where Bigger Models and More Data Hurt (Nakkiran 2019)
- #PAPER Reconciling modern machine learning practice and the bias-variance trade-off (Belkin 2019)
- #PAPER Descending through a Crowded Valley -- Benchmarking Deep Learning Optimizers (Schmidt 2020)
- #PAPER Early Stopping in Deep Networks: Double Descent and How to Eliminate it (Heckel 2020)
- contrary to model-wise double descent, epoch-wise double descent is not a phenomenon tied to over-parameterization
- both under- and overparameterized models can have epoch-wise double descent
- #CODE https://github.com/MLI-lab/early_stopping_double_descent
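To ground the backpropagation and SGD links above, a minimal numpy sketch of the forward and backward passes for a two-layer network trained with plain gradient descent (all sizes, the data and the learning rate are arbitrary):

```python
# Two-layer network with a manual backward pass and vanilla gradient descent.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))                 # toy inputs
y = rng.normal(size=(64, 1))                  # toy regression targets

W1, b1 = 0.1 * rng.normal(size=(10, 32)), np.zeros(32)
W2, b2 = 0.1 * rng.normal(size=(32, 1)), np.zeros(1)
lr = 0.01

for step in range(100):
    # forward pass
    h = np.maximum(0.0, X @ W1 + b1)          # ReLU hidden layer
    y_hat = h @ W2 + b2
    loss = np.mean((y_hat - y) ** 2)          # MSE loss

    # backward pass (chain rule)
    d_yhat = 2.0 * (y_hat - y) / len(X)
    dW2, db2 = h.T @ d_yhat, d_yhat.sum(axis=0)
    d_h = (d_yhat @ W2.T) * (h > 0)           # ReLU gradient
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)

    # gradient descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```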
Efficiency and performance
- #PAPER Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better (Menghani 2021)
Distributed DL
See AI/Data Engineering/Distributed DL
Attention
- See AI/Deep learning/Transformers#For NLP and /AI/Deep learning/CNNs#Visual/Channel attention and Saliency
- #COURSE Attention and Memory in Deep Learning (DeepMind x UCL | Deep Learning Lectures | 8/12)
Explainability methods for Neural Networks
See AI/Deep learning/Explainability methods for NNs
Applications
DL for multi-dimensional data
- See AI/Computer Vision/Video segmentation and prediction, AI/Deep learning/Encoder-decoder networks, AI/Deep learning/Transformers and AI/Generative AI/GenAI
- #PAPER Demystifying Deep Learning in Predictive Spatio-Temporal Analytics: An Information-Theoretic Framework (Tan 2020)
DL for tabular data
- An Introduction to Deep Learning for Tabular Data
- Applying Deep Learning on Tabular Data Using TensorFlow 2.0
- A short chronology of deep learning for tabular data (Sebastian Raschka)
- #CODE Pytorch tabular
- #PAPER Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data (Popov 2019)
- #PAPER TabNet: Attentive Interpretable Tabular Learning (Arik 2020)
- #PAPER Converting tabular data into images for deep learning with convolutional neural networks (Zhu 2021)
- #PAPER Tabular Data: Deep Learning is Not All You Need (Shwartz-Ziv 2021)
- #PAPER XBNet: An Extremely Boosted Neural Network (Sarkar 2021)
- #CODE XBNet - Boosted neural network for tabular data
- https://analyticsindiamag.com/guide-to-xbnet-an-extremely-boosted-neural-network/
- #PAPER Revisiting Deep Learning Models for Tabular Data (Gorishniy 2021)
- #PAPER TABBIE: Pretrained Representations of Tabular Data (Iida 2021)
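A minimal Keras sketch of the pattern described in the links above: embed categorical columns, concatenate them with numeric features and feed an MLP (the inputs, vocabulary size and dimensions are hypothetical):

```python
# Hypothetical tabular setup: one integer-encoded categorical column plus 8 numeric features.
import tensorflow as tf
from tensorflow.keras import layers

cat_in = tf.keras.Input(shape=(1,), dtype="int32", name="category")
num_in = tf.keras.Input(shape=(8,), name="numeric")

emb = layers.Flatten()(layers.Embedding(input_dim=100, output_dim=8)(cat_in))  # learned category embedding
x = layers.Concatenate()([emb, num_in])
x = layers.Dense(64, activation="relu")(x)
out = layers.Dense(1, activation="sigmoid")(x)

model = tf.keras.Model([cat_in, num_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy")
```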
DL for scientific discovery
See AI/AI for scientific discovery
Multimodal learning
See AI/Deep learning/Multimodal learning
DL for NLP, time series and sequence modelling
See AI/Time Series analysis, AI/Forecasting and "Deep learning approaches" in AI/NLP
Architectures and model families
- The neural network zoo
- Deep Learning Tips and Tricks cheatsheet
- A Visual and Interactive Guide to the Basics of NNs
- A Visual And Interactive Look at Basic Neural Network Math
- #CODE Model Zoo
- #CODE Deep Learning Models (Raschka)
Geometric DL
See AI/Deep learning/Geometric deep learning
MLPs
Deep belief network
See AI/Deep learning/Deep belief network
Autoencoders
See AI/Deep learning/Autoencoders
CNNs
RNNs
CapsNets
GANs
Diffusion models
See AI/Deep learning/Diffusion models
GNNs
Residual and dense neural networks
See AI/Deep learning/Residual and dense neural networks
Neural ODEs
See AI/Deep learning/Neural ODEs
Fourier Neural Operators
See AI/Deep learning/Fourier Neural Operators
Transformers
See AI/Deep learning/Transformers
GFlowNets
See AI/Deep learning/GFlowNets
Neural Cellular Automata
See AI/Deep learning/Neural Cellular Automata
Neural processes
See AI/Deep learning/Neural processes
Bayesian/probabilistic DL
See AI/Deep learning/Probabilistic deep learning
Implicit Neural Representations
See AI/Deep learning/Implicit Neural Representations