Machine Learning (ML)
The use and development of computer systems that are able to learn and adapt without following explicit instructions, by using algorithms and statistical models to analyse and draw inferences from patterns in data
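As a rough illustration of the definition above, the following sketch fits a straight line to a handful of points by ordinary least squares. The data and the names `fit_line` and `predict` are assumptions made for this example, not taken from any of the resources listed here. The point is only that the rule relating inputs to outputs is estimated from data rather than written by hand.

```python
# Minimal sketch: "learning" a rule from examples instead of hard-coding it.
# The data and function names are illustrative assumptions.

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply the fitted line to a new input."""
    slope, intercept = model
    return slope * x + intercept

# Toy observations generated by y = 2x + 1; the program is never told that rule
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

model = fit_line(xs, ys)
prediction = predict(model, 10.0)  # extrapolate to an unseen input
```

With noisy real-world data the fitted slope and intercept would only approximate the underlying relationship, which is where the statistical side of the definition comes in.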
Resources
- https://github.com/josephmisiti/awesome-machine-learning
- The Illustrated Machine Learning website
- Rules of ML (Google)
- Jason's Machine Learning 101 (Google)
- Machine Learning Glossary (Google)
- ML Resources (MIT student)
- Machine Learning & Deep Learning Tutorials
- A visual introduction to machine learning
- ML Algorithms: Strengths and Weaknesses
- A friendly introduction to linear algebra for ML (ML Tech Talks)
- Best practices for ML engineering (Google)
- Training Machine Learning Models More Efficiently with Dataset Distillation
- Codelabs - AI & ML
Cheatsheets and notes
- https://github.com/afshinea/stanford-cs-229-machine-learning/blob/master/en/super-cheatsheet-machine-learning.pdf
- https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-machine-learning-tips-and-tricks
- ML-AI guide
Naive/homemade implementations
- https://github.com/trekhleb/homemade-machine-learning
- https://github.com/anhquan0412/basic_model_scratch
- https://github.com/rushter/MLAlgorithms
- https://github.com/ahmedbesbes/Neural-Network-from-scratch
- https://github.com/eriklindernoren/ML-From-Scratch
Open datasets (for ML, DL and DS)
See AI/Data Engineering/Open ML data
Books
- #BOOK An Introduction to Statistical Learning (James 2013, SPRINGER)
- #BOOK The elements of statistical learning (Hastie 2015, SPRINGER)
- #BOOK Mathematics for ML (Deisenroth 2020, CAMBRIDGE)
- #BOOK Introduction to Machine Learning with Python - A Guide for Data Scientists (Müller 2016, O'REILLY)
- https://github.com/amueller/introduction_to_ml_with_python
- #BOOK Machine Learning for Dummies (Hurwitz 2018, WILEY-IBM)
- #BOOK Python Machine Learning (Raschka 2019, PACKT)
- #BOOK Mastering Machine Learning with scikit-learn (Hackeling 2014, PACKT)
- #BOOK Designing Machine Learning Systems with Python (Julian 2016, PACKT)
- #BOOK Evaluating Machine Learning Models (Zheng 2015, O'REILLY)
- #BOOK Introduction to Machine Learning Interviews Book
Courses
- #COURSE Machine Learning (CS229, Stanford)
- #COURSE Machine Learning (Coursera-Stanford)
- #COURSE Machine Learning Crash Course with TensorFlow APIs (Google)
- #COURSE Data Mining and Machine Learning (STAT 365/665, Yale)
- #COURSE Applied machine learning (U Columbia)
- #COURSE L'apprentissage face à la malédiction de la grande dimension (Learning in the face of the curse of dimensionality; Collège de France)
- #COURSE The Machine Learning Summer School, MLSS Tübingen 2020 (virtual)
Code
- See AI/Data Engineering/ML Ops#Code and AI/Data Engineering/Cloud platforms
- #CODE Benchmarks of ML libraries
- #CODE Ludwig - declarative machine learning framework
- #CODE Scikit-learn
- http://scikit-learn.org/stable/
- Contrib packages
- #TALK PyData tutorial by Sebastian Raschka
- #CODE Lightning
- Large-scale linear classification, regression (AI/Supervised Learning/Regression) and ranking (AI/Learning to rank) in Python
- #CODE MAPIE
- #PAPER MAPIE: an open-source library for distribution-free uncertainty quantification
- A scikit-learn-compatible module for estimating prediction intervals for single-output regression or multi-class classification settings
- #CODE scikit-learn-intelex
- Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application
- https://intel.github.io/scikit-learn-intelex/
- #CODE mlinsights
- #CODE PyCaret
- https://pycaret.org/
- PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows
- PyCaret is essentially a Python wrapper around several machine learning libraries and frameworks such as scikit-learn, XGBoost, LightGBM, CatBoost, spaCy, Optuna, Hyperopt, Ray, and many more
- PyCaret >= 2.2 provides the option to use GPU for select model training and hyperparameter tuning
- #CODE Hypertools - Python toolbox for visualizing and manipulating high-dimensional data
- #CODE PySAL - Python Spatial Analysis Library Meta-Package
- #CODE MLxtend - Library of extension and helper modules for Python's data analysis and machine learning libraries
- #CODE H2O
- #CODE Dlib (C++ with python interface)
- #CODE Shogun
- #CODE Vowpal Wabbit
- ML system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning
- http://hunch.net/~vw/
- https://github.com/JohnLangford/vowpal_wabbit/wiki
- #CODE DMTK (Microsoft)
- #CODE RAPIDS (https://github.com/rapidsai, https://rapids.ai/) - GPU data science
- #CODE ArrayFire - High performance library for parallel computing with an easy-to-use API
- It enables users to write scientific computing code that is portable across CUDA, OpenCL and CPU devices. This project provides Python bindings for the ArrayFire library.
- https://arrayfire.com/
- #CODE ThunderSVM - A Fast SVM Library on GPUs and CPUs
- #CODE PyGAM - Generalized Additive Models in Python
- #CODE Facets
- Visualizations for understanding and analyzing machine learning datasets: Facets Overview and Facets Dive. They are implemented as Polymer web components, backed by TypeScript code, and can be easily embedded into Jupyter notebooks or webpages
- https://pair-code.github.io/facets/
- #CODE PyCM
- PyCM is a multi-class confusion matrix library written in Python. It accepts either input data vectors or a direct matrix, and serves as a post-classification model evaluation tool that supports most class and overall statistics parameters
- http://www.pycm.ir/
- #CODE Pycircular - Python module for circular data analysis
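
Several entries above (PyCM in particular) revolve around the multi-class confusion matrix and the class-level and overall statistics derived from it. The sketch below shows the idea in plain Python. It does not use or reproduce PyCM's API: the function names and the toy labels are assumptions made for illustration.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Build a nested dict: matrix[actual_label][predicted_label] -> count."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))  # missing pairs count as 0
    matrix = {a: {p: counts[(a, p)] for p in labels} for a in labels}
    return labels, matrix

def per_class_stats(labels, matrix):
    """One-vs-rest precision and recall for every class."""
    stats = {}
    for c in labels:
        tp = matrix[c][c]
        fn = sum(matrix[c][p] for p in labels) - tp  # rest of row c
        fp = sum(matrix[a][c] for a in labels) - tp  # rest of column c
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        stats[c] = {"precision": precision, "recall": recall}
    return stats

def overall_accuracy(labels, matrix):
    """Fraction of samples on the diagonal."""
    total = sum(matrix[a][p] for a in labels for p in labels)
    correct = sum(matrix[c][c] for c in labels)
    return correct / total

# Hypothetical labels for a three-class problem
actual    = ["cat", "cat", "dog", "dog", "bird", "bird"]
predicted = ["cat", "dog", "dog", "dog", "bird", "cat"]

labels, matrix = confusion_matrix(actual, predicted)
stats = per_class_stats(labels, matrix)
accuracy = overall_accuracy(labels, matrix)
```

Dedicated libraries report many more statistics than this; the sketch only covers the counts that most of them start from.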
References
- See AI/AI for scientific discovery
- #PAPER Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence (Raschka 2020)
- #PAPER How to avoid machine learning pitfalls: a guide for academic researchers (Lones 2021)
- #PAPER Pen and Paper Exercises in Machine Learning (Gutmann 2022)
- #PAPER How to avoid machine learning pitfalls: a guide for academic researchers (Lones 2023)
- #PAPER Questionable practices in machine learning (2024)
- The paper categorizes the 43 questionable research practices (QRPs) in machine learning into several broad areas, such as:
- Data Handling: Issues like cherry-picking data, inappropriate data splits, and using test data in training.
- Model Evaluation: Inadequate baselines, selective reporting of results, and misuse of metrics.
- Experimental Design: Running multiple experiments and only reporting successful ones.
- Reproducibility: Lack of transparency in code and data sharing.
- Publication Practices: Hype-driven narratives and insufficient detail in methods sections.
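
One of the data-handling issues listed above, using test data in training, can creep in through preprocessing as well as through the model itself. The sketch below is one possible way to keep the two apart, assuming a simple feature-standardization step: the split is made first, and the scaling statistics are computed on the training portion only. All names and the toy data are assumptions for illustration; this is not code from the paper.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and hold out a fraction for testing."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_standardizer(values):
    """Estimate mean and standard deviation from the given values only."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5 or 1.0  # guard against zero variance
    return mean, std

def transform(values, params):
    """Apply previously fitted statistics without re-estimating them."""
    mean, std = params
    return [(v - mean) / std for v in values]

# Hypothetical one-dimensional feature
data = [float(v) for v in range(1, 21)]

train, test = train_test_split(data)
params = fit_standardizer(train)    # statistics come from training data only
train_z = transform(train, params)
test_z = transform(test, params)    # test data is transformed, never fitted on
```

Calling `fit_standardizer(data)` before splitting would let information from the held-out rows influence training, which is the kind of leak the list above warns about.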
Subtopics
Feature selection
See AI/Supervised Learning/Feature selection
Feature learning
Anomaly and Outlier Detection
See AI/Anomaly and Outlier Detection
Time Series analysis and forecasting
See AI/Time Series/Time Series analysis and AI/Time Series/Forecasting
AutoML
See AI/AutoML
Deep Learning
Reinforcement learning
Unsupervised learning
See AI/Unsupervised Learning/Unsupervised learning
Supervised learning
See AI/Supervised Learning/Supervised learning
Weakly-supervised learning
See AI/Weakly-supervised learning. It includes these topics: AI/Semi-supervised learning, AI/Active learning and AI/Transfer learning
One-shot and few-shot learning
Self-supervised learning
See AI/Self-supervised learning
Learning to rank and ordinal regression
Multi-task learning
Generative modelling
Explainable AI
See AI/XAI
Federated learning
Quantum ML
See AI/QML