Long Short-Term Memory networks (LSTMs)

One of the most innovative works in the NLP space is LSTMs, which can remember information from the way past and also selectively forget stuff that is not required. There are several architectures of LSTM units. A common architecture is composed of a cell (the memory part of the LSTM unit) and three "regulators", usually called gates, of the flow of information inside the LSTM unit: an input gate, an output gate and a forget gate. Some variations of the LSTM unit do not have one or more of these gates or maybe have other gates (for instance, GRUs do not have an output gate). The Forget gate decides what is relevant to keep from prior steps. The input gate decides what information is relevant to add from the current step. The output gate determines what the next hidden state should be

Resources

References