Embeddings and text representations

Embeddings are dense, low-dimensional vector representations of data, such as words, images, or entities, that capture their semantic or contextual meaning in a continuous space. They are used to convert high-dimensional or categorical data into a format suitable for computational models, enabling efficient similarity comparisons and feature extraction.
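The similarity comparisons mentioned above are usually done with cosine similarity between embedding vectors. A minimal sketch, using toy 3-dimensional vectors with made-up values (real embeddings are learned and typically have hundreds of dimensions):

```python
from math import sqrt

def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy embeddings (illustrative values, not output of a trained model).
king = [0.8, 0.65, 0.1]
queen = [0.75, 0.7, 0.15]
apple = [0.1, 0.2, 0.9]

print(cosine_similarity(king, queen))  # high: related meanings
print(cosine_similarity(king, apple))  # low: unrelated meanings
```

Because cosine similarity ignores vector length, it compares direction only, which is why it is the default metric for embedding search.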

Resources

Bag of words
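A bag-of-words representation drops word order and represents a text purely as token counts over a fixed vocabulary. A minimal sketch (the vocabulary here is a hypothetical example):

```python
from collections import Counter

def bag_of_words(text: str, vocabulary: list[str]) -> list[int]:
    """Represent text as raw token counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

vocab = ["the", "cat", "dog", "sat", "mat"]
vec = bag_of_words("The cat sat on the mat", vocab)
print(vec)  # [2, 1, 0, 1, 1]
```

Note the resulting vector is sparse and high-dimensional (one slot per vocabulary word), which is exactly the limitation dense embeddings address.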

TF–IDF
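TF–IDF reweights raw counts: term frequency (how often a term appears in a document) times inverse document frequency (how rare the term is across the corpus), so ubiquitous words like "the" score near zero. A minimal sketch on a toy corpus, assuming the query term appears in at least one document:

```python
from math import log

def tf_idf(term: str, doc: list[str], corpus: list[list[str]]) -> float:
    """TF-IDF weight of a term in one tokenized document of a corpus."""
    tf = doc.count(term) / len(doc)                 # term frequency
    df = sum(1 for d in corpus if term in d)        # document frequency
    idf = log(len(corpus) / df)                     # inverse document frequency
    return tf * idf

docs = [
    "the cat sat".split(),
    "the dog ran".split(),
    "the cat ran".split(),
]
print(tf_idf("cat", docs[0], docs))  # positive: "cat" is in only 2 of 3 docs
print(tf_idf("the", docs[0], docs))  # 0.0: "the" appears in every doc
```

Variants differ in smoothing and normalization; this is the textbook formulation, not any particular library's.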

Word embeddings
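With trained word embeddings (e.g. word2vec or GloVe), semantically related words end up near each other, so nearest-neighbor lookup recovers related terms. A sketch with a hand-made toy table standing in for a trained embedding matrix:

```python
from math import sqrt

# Toy word-embedding table (illustrative values, not from a trained model).
embeddings = {
    "king":  [0.8, 0.65, 0.1],
    "queen": [0.75, 0.7, 0.15],
    "apple": [0.1, 0.2, 0.9],
    "pear":  [0.15, 0.25, 0.85],
}

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def nearest(word: str) -> str:
    """Most similar word in the table, excluding the query itself."""
    return max((w for w in embeddings if w != word),
               key=lambda w: cosine(embeddings[word], embeddings[w]))

print(nearest("king"))   # queen
print(nearest("apple"))  # pear
```

At scale this exhaustive scan is replaced by approximate nearest-neighbor indexes, but the comparison itself is the same.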

Transformer and LLM-based embeddings
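Unlike static word embeddings, transformer models produce a contextual vector per token; a single text embedding is then commonly obtained by pooling those token vectors, often by averaging (mean pooling). The sketch below shows only that pooling step, with made-up numbers standing in for real encoder output:

```python
def mean_pool(token_vectors: list[list[float]]) -> list[float]:
    """Average per-token vectors into one fixed-size text embedding."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]

# Stand-in for the contextual token vectors a transformer encoder would
# emit for a three-token input (illustrative values, not real model output).
token_vectors = [
    [0.2, 0.1, 0.4],
    [0.6, 0.8, 0.0],
    [0.1, 0.3, 0.5],
]
print(mean_pool(token_vectors))  # roughly [0.3, 0.4, 0.3]
```

In practice the token vectors come from a pretrained model (some models instead use the first token's vector as the text embedding); the pooled result is what gets stored and compared with cosine similarity.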

Code

Courses

References