Bag-of-words representation of text: measure of document similarity

Returning to the bag-of-words example, we can use the notion of angle to measure how close two documents are to each other.

Given two documents and a predefined list of words appearing in the documents (the dictionary), we can compute the vectors x and y whose entries are the frequencies of the dictionary words in each document. The angle between x and y is a widely used measure of closeness (similarity) between the documents: the smaller the angle, the more similar the documents.
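As an illustration, the angle θ between two nonzero vectors x and y satisfies cos θ = (xᵀy) / (‖x‖₂ ‖y‖₂). The sketch below applies this to two toy documents. The dictionary, the sample texts, and the helper names frequency_vector and angle_between are hypothetical choices made for this example. It also assumes a very simple tokenization (lowercasing and splitting on whitespace) and uses raw word counts as frequencies.

```python
import math
from collections import Counter

def frequency_vector(text, dictionary):
    """Count how often each dictionary word occurs in the text.

    Assumption: words are separated by whitespace and case is ignored.
    """
    counts = Counter(text.lower().split())
    return [counts[word] for word in dictionary]

def angle_between(x, y):
    """Return the angle (in radians) between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    return math.acos(cos_theta)

# Hypothetical dictionary and documents, chosen only for illustration.
dictionary = ["optimization", "model", "vector", "angle"]
doc_a = "optimization model optimization vector"
doc_b = "vector angle vector model"

x = frequency_vector(doc_a, dictionary)  # [2, 1, 1, 0]
y = frequency_vector(doc_b, dictionary)  # [0, 1, 2, 1]
theta = angle_between(x, y)

print(f"angle = {math.degrees(theta):.1f} degrees")
```

Here xᵀy = 3 and both vectors have norm √6, so cos θ = 1/2 and the angle is about 60 degrees. Because the angle does not change when a vector is multiplied by a positive scalar, using raw counts or relative frequencies (counts divided by document length) gives the same result.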

License

Hyper-Textbook: Optimization Models and Applications Copyright © by L. El Ghaoui. All Rights Reserved.