Question
What is TF-IDF weighting and why is it important in LSI?
Asked by: USER1727
Answer (56)
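TF-IDF (Term Frequency-Inverse Document Frequency) is a weighting scheme that scores how important a term is to one document relative to the rest of the corpus. Term Frequency (TF) measures how often a term appears in a given document. Inverse Document Frequency (IDF) measures how rare the term is across the corpus: the fewer documents contain it, the higher its IDF. A term's TF-IDF weight is the product of the two, so it is high only when the term is frequent in the document and uncommon elsewhere. This matters in LSI (Latent Semantic Indexing) because LSI factorizes the term-document matrix with a singular value decomposition, and with raw counts the most frequent words (like 'the', 'a', 'is') would dominate that decomposition even though they contribute little to semantic meaning. Applying TF-IDF first downweights those common words and emphasizes terms that are more specific and informative, so the latent dimensions reflect topical content rather than general word frequency.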
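To make the weighting concrete, here is a minimal sketch in plain Python. It assumes one common variant, TF = count / document length and IDF = log(N / document frequency); libraries often use smoothed or otherwise adjusted formulas, so exact numbers will differ. The function name `tf_idf` and the toy corpus are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / len(doc), idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Toy corpus (hypothetical), already lowercased and split on whitespace.
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the stock market fell".split(),
]
weights = tf_idf(docs)
print(weights[0])
```

In this toy corpus 'the' appears in every document, so its IDF is log(3/3) = 0 and its weight vanishes, while 'mat', which appears in only one document, gets the largest weight among the single-occurrence terms of the first document. In an LSI pipeline these weights, rather than raw counts, would fill the term-document matrix before the decomposition is applied.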