In this section, I explore some fundamental topics in Information Retrieval (IR) and related subjects.
I will use the following notation throughout this section:
– D is a collection of (text) documents;
– V is the dictionary of unique terms extracted from D (a.k.a. vocabulary or lexicon);
– q is a query used to ask the retrieval system for those documents in D that are relevant to q. (Note that here we are making no assumption about what it really means for a document to be relevant to a query);
– R is the collection of results (i.e., documents) returned by the IR system, which the system supposes are relevant to the query q. This may be either an unranked set (i.e., with no order relation between any pair of retrieved documents) or a ranked list (i.e., retrieved documents are scored and sorted according to their relevance to the query).
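
To make the four objects above concrete, here is a minimal Python sketch of a document collection, its vocabulary, a query, and the two possible shapes of the result (unranked set versus ranked list). Everything in it is an illustrative assumption rather than part of the notation: the toy documents, the whitespace tokenizer, and in particular the `overlap` score, which stands in for a notion of relevance that the text deliberately leaves undefined.

```python
from typing import List, Set, Tuple

# Hypothetical toy collection of (text) documents.
documents: List[str] = [
    "the cat sat on the mat",
    "dogs chase cats",
    "the stock market fell",
]


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a deliberately naive assumption)."""
    return text.lower().split()


# Vocabulary: the set of unique terms extracted from the collection.
vocabulary: Set[str] = {term for doc in documents for term in tokenize(doc)}


def overlap(query: str, doc: str) -> int:
    """Placeholder relevance score: number of distinct query terms found in doc."""
    return len(set(tokenize(query)) & set(tokenize(doc)))


def retrieve_unranked(query: str) -> Set[int]:
    """Unranked result: a set of document indices, with no order among them."""
    return {i for i, doc in enumerate(documents) if overlap(query, doc) > 0}


def retrieve_ranked(query: str) -> List[Tuple[int, int]]:
    """Ranked result: (document index, score) pairs sorted by decreasing score."""
    hits = [(i, overlap(query, doc)) for i, doc in enumerate(documents)]
    hits = [(i, score) for i, score in hits if score > 0]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    query = "the cat"
    print(sorted(vocabulary))
    print(retrieve_unranked(query))
    print(retrieve_ranked(query))
```

The two retrieval functions return the same documents for a given query; the only difference is whether the result carries scores and an ordering, which mirrors the unranked-set versus ranked-list distinction in the last item of the list.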