# Information Retrieval

In this section, I will explore some fundamental topics in Information Retrieval (IR) and related subjects.

I will use the following notation throughout this section:

- $D = \{d_1,\ldots,d_m\}$ is a collection of (text) documents;
- $V = \{t_1,\ldots,t_n\}$ is the dictionary of unique terms extracted from $D$ (a.k.a. vocabulary or lexicon);
- $q$ is a query used to ask the retrieval system for those documents in $D$ that are relevant to $q$. (Note that here we are not making any assumption about what it really means for a document to be relevant to a query);
- $D_q\subseteq D$ is the collection of results (i.e., documents) returned by the IR system, which the system supposes to be relevant to the query $q$. This may be either an unranked set (i.e., there is no order relation between any pair of retrieved documents) or a ranked list (i.e., retrieved documents are scored and sorted according to their relevance to the query).
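To make the notation concrete, here is a minimal Python sketch of $D$, $V$, $q$ and $D_q$ on a toy collection. Everything in it is an assumption made for illustration: the three documents, the `tokenize` helper, and above all the notion of relevance, which I crudely take to be "shares at least one term with the query" (scored by the number of distinct query terms matched). The notation itself commits to none of these choices.

```python
import re

# Toy collection D (hypothetical documents, for illustration only).
D = {
    "d1": "the cat sat on the mat",
    "d2": "the dog chased the cat",
    "d3": "dogs and cats make good pets",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


# Vocabulary V: the sorted list of unique terms extracted from D.
V = sorted({t for text in D.values() for t in tokenize(text)})


def retrieve_unranked(q, docs):
    """Return D_q as a set: documents sharing at least one term with q.

    Term overlap is an assumed stand-in for relevance, not a definition.
    """
    q_terms = set(tokenize(q))
    return {d for d, text in docs.items() if q_terms & set(tokenize(text))}


def retrieve_ranked(q, docs):
    """Return D_q as a list, sorted by number of distinct query terms matched."""
    q_terms = set(tokenize(q))
    scored = [(len(q_terms & set(tokenize(text))), d) for d, text in docs.items()]
    # Highest score first, ties broken by document id; drop non-matching docs.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [d for score, d in ranked if score > 0]


if __name__ == "__main__":
    q = "cat dog"
    print(len(V), "terms in V")
    print("unranked D_q:", retrieve_unranked(q, D))
    print("ranked D_q:  ", retrieve_ranked(q, D))
```

For the query `"cat dog"`, both functions return the same two documents, `d1` and `d2`; only the ranked version says that `d2` (matching both terms) comes before `d1` (matching one). Note that `d3` is not retrieved, because without stemming `dogs` and `cats` are different terms from `dog` and `cat`.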