Similarity Search in Big Data
The complexity of next-generation retrieval systems originates from the requirement to organize massive and ever growing volumes of heterogeneous data and meta-data, together with the need to provide distributed management prevalently based on similarity matching. The problem starts with data acquisition of weakly structured or completely unstructured data, such as images and video, which necessarily need innovative techniques for information extraction and classification to increase their findability. In principle, we consider the object findability and the actual search process as two fundamental and synergic aspects of the retrieval. Both of them pose effectiveness and efficiency challenges which need innovative theories and technologies, and must be studied together to converge to qualitatively new retrieval tools of the future. The dissertation topic is foundational in nature as it addresses the theoretical limits of similarity retrieval in context of the Big Data problem. Fundamental to the thesis is the development of scalable solutions.