Paolo Ferragina, University of Pisa, Italy

The dark side of big data: efficient algorithms and data structures

Many talks deal with the power of BigData, but few of them highlight the difficulties and challenges underlying the design and implementation of applications built on BigData. In this short talk I will discuss a few of these applications and describe Locality Sensitive Hashing (LSH), one of the most powerful techniques for processing big datasets efficiently and effectively in similarity computations. I'll cover the algorithmic basics of this technique (often known only to algorithmic experts), compare it with other common approaches (known to BigData experts), and show how it can be applied easily, yet effectively, to some common BigData processing steps.
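The abstract does not commit to a particular LSH family. As one common illustration (our own choice, not necessarily the variant presented in the talk), the following is a minimal Python sketch of MinHash-based LSH for Jaccard similarity over word shingles. All names (`shingles`, `token_id`, `minhash_signature`, `lsh_candidates`) and the parameters `bands=20`, `rows=5` are hypothetical. The idea is that similar sets receive equal signature bands with high probability. Only documents that share a bucket are compared exactly, which avoids the quadratic all-pairs scan.

```python
import hashlib
import random
from collections import defaultdict
from itertools import combinations

# Mersenne prime: h(x) = (a*x + c) mod PRIME is a bijection on 0 <= x < PRIME (a != 0)
PRIME = (1 << 61) - 1


def shingles(text, k=2):
    """Set of lowercase word k-grams; assumes the text has at least k words."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def token_id(token):
    """Deterministic 56-bit integer for a token, so it stays below PRIME."""
    return int.from_bytes(hashlib.sha1(token.encode("utf-8")).digest()[:7], "big")


def make_hash_funcs(n, seed=0):
    """n random linear hash functions, each stored as its (a, c) pair."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(n)]


def minhash_signature(items, funcs):
    """For each hash function, the minimum hash value over the set's tokens."""
    xs = [token_id(t) for t in items]
    return [min((a * x + c) % PRIME for x in xs) for a, c in funcs]


def lsh_candidates(docs, bands=20, rows=5, seed=0):
    """Banding: two docs become candidates if all `rows` values of some band match."""
    funcs = make_hash_funcs(bands * rows, seed)
    buckets = defaultdict(list)
    for doc_id, items in docs.items():
        sig = minhash_signature(items, funcs)
        for b in range(bands):
            # The band index is part of the key, so bands never mix.
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].append(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def jaccard(s, t):
    """Exact similarity, used to verify the (few) candidate pairs."""
    return len(s & t) / len(s | t)


docs = {
    "a": shingles("the quick brown fox jumps over the lazy dog"),
    "b": shingles("The quick brown fox jumps over the lazy dog"),
    "c": shingles("locality sensitive hashing maps similar items to equal buckets"),
}
for x, y in sorted(lsh_candidates(docs)):
    print(x, y, jaccard(docs[x], docs[y]))  # prints: a b 1.0
```

With `b` bands of `r` rows, a pair with Jaccard similarity `s` becomes a candidate with probability `1 - (1 - s**r)**b`. Tuning `b` and `r` therefore trades missed near-duplicates against spurious candidates. In the example, "a" and "b" have identical shingle sets and always collide. "c" shares no shingle with them, so (barring a 56-bit hash collision) none of its minima can coincide with theirs, because each hash function is injective on the token ids.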
