Recent News

Education & Training

  • Ph.D. 2016

    Ph.D. in Information Engineering

    University of Padova

    Adivisor: Professor Gianfranco Bilardi

    Thesis title: "On Space Constrained Computations"

  • MS2012

    MS in Computer Engineering

    University of Padova

    Adivisor: Professor Gianfranco Bilardi

    Thesis title: "On the Space Complexity of Computational Directed Acyclic Graphs"

    Final grade: 110/110 cum laude

  • BS2009

    BS in Computer Engineering

    University of Padova

    Adivisor: Professor Giorgio Maria Di Nunzio

    Thesis title: "Study on the classification of documents retrieved by information retrieval systems using linear regression analysis" (in Italian)

    Final grade: 110/110 cum laude

Honors, Awards and Grants

  • 2016
    20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD16)
    Best Student Paper Award, Research Track for
    "TRIÈST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fixed Memory Size"
    We present TRIÈST, a suite of one-pass streaming algorithms to compute unbiased, low-variance, high-quality approximations of the global and local (i.e., incident to each vertex) number of triangles in a fully-dynamic graph represented as an adversarial stream of edge insertions and deletions. Our algorithms use reservoir sampling and its variants to exploit the user-specified memory space at all times. This is in contrast with previous approaches which use hard-to-choose parameters (e.g., a fixed sampling probability) and offer no guarantees on the amount of memory they will use. We show a full analysis of the variance of the estimations and novel concentration bounds for these quantities. Our experimental results on very large graphs show that TRI\`EST outperforms state-of-the-art approaches in accuracy and exhibits a small update time.
    This is joint work with Alessandro Epasto, Matteo Riondato and Eli Upfal.
  • 2016
    20th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD16)
    Student Travel Award
  • 2014
    Full Scolarship for Ph.D. progam in Computer Science, Brown University
  • 2012
    Full Scolarship for Ph.D. progam in Information Engineering, University of Padova

Filter by type:

Sort by year:

The I/O complexity of Toom-Cook Integer Multiplication

Gianfranco Bilardi, Lorenzo De Stefani
Conference Paper ACM-SIAM Symposium on Discrete Algorithms - SODA 19 |January 2019

Abstract

Nearly matching upper and lower bounds are derived for the I/O complexity of the Toom-Cook-k (or Toom-k) algorithm computing the products of two integers, each represented with n digits in a given base s, in a two-level storage hierarchy with M words of fast memory, with different digits stored in different memory words. An IO_{A,k}(n,M) = (n/M)^{log_k 2k-􀀀1}M lower bound on the I/O complexity is established, by a technique that combines an analysis of the size of the dominators of suitable sub-CDAGs of the Toom-Cook-k CDAG (Computational Directed Acyclic Graph) and the analysis of a function, which we call Partial Grigoriev's flow", which captures the amount of information to be transferred between specific subsets of input and output variables, by any algorithm that solves the integer multiplication problem. The lower bound applies even if the recomputation of partial results is allowed. A careful implementation of the Toom-Cook-k algorithm, assuming that M = 􀀀^k3 log_s k, is also developed and analyzed, leading to an I/O complexity upper bound that is within a factor O(k^2) of the corresponding lower bound, hence asymptotically optimal for fixed k. Both the lower and the upper bounds are actually derived in the more general case where the value of k is allowed to vary with the level of recursion, although the quantitative expressions become more involved. Extensions of the lower bound for a parallel model with P processors are also discussed.

The I/O complexity of Strassen’s matrix multiplication with recomputation

Gianfranco Bilardi, Lorenzo De Stefani
Conference Paper Algorithms and Data Structures Symposium - WADS 17 |August 2017

Abstract

A tight Ω((n/√M)^log_2^7 M) lower bound is derived on the I/O complexity of Strassen's algorithm to multiply two n×n matrices, in a two-level storage hierarchy with M words of fast memory. A proof technique is introduced, which exploits the Grigoriev's flow of the matrix multiplication function as well as some combinatorial properties of the Strassen computational directed acyclic graph (CDAG). Applications to parallel computation are also developed. The result generalizes a similar bound previously obtained under the constraint of no-recomputation, that is, that intermediate results cannot be computed more than once. For this restricted case, another lower bound technique is presented, which leads to a simpler analysis of the I/O complexity of Strassen's algorithm and can be readily extended to other "Strassen-like" algorithms.

Tiered sampling: An efficient method for approximate counting sparse motifs in massive graph streams

Lorenzo De Stefani,Erisa Terolli and Eli Upfal,
Conference Paper IEEE International Conference on Big Data - BigData 17 |December 2017
We introduce Tiered Sampling, a novel technique for approximate counting sparse motifs in massive graphs whose edges are observed in a stream. Our technique requires only a single pass on the data and uses a memory of fixed size M, which can be magnitudes smaller than the number of edges. Our methods addresses the challenging task of counting sparse motifs - sub-graph patterns that have low probability to appear in a sample of M edges in the graph, which is the maximum amount of data available to the algorithms in each step. To obtain an unbiased and low variance estimate of the count we partition the available memory to tiers (layers) of reservoir samples. While the base layer is a standard reservoir sample of edges, other layers are reservoir samples of sub-structures of the desired motif. By storing more frequent sub-structures of the motif, we increase the probability of detecting an occurrence of the sparse motif we are counting, thus decreasing the variance and error of the estimate. We demonstrate the advantage of our method in the specific applications of counting sparse 4 and 5-cliques in massive graphs. We present a complete analytical analysis and extensive experimental results using both synthetic and real-world data. Our results demonstrate the advantage of our method in obtaining high-quality approximations for the number of 4 and 5-cliques for large graphs using a very limited amount of memory, significantly outperforming the single edge sample approach for counting sparse motifs in large scale graphs.

Controlling False Discoveries During Interactive Data Exploration

Lorenzo De Stefani,Emanuel Zgraggen, Zheguang Zhao, Carsten Binning, Tim Kraska, Eli Upfal,
Conference Paper Proceedings of the 38th ACM SIGMOD International Conference on Management of Data - SIGMOD/PODS 17 |May 2017
Recent tools for interactive data exploration significantly increase the chance that users make false discoveries. The crux is that these tools implicitly allow the user to test a large body of different hypotheses with just a few clicks thus incurring in the issue commonly known in statistics as the multiple hypothesis testing error. In this paper, we propose solutions to integrate multiple hypothesis testing control into interactive data exploration tools. A key insight is that existing methods for controlling the false discovery rate (such as FDR) are not directly applicable for interactive data exploration. We therefore discuss a set of new control procedures that are better suited and integrated them in our system called Aware. By means of extensive experiments using both real-world and synthetic data sets we demonstrate how Aware can help experts and novice users alike to efficiently control false discoveries.

Safe Visual Data Exploration

Zheguang Zhao, Emanuel Zgraggen, Lorenzo De Stefani, Carsten Binning, Eli Upfal and Tim Kraska
Conference Paper Proceedings of the 38th ACM SIGMOD International Conference on Management of Data - SIGMOD/PODS 17 |May 2017
Exploring data via visualization has become a popular way to understand complex data. Features or patterns in visualization can be perceived as relevant insights by users, even though they may actually arise from random noise. Moreover, interactive data exploration and visualization recommendation tools can examine a large number of observations, and therefore result in further increasing chance of spurious insights. Thus without proper statistical control, the risk of false discovery renders visual data exploration unsafe and makes users susceptible to questionable inference.To address these problems, we present QUDE, a visual data exploration system that interacts with users to formulate hypotheses based on visualizations and provides interactive control of false discoveries.

TRIÈST: Counting Local and Global Triangles in Fully-dynamic Streams with Fixed Memory Size

Lorenzo De Stefani, Alessandro Epasto, Matteo Riondato, Eli Upfal
Journal Paper ACM Transactions on Knowledge Discovery from Data - TKDD | August 2017

Abstract

We present TRIÈST, a suite of one-pass streaming algorithms to compute unbiased, low-variance, highquality approximations of the global and local (i.e., incident to each vertex) number of triangles in a fully-dynamic graph represented as an adversarial stream of edge insertions and deletions. Our algorithms use reservoir sampling and its variants to exploit the user-specified memory space at all times. This is in contrast with previous approaches, which require hard-to-choose parameters (e.g., a fixed sampling probability) and offer no guarantees on the amount of memory they use. We analyze the variance of the estimations and show novel concentration bounds for these quantities. Our experimental results on very large graphs demonstrate that trièst outperforms state-of-the-art approaches in accuracy and exhibits a small update time.

Towards Sustainable Insights or why polygamy is bad for you

Carsten Binnig, Lorenzo De Stefani, Tim Kraska, Eli Upfal, Emanuel Zgraggen, Zheguang Zhao
Conference Paper Proceedings of the 7th biennial Conference on Innovative Data Systems Research - CIDR. |January 2017
Have you ever been in a sauna? If yes, according to our recent survey conducted on Amazon Mechanical Turk, people who go to saunas are more likely to know that Mike Stonebraker is not a character in “The Simpsons”. While this result clearly makes no sense, recently proposed tools to automatically suggest visualizations, correlations, or perform visual data exploration, significantly increase the chance that a user makes a false discovery like this one. In this paper, we first show how current tools mislead users to consider random fluctuations as significant discoveries. We then describe our vision and early results for QUDE, a new system for automatically controlling the various risk factors during the data exploration process.

TRIÈST: Counting Local and Global Triangles in Fully-dynamic Streams with Fixed Memory Size

Lorenzo De Stefani, Alessandro Epasto, Matteo Riondato and Eli Upfal
Conference Paper Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD 16 | July 2016

Abstract

Best student paper award, research track.
We present TRIÈST, a suite of one-pass streaming algorithms to compute unbiased, low-variance, highquality approximations of the global and local (i.e., incident to each vertex) number of triangles in a fully-dynamic graph represented as an adversarial stream of edge insertions and deletions. Our algorithms use reservoir sampling and its variants to exploit the user-specified memory space at all times. This is in contrast with previous approaches, which require hard-to-choose parameters (e.g., a fixed sampling probability) and offer no guarantees on the amount of memory they use. We analyze the variance of the estimations and show novel concentration bounds for these quantities. Our experimental results on very large graphs demonstrate that TRIÈST outperforms state-of-the-art approaches in accuracy and exhibits a small update time.

Reconstructing Hidden Permutations Using the Average-Precision (AP) Correlation Statistic

Lorenzo De Stefani, Alessandro Epasto and Eli Upfal
Conference Paper Proceedings of the 30th AAAI Conference on Artificial Intelligence - AAAI 16 | February 2016

Abstract

We study the problem of learning probabilistic models for permutations, where the order between highly ranked items in the observed permutations is more reliable (i.e., consistent in different rankings) than the order between lower ranked items, a typical phenomena observed in many applications such as web search results and product ranking. We introduce and study a variant of the Mallows model where the distribution is a function of the widely used Average-Precision (AP) Correlation statistic, instead of the standard Kendall's tau distance. We present a generative model for constructing samples from this distribution and prove useful properties of that distribution. Using these properties we develop an efficient algorithm that provably computes an asymptotically unbiased estimate of the center permutation, and a faster algorithm that learns with high probability the hidden central permutation for a wide range of the parameters of the model. We complement our theoretical analysis with extensive experiments showing that unsupervised methods based on our model can precisely identify ground-truth clusters of rankings in real-world data. In particular, when compared to the Kendall's tau based methods, our methods are less affected by noise in low-rank items.

Exploiting non-constant safe memory in resilient algorithms and data structures

Lorenzo De Stefani and Francesco Silvestri
Journal Paper Theoretical Computer Science Volume 583 Issue C - TCS | June 2015

Abstract

We extend the Faulty RAM model by Finocchi and Italiano (2008) by adding a safe memory of arbitrary size S, and we then derive tradeoffs between the performance of resilient algorithmic techniques and the size of the safe memory. Let δ and α denote, respectively, the maximum amount of faults which can happen during the execution of an algorithm and the actual number of occurred faults, with α≤δ. We propose a resilient algorithm for sorting n entries which requires O(n log n + α(δ/S+log S)) time and uses Θ(S) safe memory words. Our algorithm outperforms previous resilient sorting algorithms which do not exploit the available safe memory and require O(n log n + αδ) time. Finally, we exploit our sorting algorithm for deriving a resilient priority queue. Our implementation uses Θ(S) safe memory words and Θ(n) faulty memory words for storing n keys, and requires O(log n+δ/S) amortized time for each insert and deletemin operation. Our resilient priority queue improves the O(log n+δ) amortized time required by the state of the art.

A Visualization Tool of Probabilistic Models for Information Access Components

Lorenzo De Stefani, Giorgio Maria Di Nunzio and Giorgio Vezzaro
Conference Paper Proceedings of the 13th European Conference on Research and Advanced Technology for Digital Libraries - ECDL 09 | 2009

Abstract

Since massive collections of textual documents become more and more available in digital format, the organization and classification of these documents in Digital Library Management System (DLMS) becomes an important issue. Information access components of a DLMS, such as automatic categorization and retrieval components of digital objects, allow users to interact with the system in order to browse, explore, and retrieve resources from collections of objects. The demonstration presents a two-dimensional visualization tool of Naïve Bayes (NB) probabilistic models for Automated Text Categorization (ATC) and Information Retrieval (IR) useful to explore raw data and interpret results.

Current Teaching

Teaching History

  • 2017 2016

    CS1550: Probabilistic Methods in Computer Science

    In 2016 and 2017, I served as Grad TA for CS450, which is offered to undergraudate and graduate students at Brown University. The main instructor for the class is Professor Eli Upfal.

  • 2014 2013

    Parallel Computing

    In 2013 and 2014, I served as Teaching Assistant for Parallel Computing, which is offered to graduate students at the Department of Information Engineering at the University of Padova. The main instructor for the class is Professor Gianfranco Bilardi

www.000webhost.com