Big Data
The Big Data Group examines algorithmic and statistical aspects of large-scale data analysis. Many of the most interesting and challenging problems in machine learning and data analysis lie at two intersections: between statistics and computer science, and among theory, implementations, and applications. As a result, our research is highly interdisciplinary.
Our research has examined:
- Social network modeling
- Random sampling and random projection methods
- Communication-avoiding algorithms
- Locally-biased graph algorithms
- Numerically-intensive machine learning
- Randomized linear algebra
- Convex and non-convex optimization
- Second-order optimization
- Terabyte-scale implementations
- Graph algorithm theory
- Geometric network analysis
- Parallel and distributed computing
In addition to the above, we are interested in a range of applications. Recent work has focused on problems in astronomy, climate science, mass spectrometry imaging, genetics, multimedia analysis, and social network analysis.
The Big Data group is led by Michael Mahoney.

This project focuses on intelligent, ML-based data reduction and processing as close as possible to the data source. Per-sensor compression and efficient aggregation of information, done while preserving scientific fidelity, can greatly reduce data rates further downstream and change how experiments are designed and operated. The research team concentrates on powerful, specialized compute hardware at the extreme edge, such as FPGAs, ASICs, and systems-on-chip, which typically form the initial processing layers of many experiments.