Big Data

The Big Data Group examines algorithmic and statistical aspects of large-scale data analysis. Many of the most interesting and challenging problems in machine learning and data analysis lie at the intersection of statistics and computer science on the one hand and at the intersection of theory, implementations, and applications on the other hand. By its nature, the research that we perform is highly interdisciplinary.

Our research has examined:

Social network modeling
Random sampling and random projection methods
Communication-avoiding algorithms
Locally-biased graph algorithms
Numerically-intensive machine learning
Randomized linear algebra
Convex and non-convex optimization
Second-order optimization
Terabyte-scale implementations
Graph algorithm theory
Geometric network analysis
Parallel and distributed computing

In addition to the above, we are interested in a range of applications, and recent work has focused on problems in astronomy, climate science, mass spectrometry imaging, genetics, multimedia analysis, and social network analysis.

The Big Data group is led by Michael Mahoney.

Big Data Research Projects

Real-Time Data Reduction Codesign at the Extreme Edge for Science

This project focuses on intelligent ML-based data reduction and processing as close as possible to the data source. Per-sensor compression and efficient aggregation of information, while preserving scientific fidelity, can have a huge impact on data rates further downstream and on the way that experiments are designed and operated. The research team is concentrating on powerful, specialized compute hardware at the extreme edge (such as FPGAs, ASICs, and systems-on-chip), which typically forms the initial processing layer of many experiments.

Scalable linear algebra and neural network theory

While deep learning methods have without doubt transformed certain applications of machine learning (ML), such as computer vision (CV) and natural language processing (NLP), their promised impact on many other areas has yet to be seen. The reason for this is the flip side of why they have been successful where they have.

Scalable Second-order Methods for Training, Designing, and Deploying Machine Learning Models

Scalable algorithms that can handle the large-scale nature of modern datasets are an integral part of many applications of machine learning (ML). Among these, efficient optimization algorithms, as the bread and butter of many ML methods, hold a special place. Optimization methods that use only first derivative information, i.e., first-order methods, are the most common tools used in training ML models. This is despite the fact that many of these methods come with inherent disadvantages such as slow convergence, poor communication, and the need for laborious hyper-parameter tuning.
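The slow convergence of first-order methods can be illustrated on a toy problem. The sketch below is not the group's method. It assumes a hypothetical two-dimensional quadratic with curvatures `A` and `B` that differ by a factor of 100, and compares plain gradient descent with a Newton step that uses the exact (diagonal) Hessian. The function names and tolerances are chosen only for illustration.

```python
import math

# Hypothetical test problem: f(x) = 0.5 * (A * x[0]**2 + B * x[1]**2).
# The ratio B / A = 100 makes the problem ill-conditioned.
A, B = 1.0, 100.0


def grad(x):
    """Gradient of the quadratic at point x."""
    return (A * x[0], B * x[1])


def grad_norm(x):
    g = grad(x)
    return math.hypot(g[0], g[1])


def gradient_descent(x, tol=1e-6, max_iter=100_000):
    """First-order method: fixed step 1/B, the largest curvature."""
    step = 1.0 / B
    for k in range(max_iter):
        if grad_norm(x) < tol:
            return x, k
        g = grad(x)
        x = (x[0] - step * g[0], x[1] - step * g[1])
    return x, max_iter


def newton(x, tol=1e-6, max_iter=100):
    """Second-order method: the Hessian is diag(A, B), so the Newton
    step divides each gradient component by its own curvature."""
    for k in range(max_iter):
        if grad_norm(x) < tol:
            return x, k
        g = grad(x)
        x = (x[0] - g[0] / A, x[1] - g[1] / B)
    return x, max_iter


x0 = (1.0, 1.0)
_, gd_iters = gradient_descent(x0)
_, newton_iters = newton(x0)
print("gradient descent iterations:", gd_iters)
print("Newton iterations:", newton_iters)
```

On this quadratic, Newton's method reaches the minimizer in a single step, while gradient descent needs on the order of a thousand steps because its step size is limited by the steepest direction. Real ML objectives are neither quadratic nor this small, so the example only shows the mechanism and says nothing about cost per iteration at scale.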


