I work on randomized algorithms for scalable machine learning. By replacing expensive exact algorithms with lightweight approximate methods, we can substantially reduce the resources needed to run a program. Machine learning is an ideal application area because learning algorithms can adapt to the noise introduced by the approximation.

My current work develops efficient approximate algorithms for the low-level building blocks of machine learning, such as kernel sums and near-neighbor search. I am particularly interested in simple methods that have theoretical guarantees and also perform well in web-scale production environments.
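For a concrete flavor of these building blocks, below is a minimal, illustrative Python version of a RACE-style counting sketch for streaming kernel density estimation, in the spirit of the WWW '20 paper listed further down. The class name, the signed-random-projection hash family, and all parameters here are my own illustrative assumptions, not the paper's exact construction.

    import numpy as np

    class RACESketch:
        """Illustrative RACE-style counting sketch (assumed construction,
        not the exact algorithm from the paper). Each of `rows` independent
        LSH functions maps a vector to one of 2**bits buckets. Adding a
        point increments one counter per row; a query averages the counters
        it collides with. Because the collision probability of signed random
        projections decays with angular distance, that average approximates
        an (angular) kernel density at the query."""

        def __init__(self, dim, rows=50, bits=4, seed=0):
            rng = np.random.default_rng(seed)
            self.rows, self.bits = rows, bits
            # One set of `bits` random hyperplanes per row (SRP hashing).
            self.planes = rng.standard_normal((rows, bits, dim))
            self.counts = np.zeros((rows, 2 ** bits))
            self.n = 0  # stream length, for normalization

        def _hash(self, x):
            # Sign pattern of the projections, packed into a bucket index.
            signs = ((self.planes @ x) > 0).astype(int)   # (rows, bits)
            return signs @ (1 << np.arange(self.bits))    # (rows,) buckets

        def add(self, x):
            # Streaming update: one counter increment per row, fixed memory.
            self.counts[np.arange(self.rows), self._hash(x)] += 1
            self.n += 1

        def query(self, q):
            # Mean collision count, normalized by the stream length.
            hits = self.counts[np.arange(self.rows), self._hash(q)]
            return hits.mean() / self.n

    # Toy usage: sketch a stream of 1,000 Gaussian vectors, then query.
    rng = np.random.default_rng(1)
    sketch = RACESketch(dim=16)
    for x in rng.standard_normal((1000, 16)):
        sketch.add(x)
    print(sketch.query(rng.standard_normal(16)))

The memory footprint is rows × 2^bits counters regardless of how many points stream by, which is what makes this family of sketches attractive in streaming settings.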

Selected Publications [Google Scholar]

Sub-linear RACE Sketches for Approximate Kernel Density Estimation on Streaming Data
Benjamin Coleman, Anshumali Shrivastava. The Web Conference (WWW), 2020.
Sub-linear Memory Sketches for Near Neighbor Search on Streaming Data
Benjamin Coleman, Richard G. Baraniuk, Anshumali Shrivastava. International Conference on Machine Learning (ICML), 2020.
Diversified RACE Sampling on Data Streams Applied to Metagenomic Sequence Analysis
Benjamin Coleman*, Benito Geordie*, Li Chou, R. A. Leo Elworth, Todd J. Treangen, Anshumali Shrivastava. Preprint, under review.
*Equal contribution
A One Pass Private Sketch for Most Machine Learning Tasks
Benjamin Coleman, Anshumali Shrivastava. Preprint, under review.
STORM: Foundations of End-to-End Empirical Risk Minimization on the Edge
Benjamin Coleman*, Gaurav Gupta*, John Chen, Anshumali Shrivastava. Preprint, under review.
*Equal contribution
Revisiting Consistent Hashing with Bounded Loads
John Chen, Benjamin Coleman, Anshumali Shrivastava. Preprint, under review.