I work on randomized algorithms for scalable machine learning. By replacing expensive exact algorithms with lightweight approximate methods, we can substantially reduce the resources needed to run a program. Machine learning is an ideal application area because learning algorithms can adapt to the noise introduced by the approximation.

My current work focuses on efficient approximate algorithms for the low-level building blocks of machine learning, such as kernel sums and near-neighbor search, and on methods for fast training and inference. I am particularly interested in simple methods that have theoretical guarantees and also work well in web-scale production environments.

Selected Publications

One-Pass Diversified Sampling with Application to Terabyte-Scale Genomic Sequence Streams
Benjamin Coleman*, Benito Geordie*, Li Chou, R. A. Leo Elworth, Todd J. Treangen, and Anshumali Shrivastava 2022. International Conference on Machine Learning. (ICML22)
*Equal contribution, random order
Practical Near Neighbor Search via Group Testing
Joshua Engels*, Benjamin Coleman*, Anshumali Shrivastava 2021. Neural Information Processing Systems. (NeurIPS21) [Spotlight Talk - Top 3%]
*Equal contribution, random order
A One-Pass Distributed and Private Sketch for Kernel Sums with Applications to Machine Learning at Scale
Benjamin Coleman, Anshumali Shrivastava 2021. ACM Conference on Computer and Communications Security. (CCS21)
Sub-linear RACE Sketches for Approximate Kernel Density Estimation on Streaming Data
Benjamin Coleman, Anshumali Shrivastava 2020. The Web Conference. (WWW20)
Sub-linear Memory Sketches for Near Neighbor Search on Streaming Data
Benjamin Coleman, Richard G. Baraniuk, Anshumali Shrivastava 2020. International Conference on Machine Learning. (ICML20)
Revisiting Consistent Hashing with Bounded Loads
John Chen, Benjamin Coleman, Anshumali Shrivastava 2021. Association for the Advancement of Artificial Intelligence. (AAAI21)
Fast Processing and Querying of 170TB of Genomics Data via a Repeated and Merged Bloom Filter (RAMBO)
Gaurav Gupta, Minghao Yan, Benjamin Coleman, Bryce Kille, R. A. Leo Elworth, Tharun Medini, Todd J. Treangen, Anshumali Shrivastava 2021. ACM Special Interest Group on Management of Data. (SIGMOD21)