Name: Min Priest
Pronouns: they/them
Biography:
Min’s research interests lie at the intersection of high performance computing (HPC) and data science. Their ongoing technical work includes graph embedding and clustering; streaming and sketching algorithms; and machine learning, statistical modeling, and uncertainty quantification, all focused on supporting the analysis of huge data problems on distributed memory HPC systems. Min’s work is applied to several mission areas of interest to the Department of Energy, including space sensing and surveillance; social, computer, and internet network modeling; active learning across several experimental domains; climate modeling; and epidemiology and biosecurity. Min maintains several HPC software capabilities, including big data Gaussian processes (https://github.com/LLNL/MuyGPyS), a data sketching and dimensionality reduction toolkit (https://github.com/LLNL/krowkee), distributed memory K nearest neighbors algorithms (https://github.com/LLNL/saltatlas), and an HPC communication runtime for irregular communication patterns (https://github.com/LLNL/ygm).
Institution/Lab: Lawrence Livermore National Laboratory
Website: https://github.com/bwpriest
SRP Collaboration Topic/Title: HPC-scale data science for applications
Field or research area: high performance computing, scientific computing, statistics, machine learning
Please select all the topical areas that apply to your project:
Data Science (i.e., data analytics, data management & storage systems, visualization); High-Performance Computing; Machine Learning and AI
Brief Abstract:
Modern Department of Energy science missions, from cosmology to biosecurity to climate to computer security to fusion, are collecting increasingly enormous datasets whose exploitation requires novel approaches and algorithms. For example, the Vera C. Rubin Observatory (https://www.lsst.org/) will produce 20 terabytes of cosmology image data per night for 10 years, which is well beyond the throughput capabilities of conventional astronomy codes. Furthermore, nucleotide, protein, and antibody sequence catalogues continue to grow in both observation size and count, and so require novel solutions for even basic machine learning tasks such as clustering. Our group works on solutions to these and other problems that are scalable, fast, and both theoretically and statistically motivated. This summer project will involve implementing data science codes for deployment on Lawrence Livermore’s state-of-the-art computers in service of improving the nation’s ability to handle one of these critical problems. We expect the project to involve collaboration with several subject matter experts depending on the specifics of the task, which could include physicists, applied mathematicians, computer scientists, and statisticians.
Desired relevant skills, background, or interests:
Python and/or C++, high performance computing, statistics and/or graph algorithms, applied mathematics, interest in applications such as space, climate, or network analysis.
Other comments:
Remote internships strongly preferred.
Do any special requirements apply? other
Other, specify: Remote internships strongly preferred
Keywords:
data science; graph algorithms; high performance computing; Gaussian processes; clustering; uncertainty quantification; computational astronomy
Lightning Talk Title: Data Science at HPC Scale