NERSC Data and Analytics Internship Projects

Data Dashboard Enhancement
Science/CS domain : Advanced Scientific Computing Research
Project description : Supercomputers use distributed file systems that face unique challenges in serving data to tens of thousands of compute nodes. Even an operation as simple as calculating the amount of disk space used is a significant challenge. This project develops the NERSC Data Dashboard, a user-facing tool that shows users how they are using the NERSC file systems. The project will include elements of Big Data analysis, visualization and web design.
Desired Skills/Background : Python, databases, PHP, Spark (optional)
DAS mentor : Lisa Gerhardt

Containing HPC with Shifter
Science/CS domain : Advanced Scientific Computing Research
Project description : NERSC has pioneered the use of containers (such as Docker) for HPC with Shifter. This project will level up the existing Shifter implementation by adding support for new image formats and taking advantage of new kernel features.
Desired Skills/Background : Python, C, systems programming
DAS mentor : Shane Canon

Deep Learning the Universe
Science/CS domain : Cosmology/Machine Learning
Project description : Deep Learning tools offer a new way to understand the structure of our Universe. In this project we will combine Deep Learning with simulated maps of the Universe to determine the underlying theoretical model that produced those maps.
Desired Skills/Background : Python, Deep Learning tools (e.g., TensorFlow)
DAS mentor : Debbie Bard

New Methods for Finding New Physics at the LHC
Science/CS domain : Particle Physics/Statistics and Machine Learning
Project description : Use advanced deep learning or probabilistic programming techniques to discover anomalies in LHC data that could be signs of new physics.
Desired Skills/Background : Statistics/ML, HEP, Python
DAS mentor : Wahid Bhimji

Topic Modeling for Scientific Literature
Science/CS domain : NLP/Climate Science
Project description : We are interested in analyzing climate science publications to learn semantic concepts, then applying clustering (to determine co-authorship relationships) and regression (to predict funding sources). The intern will work with NERSC/DAS staff and climate scientists.
Desired Skills/Background : Python, Deep Learning tools (e.g., TensorFlow)
DAS mentor : Prabhat

Scaling Deep Learning Frameworks
Science/CS domain : Computer Science/Machine Learning
Project description : We are interested in understanding the single-node performance and multi-node scaling of various Deep Learning frameworks on Cori, our 9000-node KNL system. The project will involve analyzing a range of 2D and 3D CNNs, graph CNNs and LSTMs to identify scaling bottlenecks. The intern will work with NERSC/DAS and Intel staff to optimize performance.
Desired Skills/Background : Code optimization, Python, experience with multiple Deep Learning tools (e.g., TensorFlow, Caffe, Theano, Torch)
DAS mentor : Prabhat


Please e-mail resumes to Deborah Baird


Berkeley Lab is a thriving national laboratory renowned for performing cutting-edge scientific
research in a wide variety of fields. The National Energy Research Scientific Computing Center
(NERSC), a division of Berkeley Lab, is the primary scientific computing facility for the Office of
Science in the U.S. Department of Energy and is one of the largest facilities in the world
dedicated to providing computational resources and expertise for basic scientific research.
NERSC is a world leader in accelerating scientific discovery through computation. In the Data
and Analytics Services (DAS) Group at NERSC we partner with science leaders at Berkeley Lab and at
research institutions around the country to develop new approaches to data-intensive scientific
computing problems. Our portfolio includes projects related to machine learning, databases,
scientific workflows and data visualization, in fields ranging from genomics to geophysics and
from climate science to cosmology.