Name: John Wu
Pronouns: he/him/his
Biography:
Dr. Wu works at the intersection of Big Data and mathematics. One theme of his work is finding the right data for a user task. On this front, he has developed efficient indexing techniques and turned these algorithms into a software package named FastBit. The FastBit indexing software has won an R&D 100 Award and is counted among 40 major works funded by the US Department of Energy (DOE), Office of Science, as a part of its 40th Anniversary celebration in 2018. The second theme of John’s work is how to use data storage systems effectively for Big Data applications. Take compression as an example. Conventional storage systems treat user data as bytes, while much of the sensor data and instrumental measurements are numerical values. Treating these numerical values as bytes makes them nearly impossible to compress; by treating these values as numbers, he was able to reduce the storage requirement by over 100-fold.
Institution/Lab: Lawrence Berkeley National Laboratory
Website: http://crd.lbl.gov/wu/
SRP Collaboration Topic/Title: Effective Data Management for Large Scientific Workflows
Field or research area: Data Management
Please select all the topical areas that apply to your project:
Data Science (i.e., data analytics, data management & storage systems, visualization); Machine Learning and AI
Brief Abstract:
The Scientific Data Management (SDM) research group is broadly interested in enabling and accelerating scientific discoveries through effective data management and analysis tools and libraries. The SDM group’s research and development efforts focus on (1) scalable storage and I/O strategies, (2) autonomous data management infrastructure, (3) data life-cycle management, and (4) workflow optimization and automation. Our group actively works with data generation and analysis workflows to reduce the complexity of large scientific analyses, including complex real-time workflows that could drive the next generation of scientific user facilities. Members of the SDM group work closely with application scientists throughout the DOE community and with academic and industry researchers around the world. The group has a strong history of publications and contributes to many widely used software systems. We have made significant contributions to well-known I/O libraries, including HDF5 and ADIOS, and are the primary developers of FastBit, FasTensor, and other software.
Desired relevant skills, background, or interests:
Proficiency in Python programming – Familiarity with the organization of data systems on HPC platforms – Willingness to dig through terabytes of data to find something interesting
Other comments:
Do any special requirements apply? In-Person Only
Other, specify:
Keywords:
Data management; data analysis; machine learning
Lightning Talk Title: Scientific Data Management at LBNL