Amal Gueroudji

Name: Amal Gueroudji
Pronouns: she/her/hers

Biography:
Amal Gueroudji is a Postdoctoral Appointee at Argonne National Laboratory. She received her Ph.D. from the University of Grenoble Alpes, with Marie Skłodowska-Curie Actions (MSCA) funding at the French Atomic Energy and Alternative Energies Commission (CEA). Her research interests include programming models for distributed computing, in situ processing, and data analytics workflows, with a specific focus on the convergence of HPC, AI, and Big Data.

Institution/Lab: Argonne National Laboratory
Website:

SRP Collaboration Topic/Title: Performance Analytics of Dask Workloads

Field or research area: Computer science

Please select all the topical areas that apply to your project:
Computer Science (i.e., architectures, compilers/languages, networks, workflow/edge, experiment automation, containers, neuromorphic computing, programming models, operating systems, sustainable software); Data Science (i.e., data analytics, data management & storage systems, visualization); High-Performance Computing

Brief Abstract:
Dask is a task-based distributed Python framework whose distinctive feature is that it offers distributed versions of well-known libraries such as NumPy, Pandas, and Scikit-learn. It has been used for high-performance data analytics within high-performance computing (HPC) workflows. Although Dask has a strong performance record, its I/O performance, which is critical for HPC workloads, has not yet been analyzed. In this project, the selected student will use Darshan to extract and analyze the I/O operations of Dask workloads. The primary objectives are to understand Dask's I/O performance and then to propose ways to improve it.

Desired relevant skills, background, or interests:
Knowledge of distributed programming and data analytics is desirable, along with the curiosity to explore new topics and fields that could lead to future research directions.

Other comments:

Do any special requirements apply? International OK
Other, specify:

Keywords:
Distributed task-based programming models; Dask; MPI; Code coupling; Scientific workflows; Darshan; I/O performance

Lightning Talk Title: Divergent Programming Models for Converging HPC/ML/Big Data Sciences