Guided Affinity Groups for BE@MDS22 - Sustainable Horizons Institute

Guided Affinity Groups (GAGs) are designed to help students get more out of SIAM MDS conference sessions. Led by volunteer members of the MDS community, the learning groups explore conference topics from an entry-level perspective: they meet before a conference session, attend the session together, and then meet again afterwards. BE attendees will meet with their affinity group leads virtually before the conference and then meet with the leads daily during the conference. At the BE Wrap Up Session, attendees will give a 5-10 minute presentation on what they discovered and learned from the experience. All conference attendees are invited to attend the morning affinity group stand-ups and help expand the educational experience of our BE attendees. Students who wish to attend the GAGs must pre-register by sending an email to info@shinstitute.org.

Guided Affinity Group Standups

First thing every morning at SIAM MDS22, the affinity groups will gather to discuss three questions:

What did we learn yesterday?
What are we planning on learning today?
What do we need to do today to get the final presentation complete?

Guided Affinity Group Presentation

Each group will present for a maximum of 10 minutes during the BE Wrap Up Session. Presentation best practice suggests using no more than 5 slides. Students will be required to create and deliver this presentation. In the slides, we'd like each group to address:

What was their affinity group?
What were the pedagogical goals of the team?
Who was involved? (Leader, team, others?)
What did they learn?
What were the most effective ways to learn?
Which particular talks or researchers did they find helpful?
What’s next?

BE Student Presenting Research at Poster Session

Guided Affinity Groups

Inverse problems and applications
Network science: connection, computation, and complex systems
Going deep with AI and machine learning for science and engineering
Implementing AI in healthcare: A natural language processing approach
When probabilistic graphical models meet deep learning to advance machine learning for science
Optimize everything!

Guided Affinity Groups

Inverse problems and applications

Sean Ryan Breckling, Nevada National Security Site & Malena Espanol, Arizona State University

Abstract

In many physical systems, the internal structure of a material can only be inferred by analyzing measurements taken on its exterior. For example, we might be able to describe the electrical properties of the organs inside a human body from measurements obtained by electrodes placed around the body. Another example is characterizing the different densities inside a solid object by analyzing X-ray images of it. These are examples of inverse problems. Moreover, they are ill-posed inverse problems, because their solutions are very sensitive to modeling and measurement errors. Robust, reliable, and efficient regularization methods are therefore needed to compute meaningful solutions quickly. This GAG will explore the different inverse problems that arise in science and engineering, review some of the more standard methods, and discuss current challenges. As part of this GAG, we will attend talks showcasing current research on inverse problems.

Relevant conference themes:

Data assimilation
Inverse problems
Statistical inference

Sean Ryan Breckling

I am a Senior Scientist at the Nevada National Security Site. There, I work across a variety of application areas, several of which lie at the intersection of data science and inverse problems. I earned my PhD in Applied Mathematics from the University of Nevada, Las Vegas, and my undergraduate degree, also in Mathematics, from the University of Wisconsin-Milwaukee.

The GAG program provides the kind of “professional networking” training that I wish had been available to me as a student.

Malena Espanol

Dr. Malena I. Espanol earned her B.S. in mathematics from the University of Buenos Aires, Argentina, and her M.S. and Ph.D. in mathematics from Tufts University. After graduation, Dr. Espanol was a postdoctoral scholar at the California Institute of Technology. In 2012, she started a faculty position at The University of Akron, where she stayed until 2019, when she joined the School of Mathematical and Statistical Sciences at Arizona State University. Her research interests are in applied and computational mathematics. More specifically, she is interested in the development, analysis, and application of mathematical models and numerical methods for solving problems arising in science and engineering, with a focus on problems related to materials science, image processing, and medical applications. Dr. Espanol co-organizes sessions at SIAM (Society for Industrial and Applied Mathematics) conferences, has served as an NSF review panelist, was an MAA Project NExT (New Experiences in Teaching) Fellow from 2012 to 2013, and is a managing editor of the journal Electronic Transactions on Numerical Analysis (ETNA). In 2016, she co-organized a Networking Luncheon for Women in Math of Materials (WIMM) and helped to create a research community for WIMM.

BE is a great program, and the GAG in particular is a great way to help and meet the participants in a more interactive way!

Network science: connection, computation, and complex systems

Philip Samuel Chodrow, Middlebury College & Heather Zinn Brooks, Harvey Mudd College

Abstract:

Network science is the study of complex interconnected systems through the lens of graphs and their generalizations, including hypergraphs and simplicial complexes. Network *data* science seeks to learn insightful or useful signals from empirical networks. Network data science tasks arise in a variety of areas, including epidemiology, (mis)information spread, collective decision-making, data visualization, and social justice. Common tasks include predicting new relationships or interactions, detecting densely interconnected communities, ranking important nodes, inferring network structure from observed dynamics, and quantifying similarity between different network data sets. These problems raise a wide range of rich questions for methodologists and domain experts, including:

– What **random models** of graphs and their generalizations reliably reproduce phenomena found in the real world? What are the mathematical properties of these models?
– What **algorithms** are appropriate for finding different kinds of structure and pattern in large networks?
– What can we infer about possible **dynamics** evolving on an observed network structure? Given observed dynamics, what can we infer about the **structure** of the substrate network?
– What **computational tools** are needed to execute these algorithms at scale?

This Guided Affinity Group supports early-career scholars interested in these or any other aspects of the data science of networks. Our overall aim will be to understand the motivation and methodology of several recent advances in network data science by comparing and contrasting them with more established, classical methods. As part of the GAG, we will attend a range of talks and tutorials related to the mathematics of network data science.

Relevant conference themes:

Applications of data science across science, engineering, technology, and society
Applied probability
Data assimilation, inverse problems, and statistical inference
Data mining
Network science
Signal processing and information theory

Phil Chodrow

Dr. Phil Chodrow is an incoming Assistant Professor of Computer Science at Middlebury College in Middlebury, Vermont. He was a postdoctoral scholar in the Department of Mathematics at UCLA. He earned his PhD in operations research from MIT and his BA in mathematics and philosophy from Swarthmore College. Phil’s research interests include network science, nonlinear dynamics, applied probability, and machine learning. Much of his recent work in network data science treats the extension of random graph models and inference methods to the setting of hypergraphs. His research has been supported by the Fulbright Foundation and the NSF Graduate Research Fellowship Program (GRFP). Phil is also passionate about STEM education and the role of data science in promoting a more humane world. He is an MAA Project NExT Fellow (Gold ’21). When he’s not working, Phil can be found drinking tea, practicing traditional martial arts, cooking, or watching *Star Trek: Deep Space Nine*.

Like most scientific communities, math and network science have a lot of work to do in order to become accessible, safe, and diverse spaces. We’re humbled to have the opportunity to contribute to this work, and look forward to learning with and from our participants!

Heather Zinn Brooks

Dr. Heather Zinn Brooks is an assistant professor of mathematics at Harvey Mudd College. She is a first-generation college student who earned both her Bachelor’s degree and Ph.D. in Mathematics from the University of Utah. Before joining the faculty at Harvey Mudd, she held a CAM postdoctoral position in the Mathematics department at UCLA. She specializes in mathematical modeling of social and biological systems and strives to create and communicate mathematics in a way that is exciting, relevant, and accessible to all. In her work, Brooks uses a combination of analytical and computational techniques to study phenomena in social and biological applications. She pairs tools from dynamical systems, differential equations, network theory, and stochastic processes with numerical simulation and data-driven computational techniques. She is especially enthusiastic about problems that involve the interplay of dynamics and structure. A few examples of this from her work include voltage fluctuations in ion channels, pattern formation of intracellular proteins, parasite spreading in animal grooming networks, and models of opinion dynamics on social networks.

Like most scientific communities, math and network science have a lot of work to do in order to become accessible and safe spaces for everyone. I’m humbled to have the opportunity to contribute to this work, and look forward to learning from our participants!

Going deep with AI and machine learning for science and engineering

Alina Lazar, Youngstown State University & Xiangyang Ju, Berkeley Lab

Abstract:

Currently, 2.5 quintillion bytes of electronic data are created every day, and this pace is not expected to slow. Scientific datasets, including observations, experiments, and large-scale simulations in many domains such as earth and space science, astronomy, genomics, environmental science, and physics, follow the same trend. The size of these datasets typically ranges from hundreds of gigabytes to tens of petabytes. For example, the Large Hadron Collider (LHC) collects 15 petabytes of data annually. Applied science institutions and companies generate extensive experimental and observational scientific data that require computational, networking, and storage resources for processing. This Guided Affinity Group will address topics related to artificial intelligence and machine learning applications in science and engineering. We will talk about the process of developing models and implementations that give high-performance computers the ability to sift through huge amounts of data, learn from it, and ultimately guide future scientific discoveries. Career opportunities in academia, industry, and research labs will be highlighted.

Relevant conference themes:

Applications of data science across science, engineering, technology, and society
Machine learning, including active, deep, reinforcement, and transfer learning

Alina Lazar

Alina Lazar is a professor in the Department of Computer Science and Information Systems at Youngstown State University and an affiliate faculty in the Scientific Data Management Research Group at Lawrence Berkeley National Lab. Her research interests are machine learning and data science. Lately, she has been working on applying machine learning algorithms to large scientific datasets, networking and software engineering. She is also interested in adapting learning algorithms to scale well in order to deal with large datasets, missing values and noise. Dr. Lazar has been teaching database and machine learning courses at both undergraduate and graduate levels. She enjoys working with talented undergraduate students on multidisciplinary research projects.

Students may choose to study AI and machine learning for many reasons. Perhaps they see it constantly in the media, news articles, and advertisements, where AI is usually portrayed as a field straight out of science fiction movies or books. The field has already affected our lives in many ways, and its influence will only grow. To build the confidence and persistence needed to study AI and machine learning, it helps to have a community to rely on. Our Guided Affinity Group will serve as a community for students exploring or interested in pursuing careers in AI.

Xiangyang Ju

Xiangyang Ju is a Computing System Engineer in the Physics & X-Ray science computing group at Lawrence Berkeley National Laboratory, focusing on particle physics and Machine Learning. In particle physics, his research interest is studying the properties of the Higgs boson and understanding the electroweak interactions through the ATLAS experiment at the Large Hadron Collider, located on the border between Switzerland and France. In machine learning, he develops deep learning models and high-performance software to enable physics analyses. Dr. Ju enjoys working on various projects with undergraduate and graduate students from universities around the world.

I am happy to join the guided affinity groups with Dr. Lazar. I am looking forward to hearing from group members about their experience in Machine Learning and AI for science and engineering.

Implementing AI in healthcare: A natural language processing approach

Destinee Summer Morrow, Lawrence Berkeley National Laboratory

Abstract:

This GAG will explore how artificial intelligence is used in healthcare. This will include an overview of major applications and their clinical implications, such as robotics, genome-wide association studies (GWAS), and COVID. However, most of the focus will be on deep learning (DL) and its application to electronic health records (EHRs). This will include a discussion of how various clinical data types, such as physician notes, radiology reports, and sleep tests, can be used with natural language processing (NLP) to extract important information for downstream processing. Extracting information from unstructured data has been shown to improve the performance of many DL models. We will also provide insight into which NLP tools and DL models are available and how to use and scale them for healthcare research. Use cases such as cancer research will be discussed. The complexity of EHRs, which requires data mining and high-performance computing (HPC), will also be discussed.

Relevant conference themes:

Applications of data science across science, engineering, technology, and society
Data mining
Machine learning, including active, deep, reinforcement, and transfer learning
Scalable algorithms and high-performance computing

Destinee Summer Morrow

I have a varied background that includes both biological and computational studies. I received my bachelor of science degree in animal science, with a minor in microbiology, from North Carolina State University in 2018. I then received my master of science degree in bioinformatics from Hood College in Frederick, Maryland, in 2020. The Sustainable Horizons Institute's Sustainable Research Pathways program, in partnership with Lawrence Berkeley National Laboratory, led to a full-time position as a computer systems engineer at LBNL, where I am currently employed. My research focuses on the analysis of electronic health records, particularly using deep learning and natural language processing. My previous education and experiences have shown me the importance of effective collaboration and communication between experts in different domains. As my career and research advance, I would like to continue collaborating across domains to help accelerate and advance human health.

I would like to lead a guided affinity group because I feel I can provide a perspective that other leaders may not. I am new to this field and thus have a fresh view of current problems and applications. With an interdisciplinary background, I understand the importance of, and need for, collaboration between domains. I want to share the knowledge I have, inform prospective students of research areas they could become involved in, or at least make them aware of the problems we face and how AI helps address them. I participated in the SIAM CSE21 conference as a presenter, and unfortunately, because it was virtual, I felt I did not get to experience the conference to its fullest. I participated in a GAG at that time, and I am generally aware of what this position involves. With this opportunity, I hope that both I and the participants in my group will have an unforgettable experience because of this GAG. Additionally, I believe this would be a great leadership opportunity that will help me develop skills I typically don't get to work on because I work remotely from home.

When probabilistic graphical models meet deep learning to advance machine learning for science

Talita Perciano, Lawrence Berkeley National Lab

Abstract:

The growth in the size and complexity of data coming from experiments across the DOE overwhelms current statistical and learning approaches for analysis and understanding. Consequently, scientific discoveries are falling behind the pace of data generation. In several scientific fields, researchers combine simulations and experiments to analyze data. Data acquired from different sources are often complementary and provide different types or levels of information. These levels of information may come from different spatial dimensions, scales, or time steps. In this case, even though multimodal, multiscale, and temporally correlated information may exist in theory, finding these feature correlations automatically and using them for learning and analysis is challenging. In this Guided Affinity Group, we will discover how probabilistic graphical models can be combined with deep learning architectures to tackle such problems. More generally, we will discuss ways to "inform" machine learning algorithms with scientific knowledge to increase their accuracy, reliability, efficiency, and interpretability. We will also discuss the challenges and opportunities in this area of research, not only at National Laboratories, but also in industry and academia.

Relevant conference themes:

Applied probability
Knowledge representation and reasoning
Machine learning, including active, deep, reinforcement, and transfer learning
Scalable algorithms and high-performance computing

Talita Perciano

Talita Perciano is a Research Scientist in the Machine Learning and Analytics group and the Computational Biosciences group at Lawrence Berkeley National Laboratory. She conducts research in the areas of image analysis, machine/deep learning, quantum image processing and machine learning, and high-performance computing, motivated by the considerable challenges posed by scientific data generated by computational models, simulations, and experiments. Her research focuses on mathematical foundations for new methods, on the implementation of scalable methods, and on platform portability. Her goal is to develop powerful, mathematically grounded, scalable algorithms that meet the requirements for analyzing current and future scientific datasets acquired at user data facilities. Over the years, she has built a collaboration network in fields such as materials science, biosciences, and geosciences.

Optimize everything!

Stefan Wild, Argonne National Laboratory

Abstract:

Optimization and control are key enabling technologies for data science. In this GAG, we will discuss the breadth of optimization and control problems, and the approaches to them, that arise in data science. We will explore MDS22 sessions that directly and indirectly involve research in optimization and control.

Relevant conference themes:

Applications of data science across science, engineering, technology, and society
Data assimilation, inverse problems, and statistical inference
Optimization and control

Stefan Wild

Stefan Wild is a Senior Computational Mathematician and Deputy Division Director of the Mathematics and Computer Science Division at Argonne National Laboratory and a Senior Fellow in the Northwestern Argonne Institute for Science and Engineering at Northwestern University. Wild joined Argonne as a Director's Postdoctoral Fellow in September 2008. Prior to this, he obtained his Ph.D. in operations research from Cornell University and his M.S. and B.S. in applied mathematics from the University of Colorado. Wild's primary research focus is developing model-based algorithms and software for challenging numerical optimization problems. He applies these techniques to data analysis, machine learning, and the solution of nonlinear inverse problems. At Argonne, he leads a number of multidisciplinary computational science projects and shapes strategy for applied mathematics, numerical software, and statistics.

I love the idea of making technical conferences more accessible and engaging. I am always looking to learn more about how different people experience things.