In 2016 my work was focused on the ever-elusive protein structure prediction problem. I had been
working on this problem for many years, and my group had applied machine learning techniques
to advance the field, but we had not tried deep learning (DL). I thought it was time to try it, and I was
looking for a collaborator with knowledge of DL. In December of that year, I met Xinlian Liu
of Hood College at the SRP workshop, and I thought I had found that person. He presented a
poster on the topic of applying Convolutional Neural Networks (CNNs) to 2D images of
cancerous tumors and I asked him whether it would be possible to apply equivalent techniques
to 3D images of protein structures. This was at a time when 3D CNNs were not mainstream.
Our brief interaction during that workshop started a fruitful collaboration that has lasted to
this day. Xinlian and I worked on a proposal, and he applied for and was awarded Visiting Faculty
Program (VFP) funding. In preparation for the summer internship at the lab, he taught machine
learning in the spring of 2017 and selected two outstanding students, Rafael Zamora-Resendiz
and Tom Corcoran, who came to the lab with him. They applied 3D CNNs and then GCNNs to
RAS protein classification. We were working at the bleeding edge of the field. At the end of the
summer, the two students, who had just graduated with a BS from Hood College, stayed at the
lab as research associates, and later I hired one of them, Rafael, as a Computer Science
Towards the end of the summer of 2017, Kathy Yelick, then director of CS, invited a group of lab
researchers to discuss a new project called Million Veterans Project (MVP) that the DOE and the VA
were spearheading to apply AI and data science technologies and the tremendous computing
resources of the DOE to improve healthcare for US Veterans. The task was daunting. The VA has
20 years of Electronic Health Record (EHR) data for 24 million Veterans. These records consist
of structured data (demographics, vitals, lab work) and unstructured data (physicians' and
nurses' notes). The collaboration would focus on three exemplars: cardiovascular diseases, prostate
cancer, and suicide. In addition, the VA has genomic data for about 800K patients.
Computational healthcare was a field that I did not know. We had been working on deep
learning techniques but applied to proteins, not to EHRs. I discussed this challenge with Prof.
Liu and his students many times and, after careful consideration, we decided to take on this
challenge. I ended up practicing what I preached and went way out of my comfort zone! I would
never have done this without the support and collaboration of Xinlian, Tom, and Rafael. I wrote
a proposal and Kathy selected me as the technical lead for the project.
Fast forward to 2020: we are still working with the VA and have received funding from them to
support our effort. We have focused on suicide prevention, and we have learned a lot about
EHRs and mental health. Almost 20 US Veterans commit suicide every day, and VA doctors
have developed predictive models to identify those at high risk. Their models are based on the
structured part of the EHRs, but this data does not capture all the high-risk patients. Our group is
working on applying Natural Language Processing techniques to extract information from the
unstructured data, and we are ready to incorporate our results into the VA models.