Machine Learning for Proteome-Wide Targeted Covalent Drug Discovery
Abstract
Covalent inhibitors can be more potent and selective than traditional reversible inhibitors, and they may also target sites traditionally considered undruggable, such as shallow pockets and protein-protein interfaces. However, identifying covalently ligandable sites is challenging because protein-ligand interactions are complex and dynamic. Activity-based chemoproteomics has emerged as a powerful tool for probing proteome-wide ligandability, but it is costly, prone to false negatives, and sometimes identifies the protein without pinpointing the specific ligandable site. To address these challenges, we developed machine learning (ML) models based on decision trees and three-dimensional convolutional neural networks, using the crystal structures containing ligandable cysteines in the entire Protein Data Bank. The models achieved AUCs above 90%, and external validation against a proteomics dataset showed a recall of about 80%, even with AlphaFold2 structure models. Our work paves the way for integrating big structural data, ML models, and chemoproteomics to interrogate the proteome space for novel targeted covalent inhibitor (TCI) discoveries.
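For readers less familiar with the two metrics quoted in the abstract, the following is a minimal sketch of how AUC and recall are computed for a binary ligandable/non-ligandable classifier. It is not the authors' evaluation code: the function names `roc_auc` and `recall`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the numbers are synthetic rather than taken from the study.

```python
def recall(y_true, y_pred):
    """Fraction of truly ligandable sites (label 1) that were predicted as 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def roc_auc(y_true, scores):
    """Area under the ROC curve via its rank interpretation: the probability
    that a random positive scores higher than a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Synthetic toy data (hypothetical): 1 = ligandable cysteine, 0 = not ligandable.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f"AUC    = {roc_auc(y_true, scores):.4f}")
print(f"recall = {recall(y_true, y_pred):.2f}")
```

AUC is threshold-independent and summarizes how well the scores rank positives above negatives, whereas recall depends on the chosen threshold and measures how many known ligandable sites are recovered, which is why it is the natural metric for validation against a set of experimentally identified sites.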