Machine Learning for Proteome-Wide Targeted Covalent Drug Discovery
Abstract
Covalent inhibitors have the potential to be more potent and selective than traditional reversible inhibitors, and they may also target traditionally undruggable sites such as shallow pockets and protein-protein interfaces. However, identifying covalently ligandable sites is challenging because of the complex and dynamic nature of protein-ligand interactions. Activity-based chemoproteomics has emerged as a powerful tool for probing proteome-wide ligandability, but it is costly, prone to false negatives, and sometimes identifies the protein without pinpointing the specific ligandable site. To address these challenges, we developed machine learning (ML) models based on decision trees and three-dimensional convolutional neural networks, using the crystal structures containing ligandable cysteines in the entire Protein Data Bank. The models achieved AUCs above 90%, and external validation against a proteomics dataset showed a recall of about 80%, even with AlphaFold2 structure models. Our work paves the way for integrating big structural data, ML models, and chemoproteomics to interrogate the proteome space for novel targeted covalent inhibitor (TCI) discoveries.
Description
ACS FALL 2023: Harnessing the Power of Data, August 15th, 2023
Rights/Terms
Attribution-NonCommercial-NoDerivatives 4.0 International
Identifier to cite or link to this item
http://hdl.handle.net/10713/20612
The following license files are associated with this item:
- Creative Commons
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International