In shotgun proteomics, liquid chromatography coupled to tandem mass spectrometry
(LC-MS/MS) is used to identify and quantify peptides and proteins. LC-MS/MS produces mass spectra, which must be searched by one or more engines, which employ
algorithms to match spectra to theoretical spectra derived from a reference database.
These engines identify and characterize proteins and their component peptides. By
training a convolutional neural network on a dataset of over 6 million MS/MS spectra
derived from human proteins, we aim to create a tool that can quickly and effectively
identify spectra as peptides prior to database searching. This can significantly reduce search space and thus run time for database searches, thereby accelerating LCMS/MS-based proteomics data acquisition. Additionally, by training neural networks
on labels derived from the search results of three different database search engines, we
aim to examine and compare which features are best identified by individual search
engines, a neural network, or a combination of these.