Inverse folding based pre-training for the reliable identification of intrinsic transcription terminators

It is well established that neural networks can predict or identify structural motifs of non-coding RNAs (ncRNAs). Yet, the neural-network-based identification of RNA structural motifs is limited by the availability of training data, which are often insufficient for learning the features of specific ncRNA families or structural motifs. Aiming to reliably identify intrinsic transcription terminators in bacteria, we introduce a novel pre-training approach that uses inverse folding to generate training data for predicting or identifying a specific ncRNA family or structural motif. We assess the ability of neural networks to identify secondary structure through systematic in silico mutagenesis experiments. In a study of intrinsic transcription terminators, which are functionally well-understood RNA structural motifs, our inverse-folding-based pre-training approach significantly boosts the performance of neural network topologies, which then outperform previous approaches to identifying intrinsic transcription terminators. Inverse-folding-based pre-training provides a simple yet highly effective way to integrate the well-established thermodynamic energy model into deep neural networks for identifying ncRNA families or motifs. The pre-training technique is broadly applicable to a range of network topologies as well as to different types of ncRNA families and motifs.
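To illustrate what "using inverse folding to generate training data" involves, the following is a minimal sketch under stated assumptions, not the authors' pipeline. It takes a target secondary structure in dot-bracket notation and draws random sequences whose bases can form every required pair. This is a simplified stand-in for true inverse folding: a tool such as RNAinverse from the ViennaRNA package additionally searches for sequences whose minimum free energy structure actually matches the target, which this sketch does not check. The helper names `pair_table`, `sample_compatible` and `terminator_like`, as well as the stem, loop and U-tract lengths, are hypothetical choices made for illustration only.

```python
import random

# Pairs that can close a helix in RNA: Watson-Crick plus the G-U wobble pair.
ALLOWED_PAIRS = [("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")]
BASES = "ACGU"


def pair_table(structure):
    """Return {position: partner} for every paired position of a dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def sample_compatible(structure, rng):
    """Draw a random sequence that can form every base pair in `structure`.

    Simplified stand-in for inverse folding: it guarantees pair compatibility
    only and does NOT check that `structure` is the minimum free energy fold.
    """
    pairs = pair_table(structure)
    seq = [None] * len(structure)
    for i in range(len(structure)):
        if seq[i] is not None:
            continue  # already set as the 3' partner of an earlier pair
        if i in pairs:
            seq[i], seq[pairs[i]] = rng.choice(ALLOWED_PAIRS)
        else:
            seq[i] = rng.choice(BASES)
    return "".join(seq)


def terminator_like(rng, stem=8, loop=4, u_tail=8):
    """Hypothetical template: a stem-loop hairpin followed by a U-tract.

    Returns (sequence, dot-bracket structure). The default lengths are
    illustrative assumptions, not values taken from the paper.
    """
    structure = "(" * stem + "." * loop + ")" * stem
    hairpin = sample_compatible(structure, rng)
    return hairpin + "U" * u_tail, structure + "." * u_tail


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the examples are reproducible
    for _ in range(3):
        seq, structure = terminator_like(rng)
        print(seq)
        print(structure)
```

Each call yields a terminator-like positive example paired with its target structure. The sampler draws base pairs uniformly, so it does not enforce the GC-rich stems typical of intrinsic terminators. A realistic pre-training set would presumably replace `sample_compatible` with a genuine inverse-folding step that retains only designs folding into the target, and would also need negative examples.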

Metadata
Author: Vivian Bernadette Brandenburg, Franz Narberhaus, Axel Mosig
URN: urn:nbn:de:hbz:294-103004
DOI: https://doi.org/10.1371/journal.pcbi.1010240
Parent Title (English): PLoS Computational Biology
Publisher: Public Library of Science
Place of publication: San Francisco, California, USA
Document Type: Article
Language: English
Date of Publication (online): 2023/10/27
Date of first Publication: 2022/07/07
Publishing Institution: Ruhr-Universität Bochum, University Library
Tag: Open Access Fund
Volume: 18
Issue: 7, Article e1010240
First Page: e1010240-1
Last Page: e1010240-19
Note: Article Processing Charge funded by the Deutsche Forschungsgemeinschaft (DFG) and the Open Access Publication Fund of Ruhr-Universität Bochum.
Institutes/Facilities: Chair of Biophysics, Bioinformatics Group
Chair of Biophysics
Center for Protein Diagnostics (PRODI)
Dewey Decimal Classification: Natural sciences and mathematics / Life sciences, biology, biochemistry
open_access (DINI-Set): open_access
Faculties: Faculty of Biology and Biotechnology
Licence (English): Creative Commons - CC BY 4.0 - Attribution 4.0 International