Please use this identifier to cite or link to this item: https://repositorio.ufpe.br/handle/123456789/62504

Title: Multi-head attention classifier trained on protein-level for detecting viruses infecting cassava from RNA-seq reads
Author: SILVA, Elisson Lima Gomes da
Keywords: Virus detection; RNA-seq data; Sequencing read classification; Deep learning; Alignment-free methods
Publication date: 13-Sep-2024
Publisher: Universidade Federal de Pernambuco
Citation: SILVA, Elisson Lima Gomes da. Multi-head attention classifier trained on protein-level for detecting viruses infecting cassava from RNA-seq reads. 2024. Dissertation (Master's in Computer Science) – Universidade Federal de Pernambuco, Recife, 2024.
Abstract: This work applies artificial neural networks to classify reads from high-throughput sequencing (HTS) data, with a particular focus on detecting plant viruses in cassava (Manihot esculenta). Viral diseases pose significant threats to crop health and food production, and cassava, a crucial crop for food security and industrial applications in Brazil and globally, is no exception. Traditional bioinformatics pipelines for virus discovery primarily rely on alignment-based methods, which become increasingly computationally expensive as the volume of genomic reference data grows. Alignment-free (AF) methodologies, especially those based on k-mer analysis, offer a promising alternative but often face challenges related to interpretability and memory demands. To address these challenges, we propose a multi-headed attention classifier model designed to detect viral presence in RNA sequencing data obtained from plant samples and translated to the protein level. This model, trained for a specific host plant, leverages the attention mechanism to enhance feature extraction from k-mer distributions. This approach enables a more context-dependent encoding of sequencing reads, thereby improving the classification of the short genetic sequences typical of HTS data. Additionally, we implemented a cutting-edge phytosanitary pipeline on the Amazon Web Services Cloud to evaluate the performance of our proposed model. The model achieved 99% accuracy during training, effectively filtering out millions of reads from the host and other organisms, and retaining only viral reads. This substantial reduction in computational demand for identifying new viruses underscores the efficiency of our approach. Our findings demonstrate that deep learning models, particularly those employing the attention mechanism, can efficiently classify viral sequences in short reads, significantly lowering the computational costs associated with traditional AF methods.
This work advances genetic analysis and bioinformatics, providing a more accurate and efficient method for classifying HTS reads in plant pathogen discovery.
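The abstract mentions reads "translated to the protein level" and features drawn from k-mer distributions. This record does not describe the dissertation's actual preprocessing, so the following is only a minimal illustrative sketch of that general idea, not the author's method. The function names `translate` and `protein_kmer_counts`, the choice of k = 3, and the use of the three forward reading frames only (reverse-complement frames are omitted for brevity) are all assumptions made here for illustration.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(read, frame=0):
    """Translate a nucleotide read into amino acids from a forward frame (0, 1, or 2).

    Hypothetical helper: codons containing ambiguous bases (e.g. N) map to 'X',
    and stop codons map to '*'.
    """
    seq = read.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def protein_kmer_counts(read, k=3):
    """Count amino-acid k-mers over the three forward frames of one read.

    Assumption for this sketch: k-mers are not allowed to span a stop codon.
    """
    counts = Counter()
    for frame in range(3):
        protein = translate(read, frame)
        for segment in protein.split("*"):
            for i in range(len(segment) - k + 1):
                counts[segment[i:i + k]] += 1
    return counts


if __name__ == "__main__":
    example_read = "ATGGCCAAATAA"  # made-up read, not real data
    print(translate(example_read))
    print(protein_kmer_counts(example_read, k=2))
```

Counts like these could be normalized into a fixed-length vector, or the k-mers could be kept as an ordered token sequence for an attention-based model. Which representation the dissertation uses is not stated in this record.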
URI: https://repositorio.ufpe.br/handle/123456789/62504
Appears in collections: Dissertações de Mestrado - Ciência da Computação

Files in this item:
File: DISSERTAÇÃO Elisson Lima Gomes da Silva.pdf
Size: 11.59 MB
Format: Adobe PDF


This item is protected by original copyright



This item is subject to a Creative Commons license