Skip navigation
Use este identificador para citar ou linkar para este item: https://repositorio.ufpe.br/handle/123456789/16011

Compartilhe esta página

Título: Associating genotype sequence properties to haplotype inference errors
Autor(es): ROSA, Rogério dos Santos
Palavras-chave: Regressão Linear; Análises Estatística; SNPs; Haplótipos; Dados Genotípicos; Inferência de Haplótipos; Linear Regression; Statistical Analysis; SNPs; Haplotypes; Genotype Data; Haplotype Inference
Data do documento: 12-Mar-2015
Editor: Universidade Federal de Pernambuco
Abstract: Haplotype information has a central role in the understanding and diagnosis of certain illnesses, and also for evolution studies. Since that type of information is hard to obtain directly, computational methods to infer haplotype from genotype data have received great attention from the computational biology community. Unfortunately, haplotype inference is a very hard computational biology problem and the existing methods can only partially identify correct solutions. I present neural network models that use different properties of the data to predict when a method is more prone to make errors. I construct models for three different Haplotype Inference approaches and I show that our models are accurate and statistically relevant. The results of our experiments offer valuable insights on the performance of those methods, opening opportunity for a combination of strategies or improvement of individual approaches. I formally demonstrate that Linkage Disequilibrium (LD) and heterozygosity are very strong indicators of Switch Error tendency for four methods studied, and I delineate scenarios based on LD measures, that reveal a higher or smaller propension of the HI methods to present inference errors, so the correlation between LD and the occurrence of errors varies among regions along the genotypes. I present evidence that considering windows of length 10, immediately to the left of a SNP (upstream region), and eliminating the non-informative SNPs through Fisher’s Test leads to a more suitable correlation between LD and Inference Errors. I apply Multiple Linear Regression to explore the relevance of several biologically meaningful properties of the genotype sequences for the accuracy of the haplotype inference results, developing models for two databases (considering only Humans) and using two error metrics. The accuracy of our results and the stability of our proposed models are supported by statistical evidence.
URI: https://repositorio.ufpe.br/handle/123456789/16011
Aparece nas coleções:Teses de Doutorado - Ciência da Computação

Arquivos associados a este item:
Arquivo Descrição TamanhoFormato 
RogerioSantosRosa_Tese.pdf1,7 MBAdobe PDFThumbnail
Visualizar/Abrir


Este arquivo é protegido por direitos autorais



Este item está licenciada sob uma Licença Creative Commons Creative Commons