Por favor, use este identificador para citar o enlazar este ítem:
https://repositorio.ufpe.br/handle/123456789/16011
Comparte esta pagina
Título : | Associating genotype sequence properties to haplotype inference errors |
Autor : | ROSA, Rogério dos Santos |
Palabras clave : | Regressão Linear; Análises Estatística; SNPs; Haplótipos; Dados Genotípicos; Inferência de Haplótipos; Linear Regression; Statistical Analysis; SNPs; Haplotypes; Genotype Data; Haplotype Inference |
Fecha de publicación : | 12-mar-2015 |
Editorial : | Universidade Federal de Pernambuco |
Resumen : | Haplotype information has a central role in the understanding and diagnosis of certain illnesses, and also for evolution studies. Since that type of information is hard to obtain directly, computational methods to infer haplotype from genotype data have received great attention from the computational biology community. Unfortunately, haplotype inference is a very hard computational biology problem and the existing methods can only partially identify correct solutions. I present neural network models that use different properties of the data to predict when a method is more prone to make errors. I construct models for three different Haplotype Inference approaches and I show that our models are accurate and statistically relevant. The results of our experiments offer valuable insights on the performance of those methods, opening opportunity for a combination of strategies or improvement of individual approaches. I formally demonstrate that Linkage Disequilibrium (LD) and heterozygosity are very strong indicators of Switch Error tendency for four methods studied, and I delineate scenarios based on LD measures, that reveal a higher or smaller propension of the HI methods to present inference errors, so the correlation between LD and the occurrence of errors varies among regions along the genotypes. I present evidence that considering windows of length 10, immediately to the left of a SNP (upstream region), and eliminating the non-informative SNPs through Fisher’s Test leads to a more suitable correlation between LD and Inference Errors. I apply Multiple Linear Regression to explore the relevance of several biologically meaningful properties of the genotype sequences for the accuracy of the haplotype inference results, developing models for two databases (considering only Humans) and using two error metrics. The accuracy of our results and the stability of our proposed models are supported by statistical evidence. |
URI : | https://repositorio.ufpe.br/handle/123456789/16011 |
Aparece en las colecciones: | Teses de Doutorado - Ciência da Computação |
Ficheros en este ítem:
Fichero | Descripción | Tamaño | Formato | |
---|---|---|---|---|
RogerioSantosRosa_Tese.pdf | 1,7 MB | Adobe PDF | ![]() Visualizar/Abrir |
Este ítem está protegido por copyright original |
Este ítem está sujeto a una licencia Creative Commons Licencia Creative Commons