Extraction of Sequence Conservation Features for the Prioritization of Candidate Single Amino Acid Polymorphisms

Author(s)

Jiaxin WU 1,* Mingxin Gan 2 Wangshu ZHANG 1 Rui JIANG 1

1. MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing 100084, China

2. School of Economics and Management, University of Science and Technology Beijing, Beijing 100083, China

* Corresponding author.

DOI: https://doi.org/10.5815/ijieeb.2011.02.01

Received: 18 Dec. 2010 / Revised: 5 Jan. 2011 / Accepted: 13 Feb. 2011 / Published: 8 Mar. 2011

Index Terms

Single amino acid polymorphisms, prioritization, guilt-by-association, Euclidean distance, Manhattan distance

Abstract

Although genome-wide association (GWA) studies have achieved remarkable success over the past few years, the genetic variants they discover typically account for only a small fraction of the heritability of most common diseases. As a result, the identification of multiple rare variants associated with complex diseases has been receiving increasing attention. However, most recently developed statistical approaches for detecting associations between rare variants and diseases require functional variants to be selected before the subsequent analysis, which makes an effective bioinformatics method for filtering out non-relevant rare variants indispensable. In this paper, we focus on a specific type of genetic variant called single amino acid polymorphisms (SAAPs). We propose to prioritize candidate SAAPs for a specific disease according to association scores calculated with a guilt-by-association model from a set of features derived from protein sequences. We validate the proposed approach systematically and demonstrate that the model is powerful in distinguishing SAAPs associated with the specific disease of interest.
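
The scoring idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so the sketch assumes that a candidate's association score is its mean distance to the feature vectors of SAAPs already known to be associated with the disease (the "seeds"), with a smaller score meaning a stronger association. The function names, the three-dimensional feature vectors, and the toy values are all hypothetical; only the two distance measures come from the index terms.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Manhattan distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def association_score(candidate, seeds, distance=euclidean):
    """Mean distance from a candidate to the seed SAAPs.

    Assumption for illustration: a smaller mean distance means a
    stronger association with the disease of interest.
    """
    return sum(distance(candidate, seed) for seed in seeds) / len(seeds)

def prioritize(candidates, seeds, distance=euclidean):
    """Return (name, score) pairs sorted from most to least associated."""
    scored = [
        (name, association_score(vector, seeds, distance))
        for name, vector in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1])

# Hypothetical conservation-style features; the values are made up.
seeds = [[0.9, 0.1, 0.8], [0.8, 0.2, 0.7]]
candidates = {
    "saap_A": [0.85, 0.15, 0.75],
    "saap_B": [0.10, 0.90, 0.20],
    "saap_C": [0.60, 0.40, 0.50],
}

ranking = prioritize(candidates, seeds, distance=manhattan)
for name, score in ranking:
    print(f"{name}\t{score:.3f}")
```

Passing `distance=euclidean` instead of `distance=manhattan` swaps the distance measure without changing the rest of the ranking procedure, which makes it straightforward to compare the two.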

Cite This Paper

Jiaxin Wu, Mingxin Gan, Wangshu Zhang, Rui Jiang, "Extraction of Sequence Conservation Features for the Prioritization of Candidate Single Amino Acid Polymorphisms", International Journal of Information Engineering and Electronic Business (IJIEEB), vol. 3, no. 2, pp. 1-10, 2011. DOI: 10.5815/ijieeb.2011.02.01
