Structural Protein Function Prediction - A Comprehensive Review

pp. 49-57


Author(s)

Huda A. Maghawry 1,*, Mostafa G. M. Mostafa 1, Mohamed H. Abdul-Aziz 1, Tarek F. Gharib 1,2

1. Faculty of Computer and Information Sciences, Ain Shams University, Cairo 11566, Egypt

2. Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2015.10.07

Received: 24 Mar. 2015 / Revised: 13 May 2015 / Accepted: 17 Sep. 2015 / Published: 8 Oct. 2015

Index Terms

Protein function prediction, protein structure, structure alignment and comparison, distance matrix, binding sites, classification

Abstract

The large number of available protein structures creates a need for computational methods for protein function prediction. Function prediction relies mainly on finding similarities between proteins of unknown function and proteins that are already annotated. These similarities can be computed from different protein characteristics: sequence, interactions, localization, structure, and/or physicochemical properties. Many review papers focus on methods based on sequence and physicochemical features, because such data are easier to handle and interpret and are far more abundant than protein structures. However, structure-based computational methods provide additional accuracy and reliability in protein function prediction. Therefore, unlike many review papers, this paper presents an up-to-date review of structure-based protein function prediction. The aim is to provide a recent and comprehensive review of topics related to protein structure: function aspects, structural classification, databases, tools, and methods.

Cite This Paper

Huda A. Maghawry, Mostafa G. M. Mostafa, Mohamed H. Abdul-Aziz, Tarek F. Gharib, "Structural Protein Function Prediction - A Comprehensive Review", International Journal of Modern Education and Computer Science (IJMECS), vol.7, no.10, pp.49-57, 2015. DOI:10.5815/ijmecs.2015.10.07
