Phyre2 is a suite of tools available on the web to predict and analyze protein structure, function and mutations. The focus of Phyre2 is to provide biologists with a simple and intuitive interface to state-of-the-art protein bioinformatics tools. Phyre2 replaces Phyre, the original version of the server for which we previously published a paper in Nature Protocols. In this updated protocol, we describe Phyre2, which uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyze the effect of amino acid variants (e.g., nonsynonymous SNPs (nsSNPs)) for a user's protein sequence. Users are guided through results by a simple interface at a level of detail they determine. This protocol will guide users from submitting a protein sequence to interpreting the secondary and tertiary structure of their models, their domain composition and model quality. A range of additional available tools is described to find a protein structure in a genome, to submit large numbers of sequences at once and to automatically run weekly searches for proteins that are difficult to model. A typical structure prediction will be returned between 30 min and 2 h after submission.
In September 2014, the UniProtKB/TrEMBL protein database contained over 80 million protein sequences. By contrast, the Protein Data Bank contains just over 100,000 experimentally determined 3D structures. This ever-widening gap between our knowledge of sequence space and structure space poses serious challenges for researchers who seek the structure and function of a protein sequence of interest.
Fortunately, advances in computational techniques to predict protein structure and function can substantially shrink this gap. On average, 50–70% of a typical genome can be structurally modeled using such techniques (ref. 1). The key principles on which such techniques work are: (i) that protein structure is more conserved in evolution than protein sequence, and (ii) that there is evidence of a finite and relatively small (1,000–10,000) number of unique protein folds in nature (ref. 2). These principles permit the protein structure prediction problem to be considered as a problem of matching a sequence of interest to a library of known structures, rather than the more complex and error-prone approach of simulated folding.
For over 30 years researchers have developed and refined computational methods for protein structure prediction. Such methods include simulated folding using physics-based or empirically derived energy functions; construction of models from small fragments of known structure; threading, in which the compatibility of a sequence with an experimentally derived fold is determined using similar energy functions; and template-based modeling (TBM), in which a sequence is aligned to a sequence of known structure on the basis of patterns of evolutionary variation. TBM encompasses the strategies that have been called homology modeling, comparative modeling and fold recognition.