Homology Modeling and Generation of 3D-structure of Protein

 

Akshay R. Yadav*, Dr. Shrinivas K. Mohite

Department of Pharmaceutical Chemistry, Rajarambapu College of Pharmacy, Kasegaon, Sangli, Maharashtra, India- 415404.

*Corresponding Author E-mail: akshayyadav24197@gmail.com

 

ABSTRACT:

The ultimate goal of protein modeling is to predict a structure from its sequence with an accuracy comparable to the best results achieved experimentally. This would allow users to safely use rapidly generated in silico protein models in all the contexts where today only experimental structures provide a solid basis: structure-based drug design, analysis of protein function, interactions, antigenic behavior, and the rational design of proteins with increased stability or novel functions. Moreover, protein modeling is the only way to obtain structural information when experimental techniques fail. Many proteins are simply too large for NMR analysis and cannot be crystallized for X-ray diffraction. Where an experimental structure is difficult to obtain for a given protein, comparative modeling offers an efficient alternative to experimental structure determination. If a structural template that is more than 50% identical to the query sequence can be found, a model with an estimated RMSD of 1 to 4 Å from the experimental structure can normally be obtained.

 

KEYWORDS: Protein modeling, protein structures, homology, loop modeling, model validation.

 

 


INTRODUCTION:

Geneticists and molecular and cell biologists routinely uncover new proteins that are important in specific biological processes/pathways1. Nevertheless, because information about their atomic structures is lacking, the molecular roles and mechanisms of many of these proteins remain unknown. Determining experimental structures for many of these proteins poses technological challenges.

 

Current methods for obtaining atomic-resolution structures of biomolecules (X-ray crystallography and NMR spectroscopy) require pure protein preparations at concentrations much higher than those at which the proteins exist in a physiological environment2. Additionally, NMR has size limitations, with current technologies restricted to determining structures of proteins with masses up to 15 kDa. As a result, atomic structures of several medically and biologically significant proteins are not available. However, the structures of such proteins are crucial for many purposes, including in silico drug design, understanding the effects of disease mutations and designing experiments to probe the functional mechanisms of proteins. Comparative modeling has become increasingly important as a method that bridges the gap between sequence space and structure space. It enables researchers to build structural models of proteins that are difficult to crystallize or that otherwise resist structure determination, by exploiting the fact that two proteins whose sequences are evolutionarily related show similar structural features3. Consequently, a known protein structure (the template) can be used to generate a molecular model of a protein (the query) whose experimental structure is not known. The applicability of comparative modeling in structural biology is supported by several observations from the community, e.g. that only a small number of protein folds are found in nature and that nature reuses specific folds for different protein functions4. Thus, many researchers have used the breadth of structural knowledge already available to construct structural models of proteins whose experimental structures have not been determined. 
For example, ModBase and SWISS-MODEL, repositories of comparative models generated using automated protocols, hold structural models for 3.4 million and 2.2 million unique sequences respectively; for comparison, the Protein Data Bank (PDB), the repository for experimental structures, holds 67,728 experimental structures. The burgeoning number of structural models in repositories such as ModBase and SWISS-MODEL demonstrates how much comparative modeling helps to close the gap between the number of known sequences and the number of known structures. The Protein Structure Initiative aims to close this gap further by determining experimental structures for representative members of protein families that do not yet have any structures in the PDB5. Structural models created through homology modeling can be of direct medical and biological relevance. They can be used to predict the effects of single nucleotide polymorphisms uncovered by genome-wide association studies, helping to delineate the molecular etiology of genetically transmitted diseases. Homology-based structural models have already been widely used in in silico drug screening. In biological experiments, structural models can guide the design of mutations that alter the function or stability of the modeled protein. Importantly, homology models can serve as starting models for molecular replacement in X-ray crystallography, leading to better experimental structures. These structural models can also be used in conjunction with methods such as FRET, which provide inter-residue distances, and for mapping residue-level experimental data, such as accessibility measured through EPR and H-D exchange mass spectrometry6.

 

In practice, homology modeling is a multistep process that can be summarized in seven steps:

1    Template recognition and initial alignment

2    Alignment correction

3    Backbone generation

4    Loop modeling

5    Side-chain modeling

6    Model optimization

7    Model validation

 

Choices have to be made at almost every step. The modeler can never be sure of making the best ones, so much of the modeling process consists of thinking carefully about how to gamble between multiple seemingly similar choices7. Much research has gone into teaching the computer how to make these decisions, so that homology models can be built fully automatically. This currently allows modelers to build models for approximately 25% of the amino acids in a genome, thereby complementing the efforts of structural genomics projects (Sanchez and Sali, 1999, Peitsch, Schwede, and Guex, 2000). This average of 25% varies considerably between individual genomes, from 16% (Mycoplasma pneumoniae) to 30% (Haemophilus influenzae), and is growing steadily as the PDB continues to grow8. For the remaining 75% of a genome, no template with a known structure is available (or none can be detected with a simple BLAST run), and fold prediction techniques or simply an experiment must be used to obtain structural data. While automated model building delivers high throughput, the evaluation of these methods during CASP indicated that human expertise is still helpful, particularly if the alignment is close to the twilight zone (Fischer et al., 1999)9.

 

Fig 1. Homology modelling concept

 

The Seven steps to Homology modeling:

Step 1: Template Recognition and Initial Alignment:

In the safe homology modeling zone, the percentage identity between the sequence of interest and a possible template is high enough to be detected by simple sequence alignment programs such as BLAST (Altschul et al., 1990) or FASTA (Pearson, 1990). To identify these hits, the program compares the query sequence with all the sequences of known structures in the PDB, using mainly two matrices:

 

1. A residue exchange matrix. The elements of this 20 × 20 matrix describe the likelihood that any two of the 20 amino acids should be aligned. The values along the diagonal (representing conserved residues) are the highest, but exchanges between residue types with similar physicochemical properties (e.g. F→Y) also score better than exchanges between residue types that differ widely in their properties10.

 

2. An alignment matrix. The axes of this matrix correspond to the two sequences to be aligned, and the matrix elements are simply the values from the exchange matrix for a given pair of residues. During the alignment process, one tries to find the best path through this matrix, starting from a point near the top left and heading down to the bottom right. To ensure that no residue is used twice, every step must move at least one position to the right and one position down. A path may look as if it leads to a higher score at first sight and yet be rejected because it requires opening an additional gap in one of the sequences (a residue of the other sequence is skipped). By comparing thousands of sequences and sequence families, it became clear that the opening of a gap is roughly as unlikely as at least a couple of non-identical residues in a row. A jump in the path is nevertheless justified when many more points are earned after the jump than would have been earned without it. Therefore, the alignment algorithm subtracts a "gap opening penalty" for each new gap, and a much lower "gap extension penalty" for each residue that is skipped in the alignment. The gap extension penalty is smaller simply because one gap of three residues is much more likely than three gaps of one residue each11.
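The path search with gap opening and gap extension penalties can be sketched in a few lines of Python. This is a minimal illustration, not the scoring used by BLAST or FASTA: the identity-based `toy_score` function stands in for a real 20 × 20 exchange matrix, and the penalty values, the function names and the convention that a gap of k residues costs `gap_open + (k - 1) * gap_extend` are all assumptions chosen for the example.

```python
NEG_INF = float("-inf")


def toy_score(x, y):
    """Hypothetical stand-in for an exchange matrix: reward identities only."""
    return 5 if x == y else -4


def gap_cost(length, gap_open=-10.0, gap_extend=-1.0):
    """Score contribution of one gap that skips `length` residues."""
    return gap_open + (length - 1) * gap_extend


def affine_global_score(a, b, score=toy_score, gap_open=-10.0, gap_extend=-1.0):
    """Best global alignment score of a and b with affine gap penalties."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned with b[j-1]
    # X: a[i-1] placed opposite a gap; Y: b[j-1] placed opposite a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_cost(i, gap_open, gap_extend)
    for j in range(1, m + 1):
        Y[0][j] = gap_cost(j, gap_open, gap_extend)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = score(a[i - 1], b[j - 1]) + max(
                M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a new gap is expensive, extending an open one is cheap.
            X[i][j] = max(M[i - 1][j] + gap_open,
                          Y[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          X[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])
```

With these toy values, one gap of three residues costs -12 while three separate one-residue gaps cost -30, so aligning ACDEFGH with ACGH places DEF in a single gap.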

 

Step 2: Alignment Correction:

Having identified one or more possible modeling templates using the rapid methods described above, it is time to consider more sophisticated methods to arrive at a better alignment. Aligning two sequences can be difficult in regions where the percentage sequence identity is very low. One can then use other homologous protein sequences to find a solution12.

 

E.g.: Suppose you would like to align the sequence LTLTLTLT with YAYAYAYAY. There are two equally poor alternatives, and only a third sequence, TYTYTYTYTYT, which aligns easily to both, can resolve the ambiguity. This example introduces a very useful concept called "multiple sequence alignment." Several programs are available to align a number of related sequences, such as CLUSTALW (Thompson, Higgins, and Gibson, 1994), and the resulting alignment contains a lot of additional information. Think of an Ala→Glu mutation. According to the exchange matrix, this exchange always receives a score of 1. In the 3D structure of the protein, however, such an exchange is very unlikely in the hydrophobic core but perfectly normal on the surface. The multiple sequence alignment implicitly contains information about this structural context. If only exchanges between hydrophobic residues are observed at a certain position, it is highly likely that this residue is buried. To take this knowledge into account during alignment, the multiple sequence alignment is used to derive position-specific scoring matrices, also called profiles (Taylor, 1986, Dodge, Schneider, and Sander, 1998). When constructing a homology model, we are in the fortunate situation of having an almost ideal profile: the known structure of the template. We simply know that a certain alanine sits in the core of the protein and must therefore not be aligned with a glutamate. Nonetheless, multiple sequence alignments remain useful in homology modeling, for example to place deletions (missing residues in the model) or insertions (additional residues in the model) only in areas where the sequences are strongly divergent13.
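The idea that a multiple sequence alignment carries structural context can be illustrated with a small Python sketch. It computes per-column residue frequencies (the raw material of a profile) and flags columns in which only hydrophobic residues occur as likely buried. The hydrophobic residue set and the function names are assumptions made for this example, and real profile methods use weighted, log-odds scores rather than raw frequencies.

```python
from collections import Counter

# Assumption: a simple hydrophobic residue set chosen for illustration.
HYDROPHOBIC = set("AVLIMFWC")


def column_frequencies(msa):
    """Per-column residue frequencies (gaps ignored) of aligned sequences."""
    if len({len(seq) for seq in msa}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*msa):
        counts = Counter(res for res in column if res != "-")
        total = sum(counts.values())
        if total:
            profile.append({res: n / total for res, n in counts.items()})
        else:
            profile.append({})  # column consisting of gaps only
    return profile


def likely_buried(msa):
    """Indices of columns in which only hydrophobic residues are observed."""
    return [i for i, freqs in enumerate(column_frequencies(msa))
            if freqs and set(freqs) <= HYDROPHOBIC]


if __name__ == "__main__":
    toy_msa = ["ALKE", "VIRD", "LL-E"]
    print(likely_buried(toy_msa))
```

In the toy alignment, the first two columns contain only hydrophobic residues and would be treated as probable core positions, where an exchange to glutamate should be penalized.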

 

Step 3: Backbone Generation:

The actual model building can commence when the alignment is ready. For most of the model, creating the backbone is trivial: one simply copies the coordinates of those template residues that appear in the alignment with the model sequence. If two aligned residues differ, only the backbone coordinates (N, Cα, C, and O) can be copied. If they are the same, the side chain can also be included (at least for the more rigid side chains, since rotamers tend to be conserved). Experimentally determined protein structures are not perfect (but in most cases much better than models). There are countless sources of error, ranging from poor electron density in the X-ray diffraction map to simple human errors when preparing the PDB file for submission. A lot of work has been spent on writing software to detect these errors (correcting them is even more difficult), and the count stood at more than 10,000,000 problems in the 17,000 structures deposited in the PDB by the end of 2001. Choosing the template with the fewest errors is therefore a straightforward way to build a better model (the PDBREPORT database [Hooft et al., 1996] at www.cmbi.nl/gv/pdbreport can be very helpful). But what if two templates are available and each has a poorly determined region, but these regions are not the same? Clearly, one should combine the good parts of both templates in one model, an approach known as multiple template modeling. (The same applies when the alignments between the model sequence and the potential templates show good matches in different regions.) Although multiple template modeling is simple in principle (and is done by automated servers such as Swiss-Model [Peitsch, Schwede, and Guex, 2000]), it is difficult in practice to achieve results that are really closer to the true structure than all the templates. Nevertheless, it is possible, as shown in CASP4 by the group of Andrej Sali14.
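The copying rule for backbone generation can be written down as a short Python sketch. The data layout is hypothetical: each template residue is a dictionary with a one-letter `name` and an `atoms` mapping from atom name to coordinates, and the alignment is given as two gapped strings of equal length. Residues inserted relative to the template are left without coordinates, to be filled in by loop modeling.

```python
BACKBONE_ATOMS = ("N", "CA", "C", "O")


def copy_aligned_coordinates(target_aln, template_aln, template_residues):
    """Build a raw model by copying template coordinates along an alignment."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    model = []
    template_index = 0
    for target_res, template_res in zip(target_aln, template_aln):
        source = None
        if template_res != "-":
            source = template_residues[template_index]
            template_index += 1
        if target_res == "-":
            continue  # deletion: the template residue is simply omitted
        if source is None:
            # insertion: no coordinates available, left for loop modeling
            model.append({"name": target_res, "atoms": {}})
        elif source["name"] == target_res:
            # identical residue: copy backbone and side chain
            model.append({"name": target_res, "atoms": dict(source["atoms"])})
        else:
            # different residue: copy backbone atoms only
            backbone = {atom: xyz for atom, xyz in source["atoms"].items()
                        if atom in BACKBONE_ATOMS}
            model.append({"name": target_res, "atoms": backbone})
    return model
```

The sketch copies every identical side chain; a more careful version would restrict this to the more rigid side chains, as noted in the text.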

 

Step 4: Loop Modeling:

In the majority of cases, the alignment between model and template sequence contains gaps, either in the model sequence (deletions) or in the template sequence (insertions). In the first case, one simply omits residues from the template, creating a hole in the model that must be closed. In the second case, one takes the continuous backbone from the template, cuts it, and inserts the missing residues. Both cases imply a conformational change of the backbone. The good news is that conformational changes cannot happen within regular secondary structure elements15. It is therefore safe to shift all insertions or deletions in the alignment out of helices and strands and place them in loops and turns. The bad news is that these changes in loop conformation are notoriously difficult to predict (the main unresolved problem in homology modeling). To make matters worse, even without insertions or deletions we often find quite different loop conformations in template and target. One can identify three main reasons (Rodriguez, http://www.cmbi.kun.nl/gv/articles/text/gambling.html):

 

1. Surface loops tend to be involved in crystal contacts, resulting in a significant change of conformation between template and target.

 

2. The exchange of small for bulky side chains underneath the loop pushes it aside.

 

3. The mutation of a loop residue to proline, or from glycine to any other residue. In both cases, the new residue must fit into a more restricted region of the Ramachandran plot, which most of the time requires conformational changes of the loop16.

 

There are two main approaches to loop modeling:

1. Knowledge-based: the PDB is searched for known loops whose endpoints match the residues between which the loop must be inserted, and the loop conformation is simply copied. All major molecular modeling programs and servers support this approach (e.g., 3D-Jigsaw [Bates and Sternberg, 1999], Insight [Dayringer, Tramontano and Fletterick, 1986], Modeller [Sali and Blundell, 1993], Swiss-Model [Peitsch, Schwede, and Guex, 2000] or WHAT IF [Vriend, 1990])17.

 

2. Energy-based: as in true ab initio fold prediction, the quality of a loop is judged using an energy function. This function is then minimized, using Monte Carlo (Simons et al., 1999) or molecular dynamics techniques (Fiser, Do, and Sali, 2000), to arrive at the best loop conformation. The energy function is often modified (e.g., smoothed) to make the search easier (Tappura, 2001).

 

For short loops (up to 5–8 residues), at least, the different methods have a reasonable chance of predicting a loop conformation that superimposes well on the true structure. As mentioned above, surface loops tend to change their conformation owing to crystal contacts. So if a prediction is made for an isolated protein and then found to differ from the crystal structure, it may still be correct18.
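The knowledge-based approach can be sketched as a simple filter over a fragment library. The version below is a deliberately reduced illustration: it compares only the distance between the two anchor Cα atoms with the end-to-end distance of each candidate fragment, whereas real programs superimpose several anchor residues and check for clashes. The library format (lists of Cα coordinates that include both anchor positions), the tolerance and the function names are assumptions.

```python
import math


def end_to_end_distance(fragment):
    """Distance between the first and last C-alpha of a fragment."""
    return math.dist(fragment[0], fragment[-1])


def candidate_loops(anchor_start, anchor_end, n_loop_residues, library,
                    tolerance=1.0):
    """Fragments whose endpoints fit the gap, best geometric match first."""
    target = math.dist(anchor_start, anchor_end)
    hits = []
    for fragment in library:
        # each fragment holds the loop residues plus the two anchors
        if len(fragment) != n_loop_residues + 2:
            continue
        mismatch = abs(end_to_end_distance(fragment) - target)
        if mismatch <= tolerance:
            hits.append((mismatch, fragment))
    hits.sort(key=lambda hit: hit[0])
    return [fragment for _, fragment in hits]
```

The fragments returned would then be superimposed on the anchor residues and ranked further, for example with an energy function as in the second approach.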

 

Step 5: Side-Chain Modeling:

When we compare the side-chain conformations (rotamers) of residues that are conserved in structurally similar proteins, we find that they often have similar χ1-angles (i.e., the torsion angle about the Cα−Cβ bond). It is therefore possible to copy conserved residues entirely from the template to the model (see also step 3) and achieve a higher accuracy than by copying just the backbone and re-predicting the side chains. In practice, this rule of thumb holds only at high levels of sequence identity, when the conserved residues form networks of contacts. When they are isolated (< 35% sequence identity), the rotamers of conserved residues can differ in up to 45% of cases (Sanchez and Sali, 1997). Practically all successful side-chain placement strategies are at least partly knowledge-based. They use libraries of common rotamers extracted from high-resolution X-ray structures. The different rotamers are tried successively and scored with a variety of energy functions. Intuitively, one might expect rotamer prediction to be computationally demanding because of a combinatorial explosion: the choice of a certain rotamer automatically affects the rotamers of all neighboring residues, which in turn affect their neighbors, and so on. With 100 residues and on average 5 rotamers per residue, one would already end up with 5¹⁰⁰ different combinations to score. Much research has been invested in developing methods to make this huge search space tractable (Desmet et al., 1992). The number of combinations is so large that even nature cannot try them all during the folding process, which means that mechanisms must exist to shrink the search space. Apart from the trivial fact that copying conserved rotamers from the template often splits the protein into distinct regions whose rotamers can be predicted independently, the key to handling the combinatorial explosion lies in the protein backbone. 
Certain backbone conformations strongly favor certain rotamers (allowing, for example, a hydrogen bond between side chain and backbone) and thus greatly reduce the search space. For a given backbone conformation, there may be only one strongly populated rotamer, which can be modeled immediately and thus provides an anchor for the surrounding, more flexible side chains. A backbone conformation may, for example, favor just two different tyrosine rotamers. Today, such position-specific rotamer libraries are commonly used (de Filippis, Sander, and Vriend, 1994, Stites, Meeker, and Shortle, 1994, Dunbrack, and Karplus, 1994). To build such a library, one takes high-resolution structures and collects all stretches of three to seven residues (depending on the method) with a given amino acid at the center. To predict a rotamer, all collected examples are superimposed on the corresponding backbone stretch in the template (Chinea et al., 1995). Further evidence that the combinatorial problem of rotamer prediction is much smaller than originally believed was found recently. Xiang and Honig (2001) first removed a single side chain from known structures and re-predicted it. In a second step, they removed all the side chains and added them again using the same simple search strategy. Interestingly, the accuracy in the much simpler first case turned out to be only slightly higher. The prediction accuracy is typically very high for residues in the hydrophobic core, where more than 90% of all χ1-angles fall within ±20° of the experimental values, but much lower for surface residues, where the percentage is often below 50%. There are two reasons for this:

 

1. Experimental reasons: flexible side chains on the surface tend to adopt multiple conformations, which are additionally influenced by crystal contacts. So even experiment cannot provide one single correct answer.

 

2. Theoretical reasons: the energy functions used to score rotamers can easily handle the hydrophobic packing in the core (mainly Van der Waals interactions), but are not accurate enough to get the complicated electrostatic interactions on the surface right, including hydrogen bonds with water molecules and the associated entropic effects.

 

It is important to note that the prediction accuracies reported in most publications cannot be reached in real-life applications. This is largely because the methods are tested by taking a known structure, removing its side chains and re-predicting them. The algorithms therefore rely on the correct backbone, which is not available in homology modeling, where the template backbone often differs substantially from that of the target. The rotamers must then be predicted on the basis of an incorrect backbone, and prediction accuracies tend to be lower19.
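Two of the quantities used in this step are easy to make concrete: the size of the naive rotamer search space, and the fraction of χ1-angles predicted within ±20° of the experimental values. The Python sketch below is an illustration only; the function names are hypothetical, angles are assumed to be given in degrees, and the periodicity of torsion angles is handled by wrapping differences into the range 0° to 180°.

```python
def search_space_size(n_residues, rotamers_per_residue):
    """Number of rotamer combinations if every residue is treated independently."""
    return rotamers_per_residue ** n_residues


def angle_difference(a, b):
    """Smallest absolute difference between two torsion angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def chi1_accuracy(predicted, experimental, tolerance=20.0):
    """Fraction of chi1-angles predicted within `tolerance` degrees."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty angle lists of equal length")
    hits = sum(angle_difference(p, e) <= tolerance
               for p, e in zip(predicted, experimental))
    return hits / len(predicted)


if __name__ == "__main__":
    # 100 residues with 5 rotamers each: a 70-digit number of combinations
    print(search_space_size(100, 5))
    print(chi1_accuracy([60, -175, 180, 60], [65, 175, -60, 100]))
```

Note that -175° and 175° differ by only 10° once the angles are wrapped, which a plain subtraction would miss.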

 

Step 6: Model Optimization:

The problem described above leads to a classic chicken-and-egg situation. To predict side-chain rotamers with high accuracy, we need the correct backbone, which in turn depends on the rotamers and their packing. The common approach to such a problem is an iterative one: predict the rotamers, then the resulting shifts in the backbone, then the rotamers for the new backbone, and so on, until the procedure converges. This boils down to a series of rotamer prediction and energy minimization steps. The latter use the methods described for loop modeling above, but this time they have to be applied to the entire protein structure, not just a single loop. This requires an enormously accurate energy function, because there are many more paths leading away from the answer (the target structure) than towards it, which is why energy minimization must be used with care. At each minimization step, a few big errors (such as bumps, i.e. atomic distances that are too short) are removed, while many small errors are introduced. When the big errors are gone, the small ones begin to accumulate and the model moves away from the target. As a rule, therefore, today's modeling programs either restrain the atom positions and/or apply only a few hundred steps of energy minimization. In short, model optimization does not work until the energy functions (force fields) become more accurate. There are currently two ways to achieve this accuracy:

 

1. Quantum force fields: protein force fields must be fast to handle these large molecules efficiently, hence energies are normally expressed only as a function of the positions of the atomic nuclei. The continuous increase in computer power has finally made it possible to apply quantum chemistry methods to whole proteins, yielding more accurate descriptions of the charge distribution (Liu et al., 2001). However, overcoming the inherent approximations of today's quantum chemical calculations remains difficult. For example, attractive Van der Waals forces are so hard to treat that they often need to be omitted altogether. While the electrostatics are described more accurately, the overall precision achieved is still about the same as with classical force fields.

 

2. Self-parameterizing force fields: the precision of a force field depends largely on its parameters (e.g., Van der Waals radii, atomic charges). These parameters are usually obtained from quantum chemical calculations on small molecules and from fitting to experimental data, following elaborate rules (Wang, Cieplak, and Kollman, 2000). By applying the force field to proteins, one implicitly assumes that a peptide chain is just the sum of its individual small-molecule building blocks, the amino acids. Alternatively, one can explicitly state a goal, for example improving the models during an energy minimization, and then let the force field parameterize itself while trying to achieve this goal optimally (Krieger, Koraimann, and Vriend, 2002). This approach leads to a computationally very expensive procedure: take initial parameters (for example from an existing force field), change a parameter randomly, energy-minimize the models, check whether the result has improved, keep the new force field if it has, and otherwise go back to the previous force field. With this procedure, the precision of the force field increases enough to move in the right direction during an energy minimization, but experimental accuracy is still far out of reach.
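The accept-or-revert loop described for self-parameterizing force fields is a form of random hill climbing and can be sketched as follows. This is not the published procedure: the `evaluate` callable is a hypothetical placeholder that would, in a real setting, energy-minimize a set of models with the trial parameters and return their deviation from the target structures (lower is better). The step size, trial count and parameter names are assumptions.

```python
import random


def self_parameterize(params, evaluate, n_trials=200, step=0.05, seed=0):
    """Randomly perturb one parameter at a time and keep only improvements."""
    rng = random.Random(seed)
    best = dict(params)
    best_error = evaluate(best)
    for _ in range(n_trials):
        trial = dict(best)
        name = rng.choice(sorted(trial))          # pick one parameter
        trial[name] *= 1.0 + rng.uniform(-step, step)  # change it randomly
        error = evaluate(trial)                   # minimize models, score result
        if error < best_error:
            best, best_error = trial, error       # keep the new force field
        # otherwise fall back to the previous force field (best is unchanged)
    return best, best_error


if __name__ == "__main__":
    # Toy objective standing in for "minimize the models and measure the error".
    def toy_error(p):
        return (p["vdw_radius"] - 1.7) ** 2 + (p["charge"] - 0.5) ** 2

    start = {"vdw_radius": 2.0, "charge": 0.3}
    print(self_parameterize(start, toy_error))
```

Because every trial requires minimizing all the models again, the cost of the loop is dominated by `evaluate`, which is why the procedure is so expensive in practice.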

 

The most straightforward approach to model optimization is simply to run a molecular dynamics simulation of the model. Such a simulation follows the motions of the protein on a femtosecond (10⁻¹⁵ s) time scale and mimics the true folding process. One therefore hopes that the model will complete its folding and "home in" on the true structure during the simulation. The advantage is that a molecular dynamics simulation implicitly contains entropic effects that are otherwise difficult to treat; the disadvantage is that the force fields are again not accurate enough to make it work. Nevertheless, one of the main tasks of Blue Gene, the forthcoming fastest computer in the world, will be to run exactly this type of molecular dynamics simulation (IBM Blue Gene team, 2001). By the time Blue Gene goes online in 2005, more precise force fields will have to be available20.

 

Step 7: Model Validation:

Every homology model contains errors. The number of errors (for a given method) mainly depends on two values:

1. The percentage sequence identity between template and target. If it is greater than 90%, the accuracy of the model can be compared to that of crystallographically determined structures, except for a few individual side chains (Chothia and Lesk, 1986; Sippl, 1993). From 50% to 90% identity, the rms error in the modeled coordinates can be as large as 1.5 Å, with considerably larger local errors. If the sequence identity drops to 25%, the alignment turns out to be the main bottleneck for homology modeling, often leading to very large errors.

 

2. The number of errors in the template.
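The rms error in the modeled coordinates quoted above is the root-mean-square deviation (RMSD) between equivalent atoms of the model and the reference structure. A minimal Python sketch follows; it assumes that the two coordinate sets are already optimally superimposed and listed in the same atom order, so no fitting step is included.

```python
import math


def rmsd(model_coords, reference_coords):
    """Root-mean-square deviation between two superimposed coordinate sets."""
    if len(model_coords) != len(reference_coords) or not model_coords:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((a - b) ** 2
                  for p, q in zip(model_coords, reference_coords)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(model_coords))


if __name__ == "__main__":
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 1.5), (1.0, 0.0, 1.5)]
    print(rmsd(model, reference))  # every atom is displaced by 1.5 Å
```

Because the average is taken over all atoms, a small global RMSD can hide a large local error, which is why localized checks are needed as well.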

Model errors become less of an issue if they can be localized. It hardly matters, for example, that a loop far from the active site of an enzyme is placed incorrectly. Verifying the model is therefore an important step in the homology modeling process. There are two principally different ways of estimating errors in a structure:

 

i. Calculating the energy of the model based on a force field: This method checks whether the bond lengths and bond angles are within normal ranges, and whether there are lots of bumps in the model (corresponding to a high Van der Waals energy). Essential questions such as "Is the model folded correctly?" cannot yet be answered in this way, because completely misfolded but well-minimized models frequently reach the same force field energy as the target structure (Novotny, Rashin, and Bruccoleri, 1988). This finding is partly due to the fact that molecular dynamics force fields do not explicitly include entropic terms (such as the hydrophobic effect), but rely on the simulation to produce them. Although this problem can be addressed by, for example, extending the force field with solvation terms, the main drawback remains that one obtains only a single number for the entire protein and cannot easily trace problems down to individual residues.

 

ii. Determination of normality indices that describe how well a given characteristic of the model resembles the same characteristic in real structures. Many features of protein structures are well suited for normality analysis. Most of them are directly or indirectly based on the analysis of interatomic distances and contacts. Some published examples are:

 

a. General checks on the normality of bond lengths, bond angles and torsion angles (Morris et al., 1992; Czaplewski et al., 2000) are good checks on the quality of experimentally determined structures, but are less suitable for model evaluation because the better model-building programs simply do not make this type of error.

 

b.     The inside/outside distributions of polar and apolar residues can be used to detect completely misfolded models (Baumann, Frommel, and Sander, 1989).

 

c. For a given type of atom, the radial distribution function (i.e., the probability of finding certain other atoms at a given distance) can be extracted from the library of known structures and converted into an energy-like quantity called a "potential of mean force" (Sippl 1990). Such a potential can easily distinguish good contacts (e.g., between a side-chain carbon of valine and a side-chain carbon of isoleucine) from poor contacts (e.g., between the same valine carbon and the positively charged amino group of lysine).

 

d.     Taking into account not only the distance but also the direction of atomic contacts, one arrives at 3D distribution functions which can also easily recognize misfolded proteins and are strong indicators of local model building problems (Vriend and Sander, 1993).
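The conversion of a distance distribution into an energy-like quantity, as in item c, is commonly done with the inverse Boltzmann relation E(r) = −kT ln(p_obs(r) / p_ref(r)). The Python sketch below applies it to binned contact counts. It is an illustration under stated assumptions: the choice of reference distribution, the pseudocount used to avoid taking the logarithm of zero, the value of kT (about 0.593 kcal/mol near room temperature) and the function name are not taken from the cited work.

```python
import math

KT_KCAL_PER_MOL = 0.593  # approximate k_B * T near room temperature


def potential_of_mean_force(observed, reference, kT=KT_KCAL_PER_MOL,
                            pseudocount=1.0):
    """Energy-like score per distance bin from observed and reference counts."""
    if len(observed) != len(reference) or not observed:
        raise ValueError("need two non-empty count lists of equal length")
    obs_total = sum(observed) + pseudocount * len(observed)
    ref_total = sum(reference) + pseudocount * len(reference)
    energies = []
    for obs, ref in zip(observed, reference):
        p_obs = (obs + pseudocount) / obs_total
        p_ref = (ref + pseudocount) / ref_total
        # bins seen more often than expected get a negative (favorable) energy
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies
```

Summing such bin energies over all atom pairs of a model, residue by residue, gives a profile that can point to locally unfavorable regions rather than a single number for the whole protein.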

 

Most of the methods used for model verification can also be applied to experimental structures (and hence to the templates used for model building). A thorough verification is essential when trying to derive new information from the model, whether to interpret or predict experimental results or to plan new experiments. In summary, it is fair to conclude that homology modeling is, sadly, not as simple as was initially claimed. Ideally, homology modeling would use threading to improve the alignment, ab initio folding to predict the loops, and molecular dynamics simulations with a perfect force field to home in on the true structure. Getting all that right will keep researchers busy for a long time, leaving plenty of fascinating discoveries to good old experiment21.

 

Experimental Constraints to Improve/ Verify Homology Models:

Any experimental data that can be used, even indirectly, as a structural constraint helps to create a better homology-based structural model. Once a structural model is available, further experiments can be designed using insights from the model. Designing experiments using structural models and constructing models that satisfy experimental constraints is therefore an iterative process that leads to a deeper understanding of the structure–function relationships of a protein. The experimental constraints that can be used in model building are diverse, and several examples are discussed here. Experimental constraints are usually sparse, and not sufficient by themselves to yield an unambiguous structural model. A given set of constraints may thus be satisfied by several models. The subset of models that do not satisfy a given experimental constraint, however, can be removed from consideration. Experimental constraints may either apply at the residue level or provide information about the structure as a whole. Residue-level constraints include distance limits between specific residues obtained by FRET and site-directed cross-linking. Iterative model building is possible with FRET and site-directed cross-linking, because a structural model allows a much smaller subset of residue pairs to be tested in distance measurements than would be needed with random residue pairs. These distance measurements often provide direct confirmation of a given structural model. Residue accessibilities obtained through EPR spectroscopy and H-D exchange mass spectrometry also aid model refinement. Small angle X-ray scattering (SAXS), cryo-electron microscopy (cryoEM), and circular dichroism (CD) spectroscopy, among others, are experiments that provide information on the overall protein structure. 
SAXS provides the molecular envelope, or overall shape of the protein in solution, which can help to discriminate between structurally diverse models. How useful cryoEM is for generating a structural model depends greatly on the resolution of the electron density map obtained for a given protein. Recent developments in cryoEM have led to density maps of subnanometer resolution that can be used to refine all-atom structural models directly. CryoEM densities are normally deposited in the electron microscopy data bank (http://www.emdatabank.org/), and programs have been developed to perform robust docking of a structural model into EM densities. Thus, high-resolution cryoEM currently offers the best alternative to X-ray crystallography and NMR for obtaining an accurate atomic structure of a given protein. CD spectroscopy is used to determine the secondary structure content of a given protein, and CD measurements can be used to check the overall accuracy of the secondary structure content of a structural model. Indirect structural constraints include protein mutation studies that assess changes in function and stability. These constraints can only be incorporated qualitatively in model building, but still provide a means to eliminate inaccurate models22.
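Removing models that violate residue-level distance constraints can be sketched as a simple filter. The representation below is an assumption made for illustration: each candidate model is a list of Cα coordinates indexed by residue position, and each constraint is a tuple `(i, j, max_distance)` such as an upper distance limit derived from FRET or site-directed cross-linking. Real restraints usually carry uncertainties and may refer to probe or linker atoms rather than Cα atoms.

```python
import math


def satisfies(ca_coords, restraints):
    """True if every residue pair lies within its upper distance limit."""
    return all(math.dist(ca_coords[i], ca_coords[j]) <= max_distance
               for i, j, max_distance in restraints)


def filter_models(models, restraints):
    """Keep only the named models that satisfy all distance restraints."""
    return {name: coords for name, coords in models.items()
            if satisfies(coords, restraints)}


if __name__ == "__main__":
    candidates = {
        "compact": [(0, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0)],
        "extended": [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0)],
    }
    # hypothetical cross-link: residues 0 and 2 no more than 6 Å apart
    print(sorted(filter_models(candidates, [(0, 2, 6.0)])))
```

Since sparse constraints are typically satisfied by several models, the filter narrows the candidate set rather than selecting a single answer.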

 

CONCLUSION:

When there is homology to a structural template, comparative modeling is a powerful technique for better understanding a given protein's structure-function relationships and functional mechanisms. Importantly, landmark structural studies have provided a sufficient number of templates to model many variants of clinically relevant proteins that are hard to crystallize, such as G-protein coupled receptors (GPCRs) and ion channels. Structural models of such variants have been instrumental in furthering our knowledge of various functional mechanisms (in K+ channels) and in virtual ligand screening (GPCRs). The most influential effect of comparative structural models has been the advance in structural understanding of GPCRs and ion channels, and in several other instances these models have provided biologically valuable insights. Several precautions must be taken during the model building process, and the quality of the model must be assessed at each stage. Most importantly, all structural models need some form of experimental validation to be meaningful. Therefore, an iterative process of model building and experimental testing offers the best scenario for understanding the structural and functional aspects of the many biological proteins whose experimental structures remain unsolved.

 

REFERENCES:

1.      Hubner Z, Arakaki A, Skolnick J. On the origin and highly likely completeness of single-domain protein structures. Proc. Natl. Acad. Sci. 2006; 103: 2605–2610.

2.      Todd, A.E., Orengo, C.A., Thornton, J.M. Evolution of function in protein superfamilies, from a structural perspective. J. Mol. Biol. 2001; 307: 1113–1143.

3.      Pieper, U., Webb, B.M., Barkan, D.T., Schneidman-Duhovny, D., Schlessinger, A., Braberg, H., Yang, Z., Meng, E.C., Pettersen, E.F., Huang, C.C. ModBase, a database of annotated comparative protein structure models, and associated resources. Nucleic Acids Res. 2011; 39: 465–474.

4.      Kiefer, F., Arnold, K., Kunzli, M., Bordoli, L., Schwede, T. The SWISS-MODEL Repository and associated resources. Nucleic Acids Res. 2009; 37: 387–392.

5.      Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., Bourne, P.E. The Protein Data Bank. Nucleic Acids Res. 2000; 28: 235–242.

6.      Chandonia, J.M., Brenner, S.E. The impact of structural genomics: expectations and outcomes. Science. 2006; 311: 347–351.

7.      Becker, O.M., Dhanoa, D.S., Marantz, Y., Chen, D., Shacham, S., Cheruku, S., Heifetz, A., Mohanty, P., Fichman, M., Sharadendu, A. An integrated in silico 3D model-driven discovery of a novel, potent, and selective amidosulfonamide 5-HT1A agonist (PRX-00023) for the treatment of anxiety and depression. J. Med. Chem. 2006; 49: 3116–3135.

8.      Brylinski, M., Skolnick, J. Q-Dock: low-resolution flexible ligand docking with pocket-specific threading restraints. J. Comput. Chem. 2008; 29: 1574–1588.

9.      Ekins, S., Mestres, J., Testa, B. In silico pharmacology for drug discovery: applications to targets and beyond. Br. J. Pharmacol. 2007; 152: 21–37.

10.   Labro, A.J., Boulet, I.R., Choveau, F.S., Mayeur, E., Bruyns, T., Loussouarn, G., Raes, A.L., Snyders, D.J. The S4-S5 linker of KCNQ1 channels forms a structural scaffold with the S6 segment controlling gate closure. J. Biol. Chem. 2011; 286: 717–725.

11.   Szklarz, G.D., Halpert, J.R. Use of homology modeling in conjunction with site-directed mutagenesis for analysis of structure-function relationships of mammalian cytochromes P450. Life Sci. 1997; 61: 2507–2520.

12.   Claude, J.B., Suhre, K., Notredame, C., Claverie, J.M., Abergel, C. CaspR: a web server for automated molecular replacement using homology modelling. Nucleic Acids Res. 2004; 32: 606–609.

13.   Dong, J., Yang, G., McHaourab, H. Structural basis of energy transduction in the transport cycle of MsbA. Science. 2005; 308: 1023–1028.

14.   Sander C, Schneider R. Database of homology-derived protein structures and the structural meaning of sequence alignment. Proteins. 1998; 9: 56–68.

15.   Simons KT, Bonneau R, Ruczinski I, Baker D. Ab initio structure prediction of CASP III targets using ROSETTA. Proteins. 1999; 3: 171–176.

16.   Sippl MJ. Calculation of conformational ensembles from potentials of mean force. J Mol Biol. 1990; 213: 859–862.

17.   Stites WE, Meeker AK, Shortle D. Evidence for strained interactions between side-chains and the polypeptide backbone. J Mol Biol. 1994; 235: 27–32.

18.   Tappura K. Influence of rotational energy barriers to the conformational search of protein loops in molecular dynamics and ranking the conformations. Proteins. 2001; 44: 167–79.

19.   Taylor WR. Identification of protein sequence homology by consensus template alignment. J Mol Biol. 1986; 188:233–258.

20.   Thompson JD, Higgins DG, Gibson TJ. ClustalW: improving the sensitivity of progressive multiple sequence alignments through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994; 22:4673–4680.

21.   Vriend G. WHAT IF-A molecular modeling and drug design program. J Molec Graphics. 1994; 8:52–56.

22.   Wang J, Cieplak P, Kollman PA. How well does a restrained electrostatic potential (RESP) model perform in calculating conformational energies of organic and biological molecules? J Comput Chem. 2000; 21:1049–1074.

 

 

 

Received on 18.06.2020         Modified on 16.07.2020

Accepted on 01.08.2020       ©A&V Publications All rights reserved

Res. J. Pharma. Dosage Forms and Tech. 2020; 12(4): 313-320.

DOI: 10.5958/0975-4377.2020.00052.X