
Bioinformatics Tools: Protein


Protein Research Tools

About General Protein Analysis Tools:

A wealth of general-use applications for the study of proteins and protein sequences has been produced and shared by other researchers. Several of these are described below.

ProtParam Tool (ExPASy):

This simple but excellent web-based tool allows users to enter a protein sequence of interest and have basic physical properties of that protein returned. Properties calculated by the program include molecular weight, theoretical isoelectric point (pI), amino acid composition, atomic composition, extinction coefficient for A280 in water, estimated half-life, instability index, aliphatic index and grand average of hydropathicity (GRAVY).

The algorithms used to calculate properties like isoelectric point, extinction coefficient and estimated half-life for a protein sequence require the application of some seminal biochemical observations.  The papers cited describing these concepts should be accessible to any student with a modest background in chemistry/biochemistry and are well worth reading.
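Three of these properties are simple enough to sketch in a few lines of Python. The example below is a minimal illustration, not ProtParam's own code. It assumes the Kyte-Doolittle hydropathy scale for GRAVY, Ikai's formula for the aliphatic index, and the Pace et al. molar absorption values for Tyr, Trp and cystine at 280 nm (all cited in the References below). The function names are hypothetical.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Count how many times each amino acid occurs in the sequence."""
    return Counter(seq)


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index (Ikai, 1980), computed from mole percentages."""
    counts = Counter(seq)

    def mole_percent(aa):
        return 100.0 * counts[aa] / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def extinction_coefficient(seq, cystines_formed=True):
    """Molar extinction coefficient at 280 nm, in M^-1 cm^-1.

    Assumes 1490 per Tyr, 5500 per Trp and 125 per cystine
    (Pace et al., 1995). With cystines_formed=True, every pair of
    Cys residues is assumed to form one disulfide bond.
    """
    counts = Counter(seq)
    coefficient = 1490 * counts["Y"] + 5500 * counts["W"]
    if cystines_formed:
        coefficient += 125 * (counts["C"] // 2)
    return coefficient
```

Calling `gravy("AVILWYCC")` or `aliphatic_index("AVILWYCC")` on a one-letter sequence string returns a single number. The sketch expects only the 20 standard amino acid letters in upper case.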

Translate Tool (ExPASy):

You can use the Translate Tool for in silico translation of the 3 forward and 3 reverse reading frames of an entered RNA or DNA sequence.

Output can be displayed with or without the provided nucleic acid sequence.

Besides the standard codon table, 15 alternative genetic codes are available, including the vertebrate, invertebrate and yeast mitochondrial genetic codes.
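The idea behind six-frame translation can be sketched in Python as follows. This is a minimal illustration using the standard genetic code only, not the Translate Tool's own implementation. The function names and the frame labels (`+1` to `+3`, `-1` to `-3`) are assumptions made for this example.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; '*' is a stop, 'X' an unrecognized codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(seq):
    """Translate the 3 forward and 3 reverse frames of a DNA or RNA sequence."""
    dna = seq.upper().replace("U", "T")  # treat RNA input as DNA
    reverse = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames
```

For example, `six_frame_translation("ATGGCCTAA")["+1"]` gives `"MA*"`, where `*` marks the stop codon. Supporting one of the alternative genetic codes would mean swapping in a different `AMINO_ACIDS` string.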

HeliQuest Protein Helix Analysis Tool:

HeliQuest uses a user-provided amino acid sequence of a helix (α-helix, 3-10 helix, 3-11 helix or π helix) to calculate its physicochemical properties and amino acid composition.  These results are visualized in a Helical Wheel Projection diagram.

The resulting information about the input protein helix can be used to screen a specified databank to identify protein segments possessing similar features. 
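The geometry behind a helical wheel can be sketched in Python. The example below assumes an ideal α-helix in which each residue is rotated 100° from the previous one (3.6 residues per turn). It computes a mean hydrophobic moment as the length of the vector sum of per-residue hydrophobicities divided by the number of residues. It uses the Kyte-Doolittle scale as a stand-in, so the numbers will not match HeliQuest's output, which may rely on a different scale. The function names are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values, used here only as a stand-in scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def helical_wheel_angles(n_residues, degrees_per_residue=100.0):
    """Angular position of each residue on the wheel, in degrees."""
    return [(i * degrees_per_residue) % 360.0 for i in range(n_residues)]


def mean_hydrophobic_moment(seq, scale=KYTE_DOOLITTLE,
                            degrees_per_residue=100.0):
    """Length of the summed hydrophobicity vectors, divided by residue count."""
    delta = math.radians(degrees_per_residue)
    sin_sum = sum(scale[aa] * math.sin(i * delta) for i, aa in enumerate(seq))
    cos_sum = sum(scale[aa] * math.cos(i * delta) for i, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum) / len(seq)
```

A helix with hydrophobic residues clustered on one face (for instance the artificial sequence `"LKKLLKKLLKKLLKKLLK"`) yields a large moment, while a uniform sequence such as 18 leucines yields a moment close to zero. Passing `degrees_per_residue=120.0` would model a 3-10 helix instead.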

 


References:

Protein Half-Life Prediction

Bachmair A, Finley D, Varshavsky A. 1986. In Vivo Half-Life of a Protein is a Function of its Amino-Terminal Residue. Science 234(4773):179-186.

Ciechanover A, Schwartz AL. 1989. How are substrates recognized by the ubiquitin-mediated proteolytic system? Trends in Biochemical Sciences 14(12):483-488.

Gonda DK, Bachmair A, Wunning I, Tobias JW, Lane WS, Varshavsky A. 1989. Universality and structure of the N-end rule. Journal of Biological Chemistry 264(28):16700-16712.

Guruprasad K, Reddy BVB, Pandit MW. 1990. Correlation between stability of a protein and its dipeptide composition: A novel approach for predicting in vivo stability of a protein from its primary sequence. Protein Engineering 4(2):155-161.

Tobias JW, Shrader TE, Rocap G, Varshavsky A. 1991. The N-end rule in bacteria. Science 254(5036):1374-1377.

Varshavsky A. 1997. The N-end rule pathway of protein degradation. Genes to Cells 2(1):13-28.

Protein Extinction Coefficient Prediction

Edelhoch H. 1967. Spectroscopic determination of tryptophan and tyrosine in proteins. Biochemistry 6(7):1948-1954.

Gill SC, von Hippel PH. 1989. Calculation of protein extinction coefficients from amino acid sequence data. Analytical Biochemistry 182(2):319-326.

Gill SC, von Hippel PH. 1990. Errata: Calculation of protein extinction coefficients from amino acid sequence data. Analytical Biochemistry 189(2):283.

Pace CN, Vajdos F, Fee L, Grimsley G, Gray T. 1995. How to measure and predict the molar absorption coefficient of a protein. Protein Science 4(11):2411-2423.

Hydrophobicity Calculations and Thermostability Predictions

Ikai A. 1980. Thermostability and aliphatic index of globular proteins. Journal of Biochemistry 88(6):1895-1898.

Kyte J, Doolittle RF. 1982. A simple method for displaying the hydropathic character of a protein. Journal of Molecular Biology 157(1):105-132.

Isoelectric Point Predictions

Bjellqvist B, Basse B, Olsen E, Celis JE. 1994. Reference points for comparisons of two-dimensional maps of proteins from different human cell types defined in a pH scale where isoelectric points correlate with polypeptide compositions. Electrophoresis 15(3-4):529-539.

Bjellqvist B, Hughes GJ, Pasquali C, Paquet N, Ravier F, Sanchez JC, Frutiger S, Hochstrasser D. 1993. The focusing positions of polypeptides in immobilized pH gradients can be predicted from their amino acid sequences. Electrophoresis 14(10):1023-1031.

Protein Helix Analysis (HeliQuest)

Gautier R, Douguet D, Antonny B, Drin G. 2008. HELIQUEST: a web server to screen sequences with specific α-helical properties. Bioinformatics 24(18):2101-2102.

 

Protein Structure Databases:

Research Collaboratory for Structural Bioinformatics Protein Database (RCSB PDB)

Formerly (and still commonly) known simply as "The PDB", the RCSB PDB is arguably the most important collection of high-resolution three-dimensional protein structures available. In 2013, the collection contained nearly 100,000 molecular structures, with nearly 10,000 added in that year alone.

Protein structure files can be downloaded in the *.pdb file format for use with 3D structure viewing software or can be viewed directly at the site within Java-compliant browsers. Available protein structures can be located via either keyword search or successive selection of category facets such as organism, taxonomy, experimental method (e.g. X-ray diffraction, solution NMR, etc.), X-ray resolution, polymer type (e.g. protein, RNA, DNA, etc.) and enzyme classification.
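A downloaded *.pdb file is plain text in which each atom occupies one fixed-width line, so it can also be read with a short script. The Python sketch below is a simplified reader that handles only `ATOM` records and a handful of fields, assuming the standard column layout of the format. The function names are hypothetical, and a dedicated library would be a better choice for real work.

```python
def parse_atoms(pdb_text):
    """Extract basic fields from the ATOM records of PDB-format text."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip headers, HETATM records, remarks, etc.
        atoms.append({
            "name": line[12:16].strip(),      # atom name, columns 13-16
            "res_name": line[17:20].strip(),  # residue name, columns 18-20
            "chain": line[21],                # chain identifier, column 22
            "res_seq": int(line[22:26]),      # residue number, columns 23-26
            "xyz": (
                float(line[30:38]),           # x, columns 31-38
                float(line[38:46]),           # y, columns 39-46
                float(line[46:54]),           # z, columns 47-54
            ),
        })
    return atoms


def alpha_carbons(pdb_text):
    """Return only the alpha-carbon (CA) atoms, one per residue."""
    return [atom for atom in parse_atoms(pdb_text) if atom["name"] == "CA"]
```

Reading a file into a string and passing it to `alpha_carbons` gives one entry per residue, which is often enough to trace a protein backbone or to measure distances between residues.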

The site is very well organized and contains a wealth of information that is actively supplemented and managed. If you are new to this valuable resource you might consider:

About RCSB PDB

Advanced Search Tutorial

Advanced Tools you might find useful include:

Pairwise Structure Alignment Tool:

Compare two proteins by pairwise sequence alignment or by structure comparison.

Search by Structural Similarity

Search protein structure shapes by BioZernike descriptors as described by Guzenko et al. 2020. The search will find structures whose volumes are globally similar to the query structure.

NCBI Protein Structure Database

The NCBI Protein Structure Database (also known as the Molecular Modeling Database, MMDB) contains macromolecular structure files derived from the Research Collaboratory for Structural Bioinformatics Protein Database (RCSB PDB). However, the data from these files are curated differently and are stored in a different file format (i.e. *.cn3 for NCBI versus the *.pdb file format for RCSB).

3D Macromolecular Structures Gateway

MMDB Help Page

 

3D Structure Viewers:

Protein structure files can generally be downloaded to a personal computer and viewed, annotated and exported using standalone viewing software. This has several benefits over a browser-based viewer, such as eliminating the requirement for internet connectivity and providing the ability to render personalized, publication-quality images.

As with most things, you get what you pay for, and 3D structure viewers are no exception. However, several high-quality, freely available 3D rendering engines exist for Mac, Windows and Linux operating systems.

 

Swiss PDB Viewer

Swiss PDB Viewer is arguably the preferred tool for working with *.pdb files from RCSB PDB. This robust, standalone program is freely available for Windows and Mac operating systems and can be configured to work with Linux using the Wine compatibility layer.

Swiss PDB Viewer 4.1 User Guide (Online)

Swiss PDB Viewer 4.1 Tutorials

PyMOL

An open-source molecular visualization tool that can produce publication-quality three-dimensional images of macromolecules.

Sign up for an Educational-Use-Only PyMOL download here. [Note that this license does not allow you to make figures for publication.]

Find a very good PyMOL Wiki here

Find a PyMOL Cheat Sheet / Reference Card here

 

CHIMERA

The CHIMERA structure viewer is available for Windows, Mac and Linux operating systems and can work with a variety of file types including *.pdb files from the RCSB PDB.  It does not presently work with *.cn3 files.

CHIMERA User's Guide

CHIMERA Tutorials

CN3D ("See in 3D")

CN3D is a structure viewing, annotation and export application available for Windows, Mac and Linux operating systems, designed to work with the *.cn3 file format used by the NCBI Protein Structure Database. The software can be used as both a standalone application and a web browser plugin.

In addition to visual manipulation of macromolecular structure files, CN3D can also display structural alignments of multiple proteins.  This is very cool and frequently useful.

CN3D Tutorial

CN3D Pairwise Structural Alignment Tutorial

About the Protein Family (PFam) Tool:

Protein families are groups of proteins determined to share structural and functional features.  Key underlying principles for assigning proteins to a given family are:

  1. Many structural/functional features exist as discrete domains within a protein.
  2. Most proteins can be considered as modular assemblies of these different elements.
  3. Proteins with very similar kinds and arrangements of these structural/functional modules are considered to belong to the same family.
  • One goal of curating a database of protein family assignments is to provide putative structure/function classifications for new protein sequences. This is important because genome sequencing projects have produced a vast amount of sequence data for which there is minimal (or no) real biochemical, biophysical, cell biological or molecular biological data.
  • Another goal is to use the Pfam database to correlate protein mutations with predictable structural and/or functional changes in domains common to that protein family.
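The modular view of proteins described above can be illustrated with a toy Python sketch. Each protein is represented as an ordered list of domain names, and proteins with the same kinds and arrangement of domains are grouped together. The protein and domain names are hypothetical placeholders. Requiring an identical arrangement is a deliberate simplification of the "very similar" criterion, and it is not how Pfam itself assigns families.

```python
from collections import defaultdict


def group_by_architecture(proteins):
    """Group protein names by their ordered domain arrangement.

    `proteins` maps a protein name to its list of domains, ordered
    from N-terminus to C-terminus.
    """
    groups = defaultdict(list)
    for name, domains in proteins.items():
        groups[tuple(domains)].append(name)
    return dict(groups)


# Hypothetical example data: three proteins built from two domain types.
example = {
    "protein_1": ["domain_A", "domain_B"],
    "protein_2": ["domain_A", "domain_B"],
    "protein_3": ["domain_B", "domain_A"],
}
groups = group_by_architecture(example)
```

Here `protein_1` and `protein_2` fall into the same group, while `protein_3` contains the same domains in a different order and is grouped separately.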

 

Pfam Home Page:

Protein Sequence Search:

Match your protein sequence against the Pfam database to find the most likely family assignment(s) for your protein here.

Keyword Search:

Search Pfam entry descriptions and comments, sequence descriptions and species fields, and the HEADER and TITLE records from PDB files for key words to discover relevant Pfam entries here.

Browse Pfam:

Browse lists of Pfam families, clans, or proteomes here.

Tutorial: 

A short, web-based tutorial for using Pfam can be found here.


Citing Pfam:

El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC, Qureshi M, Richardson LJ, Salazar GA, Smart A et al. 2018. The Pfam protein families database in 2019. Nucleic Acids Res. 47(D1):D427-D432. doi: 10.1093/nar/gky995. PMID: 30357350; PMCID: PMC6324024

Protein Structure Prediction:

AlphaFold Colab

This Colab notebook allows you to easily predict the structure of a protein using a slightly simplified version of AlphaFold v2.1.0.

Reference:

Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). https://doi.org/10.1038/s41586-021-03819-2