Why does the observed protein molecular weight differ from the calculated one?

4th June 2019

Figure 1: Programmed cell death ligand 1 (PD-L1, CD274, or B7-H1) (66248-1-Ig) is a type I transmembrane protein that acts as a key regulator of the adaptive immune response. The calculated molecular weight of full-length PD-L1 is 33 kDa. The signal peptide is cleaved off during protein transport to the plasma membrane, and the protein is heavily N-glycosylated, giving an apparent molecular weight of 45–70 kDa, with the major glycosylated form at 45–50 kDa (PMID: 27572267). CD133, also known as PROM1 (prominin-1) (18470-1-AP), is a transmembrane glycoprotein with an NH2-terminal extracellular domain, five transmembrane domains, and a cytoplasmic tail. The protein is highly glycosylated, with an apparent molecular weight of 115–120 kDa. After treatment with PNGase F, CD133 shifts to 75–85 kDa, which corresponds to the calculated molecular weight of de-glycosylated CD133 (PMID: 23150174).
Figure 2: Decorin (14667-1-AP) is a member of the small leucine-rich proteoglycan family; its precursor gives rise to protein species of 43–47 kDa. It contains a cleavable N-terminal signal peptide and can also be glycosylated. Glycosaminoglycans (chondroitin sulfate or dermatan sulfate) are attached to decorin in the Golgi apparatus before the mature, glycanated form is secreted from cells.
Figure 3: The serine/threonine-protein kinase AKT plays a role in many cellular processes. Survival factors can suppress apoptosis in a transcription-independent manner by activating AKT1, which then phosphorylates and inactivates components of the apoptotic machinery. (60203-2-Ig detects all AKT family members regardless of phosphorylation state; 66444-1-Ig detects AKT1 phosphorylated at Ser473, AKT2 at Ser474, and AKT3 at Ser472.)
Figure 4: Ubiquitin B (UBB) (10201-2-AP), a member of the ubiquitin family, is required for ATP-dependent, non-lysosomal intracellular degradation of abnormal proteins and of normal proteins with rapid turnover. The UBB gene consists of three direct repeats of the ubiquitin coding sequence, with no spacer sequence.
Figure 5: NQO1 (11451-1-AP) is a quinone reductase that acts together with conjugation reactions of hydroquinones in detoxification pathways, as well as in biosynthetic processes such as the vitamin K-dependent gamma-carboxylation of glutamate residues during prothrombin synthesis. NQO1 has three isoforms of 26, 27, and 31 kDa, and forms homodimers (66–70 kDa) that are needed for its enzymatic activity. Mlx-interacting protein (MLXIP, also known as MONDOA) (13614-1-AP) is a transcription factor that forms a heterodimer with MLX. This complex binds to and activates transcription from CACGTG E boxes, playing a role in activating glycolytic target genes and in glucose-responsive gene regulation. MLXIP has three isoforms of 110, 57, and 69 kDa, and the MLXIP-MLX heterodimer has a molecular weight of 130 kDa.
Western blotting vs calculation
The first step in Western blotting is sodium dodecyl sulfate–polyacrylamide gel electrophoresis (SDS-PAGE), followed by transfer of the proteins onto a membrane and detection with specific antibodies. Because SDS-PAGE is conducted under denaturing conditions, and SDS coats the unfolded polypeptides with a roughly uniform negative charge per unit mass, proteins migrate according to their molecular weight, largely irrespective of secondary/tertiary structure, intrinsic charge, or protein–protein interactions. Consequently, the size of a protein can be estimated by comparison with a molecular weight marker run on the same gel, since smaller proteins migrate faster than their larger counterparts.
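In practice, the apparent molecular weight is read off a standard curve: over a gel's resolving range, the logarithm of molecular weight is approximately linear in relative mobility (Rf, the distance a band travels divided by the distance travelled by the dye front). The Python sketch below fits that line by least squares to a marker lane and interpolates an unknown band. The function names and the ladder values are hypothetical, chosen only to illustrate the calculation.

```python
import math


def fit_standard_curve(markers):
    """Fit log10(MW) = slope * Rf + intercept by ordinary least squares.

    markers: list of (rf, mw_kda) pairs measured from the marker lane.
    """
    xs = [rf for rf, _ in markers]
    ys = [math.log10(mw) for _, mw in markers]
    n = len(markers)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def apparent_mw_kda(rf, slope, intercept):
    """Interpolate the apparent molecular weight (kDa) of a band at mobility rf."""
    return 10 ** (slope * rf + intercept)


# Hypothetical marker lane as (Rf, kDa) pairs; only interpolate within this range.
ladder = [(0.2, 100.0), (0.4, 50.0), (0.6, 25.0), (0.8, 12.5)]
slope, intercept = fit_standard_curve(ladder)
print(round(apparent_mw_kda(0.5, slope, intercept), 1))  # → 35.4
```

The value obtained this way is the apparent molecular weight; extrapolating beyond the outermost marker bands is unreliable, and the sections below explain why even a well-interpolated value can differ from the sequence-based one.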
The predicted molecular weight of a protein is easy to determine: a range of tools, including free online resources such as ExPASy, calculate it as the sum of the masses of all the amino acid residues that make up the protein. However, the calculated molecular weight frequently differs from that observed on a Western blot. Here we summarise the most common reasons why this occurs.
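The sequence-based calculation itself is simple: add the average mass of each amino acid residue, plus one molecule of water for the free termini. The Python sketch below uses standard average residue masses; it is an illustration, not the exact implementation of ExPASy or any other tool, and `calculated_mw_kda` is a hypothetical name. By construction it takes no account of post-translational modifications, which is exactly why its result can disagree with the blot.

```python
# Average residue masses in daltons (free amino acid minus one water).
AVERAGE_RESIDUE_MASS_DA = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_DA = 18.01524  # added once for the free N-terminus and C-terminus


def calculated_mw_kda(sequence: str) -> float:
    """Return the unmodified polypeptide mass in kDa for a one-letter sequence."""
    seq = "".join(sequence.split()).upper()  # drop whitespace/newlines, e.g. from FASTA
    unknown = sorted(set(seq) - set(AVERAGE_RESIDUE_MASS_DA))
    if not seq or unknown:
        raise ValueError(f"empty sequence or unrecognised residues: {unknown}")
    daltons = sum(AVERAGE_RESIDUE_MASS_DA[aa] for aa in seq) + WATER_DA
    return daltons / 1000.0


print(round(calculated_mw_kda("GG") * 1000, 2))  # glycylglycine, → 132.12
```

For a real protein, remember that the sequence entered should be the one actually present in the cell; for example, a cleaved signal peptide (as for PD-L1 in Figure 1) lowers the mass of the mature protein.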
Post-translational modifications 
1. Glycosylation and glycanation
Most proteins synthesised on ribosomes associated with the endoplasmic reticulum undergo glycosylation, in which sugar moieties are covalently attached to the polypeptide chain. The two most common types of glycosylation in eukaryotes are N-linked glycosylation (to asparagine) and O-linked glycosylation (to serine and threonine). Extensive glycosylation increases molecular weight and slows protein migration during SDS-PAGE, but it is not accounted for in a molecular weight calculation based on the protein sequence (Figures 1 and 2).
Enzymatic de-glycosylation is commonly used to verify whether a protein of interest is glycosylated. Before Western blotting, the protein sample is incubated with an enzyme that removes part or all of the glycan chains. Protein species in the digested sample are then compared with those in an undigested sample, and a shift to lower molecular weight indicates glycosylation. One commonly used enzyme is PNGase F, which removes N-linked glycans by cleaving the bond between the innermost N-acetylglucosamine of the glycan chain and the asparagine residue.
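The comparison can also give a rough estimate of how much mass the N-glycans contribute. The Python sketch below works with band ranges in kDa, such as those quoted for CD133 in Figure 1 (115–120 kDa untreated, 75–85 kDa after PNGase F). The function names and the 2 kDa threshold for calling a shift are assumptions for illustration, not a published criterion.

```python
def glycan_mass_range_kda(untreated, pngase_treated):
    """Return the (min, max) mass removed by PNGase F from (low, high) band ranges in kDa."""
    (u_low, u_high), (t_low, t_high) = untreated, pngase_treated
    return u_low - t_high, u_high - t_low


def is_n_glycosylated(untreated, pngase_treated, min_shift_kda=2.0):
    """Call a shift only if even the smallest possible shift exceeds the assumed threshold."""
    smallest_shift, _ = glycan_mass_range_kda(untreated, pngase_treated)
    return smallest_shift > min_shift_kda


cd133_untreated = (115.0, 120.0)
cd133_pngase_f = (75.0, 85.0)
print(glycan_mass_range_kda(cd133_untreated, cd133_pngase_f))  # → (30.0, 45.0)
print(is_n_glycosylated(cd133_untreated, cd133_pngase_f))      # → True
```

A lack of shift after PNGase F does not rule out O-linked glycosylation, which this enzyme does not remove.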
Proteoglycans are a special group of glycoproteins. These extracellular matrix proteins carry long, unbranched glycosaminoglycan chains covalently attached to a core protein. Usually, the glycosaminoglycan chains contribute even more to the molecular weight than the core protein does.
2. Phosphorylation
One of the most common post-translational modifications is phosphorylation. Catalysed by kinases and reversed by phosphatases, phosphorylation of serine, threonine, and tyrosine residues regulates protein function, enzymatic activity, protein–protein interactions, and protein localisation. Although a single phosphoryl group adds only about 80 Da to the molecular weight, which is usually beyond the resolution of standard SDS-PAGE, phosphorylation at multiple sites can lead to more noticeable shifts in apparent molecular weight (Figure 3).
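A quick calculation shows why a single site is usually invisible. Each phosphoryl group (HPO3) adds roughly 80 Da; the Python sketch below compares that added mass with the mass of a hypothetical 60 kDa protein, using a hypothetical helper name. It covers only the added mass: gel mobility itself can change by more than this arithmetic alone suggests.

```python
PHOSPHO_DA = 79.98  # approximate average mass added per phosphoryl group (HPO3)


def phospho_mass_shift(protein_kda, n_sites):
    """Return (added mass in kDa, added mass as % of the unmodified protein)."""
    added_kda = n_sites * PHOSPHO_DA / 1000.0
    return added_kda, 100.0 * added_kda / protein_kda


for sites in (1, 3, 10):
    added, percent = phospho_mass_shift(60.0, sites)
    print(f"{sites:>2} site(s): +{added:.2f} kDa ({percent:.1f}%)")
```

Even ten sites add well under 1 kDa to a 60 kDa protein, a change of roughly 1%.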
3. Ubiquitination
Ubiquitin is a small (approximately 8.6 kDa) protein expressed in almost all tissue types. It is covalently attached to lysine, cysteine, serine, or threonine residues, or directly to the protein N-terminus, through a three-enzyme cascade (E1, E2, and E3) comprising activation, conjugation, and ligation steps, with the E3 ligase providing substrate specificity. Proteins may be mono-ubiquitinated, or additional ubiquitin molecules may be attached to the first one, forming poly-ubiquitin chains.
Ubiquitination can mark proteins for degradation and is also important for cellular signalling, the internalisation of membrane proteins, and transcriptional regulation. Ubiquitin can be removed from proteins by deubiquitinating enzymes, which lowers their molecular weight (Figure 4).
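Because each ubiquitin adds about 8.6 kDa, a ubiquitinated protein often appears as a ladder of bands above the unmodified form. The Python sketch below lists the expected band positions for a hypothetical 40 kDa substrate; the helper name is hypothetical, and it assumes each conjugated ubiquitin shifts the band by exactly its own mass, which real gels only approximate.

```python
UBIQUITIN_KDA = 8.6  # approximate mass of one ubiquitin molecule


def ubiquitin_ladder_kda(substrate_kda, max_ubiquitins):
    """Expected band positions for 0..max_ubiquitins conjugated ubiquitin molecules."""
    return [round(substrate_kda + n * UBIQUITIN_KDA, 1) for n in range(max_ubiquitins + 1)]


print(ubiquitin_ladder_kda(40.0, 4))  # → [40.0, 48.6, 57.2, 65.8, 74.4]
```

Treatment with a deubiquitinating enzyme would be expected to collapse such a ladder back towards the first position.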
Protein complexes
Because SDS-PAGE is performed under denaturing conditions, most protein complexes held together by non-covalent bonds dissociate during sample preparation and electrophoresis, and their component proteins run as monomers. However, some proteins remain partially or fully in homo- or heteromeric complexes, even in the presence of SDS and β-mercaptoethanol. In these cases, the observed molecular weight can be substantially higher than the calculated molecular weight of the monomer (Figure 5). Some proteins, especially transmembrane proteins and proteins with hydrophobic domains, can also aggregate during cell lysis as they are released from their native protein complexes and lipid membranes. These aggregates have high molecular weights and may not reflect interactions that occur in the native state.
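When a band sits well above the calculated monomer, a useful first check is whether its apparent mass is close to a whole-number multiple of the monomer, as expected for an SDS-resistant homo-oligomer, or to the sum of known partners in a heteromer. The Python sketch below performs that comparison. The function name, the 10% tolerance, and the example proteins "P" (30 kDa) and "Q" (50 kDa) are all hypothetical, and given the limited accuracy of apparent molecular weights on SDS-PAGE, a match is a hypothesis to test, not a conclusion.

```python
def best_complex_match(observed_kda, candidates, tolerance=0.10):
    """Return the candidate complex whose summed mass best matches a band.

    candidates: dict mapping a label to a list of subunit masses in kDa.
    Returns (label, expected_kda), or None if nothing is within the
    relative tolerance.
    """
    best = None
    for label, subunits in candidates.items():
        expected = sum(subunits)
        error = abs(observed_kda - expected) / expected
        if error <= tolerance and (best is None or error < best[2]):
            best = (label, expected, error)
    return None if best is None else best[:2]


# Hypothetical 30 kDa protein "P" and 50 kDa partner "Q".
candidates = {
    "P monomer": [30.0],
    "P homodimer": [30.0, 30.0],
    "P homotrimer": [30.0, 30.0, 30.0],
    "P-Q heterodimer": [30.0, 50.0],
}
print(best_complex_match(62.0, candidates))   # → ('P homodimer', 60.0)
print(best_complex_match(150.0, candidates))  # → None (possible aggregate)
```

A high band that matches no plausible combination is a candidate for the lysis-induced aggregates described above.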
Protein isoforms
Many genes give rise to more than one protein sequence variant, or protein isoform, for example through alternative splicing during mRNA maturation. Splicing can add protein-coding sequence, producing higher molecular weight products, or introduce premature stop codons, producing lower molecular weight products. In addition, some proteins have multiple translation start sites, which give rise to isoforms with different N-termini. Protein isoforms can differ in half-life and subcellular localisation, may interact with different subsets of proteins, form distinct protein complexes, and may have altered, or even opposite, functions.
Technical obstacles 
Antibody cross-reactivity
The selected antibody may recognise not only its target protein but also cross-react non-specifically with other proteins in the analysed sample. Protocol optimisation and an appropriate panel of controls can help to minimise these issues.
Suggested controls may include:
Positive controls 
- purified target protein 
- lysate from a cell line known to express the target protein 
- lysate from a cell line overexpressing the target protein
Negative controls 
- lysates from cell lines with low or undetectable expression of the target protein
- lysates from cell lines with the target protein knocked down (e.g., by siRNA or shRNA) or knocked out (e.g., by CRISPR)
Experimental optimisation can be achieved by adjusting one or a combination of the following:
- extraction buffers (e.g., RIPA buffer)
- blocking buffers (e.g., 5% skimmed milk or BSA)
- incubation and washing times (e.g., overnight at 4 °C or 1.5 h at room temperature)
- secondary antibodies used for detection (e.g., dilution factor)
- membrane type (nitrocellulose vs. PVDF)
Non-specific proteolytic cleavage and protein degradation
Proteins can undergo non-specific proteolytic digestion if the sample is not handled correctly. Proteases released during cell lysis or tissue extraction can fragment proteins, producing bands of lower molecular weight on the Western blot. Some proteins are more susceptible to degradation than others, so the choice of cell/tissue lysis buffer and lysis conditions, together with the addition of protease inhibitors, is vital for extracting intact protein.
This article has described the most common reasons for discrepancies between the observed and calculated molecular weights of proteins. Identifying them will not only help with the analysis of Western blotting results but can also provide valuable insight into protein function and the physiology of the biological processes under study.
For more information visit the Proteintech website.




© Setform Limited