Computational analysis of CDH1 missense mutations in the cause of hereditary diffuse gastric cancer
In the present study, we computationally identified germline missense mutations in the E-cadherin (CDH1) gene causing hereditary diffuse gastric cancer (HDGC). The analysis was initiated with SIFT, followed by the PolyPhen and I-Mutant2.0 programs, using 68 CDH1 variants retrieved from dbSNP. The analysis indicates that 10 variants, namely P201R, A298T, E336D, C695R, N751K, Y755C, D768N, G879S, D882N and R169H, were commonly found to be less stable and damaging by the SIFT, PolyPhen and I-Mutant2.0 programs. Furthermore, SNPs & GO was used to predict the disease related mutations from the protein sequence. Finally, the affinities of cetuximab for the CDH1 variants were examined using a molecular docking algorithm. The results showed that the P201R, A298T, E336D and R169H variants were more significant than the other mutations considered in our analysis. We sincerely hope that these findings will be helpful for experimental biologists working on HDGC drug development.
E-cadherin (CDH1) is a cell-cell adhesion molecule localized in adherens junctions. Its functions involve polarity, cell differentiation, tissue integrity and the regulation of signal transduction pathways (Qian et al., 2004). The extracellular portion of the protein mediates homophilic cellular interactions, and the intracellular part provides a link to the actin cytoskeleton through β-catenin, a multifunctional protein associated with CDH1 (Keller et al., 1999).
Loss of function or expression of E-cadherin increases the invasion and metastasis of tumors, and CDH1 is therefore called a “suppressor of invasion” gene. This loss is associated with dysfunction of cell-cell adhesion, loss of tissue integrity, morphological changes, loss of heterozygosity (LOH) and increased proliferation. CDH1 is located on chromosome 16q22.1 (Pećina-Šlaus, 2003). The available literature suggests that missense mutations in the CDH1 gene might cause gastric, breast, colorectal, thyroid, endometrial and ovarian cancers (Berx et al., 1998). Moreover, among the cancer types caused by missense mutations in the CDH1 gene, gastric cancer is the most predominant (Corso et al., 2012). It is the 4th most common cancer and the 2nd leading cause of cancer death worldwide (Zhang et al., 2006). Germline missense mutations of E-cadherin resulting in E-cadherin inactivation were identified as being of supreme importance for hereditary diffuse gastric cancer (HDGC) (Kim et al., 2000).
The majority of the families with autosomal dominant gastric cancer susceptibility have HDGC (Oliveira et al., 2004; Brooks-Wilson et al., 2004; Kaurah et al., 2007). E-cadherin deficiency provides an obvious explanation for the diffuse, scattered growth of HDGC tumours, as the protein is the central component of epithelial cell-to-cell adhesion junctions and as such is required for the integrity of epithelial layers (More et al., 2007). In gastric cancer, a series of trials have produced evidence that chemotherapy increases survival, and many drugs are available for chemotherapy (Nishiyama and Wada, 2009; Graziano et al., 2004; Zhang and Wang, 2013). However, cetuximab has proved to be an effective drug for the treatment of HDGC (Cunningham et al., 2004). E-cadherin expression increases the sensitivity to cetuximab in gastric cancer cell lines. Cetuximab monotherapy has an improved treatment outcome compared to other chemotherapy drugs (Heindl et al., 2012). By stimulating an immune system mediated anti-tumour response, cetuximab inhibits cancer cell proliferation, angiogenic growth factor production, tumour-induced angiogenesis and cancer cell invasion.
In gastric cancer treatment, cetuximab is directed against a target that is overexpressed in the tumour cells (Lordick et al., 2010). The literature indicates that mutations in E-cadherin lead to improper binding of cetuximab and thus to cetuximab resistance. Therefore, monitoring cetuximab resistance is a key area in the treatment of HDGC and would be helpful for the development of long-acting drug molecules. Hence, in the present study, we identified detrimental missense mutations in E-cadherin using different genomic algorithms. Subsequently, the sensitivity of mutant E-cadherin to cetuximab was also examined by docking analysis.
Materials and Methods
Investigations of structural and functional consequences of coding nsSNPs by computational analysis
An SNP occurring in the protein coding region may have deleterious consequences for the 3D structure of the protein and hence may be prone to disease-associated phenomena. In the present study, we used genomic tools such as SIFT (Ng and Henikoff, 2003), PolyPhen-2 (Ramensky et al., 2002), I-Mutant2.0 (Capriotti et al., 2005) and SNPs & GO (Calabrese et al., 2009) to detect the deleterious coding nsSNPs, and FireDock (Mashiach et al., 2008) to calculate the binding free energy.
Tolerance analysis of missense mutations by SIFT
SIFT (Sorting Intolerant From Tolerant) is a sequence homology based tool available at http://www.blocks.fhcrc.org/sift/SIFT.html. It presumes that important amino acids will be conserved in the protein family; thus, changes at well conserved positions tend to be predicted as deleterious. We submitted the queries in the form of SNP IDs or protein sequences. The underlying principle is that SIFT takes a query sequence and uses multiple alignment information to predict tolerated and deleterious substitutions for every position of the query sequence. SIFT is a multistep procedure that, given a protein sequence, i) searches for similar sequences, ii) chooses closely related sequences that may share similar functions, iii) obtains the multiple alignment of the chosen sequences, and iv) calculates normalized probabilities for all possible substitutions at each position from the alignment. Substitutions with normalized probabilities less than a chosen cutoff are predicted to be deleterious, and those greater than or equal to the cutoff are predicted to be tolerated (Ng and Henikoff, 2003). The cutoff value in the SIFT program is a tolerance index of 0.05. The higher the tolerance index, the less functional impact a particular amino acid substitution is likely to have.
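The cutoff rule described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the SIFT implementation: the function names and the example alignment column are hypothetical, the normalization (dividing each residue frequency by that of the most frequent residue) is a simplified assumption, and the sequence search, alignment and weighting steps of the real program are omitted.

```python
from collections import Counter

SIFT_CUTOFF = 0.05  # tolerance-index cutoff quoted in the text


def normalized_probabilities(column):
    """Scale residue frequencies in one alignment column by the most frequent residue.

    `column` is a string with one residue per aligned sequence (hypothetical input).
    """
    if not column:
        raise ValueError("alignment column is empty")
    counts = Counter(column)
    top = max(counts.values())
    return {residue: n / top for residue, n in counts.items()}


def tolerance_call(column, substitution, cutoff=SIFT_CUTOFF):
    """Return (score, label); residues never seen in the column score 0.0."""
    score = normalized_probabilities(column).get(substitution, 0.0)
    label = "deleterious" if score < cutoff else "tolerated"
    return score, label


# Hypothetical column: proline in 9 of 10 aligned sequences, alanine in one.
example_column = "PPPPPPPPPA"
print(tolerance_call(example_column, "A"))
print(tolerance_call(example_column, "R"))
```

In this sketch a score exactly equal to the cutoff is labelled tolerated, following the rule as stated in this section; how boundary values are handled in practice is left open here.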
Prediction by PolyPhen-2
Structural level analysis of coding nsSNPs is considered to be very important for understanding the functional activity of the protein. In the present study, structural level analysis was performed with the aid of PolyPhen-2 (Ramensky et al., 2002), which is available at http://coot.embl.de/Polyphen/. The input options for the PolyPhen-2 program are a protein sequence or accession number together with the sequence position and the two amino acid variants. We submitted the query in the form of a protein sequence with the mutational position and the two amino acid variants. Sequence based characterization of the substitution site, profile analysis of homologous sequences, and mapping of the substitution site to a known protein three dimensional structure are the parameters taken into account by the PolyPhen-2 program to calculate the score. It calculates position-specific independent counts (PSIC) scores for each of the two variants and then computes the difference between them. The higher the PSIC score difference, the higher the possible functional impact of a particular amino acid substitution.
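The score-difference step can be illustrated with a minimal Python sketch. It only shows how a difference between two PSIC scores might be compared with the threshold of 1.5 that this study applies to its results; the function names, the use of an absolute difference, the "benign" label and the example scores are assumptions made for illustration, and the PSIC scores themselves would come from the PolyPhen server.

```python
PSIC_DAMAGING_THRESHOLD = 1.5  # threshold applied to the PSIC score difference in this study


def psic_score_difference(psic_wild_type, psic_variant):
    """Absolute difference between the PSIC scores of the two amino acid variants."""
    return abs(psic_wild_type - psic_variant)


def polyphen_style_call(psic_wild_type, psic_variant, threshold=PSIC_DAMAGING_THRESHOLD):
    """Label a substitution 'damaging' when the score difference reaches the threshold."""
    difference = psic_score_difference(psic_wild_type, psic_variant)
    return "damaging" if difference >= threshold else "benign"


# Hypothetical PSIC scores, chosen only to exercise both outcomes.
print(polyphen_style_call(2.0, 0.25))
print(polyphen_style_call(1.0, 0.5))
```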
Stability analysis with I-Mutant2.0
I-Mutant2.0 is a support vector machine (SVM) based tool for the automatic prediction of protein stability changes caused by single point mutations. The predictions can be performed starting either from the protein structure or, more importantly, from the protein sequence (Capriotti et al., 2005). The output files show the predicted free energy change value (ΔΔG), which is calculated as the unfolding Gibbs free energy value of the mutated protein minus the unfolding Gibbs free energy value of the native protein (kcal/mol). Positive ΔΔG values mean that the mutated protein has higher stability, and negative values indicate lower stability.
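The sign convention for ΔΔG can be made explicit with a short Python sketch. It only restates the definition given above (mutant unfolding free energy minus native unfolding free energy); the function names and the example energies are hypothetical, and the SVM prediction performed by I-Mutant2.0 is not reproduced.

```python
def delta_delta_g(dg_unfolding_mutant, dg_unfolding_native):
    """ddG in kcal/mol: unfolding free energy of the mutant minus that of the native protein."""
    return dg_unfolding_mutant - dg_unfolding_native


def stability_call(ddg):
    """Interpret the sign of ddG as described in the text."""
    if ddg > 0:
        return "more stable"
    if ddg < 0:
        return "less stable"
    return "unchanged"


# Hypothetical unfolding free energies (kcal/mol), for illustration only.
ddg = delta_delta_g(dg_unfolding_mutant=3.0, dg_unfolding_native=5.0)
print(ddg, stability_call(ddg))
```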
Prediction of disease related mutations using SNPs & GO
Furthermore, we used SNPs & GO, a support vector machine based method that predicts disease related mutations from protein sequences with an overall accuracy of 82% and a Matthews correlation coefficient of 0.63 (Calabrese et al., 2009). The FASTA sequence of the whole protein is the input, and the output is a prediction that discriminates between disease related and neutral variations of the protein sequence. A Reliability Index (RI) higher than 5 indicates a disease related effect of the mutation on the function of the parent protein.
Homology modelling and RMSD analysis
The sequence of the human E-cadherin protein was retrieved from Swiss-Prot (http://www.expasy.ch/sprot/). A BLAST (http://www.ncbi.nlm.nih.gov/blast/) sequence search was then performed against the whole PDB to select the template used to generate the model of E-cadherin. Subsequently, the three dimensional model of E-cadherin was generated by homology modelling in the SWISS-MODEL Workspace (http://swissmodel.expasy.org/). The mutant model structures were generated by means of SwissPDB viewer. We used the conjugate gradient method to optimize the 3D structures. The deviation between the native and mutant structures was evaluated by Root Mean Square Deviation (RMSD) analysis.
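As a minimal sketch of the RMSD calculation, the following Python function computes the root mean square deviation between two equally sized sets of atomic coordinates. It assumes that the two structures have already been superimposed and that atoms are listed in matching order; the function name and the example coordinates are hypothetical, and the superposition step itself is not shown.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (same length unit as the input) between two pre-superimposed coordinate lists.

    Each argument is a sequence of (x, y, z) tuples with atoms in matching order.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must contain the same number of atoms")
    if not coords_a:
        raise ValueError("no coordinates supplied")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical two-atom structures displaced by 1 unit along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
mutant = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(native, mutant))
```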
Molecular docking studies
The FireDock (Fast Interaction REfinement in molecular DOCKing) algorithm (Mashiach et al., 2008) was used to calculate the binding affinity of cetuximab with E-cadherin. This program is available at http://www.bioinfo3d.cs.tau.ac.il/FireDock/index.html. The method simultaneously targets the problems of flexibility and scoring of the solutions produced by fast rigid-body docking algorithms. Given a set of up to 1000 potential docking candidates, FireDock refines and scores them according to an energy function, spending about 3.5 seconds per candidate solution. The candidate solutions for FireDock can be generated by rigid-body docking methods such as PatchDock, or by FFT-based methods such as ZDOCK, GRAMM-X, Hex and ClusPro. The output is a table of ranked global energy values. The refined complex structure is generated for up to 100 low-energy candidates.
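The ranking step can be pictured with a small Python sketch that sorts candidate solutions by global energy and keeps the lowest-energy ones. It is not FireDock code and does not call the server: the record layout (`id`, `global_energy`), the function name and the example values are hypothetical, and the sketch assumes that a more negative global energy ranks higher.

```python
def rank_candidates(candidates, keep=100):
    """Return up to `keep` candidates sorted by global energy, most negative first."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    return sorted(candidates, key=lambda c: c["global_energy"])[:keep]


# Hypothetical refined candidates; energies are illustrative only.
candidates = [
    {"id": "c1", "global_energy": -12.4},
    {"id": "c2", "global_energy": -30.1},
    {"id": "c3", "global_energy": 5.7},
]
for rank, candidate in enumerate(rank_candidates(candidates, keep=2), start=1):
    print(rank, candidate["id"], candidate["global_energy"])
```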
Results and Discussion
A total of 68 missense mutations retrieved from dbSNP (Smigielski et al., 2000) were examined in this work, namely M203V, R29Q, V153I, T211P, W483S, N155S, R162W, R28W, V138M, A80T, R169H, M177T, K69T, C306R, E336D, N674Y, P597S, R224H, K182N, T395A, M282I, G879S, D805N, K184I, R224C, D882N, D443N, E410K, Y755C, A401D, N666H, E702K, S543F, N622S, P201R, I535V, K381N, E864K, A298T, V132I, V392I, D768N, P30T, A788V, A408V, A709S, A634V, S838G, L711V, T340A, V574F, R124H, D676N, T506A, V242I, T340M, D498E, V473I, D72N, V832M, L478P, A592T, E880K, I393N, N751K, A617T, C695R and L630V.
The mutations were independently submitted to the SIFT program to check their tolerance index (Ng and Henikoff, 2003). Among the 68 variants, 24 were found to be deleterious, with a tolerance index score of ≤0.05. The results are shown in Table I. We observed that, out of these 24 variants, 8 had the highly deleterious tolerance index score of 0, six had a score of 0.01, four had a score of 0.02, one had a score of 0.03, four had a score of 0.04 and one had a score of 0.05.
Table I: List of variants that were predicted to be functionally significant by SIFT, PolyPhen, I-Mutant 2.0
|Amino acid change|Tolerance index|PSIC SD|Prediction|ΔΔG (kcal/mol)|
The protein sequence with the mutational position and amino acid variants for each of the 68 single point mutants used in this work was submitted as input to the PolyPhen program (Ramensky et al., 2002), and the results are shown in Table I. A PSIC score difference of 1.5 and above was considered to be damaging. Out of 68 variants, 23 were considered to be damaging by the PolyPhen program. Interestingly, 13 variants, namely R162W, R169H, N674Y, E336D, A298T, P201R, C695R, G879S, D882N, Y755C, D768N, I393N and N751K, that were considered to be damaging by PolyPhen were also seen to be deleterious according to the SIFT program.
To further probe this behaviour, we used the I-Mutant 2.0 program, which predicts the change in stability of the protein structure by means of the ΔΔG value. Out of 68 variants, 53 were found to be less stable by the I-Mutant 2.0 program (Capriotti et al., 2005), as shown in Table I. It is interesting to observe that 5 variants showed a ΔΔG value of >-3.0 kcal/mol, 7 variants showed a ΔΔG value of >-2.0 kcal/mol, 16 variants showed a ΔΔG value of >-1.0 kcal/mol and the remaining 25 variants showed a ΔΔG value of <-1.0 kcal/mol, as depicted in Table I. Out of the 53 variants which showed a negative ΔΔG, 4 variants, namely E864K, E702K, E880K and E410K, changed from a negatively charged to a positively charged amino acid; 10 variants, namely P597S, G879S, A709S, M177T, A80T, A298T, A617T, P30T, A592T and I393N, changed from a non-polar to a polar amino acid; 4 variants, N666H, C306R, N751K and C695R, changed from a polar to a positively charged amino acid; and 6 variants, T340M, T340A, T506A, T211P, T395A and S838G, changed from a polar to a non-polar amino acid. S543F and N674Y changed from a polar to an aromatic amino acid. Two variants, K184I and K69T, changed from a positively charged to a non-polar amino acid; R29Q, R224H, K182N, R224C and K381N changed from a positively charged to a polar amino acid; and R162W and R28W changed from a positively charged to an aromatic amino acid. Six variants, D805N, D882N, D443N, D768N, D676N and D72N, changed from a negatively charged to a polar amino acid; W483S and Y755C changed from an aromatic to a polar amino acid. Finally, the variants A401D, P201R and V574F changed from a non-polar to a negatively charged, positively charged and aromatic amino acid, respectively.
It is also to be noted that the variants M203V, M282I, A408V, A634V, A788V, V242I, V153I, V392I, V132I, V473I, L711V, L630V, L478P, I535V, V138M and V832M retained a non-polar amino acid, N155S and N622S retained a polar amino acid, E336D and D498E retained a negatively charged amino acid, and R169H and R124H retained a positively charged amino acid, yet they were found to be less stable by I-Mutant 2.0. Most importantly, 18 variants that were considered to be damaging by the PolyPhen program were also seen to be deleterious according to the I-Mutant2.0 program. This shows that preserving the physico-chemical properties of an amino acid does not necessarily result in a harmless mutation. Indeed, considering amino acid substitutions on the basis of physico-chemical properties alone may fail to identify a detrimental effect; considering sequence conservation along with these properties is more advantageous and more reliable for finding the detrimental effects of missense mutations (Teng et al., 2009).
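The physico-chemical grouping discussed above can be sketched as a simple lookup in Python. The class table below is an assumption made for illustration (for example, histidine is placed with the positively charged residues and glycine with the non-polar ones), so a few assignments may differ from the grouping used in the text; the function name and table are hypothetical.

```python
# Assumed residue classes; textbook groupings vary, notably for H, G, C and Y.
AA_CLASS = {}
AA_CLASS.update(dict.fromkeys("AVLIMPG", "non-polar"))
AA_CLASS.update(dict.fromkeys("STNQC", "polar"))
AA_CLASS.update(dict.fromkeys("DE", "negatively charged"))
AA_CLASS.update(dict.fromkeys("KRH", "positively charged"))
AA_CLASS.update(dict.fromkeys("FWY", "aromatic"))


def class_change(mutation):
    """Return (wild-type class, mutant class) for a mutation written like 'P201R'."""
    wild_type, mutant = mutation[0], mutation[-1]
    if not mutation[1:-1].isdigit():
        raise ValueError(f"unexpected mutation format: {mutation!r}")
    return AA_CLASS[wild_type], AA_CLASS[mutant]


for name in ("E336D", "A298T", "P201R", "R169H"):
    before, after = class_change(name)
    status = "retained" if before == after else "changed"
    print(f"{name}: {before} -> {after} ({status})")
```

Under these assumed classes, E336D and R169H keep their class while A298T and P201R change it, which mirrors the point that a conservative substitution is not necessarily a harmless one.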
We rationally considered the 10 potentially detrimental point mutations E336D, A298T, P201R, C695R, G879S, D882N, Y755C, R169H, D768N and N751K for further investigation. They were commonly found to be less stable, deleterious and damaging by the I-Mutant2.0, SIFT and PolyPhen programs, respectively. It was interesting to note that, among these 10 variants, E336D and A298T showed very good agreement with experimental observations reported elsewhere (Berx et al., 1998; Corso et al., 2012; Oliveira et al., 2004; Brooks-Wilson et al., 2004; Kaurah et al., 2007).
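The consensus selection can be expressed as set operations. The short Python sketch below uses only the variant lists quoted in the text (the 13 variants flagged by both SIFT and PolyPhen, and the 10 retained for further investigation); the variable names are hypothetical, and no per-program scores are assumed.

```python
# Variants reported as both deleterious (SIFT) and damaging (PolyPhen).
sift_and_polyphen = {
    "R162W", "R169H", "N674Y", "E336D", "A298T", "P201R", "C695R",
    "G879S", "D882N", "Y755C", "D768N", "I393N", "N751K",
}

# Variants reported as common to SIFT, PolyPhen and I-Mutant2.0.
consensus_all_three = {
    "E336D", "A298T", "P201R", "C695R", "G879S",
    "D882N", "Y755C", "R169H", "D768N", "N751K",
}

# The three-program consensus should be contained in the two-program overlap.
print(consensus_all_three <= sift_and_polyphen)

# Variants in the two-program overlap that are not in the final ten.
print(sorted(sift_and_polyphen - consensus_all_three))
```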
In order to predict whether a particular variant is disease related or neutral, we used the SNPs & GO program (Calabrese et al., 2009). Among the 10 detrimental missense mutations, 4 variants, namely E336D, A298T, P201R and R169H, were predicted to be disease related and the remaining 6 variants were predicted to be neutral by the SNPs & GO program. The results are shown in Table II. We observed that, out of the 10 variants, 4 had an RI of >5, revealing a disease related effect, and the remaining 6 had an RI of <5, indicating a relatively neutral effect.
Table II: Disease related prediction by SNPs & GO
|RI<5 predicted to be neutral and RI>5 predicted to be disease|
The structure of native CDH1 was modelled by means of the SWISS-MODEL program, and the four detrimental mutant structures (E336D, A298T, P201R and R169H) were generated with SwissPDB viewer. The PyMol view of the modelled structures of E-cadherin is shown in Figure 1.
Figure 1: PyMol view of modelled structures (A) native structure (B) mutant (E336D) structure (C) mutant (A298T) structure (D) mutant (P201R) structure (E) mutant (R169H) structure
In order to find the deviation between the native and mutant structures, we superimposed the energy refined native structure on each of the energy refined mutant structures to obtain the RMSD. The higher the RMSD value, the greater the deviation between the native and the mutant structure, which in turn changes their binding efficiency with inhibitors owing to the deviation in 3D space of the binding residues of the CDH1 protein. Table III shows the RMSD between the native structure and each of the mutant modelled structures. The values are 0.015 Å, 0.105 Å, 3.617 Å and 0.028 Å for the E336D, A298T, P201R and R169H structures, respectively.
Table III: RMSD for native and mutant structures
Finally, molecular docking studies were performed to confirm the functional impact of the amino acid mutations. The cetuximab structure (PDB ID: 1yy8) was retrieved from the PDB and docked with the native and mutant (E336D, A298T, P201R and R169H) structures of E-cadherin to understand the binding affinity. Docking was performed using the FireDock program (Mashiach et al., 2008). The results are shown in Figure 2. The analysis indicates that the affinity of cetuximab for native CDH1 was -52.3 kcal/mol, whereas with the mutants the ΔG was in the range of -30.0 to -45 kcal/mol. As can be seen from Figure 2, the mutants established a lower binding affinity with cetuximab than the native protein. These data indicate that mutation in the E-cadherin structure leads to resistance to cetuximab and provide evidence of the deleterious effect of missense mutations such as E336D, A298T, P201R and R169H. Hence, we conclude that these variants should also be considered in the design of drugs for the treatment of HDGC.
Figure 2: Estimation of free energy of binding, ΔG (kcal/mol), using FireDock
The CDH1 mutations E336D and A298T were shown to have a deleterious effect on the structural stability and function of E-cadherin. In this work, we also found a few other drug-resistant mutations by a computational approach. We believe that our observations have critical implications for the understanding of CDH1 associated missense mutations and also for the development of novel therapies for this disease.
The authors would like to thank the School of Biosciences and Technology, VIT University for providing the facilities to carry out this work.
Arnold K, Bordoli L, Kopp J, Schwede T. The SWISS-MODEL Workspace: A web-based environment for protein structure homology modeling. Bioinformatics. 2006; 22: 195-201.
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The protein data bank. Nucleic Acids Res. 2000; 28: 235-42.
Berx G, Becker KF, Höfler H, van Roy F. Mutations of the human E-cadherin (CDH1) gene. Hum Mutat. 1998; 12: 224-32.
Brooks-Wilson AR, Kaurah P, Suriano G, Leach S, Senz J, Grehan N, Butterfield YSN, Jeyes J, Schinas J, Bacani J, Kelsey M, Ferreira P, MacGillivray B, MacLeod P, Micek M, Ford J, Foulkes W, Australie K, Greenberg C, LaPointe M, Gilpin C, Nikkel S, Gilchrist D, Hughes R, Jackson CE, Monaghan KG, Oliveira MJ, Seruca R, Gallinger S, Caldas C, Huntsman D. Germline E-cadherin mutations in hereditary diffuse gastric cancer: Assessment of 42 new families and review of genetic screening criteria. J Med Genet. 2004; 41: 508-17.
Calabrese R, Capriotti E, Fariselli P, Martelli PL, Casadio R. Functional annotations improve the predictive score of human disease-related mutations in proteins. Hum Mutat. 2009; 30: 1237-44.
Capriotti E, Fariselli P, Casadio R. I-Mutant2.0: Predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Res. 2005; 33: W306-10.
Corso G, Marrelli D, Pascale V, Vindigni C, Roviello F. Frequency of CDH1 germline mutations in gastric carcinoma coming from high- and low-risk areas: Metanalysis and systematic review of the literature. BMC Cancer. 2012; 12: 8.
Graziano F, Arduini F, Ruzzo A, Bearzi I, Humar B, More H, Silva R, Muretto P, Guilford P, Testa E, Mari D, Magnani M, Cascinu S. Prognostic analysis of E-cadherin gene promoter hypermethylation in patients with surgically resected, node-positive, diffuse gastric cancer. Clin Cancer Res. 2004; 10: 2784-89.
Heindl S, Eggenstein E, Keller S, Kneissl J, Keller G, Mutze K, Rauser S, Gasteiger G, Drexler I, Hapfelmeier A, Höfler H, Luber B. Relevance of MET activation and genetic alterations of KRAS and E-cadherin for Cetuximab sensitivity of gastric cancer cell lines. J Cancer Res Clin Oncol. 2012; 138: 843-58.
Kaurah P, MacMillan A, Boyd N, Senz J, De Luca A, Chun N, Suriano G, Zaor S, Van Manen L, Gilpin C, Nikkel S, Connolly-Wilson M, Weissman S, Rubinstein WS, Sebold C, Greenstein R, Stroop J, Yim D, Panzini B, McKinnon W, Greenblatt M, Wirtzfeld D, Fontaine D, Coit D, Yoon S, Chung D, Lauwers G, Pizzuti A, Vaccaro C, Redal MA, Oliveira C, Tischkowitz M, Olschwang S, Gallinger S, Lynch H, Green J, Ford J, Pharoah P, Fernandez B, Huntsman D. Founder and recurrent CDH1 mutations in families with hereditary diffuse gastric cancer. JAMA. 2007; 297: 2360-72.
Keller G, Vogelsang H, Becker I, Hutter J, Ott K, Candidus S, Grundei T, Becker KF, Mueller J, Siewert JR, Höfler H. Diffuse type gastric and lobular breast carcinoma in a familial gastric cancer patient with an E-Cadherin germline mutation. Am J Pathol. 1999; 155: 337-42.
Kim HC, Wheeler JMD, Kim JC, Ilyas M, Beck NE, Kim BS, Park KC, Bodmer WF. The E-cadherin gene (CDH1) variants T340A and L599V in gastric and colorectal cancer patients in Korea. Gut. 2000; 47: 262-67.
Lordick F, Luber B, Lorenzen S, Hegewisch-Becker S, Folprecht G, Wöll E, Decker T, Endlicher E, Röthling N, Schuster T, Keller G, Fend F, Peschel C. Cetuximab plus oxaliplatin/leucovorin/5-fluorouracil in first-line metastatic gastric cancer: A phase II study of the Arbeitsgemeinschaft Internistische Onkologie (AIO). Br J Cancer. 2010; 102: 500-05.
Mashiach E, Schneidman-Duhovny D, Andrusier N, Nussinov R, Wolfson HJ. FireDock: A web server for fast interaction refinement in molecular docking. Nucleic Acids Res. 2008; 36: W229-32.
More H, Humar B, Weber W, Ward R, Christian A, Lintott C, Graziano F, Ruzzo AM, Acosta E, Boman B, Harlan M, Ferreira P, Seruca R, Suriano G, Guilford P. Identification of seven novel germline mutations in the human E-cadherin (CDH1) gene. Hum Mutat. 2007; 28: 203.
Ng PC, Henikoff S. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003; 31: 3812-14.
Nishiyama M, Wada S. Docetaxel: Its role in current and future treatments for advanced gastric cancer. Gastric Cancer. 2009; 12: 132-41.
Oliveira C, Suriano G, Ferreira P, Canedo P, Kaurah P, Mateus R, Ferreira A, Ferreira AC, Oliveira MJ, Figueiredo C, Carneiro F, Keller G, Huntsman D, Machado JC, Seruca R. Genetic screening for familial gastric cancer. Hered Cancer Clin Pract. 2004; 2: 51-64.
Pećina-Šlaus N. Tumor suppressor gene E-cadherin and its role in normal and malignant cells. Cancer Cell Int. 2003; 3: 17.
Qian X, Karpova T, Sheppard AM, McNally J, Lowy DR. E-cadherin mediated adhesion inhibits ligand dependent activation of diverse receptor tyrosine kinases. EMBO J. 2004; 23: 1739-48.
Ramensky V, Bork P, Sunyaev S. Human non-synonymous SNPs: Server and survey. Nucleic Acids Res. 2002; 30: 3894-900.
Smigielski EM, Sirotkin K, Ward M, Sherry ST. dbSNP: A database of single nucleotide polymorphisms. Nucleic Acids Res. 2000; 28: 352-55.
Teng S, Madej T, Panchenko A, Alexov E. Modeling effects of human single nucleotide polymorphisms on protein–protein interactions. Biophys J. 2009; 96: 2178-88.
Zhang Y, Liu X, Fan Y, Ding J, Xu A, Zhou X, Hu X, Zhu M, Zhang X, Li S, Wu J, Cao H, Li J, Wang Y. Germline mutations and polymorphic variants in MMR, E-cadherin and MYH genes associated with familial gastric cancer in Jiangsu of China. Int J Cancer. 2006; 119: 2592-96.
Zhang Y, Wang Q. Sunitinib reverse multidrug resistance in gastric cancer cells by modulating Stat3 and inhibiting P-gp Function. Cell Biochem Biophys. 2013; 67: 575-81.