In silico prediction of functional loss of CST3 gene in hereditary cerebral amyloid angiopathy

  • Piyush Choudhary Industrial Biotechnology Division, Bioinformatics Division, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India.
  • Juhee Singh Industrial Biotechnology Division, Bioinformatics Division, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India.
  • V. Karthick Industrial Biotechnology Division, Bioinformatics Division, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India.
  • V. Shanthi Industrial Biotechnology Division, Bioinformatics Division, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India.
  • R. Rajasekaran Bioinformatics Division, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
  • K. Ramanathan Industrial Biotechnology Division, Bioinformatics Division, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India.
Keywords: Cerebral amyloid angiopathy, CST3 gene, Genomic tool, Hereditary, In silico, Missense mutation
DOI: 10.3329/bjp.v8i4.16524


Abstract

This study reports the computational identification of deleterious missense mutations in the CST3 (cystatin 3, or cystatin C) gene. Missense mutations in the CST3 gene lead to hereditary cerebral amyloid angiopathy. The analysis began with SIFT, followed by PolyPhen-2 and I-Mutant 2.0, using 24 variants of the CST3 gene of Homo sapiens retrieved from dbSNP. Five variants (Y60C, C123Y, L19P, Y88C, and L94Q) were found to be less stable and damaging by SIFT, PolyPhen-2, and I-Mutant 2.0. Furthermore, the outputs of SNPs & GO were corroborated with PHD-SNP (Predictor of Human Deleterious Single Nucleotide Polymorphisms) and PANTHER, which predicted the same five variants (Y60C, Y88C, C123Y, L19P, and L94Q) to have a clinical impact in causing the disease. These findings may be helpful to medical practitioners in the treatment of cerebral amyloid angiopathy.


Introduction

Single nucleotide polymorphisms (SNPs) are the most abundant form of genetic variation in the human genome. Most SNPs in the human genome lie in non-coding DNA, including the 5′ and 3′ untranslated regions (UTRs) (Rajasekaran et al., 2007). SNP data are available from dbSNP, a public-domain archive (Sherry et al., 2001).

The CST3 gene codes for human cystatin C and has the same organization as the CST1 gene for cystatin SN and the CST2 gene for cystatin SA (Saitoh et al., 1989). It has been found to play a role in brain disorders involving amyloid (a specific type of protein deposition) (Goate et al., 1991).

Analyses have found that missense mutations in the CST3 gene lead to a condition called hereditary cerebral amyloid angiopathy. This condition is characterised by stroke and dementia beginning in mid-adulthood. The CST3 gene is located from base pair 23,608,533 to base pair 23,618,684 on chromosome 20 (Saitoh et al., 1989). At present, the discovery of deleterious SNPs is a crucial task for pharmacogenomics and pharmacogenetics. We undertook this work to perform a computational analysis of the nsSNPs of the CST3 gene and to identify possibly deleterious mutations. Out of the 24 SNPs, the most deleterious, and the most significant in causing disease, are Y60C, C123Y, L19P, Y88C, and L94Q. These mutations are the candidates of greatest concern in hereditary cerebral amyloid angiopathy caused by the CST3 gene.

Materials and Methods


dbSNP was used to obtain the SNPs and their related protein sequences for the CST3 gene of Homo sapiens for the computational analysis (Arnold et al., 2006). Every SNP has a unique reference ID (rsID). Clicking on an rsID gives complete information about that SNP, including the amino acid change, its position, and the corresponding accession IDs. Clicking on an accession ID gives information about the protein encoded by the gene. Numerous comprehensive and easy-to-use software packages and web-based services are also available to detect the structures (Kumar et al., 2009).

Sequence homology based method (SIFT) - Analysis of functional effect of point mutations

Damaging single amino acid polymorphisms were detected with the SIFT program (Ng and Henikoff, 2003). The technique is based on the evolutionary conservation of amino acids within protein families. The more conserved a position is, the less tolerant it is to substitution, and vice versa. Changes at well-conserved positions are therefore predicted to be deleterious or damaging. Queries are submitted as protein sequences. SIFT uses multiple sequence alignment information for the query sequence to predict (Capriotti et al., 2005) the tolerated and deleterious substitutions at each position of the query sequence. The multistep SIFT process consists of a) a protein database search for related sequences, b) building the sequence alignment, and c) scaling the probabilities at every position of the alignment. The cut-off value of the tolerance index for the SIFT program is 0.05: substitutions with a tolerance index of ≤0.05 are predicted to be deleterious. The tolerance index is inversely related to the impact of an amino acid substitution; that is, the higher the tolerance index, the smaller the impact of the substitution, and the lower the tolerance index, the greater its functional impact.
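The cut-off rule described above can be sketched in a few lines of Python. This is only an illustration of the thresholding step, not of SIFT's alignment-based scoring; the function name `sift_call` and the example scores are hypothetical.

```python
SIFT_CUTOFF = 0.05  # tolerance index at or below this value is called deleterious


def sift_call(tolerance_index: float) -> str:
    """Map a tolerance index in [0, 1] to a qualitative call (illustrative helper)."""
    if not 0.0 <= tolerance_index <= 1.0:
        raise ValueError("tolerance index must lie in [0, 1]")
    return "deleterious" if tolerance_index <= SIFT_CUTOFF else "tolerated"


# Illustrative scores only: a lower index means a larger predicted impact.
for score in (0.0, 0.05, 0.32):
    print(f"{score:.2f} -> {sift_call(score)}")
```

The comparison is inclusive, so a score sitting exactly on the cut-off is treated as deleterious.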

Structure and sequence based method - PolyPhen-2 (Polymorphism Phenotyping v2)

PolyPhen-2 is a physical and comparative tool that predicts the impact of an amino acid substitution on the structure and function of a human protein (Ramensky et al., 2002). The input is a protein sequence with the mutational position and the two amino acid variants. PSIC scores are calculated for both variants, and the difference between the two is then computed. The greater the PSIC score difference, the higher the functional impact of the particular amino acid substitution.

Stability analysis- I-Mutant 2.0

I-Mutant 2.0 is a support vector machine (SVM) based tool that automatically predicts the change in protein stability caused by a single point mutation (Capriotti et al., 2005). Predictions can be made either from the protein structure or from the protein sequence. The output is a free energy change value (ΔΔG). A positive ΔΔG value indicates that the mutated protein is of higher stability, and a negative value indicates lower stability.
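The sign convention for ΔΔG can be written as a small helper. This is a sketch of the interpretation rule only; the function name `stability_call` and the ΔΔG values are hypothetical and are not outputs of I-Mutant 2.0.

```python
def stability_call(ddg: float) -> str:
    """Interpret the sign of a predicted free energy change (illustrative helper)."""
    if ddg > 0:
        return "Increase"   # mutant predicted to be more stable than the native protein
    if ddg < 0:
        return "Decrease"   # mutant predicted to be less stable than the native protein
    return "No change"


# Hypothetical ΔΔG values, used only to show the rule.
for ddg in (-1.2, 0.4):
    print(f"ddG = {ddg:+.1f} -> {stability_call(ddg)}")
```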

SNPs & GO- (disease related mutations predictions)

SNPs stands for the Single Nucleotide Polymorphism database and GO for Gene Ontology. Like I-Mutant 2.0, SNPs & GO (Calabrese et al., 2009) is a support vector machine (SVM) based method, which accurately predicts disease-related mutations from the protein sequence. The input is the FASTA sequence of the whole protein, and the output distinguishes neutral from disease-related variations of the protein sequence. A reliability index (RI) greater than 5 indicates a disease-related effect of the mutation on the function of the parent protein. The PHD-SNP (Altschul et al., 1997) and PANTHER algorithms were also used, and their predictions are displayed in the output.

Results and Discussion

A total of 24 missense mutations were found, namely Y60C, P33L, V75M, V44M, M67K, A129T, Y88C, A72S, D113A, A2S, G30S, T98M, C123Y, T142A, L19P, L94Q, R71S, R71H, V17M, G3R, R96G, G38A, R79H, and A25T. These mutations were retrieved from dbSNP (Smigielski et al., 2000).
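Each label above follows the usual wild-type residue, position, mutant residue notation. A minimal Python sketch for splitting the labels into their parts is given below; the helper `parse_mutation` and the grouping by position are illustrative and were not part of the original workflow.

```python
import re
from collections import defaultdict

# One-letter wild-type residue, numeric position, one-letter mutant residue.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label: str) -> tuple[str, int, str]:
    """Split a label such as 'Y60C' into ('Y', 60, 'C')."""
    match = MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {label!r}")
    wild, position, mutant = match.groups()
    return wild, int(position), mutant


VARIANTS = (
    "Y60C P33L V75M V44M M67K A129T Y88C A72S D113A A2S G30S T98M "
    "C123Y T142A L19P L94Q R71S R71H V17M G3R R96G G38A R79H A25T"
).split()

# Group the labels by residue position to spot positions hit more than once.
by_position = defaultdict(list)
for label in VARIANTS:
    _, position, _ = parse_mutation(label)
    by_position[position].append(label)

shared = {pos: labels for pos, labels in by_position.items() if len(labels) > 1}
print(len(VARIANTS), "variants; positions with more than one variant:", shared)
```

Grouping by position shows that only residue 71 carries two different substitutions (R71S and R71H).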

The mutations were submitted one by one to the SIFT program to check the tolerance index (Ng and Henikoff, 2003). Out of the 24 variants, 8 were found to be deleterious, with a tolerance index of ≤0.05 (Table I). Four of these 8 variants were highly deleterious, with a tolerance index of 0; the remaining four had tolerance indices of 0.01, 0.03, 0.04, and 0.05.

The PolyPhen-2 program (Ramensky et al., 2002) was used after SIFT, with the protein sequence and the mutational positions submitted as input. Variants with a PSIC score difference of >0.950 were predicted to be probably damaging, those with a score of >0.5 possibly damaging, and the rest benign (Table I).
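The two thresholds quoted above can be applied mechanically. The sketch below bins the PSIC score differences reported in Table I into the three classes; the function name `polyphen_call` is hypothetical, and the code only reproduces the binning step, not PolyPhen-2 itself.

```python
from collections import Counter


def polyphen_call(psic_difference: float) -> str:
    """Bin a PSIC score difference using the thresholds stated in the text."""
    if psic_difference > 0.950:
        return "Probably damaging"
    if psic_difference > 0.5:
        return "Possibly damaging"
    return "Benign"


# PSIC score differences copied from Table I.
PSIC = {
    "Y60C": 0.999, "P33L": 0.051, "V75M": 1.0, "V44M": 0.867, "M67K": 0.0,
    "A129T": 0.01, "Y88C": 1.0, "A72S": 0.37, "D113A": 0.607, "A2S": 0.939,
    "G30S": 0.01, "T98M": 0.975, "C123Y": 1.0, "T142A": 0.002, "L19P": 0.94,
    "L94Q": 0.988, "R71S": 1.0, "R71H": 0.999, "V17M": 0.63, "G3R": 0.002,
    "R96G": 1.0, "G38A": 0.243, "R79H": 0.999, "A25T": 0.003,
}

counts = Counter(polyphen_call(score) for score in PSIC.values())
print(dict(counts))
```

With these thresholds, 10 variants fall in the probably damaging class, 5 in the possibly damaging class, and 9 are benign, which matches the Prediction column of Table I.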

Table I: List of nsSNPs predicted as deleterious, damaging and less stable by SIFT, PolyPhen-2 and I-Mutant respectively

| rsID | AA change | Tolerance index | PSIC score difference | Prediction | Stability |
|---|---|---|---|---|---|
| rs377450166 | Y60C | 0.03 | 0.999 | Probably damaging | Decrease |
| rs375692362 | P33L | 0.32 | 0.051 | Benign | Decrease |
| rs373743268 | V75M | 0 | 1 | Probably damaging | Increase |
| rs373213120 | V44M | 0.16 | 0.867 | Possibly damaging | Decrease |
| rs373177867 | M67K | 0.92 | 0 | Benign | Decrease |
| rs371605207 | A129T | 0.63 | 0.01 | Benign | Decrease |
| rs371124032 | Y88C | 0 | 1 | Probably damaging | Decrease |
| rs202145575 | A72S | 0.09 | 0.37 | Benign | Decrease |
| rs201184716 | D113A | 0.38 | 0.607 | Possibly damaging | Decrease |
| rs200984369 | A2S | 0 | 0.939 | Possibly damaging | Decrease |
| rs200245337 | G30S | 0.72 | 0.01 | Benign | Decrease |
| rs200037041 | T98M | 0.09 | 0.975 | Probably damaging | Decrease |
| rs149051742 | C123Y | 0.05 | 1 | Probably damaging | Decrease |
| rs141643699 | T142A | 0.45 | 0.002 | Benign | Decrease |
| rs113550984 | L19P | 0.01 | 0.94 | Possibly damaging | Increase |
| rs28939068 | L94Q | 0 | 0.988 | Probably damaging | Decrease |
| rs11542364 | R71S | 0.44 | 1 | Probably damaging | Decrease |
| rs11542360 | R71H | 0.1 | 0.999 | Probably damaging | Decrease |
| rs11542359 | V17M | 0.1 | 0.63 | Possibly damaging | Decrease |
| rs11542357 | G3R | 0.09 | 0.002 | Benign | Decrease |
| rs11542355 | R96G | 0.16 | 1 | Probably damaging | Decrease |
| rs11542354 | G38A | 0.2 | 0.243 | Benign | Decrease |
| rs11542353 | R79H | 0.04 | 0.999 | Probably damaging | Decrease |
| rs1064039 | A25T | 0.41 | 0.003 | Benign | Decrease |

The I-Mutant 2.0 program was used for the analysis after PolyPhen-2. The program reports on protein structure stability; out of the 24 variants, 22 were found to have decreased stability (Table I). The amino acid changes resulting from the missense mutations are Y60C (polar amino acid to polar amino acid), P33L (non-polar amino acid to non-polar amino acid), V75M (non-polar amino acid to non-polar amino acid), V44M (non-polar amino acid to non-polar amino acid), M67K (non-polar amino acid to polar basic amino acid), A129T (non-polar amino acid to polar amino acid), Y88C (polar amino acid to polar amino acid), A72S (non-polar amino acid to polar amino acid), D113A (polar acidic amino acid to non-polar amino acid), A2S (non-polar amino acid to polar amino acid), G30S (non-polar amino acid to polar amino acid), T98M (polar amino acid to non-polar amino acid), C123Y (polar amino acid to polar amino acid), T142A (polar amino acid to non-polar amino acid), L19P (non-polar amino acid to non-polar amino acid), L94Q (non-polar amino acid to polar amino acid), R71S (polar basic amino acid to polar amino acid), R71H (polar basic amino acid to polar basic amino acid), V17M (non-polar amino acid to non-polar amino acid), G3R (non-polar amino acid to polar basic amino acid), R96G (polar basic amino acid to non-polar amino acid), G38A (non-polar amino acid to non-polar amino acid), R79H (polar basic amino acid to polar basic amino acid), and A25T (non-polar amino acid to polar amino acid). Preserving the physicochemical properties of an amino acid therefore does not necessarily result in a harmless mutation.
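The class assignments listed above can be checked with a short Python sketch. The residue classes below are copied from the text (only the residues that occur in the 24 variants are included), and the name `RESIDUE_CLASS` is a hypothetical helper introduced for illustration.

```python
# Physicochemical class of each residue, as assigned in the text above.
RESIDUE_CLASS = {
    "A": "non-polar", "G": "non-polar", "L": "non-polar", "M": "non-polar",
    "P": "non-polar", "V": "non-polar",
    "C": "polar", "Q": "polar", "S": "polar", "T": "polar", "Y": "polar",
    "H": "polar basic", "K": "polar basic", "R": "polar basic",
    "D": "polar acidic",
}

VARIANTS = (
    "Y60C P33L V75M V44M M67K A129T Y88C A72S D113A A2S G30S T98M "
    "C123Y T142A L19P L94Q R71S R71H V17M G3R R96G G38A R79H A25T"
).split()

# A substitution is class-preserving when the first (wild-type) and last
# (mutant) letters of the label belong to the same class.
preserved = [
    label for label in VARIANTS
    if RESIDUE_CLASS[label[0]] == RESIDUE_CLASS[label[-1]]
]
print(len(preserved), "class-preserving substitutions:", preserved)
```

Four of the five variants that the study singles out (Y60C, Y88C, C123Y, and L19P) are class-preserving under this scheme, which illustrates the point that conserving the residue class does not guarantee a harmless substitution.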

Out of the 24 variants, 8 variants, namely Y60C, C123Y, L19P, R79H, V75M, Y88C, A2S, and L94Q, were found to be deleterious and damaging by all three programs, that is, SIFT, PolyPhen-2, and I-Mutant 2.0 (Capriotti et al., 2005). The SNPs & GO server predicted 7 variants as disease-causing mutations (Table II), whereas the PHD-SNP server predicted 12 variants to be disease related (Table III), and PANTHER predicted 11 variants to be disease related (Table IV). Finally, combining the results of all the programs, 5 variants, namely Y60C, Y88C, C123Y, L19P, and L94Q, were predicted to have a functional effect on protein function and stability (Table V). These functionally significant variants were further superimposed on the native structure using PyMOL (Figure 1).
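The text does not spell out the rule used to combine the programs. One reading that reproduces the five variants of Table V is a plain set intersection of the 8 variants named above with the variants called "disease" in Tables II to IV. The Python sketch below shows that reading; the intersection rule is an assumption, and the variable names are hypothetical.

```python
# The 8 variants named in the text as deleterious and damaging.
first_stage = {"Y60C", "C123Y", "L19P", "R79H", "V75M", "Y88C", "A2S", "L94Q"}

# Variants predicted as disease related in Tables II, III and IV.
snps_go = {"Y60C", "Y88C", "C123Y", "L19P", "L94Q", "R71S", "R96G"}
phd_snp = {"Y60C", "V75M", "M67K", "Y88C", "A72S", "C123Y", "L19P", "L94Q",
           "R71S", "R71H", "R96G", "R79H"}
panther = {"Y60C", "P33L", "V75M", "Y88C", "C123Y", "L19P", "L94Q", "R71S",
           "R71H", "R96G", "R79H"}

# Assumed combination rule: keep only variants flagged at every stage.
consensus = first_stage & snps_go & phd_snp & panther
print(sorted(consensus))
```

Under this rule, R71S and R96G drop out even though all three disease predictors flag them, because neither belongs to the first-stage set of 8 variants.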

Table II: List of nsSNPs predicted as disease associated by SNPs & GO server

| rsID | AA change | SNPs & GO prediction | Probability score | RI |
|---|---|---|---|---|
| rs377450166 | Y60C | Disease | 0.73 | 5 |
| rs375692362 | P33L | Neutral | 0.048 | 9 |
| rs373743268 | V75M | Neutral | 0.416 | 2 |
| rs373213120 | V44M | Neutral | 0.021 | 10 |
| rs373177867 | M67K | Neutral | 0.145 | 7 |
| rs371605207 | A129T | Neutral | 0.072 | 9 |
| rs371124032 | Y88C | Disease | 0.835 | 7 |
| rs202145575 | A72S | Neutral | 0.107 | 8 |
| rs201184716 | D113A | Neutral | 0.033 | 9 |
| rs200984369 | A2S | Neutral | 0.015 | 10 |
| rs200245337 | G30S | Neutral | 0.037 | 9 |
| rs200037041 | T98M | Neutral | 0.056 | 9 |
| rs149051742 | C123Y | Disease | 0.9 | 8 |
| rs141643699 | T142A | Neutral | 0.012 | 10 |
| rs113550984 | L19P | Disease | 0.537 | 1 |
| rs28939068 | L94Q | Disease | 0.682 | 4 |
| rs11542364 | R71S | Disease | 0.56 | 1 |
| rs11542360 | R71H | Neutral | 0.357 | 3 |
| rs11542359 | V17M | Neutral | 0.054 | 9 |
| rs11542357 | G3R | Neutral | 0.009 | 10 |
| rs11542355 | R96G | Disease | 0.541 | 1 |
| rs11542354 | G38A | Neutral | 0.034 | 9 |
| rs11542353 | R79H | Neutral | 0.253 | 5 |
| rs1064039 | A25T | Neutral | 0.042 | 9 |

Table III: List of nsSNPs predicted as disease associated by PHD-SNP server

| rsID | AA change | PHD-SNP prediction | Probability score | RI |
|---|---|---|---|---|
| rs377450166 | Y60C | Disease | 0.962 | 9 |
| rs375692362 | P33L | Neutral | 0.195 | 6 |
| rs373743268 | V75M | Disease | 0.863 | 7 |
| rs373213120 | V44M | Neutral | 0.148 | 7 |
| rs373177867 | M67K | Disease | 0.625 | 3 |
| rs371605207 | A129T | Neutral | 0.348 | 3 |
| rs371124032 | Y88C | Disease | 0.991 | 10 |
| rs202145575 | A72S | Disease | 0.509 | 0 |
| rs201184716 | D113A | Neutral | 0.324 | 4 |
| rs200984369 | A2S | Neutral | 0.132 | 7 |
| rs200245337 | G30S | Neutral | 0.341 | 3 |
| rs200037041 | T98M | Neutral | 0.499 | 0 |
| rs149051742 | C123Y | Disease | 0.993 | 10 |
| rs141643699 | T142A | Neutral | 0.03 | 9 |
| rs113550984 | L19P | Disease | 0.954 | 9 |
| rs28939068 | L94Q | Disease | 0.897 | 8 |
| rs11542364 | R71S | Disease | 0.889 | 8 |
| rs11542360 | R71H | Disease | 0.824 | 6 |
| rs11542359 | V17M | Neutral | 0.427 | 1 |
| rs11542357 | G3R | Neutral | 0.044 | 9 |
| rs11542355 | R96G | Disease | 0.931 | 9 |
| rs11542354 | G38A | Neutral | 0.365 | 3 |
| rs11542353 | R79H | Disease | 0.718 | 4 |
| rs1064039 | A25T | Neutral | 0.254 | 5 |

Table IV: List of nsSNPs predicted as disease associated by PANTHER server

| rsID | AA change | PANTHER prediction | Probability score | RI |
|---|---|---|---|---|
| rs377450166 | Y60C | Disease | 0.975 | 10 |
| rs375692362 | P33L | Disease | 0.504 | 0 |
| rs373743268 | V75M | Disease | 0.808 | 6 |
| rs373213120 | V44M | Neutral | 0.294 | 4 |
| rs373177867 | M67K | Neutral | 0.371 | 3 |
| rs371605207 | A129T | Neutral | 0.391 | 2 |
| rs371124032 | Y88C | Disease | 0.973 | 9 |
| rs202145575 | A72S | Neutral | 0.365 | 3 |
| rs201184716 | D113A | Neutral | 0.188 | 6 |
| rs200984369 | A2S | Neutral | 0.038 | 9 |
| rs200245337 | G30S | Neutral | 0.112 | 8 |
| rs200037041 | T98M | Neutral | 0.392 | 2 |
| rs149051742 | C123Y | Disease | 0.995 | 10 |
| rs141643699 | T142A | Neutral | 0.189 | 6 |
| rs113550984 | L19P | Disease | 0.734 | 5 |
| rs28939068 | L94Q | Disease | 0.859 | 7 |
| rs11542364 | R71S | Disease | 0.817 | 6 |
| rs11542360 | R71H | Disease | 0.848 | 7 |
| rs11542359 | V17M | Neutral | 0.352 | 3 |
| rs11542357 | G3R | Neutral | 0.099 | 8 |
| rs11542355 | R96G | Disease | 0.847 | 7 |
| rs11542354 | G38A | Neutral | 0.23 | 5 |
| rs11542353 | R79H | Disease | 0.628 | 3 |
| rs1064039 | A25T | Neutral | 0.294 | 4 |

Table V: List of nsSNPs predicted as disease associated by SNPs & GO, PHD-SNP and PANTHER server

| rsID | AA change | SNPs & GO | PHD-SNP | PANTHER |
|---|---|---|---|---|
| rs377450166 | Y60C | Disease | Disease | Disease |
| rs149051742 | C123Y | Disease | Disease | Disease |
| rs113550984 | L19P | Disease | Disease | Disease |
| rs371124032 | Y88C | Disease | Disease | Disease |
| rs28939068 | L94Q | Disease | Disease | Disease |

Figure 1: Superimposed view of Y60C (A), Y88C (B), C123Y (C), L19P (D) and L94Q (E) rendered using PyMOL


Conclusion

We examined clinically important mutations in the CST3 gene by means of different genomic algorithms. We believe that this analysis will be of great importance in the clinical management of cerebral amyloid angiopathy.


Acknowledgement

The authors would like to thank the management of VIT University for providing the facilities to carry out this work.


References

Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997; 25: 3389-402.

Arnold K, Bordoli L, Kopp J, Schwede T. The SWISS-MODEL Workspace: A web-based environment for protein structure homology modeling. Bioinformatics 2006; 22: 195-201.

Calabrese R, Capriotti E, Fariselli P, Martelli PL, Casadio R. Functional annotations improve the predictive score of human disease-related mutations in proteins. Hum Mutat. 2009; 30: 1237-44.

Capriotti E, Fariselli P, Casadio R. I-Mutant2.0: Predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Res. 2005; 33: W306-10.

Goate A, Chartier-Harlin MC, Mullan M, Brown J, Crawford F, Fidani L, Giuffra L, Haynes A, Irving N, James L, Mant R, Newton P, Rooke K, Roques P, Talbot C, Pericak-Vance M, Roses A, Williamson R, Rossor M, Owen M, Hardy J. Segregation of a missense mutation in the amyloid precursor protein gene with familial Alzheimer's disease. Nature 1991; 349: 704-06.

Johansson MU, Zoete V, Michielin O, Guex N. Defining and searching for structural motifs using DeepView/Swiss-PdbViewer. BMC Bioinformatics. 2012; 13: 173.

Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc. 2009; 4: 1073-81.

Ng PC, Henikoff S. Predicting deleterious amino acid substitutions. Genome Res. 2001; 11: 863-74.

Ng PC, Henikoff S. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003; 31: 3812-14.

Rajasekaran R, Sudandiradoss C, Doss CG, Sethumadhavan R. Identification and in silico analysis of functional SNPs of the BRCA1 gene. Genomics 2007; 90: 447-52.

Ramensky V, Bork P, Sunyaev S. Human non-synonymous SNPs: Server and survey. Nucleic Acids Res. 2002; 30: 3894-900.

Saitoh E, Sabatini LM, Eddy RL, Shows TB, Azen EA, Isemura S, Sanada K. The human cystatin C gene (CST3) is a member of the cystatin gene family which is localized on chromosome 20. Biochem Biophys Res Commun. 1989; 162: 1324-31.

Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: The NCBI database of genetic variation. Nucleic Acids Res. 2001; 29: 308-11.



Conflict of Interest
Authors declare no conflict of interest