The Cadherin Resource - Description of the Classification Method

CLASSIFICATION METHOD

TOPICS
Introduction
Finding Cadherin Proteins
Grouping Cadherins into Sub-families
User Interface

INTRODUCTION

To classify cadherins into sub-families, we need to answer two questions.

First, how do we identify a protein as a member of the cadherin superfamily?
Second, how do we group those sequences into sub-families (e.g., E-cadherin, N-cadherin and desmosomal cadherins)?

FINDING CADHERIN PROTEINS

One experimental way to find cadherins is to screen a genomic library with a probe. This approach has a drawback: the results depend on choosing the right probe, and the probed region may not be conserved across all cadherin-related proteins.

As sequence databases grow explosively, it makes more sense to search for cadherins computationally, and several tools are available for this. The traditional tools are BLAST and FASTA. Both perform fast pairwise comparisons between the query sequence and every sequence in the database and report the highest-scoring matches. BLAST and FASTA are good at finding close homologs, but they often miss more remote homologs, which are frequently the more interesting sequences.
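To make the idea of a pairwise database search concrete, here is a minimal sketch that scores a query against every sequence in a small database and reports the best hits. BLAST and FASTA use fast heuristics (word or k-tuple seeding followed by extension) rather than a full dynamic-programming alignment; this sketch instead computes the exact Smith-Waterman local alignment score, with hypothetical match/mismatch/gap scores in place of a substitution matrix such as BLOSUM62. The function names and sequences are hypothetical toy examples, not cadherin data.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty).

    The scoring values are illustrative assumptions, not BLAST/FASTA defaults.
    """
    prev = [0] * (len(b) + 1)  # scores for the previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search_database(query: str, database: dict[str, str],
                    top_n: int = 3) -> list[tuple[str, int]]:
    """Compare the query with every database entry and return the top-scoring hits."""
    hits = [(name, smith_waterman_score(query, seq)) for name, seq in database.items()]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:top_n]


if __name__ == "__main__":
    toy_db = {"close": "MKTDWVIPPISLK", "distant": "MKSEWLVPAISLR", "unrelated": "GGGGGGGGGG"}
    for name, score in search_database("DWVIPPISL", toy_db):
        print(name, score)
```

Because each comparison only sees two sequences at a time, a distant homolog that shares few identical residues with the query scores poorly, which is the limitation described above.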

A newer tool, the hidden Markov model (HMM), is quickly gaining popularity for filling this gap. HMMs are probabilistic models built from a multiple alignment of sequences belonging to the same class. An HMM search is considerably more sensitive than a simple BLAST or FASTA search because the model incorporates information from the whole multiple alignment, such as which positions are conserved.
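The core idea, that a model built from a multiple alignment records which residues are typical at each position, can be sketched in a few lines. This sketch computes only the match-state emission probabilities of a profile HMM, as per-column residue frequencies with add-one pseudocounts; a full profile HMM (for example one built with HMMER) also has insert and delete states, transition probabilities and more careful priors. The function name and the alignment are hypothetical toy examples.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(alignment: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Per-column residue probabilities estimated from a multiple alignment.

    Gap characters ('-') and unknown letters are skipped; every amino acid
    receives a pseudocount so that unseen residues keep a small, non-zero probability.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    profile = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] in AMINO_ACIDS]
        total = len(residues) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (residues.count(aa) + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


if __name__ == "__main__":
    toy_alignment = ["LDRE", "LDRE", "IDKE", "LD-E"]  # hypothetical aligned fragments
    for position, column in enumerate(column_profile(toy_alignment), start=1):
        top = max(column, key=column.get)
        print(position, top, round(column[top], 2))
```

Conserved columns end up with most of their probability on one or two residues, while variable columns stay close to uniform; that position-by-position contrast is the extra information a single pairwise comparison lacks.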

How do we use this technology to identify cadherins? All members of the cadherin superfamily contain cadherin repeats. We created a careful multiple alignment of cadherin repeats and generated an HMM of the cadherin repeat from it. To test whether a protein belongs to the cadherin superfamily, we search its sequence with this HMM. If a cadherin repeat domain is detected, the protein is assigned to the cadherin superfamily.
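The detection step, scoring a sequence against a repeat model and calling a hit above a cutoff, can be illustrated with a simplified stand-in for an HMM search: an ungapped position-specific log-odds scan. Each window of the sequence is scored in bits against the model relative to a background distribution, and a domain is reported when the best window reaches a threshold. The uniform background, the toy four-residue motif, the threshold of 10 bits and all function names are hypothetical assumptions; a real HMM search also handles insertions and deletions and reports statistical significance.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequencies


def toy_profile(motif: str, weight: float = 0.8) -> list[dict[str, float]]:
    """Hypothetical profile: each column puts `weight` on one motif residue
    and spreads the remaining probability evenly over the other 19."""
    rest = (1.0 - weight) / (len(AMINO_ACIDS) - 1)
    return [{aa: weight if aa == m else rest for aa in AMINO_ACIDS} for m in motif]


def best_window(sequence: str, profile: list[dict[str, float]]) -> tuple[float, int]:
    """Highest log-odds score (in bits) of any ungapped window, and where it starts."""
    best_score, best_start = -math.inf, -1
    for start in range(len(sequence) - len(profile) + 1):
        score = 0.0
        for column, residue in zip(profile, sequence[start:start + len(profile)]):
            # Unknown residues are scored as background, i.e. they contribute 0 bits.
            score += math.log2(column.get(residue, BACKGROUND) / BACKGROUND)
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


def has_domain(sequence: str, profile: list[dict[str, float]], threshold: float) -> bool:
    """Report a domain hit when the best window scores at or above the threshold."""
    score, _ = best_window(sequence.upper(), profile)
    return score >= threshold


if __name__ == "__main__":
    repeat_model = toy_profile("LDRE")  # toy motif standing in for a repeat model
    print(has_domain("MSTGLDREGKP", repeat_model, threshold=10.0))
    print(has_domain("MSTGGGGGGKP", repeat_model, threshold=10.0))
```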

GROUPING CADHERINS INTO SUB-FAMILIES

Once we can identify a protein as a member of the cadherin superfamily, we would like to classify it further into sub-families such as E-cadherin (epithelial cadherin), N-cadherin (neural cadherin) and so on. HMMs can be used for this as well. First, we group the sequences in the database that contain cadherin repeats by similarity. Next, we verify by inspection that each group contains proteins of the same class. We then build a multiple alignment of each group and create an HMM from it. The resulting HMMs can be used to assign a new sequence to a sub-family.

However, creating HMMs for proteins longer than about 1000 residues is too computationally expensive, so in those cases we use heuristic methods instead.
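The first step of the sub-family procedure, grouping sequences by similarity, is not specified further in the text. As one possible illustration, the sketch below assumes k-mer similarity (the Jaccard index of shared 3-residue words) and single-linkage grouping with a hypothetical threshold of 0.5; the function names and sequences are hypothetical, and the later inspection, alignment and HMM-building steps are not shown.

```python
def kmers(sequence: str, k: int = 3) -> set[str]:
    """All overlapping words of length k in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared words divided by all distinct words (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_by_similarity(sequences: dict[str, str], threshold: float = 0.5,
                        k: int = 3) -> list[list[str]]:
    """Single-linkage grouping: sequences connected by any chain of pairs whose
    similarity meets the threshold end up in the same group."""
    names = list(sequences)
    words = {name: kmers(sequences[name], k) for name in names}
    parent = {name: name for name in names}  # union-find forest

    def find(name: str) -> str:
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(words[a], words[b]) >= threshold:
                parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    toy = {"a1": "MKLVDRENAT", "a2": "MKLVDRENAS", "b1": "GGWWPPYYHH"}
    print(group_by_similarity(toy))
```

Single linkage is permissive: one borderline pair can merge two otherwise distinct groups, which is why a manual check of each group before building alignments and HMMs is useful.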

USER INTERFACE

The classification interface was designed to be as simple as possible. Go to the classify section of the website and enter a protein or DNA sequence. If you choose the DNA search, the DNA sequence is first translated into a protein sequence, which is then searched with the HMMs. The results show the predicted domain layout and the sub-family classification. The HMM search should take less than one minute.
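The DNA option needs a translation step before the HMM search. The text does not say which reading frames or genetic code are used; the sketch below assumes the standard genetic code (NCBI translation table 1) and translates all six reading frames, three on each strand, marking stop codons with `*` and ambiguous codons with `X`. The function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
TABLE_1_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: TABLE_1_AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna: str, frame: int = 0) -> str:
    """Translate one reading frame (0, 1 or 2); stops become '*', unknown codons 'X'."""
    seq = dna.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna: str) -> dict[str, str]:
    """Translations of the forward frames (+1..+3) and reverse-strand frames (-1..-3)."""
    forward = dna.upper().replace("U", "T")
    reverse = forward.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for f in range(3):
        frames[f"+{f + 1}"] = translate(forward, f)
        frames[f"-{f + 1}"] = translate(reverse, f)
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCGATTAA").items():
        print(label, protein)
```

A simple follow-up would be to search each translated frame and keep the frame with the best hit; the text does not state whether the site does this.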