One experimental way of finding cadherins is to probe
a genomic library. This method is probably not
the best approach, because the findings depend on
selecting the right probe: the region probed may not be
conserved in all cadherin-related proteins.
As the sequence databases explode in size, it makes
more sense to search for cadherins by computational
analysis. For computational analysis, we have
several tools at our disposal. The traditional tools
are BLAST and FASTA. Both programs do fast pairwise
comparisons between the query sequence and every
sequence in the sequence database, and report the
highest-scoring matches. BLAST and FASTA
are good at finding closely homologous
sequences, but they do not find the more remotely
homologous sequences, which are often the more
interesting ones.
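
To make the idea of a pairwise database search concrete, here is a minimal sketch in Python. It is not BLAST or FASTA: both of those use fast heuristics, whereas this sketch uses the slower, exhaustive Smith-Waterman local alignment that they approximate. The scoring values (match, mismatch, gap), the function names, and the toy sequences are all assumptions made for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query, database):
    """Compare the query with every database sequence; best score first."""
    hits = [(local_alignment_score(query, seq), name)
            for name, seq in database.items()]
    return sorted(hits, reverse=True)


# Hypothetical toy database; the sequences are made up, not real cadherins.
database = {"close_match": "AADVNDNAA", "unrelated": "GGGGGG"}
for score, name in search("DVNDN", database):
    print(name, score)
```

Because each database sequence is judged only against the single query, a distant relative that has kept just a few key positions scores little better than an unrelated sequence.
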
Another tool is quickly gaining popularity to fill this
gap: the hidden Markov model (HMM).
HMMs are probabilistic models created from a
multiple alignment of sequences belonging to the
same class. An HMM search is considerably more powerful
than a simple BLAST or FASTA search because it incorporates
the information from the multiple alignment (i.e.,
which regions are conserved).
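
The following Python sketch shows how a multiple alignment can be turned into a position-specific probabilistic model. It is deliberately simplified: it keeps only the per-column residue probabilities (the "match" emissions) and leaves out the insert and delete states and the transition probabilities that a full profile HMM has. The toy alignment, the uniform background, the pseudocount of 1, and the function names are assumptions for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Per-column log-odds scores relative to a uniform background."""
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for column in zip(*alignment):
        # Count residues in this column, ignoring gap characters.
        counts = Counter(c for c in column if c in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile, segment):
    """Sum of column scores for a segment as long as the profile.

    Assumes the segment contains only the 20 standard amino acids.
    """
    return sum(column[aa] for column, aa in zip(profile, segment))


# Hypothetical aligned sequences (made up, equal length, '-' for gaps).
alignment = ["DVNDN", "DENDN", "DINDH"]
profile = build_profile(alignment)
print(score(profile, "DVNDN") > score(profile, "GGGGG"))
```

Columns where every aligned sequence agrees give a high score to the conserved residue, while variable columns are more tolerant. This is the information a single pairwise comparison cannot use.
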
How do we use this technology to help us identify
cadherins? All cadherins share a cadherin repeat.
A careful multiple alignment of cadherin repeats
was created, and an HMM of the cadherin repeat
was generated from it. To identify proteins in the cadherin
superfamily, we search the protein sequence with this HMM. If
a cadherin repeat domain is detected, the
protein is placed in the cadherin superfamily.
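
As a rough sketch of the detection step, the Python below slides a position-specific score table along a protein sequence and reports the windows that score at or above a threshold. This stands in for a real HMM search, which also allows insertions and deletions. The three-column score table, the default score for unlisted residues, the threshold, and the function names are all hypothetical.

```python
MISSING = -1.0  # assumed score for residues a column does not list


def find_domain_hits(sequence, profile, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(profile)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        total = sum(column.get(aa, MISSING)
                    for column, aa in zip(profile, window))
        if total >= threshold:
            hits.append((start, total))
    return hits


def has_repeat(sequence, profile, threshold):
    """True if at least one window looks like the modelled repeat."""
    return bool(find_domain_hits(sequence, profile, threshold))


# Hypothetical score table: one dict of residue scores per model column.
toy_profile = [{"D": 2.0}, {"N": 2.0}, {"D": 2.0}]
print(find_domain_hits("AAADNDAAA", toy_profile, threshold=5.0))
print(has_repeat("AAAAAAAAA", toy_profile, threshold=5.0))
```

The choice of threshold controls the trade-off: a low value picks up more remote relatives but also more false hits.
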