|
INTRODUCTION |
In analyzing the information collected on cadherins,
we would like to go beyond mere classification
and delve into the biological mysteries of cadherins.
We would like to be able to answer questions about
how cadherins adhere to each other, what causes their
homophilic specificity, and how regulation occurs.
Those are very ambitious goals for sequence analysis
alone. Though we may not be able to answer those
important questions, we may gain interesting
insights into cadherins by building a relational database
of information focused on cadherins, continuously
updating it, and mining that data. |
|
PARADIGM OF BIOINFORMATICS |
The goal of bioinformatics research is to understand
biology through computational analysis. Computational
analysis begins with genetic information (DNA
or protein sequences). From the genetic
information, we would like to model a structure,
because the function of a protein depends
on its structure. From the structure, we would
like to find its biological function (i.e., what
substrates the protein reacts with and how it reacts).
From biological function, we would like to explain
phenotypes (symptoms). This chain begins
with the sequence of the genes, which undoubtedly
has a tremendous impact on phenotype. In principle,
all that is needed is the proper analysis, and the
mysteries will be revealed to us. Therein lies
the promise of bioinformatics.
|
|
CHALLENGES |
There are many challenges in going from genetic information
to phenotype. First, genetic information is redundant
because multiple genes may perform the same function.
Second, structural information is redundant because
different sequences can produce the same structure.
Third, single genes may have multiple functions.
Fourth, gene sequences are one-dimensional, but function
depends on three-dimensional structure.
|
|
SEQUENCE REPRESENTATIONS |
The most common analysis done in bioinformatics involves
a collection of sequences with a common structure or function.
The goal of the analysis is to find a pattern
in the data that allows us to find the same pattern
in unknown sequences. There are four common ways
to represent the information in that collection
of sequences: consensus sequences, sequence alignments,
profiles, and hidden Markov models (HMMs). The
methods differ in how deterministic or probabilistic
they are. HMMs are the most probabilistic, and
when the patterns are not obvious, a probabilistic
approach works best. HMMs are used extensively in
our analysis.
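
To make the difference between a deterministic and a probabilistic representation concrete, here is a minimal Python sketch that derives a consensus sequence and a simple profile from a small ungapped alignment. The sequences are made-up toy fragments, not real cadherin data, and the function names and pseudocount value are assumptions chosen for illustration. A profile HMM would add insert and delete states with transition probabilities on top of the per-column probabilities; that part is not shown.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Toy, made-up aligned fragments (illustration only, not real cadherin data).
alignment = [
    "DVNDN",
    "DINDN",
    "DVNDH",
    "DVKDN",
]

def column_counts(seqs):
    """Count residues in each column of an ungapped, equal-length alignment."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    return [Counter(s[i] for s in seqs) for i in range(length)]

def consensus(seqs):
    """Deterministic view: most frequent residue per column (ties broken alphabetically)."""
    return "".join(
        min(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0]
        for counts in column_counts(seqs)
    )

def profile(seqs, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Probabilistic view: per-column residue probabilities with additive smoothing."""
    total = len(seqs) + pseudocount * len(alphabet)
    return [
        {a: (counts[a] + pseudocount) / total for a in alphabet}
        for counts in column_counts(seqs)
    ]

def score(seq, prof):
    """Probability of an ungapped sequence under the profile (product over columns)."""
    p = 1.0
    for residue, column in zip(seq, prof):
        p *= column[residue]
    return p

prof = profile(alignment)
print(consensus(alignment))
print(score("DVNDN", prof), score("DIKDH", prof))
```

The consensus keeps only one residue per position, while the profile still gives a nonzero score to variants such as `DIKDH` that never appear in the alignment as a whole. That tolerance is the reason a probabilistic representation helps when the pattern is not obvious.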
|
|
STRUCTURE REPRESENTATIONS |
As structural work on cadherins progresses, we will
have structures available. We can then model unknown
sequences onto those structures.
|
|
RELATIONAL DATABASES |
Before we can do any analysis, we need a flexible
database system to store and retrieve the raw
and processed data. We are continually pulling
information from various databases and updating
our cadherin-specific database.
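
As a rough illustration of storing raw and processed data side by side, the following Python sketch uses the standard `sqlite3` module. The table names, column names, and records are hypothetical placeholders; they are not the schema or the data of the actual cadherin database.

```python
import sqlite3

def open_db(path=":memory:"):
    """Create the two hypothetical tables: raw sequences and processed model hits."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS sequence (
            accession   TEXT PRIMARY KEY,
            source_db   TEXT NOT NULL,
            description TEXT,
            residues    TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS model_hit (
            accession TEXT NOT NULL REFERENCES sequence(accession),
            model     TEXT NOT NULL,
            score     REAL NOT NULL,
            PRIMARY KEY (accession, model)
        );
    """)
    return conn

def store_sequence(conn, accession, source_db, description, residues):
    """Insert a record, or overwrite it when the source database has updated it."""
    conn.execute(
        "INSERT OR REPLACE INTO sequence VALUES (?, ?, ?, ?)",
        (accession, source_db, description, residues),
    )

def store_hit(conn, accession, model, score):
    """Record the score of one sequence against one model (processed data)."""
    conn.execute(
        "INSERT OR REPLACE INTO model_hit VALUES (?, ?, ?)",
        (accession, model, score),
    )

def hits_above(conn, model, min_score):
    """Join raw and processed data: sequences scoring at least min_score."""
    return conn.execute(
        """SELECT s.accession, s.description, h.score
           FROM sequence AS s JOIN model_hit AS h ON h.accession = s.accession
           WHERE h.model = ? AND h.score >= ?
           ORDER BY h.score DESC""",
        (model, min_score),
    ).fetchall()

conn = open_db()
# Placeholder records, invented for this example.
store_sequence(conn, "EX0001", "example_source", "placeholder entry A", "DVNDN")
store_sequence(conn, "EX0002", "example_source", "placeholder entry B", "DIKDH")
store_sequence(conn, "EX0001", "example_source", "placeholder entry A (updated)", "DVNDN")
store_hit(conn, "EX0001", "example_model", 42.5)
store_hit(conn, "EX0002", "example_model", 3.1)
print(hits_above(conn, "example_model", 10.0))
```

Keying each table on the accession means a repeated import overwrites the existing record instead of duplicating it, which suits a database that is refreshed from external sources on a regular basis.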
|
|
USER INTERFACE |
The Analysis section of the web site offers an interface
for viewing the raw and processed data in the database.
|