The Cadherin Resource - Description of the Analysis Method

ANALYSIS METHOD

CONTENTS
Introduction
Paradigm of Bioinformatics
Challenges
Sequence Representations
Structure Representations
Relational Databases
User Interface

INTRODUCTION

In analyzing the information collected on cadherins, we would like to go beyond classification and explore the biology of cadherins. We would like to answer questions such as how cadherins adhere to each other, what determines their homophilic specificity, and how their activity is regulated. These are ambitious goals for sequence analysis alone. Even if we cannot answer these important questions outright, we may gain useful insights into cadherins by building a relational database focused on them, keeping it continuously updated, and mining the data it holds.

PARADIGM OF BIOINFORMATICS

The goal of bioinformatics research is to understand biology through computational analysis. Computational analysis begins with sequence information (DNA or protein sequences). From the sequence, we would like to model a structure, because the function of a protein depends on its structure. From the structure, we would like to infer its biological function (i.e., which substrates the protein reacts with and how it reacts). From biological function, we would like to explain phenotypes (observable traits, such as disease symptoms). This chain begins with the sequence of the genes, which undoubtedly has a tremendous impact on phenotype. In principle, all that is needed is the proper analysis, and the mysteries will be revealed to us. Therein lies the promise of bioinformatics.

CHALLENGES

There are many challenges in going from genetic information to phenotype. First, genetic information is redundant, because multiple genes may perform the same function. Second, structural information is redundant, because different sequences can fold into the same structure. Third, a single gene may have multiple functions. Fourth, gene sequences are one-dimensional, but function depends on three-dimensional structure.

SEQUENCE REPRESENTATIONS

The most common analysis in bioinformatics starts from a collection of sequences that share a common structure or function. The goal of the analysis is to find a pattern in that collection that can then be detected in uncharacterized sequences. There are four common ways to represent the information in such a collection: consensus sequences, sequence alignments, profiles, and hidden Markov models (HMMs). These methods differ in how deterministic or probabilistic they are: a consensus sequence records only the single most common residue at each position, while profiles and HMMs assign a probability to every residue at every position. HMMs are the most probabilistic, and when the patterns are not obvious, a probabilistic approach is best. HMMs are used extensively in our analysis.
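
To make the contrast between deterministic and probabilistic representations concrete, the following Python sketch builds a consensus sequence and a simple profile (position-specific residue frequencies with pseudocounts, scored as log-odds against a uniform background) from a small ungapped alignment. The alignment, the query fragments, the pseudocount of 1 and the uniform background are illustrative assumptions, not data or parameters from our analysis. A profile HMM, such as those built by HMMER, extends this kind of profile with insert and delete states and transition probabilities; those are omitted here.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Hypothetical ungapped alignment of short fragments (illustrative only,
# not real cadherin data).
ALIGNMENT = ["LDRE", "LDRE", "IDRE", "LDKE", "VDRE"]


def consensus(alignment):
    """Deterministic representation: the most frequent residue per column."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*alignment))


def profile(alignment, pseudocount=1.0):
    """Probabilistic representation: per-column residue probabilities.

    A pseudocount (an assumed value) is added to every residue so that
    residues never seen in a column get a small, non-zero probability.
    """
    n = len(alignment)
    total = n + pseudocount * len(AMINO_ACIDS)
    columns = []
    for col in zip(*alignment):
        counts = Counter(col)
        columns.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return columns


def consensus_matches(query, cons):
    """Count positions where the query agrees with the consensus."""
    return sum(q == c for q, c in zip(query, cons))


def profile_score(query, prof, background=1.0 / len(AMINO_ACIDS)):
    """Sum of log2-odds of each query residue under the profile vs. a uniform background."""
    if len(query) != len(prof):
        raise ValueError("query length must equal the number of profile columns")
    return sum(math.log2(col[aa] / background) for aa, col in zip(query, prof))


if __name__ == "__main__":
    cons = consensus(ALIGNMENT)
    prof = profile(ALIGNMENT)
    print("consensus:", cons)
    for query in ["IDKE", "WDGE"]:
        print(query, consensus_matches(query, cons), round(profile_score(query, prof), 2))
```

Both queries agree with the consensus LDRE at only 2 of 4 positions, so the consensus alone cannot tell them apart. The profile scores IDKE at about 5.88 bits and WDGE at about 3.88 bits, because I and K were actually observed in the alignment while W and G were not.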

STRUCTURE REPRESENTATIONS

As structural work on cadherins progresses, more cadherin structures will become available, and we can then model unknown sequences onto those known structures.

RELATIONAL DATABASES

Before we can do any analysis, we need a flexible database system to store and retrieve both raw and processed data. We continually poll information from various databases and use it to update our cadherin-specific database.
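
A minimal sketch of such a store, using Python's built-in sqlite3 module, follows. The table names, columns, model name and placeholder records are hypothetical, chosen only to illustrate keeping raw polled records separate from processed results. The upsert lets a re-polled entry refresh the stored copy instead of creating a duplicate; the ON CONFLICT ... DO UPDATE syntax requires SQLite 3.24 or later.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS protein (
    accession TEXT PRIMARY KEY,   -- identifier in the source database
    source_db TEXT NOT NULL,      -- external database the record was polled from
    organism  TEXT,
    sequence  TEXT NOT NULL,
    retrieved TEXT NOT NULL       -- date of the most recent poll (ISO 8601)
);
CREATE TABLE IF NOT EXISTS domain_hit (
    accession TEXT NOT NULL REFERENCES protein(accession),
    model     TEXT NOT NULL,      -- name of the model (e.g. an HMM) that matched
    start_pos INTEGER NOT NULL,
    end_pos   INTEGER NOT NULL,
    score     REAL NOT NULL,
    PRIMARY KEY (accession, model, start_pos)
);
"""


def upsert_protein(conn, accession, source_db, organism, sequence, retrieved):
    """Insert a newly polled record, or refresh it if it is already stored."""
    conn.execute(
        """
        INSERT INTO protein (accession, source_db, organism, sequence, retrieved)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT(accession) DO UPDATE SET
            source_db = excluded.source_db,
            organism  = excluded.organism,
            sequence  = excluded.sequence,
            retrieved = excluded.retrieved
        """,
        (accession, source_db, organism, sequence, retrieved),
    )


def hits_for_organism(conn, organism):
    """Join raw records with processed results for one organism."""
    return conn.execute(
        """
        SELECT p.accession, h.model, h.start_pos, h.end_pos, h.score
        FROM protein AS p JOIN domain_hit AS h ON h.accession = p.accession
        WHERE p.organism = ?
        ORDER BY p.accession, h.start_pos
        """,
        (organism,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Placeholder records: accession, database name and sequences are made up.
    upsert_protein(conn, "DEMO0001", "ExampleDB", "Homo sapiens", "MKLDRE", "2024-01-01")
    # A later poll returns an updated sequence for the same accession.
    upsert_protein(conn, "DEMO0001", "ExampleDB", "Homo sapiens", "MKLDREV", "2024-02-01")
    conn.execute(
        "INSERT INTO domain_hit VALUES (?, ?, ?, ?, ?)",
        ("DEMO0001", "cadherin_repeat", 3, 6, 12.5),  # hypothetical processed result
    )
    print(conn.execute("SELECT COUNT(*), MAX(retrieved) FROM protein").fetchone())
    print(hits_for_organism(conn, "Homo sapiens"))
```

Running the script prints (1, '2024-02-01') followed by [('DEMO0001', 'cadherin_repeat', 3, 6, 12.5)]: the second poll of DEMO0001 updated the existing row rather than adding a new one.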

USER INTERFACE

The Analysis section of the website offers an interface for viewing the raw and processed data stored in the database.
