The goal of our research is to understand transcriptome diversity and its significance in human disease using a bioinformatics approach. We have been developing tools and algorithms to decipher the human genome. Our research focuses on predicting transcript structure, expression, and function while taking alternative splicing into account, all typical subjects of genome annotation.
Recently, we developed a novel gene prediction program, ECgene, that combines genome-based EST clustering and transcript assembly in a coherent fashion. The EST clustering step is similar to NCBI's UniGene. In addition, ECgene builds transcript models using graph-theoretic assembly and sub-clusters EST sequences according to transcript variant, which makes it one of the state-of-the-art gene prediction programs.
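The two steps described above, genome-based clustering followed by graph-based assembly and sub-clustering, can be illustrated with a small sketch. This is not the ECgene implementation: all function names are hypothetical, clustering by simple overlap of genomic spans is a simplifying assumption, and exons are treated as exact coordinate matches with no handling of alignment noise.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Exon = Tuple[int, int]                       # (start, end) on the genome
Alignment = Tuple[str, str, str, List[Exon]]  # (est_id, chrom, strand, exons)


def cluster_by_overlap(alignments: List[Alignment]) -> List[List[str]]:
    """Group ESTs whose genomic spans overlap on the same chromosome and strand.

    Assumption: overlap of spans is the clustering criterion (a simplification).
    """
    by_locus = defaultdict(list)
    for est_id, chrom, strand, exons in alignments:
        exons = sorted(exons)
        by_locus[(chrom, strand)].append(((exons[0][0], exons[-1][1]), est_id))
    clusters = []
    for key in sorted(by_locus):
        current, current_end = [], None
        for (start, end), est_id in sorted(by_locus[key]):
            if current and start > current_end:   # gap: close the cluster
                clusters.append(current)
                current, current_end = [], None
            current.append(est_id)
            current_end = end if current_end is None else max(current_end, end)
        if current:
            clusters.append(current)
    return clusters


def build_splice_graph(chains: List[List[Exon]]) -> Dict[Exon, Set[Exon]]:
    """Nodes are exons; an edge joins two exons that are adjacent in some EST."""
    graph: Dict[Exon, Set[Exon]] = {}
    for chain in chains:
        for exon in chain:
            graph.setdefault(exon, set())
        for upstream, downstream in zip(chain, chain[1:]):
            graph[upstream].add(downstream)
    return graph


def enumerate_transcripts(graph: Dict[Exon, Set[Exon]]) -> List[List[Exon]]:
    """List every path from an exon with no incoming edge to one with no outgoing edge."""
    has_incoming = {d for downs in graph.values() for d in downs}
    paths: List[List[Exon]] = []

    def walk(node: Exon, path: List[Exon]) -> None:
        path = path + [node]
        if not graph[node]:
            paths.append(path)
            return
        for nxt in sorted(graph[node]):
            walk(nxt, path)

    for source in sorted(n for n in graph if n not in has_incoming):
        walk(source, [])
    return paths


def sub_cluster(ests: Dict[str, List[Exon]], paths: List[List[Exon]]) -> Dict[int, List[str]]:
    """Assign each EST to every transcript path that contains its exon chain contiguously."""
    support = defaultdict(list)
    for est_id in sorted(ests):
        chain, n = ests[est_id], len(ests[est_id])
        for idx, path in enumerate(paths):
            if any(path[i:i + n] == chain for i in range(len(path) - n + 1)):
                support[idx].append(est_id)
    return dict(support)
```

With three ESTs at one locus, two of which include a middle exon and one of which skips it, the sketch yields two transcript paths (exon inclusion and exon skipping), each supported by the ESTs compatible with it.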
Currently, we are trying to delineate the functional consequences of transcript variation. It is essential to investigate systematically how alternative splicing changes functional domains and gene expression. To this end, we have developed a number of tools that analyze EST and SAGE data based on the ECgene predictions. They should play an important role in finding transcripts that are differentially expressed across tissues, organs, or pathological states. Many tools in the EC suite are publicly available, for example ECfunction, ECexpress, ECprofiler, ECortholog, ASmodeler, and ASePCR.
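As an illustration of how count data such as EST or SAGE tag counts can point to differentially expressed transcripts, the sketch below compares per-transcript counts between two libraries using a two-sided Fisher's exact test. The choice of test is an assumption made for this example and is not taken from the EC tools; the function names and the `alpha` threshold are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb
from typing import Dict, List, Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a, c: counts of the transcript in library 1 and library 2
    b, d: counts of all other transcripts in library 1 and library 2
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def differential_transcripts(counts_a: Dict[str, int],
                             counts_b: Dict[str, int],
                             alpha: float = 0.05) -> List[Tuple[str, float]]:
    """Return (transcript, p-value) pairs with p < alpha, most significant first."""
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    hits = []
    for name in sorted(set(counts_a) | set(counts_b)):
        a, c = counts_a.get(name, 0), counts_b.get(name, 0)
        p = fisher_exact_two_sided(a, total_a - a, c, total_b - c)
        if p < alpha:
            hits.append((name, p))
    return sorted(hits, key=lambda item: item[1])
```

In practice, library sizes differ widely between tissues, so raw counts should always be compared against the library totals, as the 2x2 table here does.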
Other projects based on ECgene annotation include
(i) prediction of microRNA targets,
(ii) identification of the TFBS (transcription factor binding sites) conserved in multiple species,
(iii) association of SNP and alternative splicing,
(iv) identification of a special type of natural antisense transcript,
(v) search for fusion mRNA and EST sequences.