Genomics

Model
Digital Document
Publisher
Florida Atlantic University
Description
The goal of this project is to gain access to valuable genetic information that will be used to create a genetics-based selective breeding program. This selective breeding program will be implemented to produce and maintain a healthy and diverse breeding stock of the Florida Pompano (Trachinotus carolinus). The Florida Pompano is a popular food fish found in abundance off Florida's east and west coasts. There has been interest in the aquaculture of this species for decades, with minimal success. With recent improvements in aquaculture systems and nutrition, now is the time to bring this fish to commercialization. The main research objectives of this study are to create a bioinformatics workflow to generate a draft whole genome of the Florida Pompano, identify variation sites within this genome, and run a comparative analysis with two closely related Trachinotus species, Permit (T. falcatus) and Palometa (T. goodei). These two species were chosen because they are found in the same environment as the Florida Pompano but grow to substantially different sizes. To sequence and assemble the whole genome of the Florida Pompano, a hybrid method was applied using long- and short-read sequencing technologies. The draft genome was found to be 733.5 Mb in length with a total of 26,891 protein-coding genes. Sites of variation within this assembled genome were identified using a 2b-RAD sequencing method on 62 individuals collected off Florida’s east and Gulf coasts.
Model
Digital Document
Publisher
Florida Atlantic University
Description
Mining the human genome for therapeutic target discovery promises novel outcomes. Over half of the proteins in the human genome, however, remain uncharacterized. These proteins offer potential new targets for diverse diseases. Additional targets for cancer diagnosis and therapy are urgently needed to help move away from the cytotoxic era to a targeted therapy approach. Bioinformatics and proteomics approaches can be used to characterize novel sequences in the genome database to infer putative function. The hypothesis that the amino acid motifs and protein domains of the uncharacterized proteins can be used as a starting point to predict the putative function of these proteins provided the framework for the research discussed in this dissertation.
Model
Digital Document
Publisher
Florida Atlantic University
Description
Recently, Dr. Narayanan's laboratory, using bioinformatics approaches, identified a novel gene that may play a role in colon cancer. In view of its expression specificity, this gene was termed Colon Carcinoma Related Gene (CCRG). CCRG belongs to a novel class of secreted molecules with a unique cysteine-rich motif. The function of CCRG, however, remains unknown. This project centered on establishing the putative function (functional genomics) of CCRG. The rationale for the project was to test the hypothesis that CCRG may offer a growth advantage to cancer cells. The availability of diverse tumor-derived cell lines that were CCRG-negative offered a possibility to study the consequence of enforced expression of CCRG. A breast carcinoma cell line was transfected with an exogenous CCRG expression vector and the stable clones were characterized. The stable transfectants of CCRG showed enhanced growth and a partial abrogation of the serum growth factor requirement. These results provide a framework for future experiments to further elucidate the function of CCRG.
Model
Digital Document
Publisher
Florida Atlantic University
Description
After the sequencing of many complete genomes, we are in a post-genomic era in which the most important task has changed from gathering genetic information to organizing the mass of data as well as understanding how components interact with each other. The former is usually undertaken using bioinformatics methods, while the latter task is generally termed proteomics. Success in both parts demands correct statistical significance assignments for the results found. In my dissertation, I study two concrete examples: global sequence alignment statistics and peptide sequencing/identification using mass spectrometry. High-performance liquid chromatography coupled to a mass spectrometer (HPLC/MS/MS), enabling peptide identifications and thus protein identifications, has become the tool of choice in large-scale proteomics experiments. Peptide identification is usually done by database search methods. The lack of robust statistical significance assignment among current methods motivated the development of a novel de novo algorithm, RAId, whose score statistics then provide statistical significance for high-scoring peptides found in our custom, enzyme-digested peptide library. The ease of incorporating post-translational modifications is another important feature of RAId. To organize the massive protein/DNA data accumulated, biologists often cluster proteins according to their similarity via tools such as sequence alignment. Homologous proteins share similar domains. Assessing the similarity of two domains usually requires aligning them from end to end, i.e., a global alignment. Good alignment score statistics with an appropriate null model enable us to distinguish biologically meaningful similarity from chance similarity. There has been much progress in local alignment statistics, which characterize score statistics when alignments tend to appear as a short segment of the whole sequence.
For global alignment, which is useful in domain alignment, there is still much room for exploration and improvement. Here we present a variant of the directed polymer problem in random media (DPRM) to study the score distribution of global alignment. We have demonstrated that, upon proper transformation, the score statistics can be characterized by Tracy-Widom distributions, which correspond to the distributions for the largest eigenvalue of various ensembles of random matrices.
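As a rough illustration of what a global alignment score and an empirical null distribution look like, the sketch below computes a Needleman-Wunsch score and samples scores for random sequence pairs. It is not the dissertation's DPRM formulation: the scoring parameters (match, mismatch, gap), the uniform four-letter null model, and the function names are all assumptions chosen for the example.

```python
import random

def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not taken from the source.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def sample_null_scores(length, trials, alphabet="ACGT", seed=0):
    """Scores for pairs of uniformly random sequences (an assumed null model)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        x = "".join(rng.choice(alphabet) for _ in range(length))
        y = "".join(rng.choice(alphabet) for _ in range(length))
        scores.append(global_alignment_score(x, y))
    return scores
```

Comparing an observed score against the sampled scores gives a crude empirical significance estimate; the abstract's point is that a properly transformed score distribution can instead be described analytically.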
Model
Digital Document
Publisher
Florida Atlantic University
Description
This research is concerned with analyzing a set of viral genomes to elucidate the underlying characteristics and determine the information-theoretic aspects of the genomic signatures. The study addresses the following: (i) reviewing various methods available to deduce the features and characteristics of genomic sequences of organisms in general, with a particular focus on viral genomes; (ii) applying information-theoretic concepts (entropy principles) to analyze genomic sequences; (iii) considering various aspects of biothermodynamic energetics to determine the framework and architecture that decide the stability and patterns of the subsequences in a genome; (iv) evaluating the genomic details using spectral-domain techniques; (v) studying fuzzy considerations to ascertain the overlapping details in genomic sequences; (vi) determining the common subsequences among various strains of a virus by logistically regressing the data obtained via entropic, energetics and spectral-domain exercises; (vii) differentiating informational profiles of coding and non-coding regions in a DNA sequence to locate aberrant (cryptic) attributes evolved as a result of mutational changes; and (viii) finding the signatures of the CDS of genomes of viral strains toward rationally conceiving plausible vaccine designs. Commensurate with the topics indicated above, the necessary simulations are proposed and computational exercises are performed (with MATLAB R2009b and other software as needed). Extensive data gathered from the open literature are used, and simulation results are verified. Lastly, results are discussed, inferences are made, and open questions are identified for future research.
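The entropy-based analysis in item (ii) can be illustrated with a minimal sketch that computes the Shannon entropy of the k-mer distribution of a sequence. This is a generic textbook calculation in Python, not the study's MATLAB code; the function name and the choice of overlapping k-mers are assumptions made for the example.

```python
from collections import Counter
from math import log2

def kmer_entropy(seq, k=1):
    """Shannon entropy (in bits) of the overlapping k-mer distribution of seq.

    Returns 0 when seq is shorter than k, since there is nothing to count.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(kmers)
    counts = Counter(kmers)
    # H = -sum(p * log2(p)) over the observed k-mers
    return -sum((c / total) * log2(c / total) for c in counts.values())
```

With k=1 the value ranges from 0 bits for a homopolymer to 2 bits for a sequence with equal A, C, G and T frequencies; comparing values between regions is one simple way to contrast their informational profiles.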
Model
Digital Document
Publisher
Florida Atlantic University
Description
In this thesis, we propose to discover co-regulated genes using microarray expression data and to provide visualization functionality for domain experts to study relationships among the discovered co-regulated genes. To discover co-regulated genes, we first use existing gene selection methods to select a small portion of genes that are relevant to the target diseases, on which we build an ordered similarity matrix using nearest-neighbor-based similarity assessment criteria. We then apply a threshold-based clustering algorithm named Spectral Clustering to the matrix to obtain a number of clusters. The genes grouped together in one cluster represent a group of co-regulated genes. To visualize them, we use Java Swing to develop a visualization platform that lets domain experts study relationships between different groups of co-regulated genes, study internal structures within each group of genes, and investigate details of each individual gene, including for gene function prediction. Results are analyzed based on microarray expression datasets collected from brain tumor, lung cancer and leukemia samples.
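The similarity-matrix step described above can be sketched as follows. This is only an assumed reading of "nearest neighbor based similarity": Pearson correlation between expression profiles, keeping each gene's k most similar neighbors and symmetrizing. The function names and the choice of Pearson correlation are hypothetical, and the clustering step itself is not shown.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # Constant profiles have no defined correlation; treat them as 0.
    return cov / (sx * sy) if sx and sy else 0.0

def knn_similarity(profiles, k=2):
    """Symmetric similarity matrix that keeps each gene's k nearest neighbors."""
    n = len(profiles)
    sim = [[pearson(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    keep = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        neighbors = sorted(others, key=lambda j: sim[i][j], reverse=True)[:k]
        for j in neighbors:
            # Keep the edge in both directions so the matrix stays symmetric.
            keep[i][j] = keep[j][i] = sim[i][j]
    return keep
```

A matrix of this shape is the usual input to a spectral clustering routine, which would then partition the genes into candidate co-regulated groups.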
Model
Digital Document
Publisher
Florida Atlantic University
Description
Most tumors contain multiple karyotypes due to genomic instability gained through chromosome segregation defects. The variability of genomic changes within a population makes it difficult to study specific processes without confounding mutations. My project is to create a model system for observing mitotic defects, specifically multipolar spindles, in a normal cell line in which the genome is intact. Induction of centrosome amplification is required for the formation of multipolar spindles. Treatment with colcemid showed a 10% increase in abnormal centrosome numbers over control. However, treatment with hydroxyurea and transfection of hMps1 showed little increase. Extra centrosomes are insufficient to drive multipolarity; therefore, I am using siRNA-mediated knockdown of Nek2 or HSET to decluster the extra centrosomes. Successful declustering should increase the frequency of multipolar spindles, allowing us to study the formation and resolution of these structures to better understand how they contribute to aneuploidy and tumor progression.