Major Qualifying Project Ideas
Project: Visual Analysis of Biological Datasets
Professor: Matthew Ward
Solo or group: Multiple projects, solo or group
Description
As biological datasets continue to expand in size and diversity, there is a growing need for tools to help scientists discover interesting patterns and relationships within the data, which can lead to the development of theories on how biology works. The focus of these projects is to examine current techniques for visualizing one or more types of biological data and to propose, develop, and evaluate new visualization and interaction mechanisms to assist in the discovery process.
Skills needed
Ideally, at least one member should have taken graphics (CS 4732) or biovisualization (BCB 4002), at least one should have taken HCI (CS 3041), and at least one should have taken BB 1035 (Biotechnology). Other biology courses, such as Genetics, Molecular Biology, and Bioinformatics, would be a plus. Interdisciplinary teams are welcome.
Project: 3-D HP model for protein folding
Professor: Brigitte Servatius
Solo or group: Either
Description
Ken Dill proposed a simple model for protein folding, the HP model, which is not completely understood even in the plane but yields good insight into the shape and energy of a molecule in the folded state and into possible pathways between states. A 3-D model is more realistic and opens new problems, for example about knottedness in the folded state.
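As a rough illustration of the energy calculation in the HP model, the sketch below scores a conformation on a square (2-D) or cubic (3-D) lattice by counting -1 for every pair of hydrophobic (H) residues that are lattice neighbors but not adjacent along the chain. The function name hp_energy and the coordinate representation are assumptions made for this example, not part of the project description.

```python
def hp_energy(sequence, coords):
    """Return the HP-model energy: -1 per non-bonded H-H lattice contact.

    sequence: string of 'H' and 'P', one letter per residue.
    coords:   list of integer tuples (2-D or 3-D), one lattice site per residue.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    for a, b in zip(coords, coords[1:]):
        if sum(abs(x - y) for x, y in zip(a, b)) != 1:
            raise ValueError("consecutive residues must be lattice neighbors")

    position = {site: i for i, site in enumerate(coords)}
    energy = 0
    for i, site in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only in the +1 direction along each axis so that every
        # neighboring pair of sites is examined exactly once.
        for axis in range(len(site)):
            neighbor = list(site)
            neighbor[axis] += 1
            j = position.get(tuple(neighbor))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy


# A four-residue chain folded into a unit square: the two H ends touch.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hp_energy("HPPH", square))
```

Because the coordinates are plain tuples, the same function accepts 3-D conformations such as [(0, 0, 0), (1, 0, 0), (1, 0, 1)] without modification.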
Skills needed
Some knowledge about proteins, discrete mathematics, statistics and programming is helpful for this project.
Project: Improvements to Next-Generation Sequencing Base Calling
Professor: Patrick Flaherty
Solo or group: Group
Description
The current pipeline for calling bases from next-generation sequencing platforms yields an error rate as high as 1-2% for some experiments. The goal of this project is to develop a data analysis pipeline for processing next-generation sequencing image data files that improves both speed and accuracy relative to current methodology. Improving the base calling accuracy will allow us to detect rare mutations in clinical and pooled samples with higher resolution and will facilitate the detection of resistant or unique cells in these large populations.
The raw data from a next-generation sequencing experiment is an image that shows the nucleotide incorporated into an extending strand of DNA. A single strand of DNA is represented by a spot on this image and a typical image has millions of spots. If a spot is misidentified due to overlap or low resolution, an error is made in calling the DNA sequence.
The project is to develop a software data analysis pipeline that improves the accuracy and speed of calling the DNA sequence from the raw images. You may use graphics processing units, cloud computing, or standard computational resources to achieve both speed and accuracy.
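To make the base-calling step concrete, here is a minimal sketch of the simplest possible caller. It assumes that each spot has already been reduced to four non-negative channel intensities per sequencing cycle, in the order A, C, G, T; that data layout, the function names, and the purity threshold are all assumptions for illustration and do not describe any particular platform's pipeline.

```python
BASES = "ACGT"  # assumed channel order


def call_base(intensities):
    """Call one base from four channel intensities.

    Returns (base, purity), where purity is the share of the total
    signal carried by the brightest channel.
    """
    if len(intensities) != 4:
        raise ValueError("expected four channel intensities")
    total = sum(intensities)
    if total <= 0:
        return "N", 0.0  # no usable signal for this spot
    best = max(range(4), key=lambda k: intensities[k])
    return BASES[best], intensities[best] / total


def call_read(cycles, min_purity=0.6):
    """Call a read from per-cycle intensities, masking ambiguous cycles as 'N'.

    min_purity is a hypothetical threshold chosen for this example.
    """
    calls = []
    for intensities in cycles:
        base, purity = call_base(intensities)
        calls.append(base if purity >= min_purity else "N")
    return "".join(calls)


# Two clean cycles followed by one where overlapping spots blur the signal.
print(call_read([(90, 5, 3, 2), (10, 80, 5, 5), (30, 30, 20, 20)]))
```

A caller like this fails exactly where the description says errors arise: when spots overlap or resolution is low, no channel dominates. An improved pipeline would replace the brightest-channel rule with a better statistical or image-analysis model.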
Skills needed
Bioinformatics, image analysis, algorithms, statistics, machine learning, Python or MATLAB, high-performance computing
Project: Articular Cartilage Mechanobiology
Professor: Sarah Olson
Solo or group: Group
Description
Articular cartilage is found in diarthrodial joints such as the knee and is maintained by a sparsely distributed cell, the chondrocyte. In the knee, cartilage serves as a cushion and distributes loads. Specifically, in this project we are interested in modeling signaling pathways that couple forces on the cellular scale to up/down regulation of specific protein constituents. This will involve the development and simulation of a model consisting of a set of coupled ordinary differential equations. The signaling pathways that we wish to investigate have a large role in osteoarthritis, the degeneration of articular cartilage.
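For a sense of what simulating a set of coupled ordinary differential equations involves, the sketch below integrates a toy two-variable system with a fourth-order Runge-Kutta step. The model itself is entirely hypothetical: a signal s is activated by a mechanical load and decays, and a protein p is synthesized at a rate that saturates in s and degrades linearly. The equations, parameter names, and values are assumptions for illustration only and are not the pathways studied in this project.

```python
def rhs(state, load, k_on=1.0, k_off=1.0, k_syn=2.0, K=1.0, k_deg=1.0):
    """Right-hand side of the hypothetical model.

    ds/dt = k_on * load - k_off * s
    dp/dt = k_syn * s / (K + s) - k_deg * p
    """
    s, p = state
    ds = k_on * load - k_off * s
    dp = k_syn * s / (K + s) - k_deg * p
    return (ds, dp)


def rk4_step(f, state, dt):
    """Advance state by one classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * k for b, k in zip(base, slope))

    k1 = f(state)
    k2 = f(shifted(state, k1, dt / 2))
    k3 = f(shifted(state, k2, dt / 2))
    k4 = f(shifted(state, k3, dt))
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(load, t_end=50.0, dt=0.01, state=(0.0, 0.0)):
    """Integrate from t = 0 to t_end under a constant load; return (s, p)."""
    steps = int(round(t_end / dt))
    for _ in range(steps):
        state = rk4_step(lambda y: rhs(y, load), state, dt)
    return state


s, p = simulate(load=1.0)
print(f"signal = {s:.4f}, protein = {p:.4f}")
```

With the default parameters and a load of 1, setting both derivatives to zero gives a steady state of s = 1 and p = 1, which provides a quick check on the integrator. In MATLAB the same system could be handed to a built-in solver such as ode45.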
Skills needed
Calculus, Differential Equations, Modeling. Familiarity with, or willingness to learn about, cellular signaling pathways and programming (mainly using MATLAB).
Project: Expressed Sequence Analysis of a family of putative C. elegans surface antigen genes
Professor: Samuel Politz
Solo or group: 1-2 students per project
Description
The genome of the nematode C. elegans contains ~300 copies of a 150 bp sequence that encodes a conserved amino acid sequence found in surface antigens of the parasitic nematode Toxocara canis. Little is known about how this large family of protein-coding sequences is expressed. This project will explore transcribed sequences in RNA/cDNA sequence databases to understand how, when, and where these sequences are expressed.
Skills needed
Students will be trained to work with BLAST searches of DNA sequence databases and with multiple sequence alignment programs such as CLUSTAL and PILEUP.
Project: Analysis of a novel class of fungal specific transcription factors involved in virulence.
Professor: Reeta Rao
Solo or group: 1 or 2 students maximum
Description
The ZCF proteins (~77) define a novel class of transcription factors that are found only in fungi. Some of these proteins regulate fungal virulence. The questions the MQP would like to address are:
- Are there features/motifs that differentiate the subset of ZCF proteins that regulate virulence from other members of the family?
- Can we predict their likely targets?
Skills needed
Computational skills and a good understanding of molecular biology.
Project: De Bruijn Graphs and their properties
Professor: Brigitte Servatius
Solo or group: Group
Description
In bioinformatics, De Bruijn graphs are used for de novo assembly of (short) read sequences into a genome. We want to examine the mathematical properties of De Bruijn graphs and match the theory with laboratory experiments.
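As a small illustration of how a De Bruijn graph supports assembly, the sketch below builds the graph whose nodes are (k-1)-mers and whose edges are the distinct k-mers found in a set of reads, then walks an Eulerian path through it to spell out a sequence. The function names are chosen for this example, and the sketch assumes error-free reads, a non-empty graph, and a graph that actually has an Eulerian path. When the genome contains repeats, more than one Eulerian path may exist and the result is not unique.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it."""
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer not in seen:  # one edge per distinct k-mer
                seen.add(kmer)
                graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Return the nodes of an Eulerian path (Hierholzer's algorithm)."""
    in_deg = defaultdict(int)
    for successors in graph.values():
        for v in successors:
            in_deg[v] += 1
    # Start where out-degree exceeds in-degree by one, if such a node exists.
    start = next(
        (u for u in graph if len(graph[u]) - in_deg[u] == 1),
        next(iter(graph)),
    )
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is finished
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along an Eulerian path of the De Bruijn graph."""
    path = eulerian_path(de_bruijn(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


# Two overlapping reads sampled from the sequence ATGCGTA.
print(assemble(["ATGCG", "GCGTA"], k=3))
```

Properties such as node degrees, the existence and number of Eulerian paths, and how they change with k are the kind of graph-theoretic questions that can then be compared against experimental read data.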
Skills needed
Graph theory and good programming skills are required.
