Major Qualifying Project Ideas

Project: Visual Analysis of Biological Datasets

Professor: Matthew Ward

Solo or group: Multiple projects, solo or group

Description

As the size and diversity of biological datasets continue to grow, there is an increasing need for tools to help scientists discover interesting patterns and relationships within the data, which can lead to the development of theories on how biology works. The focus of these projects is to examine current techniques for visualizing one or more types of biological data and to propose, develop, and evaluate new visualization and interaction mechanisms to assist in the discovery process.

Skills needed

Ideally, at least one member should have taken graphics (CS 4732) or biovisualization (BCB 4002), at least one should have taken HCI (CS 3041), and at least one should have taken BB 1035 (Biotechnology). Other biology courses, such as Genetics, Molecular Biology, and Bioinformatics, would be a plus.  Interdisciplinary teams are welcome.

Project: 3-D HP model for protein folding

Professor: Brigitte Servatius

Solo or group: Either

Description

Ken Dill proposed a simple model for protein folding which is not yet completely understood even in the plane, but which yields good insight into the shape and energy of a molecule in the folded state and the possible pathways between states. A 3-D model is more realistic and opens new problems, for example about knottedness in the folded state.
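As a rough illustration of what "energy of a molecule in the folded state" means in an HP-style lattice model, the sketch below scores a fold on the 3-D cubic lattice. It assumes the common convention that each pair of hydrophobic (H) residues that are lattice neighbours but not chain neighbours contributes -1 to the energy; the function name hp_energy and the coordinate representation are hypothetical choices made for this example, not part of the project description.

```python
def hp_energy(sequence, coords):
    """Energy of an HP chain folded on the 3-D cubic lattice.

    sequence: string over {'H', 'P'} (hydrophobic / polar residues).
    coords:   list of (x, y, z) integer tuples, one per residue.
    Assumed convention: each non-bonded H-H lattice contact counts -1.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("fold is not self-avoiding")
    # Consecutive residues must sit on adjacent lattice sites.
    for a, b in zip(coords, coords[1:]):
        if sum(abs(p - q) for p, q in zip(a, b)) != 1:
            raise ValueError("consecutive residues are not lattice neighbours")

    index_at = {site: i for i, site in enumerate(coords)}
    contacts = 0
    for i, site in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only in the +x, +y, +z directions so each pair is counted once.
        for axis in range(3):
            neighbour = list(site)
            neighbour[axis] += 1
            j = index_at.get(tuple(neighbour))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                contacts += 1
    return -contacts


# A four-residue chain bent into a square brings its two H ends together.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(hp_energy("HPPH", square))
```

Enumerating all self-avoiding walks for short sequences and comparing their energies is one simple way to start exploring the folded states the description refers to.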

Skills needed

Some knowledge about proteins, discrete mathematics, statistics and programming is helpful for this project.

Project: Improvements to Next-Generation Sequencing Base Calling

Professor: Patrick Flaherty

Solo or group: Group

Description

The current pipeline for calling bases from next-generation sequencing platforms yields an error rate as high as 1-2% for some experiments. The goal of this project is to develop a data analysis pipeline for processing next-generation sequencing image data files that improves on both the speed and the accuracy of current methodology. Improving the base calling accuracy will allow us to detect rare mutations in clinical and pooled samples with higher resolution and facilitate the detection of resistant or unique cells in these large populations.

The raw data from a next-generation sequencing experiment is an image that shows the nucleotide incorporated into an extending strand of DNA. A single strand of DNA is represented by a spot on this image and a typical image has millions of spots. If a spot is misidentified due to overlap or low resolution, an error is made in calling the DNA sequence.

The project is to develop a software data analysis pipeline that improves the accuracy and speed of calling the DNA sequence from the raw images. You may use graphics processing units, cloud computing, or standard computational resources to achieve both speed and accuracy.

Skills needed

Bioinformatics, image analysis, algorithms, statistics, machine learning, Python or MATLAB, high-performance computing

Project: Articular Cartilage Mechanobiology

Professor: Sarah Olson

Solo or group: Group

Description

Articular cartilage is found in diarthrodial joints such as the knee and is maintained by a sparsely distributed cell, the chondrocyte. In the knee, cartilage serves as a cushion and distributes loads. Specifically, in this project we are interested in modeling signaling pathways that couple forces on the cellular scale to up- or down-regulation of specific protein constituents. This will involve the development and simulation of a model consisting of a set of coupled ordinary differential equations. The signaling pathways that we wish to investigate have a large role in osteoarthritis, the degeneration of articular cartilage.

Skills needed

Calculus, Differential Equations, Modeling. Familiarity with, or willingness to learn about, cellular signaling pathways and programming (mainly using MATLAB).

Project: Expressed Sequence Analysis of a family of putative C. elegans surface antigen genes

Professor: Samuel Politz

Solo or group: 1-2 students per project

Description

The genome of the nematode C. elegans contains ~300 copies of a 150 bp sequence that encodes a conserved amino acid sequence found in surface antigens of the parasitic nematode Toxocara canis. Little is known about how this large family of protein-coding sequences is expressed. This project will explore transcribed sequences in RNA/cDNA sequence databases to understand how, when, and where these sequences are expressed.

Skills needed

Students will be trained to work with BLAST searches of DNA sequence databases and multiple sequence alignment programs such as CLUSTAL and PILEUP.

Project: Analysis of a novel class of fungal-specific transcription factors involved in virulence

Professor: Reeta Rao

Solo or group: 1 or 2 students maximum

Description

The ZCF proteins (~77) define a novel class of transcription factors that are found only in fungi. Some of these proteins regulate fungal virulence. The questions the MQP would like to address are:

  1. Are there features/motifs that differentiate the subset of ZCF proteins that regulate virulence from other members of the family?
  2. Can we predict their likely targets?

Skills needed

Computational skills with good understanding of molecular biology.

Project: De Bruijn Graphs and their properties

Professor: Brigitte Servatius

Solo or group: Group

Description

In bioinformatics, De Bruijn graphs are used for de novo assembly of (short) read sequences into a genome. We want to examine the mathematical properties of De Bruijn graphs and match the theory with laboratory experiments.
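To make the assembly use case concrete, here is a minimal sketch of how a De Bruijn graph can be built from short reads: every k-mer in a read becomes a directed edge from its (k-1)-length prefix to its (k-1)-length suffix. The function names de_bruijn_graph and degree_imbalance, the toy read, and the choice of k are assumptions made for illustration; real assemblers also handle sequencing errors, reverse complements, and repeated k-mers, none of which is modelled here.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it.

    Every k-mer occurring in a read contributes one directed edge
    prefix -> suffix; repeated k-mers give repeated edges.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def degree_imbalance(graph):
    """Return out-degree minus in-degree for every node in the graph."""
    balance = defaultdict(int)
    for node, successors in graph.items():
        balance[node] += len(successors)
        for nxt in successors:
            balance[nxt] -= 1
    return dict(balance)


graph = de_bruijn_graph(["ACGT"], k=3)
print(graph)
print(degree_imbalance(graph))
```

The degree imbalance connects to one of the classical mathematical properties: a connected directed graph has an Eulerian path exactly when at most one node has imbalance +1, at most one has imbalance -1, and all others are balanced, and such a path spells out a candidate reconstruction of the sequence.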

Skills needed

Graph theory and good programming skills are required.

 