Project Guidelines and Deliverables

In your project, you need to analyze one or more biomolecular datasets and report your results.

  • See Student projects from previous years for example projects.
    • You must work on a different dataset than the ones explored in those projects. Exceptions may be granted if your project will be substantially different (e.g., using different methods of analysis). You must get permission from the instructor for an exception.

Project Recipe (subject to negotiation for your particular project and availability of relevant data)

  • Type of project:
    • Disease Project: Pick a disease (or a biological phenomenon) and perform a meta-analysis of multiple datasets (from different studies/publications/experiments) about that disease.
    • Method Project: Pick a step of expression data analysis and investigate the options/parameters for that step.
  • Minimum requirements for the data you are using:
    • You may use the same data type (e.g., all microarray) or a mix of different data types (e.g., microarray, RNA-seq, protein array).
    • If your team has 3 students, use data from at least three different studies.
    • If your team has fewer than 3 students, use data from at least two studies.
    • The combined data should have at least 100 samples.
  • Disease Project Analysis steps (may not be applicable to some Method Projects)
    • You may analyze multiple datasets separately or combine them into a single dataset.
    • Normalize
    • Visualize samples (PCA, Clustering)
    • Find Differentially Expressed Genes (or miRNAs or proteins)
    • Perform enrichment analysis
    • Apply machine learning for Classification or Regression
  • Method Project Analysis steps
    • Define the success measures, e.g., which differentially expressed genes are considered the correct ones. If the results can be used for machine learning, you may be able to use classification performance metrics.
    • Scan different analysis options (e.g., different normalization methods) or parameter values (e.g., value of k in k-means clustering) and report their effect on the success measures.
  • Optional Analysis steps
    • Identify transcription factors that may be responsible for the observed expression changes.
    • Find drug candidates that could reverse the observed expression changes.
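
As a rough illustration of the "Find Differentially Expressed Genes" step, the sketch below computes a log2 fold change and a permutation p-value for each gene, using only the Python standard library. The gene names, the expression values, and the function names are made up for illustration. The sketch assumes the data are already normalized and on a log2 scale, and a permutation test is only one of several reasonable choices (a t-test or a dedicated package may fit your data better).

```python
import random
from statistics import mean


def log2_fold_change(case, control):
    """Difference of mean log2 expression (case minus control)."""
    return mean(case) - mean(control)


def permutation_pvalue(case, control, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(case) - mean(control))
    pooled = list(case) + list(control)
    n_case = len(case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign samples to the two groups
        diff = abs(mean(pooled[:n_case]) - mean(pooled[n_case:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_perm + 1)


# Toy log2 expression values (invented, not from any real dataset).
toy_data = {
    "GENE_A": ([8.1, 8.3, 7.9, 8.2], [5.0, 5.2, 4.9, 5.1]),
    "GENE_B": ([6.0, 6.2, 5.9, 6.1], [6.1, 5.9, 6.2, 6.0]),
}

for gene, (case, control) in toy_data.items():
    lfc = log2_fold_change(case, control)
    p = permutation_pvalue(case, control)
    print(f"{gene}: log2FC={lfc:.3g}, p={p:.3g}")
```

With real data you would also correct the p-values for multiple testing before calling any gene significant.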

Project Teams

Each project will be done by 2-3 people. You can use the student survey and project spreadsheets to find teammates.

Coding Guidelines

  • In addition to the analysis results, you are expected to provide self-contained programming code that can be used by others to replicate your results.
  • Include a README.md file (see markdown syntax) that lists the files in your project with a short description of what each file is.
    • Provide installation instructions for someone who wants to run your code on their computer. Provide command lines for installing any required Python packages. Give links to any required software that needs to be installed manually.
  • Include a script named project.m (or project.ipynb, or project.py, or project.sh, or project.php, etc. depending on the programming language you used) that when executed, is able to generate most/all of the results/use-cases you present in your project. Any additional files required to run project.m should be included in your project folder.
    • Your code should include text/comments describing the analysis and discussing the results you have obtained. Figures should be properly formatted (axis labels, plot legends).
    • Include any auxiliary functions you wrote for the project as separate files. Your project should be self-contained, i.e., if we move your source code folder to a different computer, we should still be able to run the main script without making any changes to your code.
  • If the programming language you use allows for generation of reports, publish your main script file and produce project.pdf. In Matlab, use the publish feature to generate the PDF. If you are using Jupyter in Python, print the webpage to PDF. If the programming language you use does not allow generation of PDF reports, you can create a report as a Word document.
  • Don't implement your project as one giant script or function. Identify parts of your code that can be implemented as separate, self-contained, testable functions. As much as your programming language allows, use a separate file for each function.
  • If you are getting data from the web, get it on-demand within your code (i.e., download it only if it has not been downloaded before). Store these files in a temporary folder or in a data folder. If you are working in Matlab, Python, or PHP, use the bmes.tempdir() or bmes.datadir() functions provided by the instructor. In other programming languages, implement similar functions.
  • If you are using data files, don't use full paths to access them. Others downloading your project files should be able to execute your code regardless of where they place the project files. Make as few assumptions as possible regarding the file system your code is running on and specify these assumptions in your README file.
  • Keep large data files (e.g., image files, GEO files) in a location on your computer separate from the project files. Data files that individually or collectively exceed 10MB are considered large.
  • If you are using a database to store information, include database initialization code. When your program is executed, create and populate any missing database tables.
  • Be mindful of what information is “leaking”. E.g., do not include any journal and conference papers that are not in the public domain (instead, provide a URL for such publications).
  • Total project size limit: Your project folder should not exceed 10MB in size. Reducing image resolution (without noticeably reducing quality) and removing reproducible data, software, or temporary files (all of which you should not have included in your project folder in the first place) are strategies you can use to reduce the size of your project.
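
The guidelines above on on-demand downloads, avoiding full paths, and database initialization could be sketched in Python roughly as follows. This is only an illustration: project_datadir(), fetch_if_missing(), and open_database() are hypothetical helpers written for this example, not the instructor's bmes.tempdir() or bmes.datadir() functions, and the folder name, table name, and columns are assumptions. If you work in Matlab, Python, or PHP, use the instructor-provided functions instead.

```python
import shutil
import sqlite3
import tempfile
import urllib.request
from pathlib import Path


def project_datadir(base=None):
    """Return a data folder outside the project folder, creating it if needed.

    Hypothetical stand-in for a course-provided data directory function.
    """
    if base is None:
        base = Path(tempfile.gettempdir()) / "myproject_data"  # assumed name
    base = Path(base)
    base.mkdir(parents=True, exist_ok=True)
    return base


def fetch_if_missing(url, filename, datadir=None):
    """Download url into the data folder unless the file is already there."""
    target = project_datadir(datadir) / filename
    if not target.exists():
        partial = target.with_name(target.name + ".part")
        with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
        partial.replace(target)  # rename only after a complete download
    return target


def open_database(datadir=None):
    """Open the project database, creating any missing tables."""
    conn = sqlite3.connect(str(project_datadir(datadir) / "project.sqlite"))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples ("
        "sample_id TEXT PRIMARY KEY, study TEXT, label TEXT)"
    )
    conn.commit()
    return conn


# Example call (the URL is a placeholder, not a real dataset):
# path = fetch_if_missing("https://example.org/dataset.txt", "dataset.txt")
```

Because every path is derived from project_datadir(), the code does not depend on where the project folder is placed, and large downloads stay out of the project folder.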

Project Presentation

  • Create your presentation in a file named presentation.pptx. If you use Google Docs, export it as presentation.pptx. If you are using a slideshow editor that cannot export as a powerpoint file, then export it as presentation.pdf.
  • You are to give a 10-15 minute presentation of your project. Your presentation will be graded based on how well you accomplish the following sections.
  • Title page [0% (but -5% if you are missing it), 0min]
    • Title, your name(s) (last names are recommended/optional).
    • If your project is based on a publication, include its reference.
  • Introduction - Problem description, Motivation [5%, 2min]:
    • Why are we studying this problem? What is the biomedical need? Public health stats, if available.
  • Introduction - Biology/Physiology [15%, 3min]
    • Describe the underlying biology/physiology/pathology.
    • Find figures illustrating the system (remember to cite the sources).
  • Introduction - Goals [15%, 2min]
    • What do you/authors hope to find/accomplish with this study?
    • If successful, how will your findings/result influence our understanding or medical practice?
  • The Dataset(s) [20%, 2min]: (this may be included within your Introduction, if appropriate)
    • Describe the experiment(s) that produced the datasets you are analyzing in your project.
    • Describe any pre-processing or normalization that has already been performed.
      • If you had to perform your own pre-processing & normalization, describe them in the Methods.
  • Methods [20%, 4min]
    • Give an overview of your analysis workflow. What statistical tests / visualization / analysis methods did you use?
    • Briefly describe the methods. If you used the methods we covered in class, you can skip their description. Describe any modifications you made or options you had to set to make these methods work for your specific dataset.
      • You can tell us more about a method if we did not cover it in class or if you used a variant of a method we covered.
    • Mention how these methods were implemented. Did you implement them yourself? Which programming libraries/tools/modules did you utilize? If we covered it in the course, you do not need to go into detail.
  • Results [30%, 5min]
    • Show your main findings/results. Use figures (e.g., bar charts) instead of tables to present your results.
    • If there are reports (publications, webpages, etc) that have used the same dataset, you must compare your results to theirs (mention what methods they used).
  • Discussion [10%, 1min]
    • Do your results make sense biologically? Find studies that support your findings. (E.g., you found 10 genes in your Alzheimer's dataset analysis, check literature to see if these genes are known for their involvement in Alzheimer's).
    • What are the limitations of your study?
    • What follow-up studies can be performed to validate/improve upon your findings?

Project Report

  • Submit a 4 to 6-page report.docx file. Use the template file to write your report. The template file describes the sections you need to include in your report.
  • If you use Google Docs or another editor to write your report, please export your report as a Word document.
  • The break-down of the grading of the report will be similar to the presentation.
  • The report should contain sufficient details (types of statistical tests, program parameters, thresholds, etc.) to replicate your results. If listing them is prohibitive within text, include them in an auxiliary table at the end of your report.

Analysis and Reporting Guidelines

Not all of these guidelines may be applicable to your project.

  • You must normalize data appropriately, if it is not already normalized.
  • When reporting numbers (e.g., p-values or fold changes), report no more than three significant digits. For example, rather than 9.4579, report 9.46; rather than 0.000014782, report 1.5e-5. This does not apply to data files (e.g., Excel files), where you must provide precise values and should not round your numbers.
  • If others (e.g., a paper that your project is based on) have produced results you can compare to, provide a side-by-side comparison of your results wherever applicable. E.g., if you are reporting significant genes or pathways in a table, also include in the same table their results as separate column(s). You need to find the right level of detail for depicting your comparisons.
  • Don't over-use numerical results. Prefer to present the results graphically (using bar graphs, Venn diagrams, heatmaps, PCA plots etc.).
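
The significant-digits guideline above can be applied in Python with the built-in "g" format type. The helper name format_sig is made up for this sketch. Note that Python writes exponents with two digits (e-05 rather than e-5).

```python
def format_sig(value, digits=3):
    """Format a number with at most `digits` significant digits."""
    return f"{value:.{digits}g}"


print(format_sig(9.4579))          # prints 9.46
print(format_sig(0.000014782))     # prints 1.48e-05
print(format_sig(0.000014782, 2))  # prints 1.5e-05
```

Use this only for values shown in the report or presentation; keep full precision in data files.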

index.yml Summary File

Create a file named index.yml and enter your project's information (title/author/abstract). The projects may become publicly available. For author information, your first name(s) is required; last name(s) is recommended/optional. When the abstract spans multiple lines, indent each line after the first with two spaces. See below for the contents of an example index.yml file.

index.yml
title: Pathway based functional analysis of IBD patients using metagenomic analysis of human gut microbiome
author: Dhruv Sakalley and Jason Knecht
abstract: Each of the additional lines of the abstract should be indented with 2 spaces.
  The abstract should be ASCII-only, e.g., if you copy-paste from Word, replace smart-quotes
  with regular quotes. Replace non-ASCII special characters and symbols with ASCII characters.
  Human Gut Microbiome contains collection of diverse species which help carry
  out various functions for the proper functioning of the human body. However, IBD
  is a condition where the diversity of this microbiome is significantly altered.
  Little is known about the causes and effects of these variations in the present
  literature. The next generation sequencing techniques provide suitable data for
  Metagenomic analysis leading to identification of uncluttered microorganisms, and
  make it possible to get detailed functional insights into the functional footprint
  of these altered microorganisms. This study uses KEGG pathways for mapping functionality
  of the diverse gene sets in order to better understand function level changes caused
  due to the altered microbiome in case of IBD.
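
Before submitting, the formatting rules for index.yml (ASCII-only text, continuation lines indented with two spaces, and the title/author/abstract entries present) can be checked with a short script. The sketch below is a hypothetical checker written for this page, not an official validator. It assumes that title, author, and abstract are the only top-level keys, and it does not parse YAML in general.

```python
REQUIRED_KEYS = ("title", "author", "abstract")  # assumed to be the only keys


def check_index_yml(text):
    """Return a list of formatting problems found in index.yml text."""
    problems = []
    seen = set()
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.isascii():
            problems.append(f"line {number}: contains non-ASCII characters")
        if not line.strip() or line.startswith("  "):
            continue  # blank line or properly indented continuation line
        key, separator, _ = line.partition(":")
        if separator and key in REQUIRED_KEYS:
            seen.add(key)
        else:
            problems.append(
                f"line {number}: continuation line is not indented with two spaces"
            )
    for key in REQUIRED_KEYS:
        if key not in seen:
            problems.append(f"missing key: {key}")
    return problems


example = "title: My project\nauthor: Ada\nabstract: First line.\n  Second line.\n"
print(check_index_yml(example))  # prints []
```

To check a real file, pass its contents, e.g. check_index_yml(open("index.yml", encoding="utf-8").read()).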

Project thumbnail

Create an image file thumb.png that can be used as a cover image for your project.