In your project, you need to analyze one or more biomolecular datasets and report your results.
Type of project:
Disease Project: Pick a disease (or a biological phenomenon) and perform a meta-analysis of multiple datasets (from different studies/publications/experiments) about that disease.
Method Project: Pick a step of expression data analysis and investigate the options/parameters for that step.
Each project will be done by 2-3 people. You can use the student survey and project spreadsheets to find teammates.
In addition to the analysis results, you are expected to provide self-contained programming code that others can use to replicate your results.
Include a README.md file (see markdown syntax) that lists the files in your project with a short description of what each file is.
Include a script named project.m (or project.ipynb, or project.py, or project.sh, or project.php, etc. depending on the programming language you used) that when executed, is able to generate most/all of the results/use-cases you present in your project. Any additional files required to run project.m should be included in your project folder.
Your code should include text/comments describing the analysis and discussing the results you have obtained. Figures should be properly formatted (axis labels, plot legends).
Include any auxiliary functions you wrote for the project as separate files. Your project should be self-contained, i.e., if we move your source code folder to a different computer, we should still be able to run the main script without making any changes to your code.
If the programming language you use allows for generation of reports, publish your main script file and produce project.pdf. In Matlab, use the publish feature to generate the pdf. If you are using Jupyter in Python, print the webpage to pdf. If the programming language you use does not allow generation of pdf reports, you can create a report as a Word document.
Don't implement your project as one giant script or function. Identify parts of your code that can be implemented as separate, self-contained, testable functions. As much as your programming language allows, use a separate file for each function.
If you are getting data from the web, get it on-demand within your code (i.e., download it only if it has not been downloaded before). Store these files in a temporary folder or in a data folder. If you are working in Matlab, Python, or PHP, use the bmes.tempdir() or bmes.datadir() functions provided by the instructor. In other programming languages, implement similar functions.
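The on-demand download described above can be sketched in Python as follows. This is a minimal illustration, not the instructor's implementation: `data_dir` and `fetch_if_missing` are hypothetical names, and `data_dir` is only a stand-in for bmes.datadir(), whose actual behavior is defined by the instructor's code. The folder location (a `project_data` subfolder of the system temporary directory) is likewise an assumption.

```python
import tempfile
import urllib.request
from pathlib import Path
from typing import Callable, Optional


def data_dir() -> Path:
    """Hypothetical stand-in for bmes.datadir(): a folder outside the project."""
    folder = Path(tempfile.gettempdir()) / "project_data"
    folder.mkdir(parents=True, exist_ok=True)
    return folder


def fetch_if_missing(
    url: str,
    filename: str,
    folder: Optional[Path] = None,
    downloader: Callable = urllib.request.urlretrieve,
) -> Path:
    """Download url to folder/filename only if the file is not already there."""
    folder = data_dir() if folder is None else folder
    target = folder / filename
    if not target.exists():
        # First run on this machine: fetch the file and cache it.
        downloader(url, str(target))
    return target
```

The `downloader` parameter is there so the caching logic can be tested without network access; in normal use it can be left at its default.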
If you are using data files, don't use full paths to access them. Others downloading your project files should be able to run your code regardless of where they place the project files. Make as few assumptions as possible about the file system your code is running on, and specify these assumptions in your README file.
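One way to avoid full paths in Python is to build every path relative to the folder that contains the running script. The sketch below assumes this approach; `project_root` and `project_file` are hypothetical helper names, and the fallback to the current working directory (for environments such as notebooks where `__file__` is undefined) is an assumption about how the code is launched.

```python
from pathlib import Path


def project_root() -> Path:
    """Folder containing this script; falls back to the working directory
    when __file__ is undefined (e.g., inside a Jupyter notebook)."""
    try:
        return Path(__file__).resolve().parent
    except NameError:
        return Path.cwd()


def project_file(*parts: str) -> Path:
    """Build a path inside the project folder from relative components."""
    return project_root().joinpath(*parts)
```

For example, `project_file("results", "genes.csv")` resolves to the same file no matter where the project folder has been placed (the folder and file names here are placeholders).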
Keep large data files (e.g., image files, GEO files) in a location on your computer that is separate from the project files. Data files that individually or collectively exceed 10MB are considered large.
If you are using a database to store information, include database initialization code. When your program is executed, create and populate any missing database tables.
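As an illustration of database initialization at startup, the sketch below uses Python's built-in `sqlite3` module. The choice of SQLite, the `genes` table, its columns, and the placeholder rows are all assumptions made for the example; substitute your own schema and data source.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) genes table if missing and fill it if empty."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS genes ("
        "symbol TEXT PRIMARY KEY, description TEXT)"
    )
    # Populate only when the table is empty, so reruns do not duplicate rows.
    (count,) = conn.execute("SELECT COUNT(*) FROM genes").fetchone()
    if count == 0:
        conn.executemany(
            "INSERT INTO genes (symbol, description) VALUES (?, ?)",
            [("GENE_A", "placeholder row"), ("GENE_B", "placeholder row")],
        )
    conn.commit()
```

Calling `init_db` at the top of the main script makes it safe to run the project on a machine where the database does not exist yet, and harmless to run it again afterwards.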
Be mindful of what information is “leaking”. E.g., do not include any journal and conference papers that are not in the public domain (instead, provide a URL for such publications).
Total project size limit: Your project folder should not exceed 10MB in size. Reducing image resolution (without noticeably reducing quality) and removing reproducible data, software, or temporary files (none of which should have been in your project folder in the first place) are strategies you can use to reduce the size of your project.
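A quick way to check the size limit before submitting is to sum the file sizes under the project folder. The following Python sketch is one possible check; the function names are hypothetical, and reading "10MB" as 10 * 1024 * 1024 bytes is an assumption.

```python
from pathlib import Path

# Assumption: the 10MB limit is interpreted as 10 * 2**20 bytes.
LIMIT_BYTES = 10 * 1024 * 1024


def folder_size(folder: Path) -> int:
    """Total size in bytes of all regular files under folder (recursive)."""
    return sum(p.stat().st_size for p in folder.rglob("*") if p.is_file())


def largest_files(folder: Path, n: int = 5):
    """Return up to n (size, path) pairs, largest first, to find what to trim."""
    files = [(p.stat().st_size, p) for p in folder.rglob("*") if p.is_file()]
    return sorted(files, key=lambda item: item[0], reverse=True)[:n]
```

Comparing `folder_size(Path("."))` against `LIMIT_BYTES` tells you whether the folder is over the limit, and `largest_files` points to the files most worth removing or shrinking.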
Create your presentation in a file named presentation.pptx. If you use Google Docs, export it as presentation.pptx. If you are using a slideshow editor that cannot export as a PowerPoint file, then export it as presentation.pdf.
Title page [0% (but -5% if you are missing it), 0min]
Title, your name(s) (last names are recommended/optional).
If your project is based on a publication, include its reference.
Introduction - Problem description, Motivation [5%, 2min]
Introduction - Biology/Physiology [15%, 3min]
Introduction - Goals [15%, 2min]
What do you/authors hope to find/accomplish with this study?
If successful, how will your findings/result influence our understanding or medical practice?
The Dataset(s) [20%, 2min]: (this may be included within your Introduction, if appropriate)
Methods [20%, 4min]
Give an overview of your analysis workflow. What statistical tests / visualization / analysis methods did you use?
Briefly describe the methods. If you used the methods we covered in class, you can skip their description. Describe any modifications you made or options you had to set to make these methods work for your specific dataset.
Mention how these methods were implemented. Did you implement them yourself? Which programming libraries/tools/modules did you use? If we covered it in the course, you do not need to go into detail.
Results [30%, 5min]
Show your main findings/results. Use figures (e.g., bar charts) instead of tables to present your results.
If there are reports (publications, webpages, etc) that have used the same dataset, you must compare your results to theirs (mention what methods they used).
Discussion [10%, 1min]
Do your results make sense biologically? Find studies that support your findings. (E.g., you found 10 genes in your Alzheimer's dataset analysis, check literature to see if these genes are known for their involvement in Alzheimer's).
What are the limitations of your study?
What follow up studies can be performed to validate/improve upon your findings?
Submit a 4 to 6-page report.docx file. Use the template file to write your report. The template file describes the sections you need to include in your report.
If you use Google Docs or another editor to write your report, please export your report as a Word document.
The break-down of the grading of the report will be similar to the presentation.
The report should contain sufficient details (types of statistical tests, program parameters, thresholds, etc.) to replicate your results. If listing them is prohibitive within text, include them in an auxiliary table at the end of your report.
Not all of these guidelines may be applicable to your project.
You must normalize data appropriately, if it is not already normalized.
When reporting numbers (e.g., p-values or fold changes), report no more than three significant digits. E.g., rather than 9.4579, report 9.46. Rather than 0.000014782, report 1.5e-5. This does not apply to data files (e.g., Excel files), where you must provide precise values and should not round your numbers.
If others (e.g., a paper that your project is based on) have produced results you can compare to, provide a side-by-side comparison of your results wherever applicable. E.g., if you are reporting significant genes or pathways in a table, also include their results in the same table as separate column(s). You need to find the right level of detail for depicting your comparisons.
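In Python, the rounding rule above can be applied with the `g` format specifier, which limits a number to a given count of significant digits. The helper name `sig3` is hypothetical; this is a small sketch for report text and figure labels, not for data files.

```python
def sig3(x: float) -> str:
    """Format x with at most three significant digits."""
    return f"{x:.3g}"


print(sig3(9.4579))        # 9.46
print(sig3(0.000014782))   # 1.48e-05
```

Note that `.3g` drops trailing zeros and switches to scientific notation for very small or very large values. Using `.2g` instead keeps two significant digits, which turns 0.000014782 into `1.5e-05`.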
Don't over-use numerical results. Prefer to present the results graphically (using bar graphs, Venn diagrams, heatmaps, PCA plots etc.).
Create a file named index.yml and enter your project's information (title/author/abstract). The projects may become publicly available. For author information, your first name(s) is required; last name(s) is recommended/optional. When entering the abstract over multiple lines, indent each line after the first with two spaces. See below for the contents of an example index.yml file.
- index.yml
title: Pathway based functional analysis of IBD patients using metagenomic analysis of human gut microbiome
author: Dhruv Sakalley and Jason Knecht
abstract: Each of the additional lines of the abstract should be indented with 2 spaces.
  The abstract should be ASCII-only, e.g., if you copy-paste from Word, replace smart-quotes
  with regular quotes. Replace non-ASCII special characters and symbols with ASCII characters.
  Human Gut Microbiome contains collection of diverse species which help carry
  out various functions for the proper functioning of the human body. However, IBD
  is a condition where the diversity of this microbiome is significantly altered.
  Little is known about the causes and effects of these variations in the present
  literature. The next generation sequencing techniques provide suitable data for
  Metagenomic analysis leading to identification of uncluttered microorganisms, and
  make it possible to get detailed functional insights into the functional footprint
  of these altered microorganisms. This study uses KEGG pathways for mapping functionality
  of the diverse gene sets in order to better understand function level changes caused
  due to the altered microbiome in case of IBD.
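To check the ASCII-only requirement for the abstract before pasting it into index.yml, a small Python helper such as the one below can be used. This is a sketch under assumptions: the function names are hypothetical, and the replacement table covers only a few characters that commonly appear in text copied from Word (smart quotes, dashes, ellipsis); anything else is reported so it can be fixed by hand.

```python
# Assumed mapping of common word-processor characters to ASCII equivalents.
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",   # single smart quotes
    "\u201c": '"', "\u201d": '"',   # double smart quotes
    "\u2013": "-", "\u2014": "-",   # en dash, em dash
    "\u2026": "...",                # ellipsis
}


def to_ascii(text: str) -> str:
    """Replace the known non-ASCII characters with ASCII equivalents."""
    for bad, good in REPLACEMENTS.items():
        text = text.replace(bad, good)
    return text


def non_ascii_chars(text: str) -> list:
    """List the distinct non-ASCII characters still present in text."""
    return sorted({ch for ch in text if ord(ch) > 127})
```

Running `non_ascii_chars(to_ascii(abstract))` on your abstract text should return an empty list; any characters it does return still need a manual replacement.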
Create an image file thumb.png that can be used as a cover image for your project.