====== Effects of Clioquinol on yeast ======

Analyze the microarray dataset made available by the following study: https://www.ncbi.nlm.nih.gov/pubmed/21504115 {{clioquinol.yeast.Li2010.pdf}}

The microarray data is available at: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE17257

  * Create a MATLAB Live Script (*.mlx), a Python Jupyter Notebook (*.ipynb), or an R Notebook (*.Rmd) for this assignment. Use a separate section (a code cell in Jupyter, or a "chunk" in R) for each of the items below. Put your main analysis in the notebook file. You may create additional functions in separate files.
  * Download and load the necessary files from GEO. 5% of your grade is for optimizing the download and parse functions. If you optimize them, describe your optimizations as comments in your main notebook.
  * **Preliminary Analysis**
    * Find the highest expression value across the entire dataset. Report the sample ID, the probe ID, and the value itself. If several entries share the highest value, you may report any one of them.
    * Repeat for the lowest expression value.
  * Write a function that replaces the probe IDs in the GSE with the gene symbols listed in the GPL. When a gene symbol is not available, keep the probe ID. Note that the probes may or may not be listed in the same order in the GSE and GPL files.
  * Apply a normalization method of your choice so that the expression values of different samples become comparable.
  * Show a hierarchical clustering of samples (**not** of genes). Show only the dendrogram of the samples, not a heatmap of expression values.
  * Show a clustergram of the expression values: a heatmap combined with a clustering of the samples and a clustering of the genes.
  * Report the top 10 most differentially expressed genes between the Clioquinol and control groups. (**NOTE2SELF: Ask students to use ttest2 without FDR correction, with specific p-value and fold-change values, so that results are reproducible. Make a note that this approach may not be appropriate in the future.**)
  * (**NOTE2SELF: In the future, ask students to export the results (gene symbols, fold changes, p-values) for the DEGs to an Excel or CSV file.**)
  * Report the functional annotations (GO Biological Processes and KEGG Pathways) that are significantly different between the two groups.
  * Discuss whether your results align with the findings reported in the paper.
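The download-and-parse item above can be sketched in Python as follows. This is a minimal sketch, not a reference solution. It assumes the series matrix file sits at the usual GEO FTP path (`GSE17nnn/GSE17257/matrix/GSE17257_series_matrix.txt.gz`). If the series covers several platforms, the file name carries a `-GPLxxx` suffix, so check the GEO page before relying on this path. The helper names `fetch_cached`, `parse_series_matrix`, and `load_series` are hypothetical. The sketch illustrates three possible optimizations. The file is downloaded only once and then reused from disk. Metadata lines are skipped without being split. Parsing stops at `!series_matrix_table_end`.

```python
import gzip
import os
import urllib.request

SERIES = "GSE17257"
# Assumed GEO FTP layout: series are grouped into "GSEnnn" folders that drop
# the last three digits of the accession (GSE17257 -> GSE17nnn).
URL = ("https://ftp.ncbi.nlm.nih.gov/geo/series/GSE17nnn/"
       f"{SERIES}/matrix/{SERIES}_series_matrix.txt.gz")


def fetch_cached(url, path):
    """Download url to path unless a local copy already exists."""
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path


def parse_series_matrix(lines):
    """Return (sample_ids, probe_ids, rows) from series-matrix text lines."""
    in_table = False
    header = None
    probes, rows = [], []
    for line in lines:
        if not in_table:
            # Metadata lines are skipped without splitting them.
            if line.startswith("!series_matrix_table_begin"):
                in_table = True
            continue
        if line.startswith("!series_matrix_table_end"):
            break  # nothing after the table is needed
        fields = [f.strip('"') for f in line.rstrip("\r\n").split("\t")]
        if header is None:
            header = fields  # "ID_REF" followed by one GSM id per sample
            continue
        probes.append(fields[0])
        # Empty or "null" cells become NaN instead of raising an error.
        rows.append([float(v) if v not in ("", "null") else float("nan")
                     for v in fields[1:]])
    if header is None:
        raise ValueError("no series matrix table found")
    return header[1:], probes, rows


def load_series(path):
    """Stream-parse a gzipped series matrix without unpacking it to disk."""
    with gzip.open(path, "rt") as handle:
        return parse_series_matrix(handle)
```

A typical call is `samples, probes, rows = load_series(fetch_cached(URL, "GSE17257_series_matrix.txt.gz"))`. The notebook can then wrap `rows` in a DataFrame or matrix of its choice.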
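The Preliminary Analysis items (highest and lowest value, with their sample ID and probe ID) reduce to a single scan over the matrix. The sketch below assumes the `(samples, probes, rows)` layout of a probes-by-samples list of lists. The function name `extreme_value` is hypothetical. Missing values (NaN) are skipped. On ties, the first entry encountered is kept, which the assignment explicitly allows.

```python
def extreme_value(samples, probes, rows, find_max=True):
    """Return (sample_id, probe_id, value) of the largest or smallest entry.

    NaN cells are skipped; on ties the first entry encountered is kept.
    """
    best = None
    for probe, row in zip(probes, rows):
        for sample, value in zip(samples, row):
            if value != value:  # NaN is the only value not equal to itself
                continue
            if best is None or (value > best[2] if find_max else value < best[2]):
                best = (sample, probe, value)
    return best
```

Call it once with `find_max=True` and once with `find_max=False` to cover both bullets.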
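For the probe-to-symbol item, a dictionary lookup removes any dependence on row order between the GSE and the GPL. This sketch assumes the GPL table has already been read into dicts, for example with `csv.DictReader(..., delimiter="\t")` over the platform data table. The column names `"ID"` and `"Gene Symbol"` are assumptions, because the symbol column differs between platforms. Check the header of the GPL used by GSE17257 before relying on them. The function name `probes_to_symbols` is hypothetical.

```python
def probes_to_symbols(probe_ids, gpl_rows, id_col="ID", symbol_col="Gene Symbol"):
    """Replace probe ids with gene symbols from a GPL annotation table.

    gpl_rows is an iterable of dicts, one per GPL row. The column names are
    assumptions. Probes whose symbol is missing or empty keep their id.
    """
    lookup = {}
    for row in gpl_rows:
        symbol = (row.get(symbol_col) or "").strip()
        if symbol:
            lookup[row[id_col]] = symbol
    # Dict lookup: the GPL order does not have to match the GSE order.
    return [lookup.get(pid, pid) for pid in probe_ids]
```

Some annotation tables list several symbols for one probe (for example separated by `///`). This sketch keeps such strings unchanged, and whether to split them is a design decision left to the notebook.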
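The assignment leaves the normalization method open. Quantile normalization is one common choice, shown here as a sketch rather than the expected answer. It forces every sample to share the same value distribution: the k-th smallest value in each sample is replaced by the mean of the k-th smallest values across all samples. This sketch assumes a complete probes-by-samples matrix with no NaNs and breaks ties by position. The function name `quantile_normalize` is hypothetical.

```python
def quantile_normalize(rows):
    """Quantile-normalize a probes x samples matrix given as a list of lists.

    Assumes no missing values; tied values are ranked by their row position.
    """
    n_probes, n_samples = len(rows), len(rows[0])
    # For each sample, the probe indices sorted by that sample's values.
    order = [sorted(range(n_probes), key=lambda i: rows[i][j])
             for j in range(n_samples)]
    # Mean of the k-th smallest value across all samples.
    rank_means = [sum(rows[order[j][k]][j] for j in range(n_samples)) / n_samples
                  for k in range(n_probes)]
    out = [[0.0] * n_samples for _ in range(n_probes)]
    for j in range(n_samples):
        for k, i in enumerate(order[j]):
            out[i][j] = rank_means[k]
    return out
```

After this step every column holds the same set of values, only in a different order. This is what makes the samples directly comparable.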
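The sample-clustering item is about columns, not rows. This dependency-free sketch runs average-linkage (UPGMA) agglomerative clustering on the samples, using 1 minus the Pearson correlation as the distance. The linkage and metric are one reasonable choice among several, and the function names are hypothetical. It returns the merge steps that a dendrogram would draw. In a real notebook, `scipy.cluster.hierarchy.linkage(X.T, method="average", metric="correlation")` followed by `dendrogram` does the same job and also draws the tree. For the clustergram item, `seaborn.clustermap` in Python or `clustergram` in MATLAB's Bioinformatics Toolbox draw the heatmap with both dendrograms.

```python
import math
import statistics


def pearson_distance(x, y):
    """1 - Pearson correlation; undefined (division by zero) for constant vectors."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / math.sqrt(sxx * syy)


def average_linkage(samples, rows):
    """UPGMA clustering of the samples (columns of rows).

    Returns merge steps as (cluster_a, cluster_b, distance), where each
    cluster is a tuple of sample ids.
    """
    cols = {s: [row[j] for row in rows] for j, s in enumerate(samples)}
    dist = {(x, y): pearson_distance(cols[x], cols[y])
            for x in samples for y in samples if x != y}
    clusters = [(s,) for s in samples]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for k in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[k]
                # Average linkage: mean distance over all cross-cluster pairs.
                d = statistics.fmean(dist[x, y] for x in a for y in b)
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        merges.append(best)
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges
```

Correlation distance groups samples by the shape of their expression profiles rather than by their absolute level. This is usually what is wanted after normalization.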
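The top-10 item, combined with the NOTE2SELF about ttest2 and CSV export, can be sketched as a per-gene two-sample t statistic with pooled variance. This is the equal-variance test that MATLAB's `ttest2` and `scipy.stats.ttest_ind` use by default. The sketch assumes log2-scaled data, so the log2 fold change is the difference of group means. Whether GSE17257 values are already log-transformed should be checked first. The index lists for the Clioquinol and control samples are placeholders to be filled from the GSM annotations, and the function names are hypothetical. Because every gene has the same degrees of freedom when no values are missing, ranking by |t| gives the same order as ranking by p-value. The p-values themselves can be obtained from `scipy.stats.ttest_ind`. No FDR correction is applied here.

```python
import csv
import math
import statistics


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    diff = statistics.fmean(a) - statistics.fmean(b)
    return diff / math.sqrt(sp2 * (1 / na + 1 / nb))  # fails if both groups are constant


def top_genes(genes, rows, treated_idx, control_idx, n=10):
    """Return (gene, log2_fold_change, t) for the n genes with the largest |t|.

    Assumes rows hold log2-scaled values, so log2FC is a difference of means.
    """
    results = []
    for gene, row in zip(genes, rows):
        a = [row[j] for j in treated_idx]
        b = [row[j] for j in control_idx]
        log2fc = statistics.fmean(a) - statistics.fmean(b)
        results.append((gene, log2fc, pooled_t(a, b)))
    results.sort(key=lambda r: abs(r[2]), reverse=True)
    return results[:n]


def write_csv(path, results):
    """Export the ranked genes to a CSV file that Excel can open."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["gene", "log2_fold_change", "t_statistic"])
        writer.writerows(results)
```

A p-value column can be added to the CSV by computing it with `scipy.stats.ttest_ind(a, b)` inside the same loop.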
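For the functional-annotation item, one standard approach is over-representation analysis. It asks whether a GO Biological Process or KEGG pathway contains more of the differentially expressed genes than chance would predict, using a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). This sketch assumes the term-to-gene sets have already been obtained, for example from SGD or KEGG. The function names are hypothetical. `scipy.stats.hypergeom.sf(k - 1, N, K, n)` gives the same tail probability.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)


def enrichment(de_genes, universe, annotations):
    """Over-representation p-value for each annotation term.

    annotations maps term -> iterable of genes (e.g. a GO BP or KEGG set).
    Only genes in the universe (all measured genes) are counted. Returns
    (term, overlap, term_size, p_value) sorted by p-value; terms with no
    overlap are skipped.
    """
    universe = set(universe)
    de = set(de_genes) & universe
    results = []
    for term, members in annotations.items():
        members = set(members) & universe
        k = len(de & members)
        if k == 0:
            continue
        results.append((term, k, len(members),
                        hypergeom_sf(k, len(universe), len(members), len(de))))
    return sorted(results, key=lambda r: r[3])
```

With hundreds of terms tested, the raw p-values should be adjusted for multiple testing (for example with Benjamini-Hochberg) before calling a term significant.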