Bioconductor is an open-source software project for bioinformatics. It is an incredible resource for running statistical analyses on data generated from wet-lab experiments. On its website you can find information about the tools and packages the project has developed. You will most probably use one of its packages or vignettes in the future if you stick with R. Check it out!
Following a Vignette:
When conducting analyses, there are only a few situations in which you would need to build a script from scratch. Most of the time, you will find yourself following a vignette. A vignette is essentially a documented, worked solution to a problem. For example, if you wanted to learn how to study the differential expression of genes, you would look for a tried and tested vignette that does just that. The input would differ because you would have your own data, but the overall sequence of steps would be quite similar.
Let’s look at a vignette and try to follow along. The vignette performs an RNA-seq analysis using the Glimma, limma, and edgeR packages.
The vignette tests for differential expression of genes in the mouse genome. Vignettes are designed to be adaptable to new data, and I wanted to perform a similar analysis for the human genome. Specifically, I wanted to test for differential expression of genes in patients diagnosed with brain cancer. I chose 10 male patients and 10 female patients as my two groups. I sourced my data from The Cancer Genome Atlas (TCGA).
You can find the data I used here: Link, in a file called DEM.
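To give a feel for how the vignette's steps carry over to a two-group design like this one, here is a minimal sketch of a limma-voom workflow with edgeR and limma. It is a condensed illustration under stated assumptions, not the vignette's exact code and not the adaptation linked on this page: the counts are simulated stand-ins for real data, and every object name (counts, group, dge, top, and so on) is a placeholder.

```r
# Assumes edgeR and limma are already installed, e.g. once via
# BiocManager::install(c("edgeR", "limma")).
library(edgeR)
library(limma)

# Simulated counts: 1000 hypothetical genes x 20 samples (NOT real data).
set.seed(1)
counts <- matrix(
  rnbinom(1000 * 20, mu = 50, size = 5),
  nrow = 1000,
  dimnames = list(paste0("gene", 1:1000), paste0("sample", 1:20))
)

# Two groups of 10 samples each, mirroring the female/male split.
group <- factor(rep(c("Female", "Male"), each = 10))

# Build the DGEList, drop lowly expressed genes, and normalise with TMM.
dge <- DGEList(counts = counts, group = group)
keep <- filterByExpr(dge, group = group)
dge <- dge[keep, , keep.lib.sizes = FALSE]
dge <- calcNormFactors(dge, method = "TMM")

# Design matrix with one column per group, plus the contrast of interest.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
contr <- makeContrasts(MaleVsFemale = Male - Female, levels = design)

# voom transformation, linear model fit, and empirical Bayes moderation.
v <- voom(dge, design)
fit <- lmFit(v, design)
fit <- contrasts.fit(fit, contrasts = contr)
fit <- eBayes(fit)

# Table of the ten top-ranked genes for the Male vs Female contrast.
top <- topTable(fit, coef = "MaleVsFemale", number = 10)
print(top)
```

With real data, only the first part changes: counts would come from your own count table and group from your sample annotation. Because the simulated counts contain no true group effect, the genes this sketch ranks highly are noise.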
The vignette itself is quite comprehensive, so I won’t be giving too many explanations on this page. To see my adaptation of the vignette, you can refer to the following Google Docs page: Link
- In my adaptation, I deviated slightly from the original when I came across some roadblocks. For example, the heatmap.2 function could not handle the volume of data I had, so I chose a different heatmap function, heatmap.plus. I started from the arguments of the original heatmap call and modified them to fit the new function's requirements. It took a few iterations to find a version that worked. The vignette is a template. Modify areas to better suit your data.
- I didn’t always understand the operations being done. By reading the documentation, and with the help of my trusty friend, Google, I was able to gain a high-level understanding of what was happening in each step. This is alright!
- When you don’t totally understand a step, the vignette acts as a sanity check. I found myself referring back to the vignette along the way to make sure my results resembled the original for the most part.
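The heatmap roadblock in the first point can also be approached from the data side. The sketch below is not the heatmap.plus route described there. It is an alternative, shown only as an illustration: it keeps the most variable genes so that base R's heatmap function has far fewer rows to draw. The expression matrix is simulated, and the names expr, sex, gene_var, and expr_top are placeholders.

```r
# Simulated expression values: 2000 hypothetical genes x 20 samples (NOT real data).
set.seed(42)
expr <- matrix(
  rnorm(2000 * 20),
  nrow = 2000,
  dimnames = list(paste0("gene", seq_len(2000)), paste0("sample", seq_len(20)))
)
sex <- rep(c("Female", "Male"), each = 10)

# Rank genes by their variance across samples and keep the 100 most variable.
gene_var <- apply(expr, 1, var)
top_genes <- names(sort(gene_var, decreasing = TRUE))[1:100]
expr_top <- expr[top_genes, ]

# Draw the reduced matrix with base R's heatmap().
# Rows are scaled so each gene is shown relative to its own mean;
# the colour bar above the columns marks each sample's group.
heatmap(
  expr_top,
  scale = "row",
  ColSideColors = ifelse(sex == "Female", "purple", "orange"),
  labRow = rep("", nrow(expr_top)),  # blank row labels: 100 names would be unreadable
  margins = c(6, 2)
)
```

The cutoff of 100 genes is arbitrary. Choose whatever your plotting function and screen can handle, or subset to the top differentially expressed genes instead of the most variable ones.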