- Develop a working understanding of what bioinformatic data analysis involves, how it is done, and what skills it requires
- Gain an appreciation for how next-generation sequencing (NGS) data is generated and how the resulting information is stored
- Learn the major file-types used in bioinformatic data analysis and how to manipulate them
- Learn how to install standard bioinformatic software using Conda
- Understand the concepts of reference genomes and genome annotations and where to find them
- Learn how to leverage the Integrative Genomics Viewer (IGV) for exploring genomics data
- Gain a working knowledge of basic programming in R and how it can be used for Bioinformatics
- Understand the basic principles for statistical learning and inference as they apply to bioinformatics
- Learn how to leverage high performance computing systems (HPCs) to perform Bioinformatic data-analysis
- Navigating the terminal using bash code
- Stringing together bash code to make a more complex command
- Saving complex commands as scripts that can be used later or shared with others
- Basic NGS file types (FASTQ, BAM, BED/bigWig, VCF), their formats, and the types of data stored in each
- Programs used to generate and/or manipulate NGS file types
- Visualizing NGS data with IGV
- Introduction to an HPC and how to use it efficiently
- Customizing your HPC environment and installing software
- Introduction to R objects and Functions
- Using R/Bioconductor to manipulate genomic data
- R objects and classes built for NGS data
- Software and packages used to manipulate and visualize different types of NGS data
- Utilizing genomic databases with R/Bioconductor
- Visualization of genomic data with R
- Statistical modelling applied to NGS data to pull meaningful results out of large datasets
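As a reminder of the command-line topics above (stringing bash commands together and saving them as a script), here is a minimal sketch. The file names `example.fastq` and `count_reads.sh` and the three reads are made up for illustration, and the read count assumes a plain, unwrapped FASTQ file with exactly four lines per record.

```shell
# Build a tiny FASTQ file with three made-up reads (hypothetical data, illustration only)
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nIIII\n@read3\nACGT\n+\nIIII\n' > example.fastq

# String commands together with pipes:
# keep the sequence line of each 4-line record, sort, then count identical sequences
awk 'NR % 4 == 2' example.fastq | sort | uniq -c

# Save a command as a script so it can be re-used later or shared with others
cat > count_reads.sh <<'EOF'
#!/usr/bin/env bash
# Usage: bash count_reads.sh <file.fastq>
# Assumes four lines per FASTQ record, so reads = lines / 4
n_lines=$(wc -l < "$1")
echo $((n_lines / 4))
EOF

# Run the saved script and capture its output
n_reads=$(bash count_reads.sh example.fastq)
echo "reads: $n_reads"
```

Re-running the script on your own files and then changing one piece at a time (for example, the `awk` condition) is a quick way to check that you understand what each step of a pipeline contributes.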
- Spend the time (and money) to plan, consult, and practice bioinformatics to generate high quality data that will provide robust inferences
- If you are going to do a lot of bioinformatics, get really good at the command line (Bash); otherwise, pre-processing will be slow and painful (and the first several times it will be slow and painful regardless)
- Independently seek out the appropriate level of statistical training for the analyses you want to conduct
- Re-run the code a week or two after the workshop, as this is a great way to consolidate what you have learned at the command-line
- Edit the code, run sub-sections, read the `man` pages for commands, etc. to build a solid understanding of how everything works
- Read the manuals for the tools that we used today and try to understand the flags that we chose not to show you as well as the flags that we did explain
- When you get an error (and you will), google it and see what you can find. We learn a lot through community forums, and mysterious errors are part of coding
- Seek out additional resources for learning to use tools that are of interest to you
  - Software Carpentry
  - edX
  - LinkedIn Learning
  - R vignettes
We ask that you all complete the survey that has been sent out over email so that we can gauge what worked well and what we need to improve for our next workshop. If you have additional thoughts that were not addressed in the survey, please feel free to contact any one of us, or reach out to the DAC email directly (DataAnalyticsCore@groups.dartmouth.edu).
We plan to offer this workshop again, in addition to two RNA-seq specific workshops:
- Data pre-processing & quality control for RNA-seq
- Differential expression analysis for RNA-seq
If you have suggestions for workshops you would like to see in the future, please let us know!
Please feel free to reach out to us with questions about concepts discussed in the workshop, or for analysis consultations. Our bioinformatics office hours on Fridays 1-2pm are a great place to do this! (currently on Zoom: https://dartmouth.zoom.us/s/96998379866, pword: bioinfo)