Internship Questions Set Stage 2


In this section, you will implement a simple NGS analysis pipeline on a small dataset.

Starting Datasets:

Proposed Pipeline:

Download dataset (wget) => Quality Control (FastQC) => Trimming (fastp) => Genome Mapping (BWA) => Variant Calling (bcftools/freebayes)

Feel free to add software as you prefer.
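As a starting point, the proposed pipeline could be sketched in bash roughly as follows. This is only one possible layout, not a reference solution. It assumes paired-end reads, adds samtools for sorting and indexing (not part of the proposed pipeline), and uses bcftools for variant calling. The function names, output folders, file names and the DRY_RUN switch are all hypothetical choices made for illustration.

```shell
#!/usr/bin/env bash
# Sketch of the proposed pipeline for one paired-end sample.
# Assumptions: paired-end reads, samtools available, hypothetical folder
# and file names, and a DRY_RUN switch that prints commands instead of
# running them.
set -euo pipefail

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: run_sample NAME R1_URL R2_URL REFERENCE_URL
run_sample() {
  local name="$1" r1_url="$2" r2_url="$3" ref_url="$4"
  local r1="raw/${name}_R1.fastq.gz"
  local r2="raw/${name}_R2.fastq.gz"
  local t1="trimmed/${name}_R1.fastq.gz"
  local t2="trimmed/${name}_R2.fastq.gz"
  local ref="ref/reference.fasta"

  run mkdir -p raw ref qc trimmed mapped variants

  # 1. Download dataset (wget)
  run wget -q -O "${r1}" "${r1_url}"
  run wget -q -O "${r2}" "${r2_url}"
  run wget -q -O "${ref}" "${ref_url}"

  # 2. Quality control (FastQC)
  run fastqc -o qc "${r1}" "${r2}"

  # 3. Trimming (fastp)
  run fastp -i "${r1}" -I "${r2}" -o "${t1}" -O "${t2}" \
    --html "qc/${name}_fastp.html" --json "qc/${name}_fastp.json"

  # 4. Genome mapping (bwa), then sort and index the alignments (samtools)
  run bwa index "${ref}"
  run bwa mem -o "mapped/${name}.sam" "${ref}" "${t1}" "${t2}"
  run samtools sort -o "mapped/${name}.sorted.bam" "mapped/${name}.sam"
  run samtools index "mapped/${name}.sorted.bam"

  # 5. Variant calling (bcftools)
  run bcftools mpileup -Ou -f "${ref}" -o "variants/${name}.bcf" \
    "mapped/${name}.sorted.bam"
  run bcftools call -mv -Oz -o "variants/${name}.vcf.gz" "variants/${name}.bcf"
}
```

Calling `DRY_RUN=1 run_sample NAME R1_URL R2_URL REFERENCE_URL` prints the commands without running anything, which is a cheap way to check paths before a real run.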

Let’s get bigger:

Use your pipeline to analyze more datasets

Reference: https://raw.githubusercontent.com/josoga2/yt-dataset/main/dataset/raw_reads/reference.fasta

ACBarrie

Alsen

Baxter

Chara

Drysdale
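One way to scale up is to loop over the sample names listed above and call your per-sample step once for each. The sketch below is a minimal illustration under assumptions: the helper name `for_each_sample` is hypothetical, and the command you pass in (for example your own script.sh) is expected to accept a sample name as its last argument.

```shell
#!/usr/bin/env bash
# Sketch: run a per-sample command for every dataset in the list.
# The helper name and calling convention are illustrative assumptions.
set -euo pipefail

# Sample names as listed in the task.
SAMPLES=(ACBarrie Alsen Baxter Chara Drysdale)

# Usage: for_each_sample COMMAND [ARGS...]
# Runs "COMMAND ARGS... SAMPLE" for each sample. Failures are collected
# and reported at the end instead of stopping at the first one.
for_each_sample() {
  local failed=""
  local sample
  for sample in "${SAMPLES[@]}"; do
    echo "== ${sample} =="
    if ! "$@" "${sample}"; then
      failed+=" ${sample}"
    fi
  done
  if [[ -n "${failed}" ]]; then
    echo "Failed samples:${failed}" >&2
    return 1
  fi
}
```

For example, `for_each_sample bash script.sh` would run `bash script.sh ACBarrie`, then `bash script.sh Alsen`, and so on, assuming script.sh takes the sample name as its first argument.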

Submission:

  • We look forward to receiving your final pipeline, script.sh (you can use Bash, Snakemake, Nextflow or any pipeline tool you know how to use).
  • Alongside it, create a setup.sh file that anyone can use to install all the tools the pipeline needs.
  • Make a requirement.txt file that simply lists all the tools you used.
  • Upload the 3 files to your team’s GitHub repo. Each team member should have their own folder containing their 3 files.
  • Copy the link to the team’s repo and paste it on the HackBio Submission platform.
  • Finally, be ready to discuss your pipeline with everyone.
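A setup.sh could be driven by requirement.txt, so the two files never drift apart. The sketch below is one hedged way to do that, not a required layout. It assumes requirement.txt holds one tool name per line, that conda is already installed, and that each tool name matches its package name on the bioconda or conda-forge channels (true for fastqc, fastp, bwa, samtools, bcftools and freebayes, but worth checking for anything else you add). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a setup.sh that installs whatever requirement.txt lists.
# Assumptions: one tool name per line, conda available, and tool names
# equal to their bioconda/conda-forge package names.
set -euo pipefail

# Print every tool named in the given file that is not found on PATH.
# Blank lines and lines starting with '#' are skipped.
missing_tools() {
  local req_file="$1" tool
  while IFS= read -r tool || [[ -n "${tool}" ]]; do
    [[ -z "${tool}" || "${tool}" == \#* ]] && continue
    command -v "${tool}" >/dev/null 2>&1 || echo "${tool}"
  done < "${req_file}"
}

# Install the missing tools with conda, if there are any.
install_tools() {
  local req_file="$1" missing
  missing="$(missing_tools "${req_file}")"
  if [[ -z "${missing}" ]]; then
    echo "All tools in ${req_file} are already installed."
    return 0
  fi
  echo "Installing:" ${missing}
  # Word splitting is intentional: one package name per word.
  # shellcheck disable=SC2086
  conda install -y -c conda-forge -c bioconda ${missing}
}
```

Ending the file with `install_tools requirement.txt` would make `bash setup.sh` install only what is missing on the current machine.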

Resources

