Internship Questions Set Stage 2

2024-05-22

To get started learning bioinformatics with mentorship and project experience, visit HackBio.

Project 3: Run a simple NGS analysis pipeline

In this section, you will implement a simple NGS analysis pipeline and run it on a simple dataset.

Starting Datasets:

Proposed Pipeline:

Download dataset (wget) => Quality Control (FastQC) => Trimming (fastp) => Genome Mapping (bwa) => Variant Calling (bcftools/freebayes)
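
As a starting point, the proposed pipeline could be sketched as the shell function below for one paired-end sample. This is only one possible layout, not a reference solution: the paired-end input, the directory names, the argument order, and the use of samtools to sort and index the alignment are assumptions that the assignment does not specify. The read and reference URLs are passed in as arguments rather than hard-coded.

```shell
#!/usr/bin/env bash
# Sketch of the proposed pipeline for ONE paired-end sample.
# Assumptions (not stated in the assignment): paired-end reads, samtools for
# sorting/indexing, and the directory names and argument order used below.
set -eu

run_pipeline() {
    sample="$1"; r1_url="$2"; r2_url="$3"; ref_url="$4"

    mkdir -p raw qc trimmed ref mapped variants

    # 1. Download reads and reference
    wget -q -O "raw/${sample}_R1.fastq.gz" "$r1_url"
    wget -q -O "raw/${sample}_R2.fastq.gz" "$r2_url"
    wget -q -O ref/reference.fasta "$ref_url"

    # 2. Quality control on the raw reads
    fastqc -o qc "raw/${sample}_R1.fastq.gz" "raw/${sample}_R2.fastq.gz"

    # 3. Trimming
    fastp -i "raw/${sample}_R1.fastq.gz" -I "raw/${sample}_R2.fastq.gz" \
          -o "trimmed/${sample}_R1.fastq.gz" -O "trimmed/${sample}_R2.fastq.gz" \
          --html "qc/${sample}_fastp.html" --json "qc/${sample}_fastp.json"

    # 4. Genome mapping, then sort and index the alignment
    bwa index ref/reference.fasta
    bwa mem ref/reference.fasta \
        "trimmed/${sample}_R1.fastq.gz" "trimmed/${sample}_R2.fastq.gz" \
        > "mapped/${sample}.sam"
    samtools sort -o "mapped/${sample}.sorted.bam" "mapped/${sample}.sam"
    samtools index "mapped/${sample}.sorted.bam"

    # 5. Variant calling (bcftools; freebayes would replace only this step)
    bcftools mpileup -Ou -f ref/reference.fasta \
        -o "variants/${sample}.bcf" "mapped/${sample}.sorted.bam"
    bcftools call -mv -Ov -o "variants/${sample}.vcf" "variants/${sample}.bcf"
}

# Run only when all four arguments are supplied.
if [ "$#" -eq 4 ]; then
    run_pipeline "$1" "$2" "$3" "$4"
fi
```

Saved as script.sh, this would be called as: bash script.sh SAMPLE R1_URL R2_URL REF_URL. Intermediate files are written to disk instead of being piped between tools, so a failure in any step stops the script under set -e.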

Feel free to add other software if you prefer.

Let’s get bigger:

Use your pipeline to analyze more datasets, all mapped against the same reference.

Reference: https://raw.githubusercontent.com/josoga2/yt-dataset/main/dataset/raw_reads/reference.fasta

ACBarrie

Alsen

Baxter

Chara

Drysdale
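
One way to scale up is to generate one job line per dataset and feed each line to your pipeline script. The sketch below does that for the five sample names above. The base URL for the reads and the _R1/_R2 file naming are hypothetical placeholders, since the download links are not given here; only the reference URL comes from this page.

```shell
#!/usr/bin/env bash
# Sketch: emit one tab-separated job line (sample, R1 URL, R2 URL) per dataset.
# The base URL argument and the _R1/_R2 naming pattern are HYPOTHETICAL.
set -eu

# Reference URL as given on this page.
REF_URL="https://raw.githubusercontent.com/josoga2/yt-dataset/main/dataset/raw_reads/reference.fasta"
SAMPLES="ACBarrie Alsen Baxter Chara Drysdale"

list_jobs() {
    base_url="$1"
    # Word splitting on $SAMPLES is intended: one sample name per word.
    for sample in $SAMPLES; do
        printf '%s\t%s\t%s\n' "$sample" \
            "${base_url}/${sample}_R1.fastq.gz" \
            "${base_url}/${sample}_R2.fastq.gz"
    done
}

# Example use, assuming a script.sh that takes SAMPLE R1_URL R2_URL REF_URL:
#   list_jobs "$READS_BASE_URL" | while IFS="$(printf '\t')" read -r s r1 r2; do
#       bash script.sh "$s" "$r1" "$r2" "$REF_URL"
#   done
```

Keeping the sample list in one variable means adding a sixth dataset is a one-word change.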

Submission:

We look forward to receiving your final pipeline, script.sh (you can use bash, Snakemake, Nextflow or any pipeline tool you know how to use).
Alongside it, create a setup.sh file that anyone can use to install all the tools the pipeline needs.
Make a requirement.txt file that simply lists all the tools you used.
Upload the 3 files to your team’s GitHub repo. Each team member should have their own folder containing their 3 files.
Copy the link to the team’s repo and paste it on the HackBio submission platform.
Finally, be ready to discuss your pipeline with everyone.
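
For the setup.sh file, a minimal sketch could read the tool names from requirement.txt (one per line, for example fastqc, fastp, bwa, samtools, bcftools) and install whatever is missing. This assumes conda is already available and that each tool name is also its Bioconda package name; both are assumptions, and any other installer (apt, mamba, manual builds) would work just as well.

```shell
#!/usr/bin/env bash
# setup.sh sketch: install the tools listed in requirement.txt that are not
# yet on PATH. Assumptions: conda is installed, and every tool name in the
# list is also the name of its Bioconda package.
set -eu

# Print each tool from the list file that is not found on PATH.
missing_tools() {
    while IFS= read -r tool || [ -n "$tool" ]; do
        [ -n "$tool" ] || continue          # skip blank lines
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done < "$1"
}

install_missing() {
    missing=$(missing_tools "$1")
    if [ -z "$missing" ]; then
        echo "All tools already installed."
        return 0
    fi
    # Word splitting on $missing is intended: one package name per word.
    conda install -y -c conda-forge -c bioconda $missing
}

# Run only when called with a list file: bash setup.sh requirement.txt
if [ "$#" -eq 1 ]; then
    install_missing "$1"
fi
```

Driving the install from requirement.txt keeps the two files in sync, so the tool list only has to be maintained in one place.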

Resources
