Internship Questions Set Stage 1


Google Doc File: Stage 1 Task

Complete this short story using the command line alone.

Create your copy of the file and enter your command in the terminal space ($) below each action.

Participants who contributed significantly (Slack handles only):

N.B.: The story here is fictional and the files are hypothetical. Please don’t use them for any serious research work.

Please copy exactly what worked. Do not paraphrase. A single mismatch makes you lose your point.

  1. Login to your coding workspace
  2. Create a folder titled your name

$


  1. Create another new directory titled biocomputing and change to that directory with a single command line

$
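A minimal sketch of the two directory steps above, run in a scratch directory. `yourname` is a placeholder, and placing `biocomputing` next to (rather than inside) the name folder is an assumption, since the task does not say which is intended.

```shell
# Scratch location so the sketch leaves real files untouched (illustrative only).
workdir="$(mktemp -d)"
cd "$workdir"

# Folder titled with your name ("yourname" is a placeholder).
mkdir yourname

# Create biocomputing and enter it in one command line:
# "&&" runs cd only if mkdir succeeded.
mkdir biocomputing && cd biocomputing
pwd
```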


  1. Download these 3 files:
    1. https://raw.githubusercontent.com/josoga2/dataset-repos/main/wildtype.fna
    2. https://raw.githubusercontent.com/josoga2/dataset-repos/main/wildtype.gbk
    3. https://raw.githubusercontent.com/josoga2/dataset-repos/main/wildtype.gbk

$


  1. OH! You made a mistake. You have to move the .fna file to the folder titled your name directly. (Do this with one command. Hint: See our cheatsheet)

$
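One way the move could look, sketched with an empty stand-in file. The layout (a `yourname` folder sitting beside `biocomputing`) and the placeholder name `yourname` are assumptions, not part of the task.

```shell
# Rebuild the assumed layout in a scratch directory.
workdir="$(mktemp -d)"
mkdir "$workdir/yourname" "$workdir/biocomputing"
cd "$workdir/biocomputing"
touch wildtype.fna   # empty stand-in for the downloaded file

# Move the file into the sibling name folder with one command.
mv wildtype.fna ../yourname/
ls ../yourname
```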


  1. Oh no! The .gbk file is a duplicate; both copies are actually the same thing. Please delete it.

$
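A possible sketch of the cleanup, assuming the files were fetched with `wget`, which by default saves a second download of an existing name with a `.1` suffix. If a different tool was used, the duplicate's name may differ, so check with `ls` first.

```shell
workdir="$(mktemp -d)"
cd "$workdir"
# Stand-ins: the original file and the assumed name of its duplicate.
touch wildtype.gbk wildtype.gbk.1

# Remove only the duplicate copy.
rm wildtype.gbk.1
ls
```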


  1. The .fna file is actually from a bacterium, and it should definitely have a TATA (tata) box for initiating gene transcription. The molecular biologist is trying to understand the implication of dual TATA sequences. The files got mixed up and we are not sure which is wildtype and which is mutant. The mutant should have “tatatata” while the normal should have just “tata”. Can you confirm whether the file is mutant or wild type?

$


  1. If it is mutant, print all the lines that show it is a mutant into a new file

$
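The mutant check and the export of matching lines can be sketched with `grep`. The file `sample.fna` and its contents are made up for illustration. Because every "tatatata" also contains "tata", the sketch searches for the longer motif.

```shell
workdir="$(mktemp -d)"
cd "$workdir"
# Tiny made-up stand-in for the .fna file; the first sequence line carries "tatatata".
printf '>demo\nacgtatatatagc\nggctatacc\n' > sample.fna

# Classify the file: any "tatatata" hit marks it as mutant.
if grep -q 'tatatata' sample.fna; then
  status="mutant"
else
  status="wild type"
fi
echo "$status"

# Save every line that carries the mutant motif into a new file.
grep 'tatatata' sample.fna > mutant_lines.txt
cat mutant_lines.txt
```

Note that `grep` works line by line, so a motif split across a line break would be missed. Add `-i` if the sequence might be in upper case.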


  1. What is your favorite gene? (In any organism). Each team member should pick a gene that no other member has chosen

$


  1. Download the fasta format of the gene from NCBI Nucleotide

$


  1. How many lines are in the FASTA file (excluding the header)?

$
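A small sketch of the line count, using a made-up `gene.fasta` in place of the NCBI download.

```shell
workdir="$(mktemp -d)"
cd "$workdir"
# Made-up FASTA: one header line and three sequence lines.
printf '>demo gene\nATGCGC\nTTAA\nGGCC\n' > gene.fasta

# -v drops lines that start with ">" (the header); -c counts the lines that remain.
seq_lines="$(grep -vc '^>' gene.fasta)"
echo "$seq_lines"
```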


  1. How many times does A occur?

$


  1. How many times does G occur?

$


  1. How many times does C occur?

$


  1. How many times does T occur?

$
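The four base counts can be sketched with one small helper. `count_base` and `gene.fasta` are hypothetical names introduced here. The header line is filtered out first, since it usually contains letters that would inflate the counts.

```shell
workdir="$(mktemp -d)"
cd "$workdir"
# Made-up two-line sequence: ATGCGC + TTAA.
printf '>demo gene\nATGCGC\nTTAA\n' > gene.fasta

count_base() {
  # $1 = base letter, $2 = FASTA file.
  # grep -o prints each match on its own line (-i ignores case); wc -l counts them.
  grep -v '^>' "$2" | grep -oi "$1" | wc -l | tr -d ' '
}

a_count="$(count_base A gene.fasta)"
g_count="$(count_base G gene.fasta)"
c_count="$(count_base C gene.fasta)"
t_count="$(count_base T gene.fasta)"
echo "A=$a_count G=$g_count C=$c_count T=$t_count"
```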


  1. Calculate the %GC content of your gene

$
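%GC is the number of G and C bases divided by the total number of bases, times 100. A minimal sketch with `awk` follows, again on a made-up `gene.fasta`. It assumes the denominator is every character on the sequence lines, so ambiguous bases such as N would count toward the total.

```shell
workdir="$(mktemp -d)"
cd "$workdir"
# Made-up sequence with 4 G/C bases out of 10.
printf '>demo gene\nATGCGC\nTTAA\n' > gene.fasta

# For each sequence line: add its length to the total, then count G/C
# (gsub returns how many characters it replaced).
gc_percent="$(grep -v '^>' gene.fasta | awk '
  { seq = toupper($0); total += length(seq); gc += gsub(/[GC]/, "", seq) }
  END { if (total > 0) printf "%.2f\n", 100 * gc / total }
')"
echo "$gc_percent"
```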


  1. Create a nucleotide (.fasta) file titled your name

$


  1. “echo” the following into the file using >>: the number of A, G, T and C in the file you created above.

$
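Creating the file and appending to it might look like the sketch below. `yourname.fasta` is a placeholder and the counts are made up; substitute the numbers you obtained earlier.

```shell
workdir="$(mktemp -d)"
cd "$workdir"

# Create the empty file.
touch yourname.fasta

# ">>" appends to the file; a single ">" would overwrite it each time.
echo "A: 3" >> yourname.fasta
echo "G: 2" >> yourname.fasta
echo "T: 3" >> yourname.fasta
echo "C: 2" >> yourname.fasta
cat yourname.fasta
```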


  1. Upload the file to your team’s GitHub repo in a folder called /output

$


  1. Save all the commands you have used in this project in a file named yourname.sh. Upload that file to your team’s GitHub repo in a folder called /script

$


  1. Clear your terminal space and print all the commands you have used today.

$


  1. List the files in the two folders and share a screenshot of your terminal below

$


  1. Take a screenshot of your terminal screen currently and paste it below

N.B.: You need to install and set up your conda environment with either Anaconda or Miniconda.

Please copy exactly what worked. Do not paraphrase. A single mismatch makes you lose your point.

  1. Activate your base conda environment

$


  1. Create a conda environment named funtools

$


  1. Activate the funtools environment

$


  1. Install Figlet using conda

$


  1. Run the following command: figlet {your name}. Paste a screenshot of what you see below 😀

$

  1. Install bwa through the bioconda channel

$


  1. Install blast through the bioconda channel

$


  1. Install samtools through the bioconda channel

$


  1. Install bedtools through the bioconda channel

$


  1. Install spades.py through the bioconda channel

$


  1. Install bcftools through the bioconda channel

$


  1. Install fastp through the bioconda channel

$


  1. Install multiqc through the bioconda channel

$
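The bioconda installs above all follow the same pattern, which can be sketched as a loop. This version only prints the commands (a dry run), so it runs without conda installed. The package names are assumptions, in particular that `spades.py` ships in the `spades` package. Adding `-c conda-forge` is also an assumption beyond the task wording, made because bioconda's documentation recommends enabling that channel alongside `bioconda`.

```shell
workdir="$(mktemp -d)"
cd "$workdir"

# Assumed package names for the tools listed above.
tools="bwa blast samtools bedtools spades bcftools fastp multiqc"

for tool in $tools; do
  # echo prints the command instead of running it; drop "echo" to install for real.
  echo conda install -y -c conda-forge -c bioconda "$tool"
done > install_commands.txt

cat install_commands.txt
```

Printing first makes it easy to review each command, or to copy them one at a time into the answer slots.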


To submit this project, make this document publicly accessible using the 🔒share icon at the top right corner. Copy the link and submit it on the HackBio platform.

Finally, every member of your team should be ready to discuss the team’s code submission.

Learning Resources:

The official learning resource for this internship is HackBio’s Genomics Course. Sign up to enjoy an uninterrupted and synchronized flow of bioinformatics knowledge. If you already have access to the course, everything you need for the internship is provided in it.

However, we have plans for you if you are unable to purchase the course. We have gathered some resources for you to help you learn and navigate the internship better.

Stage 1

