Variables and Loops

The Unix Shell

Christopher Sifuentes

Storing and Using Values – Variables

Values can be temporarily stored in items called variables.

This is very useful in looping and scripting, particularly when we do not know the values in advance or cannot easily keep track of them.

Interestingly, we use different syntax for assigning, using, and unsetting variables.

  • setting variables – use variable=value

  • using variables – use $variable

  • unsetting variables – use unset variable

    #create a variable named file_type and assign it a value of fastq
    file_type="fastq"
    
    #call the file_type variable, print it to the screen
    echo "the value after setting:" $file_type
    
    #unset (or remove) the variable assignment
    unset file_type
    
    #check for the value of file_type
    echo "the value after unsetting:" $file_type 
    the value after setting: fastq
    the value after unsetting:

Tips and Tricks with Variables

  • When setting, no spaces around the = – variable = value will not do what we want.
  • Using quotes when calling variables prevents weird issues – command "$variable" prevents issues when variable values have spaces, etc.
  • Command output can be stored using $() – variable=$(command x) stores the output of command x as variable.
  • Suffixes can be added by using ${variable} – "${file_type}1" from above would be fastq1.
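
The four tips above can be tried together in one short sketch. It assumes bash; the names sample_label, upper_type, with_braces, and without_braces are made up for illustration and do not appear elsewhere in the lesson.

```shell
# Tip 1: no spaces around = when assigning.
file_type="fastq"

# Tip 2: quote the variable so a value with spaces stays in one piece.
sample_label="treated sample 1"   # hypothetical value, for illustration only
printf 'label: %s\n' "$sample_label"

# Tip 3: store command output with $().
upper_type=$(echo "$file_type" | tr 'a-z' 'A-Z')
echo "upper case: $upper_type"

# Tip 4: braces mark where the variable name ends.
with_braces="${file_type}1"     # fastq1
without_braces="$file_type1"    # empty: bash looks for a variable named file_type1
echo "with braces: $with_braces"
echo "without braces: $without_braces"
```

The last two lines show why the braces matter: without them, bash reads file_type1 as the variable name and finds nothing.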

Variables – Checking Understanding

Q&A: Which of the following correctly assigns the value of fastq to a variable named file_suffix?

  1. fastq=$file_suffix
  2. fastq = $file_suffix
  3. fastq=file_suffix
  4. file_suffix=fastq
  5. file_suffix=$fastq

Answer

  1. fastq=$file_suffix – No. Refers to a variable that doesn’t exist, and the order is wrong.
  2. fastq = $file_suffix – No. The added space tries to call a command named fastq. Also, this is the wrong order.
  3. fastq=file_suffix – No. This is the wrong order and would create a variable called fastq.
  4. file_suffix=fastq – Yes.
  5. file_suffix=$fastq – No. Refers to a variable that doesn’t exist.

Q&A: Which of the following correctly assigns the value of trt to a variable named var1?

  1. var1=${trt}
  2. var1 =trt
  3. var1=trt
  4. var1=$trt
  5. var1="trt"

Answer

  1. var1=${trt} – No. Refers to a variable that doesn’t exist.
  2. var1 =trt – No. The added space tries to call a command named var1.
  3. var1=trt – Yes.
  4. var1=$trt – No. Refers to a variable that doesn’t exist.
  5. var1="trt" – Yes.

Q&A: How can I save the value of the directory that I am in, as a variable named start_dir?

Answer

Either start_dir="$(pwd)" or start_dir=$(pwd) works.

Q&A: What would the value of out_var=$"(ls)" be?

Answer

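(ls). Why not the command output? The $ sits in front of the quotes rather than the parentheses, so bash keeps the quoted string "(ls)" and never runs ls. Command substitution needs $(ls) or "$(ls)".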
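
The difference is easy to check side by side. This is a minimal sketch assuming bash; it uses echo hello in place of ls so the result does not depend on the current directory, and the names literal and captured are made up for illustration.

```shell
# $ before the quotes: bash keeps the text inside the quotes and runs nothing.
literal=$"(echo hello)"

# $ directly before the parentheses: the command runs and its output is stored.
captured="$(echo hello)"

echo "literal:  $literal"
echo "captured: $captured"
```

Only the second form stores the command output (hello).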

Q&A: What would the final output be after running the following in a terminal?

base1="sampleX"
ext1=.txt
name1=$file1${ext1}

echo "${name1}"

Answer

.txt

The value of name1 begins with $file1, but no variable named file1 was ever set (the base name was stored in base1), so it expands to nothing. The only value assigned comes from ext1, referenced as ${ext1}.

Q&A: What would the final output be after running the following in a terminal?

base1="sampleX"
ext1=.txt
name1=$base1$"ext1"

echo $name1

Answer

sampleXext1

The value of name1 begins with $base1, which is correctly referenced and holds the value sampleX. This is followed by $"ext1". Because the $ sits before the opening quote rather than directly before the variable name, bash does not look up the variable ext1; it treats "ext1" as a plain string. The variable itself would be referenced by $ext1, ${ext1}, "$ext1", or "${ext1}".

Q&A: What would the final output be after running the following in a terminal?

base1="sampleX"
ext1=.txt
name1=$base1"${ext1}"

echo $name1

Answer

sampleX.txt

The value of name1 begins with $base1, which is correctly referenced and holds the value sampleX. This is followed by "${ext1}", which is correctly referenced and holds the value .txt.
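
As a quick check, the correct ways of referencing a variable all build the same value. This sketch assumes bash; the names form_a through form_d are made up for illustration.

```shell
base1="sampleX"
ext1=.txt

# Four equivalent ways to join the two values.
form_a=$base1$ext1
form_b=${base1}${ext1}
form_c="$base1$ext1"
form_d="${base1}${ext1}"

# Each line prints sampleX.txt.
echo "$form_a"
echo "$form_b"
echo "$form_c"
echo "$form_d"
```

The quoted forms are the safer habit, since they also behave well when a value contains spaces.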

Q&A: What would the final output be after running the following in a terminal?

base1="sampleX"
ext1=.txt
name1=$base1"${ext1}"
unset $name1

echo $name1

Answer

sampleX.txt

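The value of name1 begins with $base1, which is correctly referenced and holds the value sampleX. This is followed by "${ext1}", which is correctly referenced and holds the value .txt. To unset, we need to pass the variable name (name1), not a reference to the variable ($name1). Here bash expands $name1 first, so unset is asked to remove a variable called sampleX.txt; that is not a valid variable name, so bash reports an error and name1 keeps its value.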

Q&A: What would the final output be after running the following in a terminal?

base1="sampleX"
ext1=.txt
name1=$base1"${ext1}"
unset name1

echo $name1".otherstuff"

Answer

.otherstuff

The value of name1 is unset right before we reference it, so it holds no value. On the last line, we print $name1 (now empty), followed by the string ".otherstuff".

Use Case

Let’s see how we can use variables, combined with previous commands/methods in a quick analysis.

From the example.gtf file (downloaded and used in the previous lesson), which chromosome has the highest number of genes? What about exons?

The first few lines of the file are shown below.

# cd to ~/Desktop/shell-lesson-data
cd ~/Desktop/shell-lesson-data

# view first few lines of the file
head example.gtf
#!genome-build CNA3
#!genome-version CNA3
#!genome-date 2015-11
#!genome-build-accession GCA_000149245.3
#!genebuild-last-updated 2015-11
1   ena gene    100 5645    .   -   .   gene_id "CNAG_04548"; gene_source "ena"; gene_biotype "protein_coding";
1   ena transcript  100 5645    .   -   .   gene_id "CNAG_04548"; transcript_id "AFR92135"; gene_source "ena"; gene_biotype "protein_coding"; transcript_source "ena"; transcript_biotype "protein_coding";
1   ena exon    5494    5645    .   -   .   gene_id "CNAG_04548"; transcript_id "AFR92135"; exon_number "1"; gene_source "ena"; gene_biotype "protein_coding"; transcript_source "ena"; transcript_biotype "protein_coding"; exon_id "AFR92135-1";
1   ena CDS 5494    5645    .   -   0   gene_id "CNAG_04548"; transcript_id "AFR92135"; exon_number "1"; gene_source "ena"; gene_biotype "protein_coding"; transcript_source "ena"; transcript_biotype "protein_coding"; protein_id "AFR92135"; protein_version "1";
1   ena start_codon 5643    5645    .   -   0   gene_id "CNAG_04548"; transcript_id "AFR92135"; exon_number "1"; gene_source "ena"; gene_biotype "protein_coding"; transcript_source "ena"; transcript_biotype "protein_coding";

From last time – we need to remove the leading lines of the file to make it easier to work with, using grep -v '^#', then we can cut the fields that we need, sort and count the total genes with sort | uniq -c | grep 'gene'.

# cd to directory
cd ~/Desktop/shell-lesson-data

# pull out gene biotype totals
cat example.gtf | grep -v '^#' | cut -f1,3 | sort | uniq -c | grep 'gene'
1033 1  gene
 474 10 gene
 663 11 gene
 326 12 gene
 322 13 gene
 417 14 gene
 706 2  gene
 725 3  gene
 503 4  gene
 812 5  gene
 640 6  gene
 641 7  gene
 639 8  gene
 554 9  gene
  42 Mt gene

We’re not quite there yet. Let’s capture the output in variables, named chr_n_gene and chr_n_exon, to use later.

Note: We’re introducing awk here, a language that is quite useful for parsing text, to print out the second column, $2.

# cd to directory
cd ~/Desktop/shell-lesson-data

# pull out gene biotype totals
# sort by count (largest first) so the first line holds the top chromosome
# grab the first line
# use awk to print the 2nd column
biotype_gene="gene"
biotype_exon="exon"
chr_n_gene=$(cat example.gtf | grep -v '^#' | cut -f1,3 | sort | uniq -c | grep "$biotype_gene" | sort -rn | head -n 1 | awk '{print $2;}')
chr_n_exon=$(cat example.gtf | grep -v '^#' | cut -f1,3 | sort | uniq -c | grep "$biotype_exon" | sort -rn | head -n 1 | awk '{print $2;}')

echo "The chromosome with the most ${biotype_gene} is: ${chr_n_gene}"
echo "The chromosome with the most ${biotype_exon} is: ${chr_n_exon}"
The chromosome with the most gene is: 1
The chromosome with the most exon is: 1

The benefit of capturing values and reusing them in later commands becomes really evident when we get into loops.
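
To try the capture step without example.gtf, here is a self-contained sketch on a tiny, made-up table (chromosome, source, feature). The file contents and the names demo_gtf, biotype, and top_chr are assumptions for illustration only. The sketch includes a sort -rn step so that the line with the largest count comes first before head -n 1 picks it.

```shell
# Build a small, made-up tab-separated file with one header line.
demo_gtf=$(mktemp)
printf '#!genome-build DEMO\n' > "$demo_gtf"
printf '1\tena\tgene\n1\tena\texon\n2\tena\tgene\n2\tena\tgene\n2\tena\texon\n' >> "$demo_gtf"

biotype="gene"

# Count features per chromosome, keep the chosen biotype,
# sort by count (largest first), and print the chromosome column.
top_chr=$(grep -v '^#' "$demo_gtf" | cut -f1,3 | sort | uniq -c | grep "$biotype" | sort -rn | head -n 1 | awk '{print $2}')

echo "The chromosome with the most ${biotype} is: ${top_chr}"

# Clean up the temporary file.
rm -f "$demo_gtf"
```

In this made-up data chromosome 2 has two genes and chromosome 1 has one, so the sketch reports 2. Without the numeric sort, the first line would simply be the chromosome that sorts first alphabetically.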

Performing Actions, Repetitively

Loops allow us to perform a command (or set of commands) on each item in a list.

For Loop Syntax

Bash for loops follow a specific syntax.

Figure 1: The syntax of a bash for loop.

Key components of the syntax

  • keywords for, in, do, done – tell bash when portions of the loop are coming
  • item – a variable that holds the value of an item from the list for an iteration of the loop
  • list – a set of items (list or array) to iterate over
  • commands – the command(s) performed with each item in the list or array
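
The components listed above fit together as in this minimal sketch. It assumes bash; the list values alpha, beta, and gamma are placeholders.

```shell
# for <item> in <list> ; do <commands> ; done
for item in alpha beta gamma
do
  # commands: run once per item, with $item holding the current value
  echo "current item: $item"
done
```

The loop prints one line per item, in the order the items appear in the list.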

Let’s work through an example from ~/Desktop/shell-lesson-data/exercise-data/creatures, printing out the first two lines of each file.

Walking through the 4 lines of the loop, line-by-line.

#cd to ~/Desktop/shell-lesson-data/exercise-data/creatures
cd ~/Desktop/shell-lesson-data/exercise-data/creatures

for filename in basilisk.dat minotaur.dat unicorn.dat
do
  head -n 2 "$filename"
done

  1. The keyword for tells the computer we are entering a loop.
  2. A variable named filename is created, which is initially empty.
  3. The keyword in tells the computer to create an empty list.
  4. basilisk.dat, minotaur.dat, and unicorn.dat are added to the list.
  5. The keyword do tells the computer to listen for the commands that follow, which are performed on each item in the list.
  6. The computer performs the commands on the value held by the variable $filename.
  7. The keyword done tells the computer that the loop is over.

Iterations

In the example above, there are 3 iterations of the loop. Notice how the value of filename changes with each iteration.

Iteration  filename      list
1          basilisk.dat  basilisk.dat minotaur.dat unicorn.dat
2          minotaur.dat  basilisk.dat minotaur.dat unicorn.dat
3          unicorn.dat   basilisk.dat minotaur.dat unicorn.dat
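
The iterations can also be printed directly. This sketch assumes bash and adds a counter named iteration, which is made up for illustration. It only echoes the names, so the files do not need to exist.

```shell
iteration=0
for filename in basilisk.dat minotaur.dat unicorn.dat
do
  # Count the pass and show which value filename holds right now.
  iteration=$((iteration + 1))
  echo "iteration ${iteration}: filename=${filename}"
done
```

Each pass prints the iteration number next to the value filename holds at that point.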

Note

The variable could be named anything. In the example above, we could instead write

for x in basilisk.dat minotaur.dat unicorn.dat

as long as the commands inside the loop then use $x.

While Loop Syntax

A while loop is another useful type of loop in bash and follows a specific syntax.

Figure 2: The syntax of a bash while loop.

Key components of the syntax

  • keywords while, do, done – tell bash when portions of the loop are coming
  • condition – a condition to be met for the loop to continue (“while true”)
  • commands – the command(s) performed on each pass through the loop, for as long as the condition is met
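
The components listed above fit together as in this minimal sketch. It assumes bash; the variable name count and the starting value 3 are placeholders.

```shell
# while <condition> ; do <commands> ; done
count=3
while [ "$count" -gt 0 ]
do
  echo "count is $count"
  # Change the value being tested, otherwise the loop never ends.
  count=$((count - 1))
done
```

The condition is checked before every pass, so the loop stops as soon as count reaches 0.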

While Loop

Let’s see an example where we print out numbers less than or equal to 7 (-le).

Note: We can increment num by 1 each time by reassigning the value of num, num=$(($num+1)).

num=1

while [ $num -le 7 ]
do
  echo $num" is less than or equal to 7."
  num=$(($num+1))
done
1 is less than or equal to 7.
2 is less than or equal to 7.
3 is less than or equal to 7.
4 is less than or equal to 7.
5 is less than or equal to 7.
6 is less than or equal to 7.
7 is less than or equal to 7.

Using Variables in Loops

Returning to our earlier gtf example, we can now identify the chromosomes with the most of several biotypes with a loop.

# cd to directory
cd ~/Desktop/shell-lesson-data

for bt in gene exon transcript CDS start_codon
do 
 chr_n=$(cat example.gtf | grep -v '^#' | cut -f1,3 | sort | uniq -c | grep "$bt" | sort -rn | head -n 1 | awk '{print $2;}')
 echo "The chromosome with the most ${bt} is: ${chr_n}"
done
The chromosome with the most gene is: 1
The chromosome with the most exon is: 1
The chromosome with the most transcript is: 1
The chromosome with the most CDS is: 1
The chromosome with the most start_codon is: 1

We can take this further and capture all of the biotypes as an array to pass to the loop as a variable.

Note: An item at position x in an array can be accessed via ${array[x]}, with positions counted from 0. In a loop, we use ${array[@]}, which expands to every item in the array.

# cd to directory
cd ~/Desktop/shell-lesson-data

# capture the types of biotypes as an array
btype_array=($(cat example.gtf | grep -v '^#' | cut -f3 | sort | uniq))

for bt in "${btype_array[@]}"
do
 chr_n=$(cat example.gtf | grep -v '^#' | cut -f1,3 | sort | uniq -c | grep "$bt" | sort -rn | head -n 1 | awk '{print $2;}')
 echo "The chromosome with the most ${bt} is: ${chr_n}"
done
The chromosome with the most CDS is: 1
The chromosome with the most exon is: 1
The chromosome with the most five_prime_utr is: 1
The chromosome with the most gene is: 1
The chromosome with the most start_codon is: 1
The chromosome with the most stop_codon is: 1
The chromosome with the most three_prime_utr is: 1
The chromosome with the most transcript is: 1
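
The array syntax is easier to see on a small scale. This sketch assumes bash, since arrays are a bash feature; the names btypes and uniq_words and their values are made up for illustration.

```shell
# Create an array by wrapping the items in parentheses.
btypes=(gene exon CDS)

echo "first item: ${btypes[0]}"        # positions start at 0
echo "number of items: ${#btypes[@]}"

# "${btypes[@]}" expands to every item, one per loop iteration.
for bt in "${btypes[@]}"
do
  echo "item: $bt"
done

# Command output can fill an array when $() is wrapped in parentheses.
uniq_words=($(printf 'b\na\nb\n' | sort | uniq))
echo "unique words: ${uniq_words[@]}"
```

Without the outer parentheses, the command output is stored as one string rather than as separate array items.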

Q&A: Write a loop that would print out the months of the year. Create an array that holds the months.

Answer

months_array=(january february march april may june july august september october november december)

for month in ${months_array[@]}
do
  echo ${month}
done
january
february
march
april
may
june
july
august
september
october
november
december

Q&A: Look at the following code and output.

$ ls
cubane.pdb  ethane.pdb  methane.pdb octane.pdb  pentane.pdb propane.pdb

What would be the output of the following code?

$ for filename in c*
  do 
    ls $filename
  done

Answer

cubane.pdb. The list that is iterated over is every file whose name starts with c, and here only cubane.pdb matches.
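
This can be reproduced in a throwaway directory. The sketch assumes bash and that mktemp is available; the names demo_dir and result are made up for illustration, and only three of the .pdb files are created to keep it short.

```shell
# Make a temporary directory holding a few empty .pdb files.
demo_dir=$(mktemp -d)
touch "$demo_dir/cubane.pdb" "$demo_dir/ethane.pdb" "$demo_dir/methane.pdb"

# Run the loop inside that directory and capture what it prints.
result=$(
  cd "$demo_dir" || exit 1
  for filename in c*
  do
    ls "$filename"
  done
)
echo "$result"

# Clean up the temporary directory.
rm -r "$demo_dir"
```

The shell expands c* into the list of matching names before the loop starts, so the loop runs once, for cubane.pdb.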

Hopefully you’ve seen how helpful variables and loops can be. Next, we’ll put things together with bash scripts.