Wellesley College - BISC189 Essential Skills for Computational Biology

Instructions

Hopefully, this process is familiar by now. Click the assignment 3 invitation above, clone the repository, and follow the instructions in src/assignment.jl. Remember, an easier-to-read version of the assignment script can also be viewed at the bottom of this page.

Also, recall that you may run the automated tests on your own computer:

$ julia --project -e 'using Pkg; Pkg.test()'

Using the VS Code julia extension

There are a number of tools that can turn VS Code into a more useful development environment. One of them is the julia plugin.

To install plugins, click the "Extensions" button on the left-hand side of the VS Code application window.

Then type "julia" and click to install the Julia and Markdown Julia extensions.

Once this is finished, your julia files should have syntax highlighting (making it easier to see different parts of your program), and you will be able to send bits of julia code to an integrated julia REPL.

Note

Just after you install the plugin, it may take a bit of time for the language server to get going. If the steps below don't work at first, just wait a couple of minutes.

Windows Users

You should also install the "WSL remote" extension. This allows you to access your Ubuntu installation as if it were a remote computer. Once it's installed, press the "Remote Explorer" button on the left toolbar, and you should see your Ubuntu installation. Click the + icon next to Ubuntu, and then you can browse and open folders inside Linux.

Once the plugin is installed, open src/assignment.jl in your Assignment03 directory from within VS Code. Then, put the cursor on the line with function complement(base), press and hold the alt key, then press return.

A julia REPL will open inside VS Code, and the entire function body should be copied into it and executed. This will allow you to experiment with the code inside the assignment and do incremental development of your functions without needing to run the entire file each time you make a change.

Tip

Sometimes, you can confuse yourself by running things out of order. For example, let's say I write,

function hello(x)
    println("Hello, $(x)!")
end

s = "Students"
t = "Tutors"

hello(s)
hello(t)

and then execute the whole thing in VS Code. Later on, I decide I don't need to say hello to the Tutors, so I delete the line t = "Tutors", but forget to delete hello(t).

As I continue to run the code in the same julia session, there are no problems: even though I deleted the line t = "Tutors", deleting a line doesn't unassign t, so it's still defined. But if I come back later and try to run the file in a new julia session, I will get an UndefVarError when the program tries to execute hello(t).
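
To see what this looks like in a fresh session, here is a minimal sketch of the edited file. The try/catch and the variable name `caught` are additions for illustration only, so that the script keeps running after the error:

```julia
function hello(x)
    println("Hello, $(x)!")
end

s = "Students"
hello(s)  # works: `s` is defined in this session

# `t` is never assigned in this session, so evaluating hello(t) fails.
# The try/catch is here only to capture the error instead of stopping.
caught = try
    hello(t)
catch err
    err
end

println(typeof(caught))
```

Running the file from the command line every so often is a simple way to catch leftovers like this before the automated tests do.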

Assignment03 code

For each assignment, the contents of the assignment code script will be rendered as HTML at the bottom of the assignment description page. If you're interested in how that works, check out Literate.jl.

Instructions for Assignment03

Introduction

Note: this file is runnable in its current state, but is incomplete. You can run the file from the command line, or use the VS Code julia extension to run individual lines.

Writing real code

In assignments 1 and 2, variable and function names were often things like question1 and question2. From now on, we'll use more informative function and variable names so that our code is "self-documenting."

We'll also continue to use docstrings to help understand the specifications required for our functions.
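
As a reminder of the docstring layout, here is a minimal sketch using a hypothetical function, `double`, that is not part of the assignment:

```julia
"""
    double(x)

Return twice the value of `x`.
"""
function double(x)
    return 2 * x
end

println(double(21))
```

In the julia REPL, typing `?double` displays the docstring text.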

Understanding DNA sequences

We've already done a fair amount of work in assignment 2 and in lesson 3 to make some functions for understanding DNA sequences. Below, I've put a couple of function signatures with docstrings, but you can (and should!) copy the necessary functionality out of the functions you've already defined, if applicable.

Tip

If you defined those functions in the julia REPL, you can go find them from the command line or the julia REPL! In VS Code, open ~/.julia/logs/repl_history.jl.

Question 1 - a complement function

"""
    complement(base)

Get the DNA complement of the provided base:

    A <-> T
    G <-> C

Accepts `String` or `Char`, but always returns `Char`.
If a valid base is not provided, the function throws an error.

Examples
≡≡≡≡≡≡≡≡≡≡

    julia> complement('A')
    'T'

    julia> complement("G")
    'C'

    julia> complement("T")
    'A'

    julia> complement('C')
    'G'
"""
function complement(base)
    # See Lesson 4 for more info
end
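
Several docstrings in this assignment say that a function throws an error on invalid input. One way to do that is the built-in `error` function. The sketch below uses a hypothetical function, `percent_to_fraction`, that is unrelated to the assignment; the try/catch is only there to show what was thrown:

```julia
# Hypothetical example, not part of the assignment.
function percent_to_fraction(p)
    if p < 0 || p > 100
        # `error` throws an ErrorException carrying this message
        error("Percentage $(p) not supported")
    end
    return p / 100
end

println(percent_to_fraction(25))

# Capture the error so we can inspect it instead of stopping the script.
result = try
    percent_to_fraction(150)
catch err
    err
end

println(result isa ErrorException)
println(result.msg)
```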

Question 2 - some boolean functions

"""
    ispurine(base)

A boolean function that returns `true` if the base is a purine (A or G)
and `false` if it is not.
The function only supports bases A, C, G, and T (throws an error for other values).
Accepts `String` or `Char`.

Examples
=========

    julia> ispurine('A')
    true

    julia> ispurine("C")
    false

    julia> if ispurine("G")
               println("It's a purine!")
           else
               println("It's a pyrimidine!")
           end
    It's a purine!

    julia> ispurine('B')
    Error: "Base B not supported"
"""
function ispurine(base)
    # We haven't made this before, but you should have all the pieces
end

"""
    ispyrimidine(base)

A boolean function that returns `true` if the base is a pyrimidine (C or T)
and `false` if it is not.
The function only supports bases A, C, G, and T (throws an error for other values).
Accepts `String` or `Char`.

Examples
=========

    julia> ispyrimidine('G')
    false

    julia> ispyrimidine("T")
    true

    julia> if ispyrimidine("G")
               println("It's a pyrimidine!")
           else
               println("It's a purine!")
           end
    It's a purine!

    julia> ispyrimidine('X')
    Error: "Base X not supported"
"""
function ispyrimidine(base)
    # This is the strict opposite of `ispurine`.
    # In principle, you can write this in one line - remember `!` means `NOT`.
    # E.g., for a letter `x`, `isuppercase(x)` means the same thing as `!islowercase(x)`
end

Question 3 - Using boolean functions for composition

For the following function, you should not need to re-write the logic checking what kind of base this is. You've already written it, and it's in a convenient function!

A big part of programming is re-use; if you find yourself writing the same code multiple times, you should probably put it in a function and call that instead!
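
The same idea in a different setting, as a minimal sketch: the built-in boolean function `iseven` already holds the logic, so a second function can simply call it. The name `parity_name` is hypothetical and not part of the assignment:

```julia
# Hypothetical example: reuse an existing boolean function (`iseven`)
# rather than re-writing the check (n % 2 == 0) here.
function parity_name(n)
    if iseven(n)
        return "even"
    else
        return "odd"
    end
end

println(parity_name(4))
println(parity_name(7))
```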

"""
    base_type(base)

Determines whether a base is a purine (A or G) or pyrimidine (T or C),
and returns a `String`.

Examples
≡≡≡≡≡≡≡≡≡≡

    julia> base_type("G")
    "purine"

    julia> base_type('C')
    "pyrimidine"

    julia> base_type('Z')
    Error: "Base Z not supported"

    julia> x = base_type('A'); println(x)
    purine
"""
function base_type(base)
    # Note: this is different than the `base_type()` we defined in the lesson.
    # Here, we want a fruitful function that returns the value rather than `print`ing it.
    # Also, there's no need to re-write the logic. If your `ispurine` / `ispyrimidine` functions work,
    # you can use them here.
end

Question 4 - Modifying arguments instead of adding a bunch of logic

One thing that none of our functions can do so far is to accept lowercase sequences. Most of the time, DNA sequences are written with uppercase letters, but we may not be able to count on that.

If we want to be able to accept lowercase strings, one possibility would be to add additional logic, e.g.

if base == 'G' || base == "G" || base == 'g' || base == "g"

But that's a lot of typing - especially considering we'd have to do this for every base! In programming, it's OK to be lazy (in fact, it's often better)! Instead, we can modify the parameters to be formatted the way we expect. For example, the uppercase() function takes a String or Char and returns the uppercase representation of it.

julia> uppercase('a')
'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)

julia> uppercase("attc")
"ATTC"

Note that, if you simply use this function on a variable or parameter, it will remain unchanged (the function doesn't "mutate" its argument), but you can re-assign the variable or parameter.

julia> seq = "attcgc"
"attcgc"

julia> uppercase(seq)
"ATTCGC"

julia> seq
"attcgc"

julia> seq = uppercase(seq)
"ATTCGC"

julia> seq
"ATTCGC"

Some julia functions can mutate their arguments - we'll encounter some of those soon.
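
For a quick preview, `sort` and `sort!` show the difference. By convention, julia functions whose names end in `!` modify their arguments. This is only an illustrative sketch; the variable names are made up:

```julia
numbers = [3, 1, 2]

# `sort` returns a new sorted array and leaves `numbers` alone
sorted_copy = sort(numbers)
println(numbers)      # [3, 1, 2]
println(sorted_copy)  # [1, 2, 3]

# `sort!` sorts `numbers` itself, in place
sort!(numbers)
println(numbers)      # [1, 2, 3]
```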

"""
    gc_content(sequence)

Calculates the GC ratio of a DNA sequence.
The GC ratio is the total number of G and C bases divided by the total length of the sequence.
For more info about GC content, see here:

Example
≡≡≡≡≡≡≡≡≡≡

    julia> gc_content("AATG")
    0.25

    julia> gc_content("cccggg") * 100
    100.0

    julia> gc_content("ATta")
    0.0
"""
function gc_content(sequence)
    # Start with the same code as `question3()` from assignment 2.
    # only a small modification is necessary to make this work.
end

Question 5 - Incremental development

Now that you've learned how to do this, go back and modify the functions defined in questions 1-3 so that they are able to take lowercase arguments. Do not copy those functions below this line and modify the copies; modify the original functions in place.
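
As a reminder of the pattern from Question 4, here is a minimal sketch that re-assigns a parameter to its uppercase form at the top of a function. The function `is_yes` is hypothetical and unrelated to the assignment:

```julia
# Hypothetical example, not part of the assignment.
function is_yes(answer)
    # Normalize once, then write the rest of the logic for uppercase only
    answer = uppercase(answer)
    return answer == "YES"
end

reply = "yes"
println(is_yes(reply))  # true
println(reply)          # yes (the caller's variable is unchanged)
```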