In this lab, you will begin constructing a sequence aligner based on the Needleman-Wunsch algorithm. This is a classic bioinformatics method, and quite accessible.
We'll be keeping track of all of your labs in a single git repository. Follow the instructions laid out in the Lab 2 bonus, for forking and making a pull request, but use the BISC195Labs
repository.
Here's a summary - more details are available at the link above.
Fork the BISC195Labs
repo
Clone the repo with git clone <url>
Make a new branch with git branch <branchname>
(note: don't include the <
or >
in your branch's name)
Checkout the branch with git checkout <branchname>
(note: you can combine steps 2 and 3 with git checkout -b <branchname>
)
This is going to be a julia package, so add yourself as an author to the Project.toml
. This field is like a julia array, so it should take the form ["Kevin Bonham, PhD", "Your name"]
Push the changes using git push -u origin <branchname>
Open a pull request to the main repo
Please include your name in the PR title.
Like your assignments, this lab repo is structured like a julia package. We'll talk more about the various compenents a lot more as the course progresses; for now, notice that there are some code files found in the src/
directory, including the BISC195Labs.jl
file, which defines the BISC195Labs
module (more on that later), and include
s the file needleman_wunsch.jl
.
The needleman_wunsch.jl
file contains several methods for a function called nwscore()
. The first method, which takes two Char
s as arguments, is currently empty. The other methods take a Char
and a Nothing
(what's going on with the 3rd method??), or two Nothing
s as arguments. Can you infer what they're doing?
There's also a test/
directory that includes a runtests.jl
file, which already contains some unit tests for the nwscore()
function. Based on the unit tests, can you get a sense of what the function is supposed to do?
What happens if you run the tests? A reminder, you can run tests from the command line by doing
$ julia --project -e 'using Pkg; Pkg.test()'
or from the julia REPL by typing ]
to get the Pkg REPL, then running test
.
Notice how many tests pass, how many fail, and how many error.
The first thing to nail down in the NW algorithm is a function that will score two bases, as either a match or a mismatch. At first, just assume you're using the scoring from the original Needleman-Wunsch paper from 1970 - matches should score +1
and mismatches should score -1
.
Add code to the first method that will accept two bases (Char
s) and return the correct score.
Once this is accomplished, you should go from 8 passing tests to 18. Nice! There's still a ways to go though.
In julia, function arguments can be "positional" (determined based on their order in the function call) or "keyword" arguments (an identifier is used in the function call).
Keyword arguments (often called "kwargs") are separated from positional arguments by a ;
.
For example:
julia> function testing_args(pos1, pos2; kwarg1, kwarg2)
@info "Pos 1 arg always comes first" pos1
@info "Pos 2 arg always comes second" pos2
@info "Kwargs can come in any order" kwarg1 kwarg2
end
testing_args (generic function with 1 method)
julia> testing_args(1, 2; kwarg1 = 3, kwarg2 = 4)
┌ Info: Pos 1 arg always comes first
└ pos1 = 1
┌ Info: Pos 2 arg always comes second
└ pos2 = 2
┌ Info: Kwargs can come in any order
│ kwarg1 = 3
└ kwarg2 = 4
julia> testing_args(2, 1; kwarg2 = 4, kwarg1 = 3)
┌ Info: Pos 1 arg always comes first
└ pos1 = 2
┌ Info: Pos 2 arg always comes second
└ pos2 = 1
┌ Info: Kwargs can come in any order
│ kwarg1 = 3
└ kwarg2 = 4
Kwargs are often given default values so that a function will work without them, but can be tuned according to the caller's needs.
We want our nwscore()
function to be able to accept alternative scoring systems. In principle, this could be done with positional arguments, but that would require the user to remember the order that the scores are entered, and potentially make it necessary to include extra arguments, even if only changing one default. Eg, if I wrote the function with positional arguments in the order match
, mismatch
, gap
, then if I wanted to call this with a gap score of -2
, but defaults for match and mismatch, I'd still have to call nwscore(base1, base2, 1, -1, -2)
, whereas with kwargs, I can call nwscore(base1, base2; gap=-2)
.
Rewrite your first method to include keyword arguments for setting the match
and mismatch
score.
Don't forget us actually use those arguments in the body of your function.
We're getting closer! Once this step is finished, we should be up to 28 passing tests.
Repeat the same procedure with the methods for gaps, including a keyword argument for setting the gap score. You'll need to modify both methods - how does the second gap method work?
Woo! The tests should now all pass! But we're not done...
This is all well and good if someone knows ahead of time whether they are scoring a match/mismatch or a gap, but what happens if someone scores a gap, but includes different scores for match
or mismatch
(or vise versa)?
julia> nwscore('A', 'T')
-1
julia> nwscore('A', 'T'; mismatch=-2, match = 3)
-2
julia> nwscore('A', 'T'; mismatch=-2, match = 3, gap=-2)
ERROR: MethodError: no method matching nwscore(::Char, ::Char; mismatch=-2, match=3, gap=-2)
Add a few test cases that deal with different combinations of this situation. They should throw errors as your code currently stands.
Add a gap
keyword argument to the nwscore
method that takes 2 Char
s and match
and mismatch
kwargs to the methods that take a Char
and a Nothing
. Do your tests pass now? If not, did you write the tests incorrectly, or the methods?
The final method defined in the lab template is designed to throw an error if it gets two nothing
values. How do we test that, since @test
only deals with true
or false
(boolean values)?
Julia's testing library has a special macro @test_throws
, to check if an expression throws an error. It takes the form @test_throws <error type> <expr>
, where <error type>
is something that's a subtype of Exception
, and <expr>
is any julia expression.
For example:
julia> using Test
julia> sqrt(-2)
ERROR: DomainError with -2.0:
sqrt will only return a complex result if called with a complex argument. Try sqrt(Complex(x)).
Stacktrace:
[1] throw_complex_domainerror(f::Symbol, x::Float64)
@ Base.Math ./math.jl:33
[2] sqrt
@ ./math.jl:582 [inlined]
[3] sqrt(x::Int64)
@ Base.Math ./math.jl:608
[4] top-level scope
@ REPL[8]:1
julia> @test_throws DomainError sqrt(-2)
Test Passed
Thrown: DomainError
julia> @test_throws BoundsError sqrt(-2)
Test Failed at REPL[10]:1
Expression: sqrt(-2)
Expected: BoundsError
Thrown: DomainError
ERROR: There was an error during testing
Write tests that check to make sure that nwscore(nothing, nothing)
throws the correct error. Be sure to also test the function with different keyword arguments...
In the doc string for nwscore
, write out the new signature, and describe its functionality for someone that's never used it before. You can use the docstrings in your assignments, or the ones from your favorite julia functions as inspiration.
Remember that you can type ?
in the julia REPL to bring up the help REPL
help?> isa
search: isa isascii isapprox isabspath isassigned isabstracttype disable_sigint isnan ispath isvalid ismarked istaskdone istaskfailed istaskstarted isreal
isa(x, type) -> Bool
Determine whether x is of the given type. Can also be used as an infix operator, e.g. x isa type.
Examples
≡≡≡≡≡≡≡≡≡≡
julia> isa(1, Int)
true
julia> isa(1, Matrix)
false
julia> isa(1, Char)
false
julia> isa(1, Number)
true
julia> 1 isa Number
true
You may spread your documentation out accross all of the methods, or do a single docstring that contains information about all methods.
In preparation for completing the Needleman Wunsch alignment, you should start thinking about how it will actually work. Assume that you want to write a function nwalign()
that takes 2 sequences and a scoring system, and returns the best alignment.
Some things to consider:
What is the form of the returned value? Eg. is it a String, a Tuple? Do you return the score as well?
What happens when there's only 1 best alignment? What happens when there's more than 1?
What kinds of inputs are allowed? Are there multiple methods of your function?
There are many things to consider when developing functionality, but thinking about inputs and outputs is generally a good way to start.
Create a new testset in test/runtests.jl
that includes some tests for how you think nwalign()
should work. Also create a skeleton function in src/needleman_wunsch.jl
, eg
function nwalign()
# aligner
end
and add it to the exports in src/BISC195Labs.jl
. Take a look at your assignments if you're unsure how to make a new testset, or how to do exports.
Once you've done this, you should see your new testset with failing tests when you run them - if you get a LoadError
or something else that prevents your tests from running at all, you've probably just got an issue with syntax.
If you set up your pull request correctly in Step 1, you should just be able to git push
, and your pull request will update.
You don't need to wait until you've finished to push your code. Push early and often, I say!