Wellesley College - BISC189 Essential Skills for Computational Biology

Lesson 9 - Types and methods

lesson9date assignment8link assignment8due

Learning Objectives

Concepts - After completing this lesson, students will be able to:

Skills - After completing this lesson, students will be able to:

Tasks - This lesson is complete when students have:

Types can be scary (but don't have to be)

The first programming language I learned was python, which has a very lax relationship with types. When I first started learning julia, seeing types everywhere freaked me out a little bit.

One bit of good news is that, most of the time, you don't need to worry too much about types to write julia code. As you've seen, beyond knowing that things like split() only work on strings, or seeing a MethodError and needing to know what that means, mostly you can write functions without specifying argument types and you can get a lot done with built-in types.

But knowing a little bit about types can lead to simpler and clearer code and can help you debug problems more easily. But if it's not immediately obvious how to use types, or the syntax is a bit clumsy for a bit, that's fine. It's useful to know this tool exists, but you don't need to use it all the time.

Functions vs Methods

As you've seen, a function is an action performed on data. Most functions have names, like length() or gc_content(), But when you call a function, what is actually executed is a specific "method" of that function. That is, a version of the function that depends on the types of its arguments.

When you first define a function, it only has one method (the one you just defined).

julia> function somefunc(x)
           println("Fallback method!")
       end
somefunc (generic function with 1 method)

But in julia, the same function name can refer to many methods, with different argument types, and even different numbers of arguments.

julia> function somefunc(x::Number)
           println("Number method!!")
       end
somefunc (generic function with 2 methods)

julia> function somefunc(x::AbstractString)
           println("AbstractString method!!")
       end
somefunc (generic function with 3 methods)

julia> function somefunc(x,y)
           println("Two argument method!!")
       end
somefunc (generic function with 4 methods)

julia> somefunc(2.3)
Number method!!

julia> somefunc("woo!")
AbstractString method!!

julia> somefunc([])
Fallback method!

julia> somefunc(1.0, "hey!")
Two argument method!!

One can even define a method that calls another method of the same function! For example, we can write a complement() function that works on Char:

julia> function complement(base::Char)
           base = uppercase(base)
           comps = Dict('A' => 'T',
                        'C' => 'G',
                        'G' => 'C',
                        'T' => 'A',
                        'N' => 'N')
           return comps[base]
       end
complement (generic function with 1 method)

And then another function that works on Strings, that maps the complement(::Char) method onto the String.

julia> function complement(seq::AbstractString)
           map(complement, seq)
       end
complement (generic function with 2 methods)

julia> complement("ATTGC")
"TAACG"

This works because map on a String applies the function to each element of the String, which are Chars.

Some functions have a ton of methods - you can see them using the methods() function:

julia> methods(complement)
# 2 methods for generic function "complement":
[1] complement(base::Char) in Main at REPL[5]:2
[2] complement(seq::AbstractString) in Main at REPL[3]:2
"Checking Question"

How many methods does + have?

Writing your own types

Sometimes, the best way to to solve a problem is to make a new type. For example, when you parsed FASTA files in Assignments 7 and 8, you were keeping headers and sequences separate - this could lead to problems trying to keep them in sync later when you try to work with them.

Further, most of you solved that assignment by keeping a bunch of extra vectors around that stored intermediate sequences, and had to deal with special-casing the first and last sequence. That works, but it's a lot to keep track of.

Compare that approach to the following:

mutable struct FastaRecord
    header
    sequence
end

# these are "accessor" functions - they're not strictly necessary
function header(fr::FastaRecord)
    return fr.header
end

function sequence(fr::FastaRecord)
    return fr.sequence
end

## Note: Simple functions like those above can be written with shortened syntax:
# header(fr::FastaRecord) = fr.header
# sequence(fr::FastaRecord) = fr.sequence

# sequence! updates the sequence field
function sequence!(fr::FastaRecord, seq::AbstractString)
    fr.sequence = seq
end

function parse_fasta(path)
    records = FastaRecord[]
    for line in eachline(path)
        if startswith(line, '>')
            # if the line is a header, we push! a new record with an empty sequence to the `records` vector
            header = line[2:end]
            push!(records, FastaRecord(header, ""))
        else
            # otherwise, we add the line onto the end of the sequence
            record = records[end]
            newseq = sequence(record) * line
            sequence!(record, newseq)
        end
    end
    return records
end
julia> ex1 = parse_fasta("/Users/ksb/repos/courses/assignment07/data/ex1.fasta")
2-element Array{FastaRecord,1}:
 FastaRecord("ex1.1 | easy", "AATTATAGC")
 FastaRecord("ex1.2 | multiline", "CGCCCCCCAGTCGGATT")

We can also write functions like length and gc_content that work on our FastaRecord type.

Base.length(fr::FastaRecord) = length(sequence(fr))

# assuming you've already written `gc_content()` that works on `String`s
gc_content(fr::FastaRecord) = gc_content(sequence(fr))
"Checking Question"

If multiple methods work for a particular function call, how does julia decide which one to use?

Eg, if I write

julia> function foo(x::Number, y::Number)
           println("first method")
       end
foo (generic function with 1 method)

julia> function foo(x::Float64, y::Number)
           println("second method")
       end
foo (generic function with 2 methods)

julia> function foo(x::Float64, y::Float64)
           println("third method")
       end
foo (generic function with 3 methods)

Which method is called when I run foo(1.0, 1)? What about foo(42, 1.0)? Try to answer the question before running the code, then check to see if you're right.