Wellesley College - BISC189 Essential Skills for Computational Biology

Lecture 8 - Finishing alignments

lesson8date Lecture8 slides

Continuing from Lab07

Alignment tracing for NW and SW

  • start from M(i,j)M_{(i,j)} where ii and jj are

    • the last indices in 1st and 2nd dimension for NW

    • the indices for the matrix with the maximum score in SW

  • Check the score from

    1. M(i,j1)M_{(i,j-1)} (cell to the left), a gap score

    2. M(i1,j)M_{(i-1,j)} (cell above), a gap score

    3. M(i1,j1)M_{(i-1,j-1)} (cell from diagonal), a match or mismatch

  • If any match your current cell, push correct characters to alignments

    1. push gap to seq1, character at jj to seq2

    2. push character at ii to seq1, gap to seq2

    3. push character at ii to seq1, character at jj to =seq2=

  • update indices

Special considerations

  • Be mindful of what happens when you hit the first row or first column

    • i1i-1 or j1j-1 may throw bounds error

  • When should your loop stop?

    • It will be different for Needleman-Wunsch than for Smith-Waterman

A test suite:

@testset "Lab07" begin
    @test nwalign("AATTGGCC", "AATTTGCC") == ("AATTGGCC", "AATTTGCC")
    @test any(aln -> aln == nwalign("AATTGGCC", "AAGGTTCC"), [
            ("AA--TTGGCC", "AAGGTT--CC"),
            ("AATTGGCC", "AAGGTTCC")])
    @test nwalign("AATTGGCC", "AAGGTTCC", gap=-2) == ("AATTGGCC", "AAGGTTCC")
    @test nwalign("AATTGGCC", "AAGGTTCC", mismatch=-2) == ("AA--TTGGCC", "AAGGTT--CC")

    @test swalign("AAT", "AA") == ("AA", "AA")
    @test swalign("AAAAATTGGCCAAAAA", "ATTGGCCA") == ("ATTGGCCA", "ATTGGCCA")
    @test swalign("AAAAATTGGCCAAAAA", "ATTGGCAAA") == ("ATTGGCCAAA", "ATTGGC-AAA")
end