Lesson 6 - Miscellany
Learning objectives
Concepts - After completing this lesson, students will be able to:
- Compare and contrast plain text and binary file formats
- Select the best data structure for different types of data
Skills - After completing this lesson, students will be able to:
- Download files from a URL from the command line
- Read files line-by-line and perform actions on each line
Assignments - This lesson is complete when students have:
- Read Chapter 9 and Chapter 14 of Think Julia.
- Run all code examples from Lesson 6 on their own computers
- Cloned the Assignment 6 repository with GitHub Classroom.
- Completed Assignment 6 with all tests passing.
Beyond Base - Statistics
All of the functionality we've used thus far has been found in the main julia library, called Base. But one of the great things about programming languages, especially open-source ones like julia, is that many people have written a lot of other functionality and shared it with the world in "packages."
This functionality is not available out-of-the-box, but you can easily bring this world of additional functions and types into your own code. Some commonly used packages are installed along with julia (these are called the "standard library" or "stdlib"), and others can be installed using the "package manager."
For now, we'll just deal with the packages in stdlib.
Bringing in other functions
Julia was designed for numerical computing, but there are many functions commonly used in statistics that are not available when you first start up julia. For example, it is often useful to calculate the mean, median, and standard deviation of numbers in a vector.
julia> v = rand(100); # create vector of random numbers
julia> mean(v)
ERROR: UndefVarError: mean not defined
julia> median(v)
ERROR: UndefVarError: median not defined
Write a function my_mean() that calculates the mean of numbers in a Vector. Remember, the mean of a series of numbers is the sum of those numbers divided by the number of values.
Hint: What function tells you how many numbers are in a Vector?
Hint 2: Remember that the sum() function can tell you the sum of the numbers in a vector.
What is the mean of v, defined above?
Defining my_mean() isn't so hard, but does your function also work on Tuples? What about a matrix (a 2-dimensional array) like m = [1 2; 1 2; 4 5]? And what about my_median() or my_standard_deviation()? You could probably figure out how to define those functions given what you've learned so far, but lots of other people have probably needed this functionality before, and the code is already written and tested.
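Once you have tried the exercise yourself, here is a minimal sketch of one possible my_mean() for comparison. The one-line definition and the small example values are illustrative assumptions, not the only correct answer. Because this version relies only on sum() and length(), it happens to work on any collection that supports both, including Tuples and matrices.

```julia
# One possible (hypothetical) solution: sum of the values divided by how many there are.
my_mean(x) = sum(x) / length(x)

v_small = [1.0, 2.0, 3.0, 6.0]   # small example vector (not the random v from above)
my_mean(v_small)                 # 3.0

my_mean((1, 2, 3))               # works on a Tuple too: 2.0

m = [1 2; 1 2; 4 5]              # the 3×2 matrix from the text
my_mean(m)                       # mean over all six entries: 2.5
```

Note that on the matrix this gives a single number for all entries; it cannot give one mean per row or per column.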
In this case, the functionality is part of the stdlib in the Statistics package. To load the functionality into your own code, we use the keyword using:
julia> using Statistics
julia> mean(v)
0.4837000036302401
julia> median(v)
0.5008004372076055
julia> std(v)
0.30745641858709283
How does the result of mean() from Statistics compare to my_mean()?
Take a look at the code for the mean function defined in Statistics - it's about 150 lines of code! This function does a lot more than my_mean(), and is probably more efficient. Looking at the docstring for mean() (remember, you can see it by doing ?mean at the REPL), can you find one thing mean() can do that my_mean() can't?
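After you have read the docstring yourself, the sketch below shows two features of mean() from Statistics that a simple sum-divided-by-length function would not provide: averaging along one dimension of a matrix with the dims keyword, and applying a function to each element before averaging. The variable names and example values are chosen here for illustration.

```julia
using Statistics

m = [1 2; 1 2; 4 5]

# Average down each column (collapse dimension 1): gives a 1×2 matrix.
col_means = mean(m, dims=1)        # [2.0 3.0]

# Average across each row (collapse dimension 2): gives a 3×1 matrix.
row_means = mean(m, dims=2)        # [1.5; 1.5; 4.5]

# Apply a function to every element first, then take the mean.
abs_mean = mean(abs, [-1, 2, -3])  # (1 + 2 + 3) / 3 = 2.0
```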
Beyond Base - other libraries
In the stdlib, there are also packages for working with dates, generating random numbers, and for doing linear algebra, among others, but a huge amount of additional functionality can be found in the broader package ecosystem.
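As a rough illustration of the stdlib packages mentioned above, the sketch below loads Dates, Random, and LinearAlgebra with using, just as we did for Statistics. The specific values and variable names are made up for this example, and the output of the random shuffle is not shown because it can differ between julia versions.

```julia
using Dates, Random, LinearAlgebra

# Dates: date arithmetic that knows about month lengths.
d = Date(2020, 1, 31) + Day(1)     # 2020-02-01

# Random: a seeded generator makes "random" results repeatable within one julia version.
rng = MersenneTwister(1)
shuffled = shuffle(rng, 1:5)       # the numbers 1 to 5 in some shuffled order

# LinearAlgebra: common matrix and vector operations.
A = [2 0; 0 3]
determinant = det(A)               # 6.0
dot_product = dot([1, 2, 3], [4, 5, 6])   # 1*4 + 2*5 + 3*6 = 32
```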
Installing these packages takes just a little bit of extra effort, and we'll learn about that in future lessons, but it's important to realize that these packages are just like the code you've been writing this whole time. In fact - the lessons you've already completed are packages!
Practice, practice, practice
Both chapters from Think Julia this week are practical exercises, but contain a lot of concepts that will be very helpful as we start to more earnestly deal with biological data. Several of the assignment questions are based on the examples in the book. Try to solve the problems in the book before looking at the answers, as struggling through these practice problems will help when it comes time to do the assignment.
Good luck!