part9 [2009/05/14 11:49] (current)
nuin created
==== Functional programming in Python: using map ====

First we need to define what functional programming is. Quoting [[http://en.wikipedia.org/wiki/Functional_programming | Wikipedia]], functional programming "is a programming paradigm that treats computation as the evaluation of mathematical functions and avoids state and mutable data. It emphasizes the application of functions, in contrast with the imperative programming style that emphasizes changes in state."

Rather complex, eh? Another explanation can be found [[http://www.cs.nott.ac.uk/~gmh/faq.html#general-topics | here]]: "Functional programming is a style of programming that emphasizes the evaluation of expressions, rather than execution of commands. The expressions in these languages are formed by using functions to combine basic values. A functional language is a language that supports and encourages programming in a functional style."

In practice, code written in a functional style tends to be short. It avoids variables whose values change along the way and builds results by combining functions and expressions instead. Python is clearly not a functional programming language //per se//, but it has some functions that allow a functional approach. We will see a couple of them in this entry. If you are interested in learning more about functional programming and Python, check Alan Gauld's [[http://www.freenetpages.co.uk/hp/alan.gauld/tutfctnl.htm | tutorial]].
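Here is a small sketch that makes the contrast between the two styles concrete. It is not part of the original tutorial. It adds up the lengths of a few made-up sequences twice. The first version is imperative and updates a variable inside a loop. The second is a single expression that combines the built-in functions //sum//, //len// and //map//; //map// is explained in detail below.

<code python># Made-up sample sequences, for illustration only
seqs = ["ACGT", "GGCCA", "TT"]

# Imperative style: a loop that repeatedly changes the value of `total`
total = 0
for s in seqs:
    total += len(s)

# Functional style: one expression, no variable is modified along the way
total_functional = sum(map(len, seqs))

print(total, total_functional)  # 11 11</code>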

There are many ways to use functional programming in bioinformatics. We will start with a very simple example. First, though, we need to modify the FASTA module we created before. Currently this module has only two functions: one that reads sequences and their names, and one that formats their output. We will add a function that reads only the sequences and returns them in a list. This function is very similar to the //read_fasta// function:

<code python>def read_seqs(lines):
    """Return a list containing only the sequences (no titles) of FASTA lines."""
    items = []
    seq = ''
    index = 0  # number of '>' title lines seen so far
    for line in lines:
        if line.startswith(">"):
            # a new record starts: store the previous sequence, if any
            if index >= 1:
                items.append(seq)
                seq = ''
            index += 1
        else:
            # strip() removes the newline, even on a last line that lacks one
            seq += line.strip()

    # store the last sequence of the file (if the file had any record)
    if index >= 1:
        items.append(seq)
    return items</code>

It uses the same approach. The difference is that instead of creating an instance of the //Fasta// class, we build a simple list and ignore the sequence titles. This will also be useful for future code reuse, because sometimes we are interested only in the sequences themselves and not in their names. Depending on the situation, it can save us some typing and reduce the size of our code.
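Since this entry is about functional tools, here is an alternative sketch of the same idea. It is built on //itertools.groupby// from the standard library. The name //read_seqs_grouped// is hypothetical and is not part of our FASTA module. The function groups consecutive lines according to whether they are title lines, then joins each group of sequence lines. It assumes the input starts with a title line. Unlike //read_seqs//, it drops records with an empty sequence, because two adjacent title lines fall into the same group.

<code python>from itertools import groupby


def read_seqs_grouped(lines):
    """Hypothetical alternative to read_seqs(): return only the sequences.

    Assumes the input starts with a '>' title line.
    """
    seqs = []
    # groupby() yields runs of consecutive lines that share the same key:
    # True for title lines, False for sequence lines
    for is_title, group in groupby(lines, key=lambda line: line.startswith(">")):
        if not is_title:
            # join the sequence lines of one record, dropping the newlines
            seqs.append("".join(line.strip() for line in group))
    return seqs


sample = [">seq1\n", "ACGT\n", "GGA\n", ">seq2\n", "TTAC\n"]
print(read_seqs_grouped(sample))  # ['ACGTGGA', 'TTAC']</code>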

In this entry we will see one of Python's functional programming functions: //map//. We will also need //lambda//. Strictly speaking, //lambda// is not a function but a keyword. It creates an anonymous function, that is, a nameless function whose body is a single expression that fits in one line. We have seen that functional programming is all about functions, so //lambda// helps us write the small, short functions this style of programming relies on. One example of //lambda// use is calculating the square of a number. Normally we would write that like this:

<code python>def exp(value):
    return value**2</code>

(%%**%% is Python's exponentiation operator; raising a value to the power of 2 gives its square)

Using //lambda// we can rewrite the above function as

<code python>exp = lambda value: value**2</code>

with the following syntax:

<code> lambda [parameters] : [expression to be used on the parameters] </code>

In both cases the function is called the same way; for example, //exp(10)// returns 100. OK, let's move on to //map//.
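This quick check runs on its own and confirms that both definitions behave the same way. The names //exp_lambda// and //power// are only illustrative. The check also shows that a //lambda// can take more than one parameter. It points out one visible difference too: a function created with //lambda// always has the name %%<lambda>%%.

<code python>def exp(value):
    return value**2


# The same computation written as a lambda and bound to a name
exp_lambda = lambda value: value**2

print(exp(10), exp_lambda(10))  # 100 100

# A lambda may take several parameters, but its body must be one expression
power = lambda value, n: value**n
print(power(2, 5))  # 32

# Both are ordinary function objects; only their names differ
print(exp.__name__, exp_lambda.__name__)  # exp <lambda></code>

Binding a //lambda// to a name, as above, is fine for a demonstration. In that case, however, the Python style guide (PEP 8) recommends using //def// instead. //lambda// is most useful when the function is passed straight to another function such as //map//.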

//map// applies a function, usually a //lambda//, to every item of a sequence. The sequence can be of any type: a list, a tuple, a string, etc. In Python 2 //map// always returns a list. In Python 3 it returns an iterator, which we can turn into a list by passing it to //list()//. Going back to the example above, let's say we want to calculate the square of a series of values. Using //map// and //lambda// we would have something like this:

<code python>print(list(map(lambda value: value**2, [2, 5, 12, 34, 56])))
# [4, 25, 144, 1156, 3136]</code>

And that's it. Let's dissect this line of code. //map// takes the //lambda// and applies it to every item of the list that follows it. The list can be written directly in the call, as here, or it can be a variable defined earlier.
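Because //map// accepts any sequence, it also works on a string, one character at a time. The sketch below builds the complement of a DNA string. The //complement// dictionary is an assumption made for illustration and handles only A, C, G and T. In Python 3 //map// returns an iterator, so the resulting characters are joined back into a string with the string method //join//. The last lines show that //map// can also take several sequences, passing one item from each to the function.

<code python># Complementary bases (only A, C, G and T are handled in this sketch)
complement = {"A": "T", "T": "A", "C": "G", "G": "C"}

dna = "ATGCCGTA"

# map() calls the lambda on every character of the string;
# "".join() turns the resulting items back into one string
comp = "".join(map(lambda base: complement[base], dna))
print(comp)  # TACGGCAT

# With two sequences, map() passes one item from each to the function
sums = list(map(lambda a, b: a + b, [1, 2, 3], [10, 20, 30]))
print(sums)  # [11, 22, 33]</code>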

Applying this to a bioinformatics problem is simple. This might not be the most useful or realistic example, but it gives us a primer for more advanced uses. Let's say we want to check the length of every sequence in a FASTA file. Emphasizing code reuse, we created a new function that returns a list containing only the sequences. Without functional programming, a loop that prints the sequence lengths is very simple:

<code python>for sequence in data:
    print(len(sequence))</code>

With a functional programming approach we can do it in one line, although that is not a big advantage here:

<code python>print(list(map(lambda x: len(x), data)))</code>

As mentioned, there is not much to gain in this example. The full script would look like this:

<code python>#! /usr/bin/env python

import sys

import fasta  # our FASTA module, now including read_seqs()

# read the FASTA file given as the first command-line argument
with open(sys.argv[1], 'r') as handle:
    data = fasta.read_seqs(handle.readlines())

# print the length of every sequence
print(list(map(lambda seq: len(seq), data)))</code>

//data// is our list of sequences. Inside the //map// call we have a //lambda// that returns the length of each sequence in that list. Quite simple.
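You can try the same idea without the //fasta// module or a FASTA file. The sketch below uses a hard-coded list of made-up sequences in place of the output of //read_seqs//. It also shows that the //lambda// is optional here, because //len// is already a function that can be passed to //map// directly. Finally, it shows the equivalent list comprehension, a construct many Python programmers prefer for this kind of task.

<code python># Made-up sequences standing in for the list returned by fasta.read_seqs()
data = ["ACGTACGT", "GGC", "TTAGGA"]

# map() with a lambda, as in the script above
lengths = list(map(lambda seq: len(seq), data))

# len is already a function, so it can be passed to map() directly
lengths_direct = list(map(len, data))

# The equivalent list comprehension
lengths_comp = [len(seq) for seq in data]

print(lengths, lengths_direct, lengths_comp)  # [8, 3, 6] [8, 3, 6] [8, 3, 6]</code>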

==== Functional programming in Python: using filter ====

This time we look at another functional programming function in Python: //filter//. As the name implies, //filter// returns the items of a sequence (list, string, etc.) for which the given function returns a true value. The syntax is very similar to that of //map//:

<code python>filter(function, sequence)</code>

In Python 2, //filter// returns a list, like //map//. The exception is a string or tuple input, in which case the result has the same type as the input. In Python 3 it returns an iterator. In the example here we will use //lambda// to define a one-line function. Let's say we want to quickly find which sequences stored in a FASTA file contain a given motif. Again, this is a very simple example, just a primer. Of course we could use another [[http://python.genedrift.org/2007/08/28/finding-motifs-iupac-and-regex-an-approach/ | method]], but this time we want a functional programming approach, and //filter// suits us best here. We reuse the newly created function that returns only the sequences from a FASTA file, and we end up with a script that looks like this:

<code python>#! /usr/bin/env python

import sys

import fasta  # our FASTA module, now including read_seqs()

# first argument: FASTA file; second argument: the motif to look for
with open(sys.argv[1], 'r') as handle:
    sequences = fasta.read_seqs(handle.readlines())
motif = sys.argv[2]

# keep only the sequences in which find() locates the motif
print(list(filter(lambda x: x.find(motif) >= 0, sequences)))</code>

We skip the parts we have already seen and look at the last line. Each item of the list //sequences// is a string, and we call its //find// method. //find// returns the position of the first occurrence of the motif, or -1 if the motif is not present. The condition //x.find(motif) >= 0// is therefore true only for the sequences that contain the motif. Notice, again, how similar the syntax is to that of //map//.
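Here is a self-contained sketch of the same filter. It uses made-up sequences and a sample motif in place of the command-line arguments; the motif is GAATTC, the EcoRI recognition site. The sketch also shows two further points. First, the membership test //motif in x// gives the same result as //x.find(motif) >= 0//. Second, //filter// applied to a string keeps only the characters that pass the test. Here that is used to remove gap characters (-) from a made-up aligned sequence.

<code python># Made-up sequences and a sample motif (GAATTC, the EcoRI recognition site)
sequences = ["ATGAATTCGC", "GGGCCC", "TTGAATTCAA"]
motif = "GAATTC"

# keep the sequences in which find() locates the motif (position >= 0)
with_find = list(filter(lambda x: x.find(motif) >= 0, sequences))

# the 'in' operator expresses the same test more directly
with_in = list(filter(lambda x: motif in x, sequences))

print(with_find)  # ['ATGAATTCGC', 'TTGAATTCAA']

# filter() also works on a string; join the kept characters back together
aligned = "AC--GT-A"
ungapped = "".join(filter(lambda base: base != "-", aligned))
print(ungapped)  # ACGTA</code>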
 
part9.txt · Last modified: 2009/05/14 11:49 by nuin
 
Except where otherwise noted, content on this wiki is licensed under the following license:CC Attribution-Noncommercial-Share Alike 3.0 Unported