Functional programming in Python: using map

First we need to define what is functional programming. Quoting Wikipedia: “is a programming paradigm that treats computation as the evaluation of mathematical functions and avoids state and mutable data. It emphasizes the application of functions, in contrast with the imperative programming style that emphasizes changes in state.”

Rather complex, eh? Another explanation can be found here: “Functional programming is a style of programming that emphasizes the evaluation of expressions, rather than execution of commands. The expressions in these language are formed by using functions to combine basic values. A functional language is a language that supports and encourages programming in a functional style.”

Basically any functional programming language tries to minimize the code, reducing it, avoiding the use of variables and relying on the use of functions and expressions. Python, clearly, is a not a functional programming language <i>per se</i>, but it has some functions that allow a functional approach. We will see a couple of them in this entry. If you are interested in learning more about functional programming and Python, check Alan Gauld's tutorial.

There are many ways to use functional programming in bioinformatics. We will start with a very simple example, but first we need to modify the FASTA module we created before. Currently this module only has two functions, one that reads sequences and their names and one that formats their output. We will add a function that reads only the sequences and returns a list containing them. This function is very similar to the read_fasta function

def read_seqs(file):
    items = []
    seq = ''
    index = 0
    for line in file:
        if line.startswith(">"):
            if index >= 1:
                seq = ''
            index += 1
            seq += line[:-1]
    return items

and it uses an identical approach. Instead of creating an instance of Fasta class we only create a simple list, and ignore the sequence title. This will also be useful for a future code reuse, as sometimes we are only interested in the sequence itself and not the name. Might save us some typing and decrease code size depending on the situation.

In this entry we will see one Python's functional programming function: map. But we will need to use another one too, which is called lambda. lambda is the function that defines what is called an anonymous function, or a nameless function that can fit in one line. We have seen that functional programming is all about functions, so lambda is the aid to implement the small and short functions we use in this type of approach. One example of lambda use would be on the calculation of number exponentials, which normally can be achieved like this

def exp(value):
    return value**2

(exponentials in Python are defined by ) Using lambda we can rewrite the above function like <code python>exp = lambda value:value2</code>

with the following syntax:

 lambda [parameters] : [expression to be used on the parameters] 

In both cases you can use the functions the same way, by calling exp(10) for example. Ok, let's move to map then.

map applies a function, usually a lambda, to every item in a list of items. This list can be of any sequence type, but map always returns a list. Using the above exponential example, let's say we want to calculate the cube of a series of values. Using map and lambda we would have something like this

print map(lambda value:value**2, [2,5,12,34,56])
[4, 25, 144, 1156, 3136]

And that's it. Dissecting this line of code, we have a lambda function that due to the use of the map function is applied on every item of the list that follows. This list can be anything, previously set or not.

So, applying this to a bioinformatics example would be simple (this might not be the most useful or real-world example ever, but it should give us a primer to endeavour in more advanced stuff). Let's say we want to check sequence size for all sequences in a FASTA file. Emphasizing code reuse, we created a new function that returns a list of the actual sequences only. Without functional programming a loop to read the sequence lengths would be very simple

for sequence in seq:
    print len(sequence)

With a functional programming approach, we can do it in one line (not a large advantage here)

print map(lambda x:len(x), data)

Yep, not a lot of gain with this example, as mentioned. The full script would look like

#! /usr/bin/env python
import fasta
import sys
data = fasta.read_seqs(open(sys.argv[1], 'r').readlines())
print map(lambda seq:len(seq), data)

data is our list of sequences, and inside the map function we have a lambda that checks for the size of each sequence in that list. Quite simple.

Functional programming in Python: using filter

This time we check another functional programming function from Python: filter. As the name implies, filter returns items from a sequence (list, string, etc) that are true to a certain condition defined by the function. The syntax is very similar to map

filter (function, sequence)

and as map it returns a list (except when the sequence is a string or tuple). In the example here we will use lambda to define a one-line function. Let's say we want to quickly find the sequences that contain a motif, sequences that are stored in a FASTA file (again, this is a very simple example, just a primer). Of course we can use another method, but this time we want to use a functional programming approach. filter suits us best here. Again we will reuse the newly created function that returns only the sequences from a FASTA file and we end up with a script that looks like this

#! /usr/bin/env python
import fasta
import sys
sequences = fasta.read_seqs(open(sys.argv[1], 'r').readlines())
motif = sys.argv[2]
print filter(lambda x:x.find(motif) >= 0, sequences)

We skip the part we already seen and check the last line. Basically each item from the list (sequences) is a string and we are applying the find method in order to find a motif on a position larger or equal to 0. Notice again, the syntax similar to map.

part9.txt · Last modified: 2009/05/14 11:49 by nuin
Except where otherwise noted, content on this wiki is licensed under the following license:CC Attribution-Noncommercial-Share Alike 3.0 Unported
Recent changes RSS feed Donate Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki