This shows you the differences between two versions of the page.

part13 [2009/05/20 16:55]
part13 [2009/05/20 16:58] (current)
Line 318: Line 318:
 +==== Still on merging Pfam alignments ... ====
 +One of the things I like about Python and the Python community is the constant effort to make code simple and clear. Tal left a comment on the last post about merging Pfam alignment sequences, suggesting another approach to our problem. The code is below.
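 +<code python>def merge_seqs(data1, data2):
 +    from itertools import chain, groupby
 +    # output template: name1-name2->length, followed by both sequences
 +    fmt = "%s-%s->%d\n%s%s"
 +    flist = []
 +    # key: the part of the name between '|' and '/'
 +    keyfunc = lambda it: it.name[it.name.find('|') + 1 : it.name.find('/')]
 +    # sort first, because groupby only groups consecutive items with equal keys
 +    for it, g in groupby(sorted(chain(data1, data2), key=keyfunc), keyfunc):
 +        values = list(g)
 +        # exactly two entries share this key, so we have a matching pair
 +        if len(values) == 2:
 +            jname, jseq = values[0].name, values[0].sequence
 +            kname, kseq = values[1].name, values[1].sequence
 +            flist.append(fmt % (jname, kname, len(jseq), jseq, kseq))
 +    return flist</code>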
 +The code also uses the //itertools// module, importing //chain// and //groupby//. We already saw //chain// in the previous post, but //groupby// is new to us here. //groupby// was introduced in Python 2.4 and is a function that returns consecutive keys and groups from an //iterable//. A Python iterable is any object that can return its elements one at a time, for instance in a //for// loop, while the //iterator// is the object that actually hands out those elements one by one. So, in our case //groupby// will return each sequence name extracted by the [[http://python.genedrift.org/2007/11/01/functional-programming-in-python-using-filter-take-one/ | lambda]] function defined just above it, together with the group of entries from the //chain// output that share that name. Usually //groupby// has this syntax:
 +<code python>groupby(iterable[, key])</code>
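 +The key is optional, and in our case it is the lambda function; when it is omitted, items are grouped by their own value. Another function new to us that uses the same lambda function is //sorted//. As its name hints, //sorted// returns a new sorted list built from the items of an iterable. The key in this case is not the sorting algorithm itself but a function that computes, for each item, the value used in the comparisons. Sorting matters here because //groupby// only groups consecutive items that have the same key.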
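To make the roles of //sorted//, //groupby// and the key function concrete, here is a minimal sketch that uses made-up strings instead of Pfam records. The list ''words'' and the helper ''first_letter'' are assumptions for illustration only, not part of Tal's code. It shows that //groupby// starts a new group every time the key changes, so equal keys have to be neighbours before grouping.

```python
from itertools import groupby

# Hypothetical data, not taken from the Pfam files
words = ["pear", "apple", "plum", "avocado", "peach"]

def first_letter(word):
    """Key function: the value used both for sorting and for grouping."""
    return word[0]

# Without sorting, groupby opens a new group whenever the key changes
unsorted_groups = [(key, list(group)) for key, group in groupby(words, first_letter)]

# Sorting with the same key puts items with equal keys next to each other
ordered = sorted(words, key=first_letter)
sorted_groups = [(key, list(group)) for key, group in groupby(ordered, first_letter)]

print(unsorted_groups)
print(sorted_groups)
```

Each group is itself an iterator, which is why both this sketch and the function above wrap it in //list// before counting or indexing its items.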
 +Basically, in the code above a lambda function extracts the desired region from each sequence name. //chain// reads both input lists in one pass, //sorted// orders the combined entries by that region, and //groupby// then iterates over them and returns each key with its values: one value when the sequence is unique, two values when there are two sequences. After this we just need to check the number of returned values and we have our list of matching sequences.
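The steps described above can be traced on a tiny example. This is only a sketch under stated assumptions: the ''Record'' namedtuple stands in for whatever objects the parser from the previous post returns (only ''.name'' and ''.sequence'' are assumed), and the names and sequences are invented to follow the "something|identifier/start-end" pattern that the lambda expects.

```python
from collections import namedtuple
from itertools import chain, groupby

# Hypothetical stand-in for the parsed records; only .name and .sequence are used
Record = namedtuple("Record", ["name", "sequence"])

# Made-up names and sequences, shaped like "something|identifier/start-end"
data1 = [Record("PF00001|SEQ_A/1-10", "ACDEFGHIKL"),
         Record("PF00001|SEQ_B/5-14", "MNPQRSTVWY")]
data2 = [Record("PF00002|SEQ_A/20-29", "LKIHGFEDCA")]

def seq_id(record):
    """Return the text between '|' and '/' in the record name."""
    name = record.name
    return name[name.find("|") + 1 : name.find("/")]

# chain reads both lists in one pass, sorted makes equal keys adjacent
combined = sorted(chain(data1, data2), key=seq_id)

# groupby yields each key once, with the records that share it
groups = {key: [r.name for r in group] for key, group in groupby(combined, seq_id)}

for key, names in groups.items():
    status = "matched" if len(names) == 2 else "unique"
    print(key, status, names)
```

Here ''SEQ_A'' appears in both inputs and ends up in a group of two, while ''SEQ_B'' stays alone, which is the condition the ''len(values) == 2'' check in the function relies on.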
part13.txt · Last modified: 2009/05/20 16:58 by nuin
Except where otherwise noted, content on this wiki is licensed under the following license:CC Attribution-Noncommercial-Share Alike 3.0 Unported