This is an old revision of the document!


Generating reverse complement of DNA sequences

As shown in the GenBank DNA parser script, it is really useful to have the ability to get the reverse complement of some DNA sequences. The reverse complement of a 5'-3' DNA sequence is on its complementary strand. Using our fasta module it is easy to implement a function to generate the antiparallel sequence

Basically we can do this in two step, one - obtaining the complement and two - reversing it.

def complement(seq):
    abuffer = []
    for i in seq:
        if i == 'A':
            abuffer.append('T')
        elif i == 'C':
            abuffer.append('G')
        elif i == 'T':
            abuffer.append('A')
        elif i == 'G':
            abuffer.append('C')
    return abuffer

That's not a Pythonic approach, but it works reasonably well for short sequences. What would be a Pythonic approach to it? Using dictionaries and list comprehension. From the Python online documentation:

<em>List comprehensions provide a concise way to create lists without resorting to use of map(), filter() and/or lambda. The resulting list definition tends often to be clearer than lists built using those constructs. Each list comprehension consists of an expression followed by a for clause, then zero or more for or if clauses. The result will be a list resulting from evaluating the expression in the context of the for and if clauses which follow it.</em>

It is basically a specific way to modify lists using a loop and if clauses when needed. In a couple of lines we can do what we accomplished in 10 lines with the above excerpt. First we need a dictionary to set how the nucleotides are related

complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

and then a list comprehension to modify each nucleotide

complseq = [complement[base] for base in seq]

The list comprehension menas: for each base in the sequence get the dictionary value for each key (a nucleotide in the initial sequence). The complete function would be

def complement(seq):
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    complseq = [complement[base] for base in seq]
    return complseq

One step solved. We are already able to get the complement, now we need to reverse it. Simple. A Python list method automatically reverts list elements. Basically what we need then is below

def reverse_complement(seq):
    seq = list(seq)
    seq.reverse()
    return ''.join(complement(seq))

The call from any script to this function would be simply be

fasta.reverse_complement('AACCGGTTTTGGCCAA')

Batteries included, indeed.

Transcribing DNA

A long time ago when the site was still based on the Perl book we have seen how to transcribe DNA to RNA. This entry serves only to remember the method and add a function to the fasta module in the repository.

It is really simple to transcribe in Python, by employing the

replace

method on strings. Our function looks like

transcribe(seq): RNA = seq.replace('T', 'U') return RNA
 
 
part12.1242834049.txt.gz · Last modified: 2009/05/20 11:40 by nuin
 
Except where otherwise noted, content on this wiki is licensed under the following license:CC Attribution-Noncommercial-Share Alike 3.0 Unported
Recent changes RSS feed Donate Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki