Mar-28-2019, 06:58 PM
(This post was last modified: Mar-28-2019, 06:58 PM by pianistseb.)
I am using Biopython to work with DNA sequences, and I am new to this library. I have a .fasta file that uses the 4-letter DNA alphabet, and I want to convert it into a 2-letter binary code that distinguishes purines from pyrimidines. First I merge all the segments/records of the .fasta file into one sequence, full_seq, in the 4-letter alphabet. Then I convert it into the 2-letter alphabet, new_seq. And here is the problem: the conversion takes hours to run. The sequence is very long (119750280 letters). Any ideas on how to make my program run faster?
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# merge all the records
full_seq = Seq("")
for seq_record in SeqIO.parse("OMOK01.fasta", "fasta"):
    full_seq += seq_record.seq

# convert the 4-letter alphabet into a binary alphabet
new_seq = Seq("")
for i in range(0, len(full_seq)):
    if (full_seq[i] == "A") or (full_seq[i] == "G"):
        new_seq += Seq("-")
    else:
        new_seq += Seq("+")

print("Binary sequence", repr(new_seq))
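
A likely cause of the slowdown is that Seq objects are immutable, so each `+=` in the loop builds a new sequence and copies everything accumulated so far. Over roughly 120 million letters, that repeated copying dominates the run time. One possible alternative, sketched below, works on plain Python strings and translates the whole sequence in a single pass with a byte lookup table. The helper names `to_binary` and `merge_and_convert` are hypothetical, and the sketch assumes the sequence contains only ASCII characters. It keeps the same rule as the code above: "A" and "G" become "-", and every other character becomes "+".

```python
# Build a 256-entry byte table once: every byte maps to "+",
# except uppercase "A" and "G" (the purines), which map to "-".
_table = bytearray(b"+" * 256)
_table[ord("A")] = ord("-")
_table[ord("G")] = ord("-")
BINARY_TABLE = bytes(_table)


def to_binary(sequence):
    """Hypothetical helper: convert one ASCII sequence string to the +/- code."""
    # bytes.translate replaces every byte via the table in one pass.
    return sequence.encode("ascii").translate(BINARY_TABLE).decode("ascii")


def merge_and_convert(chunks):
    """Hypothetical helper: convert each chunk and join them all at once.

    "".join avoids the repeated copying caused by += inside a loop.
    """
    return "".join(to_binary(chunk) for chunk in chunks)


print(to_binary("ACGTTGA"))  # -+-++--
```

With Biopython, the records could be fed in as strings, for example `merge_and_convert(str(rec.seq) for rec in SeqIO.parse("OMOK01.fasta", "fasta"))`, and the result wrapped in `Seq(...)` again if a Seq object is needed afterwards.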
