Jan-26-2018, 10:18 AM
Hi everyone!
This is my first post here. I am originally a biologist, but I have started doing bioinformatics. I have very basic programming knowledge in Perl, Python and shell.
I found a small Python script online which prints a list of all the contigs in a multi-FASTA file, with their lengths.
```python
#!/usr/bin/python
# Print "contig_id<TAB>length" for every record of the multi-FASTA file
# given as the first command-line argument.
from Bio import SeqIO
import sys

for seq_record in SeqIO.parse(sys.argv[1], "fasta"):
    output_line = '%s\t%i' % (seq_record.id, len(seq_record))
    print(output_line)
```

I used this script to store the result in a file (which I named TestTailleContigs1.txt for now), and I wrote a second script that uses this file and the original multi-FASTA file to remove all contigs with a length < 1000 and store the result in a new file:
```python
#!/usr/bin/python
from __future__ import division

# Tab-separated "contig_id<TAB>length" list written by the first script
length_file = open("TestTailleContigs1.txt")
contig_file = open("2013_1056H.contigs.fa")
output_file = open("2013_1056H.contigs_filtered.fa", "w")

Contigs_over_1000 = 0
Contigs = 0
FirstContigToTrim = None  # ID of the first contig with length <= 1000

# Count all contigs and those longer than 1000, and remember the first short one
for line in length_file:
    column = line.split("\t")
    contigsize = int(column[1])
    Contigs += 1
    if contigsize > 1000:
        Contigs_over_1000 += 1
    elif FirstContigToTrim is None:
        FirstContigToTrim = column[0]

print("The first contig to filter is " + str(FirstContigToTrim))
print("The number of contigs in the original file is " + str(Contigs))
print("The number of contigs remaining after filtering is " + str(Contigs_over_1000))

# Copy the FASTA file line by line until the header of the first short
# contig is reached. This only works if the contigs are sorted by
# decreasing length.
reached_short_contigs = False
for line in contig_file:
    if FirstContigToTrim is not None and line.startswith(">"):
        # Compare the whole ID (first word of the header) so that, for example,
        # "NODE_1" does not also match "NODE_10"
        if line[1:].split()[0] == FirstContigToTrim:
            reached_short_contigs = True
    if not reached_short_contigs:
        output_file.write(line)

# close() needs parentheses; without them the files are never closed
length_file.close()
contig_file.close()
output_file.close()
```

My question is: how can I combine these two scripts into one? I don't actually need the intermediate list of lengths; I only use it as input for my second script. As I will have quite a lot of files to process, I would like to avoid accumulating intermediate files and having to run two scripts instead of just one.
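The intermediate file only holds one `contig_id<TAB>length` pair per contig, so one way to merge the two scripts is to keep those pairs in a Python dictionary instead of printing them, then look each contig up while copying the FASTA file. Below is a minimal sketch under a few assumptions: the input is an uncompressed multi-FASTA file, the contig ID is the first word of the header (the same value `seq_record.id` gives in Biopython), and the function names are made up for the example. It uses plain Python rather than `Bio.SeqIO` only so that it runs without extra packages. Because it checks each contig's own length, it does not depend on the contigs being sorted by length.

```python
import sys

MIN_LENGTH = 1000  # same threshold as "contigsize > 1000" in the original script


def header_id(header_line):
    """Return the contig ID: the first word after '>' (what Bio.SeqIO calls record.id)."""
    words = header_line[1:].split()
    return words[0] if words else ""


def contig_lengths(lines):
    """Return {contig_id: length}, i.e. what TestTailleContigs1.txt contained."""
    lengths = {}
    current = None
    for line in lines:
        if line.startswith(">"):
            current = header_id(line)
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line.strip())
    return lengths


def write_long_contigs(lines, lengths, out_handle, min_length=MIN_LENGTH):
    """Copy each record (header + sequence lines) whose length exceeds min_length."""
    keep = False  # lines before the first header are skipped
    for line in lines:
        if line.startswith(">"):
            keep = lengths[header_id(line)] > min_length
        if keep:
            out_handle.write(line)


def main(fasta_path, output_path):
    # First pass: compute the lengths in memory instead of printing them
    with open(fasta_path) as fasta:
        lengths = contig_lengths(fasta)
    # Second pass: reopen the same FASTA file and copy only the long contigs
    with open(fasta_path) as fasta, open(output_path, "w") as out:
        write_long_contigs(fasta, lengths, out)
    kept = sum(1 for n in lengths.values() if n > MIN_LENGTH)
    print("The number of contigs in the original file is %i" % len(lengths))
    print("The number of contigs remaining after filtering is %i" % kept)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: python %s contigs.fa filtered.fa" % sys.argv[0])
```

It would be run as, for example, `python script.py 2013_1056H.contigs.fa 2013_1056H.contigs_filtered.fa` (the script name is up to you). The FASTA file is read twice, but nothing is written to disk except the filtered output.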
I would really like to understand how it works, rather than just getting the solution, as I think this is something very basic that I should know how to do.
Thanks in advance for your time and help. If there is anything I should explain differently so that you can help me, please let me know.
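For many files, the same idea can be pushed one step further: read each record once, decide on the spot whether to keep it, and loop over all FASTA paths given on the command line. A minimal sketch, assuming the output file should sit next to the input with `_filtered` added before the `.fa` extension (as in 2013_1056H.contigs_filtered.fa); `records`, `filter_fasta` and `output_name` are hypothetical helper names. The `> min_length` test mirrors the `contigsize > 1000` check of the original script.

```python
import os
import sys


def records(lines):
    """Yield (header_line, sequence_lines) for each record of a FASTA file."""
    header, seq_lines = None, []
    for line in lines:
        if line.startswith(">"):
            if header is not None:
                yield header, seq_lines
            header, seq_lines = line, []
        elif header is not None:  # ignore anything before the first header
            seq_lines.append(line)
    if header is not None:
        yield header, seq_lines


def filter_fasta(in_handle, out_handle, min_length=1000):
    """Write records longer than min_length unchanged; return (total, kept)."""
    total = kept = 0
    for header, seq_lines in records(in_handle):
        total += 1
        length = sum(len(line.strip()) for line in seq_lines)
        if length > min_length:
            kept += 1
            out_handle.write(header)
            out_handle.writelines(seq_lines)
    return total, kept


def output_name(in_path):
    """Hypothetical naming rule: 2013_1056H.contigs.fa -> 2013_1056H.contigs_filtered.fa"""
    return os.path.splitext(in_path)[0] + "_filtered.fa"


if __name__ == "__main__":
    # Hypothetical usage: python filter_all.py *.contigs.fa
    # (the shell expands the wildcard into one argument per file)
    for in_path in sys.argv[1:]:
        with open(in_path) as fin, open(output_name(in_path), "w") as fout:
            total, kept = filter_fasta(fin, fout)
        print("%s: %i contigs, %i kept after filtering" % (in_path, total, kept))
```

With a shell wildcard, every matching file is filtered in one run and no length list is ever written to disk. Only one record is held in memory at a time, so large assemblies are not a problem.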
