Jun-22-2018, 12:05 PM
Apologies for the long question. Basically I am trying to loop through a dictionary I've constructed and check whether a specific element of the hash is in a given list.
Test script:
Hash_Isolates={
    "1" : ['L02476-16_P_R1', 'AE006468', '873'],
    "2" : ['AE006468', 'AE006468', '40'],
    "3" : ['AE006468', 'L02476-16_P_R1', '756'],
    "4" : ['L00409-17_R1', 'L02476-16_P_R1', '987'],
    "5" : ['L00817-17_R1', 'AE006468', '65']
}
new_isolateList=['AE006468', 'L00817-17_R1']
my_Isolates=[]
for i in Hash_Isolates:
    if Hash_Isolates[i][0] in new_isolateList and Hash_Isolates[i][1] in new_isolateList:
        my_Isolates.append(Hash_Isolates[i])
print(len(my_Isolates))

Strangely, when I run this test script it works, but when I run the proper script it doesn't. For the test script you get 2 printed out.
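For reference, the same check can be sketched more compactly by iterating over the dictionary's values and testing membership against a set. This is only an illustrative rewrite of the test script above; the lowercase names `hash_isolates`, `wanted` and `my_isolates` are stand-ins, not names from the real script.

```python
# Same data as the test script above.
hash_isolates = {
    "1": ['L02476-16_P_R1', 'AE006468', '873'],
    "2": ['AE006468', 'AE006468', '40'],
    "3": ['AE006468', 'L02476-16_P_R1', '756'],
    "4": ['L00409-17_R1', 'L02476-16_P_R1', '987'],
    "5": ['L00817-17_R1', 'AE006468', '65'],
}

# A set gives constant-time membership tests, which helps with 100k+ rows.
wanted = set(['AE006468', 'L00817-17_R1'])

# Keep a record only when BOTH sample names are in the wanted set.
my_isolates = [
    record for record in hash_isolates.values()
    if record[0] in wanted and record[1] in wanted
]

print(len(my_isolates))  # 2 (entries "2" and "5")
```

The result is the same as the loop above; the only differences are that the keys are never looked up a second time and that the membership test does not scan a list.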
#!/usr/bin/env python2.7
import getpass
import sys
import re
isolateFile=sys.argv[1]
snapper_data=sys.argv[2]
## Get the user ID ##
def get_User():
    currentUser = getpass.getuser()
    return currentUser
isolatePath='/home/'+get_User()+'/path/to/file/'+isolateFile
dataPath='/home/'+get_User()+'/path/to/file/'+snapper_data
# Retrieve isolates from file
isolateList=[]
with open(isolatePath, 'r') as file:
    isolateList=file.readlines()
new_isolateList=[]
for i in isolateList:
    try:
        x=re.search('(\w.....-?.?.?\d?)', str(i)).group(1)
    except:
        pass
    new_isolateList.append(x)
all_results=[]
with open(dataPath, 'r') as file:
    all_results=file.readlines()
# w is the position in the list of the samples being compared from the whole file
# x is first sample in comparison
# y is the second sample in comparison
# z is the SNP distance between the first and second samples
Hash_Isolates={}
for i in all_results:
    w=re.search('(.?.?.?.?.?.?.?),.+,.+,\d+\n', str(i)).group(1)
    x=re.search('.?.?.?.?.?.?.?,(.+),.+,\d+\n', str(i)).group(1)
    y=re.search('.?.?.?.?.?.?.?,.+,(.+),\d+\n', str(i)).group(1)
    z=re.search('.?.?.?.?.?.?.?,.+,.+,(\d+)\n', str(i)).group(1)
    Hash_Isolates[w]=[x, y, z]
my_Isolates=[]
for i in Hash_Isolates:
    if Hash_Isolates[i][0] in new_isolateList and Hash_Isolates[i][1] in new_isolateList:
        my_Isolates.append(Hash_Isolates[i])
print(len(my_Isolates))

So I expect this to work in the same way, but it prints 0. This snapper_data file has 100k+ lines. The data looks like this:
The isolateFile is a text file:
L01121-17_R1
AE006468
L00817-17_R1
L00665-17_R1
The snapper_data file is a CSV file:
1,L02476-16_P_R1,AE006468,873
2,L02476-16_P_R1,L02888-16_P_R1,2
3,L02476-16_P_R1,L00541-14_P_R1,914
4,L02476-16_P_R1,L02471-16_P_R1,842
5,AE006468,L02888-16_P_R1,832
I'm really desperate to get command of using dictionaries but this is bugging me.
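One possible explanation, assuming the real files look like the samples above: the isolate pattern `(\w.....-?.?.?\d?)` stops before the `_R1` suffix, so the names stored in new_isolateList are shorter than the names in the CSV and almost never match. The sketch below checks this on the posted sample lines and compares it with simply stripping the newline and reading the CSV with the `csv` module. The helper `count_pairs` and the lowercase variable names are hypothetical, and row 6 is not part of the posted CSV sample; it is borrowed from the test dictionary so that one pair can match.

```python
import csv
import re

# Sample lines copied from the post. Row 6 is NOT in the posted CSV sample;
# it is borrowed from the test dictionary so that one pair can match.
isolate_lines = ["L01121-17_R1\n", "AE006468\n", "L00817-17_R1\n", "L00665-17_R1\n"]
snapper_lines = [
    "1,L02476-16_P_R1,AE006468,873\n",
    "2,L02476-16_P_R1,L02888-16_P_R1,2\n",
    "3,L02476-16_P_R1,L00541-14_P_R1,914\n",
    "4,L02476-16_P_R1,L02471-16_P_R1,842\n",
    "5,AE006468,L02888-16_P_R1,832\n",
    "6,L00817-17_R1,AE006468,65\n",
]

# Step 1: what the original pattern captures from each isolate line.
pattern = re.compile(r"(\w.....-?.?.?\d?)")
captured = [pattern.search(line).group(1) for line in isolate_lines]
print(captured)  # ['L01121-17', 'AE006468', 'L00817-17', 'L00665-17']

# Step 2: build the dictionary with csv.reader instead of four regexes.
hash_isolates = {}
for row in csv.reader(snapper_lines):
    if len(row) != 4:
        continue  # skip blank or malformed lines
    index, first, second, distance = row
    hash_isolates[index] = [first, second, distance]


def count_pairs(records, wanted):
    """Count records whose first and second sample are both in wanted."""
    wanted = set(wanted)
    return sum(1 for rec in records.values()
               if rec[0] in wanted and rec[1] in wanted)


# Step 3: compare the regex-captured names with the stripped, exact names.
exact_ids = [line.strip() for line in isolate_lines if line.strip()]
print(count_pairs(hash_isolates, captured))   # 0
print(count_pairs(hash_isolates, exact_ids))  # 1
```

If the real isolate file holds one name per line, `line.strip()` may be all that is needed there. Printing a few entries of new_isolateList next to a few values of Hash_Isolates in the real script would confirm whether this truncation is what is happening.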
