Jan-29-2020, 01:41 PM
Hello Everybody
I wrote this code, which takes 2 files, strips out everything except digits so that only the phone numbers remain, then removes any duplicates and compares the files to find the common content.
The code is this:
import re
import csv

filename_list=[]
file1 = input("Please input file1: ")
filename_list.append(file1)
file2 = input("Please input file2: ")
filename_list.append(file2)
duplicate_list=[]

def clean_file(filename):
    with open (filename,'r') as f:
        list1=f.readlines()
        for ch in list1:
            result=re.sub('[^0-9]','',ch)
            with open(('{}_clean.csv').format(filename),'a+') as cl:
                if len(result)<10:
                    result=result.strip()
                else:
                    cl.write(result + '\n')

def clean_duplicates(filename):
    lines_seen = set()
    with open(('{}_clean_dup.csv').format(filename),'w') as rf:
        duplicate_list.append(rf.name)
        for line in open(('{}_clean.csv').format(filename),'r'):
            if line not in lines_seen:
                rf.write(line)
                lines_seen.add(line)

def find_common():
    comp_file1 = open(duplicate_list[0], "r")
    comp_file2 = open(duplicate_list[1], "r")
    result = open("results.csv", "a")
    list1 = comp_file1.readlines()
    list2 = comp_file2.readlines()
    for i in list1:
        for j in list2:
            if i==j:
                result.write(i)
    comp_file1.close()
    comp_file2.close()
    result.close()

for filename in filename_list:
    clean_file(filename)
    clean_duplicates(filename)
find_common()

So the code works, but I have a slight problem. The produced files get filenames like this: filename.csv_clean.csv and filename.csv_clean_dup.csv. I tried .rsplit and .rpartition to drop the .csv extension from the initial filename, but it doesn't work.
Can anyone help?
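
For reference, here is a minimal sketch of one possible way to build the output names, assuming the standard library's os.path.splitext is acceptable here. The helper name derived_name and its suffix parameter are hypothetical and not part of the code above. The idea is to split the name into root and extension first, then put the suffix between them.

```python
import os

def derived_name(filename, suffix):
    """Hypothetical helper: insert `suffix` between the root and the extension."""
    # os.path.splitext splits at the last dot of the final path component,
    # e.g. 'filename.csv' becomes ('filename', '.csv').
    root, ext = os.path.splitext(filename)
    # Assumption: fall back to '.csv' when the input name has no extension.
    return '{}{}{}'.format(root, suffix, ext or '.csv')

print(derived_name('filename.csv', '_clean'))      # filename_clean.csv
print(derived_name('filename.csv', '_clean_dup'))  # filename_clean_dup.csv
```

If something like this were used, the '{}_clean.csv'.format(filename) and '{}_clean_dup.csv'.format(filename) expressions in clean_file and clean_duplicates would be the places to swap it in.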
