Jun-09-2018, 09:36 AM
Hi everyone.
i am super super new to coding in python (total of 6 hours training), and i am running into a problem with my code. More precisely, i have no clue how to proceed.
Here is the situation:
I have a single folder with about 100+ excel files. Each one contains a list of measurements. All the excel sheets are in identical format (in terms of columns), but vary in number of measurements (indexes). I have created a very rough script to perform the processing I want. But right now i have to manually input the excel sheet name and change the output file name. Doing this for 100+ is rather stupid. So I want to find a way to batch process all excel files in one folder, and save the output file into a different folder.
simplified: how can I make my code "universal" and process every file in the folder, and save the output into a new folder that is created within parent folder.
The link below provides a couple sample data sets
Datasets
Here is my poorly made script that is full of useless codes and not efficient, but it works
p.s. please go easy on me.. i know my coding sucks.. lol
i am super super new to coding in python (total of 6 hours training), and i am running into a problem with my code. More precisely, i have no clue how to proceed.
Here is the situation:
I have a single folder with about 100+ excel files. Each one contains a list of measurements. All the excel sheets are in identical format (in terms of columns), but vary in number of measurements (indexes). I have created a very rough script to perform the processing I want. But right now i have to manually input the excel sheet name and change the output file name. Doing this for 100+ is rather stupid. So I want to find a way to batch process all excel files in one folder, and save the output file into a different folder.
simplified: how can I make my code "universal" and process every file in the folder, and save the output into a new folder that is created within parent folder.
The link below provides a couple sample data sets
Datasets
Here is my poorly made script that is full of useless codes and not efficient, but it works
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Fri Jun 8 12:01:07 2018
@author: anthonyeritano
"""
import pandas as pd
import numpy as np
#choose excel file
File = 'test1.xlsx'
xl_file = pd.ExcelFile(File)
#Select spreadsheet data
Data= xl_file.parse('Sheet1') #"parse" reads a sheet from the Excel file
#Select values from Columns
Skel = Data['Skeleton ID']
Brnch = Data['Branch length']
Euc = Data['Euclidean distance']
tort= Data['Branch length']/Data['Euclidean distance']
# make DataFrame (new spreadsheet) for the values
DF_3 = pd.DataFrame({'Branch Number': Skel,
'Brnchln': Brnch,
'Eucl': Euc,
'Tort': tort})
DF_4 = DF_3[['Branch Number','Brnchln','Eucl','Tort']]
DF_4.head()
totalNumber = len(DF_4.dropna())
dic = DF_4.to_dict()
dic2 = {}
for key in dic.keys():
for key2 in dic[key]:
dic2[key + ' ' + str(key2)] = dic[key][key2]
key_names = ['Brnchln','Eucl','Tort']
numbers = range(totalNumber)
titlearray=[File]
ordered_names = []
ordered_values = []
for i in range(totalNumber):
for name in key_names:
temp_name = name + ' ' + str(i)
ordered_names.append(temp_name)
ordered_values.append(dic2[temp_name])
#print(temp_name, dic2[temp_name])
array = np.array((ordered_names, ordered_values))
array.shape
#%%
clmntitle=array[0,:]
mesur=array[1,:]
newclmntitle=np.insert(clmntitle, 0, 'Filename')
newmesur=np.insert(mesur,0,titlearray)
array3=np.array((newclmntitle,newmesur))
array3
array3.shape
#%%
import xlsxwriter
workbook = xlsxwriter.Workbook('test1 processed.xlsx')
worksheet = workbook.add_worksheet()
row=0
col=0
worksheet.write_row(row, col, newclmntitle)
row=1
col=0
worksheet.write_row(row, col, newmesur)
workbook.close()Any help is appreciated!!!p.s. please go easy on me.. i know my coding sucks.. lol
