Hi Team,
I have 20 xlsb files, my task to combine all xlsb files data into single csv files , with Pipe delimiter
later this CSV I need to import it into SQL Table.
Challenges
18 xlsb files contains 30 headers , all header are same sequence
2 xlsb files contains 31 headers , all header are same , but one extra header for last column
I wanted to add one extra header in 18 xlsb header file
to make it 31 same headers. Extra header name I want to give word 'IFRS'
Expected output
In my output csv files ,
I want 31 header , extra header name is IFRS.
my attempted code is as below.
I have 20 xlsb files, my task to combine all xlsb files data into single csv files , with Pipe delimiter
later this CSV I need to import it into SQL Table.
Challenges
18 xlsb files contains 30 headers , all header are same sequence
2 xlsb files contains 31 headers , all header are same , but one extra header for last column
I wanted to add one extra header in 18 xlsb header file
to make it 31 same headers. Extra header name I want to give word 'IFRS'
Expected output
In my output csv files ,
I want 31 header , extra header name is IFRS.
my attempted code is as below.
<python
from pathlib import Path
import pandas as pd
foldername = "your_folder_name"
output_folder = "output_folder" # Define the folder where you want to save the output CSV files
extra_header = "Extra_Header"
columns_to_fill = ['year1', 'year2', 'year3', 'year4', 'year5', 'Total']
files_combine = Path(f"{foldername}").rglob("*.xlsx")
header_written = False
for input_xls in files_combine:
# Attempt to read from the 'Data' sheet; if it doesn't exist, read from the active sheet
try:
df = pd.read_excel(input_xls, sheet_name='Data')
except Exception:
df = pd.read_excel(input_xls, sheet_name=None, engine='openpyxl') # Read all sheets using openpyxl
# Select the first non-empty sheet as the active sheet
for sheet_name, sheet_data in df.items():
if not sheet_data.empty:
df = sheet_data
break
# Check if "IFRS" column already exists, and if not, add it
if 'IFRS' not in df.columns:
df['IFRS'] = extra_header
# Replace blank cells in the specified columns with 0
df[columns_to_fill] = df[columns_to_fill].fillna(0)
# Extract the existing headers from the first row
existing_headers = list(df.columns)
# Add the extra header "IFRS" to the list of existing headers
existing_headers.append(extra_header)
# Update the DataFrame columns with the new list of headers
df.columns = existing_headers
# Insert an additional column for 'FileName' in the DataFrame
df.insert(0, 'FileName', input_xls.name)
# Define the output CSV file name based on the input file name
output_csv_file = f"{output_folder}/{input_xls.stem}_combined.csv"
# Write the headers only if header_written is False
if not header_written:
df.to_csv(output_csv_file, index=False, sep='|', encoding='utf-8', float_format='%.6f', header=True, mode='w')
header_written = True
else:
# Append the workbook data to the CSV file without writing headers
df.to_csv(output_csv_file, index=False, sep='|', encoding='utf-8', float_format='%.6f', header=False, mode='a')
/python>
