Aug-15-2020, 05:47 PM
Hi there,
I have the following Python code, which I run in a Jupyter Notebook:
```python
import pandas as pd
import requests
from bs4 import BeautifulSoup
res1 = requests.get("http://web.archive.org/web/20020602133812/http://www.raf.mod.uk/bbmf/calendar.html")
res2 = requests.get("http://web.archive.org/web/20020803081304/http://www.raf.mod.uk/bbmf/calendar.html")
soup1 = BeautifulSoup(res1.content,'lxml')
soup2 = BeautifulSoup(res2.content,'lxml')
table1 = soup1.find_all('table', align="CENTER")[0]
table2 = soup2.find_all('table', align="CENTER")[0]
df1 = pd.read_html(str(table1))
df2 = pd.read_html(str(table2))
df1 = df1[0]
df2 = df2[0]
df = pd.concat([df2, df1], axis=0)
##################
##################
##################
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
#make df[0] to list
list=[]
for i in df[0]:
    list.append(i)
#reverse the list to make split to sublist easier
list.reverse()
#split list to sublist using condition len(val)> 2
size = len(list)
idx_list = [idx + 1 for idx, val in
            enumerate(list) if len(val) > 2]
res = [list[i: j] for i, j in
       zip([0] + idx_list, idx_list +
           ([size] if idx_list[-1] != size else []))]
#make monthname to numbers and print
for i in res:
    for j in range(len(i)):
        if i[j].upper()=='JUNE':
            i[j]='6'
        elif i[j].upper() =='MAY':
            i[j]='5'
        elif i[j].upper() == 'APRIL':
            i[j]='4'
        elif i[j].upper() =='JANUARY':
            i[j]='1'
        elif i[j].upper() == 'FEBRUARY':
            i[j]='2'
        elif i[j].upper() =='MARCH':
            i[j]='3'
        elif i[j].upper() == 'JULY':
            i[j]='7'
        elif i[j].upper() =='AUGUST':
            i[j]='8'
        elif i[j].upper() == 'SEPTEMBER':
            i[j]='9'
        elif i[j].upper() =='OCTOBER':
            i[j]='10'
        elif i[j].upper() == 'NOVEMBER':
            i[j]='11'
        elif i[j].upper() =='DECEMBER':
            i[j]='12'
#append string and append to new list
finallist=[]
for i in res:
    for j in range(len(i)):
        if j < len(i) - 1:
            #print(f'2004-{i[-1]}-{i[j]}')
            finallist.append(f'2002-{i[-1]}-{i[j]}')
#print(finallist)
finallist.reverse()
#print("\n=== ORIGINAL DF ===\n")
#print(df)
#convert dataframe to list
listtemp1=df.values.tolist()
#replace found below values with 0000_removable
removelist=['LOCATION','LANCASTER','SPITFIRE','HURRICANE','DAKOTA','DATE','JUNE','JANUARY','FEBRUARY','MARCH','MAY','JULY','AUGUST','SEPTEMBER','OCTOBER','NOVEMBER','DECEMBER','APRIL']
for i in listtemp1:
    for j in range(len(i)):
        for place in removelist:
            if str(i[j]).upper()==place:
                i[j]='0000_removable'
            else:
                pass
#remove sublists with the replaced values we redirected
dellist=['0000_removable', '0000_removable', '0000_removable', '0000_removable', '0000_removable', '0000_removable']
res = [i for i in listtemp1 if i != dellist]
#assign back to dataframe DF3
df3=pd.DataFrame()
df3=pd.DataFrame(res, columns=['Date','LOCATION','LANCASTER','SPITFIRE','HURRICANE','DAKOTA'])
#print("\n=== AFTER REMOVE month and column names from DF, assigned to new as DF3 ===\n")
#print(df3)
#now assign that sorted date list to dataframe DF3
idx = 0
df3.insert(loc=idx, column='DATE', value=finallist)
pd.options.display.max_rows = 500
#print("\n=== FINAL DF3 after joining the edited date format column list ===\n")
#print(df3)
#validation logic if needed compare processed date from new joined "edited_Date_format" column with already existing "Date" column
#df3['ED1']= pd.to_datetime(df3['EDITED_DATE_FORMAT'],format='%Y-%m-%d').dt.day
#df3['validation of date'] = df3.apply(lambda x: str(x['ED1']) == x['Date'], axis=1)
#convert df3['EDITED_DATE_FORMAT'] column from object to datetime64 format
#df3['EDITED_DATE_FORMAT']= pd.to_datetime(df3['EDITED_DATE_FORMAT'],format='%Y-%m-%d')
##################
##################
##################
#df3 = df3.rename(columns=df.iloc[0])
#df3 = df.iloc[2:]
#df3.head(15)
pd.options.display.max_rows = 1000
pd.options.display.max_columns = 1000
df3['LANCASTER'] = df3['LANCASTER'].replace({'X': 'L'})
df3['HURRICANE'] = df3['HURRICANE'].replace({'X': 'H'})
df3['SPITFIRE'] = df3['SPITFIRE'].replace({'X': 'S'})
df3['SPITFIRE'] = df3['SPITFIRE'].replace({'X x 2': 'SS'})
df3['DAKOTA'] = df3['DAKOTA'].replace({'X': 'D'})
#display = df3[(df3['LOCATION'].str.contains('[a-zA-Z]')) & (df3['DAKOTA'].str.contains('X')) & (df3['SPITFIRE'].str.contains('X', na=True)) & (df3['LANCASTER'] != 'X')]
#display = df3[(df3['LOCATION'].str.contains('[a-zA-Z]')) & (df3['DAKOTA'].str.contains('D')) & (df3['SPITFIRE'].str.contains('S', na=True)) & (df3['LANCASTER'] != 'L')]
display = df3[(df3['LOCATION'].str.contains('[a-zA-Z]')) & (df3['DATE'].str.contains('Jun')) & (df3['DAKOTA'].str.contains('D')) & (df3['SPITFIRE'].str.contains('S', na=True)) & (df3['LANCASTER'] != 'L')]
#Months = May Jun Jul Aug Sep
#Months = -05- -06- -07- -08- -09-
#print(display)
display['DATE']= pd.to_datetime(display['DATE'],format='%Y-%m-%d')
display['DATE']= pd.to_datetime(display['DATE']).dt.strftime('%m-%d-%Y')
display=display.sort_values(by=['DATE'])
##added two lines above to convert date format
display['DATE']= pd.to_datetime(display['DATE']).dt.strftime('%d-%b-%Y')
#display.drop('LANCASTER', axis=1, inplace=True)
#display.drop('Date', axis=1, inplace=True)
display=display.dropna(subset=['SPITFIRE', 'HURRICANE'], how='all')
display=display[['LOCATION','DATE','DAKOTA','HURRICANE','SPITFIRE']]
display=display.fillna('--')
display.to_csv(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\BBMF-2002-Code (Dakota With Fighters).csv')
display
#display[display['DATE'].str.contains('Jun')]
#print(display)
```

However, when I run the code I don't get the desired output: only the column headers are shown.

When I run the code using the line just above the `display =` line I am using, i.e.:
```python
display = df3[(df3['LOCATION'].str.contains('[a-zA-Z]')) & (df3['DAKOTA'].str.contains('D')) & (df3['SPITFIRE'].str.contains('S', na=True)) & (df3['LANCASTER'] != 'L')]
```

and then run the following code in the next cell:

```python
display[display['DATE'].str.contains('Jun')]
```

I get the correct output.

What do I need to change in the following line of code:
```python
display = df3[(df3['LOCATION'].str.contains('[a-zA-Z]')) & (df3['DATE'].str.contains('Jun')) & (df3['DAKOTA'].str.contains('D')) & (df3['SPITFIRE'].str.contains('S', na=True)) & (df3['LANCASTER'] != 'L')]
```

so that I get the correct output by running that line instead of the first one? In particular, what needs to change in this part of it:
```python
(df3['DATE'].str.contains('Jun'))
```

Any help and information anyone could give me would be much appreciated.

Best Regards
Eddie Winch
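A likely explanation, offered as a sketch rather than a definitive answer: at the point where the combined filter runs, `df3['DATE']` still holds the strings built by `f'2002-{i[-1]}-{i[j]}'`, where the month name has already been turned into a number (e.g. `'2002-6-8'`). The text `'Jun'` only appears after the later `strftime('%d-%b-%Y')` line, which would explain why filtering `display` in the next cell works while `df3['DATE'].str.contains('Jun')` matches no rows. One option is to test the month number instead, by parsing the column with `pd.to_datetime` and comparing `.dt.month` to 6. The small DataFrame below is hypothetical sample data shaped like `df3` (not the real BBMF schedule), and `june_rows` is an illustrative name.

```python
import pandas as pd

# Hypothetical sample data shaped like df3 at the point where the filter runs:
# DATE still holds strings such as '2002-6-8' built from the month/day sublists.
df3 = pd.DataFrame({
    "DATE": ["2002-5-26", "2002-6-8", "2002-6-15", "2002-6-22", "2002-7-4"],
    "LOCATION": ["Site A", "Site B", "Site C", "Site D", "Site E"],
    "DAKOTA": ["D", "D", "D", "D", "D"],
    "SPITFIRE": ["S", None, "S", "SS", "S"],
    "LANCASTER": [None, None, "L", None, None],
})

# 'Jun' never appears in these strings, so this check finds nothing.
print(df3["DATE"].str.contains("Jun").any())  # False

# Parse the text dates once and compare the month number instead.
dates = pd.to_datetime(df3["DATE"], format="%Y-%m-%d")

mask = (
    df3["LOCATION"].str.contains("[a-zA-Z]", na=False)
    & (dates.dt.month == 6)                       # June only
    & df3["DAKOTA"].str.contains("D", na=False)
    & df3["SPITFIRE"].str.contains("S", na=True)  # missing SPITFIRE counts as a match
    & (df3["LANCASTER"] != "L")
)
june_rows = df3[mask]
print(june_rows["LOCATION"].tolist())  # ['Site B', 'Site D']
```

Alternatively, the original `'Jun'` test can be kept but applied only after the line that reformats `DATE` with `'%d-%b-%Y'`; note that `%b` month abbreviations depend on the current locale.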

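The if/elif chain in the code above that turns month names into numbers can also be written as a dictionary lookup. This is a minimal sketch assuming, as in that code, that each sublist holds strings and that month names are in English; `MONTHS`, `MONTH_NUMBERS` and `months_to_numbers` are hypothetical names, and unlike the original loop this version returns new lists instead of modifying `res` in place.

```python
# English month names in calendar order; spelled out so the result does not
# depend on the current locale.
MONTHS = [
    "JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE",
    "JULY", "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER",
]
MONTH_NUMBERS = {name: str(number) for number, name in enumerate(MONTHS, start=1)}


def months_to_numbers(sublists):
    """Return copies of the sublists with month names replaced by '1'..'12'."""
    return [
        [MONTH_NUMBERS.get(str(item).upper(), item) for item in sub]
        for sub in sublists
    ]


# Hypothetical sample in the same shape as `res` (days followed by a month name).
sample = [["29", "30", "June"], ["25", "26", "MAY"]]
print(months_to_numbers(sample))  # [['29', '30', '6'], ['25', '26', '5']]
```

Items that are not month names (the day numbers) are passed through unchanged by `dict.get`.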