I have sets of data, which look like this:
I do the following:
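import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

df=pd.read_csv('data.csv')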
timestamp_fields = ['Year', 'Month', 'Day', 'Hour', 'Minute','Second']
df['Date']=pd.to_datetime(df[timestamp_fields])
df=df.dropna()

I'd like to plot the data and fit a straight line. Here is my approach:

#fit
x = np.arange(df.iloc[:,-1:].size) #df.iloc[:,-1:] is the column Date added above
fit = np.polyfit(x, df.iloc[:,-2:-1], 1) #df.iloc[:,-2:-1] is the column of data
fit_fn=np.poly1d(fit[0])
#plotting
plt.rcParams["figure.figsize"] = (20,5)
plt.plot(df.iloc[:,-1:],df.iloc[:,-2:-1],'x',color='k')
plt.plot(df.iloc[:,-1:], fit_fn(x), 'k-')
plt.title('Isoprene and MBO')
ax = plt.gca()
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
plt.xticks(rotation=45)
plt.yscale('log')
plt.ylabel('Concentration [ppb]')

Output:

As you can see, there seems to be something wrong. I would expect the fit to go through the data. Any help is greatly appreciated.
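For reference, here is a minimal sketch with synthetic data that might reproduce the symptom without my CSV. The column name `conc` and all values are made up for illustration. It suggests that passing a one-column 2-D slice to `np.polyfit` returns coefficients of shape `(2, 1)`, so `fit[0]` holds only the slope and `np.poly1d(fit[0])` becomes a constant rather than a line. I am not certain this is the whole story for my real data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (values and column names are made up).
dates = pd.date_range("2020-01-01", periods=10, freq="D")
df = pd.DataFrame({"conc": 2.0 * np.arange(10) + 1.0, "Date": dates})

x = np.arange(len(df))

# Same slicing as in the question: a one-column slice is 2-D, shape (10, 1),
# so polyfit returns one coefficient column per data column: shape (2, 1).
fit_2d = np.polyfit(x, df.iloc[:, -2:-1].to_numpy(), 1)

# fit_2d[0] is the slope only, so this poly1d is a constant polynomial.
flat = np.poly1d(fit_2d[0])(x)

# Passing a 1-D array instead yields [slope, intercept] directly.
slope, intercept = np.polyfit(x, df["conc"].to_numpy(), 1)
line = np.poly1d([slope, intercept])(x)

print(fit_2d.shape)
print(flat[:3])
print(line[:3])
```

Separately, with `plt.yscale('log')` a straight-line fit in linear space is drawn as a curve, and `x = np.arange(...)` treats the samples as evenly spaced in time, which may not hold for the real timestamps.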
