Jul-20-2022, 08:15 PM
(This post was last modified: Jul-20-2022, 08:15 PM by Led_Zeppelin.)
In the following code which should print out feature importances, I instead get a print of a matrix of all zeroes.
How to fix?
Any help appreciated.
Respectfully,
LZ
shortened_sensor.csv (Size: 3.02 KB / Downloads: 299)
import numpy as np
import pandas as pd
%matplotlib inline
from sklearn.preprocessing import StandardScaler
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose
import time
from tqdm import tqdm
from scipy import stats
from sklearn.ensemble import ExtraTreesClassifier
df = pd.read_csv("shortened_sensor.csv")
print('here')
pd.set_option("display.max_rows", None, "display.max_columns", None)
df.head()
#df = df.head(5)
#df.to_csv("shortened_sensor.csv", index = False)
#df = df.head(5)
# Find Duplicate Values
# Results will be the list of duplicate values
# If no duplicate values, nothing will list.
df[df['timestamp'].duplicated(keep=False)]
df.isnull().sum()
df['machine_status'].value_counts()
# Convert timestamp column into data type into datetime
df['timestamp'] = pd.to_datetime(df['timestamp'])
# Create a Series
time_period = pd.Series([])
# Assign values to series
for i in tqdm(range(df.shape[0])):
if (df["timestamp"][i].hour >= 4) and (df["timestamp"][i].hour < 10):
time_period[i]="Morning"
elif (df["timestamp"][i].hour >= 10) and (df["timestamp"][i].hour < 16):
time_period[i]="Noon"
elif (df["timestamp"][i].hour >= 16) and (df["timestamp"][i].hour < 22):
time_period[i]="Evening"
else:
time_period[i]="Night"
# Insert new column time period
df.insert(2, 'time_period', time_period)
# The columns sensor_00, sensor_06, sensor-07, sensor_08, sensor_09, sensor-51
# Missing values are filled with median value of respective columns
df['sensor_00'].fillna(df['sensor_00'].median(), inplace=True)
df['sensor_06'].fillna(df['sensor_06'].median(), inplace=True)
df['sensor_07'].fillna(df['sensor_07'].median(), inplace=True)
df['sensor_08'].fillna(df['sensor_08'].median(), inplace=True)
df['sensor_09'].fillna(df['sensor_09'].median(), inplace=True)
df['sensor_51'].fillna(df['sensor_51'].median(), inplace=True)
df['sensor_01'].fillna(df['sensor_01'].median(), inplace=True)
df['sensor_02'].fillna(df['sensor_02'].median(), inplace=True)
df['sensor_03'].fillna(df['sensor_03'].median(), inplace=True)
df['sensor_04'].fillna(df['sensor_04'].median(), inplace=True)
df['sensor_05'].fillna(df['sensor_05'].median(), inplace=True)
df['sensor_10'].fillna(df['sensor_10'].median(), inplace=True)
df
df1 = df.copy()
df.drop(["Unnamed: 0","timestamp","time_period","machine_status"], axis = 1, inplace=True)
df.head()
scaler=StandardScaler()
df=scaler.fit_transform(df)
df
columns = [f'sensor_{idx:02d}' for idx in range(52)]
df2 = pd.DataFrame(df, columns=columns)
df2["machine_status"] = df1["machine_status"]
df2.head()
# Separating the dependent and independent varibale
y = df2 [ 'machine_status' ]
X = df2.drop(['machine_status', 'sensor_15'], axis = 1 )
X.head()
model = ExtraTreesClassifier()
model.fit(X, y)
print(model.feature_importances_)Error:C:\Users\james\AppData\Local\Temp\ipykernel_11652\2032974972.py:45: FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
time_period = pd.Series([])
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 393.24it/s]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0.]I am also including a reduced size cv file to use as data. Why is this happening?How to fix?
Any help appreciated.
Respectfully,
LZ
