Hi
I have the code below, which loops through every combination of columns in a DataFrame to fit a set of regression models and returns the best one. The code does not raise any errors until I run the last line, best_subset(X, Y), which returns the following error: "IndexingError: Too many indexers".
Does anyone have an idea why it is not working?

import numpy as np
import pandas as pd
import urllib
from itertools import chain, combinations
import statsmodels.api as sm

#Data
Rawdata = pd.read_csv("C:\\Users\\Yell\\Documents\\datafilePython.csv")

#Regression code
def best_subset(X, Y):
    n_features = X.shape[1]
    # every non-empty combination of column positions, as tuples
    subsets = chain.from_iterable(combinations(range(n_features), k+1) for k in range(n_features))
    best_score = -np.inf
    best_subset = None
    for subset in subsets:
        lin_reg = sm.OLS(Y, X.iloc[:, subset]).fit()
        score = lin_reg.rsquared_adj
        if score > best_score:
            best_score, best_subset = score, subset
    return best_subset, best_score

#Define variables
X = Rawdata.iloc[:, 1:10]
Y = Rawdata.iloc[:, 0]

#Run
best_subset(X, Y)

Thanks,
Bitten by Python
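
A possible cause, offered as a guess rather than a confirmed diagnosis: combinations() yields tuples, and .iloc appears to treat a tuple in the column position as extra indexers instead of a list of column positions. The sketch below tries both forms on a small made-up DataFrame (X_demo and its column names are hypothetical stand-ins, not the data from the post).

```python
from itertools import combinations

import pandas as pd

# Hypothetical three-column frame standing in for X
X_demo = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

# First two-column combination of positions; combinations() yields tuples
subset = next(combinations(range(X_demo.shape[1]), 2))

# Try the tuple directly in the column slot and record any exception raised
tuple_error = None
try:
    X_demo.iloc[:, subset]
except Exception as exc:
    tuple_error = type(exc).__name__

# Converting the tuple to a list selects the columns by position
selected = X_demo.iloc[:, list(subset)]

print("tuple indexing error:", tuple_error)
print("list indexing columns:", list(selected.columns))
```

If that is what is happening here, changing X.iloc[:, subset] to X.iloc[:, list(subset)] inside best_subset should avoid the error.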
