from tpot import TPOTRegressor
from sklearn.model_selection import train_test_split
import pandas as pd
# load the data
df = pd.read_csv('MAGIC Gamma Telescope Data.csv')
# clean the data
# clean the data: encode the string labels 'g' (gamma) and 'h' (hadron) as 0 and 1
df['Class'] = df['Class'].map({'g': 0, 'h': 1})
features = df.drop('Class', axis=1).values  # = X
target = df['Class'].values  # = y
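As a quick sanity check, here is the same `.map()` encoding on a tiny hand-made frame (the `fLength` values are hypothetical stand-ins for the telescope columns) rather than the full CSV:

```python
import pandas as pd

# Tiny stand-in for the telescope data (values are made up for illustration)
df = pd.DataFrame({'fLength': [28.7, 31.6, 162.0],
                   'Class': ['g', 'h', 'g']})

# Same mapping as in the main script: 'g' -> 0, 'h' -> 1
# (any label not in the dict would become NaN)
df['Class'] = df['Class'].map({'g': 0, 'h': 1})

features = df.drop('Class', axis=1).values  # X
target = df['Class'].values                 # y

print(target.tolist())  # [0, 1, 0]
```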
# Split the data
X_train, X_test, y_train, y_test = train_test_split(features, target, train_size=0.8, test_size=0.2)
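Note that `train_test_split` shuffles randomly, so each run produces a different split. If you want reproducible results (and, for a labelled dataset like this one, splits that preserve the class ratio), pass `random_state` and `stratify` — a small sketch on synthetic data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 10 synthetic samples, alternating labels 0 and 1
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# random_state fixes the shuffle; stratify=y keeps the 0/1 ratio in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, test_size=0.2, random_state=42, stratify=y)

print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```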
# Let Genetic Programming find best ML model and hyperparameters
tpot = TPOTRegressor(generations=5, verbosity=2)
# The defaults for generations, population_size, and offspring_size are usually fine;
# lower generations if you want a result quickly.
# Note: since 'Class' is a binary label, TPOTClassifier would be the more natural
# choice for this dataset; TPOTRegressor is used here.
tpot.fit(X_train, y_train)
# Score the pipeline on the held-out test set
score = tpot.score(X_test, y_test)
print("Test set score: {}".format(score))
# Note: TPOTRegressor's score depends on its scoring metric, so it is not a
# 0-to-1 accuracy; for a true accuracy score, use TPOTClassifier instead.
# Export the generated code
tpot.export('tpot_test1_pipeline.py')
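`tpot.export()` writes a standalone Python script containing the best pipeline found during the run. The exact model varies from run to run, but the generated file follows a fixed template, roughly like the sketch below (the `RandomForestRegressor` is a hypothetical winner, not what your run will necessarily produce; synthetic data replaces the CSV load so the sketch runs on its own):

```python
# Sketch of a TPOT-exported pipeline script (structure only).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# The real exported file loads 'MAGIC Gamma Telescope Data.csv' here;
# synthetic data is substituted so this sketch is self-contained.
X = np.random.RandomState(0).rand(100, 10)
y = (X[:, 0] > 0.5).astype(float)

training_features, testing_features, training_target, testing_target = \
    train_test_split(X, y, random_state=42)

# In the real file this is the pipeline TPOT's search selected
exported_pipeline = RandomForestRegressor(n_estimators=100, random_state=42)
exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)
print(results.shape)  # (25,)
```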
Precondition: I assume you already have the 'MAGIC Gamma Telescope Data.csv' file from my previous post. Just copy and paste it into Notepad and save it as a .csv file.
Optimization Progress: 13%|█▎ | 78/600 [27:19<3:55:49, 27.11s/pipeline]
Optimization Progress: 34%|███▎ | 202/600 [1:06:22<2:12:22, 19.96s/pipeline]
Generation 1 - Current best internal CV score: 0.09374578856689733
Optimization Progress: 39%|███▉ | 233/600 [1:21:01<39:46, 6.50s/pipeline]