In [ ]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.metrics import accuracy_score

We have chosen the Pokémon dataset from https://www.kaggle.com/datasets/abcsds/pokemon . Our objective is to classify Pokémon types based on the stats and find the model and hyperparameters with the best performance. There are multiple ways of going about this. But first, let's explore the dataset.

Exploring the dataset¶

In [ ]:
# import the dataset
df = pd.read_csv('data/Pokemon.csv')

original_df = df.copy()
# Print head
df.head()
Out[ ]:
# Name Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 1 Bulbasaur Grass Poison 318 45 49 49 65 65 45 1 False
1 2 Ivysaur Grass Poison 405 60 62 63 80 80 60 1 False
2 3 Venusaur Grass Poison 525 80 82 83 100 100 80 1 False
3 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 122 120 80 1 False
4 4 Charmander Fire NaN 309 39 52 43 60 50 65 1 False
In [ ]:
# Print info such as data types and number of non-null values
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   #           800 non-null    int64 
 1   Name        800 non-null    object
 2   Type 1      800 non-null    object
 3   Type 2      414 non-null    object
 4   Total       800 non-null    int64 
 5   HP          800 non-null    int64 
 6   Attack      800 non-null    int64 
 7   Defense     800 non-null    int64 
 8   Sp. Atk     800 non-null    int64 
 9   Sp. Def     800 non-null    int64 
 10  Speed       800 non-null    int64 
 11  Generation  800 non-null    int64 
 12  Legendary   800 non-null    bool  
dtypes: bool(1), int64(9), object(3)
memory usage: 75.9+ KB
In [ ]:
# Print summary statistics of numeric types
df.describe()
Out[ ]:
# Total HP Attack Defense Sp. Atk Sp. Def Speed Generation
count 800.000000 800.00000 800.000000 800.000000 800.000000 800.000000 800.000000 800.000000 800.00000
mean 362.813750 435.10250 69.258750 79.001250 73.842500 72.820000 71.902500 68.277500 3.32375
std 208.343798 119.96304 25.534669 32.457366 31.183501 32.722294 27.828916 29.060474 1.66129
min 1.000000 180.00000 1.000000 5.000000 5.000000 10.000000 20.000000 5.000000 1.00000
25% 184.750000 330.00000 50.000000 55.000000 50.000000 49.750000 50.000000 45.000000 2.00000
50% 364.500000 450.00000 65.000000 75.000000 70.000000 65.000000 70.000000 65.000000 3.00000
75% 539.250000 515.00000 80.000000 100.000000 90.000000 95.000000 90.000000 90.000000 5.00000
max 721.000000 780.00000 255.000000 190.000000 230.000000 194.000000 230.000000 180.000000 6.00000
In [ ]:
# Check for missing values
print(df.isnull().sum())
#               0
Name            0
Type 1          0
Type 2        386
Total           0
HP              0
Attack          0
Defense         0
Sp. Atk         0
Sp. Def         0
Speed           0
Generation      0
Legendary       0
dtype: int64

Some pokémon don't have a second type. We'll need to handle these missing values later

In [ ]:
# Plot the correlation matrix
sns.heatmap(df.drop(['#', 'Name'], axis=1).select_dtypes(include=[np.number, bool]).corr(), square=True, cmap='RdYlGn');
No description has been provided for this image

As you can see, the stat total is correlated with the 6 main stats (HP, attack, defense, sp. attack, sp. defense & speed), which makes sense, as it is the summation of these stats.

Also, the generation is not really correlated (much) with anything other than itself. This is not too surprising, as each generation introduces a wide variety of new Pokémon. (The slightly higher correlation with whether the Pokémon is legendary or not, does reflect that later generations featured higher ratios of legendaries among the new Pokémon of that generation than earlier generations.)

The lack of correlation between speed and defense is also something that stands out in this visualisation.

Common preprocessing steps¶

In [ ]:
# Drop the Name and ID columns
df.drop(['#', 'Name'], axis=1, inplace=True)
# Covert the Legendary column from bool to int
df.Legendary = df.Legendary.astype(int)
# Print the head of the dataframe
df.head()
Out[ ]:
Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 Grass Poison 318 45 49 49 65 65 45 1 0
1 Grass Poison 405 60 62 63 80 80 60 1 0
2 Grass Poison 525 80 82 83 100 100 80 1 0
3 Grass Poison 625 80 100 123 122 120 80 1 0
4 Fire NaN 309 39 52 43 60 50 65 1 0

We will handle the missing values by filling it up by a 'None' type

In [ ]:
# Handling missing values in 'Type 2' column
df['Type 2'] = df['Type 2'].fillna('None')


print(df.isnull().sum())
df.head()
Type 1        0
Type 2        0
Total         0
HP            0
Attack        0
Defense       0
Sp. Atk       0
Sp. Def       0
Speed         0
Generation    0
Legendary     0
dtype: int64
Out[ ]:
Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 Grass Poison 318 45 49 49 65 65 45 1 0
1 Grass Poison 405 60 62 63 80 80 60 1 0
2 Grass Poison 525 80 82 83 100 100 80 1 0
3 Grass Poison 625 80 100 123 122 120 80 1 0
4 Fire None 309 39 52 43 60 50 65 1 0

We copy the dataframe to use it later

In [ ]:
preprocessed_df = df.copy()

As stated before, there are many ways to classify multipe classes.

  1. Multiclass Classification:
  • 1.1. Accounting for Order of Types
    • For this task, we need to make every possible combination a class, where the order of types matters.
    • We can consider using classifiers like:
      • tree.DecisionTreeClassifier
      • neighbors.KNeighborsClassifier
      • linear_model.LogisticRegression
      • ensemble.RandomForestClassifier
      • svm.SVC
  • 1.2. Ignoring Order of Types:
    • For this task, where the order of types doesn't matter, the same classifiers as above can be used. We just need to ensure that our encoding of the labels reflects this.
  1. Multilabel Classification:
  • 2.1. Accounting for Order of Types

    • For multilabel classification where each Pokémon can have multiple types and we account for order of types, we can use:
      • tree.DecisionTreeClassifier
      • neighbors.KNeighborsClassifier
      • ensemble.RandomForestClassifier
  • 2.2 Ignoring Order of Types

    • For this task, where the order of types doesn't matter, the same classifiers as above can be used. We just need to ensure that our encoding of the labels reflects this.
  1. Multiclass Multioutput Classification:
    • For multiclass multioutput classification where each Pokémon can have two types predicted independently, we can use classifiers like:
      • tree.DecisionTreeClassifier
      • neighbors.KNeighborsClassifier
      • ensemble.RandomForestClassifier

Multi-class classification¶

Accounting for Order of Types¶

Preprocessing¶

We make a new column 'Types' where the combination of type 1 and type 2 are stored in tuples, which is ordered by default

In [ ]:
df = preprocessed_df.copy()

df['Types'] = df[['Type 1', 'Type 2']].apply(lambda x: tuple(filter(lambda y: pd.notna(y), x)), axis=1)
df.Types = df.Types.astype(str)

print(len(df['Types'].unique()))

# print two pokemon types where we know type 1 and two are the same but reversed to check if order is kept into account
print("Sableye: ",df['Types'][326])
print("Spiritomb: ",df['Types'][490])

# drop the Type 1 and Type 2 columns
df.drop(['Type 1', 'Type 2'], axis=1, inplace=True)

# print head
df.head()
154
Sableye:  ('Dark', 'Ghost')
Spiritomb:  ('Ghost', 'Dark')
Out[ ]:
Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Types
0 318 45 49 49 65 65 45 1 0 ('Grass', 'Poison')
1 405 60 62 63 80 80 60 1 0 ('Grass', 'Poison')
2 525 80 82 83 100 100 80 1 0 ('Grass', 'Poison')
3 625 80 100 123 122 120 80 1 0 ('Grass', 'Poison')
4 309 39 52 43 60 50 65 1 0 ('Fire', 'None')
In [ ]:
# show the distribution of pokemon types
sns.countplot(df, y='Types');
No description has been provided for this image

As you can see, the dataset is heavily imbalanced. Some combinations only occur once. We decided to extract them and put them in both train and test sets to stratify the data better.

In [ ]:
# Some type combinations only occur once so we double them to stratify the data better
singleton_classes = df['Types'].value_counts()[df['Types'].value_counts() == 1].index.tolist()
singleton_data = df[df['Types'].isin(singleton_classes)]
other_data = df[~df['Types'].isin(singleton_classes)]

print("Number of singleton classes",len(singleton_classes))
print("number of unique type combinations",len(df['Types'].unique()))
print(len(df['Types']))

df.head()
Number of singleton classes 39
number of unique type combinations 154
800
Out[ ]:
Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Types
0 318 45 49 49 65 65 45 1 0 ('Grass', 'Poison')
1 405 60 62 63 80 80 60 1 0 ('Grass', 'Poison')
2 525 80 82 83 100 100 80 1 0 ('Grass', 'Poison')
3 625 80 100 123 122 120 80 1 0 ('Grass', 'Poison')
4 309 39 52 43 60 50 65 1 0 ('Fire', 'None')

Decision tree¶

The single classes are added to both the training and test sets after being stratified on the rest of the data. This makes the actual test size slightly larger.

In [ ]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
# Split the data into training and testing sets
X = df.drop(columns=['Types'])
y = df['Types']

X_train, X_test, y_train, y_test = train_test_split(other_data.drop(columns=['Types']), other_data['Types'], test_size=0.2, stratify=other_data['Types'], random_state=42)
X_train = pd.concat([X_train, singleton_data.drop(columns=['Types'])])
y_train = pd.concat([y_train, singleton_data['Types']])
X_test = pd.concat([X_test, singleton_data.drop(columns=['Types'])])
y_test = pd.concat([y_test, singleton_data['Types']])

print("actual test size:",len(X_test)/(len(X_train)+len(X_test)))
actual test size: 0.22884386174016685
In [ ]:
from sklearn.metrics import accuracy_score, f1_score, fbeta_score

# Initialize and train the decision tree classifier
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict labels for the test set
y_pred = model.predict(X_test)

# Calculate accuracy
print("Score: ", model.score(X_test, y_test))
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Score:  0.2604166666666667
Accuracy: 0.2604166666666667

Hyperparameter Tuning¶

For hyperparameter tuning we use GridSearchCV and a pipeline with a StandardScaler to make the data more uniform.

In [ ]:
# Import GridSearchCV
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
# Setup the parameters and distributions to sample from: param_dist
pipeline = make_pipeline(StandardScaler(), DecisionTreeClassifier())

param_dist = {
    "decisiontreeclassifier__max_depth": np.arange(5, 15),  
    "decisiontreeclassifier__min_samples_leaf": np.arange(1, 5)
}

# Instantiate the GridSearchCV object
grid_search_cv = GridSearchCV(pipeline, param_grid=param_dist, cv=5)

# Fit grid_search_cv using the data X and labels y.
grid_search_cv.fit(X_train, y_train) 
y_pred = grid_search_cv.predict(X_test)

# Print the best score
print("Tuned Model Parameters: {}".format(grid_search_cv.best_params_))
print("Accuracy: {}".format(grid_search_cv.best_estimator_.score(X_test, y_test)))
#print(classification_report(y_test, y_pred))
y_pred = grid_search_cv.best_estimator_.predict(X_test)
c:\Users\Chau\miniconda3\Lib\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
  warnings.warn(
Tuned Model Parameters: {'decisiontreeclassifier__max_depth': 8, 'decisiontreeclassifier__min_samples_leaf': 4}
Accuracy: 0.10416666666666667

Hyperparameter tuning can sometimes lead to worse results than using default settings. This can occur when the tuning process, typically done via cross-validation on the training data, inadvertently overfits the model. Overfitting happens when the model learns not only the underlying patterns but also the noise in the training data, which reduces its ability to generalize to new, unseen data. This issue is exacerbated when the dataset has classes with very few members, leading to unreliable splits during cross-validation.

Random forest¶

In [ ]:
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(random_state=42)

model.fit(X_train, y_train)

# Predict labels for the test set
y_pred = model.predict(X_test)
score = model.score(X_test, y_test)
# Calculate accuracy
print("Score :", score )
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Score : 0.3072916666666667
Accuracy: 0.3072916666666667

Hyperparameter Tuning¶

In [ ]:
# Import GridSearchCV
pipeline = make_pipeline(StandardScaler(), RandomForestClassifier())

# Setup the parameters and distributions to sample from: param_dist
param_dist = {
    "randomforestclassifier__max_depth": np.arange(5, 20),  
    "randomforestclassifier__min_samples_leaf": np.arange(1, 10),
    "randomforestclassifier__n_estimators": np.arange(50, 150, 5)
}

# Instantiate the RandomizedSearchCV object: random_grid_search_cv
random_search_cv = RandomizedSearchCV(pipeline, param_distributions=param_dist, n_iter=100, cv=3, random_state=42)
#grid_search_cv = GridSearchCV(pipeline, param_grid=param_dist, cv=3)

# Fit random_search_cv using the data X and labels y
random_search_cv.fit(X_train, y_train)
#grid_search_cv.fit(X_train, y_train)

# Print the best score
print("Best score is {}".format(random_search_cv.best_estimator_.score(X_test, y_test)))
print("Best parameters are {}".format(random_search_cv.best_params_))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=3.
  warnings.warn(
Best score is 0.3020833333333333
Best parameters are {'randomforestclassifier__n_estimators': 85, 'randomforestclassifier__min_samples_leaf': 3, 'randomforestclassifier__max_depth': 15}

Support Vector Machine¶

Radial base function¶

In [ ]:
from sklearn.svm import SVC

model = SVC(kernel='rbf', random_state=42)

model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Accuracy: 0.078125
Hyperparameter Tuning¶
In [ ]:
from sklearn.svm import SVC

# Define the pipeline with StandardScaler and SVC
pipeline = make_pipeline(StandardScaler(), SVC(kernel='rbf'))

# Define the parameter grid
param_grid = {
    'svc__C': [0.1,0.5, 1,5, 10],        # Regularization parameter
    'svc__coef0': [0.0, 1.0, 2.0], # Independent term in the polynomial kernel function
}

# Initialize GridSearchCV
grid_search = GridSearchCV(pipeline, param_grid, cv=5, n_jobs=-1)

# Perform grid search
grid_search.fit(X_train, y_train)

# Get the best parameters and score
best_params = grid_search.best_params_
best_score=grid_search.best_estimator_.score(X_test, y_test)

print("Best Parameters:", best_params)
print("Best score is {}".format(grid_search.best_estimator_.score(X_test, y_test)))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
  warnings.warn(
Best Parameters: {'svc__C': 5, 'svc__coef0': 0.0}
Best score is 0.21354166666666666

Linear¶

In [ ]:
svm_classifier = SVC(kernel='linear', random_state=42)

# Train the SVM classifier
svm_classifier.fit(X_train, y_train)
y_pred = svm_classifier.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Accuracy: 0.3072916666666667
Hyperparameter tuning¶
In [ ]:
from sklearn.svm import SVC

# Define the pipeline with StandardScaler and SVC
pipeline = make_pipeline(StandardScaler(), SVC(kernel='linear', random_state=42))

# Define the parameter grid
param_grid = {
    'svc__C': [0.1,0.5, 1 , 5, 10],        # Regularization parameter    # Degree of the polynomial kernel
    'svc__coef0': [0.0, 1.0, 2.0], # Independent term in the polynomial kernel function
}

# Initialize GridSearchCV
grid_search = GridSearchCV(pipeline, param_grid, cv=5, n_jobs=-1)

# Perform grid search
grid_search.fit(X_train, y_train)

# Get the best parameters and score
best_params = grid_search.best_params_
best_score=grid_search.best_estimator_.score(X_test, y_test)

print("Best Parameters:", best_params)
print("Best Score:", best_score)
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
  warnings.warn(
Best Parameters: {'svc__C': 1, 'svc__coef0': 0.0}
Best Score: 0.22916666666666666

Polynomial¶

In [ ]:
model = SVC(kernel='poly', random_state=42)

# Train the SVM classifier
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Accuracy: 0.09375
Hyperparameter Tuning¶
In [ ]:
from sklearn.svm import SVC

# Define the pipeline with StandardScaler and SVC
pipeline = make_pipeline(StandardScaler(), SVC(kernel='poly'))

# Define the parameter grid
param_grid = {
    'svc__C': [0.1,0.5, 1, 5, 10],        # Regularization parameter
    'svc__degree': [2, 3, 4, 5, 6],      # Degree of the polynomial kernel
    'svc__coef0': [0.0, 1.0, 2.0], # Independent term in the polynomial kernel function
}

# Initialize GridSearchCV
grid_search = GridSearchCV(pipeline, param_grid, cv=5, n_jobs=-1)

# Perform grid search
grid_search.fit(X_train, y_train)

# Get the best parameters and score
best_params = grid_search.best_params_
best_score=grid_search.best_estimator_.score(X_test, y_test)

print("Best Parameters:", best_params)
print("Best Score:", best_score)
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
  warnings.warn(
Best Parameters: {'svc__C': 1, 'svc__coef0': 2.0, 'svc__degree': 2}
Best Score: 0.20833333333333334

Sigmoid¶

In [ ]:
model = SVC(kernel='sigmoid', random_state=42)

# Train the SVM classifier
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Calculate accuracy
print("Score: ", model.score(X_test, y_test))

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Score:  0.057291666666666664
Accuracy: 0.057291666666666664
Hyperparameter Tuning¶
In [ ]:
from sklearn.svm import SVC

# Define the pipeline with StandardScaler and SVC
pipeline = make_pipeline(StandardScaler(), SVC(kernel='sigmoid'))

# Define the parameter grid
param_grid = {
    'svc__C': [0.1,0.5, 1,5, 10],        # Regularization parameter
    'svc__coef0': [0.0, 1.0, 2.0], # Independent term in the polynomial kernel function
}

# Initialize GridSearchCV
grid_search = GridSearchCV(pipeline, param_grid, cv=5, n_jobs=-1)

# Perform grid search
grid_search.fit(X_train, y_train)

# Get the best parameters and score
best_params = grid_search.best_params_
best_score=grid_search.best_estimator_.score(X_test, y_test)

print("Best Parameters:", best_params)
print("Best Score:", best_score)
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
  warnings.warn(
Best Parameters: {'svc__C': 1, 'svc__coef0': 0.0}
Best Score: 0.06770833333333333

k Nearest neighbors¶

In [ ]:
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

# Calculate accuracy
print("Score: ",model.score(X_test, y_test))

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Score:  0.09375
Accuracy: 0.09375

Hyperparameter Tuning¶

In [ ]:
from sklearn.neighbors import KNeighborsClassifier
param_grid = {
    'kneighborsclassifier__n_neighbors': [3, 5, 7, 9]  # List of k values to try
}

pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())

grid_search = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=5)

grid_search.fit(X_train, y_train)

best_params = grid_search.best_params_
best_score=grid_search.best_estimator_.score(X_test, y_test)

print("Best Parameters:", best_params)
print("Best Score:", best_score)
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
  warnings.warn(
Best Parameters: {'kneighborsclassifier__n_neighbors': 7}
Best Score: 0.09375

Logistic regression¶

In [ ]:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

# Calculate accuracy
print("Score: ", model.score(X_test, y_test))

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Score:  0.13020833333333334
Accuracy: 0.13020833333333334
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_logistic.py:469: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
In [ ]:
model = LogisticRegression( random_state=42, multi_class='auto', solver='liblinear', max_iter=1000) 
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

# Calculate accuracy
print("Score: ", model.score(X_test, y_test))

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Score:  0.15625
Accuracy: 0.15625

Hyperparameter tuning¶

In [ ]:
param_grid = {
    'logisticregression__C': np.logspace(-5, 5, 5),
    'logisticregression__penalty': ['l1', 'l2']
}

pipeline = make_pipeline(StandardScaler(), LogisticRegression(solver='liblinear'))

grid_search = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=5)

grid_search.fit(X_train, y_train)

best_params = grid_search.best_params_
best_score=grid_search.best_estimator_.score(X_test, y_test)

print("Best Parameters:", best_params)
print("Best Score:", best_score)
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\svm\_base.py:1237: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\svm\_base.py:1237: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\svm\_base.py:1237: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\svm\_base.py:1237: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\svm\_base.py:1237: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\svm\_base.py:1237: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\svm\_base.py:1237: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\svm\_base.py:1237: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\svm\_base.py:1237: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\svm\_base.py:1237: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  warnings.warn(
Best Parameters: {'logisticregression__C': 316.22776601683796, 'logisticregression__penalty': 'l2'}
Best Score: 0.21354166666666666
In [ ]:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression( penalty='elasticnet',l1_ratio=0.5, random_state=42, multi_class='auto', solver='saga', max_iter=1000) 
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

# Calculate accuracy
print("Score: ", model.score(X_test, y_test))

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Score:  0.15625
Accuracy: 0.15625
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(

Hyperparameter tuning¶

In [ ]:
param_grid = {
    'logisticregression__C': np.logspace(-4, 4, 4),
    'logisticregression__l1_ratio': np.linspace(0, 1, 10)
}

pipeline = make_pipeline(StandardScaler(), LogisticRegression(penalty='elasticnet', solver='saga'))

grid_search = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=5)

grid_search.fit(X_train, y_train)

best_params = grid_search.best_params_
best_score=grid_search.best_estimator_.score(X_test, y_test)

print("Best Parameters:", best_params)
print("Best Score:", best_score)
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
Best Parameters: {'logisticregression__C': 10000.0, 'logisticregression__l1_ratio': 0.0}
Best Score: 0.22395833333333334
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(

Ignoring Order of Types¶

Preprocessing¶

We do the same as before, but we sort types in the tuples alphabetically to ignore order.

In [ ]:
df = preprocessed_df.copy()

df['Types'] = df[['Type 1', 'Type 2']].apply(lambda x: sorted(tuple(filter(lambda y: pd.notna(y), x))), axis=1)

df.Types = df.Types.astype(str)

# print two pokemon types where we know type 1 and two are the same but reversed to check if order is kept into account
print("Sableye: ",df['Types'][326])
print("Spiritomb: ",df['Types'][490])

# drop the Type 1 and Type 2 columns
df.drop(['Type 1', 'Type 2'], axis=1, inplace=True)

# print head
df.head()
Sableye:  ['Dark', 'Ghost']
Spiritomb:  ['Dark', 'Ghost']
Out[ ]:
Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Types
0 318 45 49 49 65 65 45 1 0 ['Grass', 'Poison']
1 405 60 62 63 80 80 60 1 0 ['Grass', 'Poison']
2 525 80 82 83 100 100 80 1 0 ['Grass', 'Poison']
3 625 80 100 123 122 120 80 1 0 ['Grass', 'Poison']
4 309 39 52 43 60 50 65 1 0 ['Fire', 'None']
In [ ]:
sns.countplot(df, y='Types');
No description has been provided for this image
In [ ]:
# Check if there are any pokemon with only one type
singleton_classes = df['Types'].value_counts()[df['Types'].value_counts() == 1].index.tolist()
singleton_data = df[df['Types'].isin(singleton_classes)]
other_data = df[~df['Types'].isin(singleton_classes)]


print("Number of singleton classes",len(singleton_classes))
print("number of unique type combinations",len(df['Types'].unique()))
#print(len(df['Types']))
df.head()
Number of singleton classes 24
number of unique type combinations 133
Out[ ]:
Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Types
0 318 45 49 49 65 65 45 1 0 ['Grass', 'Poison']
1 405 60 62 63 80 80 60 1 0 ['Grass', 'Poison']
2 525 80 82 83 100 100 80 1 0 ['Grass', 'Poison']
3 625 80 100 123 122 120 80 1 0 ['Grass', 'Poison']
4 309 39 52 43 60 50 65 1 0 ['Fire', 'None']

Decision tree¶

In [ ]:
# Split the data into training and testing sets
X = df.drop(columns=['Types'])
y = df['Types']

X_train, X_test, y_train, y_test = train_test_split(other_data.drop(columns=['Types']), other_data['Types'], test_size=0.2, stratify=other_data['Types'], random_state=42)
X_train = pd.concat([X_train, singleton_data.drop(columns=['Types'])])
y_train = pd.concat([y_train, singleton_data['Types']])
X_test = pd.concat([X_test, singleton_data.drop(columns=['Types'])])
y_test = pd.concat([y_test, singleton_data['Types']])


# Initialize and train the decision tree classifier
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict labels for the test set
y_pred = model.predict(X_test)

# Calculate accuracy
print("Score: ", model.score(X_test, y_test))

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Score:  0.2111111111111111
Accuracy: 0.2111111111111111

Hyperparameter tuning¶

In [ ]:
# Setup the parameters and distributions to sample from: param_dist
pipeline = make_pipeline(StandardScaler(), DecisionTreeClassifier())

param_dist = {
    "decisiontreeclassifier__max_depth": np.arange(5, 15),  
    "decisiontreeclassifier__min_samples_leaf": np.arange(1, 10)
}

# Instantiate the GridSearchCV object
grid_search_cv = GridSearchCV(pipeline, param_grid=param_dist, cv=5)

# Fit grid_search_cv using the data X and labels y.
grid_search_cv.fit(X_train, y_train) 
y_pred = grid_search_cv.predict(X_test)

# Print the best score
print("Tuned Model Parameters: {}".format(grid_search_cv.best_params_))
print("Accuracy: {}".format(grid_search_cv.best_estimator_.score(X_test, y_test)))
c:\Users\Chau\miniconda3\Lib\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
  warnings.warn(
Tuned Model Parameters: {'decisiontreeclassifier__max_depth': 5, 'decisiontreeclassifier__min_samples_leaf': 8}
Accuracy: 0.07777777777777778

Random forest¶

In [ ]:
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(random_state=42)

model.fit(X_train, y_train)

# Predict labels for the test set
y_pred = model.predict(X_test)

# Calculate accuracy
print("Score: ", model.score(X_test, y_test))

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Score:  0.2388888888888889
Accuracy: 0.2388888888888889

Hyperparameter Tuning¶

There's an issue with cross-validation because some type combinations occur only once in the training data.

In [ ]:
pipeline = make_pipeline(StandardScaler(), RandomForestClassifier())

# Setup the parameters and distributions to sample from: param_dist
param_dist = {
    "randomforestclassifier__max_depth": np.arange(5, 15),  
    "randomforestclassifier__min_samples_leaf": np.arange(1, 10),
    "randomforestclassifier__n_estimators": np.arange(60, 140)
}

random_search_cv = RandomizedSearchCV(pipeline, param_distributions=param_dist, n_iter=100, cv=3, random_state=42)
#grid_search_cv = GridSearchCV(pipeline, param_grid=param_dist, cv=3)

# Fit random_search_cv using the data X and labels y
random_search_cv.fit(X_train, y_train)
#grid_search_cv.fit(X_train, y_train)

# Print the best score
print("Best score is {}".format(random_search_cv.best_estimator_.score(X_test, y_test)))
print("Best parameters are {}".format(random_search_cv.best_params_))
c:\Users\Chau\miniconda3\Lib\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=3.
  warnings.warn(
Best score is 0.25555555555555554
Best parameters are {'randomforestclassifier__n_estimators': 77, 'randomforestclassifier__min_samples_leaf': 1, 'randomforestclassifier__max_depth': 8}

Support vector machine¶

Radial base function¶

In [ ]:
from sklearn.svm import SVC

model = SVC(kernel='rbf', random_state=42)

model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Accuracy: 0.07777777777777778
Hyperparameter Tuning¶
In [ ]:
from sklearn.svm import SVC

# Define the pipeline with StandardScaler and SVC
pipeline = make_pipeline(StandardScaler(), SVC(kernel='rbf'))

# Define the parameter grid
param_grid = {
    'svc__C': [0.1,0.5, 1,5, 10],        # Regularization parameter
    'svc__coef0': [0.0, 1.0, 2.0], # Independent term in the polynomial kernel function
}

# Initialize GridSearchCV
grid_search = GridSearchCV(pipeline, param_grid, cv=5, n_jobs=-1)

# Perform grid search
grid_search.fit(X_train, y_train)

# Get the best parameters and score
best_params = grid_search.best_params_
best_score = grid_search.best_estimator_.score(X_test, y_test)

print("Best Parameters:", best_params)
print("Best Score:", best_score)
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
  warnings.warn(
Best Parameters: {'svc__C': 5, 'svc__coef0': 0.0}
Best Score: 0.18888888888888888

Linear¶

In [ ]:
svm_classifier = SVC(kernel='linear', random_state=42)

# Train the SVM classifier
svm_classifier.fit(X_train, y_train)
y_pred = svm_classifier.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Accuracy: 0.23333333333333334
Hyperparameter tuning¶
In [ ]:
from sklearn.svm import SVC

# Define the pipeline with StandardScaler and SVC
pipeline = make_pipeline(StandardScaler(), SVC(kernel='linear'))

# Define the parameter grid
param_grid = {
    'svc__C': [0.1,0.5, 1 , 5, 10],        # Regularization parameter    # Degree of the polynomial kernel
    'svc__coef0': [0.0, 1.0, 2.0], # Independent term in the polynomial kernel function
}

# Initialize GridSearchCV
grid_search = GridSearchCV(pipeline, param_grid, cv=5, n_jobs=-1)

# Perform grid search
grid_search.fit(X_train, y_train)

# Get the best parameters and score
best_params = grid_search.best_params_
best_score = grid_search.best_estimator_.score(X_test, y_test)

print("Best Parameters:", best_params)
print("Best Score:", best_score)
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
  warnings.warn(
Best Parameters: {'svc__C': 1, 'svc__coef0': 0.0}
Best Score: 0.21666666666666667

Polynomial¶

In [ ]:
model = SVC(kernel='poly', random_state=42)

# Train the SVM classifier
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Accuracy: 0.06666666666666667
Hyperparameter Tuning¶
In [ ]:
from sklearn.svm import SVC

# Define the pipeline with StandardScaler and SVC
pipeline = make_pipeline(StandardScaler(), SVC(kernel='poly'))

# Define the parameter grid
param_grid = {
    'svc__C': [0.1,0.5, 1,5, 10],        # Regularization parameter
    'svc__degree': [2, 3, 4, 5, 6],      # Degree of the polynomial kernel
    'svc__coef0': [0.0, 1.0, 2.0], # Independent term in the polynomial kernel function
}

# Initialize GridSearchCV
grid_search = GridSearchCV(pipeline, param_grid, cv=5, n_jobs=-1)

# Perform grid search
grid_search.fit(X_train, y_train)

# Get the best parameters and score
best_params = grid_search.best_params_
best_score = grid_search.best_estimator_.score(X_test, y_test)

print("Best Parameters:", best_params)
print("Best Score:", best_score)
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
  warnings.warn(
Best Parameters: {'svc__C': 1, 'svc__coef0': 1.0, 'svc__degree': 3}
Best Score: 0.2222222222222222

Sigmoid¶

In [ ]:
model = SVC(kernel='sigmoid', random_state=42)

# Train the SVM classifier
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Calculate accuracy
print("Score: ", model.score(X_test, y_test))

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Score:  0.03888888888888889
Accuracy: 0.03888888888888889
Hyperparameter Tuning¶
In [ ]:
from sklearn.svm import SVC

# Define the pipeline with StandardScaler and SVC
pipeline = make_pipeline(StandardScaler(), SVC(kernel='sigmoid'))

# Define the parameter grid
param_grid = {
    'svc__C': [0.1,0.5, 1,5, 10, 50],        # Regularization parameter
    'svc__coef0': [0.0, 1.0, 2.0], # Independent term in the polynomial kernel function
}

# Initialize GridSearchCV
grid_search = GridSearchCV(pipeline, param_grid, cv=5, n_jobs=-1)

# Perform grid search
grid_search.fit(X_train, y_train)

# Get the best parameters and score
best_params = grid_search.best_params_
best_score = grid_search.best_estimator_.score(X_test, y_test)

print("Best Parameters:", best_params)
print("Best Score:", best_score)
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
  warnings.warn(
Best Parameters: {'svc__C': 5, 'svc__coef0': 0.0}
Best Score: 0.11666666666666667

k Nearest Neighbors¶

In [ ]:
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

# Calculate accuracy
print("Score: ", model.score(X_test, y_test))

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Score:  0.11666666666666667
Accuracy: 0.11666666666666667

Hyperparameter Tuning¶

In [ ]:
param_grid = {
    'kneighborsclassifier__n_neighbors': [3, 5, 7, 9]  # List of k values to try
}

pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())

grid_search = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=5)

grid_search.fit(X_train, y_train)

best_params = grid_search.best_params_
best_score=grid_search.best_estimator_.score(X_test, y_test)

print("Best Parameters:", best_params)
print("Best Score:", best_score)
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
  warnings.warn(
Best Parameters: {'kneighborsclassifier__n_neighbors': 9}
Best Score: 0.10555555555555556

Logistic regression¶

In [ ]:
model = LogisticRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

# Calculate accuracy
print("Score: ", model.score(X_test, y_test))

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Score:  0.1388888888888889
Accuracy: 0.1388888888888889
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_logistic.py:469: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
In [ ]:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression( random_state=42, multi_class='auto', solver='liblinear', max_iter=1000) 
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

# Calculate accuracy
print("Score: ", model.score(X_test, y_test))

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Score:  0.16111111111111112
Accuracy: 0.16111111111111112

Hyperparameter tuning¶

In [ ]:
param_grid = {
    'logisticregression__C': np.logspace(-5, 5, 5),
    'logisticregression__penalty': ['l1', 'l2']
}

pipeline = make_pipeline(StandardScaler(), LogisticRegression(solver='liblinear'))

grid_search = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=5)

grid_search.fit(X_train, y_train)

best_params = grid_search.best_params_
best_score=grid_search.best_estimator_.score(X_test, y_test)

print("Best Parameters:", best_params)
print("Best Score:", best_score)
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\svm\_base.py:1237: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\svm\_base.py:1237: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\svm\_base.py:1237: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\svm\_base.py:1237: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\svm\_base.py:1237: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\svm\_base.py:1237: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\svm\_base.py:1237: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\svm\_base.py:1237: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\svm\_base.py:1237: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\svm\_base.py:1237: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  warnings.warn(
Best Parameters: {'logisticregression__C': 316.22776601683796, 'logisticregression__penalty': 'l2'}
Best Score: 0.2111111111111111
In [ ]:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression( penalty='elasticnet',l1_ratio=0.5, random_state=42, multi_class='auto', solver='saga', max_iter=1000) 
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

# Calculate accuracy
print("Score: ", model.score(X_test, y_test))

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Score:  0.17222222222222222
Accuracy: 0.17222222222222222
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(

Hyperparameter tuning¶

In [ ]:
param_grid = {
    'logisticregression__C': np.logspace(-5, 5, 5),
    'logisticregression__l1_ratio': np.linspace(0, 1, 10)
}

pipeline = make_pipeline(StandardScaler(), LogisticRegression(penalty='elasticnet', solver='saga'))

grid_search = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=5)

grid_search.fit(X_train, y_train)

best_params = grid_search.best_params_
best_score=grid_search.best_estimator_.score(X_test, y_test)

print("Best Parameters:", best_params)
print("Best Score:", best_score)
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
Best Parameters: {'logisticregression__C': 1.0, 'logisticregression__l1_ratio': 0.0}
Best Score: 0.15
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(

Multi-label Classification¶

Accounting for Order of Types¶

Preprocessing¶

In [ ]:
df = preprocessed_df.copy()

df['Types'] = df[['Type 1', 'Type 2']].apply(lambda x: tuple(filter(lambda y: pd.notna(y), x)), axis=1)
df.Types = df.Types.astype(str)

# print two pokemon types where we know type 1 and two are the same but reversed to check if order is kept into account
print("Sableye: ",df['Types'][326])
print("Spiritomb: ",df['Types'][490])

# drop the Type 1 and Type 2 columns
df.drop(['Type 1', 'Type 2'], axis=1, inplace=True)

# print head
df.head()
Sableye:  ('Dark', 'Ghost')
Spiritomb:  ('Ghost', 'Dark')
Out[ ]:
Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Types
0 318 45 49 49 65 65 45 1 0 ('Grass', 'Poison')
1 405 60 62 63 80 80 60 1 0 ('Grass', 'Poison')
2 525 80 82 83 100 100 80 1 0 ('Grass', 'Poison')
3 625 80 100 123 122 120 80 1 0 ('Grass', 'Poison')
4 309 39 52 43 60 50 65 1 0 ('Fire', 'None')
In [ ]:
# Find classes with only one type
singleton_classes = df['Types'].value_counts()[df['Types'].value_counts() == 1].index.tolist()

To account for order of types, we create binary labels for each type combination.

In [ ]:
# Create binary labels for each Pokémon type combination
type_combinations = df['Types'].unique()
for type in type_combinations:
    df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)

singleton_data = df[df['Types'].isin(singleton_classes)]
other_data = df[~df['Types'].isin(singleton_classes)]

print("Number of singleton classes",len(singleton_classes))
print("Number of unique type combinations",len(df['Types'].unique()))
print(len(df['Types']))
df.head()
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)
Number of singleton classes 39
Number of unique type combinations 154
800
Out[ ]:
Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Types ... ('Rock', 'Dragon') ('Rock', 'Ice') ('Fighting', 'Flying') ('Electric', 'Fairy') ('Rock', 'Fairy') ('Ghost', 'Grass') ('Flying', 'Dragon') ('Psychic', 'Ghost') ('Psychic', 'Dark') ('Fire', 'Water')
0 318 45 49 49 65 65 45 1 0 ('Grass', 'Poison') ... 0 0 0 0 0 0 0 0 0 0
1 405 60 62 63 80 80 60 1 0 ('Grass', 'Poison') ... 0 0 0 0 0 0 0 0 0 0
2 525 80 82 83 100 100 80 1 0 ('Grass', 'Poison') ... 0 0 0 0 0 0 0 0 0 0
3 625 80 100 123 122 120 80 1 0 ('Grass', 'Poison') ... 0 0 0 0 0 0 0 0 0 0
4 309 39 52 43 60 50 65 1 0 ('Fire', 'None') ... 0 0 0 0 0 0 0 0 0 0

5 rows × 164 columns

In [ ]:
# Drop the 'Types' column
df = df.drop(columns=['Types'])
other_data.drop(columns=['Types'], inplace=True)
singleton_data.drop(columns=['Types'], inplace=True)

Decision Tree¶

In [ ]:
from sklearn.metrics import classification_report
# Split the data into training and testing sets
y = df[type_combinations]

X_train, X_test, y_train, y_test = train_test_split(other_data.drop(columns=type_combinations), other_data[type_combinations], test_size=0.2, stratify=other_data[type_combinations], random_state=42)
X_train = pd.concat([X_train, singleton_data.drop(columns=type_combinations)])
y_train = pd.concat([y_train, singleton_data[type_combinations]])
X_test = pd.concat([X_test, singleton_data.drop(columns=type_combinations)])
y_test = pd.concat([y_test, singleton_data[type_combinations]])


# Initialize and train the decision tree classifier
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict labels for the test set
y_pred = model.predict(X_test)
score = model.score(X_test, y_test)

# Calculate accuracy
print("Score: ", score)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: ", accuracy)
print(classification_report(y_test, y_pred))
Score:  0.25
Accuracy:  0.25
              precision    recall  f1-score   support

           0       0.20      0.33      0.25         3
           1       0.00      0.00      0.00         6
           2       0.00      0.00      0.00         1
           3       1.00      1.00      1.00         1
           4       0.07      0.08      0.08        12
           5       0.00      0.00      0.00         4
           6       0.00      0.00      0.00         3
           7       0.33      0.33      0.33         3
           8       0.17      0.20      0.18         5
           9       0.12      0.17      0.14        12
          10       0.00      0.00      0.00         3
          11       0.29      0.33      0.31         6
          12       0.00      0.00      0.00         3
          13       0.00      0.00      0.00         0
          14       0.00      0.00      0.00         3
          15       0.00      0.00      0.00         1
          16       0.00      0.00      0.00         1
          17       0.00      0.00      0.00         1
          18       0.00      0.00      0.00         4
          19       1.00      1.00      1.00         1
          20       0.33      0.25      0.29         8
          21       0.00      0.00      0.00         1
          22       0.00      0.00      0.00         1
          23       0.00      0.00      0.00         1
          24       0.00      0.00      0.00         1
          25       0.00      0.00      0.00         1
          26       0.00      0.00      0.00         1
          27       0.00      0.00      0.00         0
          28       0.00      0.00      0.00         1
          29       0.00      0.00      0.00         7
          30       0.00      0.00      0.00         1
          31       0.00      0.00      0.00         0
          32       0.00      0.00      0.00         2
          33       0.00      0.00      0.00         1
          34       0.00      0.00      0.00         1
          35       0.00      0.00      0.00         1
          36       0.00      0.00      0.00         0
          37       0.00      0.00      0.00         1
          38       0.00      0.00      0.00         2
          39       0.00      0.00      0.00         1
          40       0.00      0.00      0.00         1
          41       0.00      0.00      0.00         0
          42       0.00      0.00      0.00         0
          43       0.00      0.00      0.00         1
          44       1.00      1.00      1.00         1
          45       0.00      0.00      0.00         0
          46       0.00      0.00      0.00         2
          47       0.00      0.00      0.00         1
          48       0.00      0.00      0.00         2
          49       0.00      0.00      0.00         2
          50       0.00      0.00      0.00         1
          51       0.00      0.00      0.00         2
          52       0.00      0.00      0.00         0
          53       0.00      0.00      0.00         2
          54       0.00      0.00      0.00         1
          55       0.00      0.00      0.00         0
          56       0.00      0.00      0.00         1
          57       0.00      0.00      0.00         0
          58       0.00      0.00      0.00         0
          59       1.00      1.00      1.00         1
          60       0.00      0.00      0.00         1
          61       0.00      0.00      0.00         1
          62       0.50      1.00      0.67         1
          63       0.00      0.00      0.00         1
          64       0.00      0.00      0.00         0
          65       0.00      0.00      0.00         0
          66       0.50      1.00      0.67         1
          67       1.00      1.00      1.00         1
          68       0.50      0.50      0.50         2
          69       0.00      0.00      0.00         1
          70       0.00      0.00      0.00         1
          71       0.50      1.00      0.67         1
          72       0.00      0.00      0.00         1
          73       0.00      0.00      0.00         0
          74       1.00      1.00      1.00         1
          75       0.00      0.00      0.00         0
          76       0.00      0.00      0.00         1
          77       0.00      0.00      0.00         1
          78       0.00      0.00      0.00         1
          79       0.00      0.00      0.00         1
          80       0.00      0.00      0.00         1
          81       0.00      0.00      0.00         0
          82       1.00      1.00      1.00         1
          83       0.00      0.00      0.00         0
          84       0.00      0.00      0.00         0
          85       0.00      0.00      0.00         0
          86       0.00      0.00      0.00         0
          87       0.00      0.00      0.00         3
          88       0.00      0.00      0.00         1
          89       0.00      0.00      0.00         2
          90       0.00      0.00      0.00         1
          91       1.00      1.00      1.00         1
          92       1.00      1.00      1.00         1
          93       0.50      1.00      0.67         1
          94       1.00      1.00      1.00         1
          95       0.00      0.00      0.00         1
          96       0.00      0.00      0.00         0
          97       0.00      0.00      0.00         0
          98       0.00      0.00      0.00         1
          99       1.00      1.00      1.00         1
         100       0.00      0.00      0.00         1
         101       0.00      0.00      0.00         0
         102       0.50      1.00      0.67         1
         103       0.00      0.00      0.00         0
         104       0.00      0.00      0.00         1
         105       1.00      1.00      1.00         1
         106       0.50      1.00      0.67         1
         107       0.00      0.00      0.00         1
         108       0.00      0.00      0.00         1
         109       0.00      0.00      0.00         1
         110       0.00      0.00      0.00         1
         111       1.00      1.00      1.00         1
         112       1.00      1.00      1.00         1
         113       0.00      0.00      0.00         0
         114       1.00      1.00      1.00         1
         115       0.50      1.00      0.67         1
         116       0.00      0.00      0.00         0
         117       0.00      0.00      0.00         1
         118       0.00      0.00      0.00         0
         119       0.00      0.00      0.00         0
         120       0.00      0.00      0.00         0
         121       0.00      0.00      0.00         0
         122       0.00      0.00      0.00         0
         123       0.00      0.00      0.00         0
         124       0.00      0.00      0.00         1
         125       1.00      1.00      1.00         1
         126       0.00      0.00      0.00         0
         127       0.00      0.00      0.00         0
         128       0.00      0.00      0.00         1
         129       0.00      0.00      0.00         0
         130       1.00      1.00      1.00         1
         131       1.00      1.00      1.00         1
         132       0.00      0.00      0.00         0
         133       0.33      1.00      0.50         1
         134       1.00      1.00      1.00         1
         135       0.00      0.00      0.00         1
         136       1.00      1.00      1.00         1
         137       0.00      0.00      0.00         0
         138       1.00      1.00      1.00         1
         139       0.00      0.00      0.00         1
         140       0.00      0.00      0.00         0
         141       1.00      1.00      1.00         1
         142       0.50      1.00      0.67         1
         143       0.00      0.00      0.00         0
         144       0.00      0.00      0.00         0
         145       0.00      0.00      0.00         1
         146       1.00      1.00      1.00         1
         147       1.00      1.00      1.00         1
         148       0.00      0.00      0.00         1
         149       1.00      0.50      0.67         2
         150       0.00      0.00      0.00         0
         151       1.00      1.00      1.00         1
         152       1.00      1.00      1.00         1
         153       1.00      1.00      1.00         1

   micro avg       0.26      0.25      0.25       192
   macro avg       0.22      0.25      0.23       192
weighted avg       0.23      0.25      0.23       192
 samples avg       0.25      0.25      0.25       192

C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Recall is ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no true nor predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in samples with no predicted labels. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))

Hyperparameter tuning¶

In [ ]:
pipeline = make_pipeline(StandardScaler(), DecisionTreeClassifier())

# Setup the parameters
param_dist = {
    "decisiontreeclassifier__max_depth": [5, 6, 7, 8, 9, 10, 15, 30, None], 
    "decisiontreeclassifier__min_samples_leaf": np.arange(1, 10)
}

# Instantiate the GridSearchCV object
grid_search_cv = GridSearchCV(pipeline, param_grid=param_dist, cv=5)

# Fit grid_search_cv using the data X and labels y.
grid_search_cv.fit(X_train, y_train) 
y_pred = grid_search_cv.predict(X_test)

# Print the best score
print("Tuned Model Parameters: {}".format(grid_search_cv.best_params_))
print("Accuracy: {}".format(grid_search_cv.best_estimator_.score(X_test, y_test)))
print(classification_report(y_test, y_pred))
Tuned Model Parameters: {'decisiontreeclassifier__max_depth': 30, 'decisiontreeclassifier__min_samples_leaf': 1}
Accuracy: 0.2552083333333333
              precision    recall  f1-score   support

           0       0.20      0.33      0.25         3
           1       0.00      0.00      0.00         6
           2       0.00      0.00      0.00         1
           3       1.00      1.00      1.00         1
           4       0.07      0.08      0.08        12
           5       0.00      0.00      0.00         4
           6       0.00      0.00      0.00         3
           7       0.50      0.33      0.40         3
           8       0.25      0.40      0.31         5
           9       0.06      0.08      0.07        12
          10       0.00      0.00      0.00         3
          11       0.17      0.17      0.17         6
          12       0.33      0.33      0.33         3
          13       0.00      0.00      0.00         0
          14       0.00      0.00      0.00         3
          15       0.00      0.00      0.00         1
          16       0.00      0.00      0.00         1
          17       0.00      0.00      0.00         1
          18       0.00      0.00      0.00         4
          19       1.00      1.00      1.00         1
          20       0.33      0.25      0.29         8
          21       0.00      0.00      0.00         1
          22       0.00      0.00      0.00         1
          23       0.00      0.00      0.00         1
          24       0.00      0.00      0.00         1
          25       0.00      0.00      0.00         1
          26       0.00      0.00      0.00         1
          27       0.00      0.00      0.00         0
          28       0.00      0.00      0.00         1
          29       0.00      0.00      0.00         7
          30       0.00      0.00      0.00         1
          31       0.00      0.00      0.00         0
          32       0.00      0.00      0.00         2
          33       0.00      0.00      0.00         1
          34       0.00      0.00      0.00         1
          35       0.00      0.00      0.00         1
          36       0.00      0.00      0.00         0
          37       0.00      0.00      0.00         1
          38       0.00      0.00      0.00         2
          39       0.00      0.00      0.00         1
          40       0.00      0.00      0.00         1
          41       0.00      0.00      0.00         0
          42       0.00      0.00      0.00         0
          43       0.00      0.00      0.00         1
          44       1.00      1.00      1.00         1
          45       0.00      0.00      0.00         0
          46       0.00      0.00      0.00         2
          47       0.00      0.00      0.00         1
          48       0.00      0.00      0.00         2
          49       0.00      0.00      0.00         2
          50       0.00      0.00      0.00         1
          51       0.00      0.00      0.00         2
          52       0.00      0.00      0.00         0
          53       0.00      0.00      0.00         2
          54       0.00      0.00      0.00         1
          55       0.00      0.00      0.00         0
          56       0.00      0.00      0.00         1
          57       0.00      0.00      0.00         0
          58       0.00      0.00      0.00         0
          59       0.50      1.00      0.67         1
          60       0.00      0.00      0.00         1
          61       0.00      0.00      0.00         1
          62       0.50      1.00      0.67         1
          63       0.00      0.00      0.00         1
          64       0.00      0.00      0.00         0
          65       0.00      0.00      0.00         0
          66       0.50      1.00      0.67         1
          67       1.00      1.00      1.00         1
          68       0.00      0.00      0.00         2
          69       0.00      0.00      0.00         1
          70       0.00      0.00      0.00         1
          71       0.50      1.00      0.67         1
          72       0.00      0.00      0.00         1
          73       0.00      0.00      0.00         0
          74       1.00      1.00      1.00         1
          75       0.00      0.00      0.00         0
          76       0.00      0.00      0.00         1
          77       0.00      0.00      0.00         1
          78       0.00      0.00      0.00         1
          79       0.00      0.00      0.00         1
          80       0.00      0.00      0.00         1
          81       0.00      0.00      0.00         0
          82       1.00      1.00      1.00         1
          83       0.00      0.00      0.00         0
          84       0.00      0.00      0.00         0
          85       0.00      0.00      0.00         0
          86       0.00      0.00      0.00         0
          87       0.00      0.00      0.00         3
          88       0.00      0.00      0.00         1
          89       0.00      0.00      0.00         2
          90       0.00      0.00      0.00         1
          91       0.50      1.00      0.67         1
          92       1.00      1.00      1.00         1
          93       0.50      1.00      0.67         1
          94       1.00      1.00      1.00         1
          95       0.00      0.00      0.00         1
          96       0.00      0.00      0.00         0
          97       0.00      0.00      0.00         0
          98       0.00      0.00      0.00         1
          99       1.00      1.00      1.00         1
         100       0.00      0.00      0.00         1
         101       0.00      0.00      0.00         0
         102       0.50      1.00      0.67         1
         103       0.00      0.00      0.00         0
         104       0.00      0.00      0.00         1
         105       1.00      1.00      1.00         1
         106       1.00      1.00      1.00         1
         107       0.00      0.00      0.00         1
         108       0.00      0.00      0.00         1
         109       0.00      0.00      0.00         1
         110       0.00      0.00      0.00         1
         111       1.00      1.00      1.00         1
         112       1.00      1.00      1.00         1
         113       0.00      0.00      0.00         0
         114       1.00      1.00      1.00         1
         115       0.50      1.00      0.67         1
         116       0.00      0.00      0.00         0
         117       0.00      0.00      0.00         1
         118       0.00      0.00      0.00         0
         119       0.00      0.00      0.00         0
         120       0.00      0.00      0.00         0
         121       0.00      0.00      0.00         0
         122       0.00      0.00      0.00         0
         123       0.00      0.00      0.00         0
         124       1.00      1.00      1.00         1
         125       1.00      1.00      1.00         1
         126       0.00      0.00      0.00         0
         127       0.00      0.00      0.00         0
         128       0.00      0.00      0.00         1
         129       0.00      0.00      0.00         0
         130       1.00      1.00      1.00         1
         131       1.00      1.00      1.00         1
         132       0.00      0.00      0.00         0
         133       1.00      1.00      1.00         1
         134       1.00      1.00      1.00         1
         135       0.00      0.00      0.00         1
         136       1.00      1.00      1.00         1
         137       0.00      0.00      0.00         0
         138       1.00      1.00      1.00         1
         139       0.00      0.00      0.00         1
         140       0.00      0.00      0.00         0
         141       1.00      1.00      1.00         1
         142       0.50      1.00      0.67         1
         143       0.00      0.00      0.00         0
         144       0.00      0.00      0.00         0
         145       0.00      0.00      0.00         1
         146       1.00      1.00      1.00         1
         147       1.00      1.00      1.00         1
         148       0.00      0.00      0.00         1
         149       1.00      1.00      1.00         2
         150       0.00      0.00      0.00         0
         151       1.00      1.00      1.00         1
         152       1.00      1.00      1.00         1
         153       0.50      1.00      0.67         1

   micro avg       0.26      0.26      0.26       192
   macro avg       0.23      0.26      0.24       192
weighted avg       0.23      0.26      0.24       192
 samples avg       0.26      0.26      0.26       192

C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Recall is ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no true nor predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in samples with no predicted labels. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))

Random Forest¶

In [ ]:
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(random_state=42)

model.fit(X_train, y_train)

# Predict labels for the test set
y_pred = model.predict(X_test)
score = model.score(X_test, y_test)
# Calculate accuracy
print("Score: ", score)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print(classification_report(y_test, y_pred))
Score:  0.203125
Accuracy: 0.203125
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         3
           1       0.00      0.00      0.00         6
           2       0.00      0.00      0.00         1
           3       1.00      1.00      1.00         1
           4       0.00      0.00      0.00        12
           5       0.00      0.00      0.00         4
           6       0.00      0.00      0.00         3
           7       0.00      0.00      0.00         3
           8       0.00      0.00      0.00         5
           9       0.00      0.00      0.00        12
          10       0.00      0.00      0.00         3
          11       0.00      0.00      0.00         6
          12       0.00      0.00      0.00         3
          13       0.00      0.00      0.00         0
          14       0.00      0.00      0.00         3
          15       0.00      0.00      0.00         1
          16       0.00      0.00      0.00         1
          17       0.00      0.00      0.00         1
          18       0.00      0.00      0.00         4
          19       1.00      1.00      1.00         1
          20       1.00      0.25      0.40         8
          21       0.00      0.00      0.00         1
          22       0.00      0.00      0.00         1
          23       0.00      0.00      0.00         1
          24       0.00      0.00      0.00         1
          25       0.00      0.00      0.00         1
          26       0.00      0.00      0.00         1
          27       0.00      0.00      0.00         0
          28       0.00      0.00      0.00         1
          29       0.00      0.00      0.00         7
          30       0.00      0.00      0.00         1
          31       0.00      0.00      0.00         0
          32       0.00      0.00      0.00         2
          33       0.00      0.00      0.00         1
          34       0.00      0.00      0.00         1
          35       0.00      0.00      0.00         1
          36       0.00      0.00      0.00         0
          37       0.00      0.00      0.00         1
          38       0.00      0.00      0.00         2
          39       0.00      0.00      0.00         1
          40       0.00      0.00      0.00         1
          41       0.00      0.00      0.00         0
          42       0.00      0.00      0.00         0
          43       0.00      0.00      0.00         1
          44       1.00      1.00      1.00         1
          45       0.00      0.00      0.00         0
          46       0.00      0.00      0.00         2
          47       0.00      0.00      0.00         1
          48       0.00      0.00      0.00         2
          49       0.00      0.00      0.00         2
          50       0.00      0.00      0.00         1
          51       0.00      0.00      0.00         2
          52       0.00      0.00      0.00         0
          53       0.00      0.00      0.00         2
          54       0.00      0.00      0.00         1
          55       0.00      0.00      0.00         0
          56       0.00      0.00      0.00         1
          57       0.00      0.00      0.00         0
          58       0.00      0.00      0.00         0
          59       1.00      1.00      1.00         1
          60       0.00      0.00      0.00         1
          61       0.00      0.00      0.00         1
          62       1.00      1.00      1.00         1
          63       0.00      0.00      0.00         1
          64       0.00      0.00      0.00         0
          65       0.00      0.00      0.00         0
          66       1.00      1.00      1.00         1
          67       1.00      1.00      1.00         1
          68       0.00      0.00      0.00         2
          69       0.00      0.00      0.00         1
          70       0.00      0.00      0.00         1
          71       1.00      1.00      1.00         1
          72       0.00      0.00      0.00         1
          73       0.00      0.00      0.00         0
          74       1.00      1.00      1.00         1
          75       0.00      0.00      0.00         0
          76       0.00      0.00      0.00         1
          77       0.00      0.00      0.00         1
          78       0.00      0.00      0.00         1
          79       0.00      0.00      0.00         1
          80       0.00      0.00      0.00         1
          81       0.00      0.00      0.00         0
          82       1.00      1.00      1.00         1
          83       0.00      0.00      0.00         0
          84       0.00      0.00      0.00         0
          85       0.00      0.00      0.00         0
          86       0.00      0.00      0.00         0
          87       0.00      0.00      0.00         3
          88       0.00      0.00      0.00         1
          89       0.00      0.00      0.00         2
          90       0.00      0.00      0.00         1
          91       1.00      1.00      1.00         1
          92       1.00      1.00      1.00         1
          93       1.00      1.00      1.00         1
          94       1.00      1.00      1.00         1
          95       0.00      0.00      0.00         1
          96       0.00      0.00      0.00         0
          97       0.00      0.00      0.00         0
          98       0.00      0.00      0.00         1
          99       1.00      1.00      1.00         1
         100       0.00      0.00      0.00         1
         101       0.00      0.00      0.00         0
         102       1.00      1.00      1.00         1
         103       0.00      0.00      0.00         0
         104       0.00      0.00      0.00         1
         105       1.00      1.00      1.00         1
         106       1.00      1.00      1.00         1
         107       0.00      0.00      0.00         1
         108       0.00      0.00      0.00         1
         109       0.00      0.00      0.00         1
         110       0.00      0.00      0.00         1
         111       1.00      1.00      1.00         1
         112       1.00      1.00      1.00         1
         113       0.00      0.00      0.00         0
         114       1.00      1.00      1.00         1
         115       1.00      1.00      1.00         1
         116       0.00      0.00      0.00         0
         117       0.00      0.00      0.00         1
         118       0.00      0.00      0.00         0
         119       0.00      0.00      0.00         0
         120       0.00      0.00      0.00         0
         121       0.00      0.00      0.00         0
         122       0.00      0.00      0.00         0
         123       0.00      0.00      0.00         0
         124       0.00      0.00      0.00         1
         125       1.00      1.00      1.00         1
         126       0.00      0.00      0.00         0
         127       0.00      0.00      0.00         0
         128       0.00      0.00      0.00         1
         129       0.00      0.00      0.00         0
         130       1.00      1.00      1.00         1
         131       1.00      1.00      1.00         1
         132       0.00      0.00      0.00         0
         133       1.00      1.00      1.00         1
         134       1.00      1.00      1.00         1
         135       0.00      0.00      0.00         1
         136       1.00      1.00      1.00         1
         137       0.00      0.00      0.00         0
         138       1.00      1.00      1.00         1
         139       0.00      0.00      0.00         1
         140       0.00      0.00      0.00         0
         141       1.00      1.00      1.00         1
         142       1.00      1.00      1.00         1
         143       0.00      0.00      0.00         0
         144       0.00      0.00      0.00         0
         145       0.00      0.00      0.00         1
         146       1.00      1.00      1.00         1
         147       1.00      1.00      1.00         1
         148       0.00      0.00      0.00         1
         149       1.00      0.50      0.67         2
         150       0.00      0.00      0.00         0
         151       1.00      1.00      1.00         1
         152       1.00      1.00      1.00         1
         153       1.00      1.00      1.00         1

   micro avg       0.95      0.20      0.33       192
   macro avg       0.25      0.24      0.24       192
weighted avg       0.24      0.20      0.21       192
 samples avg       0.20      0.20      0.20       192

C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Recall is ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no true nor predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in samples with no predicted labels. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))

Hyperparameter Tuning¶

In [ ]:
pipeline = make_pipeline(StandardScaler(), RandomForestClassifier())

# Setup the parameters and distributions to sample from: param_dist
param_dist = {
    "randomforestclassifier__max_depth": [5, 15, 30, None],  
    "randomforestclassifier__min_samples_leaf": np.arange(1, 10, 3),
    "randomforestclassifier__n_estimators": np.arange(60, 140, 20)
}

random_search_cv = RandomizedSearchCV(pipeline, param_distributions=param_dist, n_iter=75, cv=3, random_state=42)
#grid_search_cv = GridSearchCV(pipeline, param_grid=param_dist, cv=3)

# Fit random_search_cv using the data X and labels y
random_search_cv.fit(X_train, y_train)
#grid_search_cv.fit(X_train, y_train)

# Print the best score
print("Best score is {}".format(random_search_cv.best_estimator_.score(X_test, y_test)))
print("Best parameters are {}".format(random_search_cv.best_params_))
print(classification_report(y_test, y_pred))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\model_selection\_search.py:318: UserWarning: The total space of parameters 48 is smaller than n_iter=75. Running 48 iterations. For exhaustive searches, use GridSearchCV.
  warnings.warn(
Best score is 0.203125
Best parameters are {'randomforestclassifier__n_estimators': 120, 'randomforestclassifier__min_samples_leaf': 1, 'randomforestclassifier__max_depth': 30}
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         3
           1       0.00      0.00      0.00         6
           2       0.00      0.00      0.00         1
           3       1.00      1.00      1.00         1
           4       0.00      0.00      0.00        12
           5       0.00      0.00      0.00         4
           6       0.00      0.00      0.00         3
           7       0.00      0.00      0.00         3
           8       0.00      0.00      0.00         5
           9       0.00      0.00      0.00        12
          10       0.00      0.00      0.00         3
          11       0.00      0.00      0.00         6
          12       0.00      0.00      0.00         3
          13       0.00      0.00      0.00         0
          14       0.00      0.00      0.00         3
          15       0.00      0.00      0.00         1
          16       0.00      0.00      0.00         1
          17       0.00      0.00      0.00         1
          18       0.00      0.00      0.00         4
          19       1.00      1.00      1.00         1
          20       1.00      0.25      0.40         8
          21       0.00      0.00      0.00         1
          22       0.00      0.00      0.00         1
          23       0.00      0.00      0.00         1
          24       0.00      0.00      0.00         1
          25       0.00      0.00      0.00         1
          26       0.00      0.00      0.00         1
          27       0.00      0.00      0.00         0
          28       0.00      0.00      0.00         1
          29       0.00      0.00      0.00         7
          30       0.00      0.00      0.00         1
          31       0.00      0.00      0.00         0
          32       0.00      0.00      0.00         2
          33       0.00      0.00      0.00         1
          34       0.00      0.00      0.00         1
          35       0.00      0.00      0.00         1
          36       0.00      0.00      0.00         0
          37       0.00      0.00      0.00         1
          38       0.00      0.00      0.00         2
          39       0.00      0.00      0.00         1
          40       0.00      0.00      0.00         1
          41       0.00      0.00      0.00         0
          42       0.00      0.00      0.00         0
          43       0.00      0.00      0.00         1
          44       1.00      1.00      1.00         1
          45       0.00      0.00      0.00         0
          46       0.00      0.00      0.00         2
          47       0.00      0.00      0.00         1
          48       0.00      0.00      0.00         2
          49       0.00      0.00      0.00         2
          50       0.00      0.00      0.00         1
          51       0.00      0.00      0.00         2
          52       0.00      0.00      0.00         0
          53       0.00      0.00      0.00         2
          54       0.00      0.00      0.00         1
          55       0.00      0.00      0.00         0
          56       0.00      0.00      0.00         1
          57       0.00      0.00      0.00         0
          58       0.00      0.00      0.00         0
          59       1.00      1.00      1.00         1
          60       0.00      0.00      0.00         1
          61       0.00      0.00      0.00         1
          62       1.00      1.00      1.00         1
          63       0.00      0.00      0.00         1
          64       0.00      0.00      0.00         0
          65       0.00      0.00      0.00         0
          66       1.00      1.00      1.00         1
          67       1.00      1.00      1.00         1
          68       0.00      0.00      0.00         2
          69       0.00      0.00      0.00         1
          70       0.00      0.00      0.00         1
          71       1.00      1.00      1.00         1
          72       0.00      0.00      0.00         1
          73       0.00      0.00      0.00         0
          74       1.00      1.00      1.00         1
          75       0.00      0.00      0.00         0
          76       0.00      0.00      0.00         1
          77       0.00      0.00      0.00         1
          78       0.00      0.00      0.00         1
          79       0.00      0.00      0.00         1
          80       0.00      0.00      0.00         1
          81       0.00      0.00      0.00         0
          82       1.00      1.00      1.00         1
          83       0.00      0.00      0.00         0
          84       0.00      0.00      0.00         0
          85       0.00      0.00      0.00         0
          86       0.00      0.00      0.00         0
          87       0.00      0.00      0.00         3
          88       0.00      0.00      0.00         1
          89       0.00      0.00      0.00         2
          90       0.00      0.00      0.00         1
          91       1.00      1.00      1.00         1
          92       1.00      1.00      1.00         1
          93       1.00      1.00      1.00         1
          94       1.00      1.00      1.00         1
          95       0.00      0.00      0.00         1
          96       0.00      0.00      0.00         0
          97       0.00      0.00      0.00         0
          98       0.00      0.00      0.00         1
          99       1.00      1.00      1.00         1
         100       0.00      0.00      0.00         1
         101       0.00      0.00      0.00         0
         102       1.00      1.00      1.00         1
         103       0.00      0.00      0.00         0
         104       0.00      0.00      0.00         1
         105       1.00      1.00      1.00         1
         106       1.00      1.00      1.00         1
         107       0.00      0.00      0.00         1
         108       0.00      0.00      0.00         1
         109       0.00      0.00      0.00         1
         110       0.00      0.00      0.00         1
         111       1.00      1.00      1.00         1
         112       1.00      1.00      1.00         1
         113       0.00      0.00      0.00         0
         114       1.00      1.00      1.00         1
         115       1.00      1.00      1.00         1
         116       0.00      0.00      0.00         0
         117       0.00      0.00      0.00         1
         118       0.00      0.00      0.00         0
         119       0.00      0.00      0.00         0
         120       0.00      0.00      0.00         0
         121       0.00      0.00      0.00         0
         122       0.00      0.00      0.00         0
         123       0.00      0.00      0.00         0
         124       0.00      0.00      0.00         1
         125       1.00      1.00      1.00         1
         126       0.00      0.00      0.00         0
         127       0.00      0.00      0.00         0
         128       0.00      0.00      0.00         1
         129       0.00      0.00      0.00         0
         130       1.00      1.00      1.00         1
         131       1.00      1.00      1.00         1
         132       0.00      0.00      0.00         0
         133       1.00      1.00      1.00         1
         134       1.00      1.00      1.00         1
         135       0.00      0.00      0.00         1
         136       1.00      1.00      1.00         1
         137       0.00      0.00      0.00         0
         138       1.00      1.00      1.00         1
         139       0.00      0.00      0.00         1
         140       0.00      0.00      0.00         0
         141       1.00      1.00      1.00         1
         142       1.00      1.00      1.00         1
         143       0.00      0.00      0.00         0
         144       0.00      0.00      0.00         0
         145       0.00      0.00      0.00         1
         146       1.00      1.00      1.00         1
         147       1.00      1.00      1.00         1
         148       0.00      0.00      0.00         1
         149       1.00      0.50      0.67         2
         150       0.00      0.00      0.00         0
         151       1.00      1.00      1.00         1
         152       1.00      1.00      1.00         1
         153       1.00      1.00      1.00         1

   micro avg       0.95      0.20      0.33       192
   macro avg       0.25      0.24      0.24       192
weighted avg       0.24      0.20      0.21       192
 samples avg       0.20      0.20      0.20       192

C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Recall is ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no true nor predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in samples with no predicted labels. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))

K Nearest neighbors¶

In [ ]:
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

# Calculate accuracy
print("Score: ", model.score(X_test, y_test))
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print(classification_report(y_test, y_pred))
Score:  0.010416666666666666
Accuracy: 0.010416666666666666
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         3
           1       0.00      0.00      0.00         6
           2       0.00      0.00      0.00         1
           3       0.00      0.00      0.00         1
           4       0.00      0.00      0.00        12
           5       0.00      0.00      0.00         4
           6       0.00      0.00      0.00         3
           7       0.00      0.00      0.00         3
           8       1.00      0.20      0.33         5
           9       0.00      0.00      0.00        12
          10       0.00      0.00      0.00         3
          11       0.00      0.00      0.00         6
          12       0.00      0.00      0.00         3
          13       0.00      0.00      0.00         0
          14       0.00      0.00      0.00         3
          15       0.00      0.00      0.00         1
          16       0.00      0.00      0.00         1
          17       0.00      0.00      0.00         1
          18       0.00      0.00      0.00         4
          19       0.00      0.00      0.00         1
          20       0.00      0.00      0.00         8
          21       0.00      0.00      0.00         1
          22       0.00      0.00      0.00         1
          23       0.00      0.00      0.00         1
          24       0.00      0.00      0.00         1
          25       0.00      0.00      0.00         1
          26       0.00      0.00      0.00         1
          27       0.00      0.00      0.00         0
          28       0.00      0.00      0.00         1
          29       0.00      0.00      0.00         7
          30       0.00      0.00      0.00         1
          31       0.00      0.00      0.00         0
          32       0.00      0.00      0.00         2
          33       0.00      0.00      0.00         1
          34       0.00      0.00      0.00         1
          35       0.00      0.00      0.00         1
          36       0.00      0.00      0.00         0
          37       0.00      0.00      0.00         1
          38       0.00      0.00      0.00         2
          39       0.00      0.00      0.00         1
          40       0.00      0.00      0.00         1
          41       0.00      0.00      0.00         0
          42       0.00      0.00      0.00         0
          43       0.00      0.00      0.00         1
          44       0.00      0.00      0.00         1
          45       0.00      0.00      0.00         0
          46       0.00      0.00      0.00         2
          47       0.00      0.00      0.00         1
          48       0.00      0.00      0.00         2
          49       0.00      0.00      0.00         2
          50       0.00      0.00      0.00         1
          51       0.00      0.00      0.00         2
          52       0.00      0.00      0.00         0
          53       0.00      0.00      0.00         2
          54       0.00      0.00      0.00         1
          55       0.00      0.00      0.00         0
          56       0.00      0.00      0.00         1
          57       0.00      0.00      0.00         0
          58       0.00      0.00      0.00         0
          59       0.00      0.00      0.00         1
          60       0.00      0.00      0.00         1
          61       0.00      0.00      0.00         1
          62       0.00      0.00      0.00         1
          63       0.00      0.00      0.00         1
          64       0.00      0.00      0.00         0
          65       0.00      0.00      0.00         0
          66       0.00      0.00      0.00         1
          67       0.00      0.00      0.00         1
          68       0.00      0.00      0.00         2
          69       0.00      0.00      0.00         1
          70       0.00      0.00      0.00         1
          71       0.00      0.00      0.00         1
          72       0.00      0.00      0.00         1
          73       0.00      0.00      0.00         0
          74       0.00      0.00      0.00         1
          75       0.00      0.00      0.00         0
          76       0.00      0.00      0.00         1
          77       0.00      0.00      0.00         1
          78       0.00      0.00      0.00         1
          79       0.00      0.00      0.00         1
          80       0.00      0.00      0.00         1
          81       0.00      0.00      0.00         0
          82       0.00      0.00      0.00         1
          83       0.00      0.00      0.00         0
          84       0.00      0.00      0.00         0
          85       0.00      0.00      0.00         0
          86       0.00      0.00      0.00         0
          87       0.00      0.00      0.00         3
          88       0.00      0.00      0.00         1
          89       0.00      0.00      0.00         2
          90       0.00      0.00      0.00         1
          91       0.00      0.00      0.00         1
          92       0.00      0.00      0.00         1
          93       0.00      0.00      0.00         1
          94       0.00      0.00      0.00         1
          95       0.00      0.00      0.00         1
          96       0.00      0.00      0.00         0
          97       0.00      0.00      0.00         0
          98       0.00      0.00      0.00         1
          99       0.00      0.00      0.00         1
         100       0.00      0.00      0.00         1
         101       0.00      0.00      0.00         0
         102       0.00      0.00      0.00         1
         103       0.00      0.00      0.00         0
         104       0.00      0.00      0.00         1
         105       0.00      0.00      0.00         1
         106       0.00      0.00      0.00         1
         107       0.00      0.00      0.00         1
         108       0.00      0.00      0.00         1
         109       0.00      0.00      0.00         1
         110       0.00      0.00      0.00         1
         111       0.00      0.00      0.00         1
         112       0.00      0.00      0.00         1
         113       0.00      0.00      0.00         0
         114       0.00      0.00      0.00         1
         115       0.00      0.00      0.00         1
         116       0.00      0.00      0.00         0
         117       0.00      0.00      0.00         1
         118       0.00      0.00      0.00         0
         119       0.00      0.00      0.00         0
         120       0.00      0.00      0.00         0
         121       0.00      0.00      0.00         0
         122       0.00      0.00      0.00         0
         123       0.00      0.00      0.00         0
         124       0.00      0.00      0.00         1
         125       0.00      0.00      0.00         1
         126       0.00      0.00      0.00         0
         127       0.00      0.00      0.00         0
         128       0.00      0.00      0.00         1
         129       0.00      0.00      0.00         0
         130       0.00      0.00      0.00         1
         131       0.00      0.00      0.00         1
         132       0.00      0.00      0.00         0
         133       0.00      0.00      0.00         1
         134       0.00      0.00      0.00         1
         135       0.00      0.00      0.00         1
         136       0.00      0.00      0.00         1
         137       0.00      0.00      0.00         0
         138       0.00      0.00      0.00         1
         139       0.00      0.00      0.00         1
         140       0.00      0.00      0.00         0
         141       0.00      0.00      0.00         1
         142       0.00      0.00      0.00         1
         143       0.00      0.00      0.00         0
         144       0.00      0.00      0.00         0
         145       0.00      0.00      0.00         1
         146       0.00      0.00      0.00         1
         147       0.00      0.00      0.00         1
         148       0.00      0.00      0.00         1
         149       1.00      0.50      0.67         2
         150       0.00      0.00      0.00         0
         151       0.00      0.00      0.00         1
         152       0.00      0.00      0.00         1
         153       0.00      0.00      0.00         1

   micro avg       0.40      0.01      0.02       192
   macro avg       0.01      0.00      0.01       192
weighted avg       0.04      0.01      0.02       192
 samples avg       0.01      0.01      0.01       192

C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Recall is ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no true nor predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in samples with no predicted labels. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))

Hyperparameter Tuning¶

In [ ]:
param_grid = {
    'kneighborsclassifier__n_neighbors': [3, 5, 7, 9]  # List of k values to try
}

pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())

grid_search = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=5)

grid_search.fit(X_train, y_train)

best_params = grid_search.best_params_
best_score=grid_search.best_estimator_.score(X_test, y_test)

print("Best Parameters:", best_params)
print("Best Score:", best_score)
print(classification_report(y_test, y_pred))
Best Parameters: {'kneighborsclassifier__n_neighbors': 3}
Best Score: 0.036458333333333336
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         3
           1       0.00      0.00      0.00         6
           2       0.00      0.00      0.00         1
           3       0.00      0.00      0.00         1
           4       0.00      0.00      0.00        12
           5       0.00      0.00      0.00         4
           6       0.00      0.00      0.00         3
           7       0.00      0.00      0.00         3
           8       1.00      0.20      0.33         5
           9       0.00      0.00      0.00        12
          10       0.00      0.00      0.00         3
          11       0.00      0.00      0.00         6
          12       0.00      0.00      0.00         3
          13       0.00      0.00      0.00         0
          14       0.00      0.00      0.00         3
          15       0.00      0.00      0.00         1
          16       0.00      0.00      0.00         1
          17       0.00      0.00      0.00         1
          18       0.00      0.00      0.00         4
          19       0.00      0.00      0.00         1
          20       0.00      0.00      0.00         8
          21       0.00      0.00      0.00         1
          22       0.00      0.00      0.00         1
          23       0.00      0.00      0.00         1
          24       0.00      0.00      0.00         1
          25       0.00      0.00      0.00         1
          26       0.00      0.00      0.00         1
          27       0.00      0.00      0.00         0
          28       0.00      0.00      0.00         1
          29       0.00      0.00      0.00         7
          30       0.00      0.00      0.00         1
          31       0.00      0.00      0.00         0
          32       0.00      0.00      0.00         2
          33       0.00      0.00      0.00         1
          34       0.00      0.00      0.00         1
          35       0.00      0.00      0.00         1
          36       0.00      0.00      0.00         0
          37       0.00      0.00      0.00         1
          38       0.00      0.00      0.00         2
          39       0.00      0.00      0.00         1
          40       0.00      0.00      0.00         1
          41       0.00      0.00      0.00         0
          42       0.00      0.00      0.00         0
          43       0.00      0.00      0.00         1
          44       0.00      0.00      0.00         1
          45       0.00      0.00      0.00         0
          46       0.00      0.00      0.00         2
          47       0.00      0.00      0.00         1
          48       0.00      0.00      0.00         2
          49       0.00      0.00      0.00         2
          50       0.00      0.00      0.00         1
          51       0.00      0.00      0.00         2
          52       0.00      0.00      0.00         0
          53       0.00      0.00      0.00         2
          54       0.00      0.00      0.00         1
          55       0.00      0.00      0.00         0
          56       0.00      0.00      0.00         1
          57       0.00      0.00      0.00         0
          58       0.00      0.00      0.00         0
          59       0.00      0.00      0.00         1
          60       0.00      0.00      0.00         1
          61       0.00      0.00      0.00         1
          62       0.00      0.00      0.00         1
          63       0.00      0.00      0.00         1
          64       0.00      0.00      0.00         0
          65       0.00      0.00      0.00         0
          66       0.00      0.00      0.00         1
          67       0.00      0.00      0.00         1
          68       0.00      0.00      0.00         2
          69       0.00      0.00      0.00         1
          70       0.00      0.00      0.00         1
          71       0.00      0.00      0.00         1
          72       0.00      0.00      0.00         1
          73       0.00      0.00      0.00         0
          74       0.00      0.00      0.00         1
          75       0.00      0.00      0.00         0
          76       0.00      0.00      0.00         1
          77       0.00      0.00      0.00         1
          78       0.00      0.00      0.00         1
          79       0.00      0.00      0.00         1
          80       0.00      0.00      0.00         1
          81       0.00      0.00      0.00         0
          82       0.00      0.00      0.00         1
          83       0.00      0.00      0.00         0
          84       0.00      0.00      0.00         0
          85       0.00      0.00      0.00         0
          86       0.00      0.00      0.00         0
          87       0.00      0.00      0.00         3
          88       0.00      0.00      0.00         1
          89       0.00      0.00      0.00         2
          90       0.00      0.00      0.00         1
          91       0.00      0.00      0.00         1
          92       0.00      0.00      0.00         1
          93       0.00      0.00      0.00         1
          94       0.00      0.00      0.00         1
          95       0.00      0.00      0.00         1
          96       0.00      0.00      0.00         0
          97       0.00      0.00      0.00         0
          98       0.00      0.00      0.00         1
          99       0.00      0.00      0.00         1
         100       0.00      0.00      0.00         1
         101       0.00      0.00      0.00         0
         102       0.00      0.00      0.00         1
         103       0.00      0.00      0.00         0
         104       0.00      0.00      0.00         1
         105       0.00      0.00      0.00         1
         106       0.00      0.00      0.00         1
         107       0.00      0.00      0.00         1
         108       0.00      0.00      0.00         1
         109       0.00      0.00      0.00         1
         110       0.00      0.00      0.00         1
         111       0.00      0.00      0.00         1
         112       0.00      0.00      0.00         1
         113       0.00      0.00      0.00         0
         114       0.00      0.00      0.00         1
         115       0.00      0.00      0.00         1
         116       0.00      0.00      0.00         0
         117       0.00      0.00      0.00         1
         118       0.00      0.00      0.00         0
         119       0.00      0.00      0.00         0
         120       0.00      0.00      0.00         0
         121       0.00      0.00      0.00         0
         122       0.00      0.00      0.00         0
         123       0.00      0.00      0.00         0
         124       0.00      0.00      0.00         1
         125       0.00      0.00      0.00         1
         126       0.00      0.00      0.00         0
         127       0.00      0.00      0.00         0
         128       0.00      0.00      0.00         1
         129       0.00      0.00      0.00         0
         130       0.00      0.00      0.00         1
         131       0.00      0.00      0.00         1
         132       0.00      0.00      0.00         0
         133       0.00      0.00      0.00         1
         134       0.00      0.00      0.00         1
         135       0.00      0.00      0.00         1
         136       0.00      0.00      0.00         1
         137       0.00      0.00      0.00         0
         138       0.00      0.00      0.00         1
         139       0.00      0.00      0.00         1
         140       0.00      0.00      0.00         0
         141       0.00      0.00      0.00         1
         142       0.00      0.00      0.00         1
         143       0.00      0.00      0.00         0
         144       0.00      0.00      0.00         0
         145       0.00      0.00      0.00         1
         146       0.00      0.00      0.00         1
         147       0.00      0.00      0.00         1
         148       0.00      0.00      0.00         1
         149       1.00      0.50      0.67         2
         150       0.00      0.00      0.00         0
         151       0.00      0.00      0.00         1
         152       0.00      0.00      0.00         1
         153       0.00      0.00      0.00         1

   micro avg       0.40      0.01      0.02       192
   macro avg       0.01      0.00      0.01       192
weighted avg       0.04      0.01      0.02       192
 samples avg       0.01      0.01      0.01       192

C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Recall is ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no true nor predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in samples with no predicted labels. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))

Ignoring Order of Types 1¶

The first way to ignore order of types is by making a label for each type and matching the data with two labels .

Preprocessing¶

In [ ]:
df  = preprocessed_df.copy()

# Combine Type 1 and Type 2 into a single column
df['Types'] = df[['Type 1', 'Type 2']].apply(lambda x: tuple(filter(lambda y: pd.notna(y), x)), axis=1)
print(df['Types'][0])

# Get unique Pokémon types
unique_types = np.unique(df['Types'].explode())
df.drop(['Type 1', 'Type 2'], axis=1, inplace=True)

df.head()
('Grass', 'Poison')
Out[ ]:
Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Types
0 318 45 49 49 65 65 45 1 0 (Grass, Poison)
1 405 60 62 63 80 80 60 1 0 (Grass, Poison)
2 525 80 82 83 100 100 80 1 0 (Grass, Poison)
3 625 80 100 123 122 120 80 1 0 (Grass, Poison)
4 309 39 52 43 60 50 65 1 0 (Fire, None)

For each unique type, we create a binary label. The label is 1 if the Pokemon has that type, (so was present in the type combination tuple) and 0 if it doesn't.

In [ ]:
# Create binary labels for each Pokémon type
for type in unique_types:
    df[type] = df['Types'].apply(lambda x: 1 if type in x else 0)

df.head()
Out[ ]:
Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Types ... Grass Ground Ice None Normal Poison Psychic Rock Steel Water
0 318 45 49 49 65 65 45 1 0 (Grass, Poison) ... 1 0 0 0 0 1 0 0 0 0
1 405 60 62 63 80 80 60 1 0 (Grass, Poison) ... 1 0 0 0 0 1 0 0 0 0
2 525 80 82 83 100 100 80 1 0 (Grass, Poison) ... 1 0 0 0 0 1 0 0 0 0
3 625 80 100 123 122 120 80 1 0 (Grass, Poison) ... 1 0 0 0 0 1 0 0 0 0
4 309 39 52 43 60 50 65 1 0 (Fire, None) ... 0 0 0 1 0 0 0 0 0 0

5 rows × 29 columns

In [ ]:
# Some type combinations only occur once so we double them to stratify the data better
singleton_classes = df['Types'].value_counts()[df['Types'].value_counts() == 1].index.tolist()
singleton_data = df[df['Types'].isin(singleton_classes)]
other_data = df[~df['Types'].isin(singleton_classes)]

df = df.drop(columns=['Types'])
other_data.drop(columns=['Types'], inplace=True)
singleton_data.drop(columns=['Types'], inplace=True)

Decision tree¶

In [ ]:
# Split the data into training and testing sets
y = df[unique_types]
X_train, X_test, y_train, y_test = train_test_split(other_data.drop(columns=unique_types), other_data[unique_types], test_size=0.2, stratify=other_data[unique_types], random_state=42)
X_train = pd.concat([X_train, singleton_data.drop(columns=unique_types)])
y_train = pd.concat([y_train, singleton_data[unique_types]])
X_test = pd.concat([X_test, singleton_data.drop(columns=unique_types)])
y_test = pd.concat([y_test, singleton_data[unique_types]])

# Initialize and train the decision tree classifier
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict labels for the test set
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Score: ", model.score(X_test, y_test))
print("Accuracy:", accuracy)
print(classification_report(y_test, y_pred))
Score:  0.25
Accuracy: 0.25
              precision    recall  f1-score   support

           0       0.47      0.47      0.47        17
           1       0.56      0.42      0.48        12
           2       0.44      0.47      0.46        17
           3       0.65      0.65      0.65        17
           4       0.38      0.30      0.33        10
           5       0.40      0.31      0.35        13
           6       0.53      0.45      0.49        20
           7       0.27      0.35      0.30        23
           8       0.50      0.50      0.50        12
           9       0.24      0.27      0.26        22
          10       0.43      0.50      0.46        18
          11       0.17      0.22      0.19         9
          12       0.57      0.49      0.53        80
          13       0.27      0.33      0.30        21
          14       0.23      0.19      0.21        16
          15       0.44      0.40      0.42        20
          16       0.60      0.50      0.55        12
          17       0.56      0.60      0.58        15
          18       0.29      0.30      0.30        30

   micro avg       0.42      0.42      0.42       384
   macro avg       0.42      0.41      0.41       384
weighted avg       0.44      0.42      0.42       384
 samples avg       0.43      0.42      0.42       384

Hyperparameter tuning¶

In [ ]:
pipeline = make_pipeline(StandardScaler(), DecisionTreeClassifier())

param_dist = {
    "decisiontreeclassifier__max_depth": np.arange(10, 20), 
    "decisiontreeclassifier__min_samples_leaf": np.arange(1, 10)
}

# Instantiate the GridSearchCV object
grid_search_cv = GridSearchCV(pipeline, param_grid=param_dist, cv=5)

# Fit grid_search_cv using the data X and labels y.
grid_search_cv.fit(X_train, y_train) 
y_pred = grid_search_cv.predict(X_test)

# Print the best score
print("Tuned Model Parameters: {}".format(grid_search_cv.best_params_))
print("Accuracy: {}".format(grid_search_cv.best_estimator_.score(X_test, y_test)))
print(classification_report(y_test, y_pred))
Tuned Model Parameters: {'decisiontreeclassifier__max_depth': 16, 'decisiontreeclassifier__min_samples_leaf': 1}
Accuracy: 0.265625
              precision    recall  f1-score   support

           0       0.42      0.47      0.44        17
           1       0.50      0.50      0.50        12
           2       0.38      0.47      0.42        17
           3       0.52      0.65      0.58        17
           4       0.30      0.30      0.30        10
           5       0.50      0.38      0.43        13
           6       0.67      0.40      0.50        20
           7       0.38      0.35      0.36        23
           8       0.60      0.50      0.55        12
           9       0.29      0.27      0.28        22
          10       0.45      0.50      0.47        18
          11       0.17      0.22      0.19         9
          12       0.61      0.57      0.59        80
          13       0.27      0.29      0.28        21
          14       0.27      0.25      0.26        16
          15       0.50      0.40      0.44        20
          16       0.45      0.42      0.43        12
          17       0.62      0.53      0.57        15
          18       0.31      0.37      0.34        30

   micro avg       0.45      0.44      0.44       384
   macro avg       0.43      0.41      0.42       384
weighted avg       0.46      0.44      0.44       384
 samples avg       0.45      0.44      0.44       384

Random Forest¶

In [ ]:
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(random_state=42)

model.fit(X_train, y_train)

# Predict labels for the test set
y_pred = model.predict(X_test)

# Calculate accuracy
print("Score: ", model.score(X_test, y_test))
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print(classification_report(y_test, y_pred))
Score:  0.20833333333333334
Accuracy: 0.20833333333333334
              precision    recall  f1-score   support

           0       1.00      0.29      0.45        17
           1       1.00      0.25      0.40        12
           2       0.69      0.53      0.60        17
           3       1.00      0.59      0.74        17
           4       1.00      0.20      0.33        10
           5       1.00      0.31      0.47        13
           6       1.00      0.45      0.62        20
           7       0.50      0.09      0.15        23
           8       0.86      0.50      0.63        12
           9       0.67      0.18      0.29        22
          10       1.00      0.28      0.43        18
          11       1.00      0.11      0.20         9
          12       0.65      0.49      0.56        80
          13       0.43      0.14      0.21        21
          14       0.75      0.19      0.30        16
          15       0.83      0.25      0.38        20
          16       0.80      0.33      0.47        12
          17       0.86      0.40      0.55        15
          18       1.00      0.17      0.29        30

   micro avg       0.77      0.33      0.46       384
   macro avg       0.84      0.30      0.43       384
weighted avg       0.80      0.33      0.44       384
 samples avg       0.44      0.33      0.36       384

C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in samples with no predicted labels. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))

Hyperparameter tuning¶

In [ ]:
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint  
# Define the parameter grid
pipeline = make_pipeline(StandardScaler(), RandomForestClassifier())

# Setup the parameters and distributions to sample from: param_dist
param_dist = {
    "randomforestclassifier__max_depth": np.arange(10, 20),  
    "randomforestclassifier__min_samples_leaf": np.arange(1, 10, 4),
    "randomforestclassifier__n_estimators": np.arange(60, 140, 5)
}

random_search_cv = RandomizedSearchCV(pipeline, param_distributions=param_dist, n_iter=50, cv=3, random_state=42)
#grid_search_cv = GridSearchCV(pipeline, param_grid=param_dist, cv=3)

# Fit random_search_cv using the data X and labels y
random_search_cv.fit(X_train, y_train)
#grid_search_cv.fit(X_train, y_train)

# Print the best score
print("Best score is {}".format(random_search_cv.best_estimator_.score(X_test, y_test)))
print("Best parameters are {}".format(random_search_cv.best_params_))

#print("Best score is {}".format(grid_search_cv.best_estimator_.score(X_test, y_test)))
#print("Best parameters are {}".format(grid_search_cv.best_params_)) 
print(classification_report(y_test, y_pred))
Best score is 0.203125
Best parameters are {'randomforestclassifier__n_estimators': 130, 'randomforestclassifier__min_samples_leaf': 1, 'randomforestclassifier__max_depth': 16}
              precision    recall  f1-score   support

           0       1.00      0.29      0.45        17
           1       1.00      0.25      0.40        12
           2       0.69      0.53      0.60        17
           3       1.00      0.59      0.74        17
           4       1.00      0.20      0.33        10
           5       1.00      0.31      0.47        13
           6       1.00      0.45      0.62        20
           7       0.50      0.09      0.15        23
           8       0.86      0.50      0.63        12
           9       0.67      0.18      0.29        22
          10       1.00      0.28      0.43        18
          11       1.00      0.11      0.20         9
          12       0.65      0.49      0.56        80
          13       0.43      0.14      0.21        21
          14       0.75      0.19      0.30        16
          15       0.83      0.25      0.38        20
          16       0.80      0.33      0.47        12
          17       0.86      0.40      0.55        15
          18       1.00      0.17      0.29        30

   micro avg       0.77      0.33      0.46       384
   macro avg       0.84      0.30      0.43       384
weighted avg       0.80      0.33      0.44       384
 samples avg       0.44      0.33      0.36       384

C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in samples with no predicted labels. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))

K Nearest neigbors¶

In [ ]:
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

# Calculate accuracy
print("Score: ", model.score(X_test, y_test))
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print(classification_report(y_test, y_pred))
Score:  0.015625
Accuracy: 0.015625
              precision    recall  f1-score   support

           0       0.67      0.12      0.20        17
           1       0.00      0.00      0.00        12
           2       0.36      0.24      0.29        17
           3       0.83      0.29      0.43        17
           4       0.00      0.00      0.00        10
           5       0.00      0.00      0.00        13
           6       0.33      0.10      0.15        20
           7       0.25      0.09      0.13        23
           8       0.33      0.08      0.13        12
           9       0.33      0.05      0.08        22
          10       0.33      0.11      0.17        18
          11       0.00      0.00      0.00         9
          12       0.56      0.50      0.53        80
          13       0.31      0.19      0.24        21
          14       0.00      0.00      0.00        16
          15       0.33      0.15      0.21        20
          16       0.67      0.17      0.27        12
          17       0.33      0.07      0.11        15
          18       0.17      0.03      0.06        30

   micro avg       0.46      0.18      0.26       384
   macro avg       0.31      0.11      0.16       384
weighted avg       0.36      0.18      0.22       384
 samples avg       0.31      0.18      0.22       384

C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in samples with no predicted labels. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))

Hyperparameter tuning¶

In [ ]:
param_grid = {
    'kneighborsclassifier__n_neighbors': [3, 5, 7, 9]  # List of k values to try
}

pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())

grid_search = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=5)

grid_search.fit(X_train, y_train)

best_params = grid_search.best_params_
best_score=grid_search.best_estimator_.score(X_test, y_test)

print("Best Parameters:", best_params)
print("Best Score:", best_score)
print(classification_report(y_test, y_pred))
Best Parameters: {'kneighborsclassifier__n_neighbors': 3}
Best Score: 0.046875
              precision    recall  f1-score   support

           0       0.67      0.12      0.20        17
           1       0.00      0.00      0.00        12
           2       0.36      0.24      0.29        17
           3       0.83      0.29      0.43        17
           4       0.00      0.00      0.00        10
           5       0.00      0.00      0.00        13
           6       0.33      0.10      0.15        20
           7       0.25      0.09      0.13        23
           8       0.33      0.08      0.13        12
           9       0.33      0.05      0.08        22
          10       0.33      0.11      0.17        18
          11       0.00      0.00      0.00         9
          12       0.56      0.50      0.53        80
          13       0.31      0.19      0.24        21
          14       0.00      0.00      0.00        16
          15       0.33      0.15      0.21        20
          16       0.67      0.17      0.27        12
          17       0.33      0.07      0.11        15
          18       0.17      0.03      0.06        30

   micro avg       0.46      0.18      0.26       384
   macro avg       0.31      0.11      0.16       384
weighted avg       0.36      0.18      0.22       384
 samples avg       0.31      0.18      0.22       384

C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in samples with no predicted labels. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))

Ignoring Order of Types 2¶

The second way of ignoring order of types is to use binary labels for each sorted type combination.

Preprocessing¶

We again combine the Type 1 and Type 2 columns into a single column that contains a list of type combination tuples that are sorted.

In [ ]:
df = preprocessed_df.copy()

df['Types'] = df[['Type 1', 'Type 2']].apply(lambda x: sorted(tuple(filter(lambda y: pd.notna(y), x))), axis=1)

df['Types'] = df['Types'].astype(str)

# print two pokemon types where we know type 1 and two are the same but reversed to check if order is kept into account
print("Sableye: ",df['Types'][326])
print("Spiritomb: ",df['Types'][490])

# drop the Type 1 and Type 2 columns
df.drop(['Type 1', 'Type 2'], axis=1, inplace=True)
# print head
df.head()
Sableye:  ['Dark', 'Ghost']
Spiritomb:  ['Dark', 'Ghost']
Out[ ]:
Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Types
0 318 45 49 49 65 65 45 1 0 ['Grass', 'Poison']
1 405 60 62 63 80 80 60 1 0 ['Grass', 'Poison']
2 525 80 82 83 100 100 80 1 0 ['Grass', 'Poison']
3 625 80 100 123 122 120 80 1 0 ['Grass', 'Poison']
4 309 39 52 43 60 50 65 1 0 ['Fire', 'None']
In [ ]:
# Find classes with only one type
singleton_classes = df['Types'].value_counts()[df['Types'].value_counts() == 1].index.tolist()

Make binary labels for each sorted type combination

In [ ]:
# Create binary labels for each Pokémon type combination
unique_type_combinations = df['Types'].unique()
for type_combination in unique_type_combinations:
    df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)

singleton_data = df[df['Types'].isin(singleton_classes)]
other_data = df[~df['Types'].isin(singleton_classes)]

print("Number of singleton classes",len(singleton_classes))
print("Number of unique type combinations",len(df['Types'].unique()))
print(len(df['Types']))
df.head()
Number of singleton classes 24
Number of unique type combinations 133
800
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\1370455685.py:4: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[type_combination] = df['Types'].apply(lambda x: 1 if type in x else 0)
Out[ ]:
Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary Types ... ['Dragon', 'Poison'] ['Electric', 'Normal'] ['Dragon', 'Rock'] ['Ice', 'Rock'] ['Fighting', 'Flying'] ['Electric', 'Fairy'] ['Fairy', 'Rock'] ['Ghost', 'Grass'] ['Ghost', 'Psychic'] ['Fire', 'Water']
0 318 45 49 49 65 65 45 1 0 ['Grass', 'Poison'] ... 0 0 0 0 0 0 0 0 0 0
1 405 60 62 63 80 80 60 1 0 ['Grass', 'Poison'] ... 0 0 0 0 0 0 0 0 0 0
2 525 80 82 83 100 100 80 1 0 ['Grass', 'Poison'] ... 0 0 0 0 0 0 0 0 0 0
3 625 80 100 123 122 120 80 1 0 ['Grass', 'Poison'] ... 0 0 0 0 0 0 0 0 0 0
4 309 39 52 43 60 50 65 1 0 ['Fire', 'None'] ... 0 0 0 0 0 0 0 0 0 0

5 rows × 143 columns

In [ ]:
# Drop the 'Types' column
df = df.drop(columns=['Types'])
other_data.drop(columns=['Types'], inplace=True)
singleton_data.drop(columns=['Types'], inplace=True)

Decision Tree¶

In [ ]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(other_data.drop(columns=unique_type_combinations), other_data[unique_type_combinations], test_size=0.2, stratify=other_data[unique_type_combinations], random_state=42)
X_train = pd.concat([X_train, singleton_data.drop(columns=unique_type_combinations)])
y_train = pd.concat([y_train, singleton_data[unique_type_combinations]])
X_test = pd.concat([X_test, singleton_data.drop(columns=unique_type_combinations)])
y_test = pd.concat([y_test, singleton_data[unique_type_combinations]])


# Initialize and train the decision tree classifier
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict labels for the test set
y_pred = model.predict(X_test)

# Calculate accuracy
print("Score: ", model.score(X_test, y_test))
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print(classification_report(y_test, y_pred))
Score:  0.7611111111111111
Accuracy: 0.7611111111111111
              precision    recall  f1-score   support

           0       0.31      0.38      0.34        29
           1       0.31      0.38      0.34        29
           2       0.31      0.38      0.34        29
           3       0.31      0.38      0.34        29
           4       0.31      0.38      0.34        29
           5       0.31      0.38      0.34        29
           6       0.31      0.38      0.34        29
           7       0.31      0.38      0.34        29
           8       0.31      0.38      0.34        29
           9       0.31      0.38      0.34        29
          10       0.31      0.38      0.34        29
          11       0.31      0.38      0.34        29
          12       0.31      0.38      0.34        29
          13       0.31      0.38      0.34        29
          14       0.31      0.38      0.34        29
          15       0.31      0.38      0.34        29
          16       0.31      0.38      0.34        29
          17       0.31      0.38      0.34        29
          18       0.31      0.38      0.34        29
          19       0.31      0.38      0.34        29
          20       0.31      0.38      0.34        29
          21       0.31      0.38      0.34        29
          22       0.31      0.38      0.34        29
          23       0.31      0.38      0.34        29
          24       0.31      0.38      0.34        29
          25       0.31      0.38      0.34        29
          26       0.31      0.38      0.34        29
          27       0.31      0.38      0.34        29
          28       0.31      0.38      0.34        29
          29       0.31      0.38      0.34        29
          30       0.31      0.38      0.34        29
          31       0.31      0.38      0.34        29
          32       0.31      0.38      0.34        29
          33       0.31      0.38      0.34        29
          34       0.31      0.38      0.34        29
          35       0.31      0.38      0.34        29
          36       0.31      0.38      0.34        29
          37       0.31      0.38      0.34        29
          38       0.31      0.38      0.34        29
          39       0.31      0.38      0.34        29
          40       0.31      0.38      0.34        29
          41       0.31      0.38      0.34        29
          42       0.31      0.38      0.34        29
          43       0.31      0.38      0.34        29
          44       0.31      0.38      0.34        29
          45       0.31      0.38      0.34        29
          46       0.31      0.38      0.34        29
          47       0.31      0.38      0.34        29
          48       0.31      0.38      0.34        29
          49       0.31      0.38      0.34        29
          50       0.31      0.38      0.34        29
          51       0.31      0.38      0.34        29
          52       0.31      0.38      0.34        29
          53       0.31      0.38      0.34        29
          54       0.31      0.38      0.34        29
          55       0.31      0.38      0.34        29
          56       0.31      0.38      0.34        29
          57       0.31      0.38      0.34        29
          58       0.31      0.38      0.34        29
          59       0.31      0.38      0.34        29
          60       0.31      0.38      0.34        29
          61       0.31      0.38      0.34        29
          62       0.31      0.38      0.34        29
          63       0.31      0.38      0.34        29
          64       0.31      0.38      0.34        29
          65       0.31      0.38      0.34        29
          66       0.31      0.38      0.34        29
          67       0.31      0.38      0.34        29
          68       0.31      0.38      0.34        29
          69       0.31      0.38      0.34        29
          70       0.31      0.38      0.34        29
          71       0.31      0.38      0.34        29
          72       0.31      0.38      0.34        29
          73       0.31      0.38      0.34        29
          74       0.31      0.38      0.34        29
          75       0.31      0.38      0.34        29
          76       0.31      0.38      0.34        29
          77       0.31      0.38      0.34        29
          78       0.31      0.38      0.34        29
          79       0.31      0.38      0.34        29
          80       0.31      0.38      0.34        29
          81       0.31      0.38      0.34        29
          82       0.31      0.38      0.34        29
          83       0.31      0.38      0.34        29
          84       0.31      0.38      0.34        29
          85       0.31      0.38      0.34        29
          86       0.31      0.38      0.34        29
          87       0.31      0.38      0.34        29
          88       0.31      0.38      0.34        29
          89       0.31      0.38      0.34        29
          90       0.31      0.38      0.34        29
          91       0.31      0.38      0.34        29
          92       0.31      0.38      0.34        29
          93       0.31      0.38      0.34        29
          94       0.31      0.38      0.34        29
          95       0.31      0.38      0.34        29
          96       0.31      0.38      0.34        29
          97       0.31      0.38      0.34        29
          98       0.31      0.38      0.34        29
          99       0.31      0.38      0.34        29
         100       0.31      0.38      0.34        29
         101       0.31      0.38      0.34        29
         102       0.31      0.38      0.34        29
         103       0.31      0.38      0.34        29
         104       0.31      0.38      0.34        29
         105       0.31      0.38      0.34        29
         106       0.31      0.38      0.34        29
         107       0.31      0.38      0.34        29
         108       0.31      0.38      0.34        29
         109       0.31      0.38      0.34        29
         110       0.31      0.38      0.34        29
         111       0.31      0.38      0.34        29
         112       0.31      0.38      0.34        29
         113       0.31      0.38      0.34        29
         114       0.31      0.38      0.34        29
         115       0.31      0.38      0.34        29
         116       0.31      0.38      0.34        29
         117       0.31      0.38      0.34        29
         118       0.31      0.38      0.34        29
         119       0.31      0.38      0.34        29
         120       0.31      0.38      0.34        29
         121       0.31      0.38      0.34        29
         122       0.31      0.38      0.34        29
         123       0.31      0.38      0.34        29
         124       0.31      0.38      0.34        29
         125       0.31      0.38      0.34        29
         126       0.31      0.38      0.34        29
         127       0.31      0.38      0.34        29
         128       0.31      0.38      0.34        29
         129       0.31      0.38      0.34        29
         130       0.31      0.38      0.34        29
         131       0.31      0.38      0.34        29
         132       0.31      0.38      0.34        29

   micro avg       0.31      0.38      0.34      3857
   macro avg       0.31      0.38      0.34      3857
weighted avg       0.31      0.38      0.34      3857
 samples avg       0.06      0.06      0.06      3857

C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in samples with no predicted labels. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Recall is ill-defined and being set to 0.0 in samples with no true labels. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in samples with no true nor predicted labels. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))

Hyperparameter Tuning¶

In [ ]:
pipeline = make_pipeline(StandardScaler(), DecisionTreeClassifier())

param_dist = {
    "decisiontreeclassifier__max_depth": [5, 6, 7, 8, 9, 10, 15, 30, None], 
    "decisiontreeclassifier__min_samples_leaf": np.arange(1, 10)
}

# Instantiate the GridSearchCV object
grid_search_cv = GridSearchCV(pipeline, param_grid=param_dist, cv=5)

# Fit grid_search_cv using the data X and labels y.
grid_search_cv.fit(X_train, y_train) 
y_pred = grid_search_cv.predict(X_test)

# Print the best score
print("Tuned Model Parameters: {}".format(grid_search_cv.best_params_))
print("Accuracy: {}".format(grid_search_cv.best_estimator_.score(X_test, y_test)))
print(classification_report(y_test, y_pred))
Tuned Model Parameters: {'decisiontreeclassifier__max_depth': 6, 'decisiontreeclassifier__min_samples_leaf': 9}
Accuracy: 0.8166666666666667
              precision    recall  f1-score   support

           0       0.17      0.03      0.06        29
           1       0.17      0.03      0.06        29
           2       0.17      0.03      0.06        29
           3       0.17      0.03      0.06        29
           4       0.17      0.03      0.06        29
           5       0.17      0.03      0.06        29
           6       0.17      0.03      0.06        29
           7       0.17      0.03      0.06        29
           8       0.17      0.03      0.06        29
           9       0.17      0.03      0.06        29
          10       0.17      0.03      0.06        29
          11       0.17      0.03      0.06        29
          12       0.17      0.03      0.06        29
          13       0.17      0.03      0.06        29
          14       0.17      0.03      0.06        29
          15       0.17      0.03      0.06        29
          16       0.17      0.03      0.06        29
          17       0.17      0.03      0.06        29
          18       0.17      0.03      0.06        29
          19       0.17      0.03      0.06        29
          20       0.17      0.03      0.06        29
          21       0.17      0.03      0.06        29
          22       0.17      0.03      0.06        29
          23       0.17      0.03      0.06        29
          24       0.17      0.03      0.06        29
          25       0.17      0.03      0.06        29
          26       0.17      0.03      0.06        29
          27       0.17      0.03      0.06        29
          28       0.17      0.03      0.06        29
          29       0.17      0.03      0.06        29
          30       0.17      0.03      0.06        29
          31       0.17      0.03      0.06        29
          32       0.17      0.03      0.06        29
          33       0.17      0.03      0.06        29
          34       0.17      0.03      0.06        29
          35       0.17      0.03      0.06        29
          36       0.17      0.03      0.06        29
          37       0.17      0.03      0.06        29
          38       0.17      0.03      0.06        29
          39       0.17      0.03      0.06        29
          40       0.17      0.03      0.06        29
          41       0.17      0.03      0.06        29
          42       0.17      0.03      0.06        29
          43       0.17      0.03      0.06        29
          44       0.17      0.03      0.06        29
          45       0.17      0.03      0.06        29
          46       0.17      0.03      0.06        29
          47       0.17      0.03      0.06        29
          48       0.17      0.03      0.06        29
          49       0.17      0.03      0.06        29
          50       0.17      0.03      0.06        29
          51       0.17      0.03      0.06        29
          52       0.17      0.03      0.06        29
          53       0.17      0.03      0.06        29
          54       0.17      0.03      0.06        29
          55       0.17      0.03      0.06        29
          56       0.17      0.03      0.06        29
          57       0.17      0.03      0.06        29
          58       0.17      0.03      0.06        29
          59       0.17      0.03      0.06        29
          60       0.17      0.03      0.06        29
          61       0.17      0.03      0.06        29
          62       0.17      0.03      0.06        29
          63       0.17      0.03      0.06        29
          64       0.17      0.03      0.06        29
          65       0.17      0.03      0.06        29
          66       0.17      0.03      0.06        29
          67       0.17      0.03      0.06        29
          68       0.17      0.03      0.06        29
          69       0.17      0.03      0.06        29
          70       0.17      0.03      0.06        29
          71       0.17      0.03      0.06        29
          72       0.17      0.03      0.06        29
          73       0.17      0.03      0.06        29
          74       0.17      0.03      0.06        29
          75       0.17      0.03      0.06        29
          76       0.17      0.03      0.06        29
          77       0.17      0.03      0.06        29
          78       0.17      0.03      0.06        29
          79       0.17      0.03      0.06        29
          80       0.17      0.03      0.06        29
          81       0.17      0.03      0.06        29
          82       0.17      0.03      0.06        29
          83       0.17      0.03      0.06        29
          84       0.17      0.03      0.06        29
          85       0.17      0.03      0.06        29
          86       0.17      0.03      0.06        29
          87       0.17      0.03      0.06        29
          88       0.17      0.03      0.06        29
          89       0.17      0.03      0.06        29
          90       0.17      0.03      0.06        29
          91       0.17      0.03      0.06        29
          92       0.17      0.03      0.06        29
          93       0.17      0.03      0.06        29
          94       0.17      0.03      0.06        29
          95       0.17      0.03      0.06        29
          96       0.17      0.03      0.06        29
          97       0.17      0.03      0.06        29
          98       0.17      0.03      0.06        29
          99       0.17      0.03      0.06        29
         100       0.17      0.03      0.06        29
         101       0.17      0.03      0.06        29
         102       0.17      0.03      0.06        29
         103       0.17      0.03      0.06        29
         104       0.17      0.03      0.06        29
         105       0.17      0.03      0.06        29
         106       0.17      0.03      0.06        29
         107       0.17      0.03      0.06        29
         108       0.17      0.03      0.06        29
         109       0.17      0.03      0.06        29
         110       0.17      0.03      0.06        29
         111       0.17      0.03      0.06        29
         112       0.17      0.03      0.06        29
         113       0.17      0.03      0.06        29
         114       0.17      0.03      0.06        29
         115       0.17      0.03      0.06        29
         116       0.17      0.03      0.06        29
         117       0.17      0.03      0.06        29
         118       0.17      0.03      0.06        29
         119       0.17      0.03      0.06        29
         120       0.17      0.03      0.06        29
         121       0.17      0.03      0.06        29
         122       0.17      0.03      0.06        29
         123       0.17      0.03      0.06        29
         124       0.17      0.03      0.06        29
         125       0.17      0.03      0.06        29
         126       0.17      0.03      0.06        29
         127       0.17      0.03      0.06        29
         128       0.17      0.03      0.06        29
         129       0.17      0.03      0.06        29
         130       0.17      0.03      0.06        29
         131       0.17      0.03      0.06        29
         132       0.17      0.03      0.06        29

   micro avg       0.17      0.03      0.06      3857
   macro avg       0.17      0.03      0.06      3857
weighted avg       0.17      0.03      0.06      3857
 samples avg       0.01      0.01      0.01      3857

C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in samples with no predicted labels. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Recall is ill-defined and being set to 0.0 in samples with no true labels. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in samples with no true nor predicted labels. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))

Random forests¶

In [ ]:
# Initialize and train the decision tree classifier
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict labels for the test set
y_pred = model.predict(X_test)

# Calculate accuracy
print("Score: ", model.score(X_test,y_test))
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print(classification_report(y_test, y_pred))
Score:  0.8555555555555555
Accuracy: 0.8555555555555555
              precision    recall  f1-score   support

           0       0.71      0.17      0.28        29
           1       0.71      0.17      0.28        29
           2       0.71      0.17      0.28        29
           3       0.71      0.17      0.28        29
           4       0.71      0.17      0.28        29
           5       0.71      0.17      0.28        29
           6       0.71      0.17      0.28        29
           7       0.71      0.17      0.28        29
           8       0.71      0.17      0.28        29
           9       0.71      0.17      0.28        29
          10       0.71      0.17      0.28        29
          11       0.71      0.17      0.28        29
          12       0.71      0.17      0.28        29
          13       0.71      0.17      0.28        29
          14       0.71      0.17      0.28        29
          15       0.71      0.17      0.28        29
          16       0.71      0.17      0.28        29
          17       0.71      0.17      0.28        29
          18       0.71      0.17      0.28        29
          19       0.71      0.17      0.28        29
          20       0.71      0.17      0.28        29
          21       0.71      0.17      0.28        29
          22       0.71      0.17      0.28        29
          23       0.71      0.17      0.28        29
          24       0.71      0.17      0.28        29
          25       0.71      0.17      0.28        29
          26       0.71      0.17      0.28        29
          27       0.71      0.17      0.28        29
          28       0.71      0.17      0.28        29
          29       0.71      0.17      0.28        29
          30       0.71      0.17      0.28        29
          31       0.71      0.17      0.28        29
          32       0.71      0.17      0.28        29
          33       0.71      0.17      0.28        29
          34       0.71      0.17      0.28        29
          35       0.71      0.17      0.28        29
          36       0.71      0.17      0.28        29
          37       0.71      0.17      0.28        29
          38       0.71      0.17      0.28        29
          39       0.71      0.17      0.28        29
          40       0.71      0.17      0.28        29
          41       0.71      0.17      0.28        29
          42       0.71      0.17      0.28        29
          43       0.71      0.17      0.28        29
          44       0.71      0.17      0.28        29
          45       0.71      0.17      0.28        29
          46       0.71      0.17      0.28        29
          47       0.71      0.17      0.28        29
          48       0.71      0.17      0.28        29
          49       0.71      0.17      0.28        29
          50       0.71      0.17      0.28        29
          51       0.71      0.17      0.28        29
          52       0.71      0.17      0.28        29
          53       0.71      0.17      0.28        29
          54       0.71      0.17      0.28        29
          55       0.71      0.17      0.28        29
          56       0.71      0.17      0.28        29
          57       0.71      0.17      0.28        29
          58       0.71      0.17      0.28        29
          59       0.71      0.17      0.28        29
          60       0.71      0.17      0.28        29
          61       0.71      0.17      0.28        29
          62       0.71      0.17      0.28        29
          63       0.71      0.17      0.28        29
          64       0.71      0.17      0.28        29
          65       0.71      0.17      0.28        29
          66       0.71      0.17      0.28        29
          67       0.71      0.17      0.28        29
          68       0.71      0.17      0.28        29
          69       0.71      0.17      0.28        29
          70       0.71      0.17      0.28        29
          71       0.71      0.17      0.28        29
          72       0.71      0.17      0.28        29
          73       0.71      0.17      0.28        29
          74       0.71      0.17      0.28        29
          75       0.71      0.17      0.28        29
          76       0.71      0.17      0.28        29
          77       0.71      0.17      0.28        29
          78       0.71      0.17      0.28        29
          79       0.71      0.17      0.28        29
          80       0.71      0.17      0.28        29
          81       0.71      0.17      0.28        29
          82       0.71      0.17      0.28        29
          83       0.71      0.17      0.28        29
          84       0.71      0.17      0.28        29
          85       0.71      0.17      0.28        29
          86       0.71      0.17      0.28        29
          87       0.71      0.17      0.28        29
          88       0.71      0.17      0.28        29
          89       0.71      0.17      0.28        29
          90       0.71      0.17      0.28        29
          91       0.71      0.17      0.28        29
          92       0.71      0.17      0.28        29
          93       0.71      0.17      0.28        29
          94       0.71      0.17      0.28        29
          95       0.71      0.17      0.28        29
          96       0.71      0.17      0.28        29
          97       0.71      0.17      0.28        29
          98       0.71      0.17      0.28        29
          99       0.71      0.17      0.28        29
         100       0.71      0.17      0.28        29
         101       0.71      0.17      0.28        29
         102       0.71      0.17      0.28        29
         103       0.71      0.17      0.28        29
         104       0.71      0.17      0.28        29
         105       0.71      0.17      0.28        29
         106       0.71      0.17      0.28        29
         107       0.71      0.17      0.28        29
         108       0.71      0.17      0.28        29
         109       0.71      0.17      0.28        29
         110       0.71      0.17      0.28        29
         111       0.71      0.17      0.28        29
         112       0.71      0.17      0.28        29
         113       0.71      0.17      0.28        29
         114       0.71      0.17      0.28        29
         115       0.71      0.17      0.28        29
         116       0.71      0.17      0.28        29
         117       0.71      0.17      0.28        29
         118       0.71      0.17      0.28        29
         119       0.71      0.17      0.28        29
         120       0.71      0.17      0.28        29
         121       0.71      0.17      0.28        29
         122       0.71      0.17      0.28        29
         123       0.71      0.17      0.28        29
         124       0.71      0.17      0.28        29
         125       0.71      0.17      0.28        29
         126       0.71      0.17      0.28        29
         127       0.71      0.17      0.28        29
         128       0.71      0.17      0.28        29
         129       0.71      0.17      0.28        29
         130       0.71      0.17      0.28        29
         131       0.71      0.17      0.28        29
         132       0.71      0.17      0.28        29

   micro avg       0.71      0.17      0.28      3857
   macro avg       0.71      0.17      0.28      3857
weighted avg       0.71      0.17      0.28      3857
 samples avg       0.03      0.03      0.03      3857

C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in samples with no predicted labels. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Recall is ill-defined and being set to 0.0 in samples with no true labels. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in samples with no true nor predicted labels. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))

Hyperparameter Tuning¶

In [ ]:
pipeline = make_pipeline(StandardScaler(), RandomForestClassifier())

# Setup the parameters and distributions to sample from: param_dist
param_dist = {
    "randomforestclassifier__max_depth": np.arange(5, 21),
    "randomforestclassifier__min_samples_leaf": np.arange(1, 10),
    "randomforestclassifier__n_estimators": np.arange(50, 150, 5)
}

# Instantiate the RandomizedSearchCV object: random_grid_search_cv
random_search_cv = RandomizedSearchCV(pipeline, param_distributions=param_dist, n_iter=50, cv=3, random_state=42)
#grid_search_cv = GridSearchCV(pipeline, param_grid=param_dist, cv=3)

# Fit random_search_cv using the data X and labels y
random_search_cv.fit(X_train, y_train)
#grid_search_cv.fit(X_train, y_train)

# Print the best score
print("Best score is {}".format(random_search_cv.best_estimator_.score(X_test, y_test)))
print("Best parameters are {}".format(random_search_cv.best_params_))
print(classification_report(y_test, y_pred))
Best score is 0.8388888888888889
Best parameters are {'randomforestclassifier__n_estimators': 55, 'randomforestclassifier__min_samples_leaf': 4, 'randomforestclassifier__max_depth': 12}
              precision    recall  f1-score   support

           0       0.71      0.17      0.28        29
           1       0.71      0.17      0.28        29
           2       0.71      0.17      0.28        29
           3       0.71      0.17      0.28        29
           4       0.71      0.17      0.28        29
           5       0.71      0.17      0.28        29
           6       0.71      0.17      0.28        29
           7       0.71      0.17      0.28        29
           8       0.71      0.17      0.28        29
           9       0.71      0.17      0.28        29
          10       0.71      0.17      0.28        29
          11       0.71      0.17      0.28        29
          12       0.71      0.17      0.28        29
          13       0.71      0.17      0.28        29
          14       0.71      0.17      0.28        29
          15       0.71      0.17      0.28        29
          16       0.71      0.17      0.28        29
          17       0.71      0.17      0.28        29
          18       0.71      0.17      0.28        29
          19       0.71      0.17      0.28        29
          20       0.71      0.17      0.28        29
          21       0.71      0.17      0.28        29
          22       0.71      0.17      0.28        29
          23       0.71      0.17      0.28        29
          24       0.71      0.17      0.28        29
          25       0.71      0.17      0.28        29
          26       0.71      0.17      0.28        29
          27       0.71      0.17      0.28        29
          28       0.71      0.17      0.28        29
          29       0.71      0.17      0.28        29
          30       0.71      0.17      0.28        29
          31       0.71      0.17      0.28        29
          32       0.71      0.17      0.28        29
          33       0.71      0.17      0.28        29
          34       0.71      0.17      0.28        29
          35       0.71      0.17      0.28        29
          36       0.71      0.17      0.28        29
          37       0.71      0.17      0.28        29
          38       0.71      0.17      0.28        29
          39       0.71      0.17      0.28        29
          40       0.71      0.17      0.28        29
          41       0.71      0.17      0.28        29
          42       0.71      0.17      0.28        29
          43       0.71      0.17      0.28        29
          44       0.71      0.17      0.28        29
          45       0.71      0.17      0.28        29
          46       0.71      0.17      0.28        29
          47       0.71      0.17      0.28        29
          48       0.71      0.17      0.28        29
          49       0.71      0.17      0.28        29
          50       0.71      0.17      0.28        29
          51       0.71      0.17      0.28        29
          52       0.71      0.17      0.28        29
          53       0.71      0.17      0.28        29
          54       0.71      0.17      0.28        29
          55       0.71      0.17      0.28        29
          56       0.71      0.17      0.28        29
          57       0.71      0.17      0.28        29
          58       0.71      0.17      0.28        29
          59       0.71      0.17      0.28        29
          60       0.71      0.17      0.28        29
          61       0.71      0.17      0.28        29
          62       0.71      0.17      0.28        29
          63       0.71      0.17      0.28        29
          64       0.71      0.17      0.28        29
          65       0.71      0.17      0.28        29
          66       0.71      0.17      0.28        29
          67       0.71      0.17      0.28        29
          68       0.71      0.17      0.28        29
          69       0.71      0.17      0.28        29
          70       0.71      0.17      0.28        29
          71       0.71      0.17      0.28        29
          72       0.71      0.17      0.28        29
          73       0.71      0.17      0.28        29
          74       0.71      0.17      0.28        29
          75       0.71      0.17      0.28        29
          76       0.71      0.17      0.28        29
          77       0.71      0.17      0.28        29
          78       0.71      0.17      0.28        29
          79       0.71      0.17      0.28        29
          80       0.71      0.17      0.28        29
          81       0.71      0.17      0.28        29
          82       0.71      0.17      0.28        29
          83       0.71      0.17      0.28        29
          84       0.71      0.17      0.28        29
          85       0.71      0.17      0.28        29
          86       0.71      0.17      0.28        29
          87       0.71      0.17      0.28        29
          88       0.71      0.17      0.28        29
          89       0.71      0.17      0.28        29
          90       0.71      0.17      0.28        29
          91       0.71      0.17      0.28        29
          92       0.71      0.17      0.28        29
          93       0.71      0.17      0.28        29
          94       0.71      0.17      0.28        29
          95       0.71      0.17      0.28        29
          96       0.71      0.17      0.28        29
          97       0.71      0.17      0.28        29
          98       0.71      0.17      0.28        29
          99       0.71      0.17      0.28        29
         100       0.71      0.17      0.28        29
         101       0.71      0.17      0.28        29
         102       0.71      0.17      0.28        29
         103       0.71      0.17      0.28        29
         104       0.71      0.17      0.28        29
         105       0.71      0.17      0.28        29
         106       0.71      0.17      0.28        29
         107       0.71      0.17      0.28        29
         108       0.71      0.17      0.28        29
         109       0.71      0.17      0.28        29
         110       0.71      0.17      0.28        29
         111       0.71      0.17      0.28        29
         112       0.71      0.17      0.28        29
         113       0.71      0.17      0.28        29
         114       0.71      0.17      0.28        29
         115       0.71      0.17      0.28        29
         116       0.71      0.17      0.28        29
         117       0.71      0.17      0.28        29
         118       0.71      0.17      0.28        29
         119       0.71      0.17      0.28        29
         120       0.71      0.17      0.28        29
         121       0.71      0.17      0.28        29
         122       0.71      0.17      0.28        29
         123       0.71      0.17      0.28        29
         124       0.71      0.17      0.28        29
         125       0.71      0.17      0.28        29
         126       0.71      0.17      0.28        29
         127       0.71      0.17      0.28        29
         128       0.71      0.17      0.28        29
         129       0.71      0.17      0.28        29
         130       0.71      0.17      0.28        29
         131       0.71      0.17      0.28        29
         132       0.71      0.17      0.28        29

   micro avg       0.71      0.17      0.28      3857
   macro avg       0.71      0.17      0.28      3857
weighted avg       0.71      0.17      0.28      3857
 samples avg       0.03      0.03      0.03      3857

C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in samples with no predicted labels. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Recall is ill-defined and being set to 0.0 in samples with no true labels. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in samples with no true nor predicted labels. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))

K Nearest Neighbors¶

In [ ]:
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

# Calculate accuracy
print("Score: ", model.score(X_test, y_test))
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print(classification_report(y_test, y_pred))
Score:  0.8222222222222222
Accuracy: 0.8222222222222222
              precision    recall  f1-score   support

           0       0.33      0.10      0.16        29
           1       0.33      0.10      0.16        29
           2       0.33      0.10      0.16        29
           3       0.33      0.10      0.16        29
           4       0.33      0.10      0.16        29
           5       0.33      0.10      0.16        29
           6       0.33      0.10      0.16        29
           7       0.33      0.10      0.16        29
           8       0.33      0.10      0.16        29
           9       0.33      0.10      0.16        29
          10       0.33      0.10      0.16        29
          11       0.33      0.10      0.16        29
          12       0.33      0.10      0.16        29
          13       0.33      0.10      0.16        29
          14       0.33      0.10      0.16        29
          15       0.33      0.10      0.16        29
          16       0.33      0.10      0.16        29
          17       0.33      0.10      0.16        29
          18       0.33      0.10      0.16        29
          19       0.33      0.10      0.16        29
          20       0.33      0.10      0.16        29
          21       0.33      0.10      0.16        29
          22       0.33      0.10      0.16        29
          23       0.33      0.10      0.16        29
          24       0.33      0.10      0.16        29
          25       0.33      0.10      0.16        29
          26       0.33      0.10      0.16        29
          27       0.33      0.10      0.16        29
          28       0.33      0.10      0.16        29
          29       0.33      0.10      0.16        29
          30       0.33      0.10      0.16        29
          31       0.33      0.10      0.16        29
          32       0.33      0.10      0.16        29
          33       0.33      0.10      0.16        29
          34       0.33      0.10      0.16        29
          35       0.33      0.10      0.16        29
          36       0.33      0.10      0.16        29
          37       0.33      0.10      0.16        29
          38       0.33      0.10      0.16        29
          39       0.33      0.10      0.16        29
          40       0.33      0.10      0.16        29
          41       0.33      0.10      0.16        29
          42       0.33      0.10      0.16        29
          43       0.33      0.10      0.16        29
          44       0.33      0.10      0.16        29
          45       0.33      0.10      0.16        29
          46       0.33      0.10      0.16        29
          47       0.33      0.10      0.16        29
          48       0.33      0.10      0.16        29
          49       0.33      0.10      0.16        29
          50       0.33      0.10      0.16        29
          51       0.33      0.10      0.16        29
          52       0.33      0.10      0.16        29
          53       0.33      0.10      0.16        29
          54       0.33      0.10      0.16        29
          55       0.33      0.10      0.16        29
          56       0.33      0.10      0.16        29
          57       0.33      0.10      0.16        29
          58       0.33      0.10      0.16        29
          59       0.33      0.10      0.16        29
          60       0.33      0.10      0.16        29
          61       0.33      0.10      0.16        29
          62       0.33      0.10      0.16        29
          63       0.33      0.10      0.16        29
          64       0.33      0.10      0.16        29
          65       0.33      0.10      0.16        29
          66       0.33      0.10      0.16        29
          67       0.33      0.10      0.16        29
          68       0.33      0.10      0.16        29
          69       0.33      0.10      0.16        29
          70       0.33      0.10      0.16        29
          71       0.33      0.10      0.16        29
          72       0.33      0.10      0.16        29
          73       0.33      0.10      0.16        29
          74       0.33      0.10      0.16        29
          75       0.33      0.10      0.16        29
          76       0.33      0.10      0.16        29
          77       0.33      0.10      0.16        29
          78       0.33      0.10      0.16        29
          79       0.33      0.10      0.16        29
          80       0.33      0.10      0.16        29
          81       0.33      0.10      0.16        29
          82       0.33      0.10      0.16        29
          83       0.33      0.10      0.16        29
          84       0.33      0.10      0.16        29
          85       0.33      0.10      0.16        29
          86       0.33      0.10      0.16        29
          87       0.33      0.10      0.16        29
          88       0.33      0.10      0.16        29
          89       0.33      0.10      0.16        29
          90       0.33      0.10      0.16        29
          91       0.33      0.10      0.16        29
          92       0.33      0.10      0.16        29
          93       0.33      0.10      0.16        29
          94       0.33      0.10      0.16        29
          95       0.33      0.10      0.16        29
          96       0.33      0.10      0.16        29
          97       0.33      0.10      0.16        29
          98       0.33      0.10      0.16        29
          99       0.33      0.10      0.16        29
         100       0.33      0.10      0.16        29
         101       0.33      0.10      0.16        29
         102       0.33      0.10      0.16        29
         103       0.33      0.10      0.16        29
         104       0.33      0.10      0.16        29
         105       0.33      0.10      0.16        29
         106       0.33      0.10      0.16        29
         107       0.33      0.10      0.16        29
         108       0.33      0.10      0.16        29
         109       0.33      0.10      0.16        29
         110       0.33      0.10      0.16        29
         111       0.33      0.10      0.16        29
         112       0.33      0.10      0.16        29
         113       0.33      0.10      0.16        29
         114       0.33      0.10      0.16        29
         115       0.33      0.10      0.16        29
         116       0.33      0.10      0.16        29
         117       0.33      0.10      0.16        29
         118       0.33      0.10      0.16        29
         119       0.33      0.10      0.16        29
         120       0.33      0.10      0.16        29
         121       0.33      0.10      0.16        29
         122       0.33      0.10      0.16        29
         123       0.33      0.10      0.16        29
         124       0.33      0.10      0.16        29
         125       0.33      0.10      0.16        29
         126       0.33      0.10      0.16        29
         127       0.33      0.10      0.16        29
         128       0.33      0.10      0.16        29
         129       0.33      0.10      0.16        29
         130       0.33      0.10      0.16        29
         131       0.33      0.10      0.16        29
         132       0.33      0.10      0.16        29

   micro avg       0.33      0.10      0.16      3857
   macro avg       0.33      0.10      0.16      3857
weighted avg       0.33      0.10      0.16      3857
 samples avg       0.02      0.02      0.02      3857

C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in samples with no predicted labels. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Recall is ill-defined and being set to 0.0 in samples with no true labels. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in samples with no true nor predicted labels. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))

Hyperparameter Tuning¶

In [ ]:
from sklearn.neighbors import KNeighborsClassifier
param_grid = {
    'kneighborsclassifier__n_neighbors': [3, 5, 7, 9]  # List of k values to try
}


pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())

grid_search = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=5)

grid_search.fit(X_train, y_train)

best_params = grid_search.best_params_
best_score=grid_search.best_estimator_.score(X_test, y_test)

print("Best Parameters:", best_params)
print("Best Score:", best_score)
print(classification_report(y_test, y_pred))
Best Parameters: {'kneighborsclassifier__n_neighbors': 7}
Best Score: 0.8166666666666667
              precision    recall  f1-score   support

           0       0.33      0.10      0.16        29
           1       0.33      0.10      0.16        29
           2       0.33      0.10      0.16        29
           3       0.33      0.10      0.16        29
           4       0.33      0.10      0.16        29
           5       0.33      0.10      0.16        29
           6       0.33      0.10      0.16        29
           7       0.33      0.10      0.16        29
           8       0.33      0.10      0.16        29
           9       0.33      0.10      0.16        29
          10       0.33      0.10      0.16        29
          11       0.33      0.10      0.16        29
          12       0.33      0.10      0.16        29
          13       0.33      0.10      0.16        29
          14       0.33      0.10      0.16        29
          15       0.33      0.10      0.16        29
          16       0.33      0.10      0.16        29
          17       0.33      0.10      0.16        29
          18       0.33      0.10      0.16        29
          19       0.33      0.10      0.16        29
          20       0.33      0.10      0.16        29
          21       0.33      0.10      0.16        29
          22       0.33      0.10      0.16        29
          23       0.33      0.10      0.16        29
          24       0.33      0.10      0.16        29
          25       0.33      0.10      0.16        29
          26       0.33      0.10      0.16        29
          27       0.33      0.10      0.16        29
          28       0.33      0.10      0.16        29
          29       0.33      0.10      0.16        29
          30       0.33      0.10      0.16        29
          31       0.33      0.10      0.16        29
          32       0.33      0.10      0.16        29
          33       0.33      0.10      0.16        29
          34       0.33      0.10      0.16        29
          35       0.33      0.10      0.16        29
          36       0.33      0.10      0.16        29
          37       0.33      0.10      0.16        29
          38       0.33      0.10      0.16        29
          39       0.33      0.10      0.16        29
          40       0.33      0.10      0.16        29
          41       0.33      0.10      0.16        29
          42       0.33      0.10      0.16        29
          43       0.33      0.10      0.16        29
          44       0.33      0.10      0.16        29
          45       0.33      0.10      0.16        29
          46       0.33      0.10      0.16        29
          47       0.33      0.10      0.16        29
          48       0.33      0.10      0.16        29
          49       0.33      0.10      0.16        29
          50       0.33      0.10      0.16        29
          51       0.33      0.10      0.16        29
          52       0.33      0.10      0.16        29
          53       0.33      0.10      0.16        29
          54       0.33      0.10      0.16        29
          55       0.33      0.10      0.16        29
          56       0.33      0.10      0.16        29
          57       0.33      0.10      0.16        29
          58       0.33      0.10      0.16        29
          59       0.33      0.10      0.16        29
          60       0.33      0.10      0.16        29
          61       0.33      0.10      0.16        29
          62       0.33      0.10      0.16        29
          63       0.33      0.10      0.16        29
          64       0.33      0.10      0.16        29
          65       0.33      0.10      0.16        29
          66       0.33      0.10      0.16        29
          67       0.33      0.10      0.16        29
          68       0.33      0.10      0.16        29
          69       0.33      0.10      0.16        29
          70       0.33      0.10      0.16        29
          71       0.33      0.10      0.16        29
          72       0.33      0.10      0.16        29
          73       0.33      0.10      0.16        29
          74       0.33      0.10      0.16        29
          75       0.33      0.10      0.16        29
          76       0.33      0.10      0.16        29
          77       0.33      0.10      0.16        29
          78       0.33      0.10      0.16        29
          79       0.33      0.10      0.16        29
          80       0.33      0.10      0.16        29
          81       0.33      0.10      0.16        29
          82       0.33      0.10      0.16        29
          83       0.33      0.10      0.16        29
          84       0.33      0.10      0.16        29
          85       0.33      0.10      0.16        29
          86       0.33      0.10      0.16        29
          87       0.33      0.10      0.16        29
          88       0.33      0.10      0.16        29
          89       0.33      0.10      0.16        29
          90       0.33      0.10      0.16        29
          91       0.33      0.10      0.16        29
          92       0.33      0.10      0.16        29
          93       0.33      0.10      0.16        29
          94       0.33      0.10      0.16        29
          95       0.33      0.10      0.16        29
          96       0.33      0.10      0.16        29
          97       0.33      0.10      0.16        29
          98       0.33      0.10      0.16        29
          99       0.33      0.10      0.16        29
         100       0.33      0.10      0.16        29
         101       0.33      0.10      0.16        29
         102       0.33      0.10      0.16        29
         103       0.33      0.10      0.16        29
         104       0.33      0.10      0.16        29
         105       0.33      0.10      0.16        29
         106       0.33      0.10      0.16        29
         107       0.33      0.10      0.16        29
         108       0.33      0.10      0.16        29
         109       0.33      0.10      0.16        29
         110       0.33      0.10      0.16        29
         111       0.33      0.10      0.16        29
         112       0.33      0.10      0.16        29
         113       0.33      0.10      0.16        29
         114       0.33      0.10      0.16        29
         115       0.33      0.10      0.16        29
         116       0.33      0.10      0.16        29
         117       0.33      0.10      0.16        29
         118       0.33      0.10      0.16        29
         119       0.33      0.10      0.16        29
         120       0.33      0.10      0.16        29
         121       0.33      0.10      0.16        29
         122       0.33      0.10      0.16        29
         123       0.33      0.10      0.16        29
         124       0.33      0.10      0.16        29
         125       0.33      0.10      0.16        29
         126       0.33      0.10      0.16        29
         127       0.33      0.10      0.16        29
         128       0.33      0.10      0.16        29
         129       0.33      0.10      0.16        29
         130       0.33      0.10      0.16        29
         131       0.33      0.10      0.16        29
         132       0.33      0.10      0.16        29

   micro avg       0.33      0.10      0.16      3857
   macro avg       0.33      0.10      0.16      3857
weighted avg       0.33      0.10      0.16      3857
 samples avg       0.02      0.02      0.02      3857

C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in samples with no predicted labels. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Recall is ill-defined and being set to 0.0 in samples with no true labels. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in samples with no true nor predicted labels. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))

The accuracy scores generally appear to be higher for multilabel classifiers than for the multiclass classifiers. However, we should take these accuracy scores for the multilabel classifiers with a grain of salt. In a multilabel setting, accuracy can be deceptively high when the majority of labels are negative. Since only one type combination (or two types) can be true for each Pokémon (row), all others are false, meaning we are definitely dealing with an imbalanced dataset. Recall is a mathematical measure of how many of the actual positive cases were correctly identified by a model. As we can see, recall scores are generally pretty low for the multilabel classifiers.

Multiclass multioutput Classification¶

Finally, there is Multiclass multioutput Classification, which is supported natively by some scikit-learn models.

Preprocessing¶

Here, we don't want to drop the Type 1 and Type 2 columns because we want to use them as the y vector.

In [ ]:
df = preprocessed_df.copy()

# Some type combinations only occur once so we extract them and add them to the test data
df['Types'] = df[['Type 1', 'Type 2']].apply(lambda x: tuple(filter(lambda y: pd.notna(y), x)), axis=1)

singleton_classes = df['Types'].value_counts()[df['Types'].value_counts() == 1].index.tolist()
singleton_data = df[df['Types'].isin(singleton_classes)]
other_data = df[~df['Types'].isin(singleton_classes)]
df = df.drop(columns=['Types'])
other_data.drop(columns=['Types'], inplace=True)
singleton_data.drop(columns=['Types'], inplace=True)
y = df[['Type 1', 'Type 2']]
df.head()
Out[ ]:
Type 1 Type 2 Total HP Attack Defense Sp. Atk Sp. Def Speed Generation Legendary
0 Grass Poison 318 45 49 49 65 65 45 1 0
1 Grass Poison 405 60 62 63 80 80 60 1 0
2 Grass Poison 525 80 82 83 100 100 80 1 0
3 Grass Poison 625 80 100 123 122 120 80 1 0
4 Fire None 309 39 52 43 60 50 65 1 0
In [ ]:
y.head()
Out[ ]:
Type 1 Type 2
0 Grass Poison
1 Grass Poison
2 Grass Poison
3 Grass Poison
4 Fire None

Decision Tree¶

We used a multioutput classifier to measure the accuracy score.

We calculated the accuracy for each type individually and together (where both types are correctly predicted).

Because the documentation on MultOutputClassifier's .score() method was somewhat unclear abour how/what it scored, we made our own method to calculate the accuracy for both types together and concluded it was similar to .score().

In [ ]:
from sklearn.multioutput import MultiOutputClassifier

# Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(other_data.drop(columns=['Type 1', 'Type 2']), other_data[['Type 1', 'Type 2']], test_size=0.2, stratify=other_data[['Type 1', 'Type 2']], random_state=42)
X_train = pd.concat([X_train, singleton_data.drop(columns=['Type 1', 'Type 2'])])
y_train = pd.concat([y_train, singleton_data[['Type 1', 'Type 2']]])
X_test = pd.concat([X_test, singleton_data.drop(columns=['Type 1', 'Type 2'])])
y_test = pd.concat([y_test, singleton_data[['Type 1', 'Type 2']]])


base_classifier = DecisionTreeClassifier()

multi_output_classifier = MultiOutputClassifier(base_classifier)
multi_output_classifier.fit(X_train, y_train)
base_classifier.fit(X_train, y_train)

# Step 4: Model Evaluation
y_pred = multi_output_classifier.predict(X_test)

# Evaluate

print("Score: ", multi_output_classifier.score(X_test, y_test))

y_pred = base_classifier.predict(X_test)

# our own score function
a = (y_test == y_pred)
b = []
for i, j in enumerate(a.iterrows()):
    b.append(j[1]['Type 1'] and j[1]['Type 2'])

nb_correct = 0
for i in b:
    if i:
        nb_correct += 1
score_ratio = nb_correct/len(b)
print("score ratio: ",score_ratio)

# accuracy score for each type
accuracy_list=[]
y_test = np.asarray(y_test)
y_pred = np.asarray(y_pred)
for i in range(2):
    accuracy = accuracy_score(y_test[:, i], y_pred[:, i])
    accuracy_list.append(accuracy)
    print("Accuracy type ", i+1, ": ", accuracy )
print("Averaged Accuracy for types: ",np.mean(accuracy_list))
Score:  0.24479166666666666
score ratio:  0.2604166666666667
Accuracy type  1 :  0.3541666666666667
Accuracy type  2 :  0.4479166666666667
Averaged Accuracy for types:  0.4010416666666667

Hyperparameter Tuning¶

In [ ]:
pipeline = make_pipeline(StandardScaler(), MultiOutputClassifier(DecisionTreeClassifier()))

param_dist = {
    "multioutputclassifier__estimator__max_depth": [5, 6, 7, 8, 9, 10, 15, 30, None], 
    "multioutputclassifier__estimator__min_samples_leaf": np.arange(1, 10)
}

# Instantiate the GridSearchCV object
grid_search_cv = GridSearchCV(pipeline, param_grid=param_dist, cv=5)

# Fit grid_search_cv using the data X and labels y.
grid_search_cv.fit(X_train, y_train) 
y_pred = grid_search_cv.predict(X_test)

# Print the best score
print("Tuned Model Parameters: {}".format(grid_search_cv.best_params_))
print("Best score is {}".format(grid_search_cv.best_estimator_.score(X_test, y_test)))
Tuned Model Parameters: {'multioutputclassifier__estimator__max_depth': 9, 'multioutputclassifier__estimator__min_samples_leaf': 8}
Best score is 0.09895833333333333

Random Forest¶

In [ ]:
from sklearn.model_selection import cross_validate, KFold
from sklearn.multioutput import MultiOutputClassifier

base_classifier = RandomForestClassifier()

multi_output_classifier = MultiOutputClassifier(base_classifier)
multi_output_classifier.fit(X_train, y_train)

# Step 4: Model Evaluation
accuracy_list=[]
y_pred = multi_output_classifier.predict(X_test)
print("score: ", multi_output_classifier.score(X_test, y_test))
y_test = np.asarray(y_test)
y_pred = np.asarray(y_pred)
for i in range(2):
    accuracy = accuracy_score(y_test[:, i], y_pred[:, i])
    print("Accuracy type ", i+1, ": ", accuracy )
    accuracy_list.append(accuracy)
print("Averaged Accuracy for types: ",np.mean(accuracy_list))
score:  0.3072916666666667
Accuracy type  1 :  0.390625
Accuracy type  2 :  0.5989583333333334
Averaged Accuracy for types:  0.4947916666666667

Hyperparameter Tuning¶

In [ ]:
pipeline = make_pipeline(StandardScaler(), MultiOutputClassifier(RandomForestClassifier()))

# Setup the parameters and distributions to sample from: param_dist
param_dist = {
    "multioutputclassifier__estimator__max_depth": [5, 10, 15, 30, None],  
    "multioutputclassifier__estimator__min_samples_leaf": np.arange(1, 10),
    "multioutputclassifier__estimator__n_estimators": np.arange(60, 140)
}

# Instantiate the RandomizedSearchCV object: random_grid_search_cv
random_search_cv = RandomizedSearchCV(pipeline, param_distributions=param_dist, n_iter=60, cv=3, random_state=42)
#grid_search_cv = GridSearchCV(pipeline, param_grid=param_dist, cv=3)

# Fit random_search_cv using the data X and labels y
random_search_cv.fit(X_train, y_train)
#grid_search_cv.fit(X_train, y_train)
# Print the best score
print("Best score is {}".format(random_search_cv.best_estimator_.score(X_test, y_test)))
print("Best parameters are {}".format(random_search_cv.best_params_)) 
Best score is 0.3229166666666667
Best parameters are {'multioutputclassifier__estimator__n_estimators': 129, 'multioutputclassifier__estimator__min_samples_leaf': 1, 'multioutputclassifier__estimator__max_depth': 15}

KNeighborsClassifier¶

In [ ]:
from sklearn.multioutput import MultiOutputClassifier

base_classifier = KNeighborsClassifier()

multi_output_classifier = MultiOutputClassifier(base_classifier)
multi_output_classifier.fit(X_train, y_train)

# Step 4: Model Evaluation
y_pred = multi_output_classifier.predict(X_test)
print("score: ", multi_output_classifier.score(X_test, y_test))
y_test = np.asarray(y_test)
y_pred = np.asarray(y_pred)

accuracy_list=[]
for i in range(2):
    accuracy = accuracy_score(y_test[:, i], y_pred[:, i])
    print("Accuracy type ", i+1, ": ", accuracy )
    accuracy_list.append(accuracy)

print("Averaged Accuracy for types: ",np.mean(accuracy_list))
score:  0.10416666666666667
Accuracy type  1 :  0.265625
Accuracy type  2 :  0.4322916666666667
Averaged Accuracy for types:  0.34895833333333337

Hyperparameter Tuning¶

In [ ]:
param_grid = {
    'multioutputclassifier__estimator__n_neighbors': [3, 5, 7, 9]  # List of k values to try
}

pipeline = make_pipeline(StandardScaler(), MultiOutputClassifier(KNeighborsClassifier()))

grid_search = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=5)

grid_search.fit(X_train, y_train)

best_params = grid_search.best_params_
best_score=grid_search.best_estimator_.score(X_test, y_test)

print("Best Parameters:", best_params)
print("Best Score:", best_score)
Best Parameters: {'multioutputclassifier__estimator__n_neighbors': 7}
Best Score: 0.078125