import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import accuracy_score
We have chosen the Pokémon dataset from https://www.kaggle.com/datasets/abcsds/pokemon. Our objective is to classify Pokémon types based on their stats and to find the model and hyperparameters with the best performance. There are multiple ways of going about this, but first, let's explore the dataset.
Exploring the dataset¶
# import the dataset
df = pd.read_csv('data/Pokemon.csv')
original_df = df.copy()
# Print head
df.head()
| | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
| 4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
# Print info such as data types and number of non-null values
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 800 entries, 0 to 799
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   #           800 non-null    int64
 1   Name        800 non-null    object
 2   Type 1      800 non-null    object
 3   Type 2      414 non-null    object
 4   Total       800 non-null    int64
 5   HP          800 non-null    int64
 6   Attack      800 non-null    int64
 7   Defense     800 non-null    int64
 8   Sp. Atk     800 non-null    int64
 9   Sp. Def     800 non-null    int64
 10  Speed       800 non-null    int64
 11  Generation  800 non-null    int64
 12  Legendary   800 non-null    bool
dtypes: bool(1), int64(9), object(3)
memory usage: 75.9+ KB
# Print summary statistics of numeric types
df.describe()
| | # | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation |
|---|---|---|---|---|---|---|---|---|---|
| count | 800.000000 | 800.00000 | 800.000000 | 800.000000 | 800.000000 | 800.000000 | 800.000000 | 800.000000 | 800.00000 |
| mean | 362.813750 | 435.10250 | 69.258750 | 79.001250 | 73.842500 | 72.820000 | 71.902500 | 68.277500 | 3.32375 |
| std | 208.343798 | 119.96304 | 25.534669 | 32.457366 | 31.183501 | 32.722294 | 27.828916 | 29.060474 | 1.66129 |
| min | 1.000000 | 180.00000 | 1.000000 | 5.000000 | 5.000000 | 10.000000 | 20.000000 | 5.000000 | 1.00000 |
| 25% | 184.750000 | 330.00000 | 50.000000 | 55.000000 | 50.000000 | 49.750000 | 50.000000 | 45.000000 | 2.00000 |
| 50% | 364.500000 | 450.00000 | 65.000000 | 75.000000 | 70.000000 | 65.000000 | 70.000000 | 65.000000 | 3.00000 |
| 75% | 539.250000 | 515.00000 | 80.000000 | 100.000000 | 90.000000 | 95.000000 | 90.000000 | 90.000000 | 5.00000 |
| max | 721.000000 | 780.00000 | 255.000000 | 190.000000 | 230.000000 | 194.000000 | 230.000000 | 180.000000 | 6.00000 |
# Check for missing values
print(df.isnull().sum())
#               0
Name            0
Type 1          0
Type 2        386
Total           0
HP              0
Attack          0
Defense         0
Sp. Atk         0
Sp. Def         0
Speed           0
Generation      0
Legendary       0
dtype: int64
Some Pokémon don't have a second type. We'll need to handle these missing values later.
# Plot the correlation matrix
sns.heatmap(df.drop(['#', 'Name'], axis=1).select_dtypes(include=[np.number, bool]).corr(), square=True, cmap='RdYlGn');
As you can see, the stat total is correlated with the six main stats (HP, Attack, Defense, Sp. Atk, Sp. Def and Speed), which makes sense, as it is the sum of these stats.
Generation, on the other hand, is barely correlated with anything other than itself. This is not too surprising, as each generation introduces a wide variety of new Pokémon. (The slightly higher correlation with Legendary reflects that later generations featured a higher ratio of legendaries among their new Pokémon than earlier generations did.)
The lack of correlation between speed and defense is also something that stands out in this visualisation.
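We can quickly sanity-check the first observation: 'Total' should be exactly the sum of the six main stats. A minimal check, assuming the column names shown above:
# 'Total' should equal the sum of the six main stats for every Pokémon
stats = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']
print((df[stats].sum(axis=1) == df['Total']).all())  # expected: True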
Common preprocessing steps¶
# Drop the Name and ID columns
df.drop(['#', 'Name'], axis=1, inplace=True)
# Convert the Legendary column from bool to int
df.Legendary = df.Legendary.astype(int)
# Print the head of the dataframe
df.head()
| | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | 0 |
| 1 | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | 0 |
| 2 | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | 0 |
| 3 | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | 0 |
| 4 | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | 0 |
We will handle the missing values by filling them with a 'None' type.
# Handling missing values in 'Type 2' column
df['Type 2'] = df['Type 2'].fillna('None')
print(df.isnull().sum())
df.head()
Type 1        0
Type 2        0
Total         0
HP            0
Attack        0
Defense       0
Sp. Atk       0
Sp. Def       0
Speed         0
Generation    0
Legendary     0
dtype: int64
| | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | 0 |
| 1 | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | 0 |
| 2 | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | 0 |
| 3 | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | 0 |
| 4 | Fire | None | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | 0 |
We copy the dataframe so we can reuse it later.
preprocessed_df = df.copy()
As stated before, there are multiple ways to frame this classification problem. (A sketch of the corresponding label encodings follows this list.)

- 1. Multiclass Classification:
  - 1.1. Accounting for Order of Types: every possible ordered combination of types becomes its own class. We can consider using classifiers like:
    - tree.DecisionTreeClassifier
    - neighbors.KNeighborsClassifier
    - linear_model.LogisticRegression
    - ensemble.RandomForestClassifier
    - svm.SVC
  - 1.2. Ignoring Order of Types: the same classifiers as above can be used. We just need to ensure that our encoding of the labels reflects that order doesn't matter.
- 2. Multilabel Classification:
  - 2.1. Accounting for Order of Types: each Pokémon can carry multiple type labels. Suitable classifiers include:
    - tree.DecisionTreeClassifier
    - neighbors.KNeighborsClassifier
    - ensemble.RandomForestClassifier
  - 2.2. Ignoring Order of Types: the same classifiers as above can be used. We just need to ensure that our encoding of the labels reflects that order doesn't matter.
- 3. Multiclass Multioutput Classification: the two types are predicted independently as two outputs. Suitable classifiers include:
  - tree.DecisionTreeClassifier
  - neighbors.KNeighborsClassifier
  - ensemble.RandomForestClassifier
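To make these framings concrete, here is a minimal sketch of how the target labels could be encoded in each case, using the 'Type 1' and 'Type 2' columns as they exist at this point (with 'None' filled in):
from sklearn.preprocessing import MultiLabelBinarizer

# 1. Multiclass: one class per type combination (ordered vs. sorted)
y_ordered = df[['Type 1', 'Type 2']].apply(tuple, axis=1)
y_unordered = df[['Type 1', 'Type 2']].apply(lambda r: tuple(sorted(r)), axis=1)

# 2. Multilabel: one binary indicator column per type; order is inherently
#    ignored (note the 'None' placeholder becomes a label of its own here)
mlb = MultiLabelBinarizer()
y_multilabel = mlb.fit_transform(df[['Type 1', 'Type 2']].values.tolist())

# 3. Multiclass multioutput: two separate targets, predicted independently
y_multioutput = df[['Type 1', 'Type 2']]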
Multi-class classification¶
Accounting for Order of Types¶
Preprocessing¶
We make a new column 'Types' that stores the combination of Type 1 and Type 2 as a tuple; tuples preserve element order, so the order of types is kept.
df = preprocessed_df.copy()
df['Types'] = df[['Type 1', 'Type 2']].apply(lambda x: tuple(filter(lambda y: pd.notna(y), x)), axis=1)
df.Types = df.Types.astype(str)
print(len(df['Types'].unique()))
# print two Pokémon whose Type 1 and Type 2 are the same but reversed, to check that order is taken into account
print("Sableye: ",df['Types'][326])
print("Spiritomb: ",df['Types'][490])
# drop the Type 1 and Type 2 columns
df.drop(['Type 1', 'Type 2'], axis=1, inplace=True)
# print head
df.head()
154
Sableye:  ('Dark', 'Ghost')
Spiritomb:  ('Ghost', 'Dark')
| | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | Types |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | 0 | ('Grass', 'Poison') |
| 1 | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | 0 | ('Grass', 'Poison') |
| 2 | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | 0 | ('Grass', 'Poison') |
| 3 | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | 0 | ('Grass', 'Poison') |
| 4 | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | 0 | ('Fire', 'None') |
# show the distribution of pokemon types
sns.countplot(df, y='Types');
As you can see, the classes are heavily imbalanced: some type combinations occur only once, which makes a stratified split impossible for them. We decided to extract these singleton classes and add them to both the train and test sets so the rest of the data can be stratified properly.
# Some type combinations occur only once, so we duplicate them into both sets to allow stratifying the rest of the data
singleton_classes = df['Types'].value_counts()[df['Types'].value_counts() == 1].index.tolist()
singleton_data = df[df['Types'].isin(singleton_classes)]
other_data = df[~df['Types'].isin(singleton_classes)]
print("Number of singleton classes",len(singleton_classes))
print("number of unique type combinations",len(df['Types'].unique()))
print(len(df['Types']))
df.head()
Number of singleton classes: 39
Number of unique type combinations: 154
Total number of Pokémon: 800
| | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | Types |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | 0 | ('Grass', 'Poison') |
| 1 | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | 0 | ('Grass', 'Poison') |
| 2 | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | 0 | ('Grass', 'Poison') |
| 3 | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | 0 | ('Grass', 'Poison') |
| 4 | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | 0 | ('Fire', 'None') |
Decision tree¶
The singleton classes are added to both the training and test sets after the rest of the data is split with stratification. This makes the actual test size slightly larger than the nominal 20%. (Note that these shared samples are seen during training, so the scores on them are optimistic.)
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
# Split the data into training and testing sets
X = df.drop(columns=['Types'])
y = df['Types']
X_train, X_test, y_train, y_test = train_test_split(other_data.drop(columns=['Types']), other_data['Types'], test_size=0.2, stratify=other_data['Types'], random_state=42)
X_train = pd.concat([X_train, singleton_data.drop(columns=['Types'])])
y_train = pd.concat([y_train, singleton_data['Types']])
X_test = pd.concat([X_test, singleton_data.drop(columns=['Types'])])
y_test = pd.concat([y_test, singleton_data['Types']])
print("actual test size:",len(X_test)/(len(X_train)+len(X_test)))
actual test size: 0.22884386174016685
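Since the same split-with-singletons logic is reused later for the unordered variant, it could also be wrapped in a small helper; a sketch (the name split_with_singletons is ours, and the function is not used below):
def split_with_singletons(data, label_col='Types', test_size=0.2, seed=42):
    # Stratified split that copies singleton classes into both sets
    counts = data[label_col].value_counts()
    mask = data[label_col].isin(counts[counts == 1].index)
    singles, rest = data[mask], data[~mask]
    X_tr, X_te, y_tr, y_te = train_test_split(
        rest.drop(columns=[label_col]), rest[label_col],
        test_size=test_size, stratify=rest[label_col], random_state=seed)
    X_tr = pd.concat([X_tr, singles.drop(columns=[label_col])])
    y_tr = pd.concat([y_tr, singles[label_col]])
    X_te = pd.concat([X_te, singles.drop(columns=[label_col])])
    y_te = pd.concat([y_te, singles[label_col]])
    return X_tr, X_te, y_tr, y_te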
from sklearn.metrics import accuracy_score, f1_score, fbeta_score
# Initialize and train the decision tree classifier
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
# Predict labels for the test set
y_pred = model.predict(X_test)
# Calculate accuracy
print("Score: ", model.score(X_test, y_test))
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Score:  0.2604166666666667
Accuracy: 0.2604166666666667
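An accuracy of roughly 26% is hard to judge in isolation. For context, we can compare against a majority-class baseline and compute a macro-averaged F1, which weights rare type combinations equally; a sketch using the f1_score imported above:
from sklearn.dummy import DummyClassifier

# Majority-class baseline: always predict the most common type combination
baseline = DummyClassifier(strategy='most_frequent').fit(X_train, y_train)
print("Baseline accuracy:", baseline.score(X_test, y_test))

# Macro F1 averages per-class F1 scores, so rare combinations count equally
print("Decision tree macro F1:", f1_score(y_test, y_pred, average='macro'))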
Hyperparameter Tuning¶
For hyperparameter tuning we use GridSearchCV together with a pipeline that includes a StandardScaler, so the features are standardized before fitting.
# Import the search and pipeline utilities
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
# Set up the parameter grid: param_dist
pipeline = make_pipeline(StandardScaler(), DecisionTreeClassifier())
param_dist = {
"decisiontreeclassifier__max_depth": np.arange(5, 15),
"decisiontreeclassifier__min_samples_leaf": np.arange(1, 5)
}
# Instantiate the GridSearchCV object
grid_search_cv = GridSearchCV(pipeline, param_grid=param_dist, cv=5)
# Fit grid_search_cv using the data X and labels y.
grid_search_cv.fit(X_train, y_train)
y_pred = grid_search_cv.predict(X_test)
# Print the best score
print("Tuned Model Parameters: {}".format(grid_search_cv.best_params_))
print("Accuracy: {}".format(grid_search_cv.best_estimator_.score(X_test, y_test)))
#print(classification_report(y_test, y_pred))
y_pred = grid_search_cv.best_estimator_.predict(X_test)
UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
Tuned Model Parameters: {'decisiontreeclassifier__max_depth': 8, 'decisiontreeclassifier__min_samples_leaf': 4}
Accuracy: 0.10416666666666667
Hyperparameter tuning can sometimes lead to worse results than the default settings. This can occur when the tuning process, typically done via cross-validation on the training data, inadvertently overfits: the search selects parameters that fit the noise of the cross-validation folds rather than patterns that generalize to new, unseen data. This issue is exacerbated when the dataset has classes with very few members, which leads to unreliable splits during cross-validation (exactly what the warning above points out).
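One way to see this is to compare the cross-validation score the search optimized against the held-out test score; a large gap suggests the selected parameters fit quirks of the CV folds:
# Mean CV accuracy of the selected parameters vs. held-out test accuracy
print("Best CV accuracy:", grid_search_cv.best_score_)
print("Test accuracy:   ", grid_search_cv.best_estimator_.score(X_test, y_test))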
Random forest¶
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
# Predict labels for the test set
y_pred = model.predict(X_test)
score = model.score(X_test, y_test)
# Calculate accuracy
print("Score :", score )
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Score: 0.3072916666666667
Accuracy: 0.3072916666666667
Hyperparameter Tuning¶
# Build the pipeline
pipeline = make_pipeline(StandardScaler(), RandomForestClassifier())
# Setup the parameters and distributions to sample from: param_dist
param_dist = {
"randomforestclassifier__max_depth": np.arange(5, 20),
"randomforestclassifier__min_samples_leaf": np.arange(1, 10),
"randomforestclassifier__n_estimators": np.arange(50, 150, 5)
}
# Instantiate the RandomizedSearchCV object: random_search_cv
random_search_cv = RandomizedSearchCV(pipeline, param_distributions=param_dist, n_iter=100, cv=3, random_state=42)
#grid_search_cv = GridSearchCV(pipeline, param_grid=param_dist, cv=3)
# Fit random_search_cv using the data X and labels y
random_search_cv.fit(X_train, y_train)
#grid_search_cv.fit(X_train, y_train)
# Print the best score
print("Best score is {}".format(random_search_cv.best_estimator_.score(X_test, y_test)))
print("Best parameters are {}".format(random_search_cv.best_params_))
UserWarning: The least populated class in y has only 1 members, which is less than n_splits=3.
Best score is 0.3020833333333333
Best parameters are {'randomforestclassifier__n_estimators': 85, 'randomforestclassifier__min_samples_leaf': 3, 'randomforestclassifier__max_depth': 15}
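As a side benefit, the tuned forest can tell us which features drive its predictions. A sketch, relying on the step name 'randomforestclassifier' that make_pipeline assigns:
# Inspect which features the tuned forest relies on most
best_rf = random_search_cv.best_estimator_.named_steps['randomforestclassifier']
importances = pd.Series(best_rf.feature_importances_, index=X_train.columns)
print(importances.sort_values(ascending=False))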
Support Vector Machine¶
Radial basis function¶
from sklearn.svm import SVC
model = SVC(kernel='rbf', random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Accuracy: 0.078125
Hyperparameter Tuning¶
from sklearn.svm import SVC
# Define the pipeline with StandardScaler and SVC
pipeline = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
# Define the parameter grid
param_grid = {
'svc__C': [0.1, 0.5, 1, 5, 10],  # Regularization parameter
'svc__coef0': [0.0, 1.0, 2.0],   # Independent term; note: coef0 has no effect with the RBF kernel
}
# Initialize GridSearchCV
grid_search = GridSearchCV(pipeline, param_grid, cv=5, n_jobs=-1)
# Perform grid search
grid_search.fit(X_train, y_train)
# Get the best parameters and score
best_params = grid_search.best_params_
best_score=grid_search.best_estimator_.score(X_test, y_test)
print("Best Parameters:", best_params)
print("Best score is {}".format(grid_search.best_estimator_.score(X_test, y_test)))
UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
Best Parameters: {'svc__C': 5, 'svc__coef0': 0.0}
Best score is 0.21354166666666666
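Since coef0 is ignored by the RBF kernel, the grid above effectively only tuned C. A grid that also exercises the kernel's own width parameter, gamma, might look like this (a sketch, not run here):
# gamma controls how far each support vector's influence reaches
param_grid_rbf = {
    'svc__C': [0.1, 0.5, 1, 5, 10],
    'svc__gamma': ['scale', 0.01, 0.1, 1.0],  # 'scale' is sklearn's default heuristic
}
grid_search_rbf = GridSearchCV(pipeline, param_grid_rbf, cv=5, n_jobs=-1)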
Linear¶
svm_classifier = SVC(kernel='linear', random_state=42)
# Train the SVM classifier
svm_classifier.fit(X_train, y_train)
y_pred = svm_classifier.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Accuracy: 0.3072916666666667
Hyperparameter tuning¶
from sklearn.svm import SVC
# Define the pipeline with StandardScaler and SVC
pipeline = make_pipeline(StandardScaler(), SVC(kernel='linear', random_state=42))
# Define the parameter grid
param_grid = {
'svc__C': [0.1, 0.5, 1, 5, 10],  # Regularization parameter
'svc__coef0': [0.0, 1.0, 2.0],   # Independent term; has no effect with the linear kernel
}
# Initialize GridSearchCV
grid_search = GridSearchCV(pipeline, param_grid, cv=5, n_jobs=-1)
# Perform grid search
grid_search.fit(X_train, y_train)
# Get the best parameters and score
best_params = grid_search.best_params_
best_score=grid_search.best_estimator_.score(X_test, y_test)
print("Best Parameters:", best_params)
print("Best Score:", best_score)
UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
Best Parameters: {'svc__C': 1, 'svc__coef0': 0.0}
Best Score: 0.22916666666666666
Polynomial¶
model = SVC(kernel='poly', random_state=42)
# Train the SVM classifier
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Accuracy: 0.09375
Hyperparameter Tuning¶
from sklearn.svm import SVC
# Define the pipeline with StandardScaler and SVC
pipeline = make_pipeline(StandardScaler(), SVC(kernel='poly'))
# Define the parameter grid
param_grid = {
'svc__C': [0.1,0.5, 1, 5, 10], # Regularization parameter
'svc__degree': [2, 3, 4, 5, 6], # Degree of the polynomial kernel
'svc__coef0': [0.0, 1.0, 2.0], # Independent term in the polynomial kernel function
}
# Initialize GridSearchCV
grid_search = GridSearchCV(pipeline, param_grid, cv=5, n_jobs=-1)
# Perform grid search
grid_search.fit(X_train, y_train)
# Get the best parameters and score
best_params = grid_search.best_params_
best_score=grid_search.best_estimator_.score(X_test, y_test)
print("Best Parameters:", best_params)
print("Best Score:", best_score)
UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
Best Parameters: {'svc__C': 1, 'svc__coef0': 2.0, 'svc__degree': 2}
Best Score: 0.20833333333333334
Sigmoid¶
model = SVC(kernel='sigmoid', random_state=42)
# Train the SVM classifier
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Calculate accuracy
print("Score: ", model.score(X_test, y_test))
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Score:  0.057291666666666664
Accuracy: 0.057291666666666664
Hyperparameter Tuning¶
from sklearn.svm import SVC
# Define the pipeline with StandardScaler and SVC
pipeline = make_pipeline(StandardScaler(), SVC(kernel='sigmoid'))
# Define the parameter grid
param_grid = {
'svc__C': [0.1,0.5, 1,5, 10], # Regularization parameter
'svc__coef0': [0.0, 1.0, 2.0], # Independent term in the sigmoid kernel function
}
# Initialize GridSearchCV
grid_search = GridSearchCV(pipeline, param_grid, cv=5, n_jobs=-1)
# Perform grid search
grid_search.fit(X_train, y_train)
# Get the best parameters and score
best_params = grid_search.best_params_
best_score=grid_search.best_estimator_.score(X_test, y_test)
print("Best Parameters:", best_params)
print("Best Score:", best_score)
UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
Best Parameters: {'svc__C': 1, 'svc__coef0': 0.0}
Best Score: 0.06770833333333333
k-Nearest neighbors¶
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Calculate accuracy
print("Score: ",model.score(X_test, y_test))
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Score:  0.09375
Accuracy: 0.09375
Hyperparameter Tuning¶
from sklearn.neighbors import KNeighborsClassifier
param_grid = {
'kneighborsclassifier__n_neighbors': [3, 5, 7, 9] # List of k values to try
}
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())
grid_search = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)
best_params = grid_search.best_params_
best_score=grid_search.best_estimator_.score(X_test, y_test)
print("Best Parameters:", best_params)
print("Best Score:", best_score)
UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
Best Parameters: {'kneighborsclassifier__n_neighbors': 7}
Best Score: 0.09375
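The tuned kNN matches the untuned score, which suggests k alone isn't the bottleneck. Distance weighting and the distance metric often matter as much; a slightly wider (hypothetical) grid could be passed to GridSearchCV exactly as above:
# Let closer neighbours count more, and try Manhattan vs. Euclidean distance
param_grid_knn = {
    'kneighborsclassifier__n_neighbors': [3, 5, 7, 9],
    'kneighborsclassifier__weights': ['uniform', 'distance'],
    'kneighborsclassifier__p': [1, 2],  # 1 = Manhattan, 2 = Euclidean
}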
Logistic regression¶
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Calculate accuracy
print("Score: ", model.score(X_test, y_test))
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Score:  0.13020833333333334
Accuracy: 0.13020833333333334
ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data.
model = LogisticRegression(random_state=42, multi_class='auto', solver='liblinear', max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Calculate accuracy
print("Score: ", model.score(X_test, y_test))
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Score:  0.15625
Accuracy: 0.15625
Hyperparameter tuning¶
param_grid = {
'logisticregression__C': np.logspace(-5, 5, 5),
'logisticregression__penalty': ['l1', 'l2']
}
pipeline = make_pipeline(StandardScaler(), LogisticRegression(solver='liblinear'))
grid_search = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)
best_params = grid_search.best_params_
best_score=grid_search.best_estimator_.score(X_test, y_test)
print("Best Parameters:", best_params)
print("Best Score:", best_score)
UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
ConvergenceWarning: Liblinear failed to converge, increase the number of iterations. (repeated for several parameter settings)
Best Parameters: {'logisticregression__C': 316.22776601683796, 'logisticregression__penalty': 'l2'}
Best Score: 0.21354166666666666
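The convergence warnings above indicate liblinear hit its iteration limit for some parameter settings, so those scores may be slightly off. Raising the iteration cap inside the pipeline would address this; a sketch (the name pipeline_hi_iter is ours):
# Same pipeline, but with a higher iteration cap so liblinear can converge
pipeline_hi_iter = make_pipeline(StandardScaler(),
                                 LogisticRegression(solver='liblinear', max_iter=5000))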
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(penalty='elasticnet', l1_ratio=0.5, random_state=42, multi_class='auto', solver='saga', max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Calculate accuracy
print("Score: ", model.score(X_test, y_test))
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Score:  0.15625
Accuracy: 0.15625
ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
Hyperparameter tuning¶
param_grid = {
'logisticregression__C': np.logspace(-4, 4, 4),
'logisticregression__l1_ratio': np.linspace(0, 1, 10)
}
pipeline = make_pipeline(StandardScaler(), LogisticRegression(penalty='elasticnet', solver='saga'))
grid_search = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)
best_params = grid_search.best_params_
best_score=grid_search.best_estimator_.score(X_test, y_test)
print("Best Parameters:", best_params)
print("Best Score:", best_score)
UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5.
ConvergenceWarning: The max_iter was reached which means the coef_ did not converge (repeated for many parameter combinations)
Best Parameters: {'logisticregression__C': 10000.0, 'logisticregression__l1_ratio': 0.0}
Best Score: 0.22395833333333334
Ignoring Order of Types¶
Preprocessing¶
We preprocess as before, but sort each Pokémon's two types alphabetically so the class label ignores their order: (Ghost, Dark) and (Dark, Ghost) both become ['Dark', 'Ghost'].
df = preprocessed_df.copy()
# Build the combined label: drop missing types, then sort the rest alphabetically
df['Types'] = df[['Type 1', 'Type 2']].apply(lambda x: sorted(t for t in x if pd.notna(t)), axis=1)
df.Types = df.Types.astype(str)
# Sanity check with two Pokémon whose types are the same pair in reversed order:
# Sableye is Dark/Ghost and Spiritomb is Ghost/Dark, so both should map to the same label
print("Sableye: ", df['Types'][326])
print("Spiritomb: ", df['Types'][490])
# Drop the now-redundant Type 1 and Type 2 columns
df.drop(['Type 1', 'Type 2'], axis=1, inplace=True)
# print head
df.head()
Sableye: ['Dark', 'Ghost'] Spiritomb: ['Dark', 'Ghost']
Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | Types | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | 0 | ['Grass', 'Poison'] |
1 | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | 0 | ['Grass', 'Poison'] |
2 | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | 0 | ['Grass', 'Poison'] |
3 | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | 0 | ['Grass', 'Poison'] |
4 | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | 0 | ['Fire', 'None'] |
sns.countplot(df, y='Types');
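With 133 distinct combinations the full countplot is cramped and hard to read. A minimal variant that keeps only the most frequent combinations (purely illustrative; the analysis below uses all classes):
# Restrict the countplot to the 15 most common type combinations
top = df['Types'].value_counts().head(15).index
sns.countplot(df[df['Types'].isin(top)], y='Types', order=list(top));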
# Find type combinations that occur only once; these singleton classes
# break stratified splitting and cross-validation
type_counts = df['Types'].value_counts()
singleton_classes = type_counts[type_counts == 1].index.tolist()
singleton_data = df[df['Types'].isin(singleton_classes)]
other_data = df[~df['Types'].isin(singleton_classes)]
print("Number of singleton classes:", len(singleton_classes))
print("Number of unique type combinations:", len(df['Types'].unique()))
df.head()
Number of singleton classes: 24 Number of unique type combinations: 133
Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | Types | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | 0 | ['Grass', 'Poison'] |
1 | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | 0 | ['Grass', 'Poison'] |
2 | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | 0 | ['Grass', 'Poison'] |
3 | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | 0 | ['Grass', 'Poison'] |
4 | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | 0 | ['Fire', 'None'] |
Decision tree¶
# Split the data into training and test sets. Stratification requires at
# least two samples per class, so the split is done on the non-singleton
# data; the singleton classes are then appended to both sets (note that
# this leaks those 24 Pokémon into the test set)
X_train, X_test, y_train, y_test = train_test_split(other_data.drop(columns=['Types']), other_data['Types'], test_size=0.2, stratify=other_data['Types'], random_state=42)
X_train = pd.concat([X_train, singleton_data.drop(columns=['Types'])])
y_train = pd.concat([y_train, singleton_data['Types']])
X_test = pd.concat([X_test, singleton_data.drop(columns=['Types'])])
y_test = pd.concat([y_test, singleton_data['Types']])
# Initialize and train the decision tree classifier
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
# Predict labels for the test set
y_pred = model.predict(X_test)
# Calculate accuracy (for a classifier, model.score is accuracy, so both printed values match)
print("Score: ", model.score(X_test, y_test))
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Score: 0.2111111111111111 Accuracy: 0.2111111111111111
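These accuracies are hard to judge without a reference point, so here is a quick chance-level baseline, sketched with scikit-learn's DummyClassifier (assuming the split defined above):
# Majority-class baseline: always predict the most common type combination
from sklearn.dummy import DummyClassifier
dummy = DummyClassifier(strategy='most_frequent')
dummy.fit(X_train, y_train)
print("Majority-class accuracy:", dummy.score(X_test, y_test))
With over a hundred classes, even the most common combination covers only a small fraction of the test set, so scores around 0.2 sit well above chance level.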
Hyperparameter tuning¶
# Set up the hyperparameter grid (a StandardScaler is included for consistency
# with the other pipelines, though tree-based models are insensitive to feature scaling)
pipeline = make_pipeline(StandardScaler(), DecisionTreeClassifier())
param_grid = {
    "decisiontreeclassifier__max_depth": np.arange(5, 15),
    "decisiontreeclassifier__min_samples_leaf": np.arange(1, 10)
}
# Instantiate the GridSearchCV object
grid_search_cv = GridSearchCV(pipeline, param_grid=param_grid, cv=5)
# Fit grid_search_cv on the training data
grid_search_cv.fit(X_train, y_train)
y_pred = grid_search_cv.predict(X_test)
# Print the best parameters and the test-set accuracy of the best estimator
print("Tuned Model Parameters: {}".format(grid_search_cv.best_params_))
print("Accuracy: {}".format(grid_search_cv.best_estimator_.score(X_test, y_test)))
c:\Users\Chau\miniconda3\Lib\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5. warnings.warn(
Tuned Model Parameters: {'decisiontreeclassifier__max_depth': 5, 'decisiontreeclassifier__min_samples_leaf': 8} Accuracy: 0.07777777777777778
Random forest¶
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
# Predict labels for the test set
y_pred = model.predict(X_test)
# Calculate accuracy
print("Score: ", model.score(X_test, y_test))
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Score: 0.2388888888888889 Accuracy: 0.2388888888888889
Hyperparameter Tuning¶
Cross-validation runs into a problem here: some type combinations occur only once in the training data, so every stratified fold triggers a "least populated class" warning, and those classes cannot appear in both the fit and validation parts of a fold. The tuning below simply ignores the warning; one possible workaround is sketched first.
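# Workaround sketch: merge type combinations that occur fewer than
# n_splits times in the training data into one 'Rare' bucket, so every
# class is populated enough for stratified CV. Illustrative only --
# the tuning below keeps the original labels.
n_splits = 3
counts = y_train.value_counts()
rare_labels = counts[counts < n_splits].index
y_train_merged = y_train.where(~y_train.isin(rare_labels), other='Rare')
print("Classes before:", y_train.nunique(), "after:", y_train_merged.nunique())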
pipeline = make_pipeline(StandardScaler(), RandomForestClassifier())
# Setup the parameters and distributions to sample from: param_dist
param_dist = {
"randomforestclassifier__max_depth": np.arange(5, 15),
"randomforestclassifier__min_samples_leaf": np.arange(1, 10),
"randomforestclassifier__n_estimators": np.arange(60, 140)
}
random_search_cv = RandomizedSearchCV(pipeline, param_distributions=param_dist, n_iter=100, cv=3, random_state=42)
# Fit random_search_cv on the training data; an exhaustive GridSearchCV over
# all 10 * 9 * 80 = 7200 combinations would be far slower, hence the randomized search
random_search_cv.fit(X_train, y_train)
# Print the test-set score of the best estimator and the winning parameters
print("Best score is {}".format(random_search_cv.best_estimator_.score(X_test, y_test)))
print("Best parameters are {}".format(random_search_cv.best_params_))
c:\Users\Chau\miniconda3\Lib\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=3. warnings.warn(
Best score is 0.25555555555555554 Best parameters are {'randomforestclassifier__n_estimators': 77, 'randomforestclassifier__min_samples_leaf': 1, 'randomforestclassifier__max_depth': 8}
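Since the tuned random forest is the strongest model so far, it is worth a quick look at which stats drive its predictions. A minimal sketch, assuming random_search_cv has been fitted as above:
# Extract the fitted forest from the best pipeline and rank its features
rf = random_search_cv.best_estimator_.named_steps['randomforestclassifier']
importances = pd.Series(rf.feature_importances_, index=X_train.columns)
print(importances.sort_values(ascending=False))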
Support vector machine¶
Radial basis function¶
from sklearn.svm import SVC
model = SVC(kernel='rbf', random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Accuracy: 0.07777777777777778
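RBF kernels are highly sensitive to feature scale, and the model above was fit on raw stats. A minimal check that isolates the scaling effect before any tuning (assumes the imports and split from the earlier cells):
# Same RBF SVC, but on standardized features
scaled_rbf = make_pipeline(StandardScaler(), SVC(kernel='rbf', random_state=42))
scaled_rbf.fit(X_train, y_train)
print("Accuracy with scaling:", scaled_rbf.score(X_test, y_test))
Comparing this to the tuned result below separates the effect of the StandardScaler from that of the C search.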
Hyperparameter Tuning¶
from sklearn.svm import SVC
# Define the pipeline with StandardScaler and SVC
pipeline = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
# Define the parameter grid
param_grid = {
    'svc__C': [0.1, 0.5, 1, 5, 10],  # Regularization parameter
    'svc__coef0': [0.0, 1.0, 2.0],   # Note: coef0 is only used by the 'poly' and 'sigmoid' kernels, so it has no effect here
}
# Initialize GridSearchCV
grid_search = GridSearchCV(pipeline, param_grid, cv=5, n_jobs=-1)
# Perform grid search
grid_search.fit(X_train, y_train)
# Get the best parameters and score
best_params = grid_search.best_params_
best_score = grid_search.best_estimator_.score(X_test, y_test)
print("Best Parameters:", best_params)
print("Best Score:", best_score)
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5. warnings.warn(
Best Parameters: {'svc__C': 5, 'svc__coef0': 0.0} Best Score: 0.18888888888888888
Linear¶
svm_classifier = SVC(kernel='linear', random_state=42)
# Train the SVM classifier
svm_classifier.fit(X_train, y_train)
y_pred = svm_classifier.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Accuracy: 0.23333333333333334
Hyperparameter tuning¶
from sklearn.svm import SVC
# Define the pipeline with StandardScaler and SVC
pipeline = make_pipeline(StandardScaler(), SVC(kernel='linear'))
# Define the parameter grid
param_grid = {
    'svc__C': [0.1, 0.5, 1, 5, 10],  # Regularization parameter
    'svc__coef0': [0.0, 1.0, 2.0],   # Note: coef0 has no effect on the linear kernel
}
# Initialize GridSearchCV
grid_search = GridSearchCV(pipeline, param_grid, cv=5, n_jobs=-1)
# Perform grid search
grid_search.fit(X_train, y_train)
# Get the best parameters and score
best_params = grid_search.best_params_
best_score = grid_search.best_estimator_.score(X_test, y_test)
print("Best Parameters:", best_params)
print("Best Score:", best_score)
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5. warnings.warn(
Best Parameters: {'svc__C': 1, 'svc__coef0': 0.0} Best Score: 0.21666666666666667
Polynomial¶
model = SVC(kernel='poly', random_state=42)
# Train the SVM classifier
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Accuracy: 0.06666666666666667
Hyperparameter Tuning¶
from sklearn.svm import SVC
# Define the pipeline with StandardScaler and SVC
pipeline = make_pipeline(StandardScaler(), SVC(kernel='poly'))
# Define the parameter grid
param_grid = {
    'svc__C': [0.1, 0.5, 1, 5, 10],  # Regularization parameter
    'svc__degree': [2, 3, 4, 5, 6],  # Degree of the polynomial kernel
    'svc__coef0': [0.0, 1.0, 2.0],   # Independent term in the polynomial kernel function
}
# Initialize GridSearchCV
grid_search = GridSearchCV(pipeline, param_grid, cv=5, n_jobs=-1)
# Perform grid search
grid_search.fit(X_train, y_train)
# Get the best parameters and score
best_params = grid_search.best_params_
best_score = grid_search.best_estimator_.score(X_test, y_test)
print("Best Parameters:", best_params)
print("Best Score:", best_score)
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5. warnings.warn(
Best Parameters: {'svc__C': 1, 'svc__coef0': 1.0, 'svc__degree': 3} Best Score: 0.2222222222222222
Sigmoid¶
model = SVC(kernel='sigmoid', random_state=42)
# Train the SVM classifier
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Calculate accuracy
print("Score: ", model.score(X_test, y_test))
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Score: 0.03888888888888889 Accuracy: 0.03888888888888889
Hyperparameter Tuning¶
from sklearn.svm import SVC
# Define the pipeline with StandardScaler and SVC
pipeline = make_pipeline(StandardScaler(), SVC(kernel='sigmoid'))
# Define the parameter grid
param_grid = {
    'svc__C': [0.1, 0.5, 1, 5, 10, 50],  # Regularization parameter
    'svc__coef0': [0.0, 1.0, 2.0],       # Independent term in the sigmoid kernel function
}
# Initialize GridSearchCV
grid_search = GridSearchCV(pipeline, param_grid, cv=5, n_jobs=-1)
# Perform grid search
grid_search.fit(X_train, y_train)
# Get the best parameters and score
best_params = grid_search.best_params_
best_score = grid_search.best_estimator_.score(X_test, y_test)
print("Best Parameters:", best_params)
print("Best Score:", best_score)
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5. warnings.warn(
Best Parameters: {'svc__C': 5, 'svc__coef0': 0.0} Best Score: 0.11666666666666667
k Nearest Neighbors¶
from sklearn.neighbors import KNeighborsClassifier
# Default k=5, fit on unscaled stats; kNN is distance-based and therefore
# scale-sensitive, which the tuned pipeline below addresses with a StandardScaler
model = KNeighborsClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Calculate accuracy
print("Score: ", model.score(X_test, y_test))
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Score: 0.11666666666666667 Accuracy: 0.11666666666666667
Hyperparameter Tuning¶
param_grid = {
'kneighborsclassifier__n_neighbors': [3, 5, 7, 9] # List of k values to try
}
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())
grid_search = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)
best_params = grid_search.best_params_
best_score=grid_search.best_estimator_.score(X_test, y_test)
print("Best Parameters:", best_params)
print("Best Score:", best_score)
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5. warnings.warn(
Best Parameters: {'kneighborsclassifier__n_neighbors': 9} Best Score: 0.10555555555555556
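The grid above only varies k. Two other knobs that often matter for kNN are distance-weighted voting and the choice of norm; a hypothetical extension of the same grid (same pipeline as above, not run here):
# Extended kNN grid: also try distance-weighted voting and both p-norms
param_grid = {
    'kneighborsclassifier__n_neighbors': [3, 5, 7, 9],
    'kneighborsclassifier__weights': ['uniform', 'distance'],
    'kneighborsclassifier__p': [1, 2],  # Manhattan vs. Euclidean distance
}
grid_search = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=5)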
Logistic regression¶
model = LogisticRegression()  # defaults (lbfgs solver, max_iter=100) on unscaled data trigger the ConvergenceWarning below
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Calculate accuracy
print("Score: ", model.score(X_test, y_test))
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Score: 0.1388888888888889 Accuracy: 0.1388888888888889
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_logistic.py:469: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result(
from sklearn.linear_model import LogisticRegression
# liblinear fits one-vs-rest binary problems for multiclass targets;
# a higher max_iter avoids the convergence warning seen above
model = LogisticRegression(random_state=42, multi_class='auto', solver='liblinear', max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Calculate accuracy
print("Score: ", model.score(X_test, y_test))
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Score: 0.16111111111111112 Accuracy: 0.16111111111111112
Hyperparameter tuning¶
param_grid = {
'logisticregression__C': np.logspace(-5, 5, 5),
'logisticregression__penalty': ['l1', 'l2']
}
pipeline = make_pipeline(StandardScaler(), LogisticRegression(solver='liblinear'))
grid_search = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)
best_params = grid_search.best_params_
best_score=grid_search.best_estimator_.score(X_test, y_test)
print("Best Parameters:", best_params)
print("Best Score:", best_score)
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5. warnings.warn( C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\svm\_base.py:1237: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations. warnings.warn( (repeated for each cross-validation fit)
Best Parameters: {'logisticregression__C': 316.22776601683796, 'logisticregression__penalty': 'l2'} Best Score: 0.2111111111111111
from sklearn.linear_model import LogisticRegression
# Elastic net mixes L1 and L2 regularization; saga is the only solver that supports it,
# but it converges slowly on unscaled data, hence the warning below
model = LogisticRegression(penalty='elasticnet', l1_ratio=0.5, random_state=42, multi_class='auto', solver='saga', max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Calculate accuracy
print("Score: ", model.score(X_test, y_test))
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Score: 0.17222222222222222 Accuracy: 0.17222222222222222
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge warnings.warn(
Hyperparameter tuning¶
param_grid = {
'logisticregression__C': np.logspace(-5, 5, 5),
'logisticregression__l1_ratio': np.linspace(0, 1, 10)
}
pipeline = make_pipeline(StandardScaler(), LogisticRegression(penalty='elasticnet', solver='saga'))  # saga's default max_iter=100 explains the repeated ConvergenceWarnings below
grid_search = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)
best_params = grid_search.best_params_
best_score=grid_search.best_estimator_.score(X_test, y_test)
print("Best Parameters:", best_params)
print("Best Score:", best_score)
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 1 members, which is less than n_splits=5. warnings.warn( C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge warnings.warn( (repeated for each cross-validation fit)
Best Parameters: {'logisticregression__C': 1.0, 'logisticregression__l1_ratio': 0.0} Best Score: 0.15
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\linear_model\_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge warnings.warn(
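The ConvergenceWarnings above mean the sag/saga solver hit its iteration cap before the coefficients settled, so the tuned logistic-regression score should be read with some caution. Below is a minimal sketch of the usual remedies, more iterations and standardized inputs; the solver and pipeline details are assumptions mirroring the parameter names printed above, not the notebook's exact cell.
# Hedged sketch: raise max_iter and standardize features so SAG/SAGA converges
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
logreg_pipeline = make_pipeline(
    StandardScaler(),  # scaling speeds up SAG/SAGA convergence considerably
    LogisticRegression(solver='saga', penalty='elasticnet',
                       l1_ratio=0.0, C=1.0, max_iter=5000),  # raised cap
)
# logreg_pipeline.fit(X_train, y_train)  # fit as before; warnings should subside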
Multi-label Classification¶
Accounting for Order of Types¶
Preprocessing¶
df = preprocessed_df.copy()
# Combine Type 1 and Type 2 into an ordered tuple, dropping any remaining NaN values
df['Types'] = df[['Type 1', 'Type 2']].apply(lambda x: tuple(filter(lambda y: pd.notna(y), x)), axis=1)
# Convert the tuples to strings so each ordered combination is a single label
df['Types'] = df['Types'].astype(str)
# Print two Pokémon whose types are the same pair in reverse order, to check that the order of types is preserved
print("Sableye: ",df['Types'][326])
print("Spiritomb: ",df['Types'][490])
# drop the Type 1 and Type 2 columns
df.drop(['Type 1', 'Type 2'], axis=1, inplace=True)
# print head
df.head()
Sableye: ('Dark', 'Ghost') Spiritomb: ('Ghost', 'Dark')
Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | Types | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | 0 | ('Grass', 'Poison') |
1 | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | 0 | ('Grass', 'Poison') |
2 | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | 0 | ('Grass', 'Poison') |
3 | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | 0 | ('Grass', 'Poison') |
4 | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | 0 | ('Fire', 'None') |
# Find type combinations that occur only once. Stratified splitting requires
# at least two samples per class, so these singleton classes are handled separately below.
type_counts = df['Types'].value_counts()
singleton_classes = type_counts[type_counts == 1].index.tolist()
To account for order of types, we create binary labels for each type combination.
# Create a binary indicator column for each Pokémon type combination
type_combinations = df['Types'].unique()
for combo in type_combinations:  # avoid shadowing the built-in `type`
    df[combo] = df['Types'].apply(lambda x: 1 if combo in x else 0)
singleton_data = df[df['Types'].isin(singleton_classes)]
other_data = df[~df['Types'].isin(singleton_classes)]
print("Number of singleton classes",len(singleton_classes))
print("Number of unique type combinations",len(df['Types'].unique()))
print(len(df['Types']))
df.head()
C:\Users\thors\AppData\Local\Temp\ipykernel_9004\3351533611.py:4: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()` (this warning was emitted once per inserted column; duplicates omitted)
Number of singleton classes 39 Number of unique type combinations 154 800
Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | Types | ... | ('Rock', 'Dragon') | ('Rock', 'Ice') | ('Fighting', 'Flying') | ('Electric', 'Fairy') | ('Rock', 'Fairy') | ('Ghost', 'Grass') | ('Flying', 'Dragon') | ('Psychic', 'Ghost') | ('Psychic', 'Dark') | ('Fire', 'Water') | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | 0 | ('Grass', 'Poison') | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | 0 | ('Grass', 'Poison') | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | 0 | ('Grass', 'Poison') | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | 0 | ('Grass', 'Poison') | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | 0 | ('Fire', 'None') | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 rows × 164 columns
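The PerformanceWarning above is worth heeding: inserting 154 indicator columns one at a time fragments the DataFrame. Here is a hedged sketch of the pd.concat approach the warning itself recommends; since each 'Types' string either equals a combination string or not, the equality test below should be behaviorally equivalent to the loop above.
# Sketch: build all indicator columns at once, then concatenate in one call
indicators = pd.concat(
    {combo: (df['Types'] == combo).astype(int) for combo in type_combinations},
    axis=1,
)
# df = pd.concat([df, indicators], axis=1)  # instead of one insert per column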
# Drop the raw 'Types' column now that the indicator labels exist
df = df.drop(columns=['Types'])
# Reassign instead of dropping in place: other_data and singleton_data are
# slices of df, and mutating slices can trigger SettingWithCopyWarning
other_data = other_data.drop(columns=['Types'])
singleton_data = singleton_data.drop(columns=['Types'])
Decision Tree¶
from sklearn.metrics import classification_report
# Split the data into training and testing sets, stratified on the label
# combinations; only other_data can be stratified, since stratification
# requires at least two samples per class
X_train, X_test, y_train, y_test = train_test_split(other_data.drop(columns=type_combinations), other_data[type_combinations], test_size=0.2, stratify=other_data[type_combinations], random_state=42)
# Append the singleton classes to both splits. Note that this leaks the
# singleton rows into the test set, which flatters the scores for those classes
X_train = pd.concat([X_train, singleton_data.drop(columns=type_combinations)])
y_train = pd.concat([y_train, singleton_data[type_combinations]])
X_test = pd.concat([X_test, singleton_data.drop(columns=type_combinations)])
y_test = pd.concat([y_test, singleton_data[type_combinations]])
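If one wanted to avoid that leakage, a hedged alternative is to replace the four concat lines above with a version that adds the singletons to the training split only; the trade-off is that singleton classes then have zero test support.
# Sketch: singletons go into training only, so every test row is unseen
X_train = pd.concat([X_train, singleton_data.drop(columns=type_combinations)])
y_train = pd.concat([y_train, singleton_data[type_combinations]])
# X_test / y_test stay as the stratified 20% split of other_data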
# Initialize and train the decision tree classifier; scikit-learn's trees
# handle multi-label targets natively when y is a 2D binary array
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
# Predict labels for the test set
y_pred = model.predict(X_test)
# For multi-label targets, score() is the subset accuracy: a test row only
# counts as correct if every one of the label columns matches
score = model.score(X_test, y_test)
print("Score: ", score)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: ", accuracy)
print(classification_report(y_test, y_pred))
Score: 0.25 Accuracy: 0.25 precision recall f1-score support 0 0.20 0.33 0.25 3 1 0.00 0.00 0.00 6 2 0.00 0.00 0.00 1 3 1.00 1.00 1.00 1 4 0.07 0.08 0.08 12 5 0.00 0.00 0.00 4 6 0.00 0.00 0.00 3 7 0.33 0.33 0.33 3 8 0.17 0.20 0.18 5 9 0.12 0.17 0.14 12 10 0.00 0.00 0.00 3 11 0.29 0.33 0.31 6 12 0.00 0.00 0.00 3 13 0.00 0.00 0.00 0 14 0.00 0.00 0.00 3 15 0.00 0.00 0.00 1 16 0.00 0.00 0.00 1 17 0.00 0.00 0.00 1 18 0.00 0.00 0.00 4 19 1.00 1.00 1.00 1 20 0.33 0.25 0.29 8 21 0.00 0.00 0.00 1 22 0.00 0.00 0.00 1 23 0.00 0.00 0.00 1 24 0.00 0.00 0.00 1 25 0.00 0.00 0.00 1 26 0.00 0.00 0.00 1 27 0.00 0.00 0.00 0 28 0.00 0.00 0.00 1 29 0.00 0.00 0.00 7 30 0.00 0.00 0.00 1 31 0.00 0.00 0.00 0 32 0.00 0.00 0.00 2 33 0.00 0.00 0.00 1 34 0.00 0.00 0.00 1 35 0.00 0.00 0.00 1 36 0.00 0.00 0.00 0 37 0.00 0.00 0.00 1 38 0.00 0.00 0.00 2 39 0.00 0.00 0.00 1 40 0.00 0.00 0.00 1 41 0.00 0.00 0.00 0 42 0.00 0.00 0.00 0 43 0.00 0.00 0.00 1 44 1.00 1.00 1.00 1 45 0.00 0.00 0.00 0 46 0.00 0.00 0.00 2 47 0.00 0.00 0.00 1 48 0.00 0.00 0.00 2 49 0.00 0.00 0.00 2 50 0.00 0.00 0.00 1 51 0.00 0.00 0.00 2 52 0.00 0.00 0.00 0 53 0.00 0.00 0.00 2 54 0.00 0.00 0.00 1 55 0.00 0.00 0.00 0 56 0.00 0.00 0.00 1 57 0.00 0.00 0.00 0 58 0.00 0.00 0.00 0 59 1.00 1.00 1.00 1 60 0.00 0.00 0.00 1 61 0.00 0.00 0.00 1 62 0.50 1.00 0.67 1 63 0.00 0.00 0.00 1 64 0.00 0.00 0.00 0 65 0.00 0.00 0.00 0 66 0.50 1.00 0.67 1 67 1.00 1.00 1.00 1 68 0.50 0.50 0.50 2 69 0.00 0.00 0.00 1 70 0.00 0.00 0.00 1 71 0.50 1.00 0.67 1 72 0.00 0.00 0.00 1 73 0.00 0.00 0.00 0 74 1.00 1.00 1.00 1 75 0.00 0.00 0.00 0 76 0.00 0.00 0.00 1 77 0.00 0.00 0.00 1 78 0.00 0.00 0.00 1 79 0.00 0.00 0.00 1 80 0.00 0.00 0.00 1 81 0.00 0.00 0.00 0 82 1.00 1.00 1.00 1 83 0.00 0.00 0.00 0 84 0.00 0.00 0.00 0 85 0.00 0.00 0.00 0 86 0.00 0.00 0.00 0 87 0.00 0.00 0.00 3 88 0.00 0.00 0.00 1 89 0.00 0.00 0.00 2 90 0.00 0.00 0.00 1 91 1.00 1.00 1.00 1 92 1.00 1.00 1.00 1 93 0.50 1.00 0.67 1 94 1.00 1.00 1.00 1 95 0.00 0.00 0.00 1 96 0.00 0.00 0.00 0 97 0.00 0.00 0.00 0 98 0.00 0.00 0.00 1 99 1.00 1.00 1.00 1 100 0.00 0.00 0.00 1 101 0.00 0.00 0.00 0 102 0.50 1.00 0.67 1 103 0.00 0.00 0.00 0 104 0.00 0.00 0.00 1 105 1.00 1.00 1.00 1 106 0.50 1.00 0.67 1 107 0.00 0.00 0.00 1 108 0.00 0.00 0.00 1 109 0.00 0.00 0.00 1 110 0.00 0.00 0.00 1 111 1.00 1.00 1.00 1 112 1.00 1.00 1.00 1 113 0.00 0.00 0.00 0 114 1.00 1.00 1.00 1 115 0.50 1.00 0.67 1 116 0.00 0.00 0.00 0 117 0.00 0.00 0.00 1 118 0.00 0.00 0.00 0 119 0.00 0.00 0.00 0 120 0.00 0.00 0.00 0 121 0.00 0.00 0.00 0 122 0.00 0.00 0.00 0 123 0.00 0.00 0.00 0 124 0.00 0.00 0.00 1 125 1.00 1.00 1.00 1 126 0.00 0.00 0.00 0 127 0.00 0.00 0.00 0 128 0.00 0.00 0.00 1 129 0.00 0.00 0.00 0 130 1.00 1.00 1.00 1 131 1.00 1.00 1.00 1 132 0.00 0.00 0.00 0 133 0.33 1.00 0.50 1 134 1.00 1.00 1.00 1 135 0.00 0.00 0.00 1 136 1.00 1.00 1.00 1 137 0.00 0.00 0.00 0 138 1.00 1.00 1.00 1 139 0.00 0.00 0.00 1 140 0.00 0.00 0.00 0 141 1.00 1.00 1.00 1 142 0.50 1.00 0.67 1 143 0.00 0.00 0.00 0 144 0.00 0.00 0.00 0 145 0.00 0.00 0.00 1 146 1.00 1.00 1.00 1 147 1.00 1.00 1.00 1 148 0.00 0.00 0.00 1 149 1.00 0.50 0.67 2 150 0.00 0.00 0.00 0 151 1.00 1.00 1.00 1 152 1.00 1.00 1.00 1 153 1.00 1.00 1.00 1 micro avg 0.26 0.25 0.25 192 macro avg 0.22 0.25 0.23 192 weighted avg 0.23 0.25 0.23 192 samples avg 0.25 0.25 0.25 192
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result)) C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Recall is ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result)) C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no true nor predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result)) C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in samples with no predicted labels. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
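Subset accuracy is a harsh yardstick with 154 label columns, since a row only counts as correct when every column matches. A hedged sketch of two gentler multi-label metrics from the standard sklearn.metrics API, for comparison:
# Sketch: complementary multi-label metrics
from sklearn.metrics import hamming_loss, f1_score
# Fraction of individual label entries that are wrong (lower is better)
print("Hamming loss:", hamming_loss(y_test, y_pred))
# Micro-averaged F1 across all label columns; zero_division=0 also silences
# the UndefinedMetricWarnings shown above
print("Micro F1:", f1_score(y_test, y_pred, average='micro', zero_division=0))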
Hyperparameter Tuning¶
pipeline = make_pipeline(StandardScaler(), DecisionTreeClassifier())
# Set up the parameter grid
param_grid = {
    "decisiontreeclassifier__max_depth": [5, 6, 7, 8, 9, 10, 15, 30, None],
    "decisiontreeclassifier__min_samples_leaf": np.arange(1, 10)
}
# Instantiate the GridSearchCV object
grid_search_cv = GridSearchCV(pipeline, param_grid=param_grid, cv=5)
# Fit grid_search_cv on the training data
grid_search_cv.fit(X_train, y_train)
y_pred = grid_search_cv.predict(X_test)
# Print the best score
print("Tuned Model Parameters: {}".format(grid_search_cv.best_params_))
print("Accuracy: {}".format(grid_search_cv.best_estimator_.score(X_test, y_test)))
print(classification_report(y_test, y_pred))
Tuned Model Parameters: {'decisiontreeclassifier__max_depth': 30, 'decisiontreeclassifier__min_samples_leaf': 1} Accuracy: 0.2552083333333333 precision recall f1-score support 0 0.20 0.33 0.25 3 1 0.00 0.00 0.00 6 2 0.00 0.00 0.00 1 3 1.00 1.00 1.00 1 4 0.07 0.08 0.08 12 5 0.00 0.00 0.00 4 6 0.00 0.00 0.00 3 7 0.50 0.33 0.40 3 8 0.25 0.40 0.31 5 9 0.06 0.08 0.07 12 10 0.00 0.00 0.00 3 11 0.17 0.17 0.17 6 12 0.33 0.33 0.33 3 13 0.00 0.00 0.00 0 14 0.00 0.00 0.00 3 15 0.00 0.00 0.00 1 16 0.00 0.00 0.00 1 17 0.00 0.00 0.00 1 18 0.00 0.00 0.00 4 19 1.00 1.00 1.00 1 20 0.33 0.25 0.29 8 21 0.00 0.00 0.00 1 22 0.00 0.00 0.00 1 23 0.00 0.00 0.00 1 24 0.00 0.00 0.00 1 25 0.00 0.00 0.00 1 26 0.00 0.00 0.00 1 27 0.00 0.00 0.00 0 28 0.00 0.00 0.00 1 29 0.00 0.00 0.00 7 30 0.00 0.00 0.00 1 31 0.00 0.00 0.00 0 32 0.00 0.00 0.00 2 33 0.00 0.00 0.00 1 34 0.00 0.00 0.00 1 35 0.00 0.00 0.00 1 36 0.00 0.00 0.00 0 37 0.00 0.00 0.00 1 38 0.00 0.00 0.00 2 39 0.00 0.00 0.00 1 40 0.00 0.00 0.00 1 41 0.00 0.00 0.00 0 42 0.00 0.00 0.00 0 43 0.00 0.00 0.00 1 44 1.00 1.00 1.00 1 45 0.00 0.00 0.00 0 46 0.00 0.00 0.00 2 47 0.00 0.00 0.00 1 48 0.00 0.00 0.00 2 49 0.00 0.00 0.00 2 50 0.00 0.00 0.00 1 51 0.00 0.00 0.00 2 52 0.00 0.00 0.00 0 53 0.00 0.00 0.00 2 54 0.00 0.00 0.00 1 55 0.00 0.00 0.00 0 56 0.00 0.00 0.00 1 57 0.00 0.00 0.00 0 58 0.00 0.00 0.00 0 59 0.50 1.00 0.67 1 60 0.00 0.00 0.00 1 61 0.00 0.00 0.00 1 62 0.50 1.00 0.67 1 63 0.00 0.00 0.00 1 64 0.00 0.00 0.00 0 65 0.00 0.00 0.00 0 66 0.50 1.00 0.67 1 67 1.00 1.00 1.00 1 68 0.00 0.00 0.00 2 69 0.00 0.00 0.00 1 70 0.00 0.00 0.00 1 71 0.50 1.00 0.67 1 72 0.00 0.00 0.00 1 73 0.00 0.00 0.00 0 74 1.00 1.00 1.00 1 75 0.00 0.00 0.00 0 76 0.00 0.00 0.00 1 77 0.00 0.00 0.00 1 78 0.00 0.00 0.00 1 79 0.00 0.00 0.00 1 80 0.00 0.00 0.00 1 81 0.00 0.00 0.00 0 82 1.00 1.00 1.00 1 83 0.00 0.00 0.00 0 84 0.00 0.00 0.00 0 85 0.00 0.00 0.00 0 86 0.00 0.00 0.00 0 87 0.00 0.00 0.00 3 88 0.00 0.00 0.00 1 89 0.00 0.00 0.00 2 90 0.00 0.00 0.00 1 91 0.50 1.00 0.67 1 92 1.00 1.00 1.00 1 93 0.50 1.00 0.67 1 94 1.00 1.00 1.00 1 95 0.00 0.00 0.00 1 96 0.00 0.00 0.00 0 97 0.00 0.00 0.00 0 98 0.00 0.00 0.00 1 99 1.00 1.00 1.00 1 100 0.00 0.00 0.00 1 101 0.00 0.00 0.00 0 102 0.50 1.00 0.67 1 103 0.00 0.00 0.00 0 104 0.00 0.00 0.00 1 105 1.00 1.00 1.00 1 106 1.00 1.00 1.00 1 107 0.00 0.00 0.00 1 108 0.00 0.00 0.00 1 109 0.00 0.00 0.00 1 110 0.00 0.00 0.00 1 111 1.00 1.00 1.00 1 112 1.00 1.00 1.00 1 113 0.00 0.00 0.00 0 114 1.00 1.00 1.00 1 115 0.50 1.00 0.67 1 116 0.00 0.00 0.00 0 117 0.00 0.00 0.00 1 118 0.00 0.00 0.00 0 119 0.00 0.00 0.00 0 120 0.00 0.00 0.00 0 121 0.00 0.00 0.00 0 122 0.00 0.00 0.00 0 123 0.00 0.00 0.00 0 124 1.00 1.00 1.00 1 125 1.00 1.00 1.00 1 126 0.00 0.00 0.00 0 127 0.00 0.00 0.00 0 128 0.00 0.00 0.00 1 129 0.00 0.00 0.00 0 130 1.00 1.00 1.00 1 131 1.00 1.00 1.00 1 132 0.00 0.00 0.00 0 133 1.00 1.00 1.00 1 134 1.00 1.00 1.00 1 135 0.00 0.00 0.00 1 136 1.00 1.00 1.00 1 137 0.00 0.00 0.00 0 138 1.00 1.00 1.00 1 139 0.00 0.00 0.00 1 140 0.00 0.00 0.00 0 141 1.00 1.00 1.00 1 142 0.50 1.00 0.67 1 143 0.00 0.00 0.00 0 144 0.00 0.00 0.00 0 145 0.00 0.00 0.00 1 146 1.00 1.00 1.00 1 147 1.00 1.00 1.00 1 148 0.00 0.00 0.00 1 149 1.00 1.00 1.00 2 150 0.00 0.00 0.00 0 151 1.00 1.00 1.00 1 152 1.00 1.00 1.00 1 153 0.50 1.00 0.67 1 micro avg 0.26 0.26 0.26 192 macro avg 0.23 0.26 0.24 192 weighted avg 0.23 0.26 0.24 192 samples avg 0.26 0.26 0.26 192
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result)) C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Recall is ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result)) C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no true nor predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result)) C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in samples with no predicted labels. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
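The "best" tree found here (max_depth=30, min_samples_leaf=1) is essentially an unrestricted tree, so tuning barely moved the needle. A hedged sketch for eyeballing the whole grid via the standard cv_results_ attribute of a fitted GridSearchCV:
# Sketch: rank all grid points by mean cross-validation score
cv_results = pd.DataFrame(grid_search_cv.cv_results_)
cols = ["param_decisiontreeclassifier__max_depth",
        "param_decisiontreeclassifier__min_samples_leaf",
        "mean_test_score"]
print(cv_results[cols].sort_values("mean_test_score", ascending=False).head(10))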
Random Forest¶
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
# Predict labels for the test set
y_pred = model.predict(X_test)
score = model.score(X_test, y_test)
# Calculate accuracy
print("Score: ", score)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print(classification_report(y_test, y_pred))
Score: 0.203125 Accuracy: 0.203125 precision recall f1-score support 0 0.00 0.00 0.00 3 1 0.00 0.00 0.00 6 2 0.00 0.00 0.00 1 3 1.00 1.00 1.00 1 4 0.00 0.00 0.00 12 5 0.00 0.00 0.00 4 6 0.00 0.00 0.00 3 7 0.00 0.00 0.00 3 8 0.00 0.00 0.00 5 9 0.00 0.00 0.00 12 10 0.00 0.00 0.00 3 11 0.00 0.00 0.00 6 12 0.00 0.00 0.00 3 13 0.00 0.00 0.00 0 14 0.00 0.00 0.00 3 15 0.00 0.00 0.00 1 16 0.00 0.00 0.00 1 17 0.00 0.00 0.00 1 18 0.00 0.00 0.00 4 19 1.00 1.00 1.00 1 20 1.00 0.25 0.40 8 21 0.00 0.00 0.00 1 22 0.00 0.00 0.00 1 23 0.00 0.00 0.00 1 24 0.00 0.00 0.00 1 25 0.00 0.00 0.00 1 26 0.00 0.00 0.00 1 27 0.00 0.00 0.00 0 28 0.00 0.00 0.00 1 29 0.00 0.00 0.00 7 30 0.00 0.00 0.00 1 31 0.00 0.00 0.00 0 32 0.00 0.00 0.00 2 33 0.00 0.00 0.00 1 34 0.00 0.00 0.00 1 35 0.00 0.00 0.00 1 36 0.00 0.00 0.00 0 37 0.00 0.00 0.00 1 38 0.00 0.00 0.00 2 39 0.00 0.00 0.00 1 40 0.00 0.00 0.00 1 41 0.00 0.00 0.00 0 42 0.00 0.00 0.00 0 43 0.00 0.00 0.00 1 44 1.00 1.00 1.00 1 45 0.00 0.00 0.00 0 46 0.00 0.00 0.00 2 47 0.00 0.00 0.00 1 48 0.00 0.00 0.00 2 49 0.00 0.00 0.00 2 50 0.00 0.00 0.00 1 51 0.00 0.00 0.00 2 52 0.00 0.00 0.00 0 53 0.00 0.00 0.00 2 54 0.00 0.00 0.00 1 55 0.00 0.00 0.00 0 56 0.00 0.00 0.00 1 57 0.00 0.00 0.00 0 58 0.00 0.00 0.00 0 59 1.00 1.00 1.00 1 60 0.00 0.00 0.00 1 61 0.00 0.00 0.00 1 62 1.00 1.00 1.00 1 63 0.00 0.00 0.00 1 64 0.00 0.00 0.00 0 65 0.00 0.00 0.00 0 66 1.00 1.00 1.00 1 67 1.00 1.00 1.00 1 68 0.00 0.00 0.00 2 69 0.00 0.00 0.00 1 70 0.00 0.00 0.00 1 71 1.00 1.00 1.00 1 72 0.00 0.00 0.00 1 73 0.00 0.00 0.00 0 74 1.00 1.00 1.00 1 75 0.00 0.00 0.00 0 76 0.00 0.00 0.00 1 77 0.00 0.00 0.00 1 78 0.00 0.00 0.00 1 79 0.00 0.00 0.00 1 80 0.00 0.00 0.00 1 81 0.00 0.00 0.00 0 82 1.00 1.00 1.00 1 83 0.00 0.00 0.00 0 84 0.00 0.00 0.00 0 85 0.00 0.00 0.00 0 86 0.00 0.00 0.00 0 87 0.00 0.00 0.00 3 88 0.00 0.00 0.00 1 89 0.00 0.00 0.00 2 90 0.00 0.00 0.00 1 91 1.00 1.00 1.00 1 92 1.00 1.00 1.00 1 93 1.00 1.00 1.00 1 94 1.00 1.00 1.00 1 95 0.00 0.00 0.00 1 96 0.00 0.00 0.00 0 97 0.00 0.00 0.00 0 98 0.00 0.00 0.00 1 99 1.00 1.00 1.00 1 100 0.00 0.00 0.00 1 101 0.00 0.00 0.00 0 102 1.00 1.00 1.00 1 103 0.00 0.00 0.00 0 104 0.00 0.00 0.00 1 105 1.00 1.00 1.00 1 106 1.00 1.00 1.00 1 107 0.00 0.00 0.00 1 108 0.00 0.00 0.00 1 109 0.00 0.00 0.00 1 110 0.00 0.00 0.00 1 111 1.00 1.00 1.00 1 112 1.00 1.00 1.00 1 113 0.00 0.00 0.00 0 114 1.00 1.00 1.00 1 115 1.00 1.00 1.00 1 116 0.00 0.00 0.00 0 117 0.00 0.00 0.00 1 118 0.00 0.00 0.00 0 119 0.00 0.00 0.00 0 120 0.00 0.00 0.00 0 121 0.00 0.00 0.00 0 122 0.00 0.00 0.00 0 123 0.00 0.00 0.00 0 124 0.00 0.00 0.00 1 125 1.00 1.00 1.00 1 126 0.00 0.00 0.00 0 127 0.00 0.00 0.00 0 128 0.00 0.00 0.00 1 129 0.00 0.00 0.00 0 130 1.00 1.00 1.00 1 131 1.00 1.00 1.00 1 132 0.00 0.00 0.00 0 133 1.00 1.00 1.00 1 134 1.00 1.00 1.00 1 135 0.00 0.00 0.00 1 136 1.00 1.00 1.00 1 137 0.00 0.00 0.00 0 138 1.00 1.00 1.00 1 139 0.00 0.00 0.00 1 140 0.00 0.00 0.00 0 141 1.00 1.00 1.00 1 142 1.00 1.00 1.00 1 143 0.00 0.00 0.00 0 144 0.00 0.00 0.00 0 145 0.00 0.00 0.00 1 146 1.00 1.00 1.00 1 147 1.00 1.00 1.00 1 148 0.00 0.00 0.00 1 149 1.00 0.50 0.67 2 150 0.00 0.00 0.00 0 151 1.00 1.00 1.00 1 152 1.00 1.00 1.00 1 153 1.00 1.00 1.00 1 micro avg 0.95 0.20 0.33 192 macro avg 0.25 0.24 0.24 192 weighted avg 0.24 0.20 0.21 192 samples avg 0.20 0.20 0.20 192
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result)) C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Recall is ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result)) C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no true nor predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result)) C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in samples with no predicted labels. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
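The micro-averaged precision (0.95) versus recall (0.20) hints that the forest predicts very few positive labels but is usually right when it commits. A hedged sketch to check that directly on the binary prediction matrix:
# Sketch: count test rows for which the forest predicted no label at all
empty_rows = (y_pred.sum(axis=1) == 0).sum()
print(f"{empty_rows} of {len(y_pred)} test rows received no predicted label")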
Hyperparameter Tuning¶
pipeline = make_pipeline(StandardScaler(), RandomForestClassifier())
# Setup the parameters and distributions to sample from: param_dist
param_dist = {
"randomforestclassifier__max_depth": [5, 15, 30, None],
"randomforestclassifier__min_samples_leaf": np.arange(1, 10, 3),
"randomforestclassifier__n_estimators": np.arange(60, 140, 20)
}
random_search_cv = RandomizedSearchCV(pipeline, param_distributions=param_dist, n_iter=75, cv=3, random_state=42)
#grid_search_cv = GridSearchCV(pipeline, param_grid=param_dist, cv=3)
# Fit random_search_cv on the training data (note: n_iter=75 exceeds the
# 48-point grid, so the search degenerates to exhaustive; see warning below)
random_search_cv.fit(X_train, y_train)
#grid_search_cv.fit(X_train, y_train)
# Predict with the tuned model; without this line the report below would
# silently reuse the stale predictions of the untuned forest above
y_pred = random_search_cv.predict(X_test)
# Print the best score
print("Best score is {}".format(random_search_cv.best_estimator_.score(X_test, y_test)))
print("Best parameters are {}".format(random_search_cv.best_params_))
print(classification_report(y_test, y_pred))
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\model_selection\_search.py:318: UserWarning: The total space of parameters 48 is smaller than n_iter=75. Running 48 iterations. For exhaustive searches, use GridSearchCV. warnings.warn(
Best score is 0.203125 Best parameters are {'randomforestclassifier__n_estimators': 120, 'randomforestclassifier__min_samples_leaf': 1, 'randomforestclassifier__max_depth': 30} precision recall f1-score support 0 0.00 0.00 0.00 3 1 0.00 0.00 0.00 6 2 0.00 0.00 0.00 1 3 1.00 1.00 1.00 1 4 0.00 0.00 0.00 12 5 0.00 0.00 0.00 4 6 0.00 0.00 0.00 3 7 0.00 0.00 0.00 3 8 0.00 0.00 0.00 5 9 0.00 0.00 0.00 12 10 0.00 0.00 0.00 3 11 0.00 0.00 0.00 6 12 0.00 0.00 0.00 3 13 0.00 0.00 0.00 0 14 0.00 0.00 0.00 3 15 0.00 0.00 0.00 1 16 0.00 0.00 0.00 1 17 0.00 0.00 0.00 1 18 0.00 0.00 0.00 4 19 1.00 1.00 1.00 1 20 1.00 0.25 0.40 8 21 0.00 0.00 0.00 1 22 0.00 0.00 0.00 1 23 0.00 0.00 0.00 1 24 0.00 0.00 0.00 1 25 0.00 0.00 0.00 1 26 0.00 0.00 0.00 1 27 0.00 0.00 0.00 0 28 0.00 0.00 0.00 1 29 0.00 0.00 0.00 7 30 0.00 0.00 0.00 1 31 0.00 0.00 0.00 0 32 0.00 0.00 0.00 2 33 0.00 0.00 0.00 1 34 0.00 0.00 0.00 1 35 0.00 0.00 0.00 1 36 0.00 0.00 0.00 0 37 0.00 0.00 0.00 1 38 0.00 0.00 0.00 2 39 0.00 0.00 0.00 1 40 0.00 0.00 0.00 1 41 0.00 0.00 0.00 0 42 0.00 0.00 0.00 0 43 0.00 0.00 0.00 1 44 1.00 1.00 1.00 1 45 0.00 0.00 0.00 0 46 0.00 0.00 0.00 2 47 0.00 0.00 0.00 1 48 0.00 0.00 0.00 2 49 0.00 0.00 0.00 2 50 0.00 0.00 0.00 1 51 0.00 0.00 0.00 2 52 0.00 0.00 0.00 0 53 0.00 0.00 0.00 2 54 0.00 0.00 0.00 1 55 0.00 0.00 0.00 0 56 0.00 0.00 0.00 1 57 0.00 0.00 0.00 0 58 0.00 0.00 0.00 0 59 1.00 1.00 1.00 1 60 0.00 0.00 0.00 1 61 0.00 0.00 0.00 1 62 1.00 1.00 1.00 1 63 0.00 0.00 0.00 1 64 0.00 0.00 0.00 0 65 0.00 0.00 0.00 0 66 1.00 1.00 1.00 1 67 1.00 1.00 1.00 1 68 0.00 0.00 0.00 2 69 0.00 0.00 0.00 1 70 0.00 0.00 0.00 1 71 1.00 1.00 1.00 1 72 0.00 0.00 0.00 1 73 0.00 0.00 0.00 0 74 1.00 1.00 1.00 1 75 0.00 0.00 0.00 0 76 0.00 0.00 0.00 1 77 0.00 0.00 0.00 1 78 0.00 0.00 0.00 1 79 0.00 0.00 0.00 1 80 0.00 0.00 0.00 1 81 0.00 0.00 0.00 0 82 1.00 1.00 1.00 1 83 0.00 0.00 0.00 0 84 0.00 0.00 0.00 0 85 0.00 0.00 0.00 0 86 0.00 0.00 0.00 0 87 0.00 0.00 0.00 3 88 0.00 0.00 0.00 1 89 0.00 0.00 0.00 2 90 0.00 0.00 0.00 1 91 1.00 1.00 1.00 1 92 1.00 1.00 1.00 1 93 1.00 1.00 1.00 1 94 1.00 1.00 1.00 1 95 0.00 0.00 0.00 1 96 0.00 0.00 0.00 0 97 0.00 0.00 0.00 0 98 0.00 0.00 0.00 1 99 1.00 1.00 1.00 1 100 0.00 0.00 0.00 1 101 0.00 0.00 0.00 0 102 1.00 1.00 1.00 1 103 0.00 0.00 0.00 0 104 0.00 0.00 0.00 1 105 1.00 1.00 1.00 1 106 1.00 1.00 1.00 1 107 0.00 0.00 0.00 1 108 0.00 0.00 0.00 1 109 0.00 0.00 0.00 1 110 0.00 0.00 0.00 1 111 1.00 1.00 1.00 1 112 1.00 1.00 1.00 1 113 0.00 0.00 0.00 0 114 1.00 1.00 1.00 1 115 1.00 1.00 1.00 1 116 0.00 0.00 0.00 0 117 0.00 0.00 0.00 1 118 0.00 0.00 0.00 0 119 0.00 0.00 0.00 0 120 0.00 0.00 0.00 0 121 0.00 0.00 0.00 0 122 0.00 0.00 0.00 0 123 0.00 0.00 0.00 0 124 0.00 0.00 0.00 1 125 1.00 1.00 1.00 1 126 0.00 0.00 0.00 0 127 0.00 0.00 0.00 0 128 0.00 0.00 0.00 1 129 0.00 0.00 0.00 0 130 1.00 1.00 1.00 1 131 1.00 1.00 1.00 1 132 0.00 0.00 0.00 0 133 1.00 1.00 1.00 1 134 1.00 1.00 1.00 1 135 0.00 0.00 0.00 1 136 1.00 1.00 1.00 1 137 0.00 0.00 0.00 0 138 1.00 1.00 1.00 1 139 0.00 0.00 0.00 1 140 0.00 0.00 0.00 0 141 1.00 1.00 1.00 1 142 1.00 1.00 1.00 1 143 0.00 0.00 0.00 0 144 0.00 0.00 0.00 0 145 0.00 0.00 0.00 1 146 1.00 1.00 1.00 1 147 1.00 1.00 1.00 1 148 0.00 0.00 0.00 1 149 1.00 0.50 0.67 2 150 0.00 0.00 0.00 0 151 1.00 1.00 1.00 1 152 1.00 1.00 1.00 1 153 1.00 1.00 1.00 1 micro avg 0.95 0.20 0.33 192 macro avg 0.25 0.24 0.24 192 weighted avg 0.24 0.20 0.21 192 samples avg 0.20 0.20 0.20 192
C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result)) C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Recall is ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result)) C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no true nor predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result)) C:\Users\thors\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\metrics\_classification.py:1497: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in samples with no predicted labels. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
K-Nearest Neighbors¶
from sklearn.neighbors import KNeighborsClassifier
# Note: this KNN is fit on unscaled features; being distance-based, it is
# dominated by large-range columns such as Total, which explains the
# near-zero score below
model = KNeighborsClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Calculate accuracy
print("Score: ", model.score(X_test, y_test))
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print(classification_report(y_test, y_pred))
Score: 0.010416666666666666 Accuracy: 0.010416666666666666 precision recall f1-score support 0 0.00 0.00 0.00 3 1 0.00 0.00 0.00 6 2 0.00 0.00 0.00 1 3 0.00 0.00 0.00 1 4 0.00 0.00 0.00 12 5 0.00 0.00 0.00 4 6 0.00 0.00 0.00 3 7 0.00 0.00 0.00 3 8 1.00 0.20 0.33 5 9 0.00 0.00 0.00 12 10 0.00 0.00 0.00 3 11 0.00 0.00 0.00 6 12 0.00 0.00 0.00 3 13 0.00 0.00 0.00 0 14 0.00 0.00 0.00 3 15 0.00 0.00 0.00 1 16 0.00 0.00 0.00 1 17 0.00 0.00 0.00 1 18 0.00 0.00 0.00 4 19 0.00 0.00 0.00 1 20 0.00 0.00 0.00 8 21 0.00 0.00 0.00 1 22 0.00 0.00 0.00 1 23 0.00 0.00 0.00 1 24 0.00 0.00 0.00 1 25 0.00 0.00 0.00 1 26 0.00 0.00 0.00 1 27 0.00 0.00 0.00 0 28 0.00 0.00 0.00 1 29 0.00 0.00 0.00 7 30 0.00 0.00 0.00 1 31 0.00 0.00 0.00 0 32 0.00 0.00 0.00 2 33 0.00 0.00 0.00 1 34 0.00 0.00 0.00 1 35 0.00 0.00 0.00 1 36 0.00 0.00 0.00 0 37 0.00 0.00 0.00 1 38 0.00 0.00 0.00 2 39 0.00 0.00 0.00 1 40 0.00 0.00 0.00 1 41 0.00 0.00 0.00 0 42 0.00 0.00 0.00 0 43 0.00 0.00 0.00 1 44 0.00 0.00 0.00 1 45 0.00 0.00 0.00 0 46 0.00 0.00 0.00 2 47 0.00 0.00 0.00 1 48 0.00 0.00 0.00 2 49 0.00 0.00 0.00 2 50 0.00 0.00 0.00 1 51 0.00 0.00 0.00 2 52 0.00 0.00 0.00 0 53 0.00 0.00 0.00 2 54 0.00 0.00 0.00 1 55 0.00 0.00 0.00 0 56 0.00 0.00 0.00 1 57 0.00 0.00 0.00 0 58 0.00 0.00 0.00 0 59 0.00 0.00 0.00 1 60 0.00 0.00 0.00 1 61 0.00 0.00 0.00 1 62 0.00 0.00 0.00 1 63 0.00 0.00 0.00 1 64 0.00 0.00 0.00 0 65 0.00 0.00 0.00 0 66 0.00 0.00 0.00 1 67 0.00 0.00 0.00 1 68 0.00 0.00 0.00 2 69 0.00 0.00 0.00 1 70 0.00 0.00 0.00 1 71 0.00 0.00 0.00 1 72 0.00 0.00 0.00 1 73 0.00 0.00 0.00 0 74 0.00 0.00 0.00 1 75 0.00 0.00 0.00 0 76 0.00 0.00 0.00 1 77 0.00 0.00 0.00 1 78 0.00 0.00 0.00 1 79 0.00 0.00 0.00 1 80 0.00 0.00 0.00 1 81 0.00 0.00 0.00 0 82 0.00 0.00 0.00 1 83 0.00 0.00 0.00 0 84 0.00 0.00 0.00 0 85 0.00 0.00 0.00 0 86 0.00 0.00 0.00 0 87 0.00 0.00 0.00 3 88 0.00 0.00 0.00 1 89 0.00 0.00 0.00 2 90 0.00 0.00 0.00 1 91 0.00 0.00 0.00 1 92 0.00 0.00 0.00 1 93 0.00 0.00 0.00 1 94 0.00 0.00 0.00 1 95 0.00 0.00 0.00 1 96 0.00 0.00 0.00 0 97 0.00 0.00 0.00 0 98 0.00 0.00 0.00 1 99 0.00 0.00 0.00 1 100 0.00 0.00 0.00 1 101 0.00 0.00 0.00 0 102 0.00 0.00 0.00 1 103 0.00 0.00 0.00 0 104 0.00 0.00 0.00 1 105 0.00 0.00 0.00 1 106 0.00 0.00 0.00 1 107 0.00 0.00 0.00 1 108 0.00 0.00 0.00 1 109 0.00 0.00 0.00 1 110 0.00 0.00 0.00 1 111 0.00 0.00 0.00 1 112 0.00 0.00 0.00 1 113 0.00 0.00 0.00 0 114 0.00 0.00 0.00 1 115 0.00 0.00 0.00 1 116 0.00 0.00 0.00 0 117 0.00 0.00 0.00 1 118 0.00 0.00 0.00 0 119 0.00 0.00 0.00 0 120 0.00 0.00 0.00 0 121 0.00 0.00 0.00 0 122 0.00 0.00 0.00 0 123 0.00 0.00 0.00 0 124 0.00 0.00 0.00 1 125 0.00 0.00 0.00 1 126 0.00 0.00 0.00 0 127 0.00 0.00 0.00 0 128 0.00 0.00 0.00 1 129 0.00 0.00 0.00 0 130 0.00 0.00 0.00 1 131 0.00 0.00 0.00 1 132 0.00 0.00 0.00 0 133 0.00 0.00 0.00 1 134 0.00 0.00 0.00 1 135 0.00 0.00 0.00 1 136 0.00 0.00 0.00 1 137 0.00 0.00 0.00 0 138 0.00 0.00 0.00 1 139 0.00 0.00 0.00 1 140 0.00 0.00 0.00 0 141 0.00 0.00 0.00 1 142 0.00 0.00 0.00 1 143 0.00 0.00 0.00 0 144 0.00 0.00 0.00 0 145 0.00 0.00 0.00 1 146 0.00 0.00 0.00 1 147 0.00 0.00 0.00 1 148 0.00 0.00 0.00 1 149 1.00 0.50 0.67 2 150 0.00 0.00 0.00 0 151 0.00 0.00 0.00 1 152 0.00 0.00 0.00 1 153 0.00 0.00 0.00 1 micro avg 0.40 0.01 0.02 192 macro avg 0.01 0.00 0.01 192 weighted avg 0.04 0.01 0.02 192 samples avg 0.01 0.01 0.01 192
(UndefinedMetricWarning, repeated: precision, recall and F-score are ill-defined and set to 0.0 for labels/samples with no predicted or no true samples; use the `zero_division` parameter to control this behavior.)
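As the warning text itself suggests, passing `zero_division` explicitly makes those 0.0 entries intentional and silences the messages — a minimal sketch reusing the same `y_test` and `y_pred` as above:

# Sketch: report 0.0 for ill-defined labels without emitting warnings
print(classification_report(y_test, y_pred, zero_division=0))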
Hyperparameter Tuning¶
param_grid = {
'kneighborsclassifier__n_neighbors': [3, 5, 7, 9] # List of k values to try
}
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())
grid_search = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)
best_params = grid_search.best_params_
best_score = grid_search.best_estimator_.score(X_test, y_test)
# Re-predict with the tuned model so the report reflects it
y_pred = grid_search.predict(X_test)
print("Best Parameters:", best_params)
print("Best Score:", best_score)
print(classification_report(y_test, y_pred))
Best Parameters: {'kneighborsclassifier__n_neighbors': 3}
Best Score: 0.036458333333333336

(classification report identical to the previous one — 154 labels, support 192, only labels 8 and 149 non-zero; micro avg 0.40 / 0.01 / 0.02, macro avg 0.01 / 0.00 / 0.01, weighted avg 0.04 / 0.01 / 0.02, samples avg 0.01 / 0.01 / 0.01.)
Ignoring Order of Types 1¶
The first way to ignore the order of the two types is to create a binary label for each individual type and match each Pokémon with the one or two labels it carries.
Preprocessing¶
df = preprocessed_df.copy()
# Combine Type 1 and Type 2 into a single column
df['Types'] = df[['Type 1', 'Type 2']].apply(lambda x: tuple(filter(lambda y: pd.notna(y), x)), axis=1)
print(df['Types'][0])
# Get unique Pokémon types
unique_types = np.unique(df['Types'].explode())
df.drop(['Type 1', 'Type 2'], axis=1, inplace=True)
df.head()
('Grass', 'Poison')
Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | Types | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | 0 | (Grass, Poison) |
1 | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | 0 | (Grass, Poison) |
2 | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | 0 | (Grass, Poison) |
3 | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | 0 | (Grass, Poison) |
4 | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | 0 | (Fire, None) |
For each unique type we create a binary label: it is 1 if the Pokémon has that type (i.e. the type appears in its type-combination tuple) and 0 if it doesn't.
# Create a binary label column for each Pokémon type
for poke_type in unique_types:  # use `poke_type`, not `type`, to avoid shadowing the built-in
    df[poke_type] = df['Types'].apply(lambda x: 1 if poke_type in x else 0)
df.head()
Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | Types | ... | Grass | Ground | Ice | None | Normal | Poison | Psychic | Rock | Steel | Water | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | 0 | (Grass, Poison) | ... | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
1 | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | 0 | (Grass, Poison) | ... | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
2 | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | 0 | (Grass, Poison) | ... | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
3 | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | 0 | (Grass, Poison) | ... | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
4 | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | 0 | (Fire, None) | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
5 rows × 29 columns
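For reference, scikit-learn ships a helper that builds the same 0/1 indicator matrix in one call — a minimal sketch, assuming the tuple-valued `Types` column constructed above (the `type_labels` name is ours):

from sklearn.preprocessing import MultiLabelBinarizer
# Sketch: one indicator column per type, equivalent to the loop above
mlb = MultiLabelBinarizer()
type_labels = pd.DataFrame(mlb.fit_transform(df['Types']), columns=mlb.classes_, index=df.index)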
# Some type combinations occur only once; we separate those rows out so the
# rest can be split with stratification, and later add them to both splits
type_counts = df['Types'].value_counts()
singleton_classes = type_counts[type_counts == 1].index.tolist()
singleton_data = df[df['Types'].isin(singleton_classes)].copy()
other_data = df[~df['Types'].isin(singleton_classes)].copy()
df = df.drop(columns=['Types'])
other_data.drop(columns=['Types'], inplace=True)
singleton_data.drop(columns=['Types'], inplace=True)
Decision Tree¶
# Split the non-singleton data into training and testing sets, stratified on the type labels
X_train, X_test, y_train, y_test = train_test_split(
    other_data.drop(columns=unique_types), other_data[unique_types],
    test_size=0.2, stratify=other_data[unique_types], random_state=42)
# Append the singleton rows to both splits so every label is represented in each
X_train = pd.concat([X_train, singleton_data.drop(columns=unique_types)])
y_train = pd.concat([y_train, singleton_data[unique_types]])
X_test = pd.concat([X_test, singleton_data.drop(columns=unique_types)])
y_test = pd.concat([y_test, singleton_data[unique_types]])
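Because stratification on rare labels is fragile, it is worth verifying that adding the singleton rows really gave every label at least one training example — a minimal sanity-check sketch:

# Sketch: every type label should occur at least once in y_train
assert (y_train.sum(axis=0) > 0).all(), "a type label has no training examples"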
# Initialize and train the decision tree classifier
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
# Predict labels for the test set
y_pred = model.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Score: ", model.score(X_test, y_test))
print("Accuracy:", accuracy)
print(classification_report(y_test, y_pred))
Score:  0.25
Accuracy: 0.25

              precision    recall  f1-score   support

           0       0.47      0.47      0.47        17
           1       0.56      0.42      0.48        12
           2       0.44      0.47      0.46        17
           3       0.65      0.65      0.65        17
           4       0.38      0.30      0.33        10
           5       0.40      0.31      0.35        13
           6       0.53      0.45      0.49        20
           7       0.27      0.35      0.30        23
           8       0.50      0.50      0.50        12
           9       0.24      0.27      0.26        22
          10       0.43      0.50      0.46        18
          11       0.17      0.22      0.19         9
          12       0.57      0.49      0.53        80
          13       0.27      0.33      0.30        21
          14       0.23      0.19      0.21        16
          15       0.44      0.40      0.42        20
          16       0.60      0.50      0.55        12
          17       0.56      0.60      0.58        15
          18       0.29      0.30      0.30        30

   micro avg       0.42      0.42      0.42       384
   macro avg       0.42      0.41      0.41       384
weighted avg       0.44      0.42      0.42       384
 samples avg       0.43      0.42      0.42       384
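Note that for multi-label targets `model.score` and `accuracy_score` both measure exact-match (subset) accuracy — a prediction only counts if every one of the 19 labels is correct — which is why they are far below the per-label averages in the report. A sketch of two gentler multi-label metrics, using the same `y_test`/`y_pred`:

from sklearn.metrics import f1_score, hamming_loss
# Micro-averaged F1 pools label-wise hits across all samples
print("micro F1:", f1_score(y_test, y_pred, average='micro', zero_division=0))
# Hamming loss: fraction of individual label assignments that are wrong
print("hamming loss:", hamming_loss(y_test, y_pred))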
Hyperparameter Tuning¶
pipeline = make_pipeline(StandardScaler(), DecisionTreeClassifier())
param_dist = {
"decisiontreeclassifier__max_depth": np.arange(10, 20),
"decisiontreeclassifier__min_samples_leaf": np.arange(1, 10)
}
# Instantiate the GridSearchCV object
grid_search_cv = GridSearchCV(pipeline, param_grid=param_dist, cv=5)
# Fit grid_search_cv using the data X and labels y.
grid_search_cv.fit(X_train, y_train)
y_pred = grid_search_cv.predict(X_test)
# Print the best score
print("Tuned Model Parameters: {}".format(grid_search_cv.best_params_))
print("Accuracy: {}".format(grid_search_cv.best_estimator_.score(X_test, y_test)))
print(classification_report(y_test, y_pred))
Tuned Model Parameters: {'decisiontreeclassifier__max_depth': 16, 'decisiontreeclassifier__min_samples_leaf': 1}
Accuracy: 0.265625

              precision    recall  f1-score   support

           0       0.42      0.47      0.44        17
           1       0.50      0.50      0.50        12
           2       0.38      0.47      0.42        17
           3       0.52      0.65      0.58        17
           4       0.30      0.30      0.30        10
           5       0.50      0.38      0.43        13
           6       0.67      0.40      0.50        20
           7       0.38      0.35      0.36        23
           8       0.60      0.50      0.55        12
           9       0.29      0.27      0.28        22
          10       0.45      0.50      0.47        18
          11       0.17      0.22      0.19         9
          12       0.61      0.57      0.59        80
          13       0.27      0.29      0.28        21
          14       0.27      0.25      0.26        16
          15       0.50      0.40      0.44        20
          16       0.45      0.42      0.43        12
          17       0.62      0.53      0.57        15
          18       0.31      0.37      0.34        30

   micro avg       0.45      0.44      0.44       384
   macro avg       0.43      0.41      0.42       384
weighted avg       0.46      0.44      0.44       384
 samples avg       0.45      0.44      0.44       384
Random Forest¶
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
# Predict labels for the test set
y_pred = model.predict(X_test)
# Calculate accuracy
print("Score: ", model.score(X_test, y_test))
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print(classification_report(y_test, y_pred))
Score:  0.20833333333333334
Accuracy: 0.20833333333333334

              precision    recall  f1-score   support

           0       1.00      0.29      0.45        17
           1       1.00      0.25      0.40        12
           2       0.69      0.53      0.60        17
           3       1.00      0.59      0.74        17
           4       1.00      0.20      0.33        10
           5       1.00      0.31      0.47        13
           6       1.00      0.45      0.62        20
           7       0.50      0.09      0.15        23
           8       0.86      0.50      0.63        12
           9       0.67      0.18      0.29        22
          10       1.00      0.28      0.43        18
          11       1.00      0.11      0.20         9
          12       0.65      0.49      0.56        80
          13       0.43      0.14      0.21        21
          14       0.75      0.19      0.30        16
          15       0.83      0.25      0.38        20
          16       0.80      0.33      0.47        12
          17       0.86      0.40      0.55        15
          18       1.00      0.17      0.29        30

   micro avg       0.77      0.33      0.46       384
   macro avg       0.84      0.30      0.43       384
weighted avg       0.80      0.33      0.44       384
 samples avg       0.44      0.33      0.36       384
Hyperparameter Tuning¶
from sklearn.model_selection import RandomizedSearchCV
# Build the pipeline and define the parameter distributions to sample from
pipeline = make_pipeline(StandardScaler(), RandomForestClassifier())
param_dist = {
    "randomforestclassifier__max_depth": np.arange(10, 20),
    "randomforestclassifier__min_samples_leaf": np.arange(1, 10, 4),
    "randomforestclassifier__n_estimators": np.arange(60, 140, 5)
}
# Randomized search evaluates n_iter sampled settings instead of the full grid
random_search_cv = RandomizedSearchCV(pipeline, param_distributions=param_dist, n_iter=50, cv=3, random_state=42)
random_search_cv.fit(X_train, y_train)
# Re-predict with the tuned model so the report reflects it
y_pred = random_search_cv.predict(X_test)
print("Best score is {}".format(random_search_cv.best_estimator_.score(X_test, y_test)))
print("Best parameters are {}".format(random_search_cv.best_params_))
print(classification_report(y_test, y_pred))
Best score is 0.203125
Best parameters are {'randomforestclassifier__n_estimators': 130, 'randomforestclassifier__min_samples_leaf': 1, 'randomforestclassifier__max_depth': 16}

(classification report identical to the untuned random forest report above; micro avg 0.77 / 0.33 / 0.46, macro avg 0.84 / 0.30 / 0.43, weighted avg 0.80 / 0.33 / 0.44, samples avg 0.44 / 0.33 / 0.36, support 384.)
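`RandomizedSearchCV` also accepts scipy distributions in place of explicit value arrays, which samples the range more freely — a minimal sketch (the `param_dist_alt` name and the ranges are illustrative, not tuned):

from scipy.stats import randint
# Sketch: sample integer hyperparameters from distributions instead of fixed arrays
param_dist_alt = {
    "randomforestclassifier__max_depth": randint(10, 20),
    "randomforestclassifier__n_estimators": randint(60, 140)
}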
K Nearest Neighbors¶
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Calculate accuracy
print("Score: ", model.score(X_test, y_test))
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print(classification_report(y_test, y_pred))
Score:  0.015625
Accuracy: 0.015625

              precision    recall  f1-score   support

           0       0.67      0.12      0.20        17
           1       0.00      0.00      0.00        12
           2       0.36      0.24      0.29        17
           3       0.83      0.29      0.43        17
           4       0.00      0.00      0.00        10
           5       0.00      0.00      0.00        13
           6       0.33      0.10      0.15        20
           7       0.25      0.09      0.13        23
           8       0.33      0.08      0.13        12
           9       0.33      0.05      0.08        22
          10       0.33      0.11      0.17        18
          11       0.00      0.00      0.00         9
          12       0.56      0.50      0.53        80
          13       0.31      0.19      0.24        21
          14       0.00      0.00      0.00        16
          15       0.33      0.15      0.21        20
          16       0.67      0.17      0.27        12
          17       0.33      0.07      0.11        15
          18       0.17      0.03      0.06        30

   micro avg       0.46      0.18      0.26       384
   macro avg       0.31      0.11      0.16       384
weighted avg       0.36      0.18      0.22       384
 samples avg       0.31      0.18      0.22       384
Hyperparameter Tuning¶
param_grid = {
'kneighborsclassifier__n_neighbors': [3, 5, 7, 9] # List of k values to try
}
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())
grid_search = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)
best_params = grid_search.best_params_
best_score = grid_search.best_estimator_.score(X_test, y_test)
# Re-predict with the tuned model so the report reflects it
y_pred = grid_search.predict(X_test)
print("Best Parameters:", best_params)
print("Best Score:", best_score)
print(classification_report(y_test, y_pred))
Best Parameters: {'kneighborsclassifier__n_neighbors': 3}
Best Score: 0.046875

(classification report identical to the untuned KNN report above; micro avg 0.46 / 0.18 / 0.26, macro avg 0.31 / 0.11 / 0.16, weighted avg 0.36 / 0.18 / 0.22, samples avg 0.31 / 0.18 / 0.22, support 384.)
Ignoring Order of Types 2¶
The second way to ignore the order of types is to create a binary label for each sorted type combination.
Preprocessing¶
We again combine the Type 1 and Type 2 columns into a single column, this time containing the sorted type combination, so that the same pair of types in either order maps to the same label.
df = preprocessed_df.copy()
# Sort each type pair so the same combination in either order yields one label
df['Types'] = df[['Type 1', 'Type 2']].apply(lambda x: sorted(filter(pd.notna, x)), axis=1)
df['Types'] = df['Types'].astype(str)
# Print two Pokémon whose Type 1 and Type 2 are the same pair in reverse order,
# to check that sorting removed the dependence on order
print("Sableye: ", df['Types'][326])
print("Spiritomb: ", df['Types'][490])
# drop the Type 1 and Type 2 columns
df.drop(['Type 1', 'Type 2'], axis=1, inplace=True)
# print head
df.head()
Sableye:  ['Dark', 'Ghost']
Spiritomb:  ['Dark', 'Ghost']
Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | Types | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | 0 | ['Grass', 'Poison'] |
1 | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | 0 | ['Grass', 'Poison'] |
2 | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | 0 | ['Grass', 'Poison'] |
3 | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | 0 | ['Grass', 'Poison'] |
4 | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | 0 | ['Fire', 'None'] |
# Find type combinations that occur only once
singleton_classes = df['Types'].value_counts()[df['Types'].value_counts() == 1].index.tolist()
Make binary labels for each sorted type combination
# Create a binary label column for each Pokémon type combination
unique_type_combinations = df['Types'].unique()
for type_combination in unique_type_combinations:
    # compare against the current combination (not the stale `type` variable
    # left over from the earlier loop, which set every indicator incorrectly)
    df[type_combination] = df['Types'].apply(lambda x: 1 if x == type_combination else 0)
singleton_data = df[df['Types'].isin(singleton_classes)].copy()
other_data = df[~df['Types'].isin(singleton_classes)].copy()
print("Number of singleton classes",len(singleton_classes))
print("Number of unique type combinations",len(df['Types'].unique()))
print(len(df['Types']))
df.head()
Number of singleton classes 24
Number of unique type combinations 133
800
(PerformanceWarning, repeated once per inserted column: DataFrame is highly fragmented; this is usually the result of calling `frame.insert` many times. Consider joining all columns at once using pd.concat(axis=1), or de-fragment with `newframe = frame.copy()`.)
Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | Types | ... | ['Dragon', 'Poison'] | ['Electric', 'Normal'] | ['Dragon', 'Rock'] | ['Ice', 'Rock'] | ['Fighting', 'Flying'] | ['Electric', 'Fairy'] | ['Fairy', 'Rock'] | ['Ghost', 'Grass'] | ['Ghost', 'Psychic'] | ['Fire', 'Water'] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | 0 | ['Grass', 'Poison'] | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | 0 | ['Grass', 'Poison'] | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | 0 | ['Grass', 'Poison'] | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | 0 | ['Grass', 'Poison'] | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | 0 | ['Fire', 'None'] | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 rows × 143 columns
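The PerformanceWarning above comes from inserting 133 indicator columns one at a time. `pd.get_dummies` builds the same one-hot columns in a single vectorized call — a minimal sketch over the string-valued `Types` column (the `combo_labels` name is ours):

# Sketch: one 0/1 indicator column per unique combination string, in one call
combo_labels = pd.get_dummies(df['Types'])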
# Drop the 'Types' column now that the combination labels exist
df = df.drop(columns=['Types'])
other_data = other_data.drop(columns=['Types'])
singleton_data = singleton_data.drop(columns=['Types'])
Decision Tree¶
# Split the non-singleton data into training and testing sets, stratified on the combination labels
X_train, X_test, y_train, y_test = train_test_split(
    other_data.drop(columns=unique_type_combinations), other_data[unique_type_combinations],
    test_size=0.2, stratify=other_data[unique_type_combinations], random_state=42)
X_train = pd.concat([X_train, singleton_data.drop(columns=unique_type_combinations)])
y_train = pd.concat([y_train, singleton_data[unique_type_combinations]])
X_test = pd.concat([X_test, singleton_data.drop(columns=unique_type_combinations)])
y_test = pd.concat([y_test, singleton_data[unique_type_combinations]])
# Initialize and train the decision tree classifier
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
# Predict labels for the test set
y_pred = model.predict(X_test)
# Calculate accuracy
print("Score: ", model.score(X_test, y_test))
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print(classification_report(y_test, y_pred))
Score:  0.7611111111111111
Accuracy: 0.7611111111111111

(classification report over the 133 combination labels, total support 3857: every per-label row in the export is identical — precision 0.31, recall 0.38, f1-score 0.34, support 29 — as are the micro/macro/weighted averages (0.31 / 0.38 / 0.34); samples avg 0.06 / 0.06 / 0.06.)
Hyperparameter Tuning¶
pipeline = make_pipeline(StandardScaler(), DecisionTreeClassifier())
param_dist = {
"decisiontreeclassifier__max_depth": [5, 6, 7, 8, 9, 10, 15, 30, None],
"decisiontreeclassifier__min_samples_leaf": np.arange(1, 10)
}
# Instantiate the GridSearchCV object
grid_search_cv = GridSearchCV(pipeline, param_grid=param_dist, cv=5)
# Fit grid_search_cv using the data X and labels y.
grid_search_cv.fit(X_train, y_train)
y_pred = grid_search_cv.predict(X_test)
# Print the best score
print("Tuned Model Parameters: {}".format(grid_search_cv.best_params_))
print("Accuracy: {}".format(grid_search_cv.best_estimator_.score(X_test, y_test)))
print(classification_report(y_test, y_pred))
Tuned Model Parameters: {'decisiontreeclassifier__max_depth': 6, 'decisiontreeclassifier__min_samples_leaf': 9}
Accuracy: 0.8166666666666667

(classification report: all 133 per-label rows identical — precision 0.17, recall 0.03, f1-score 0.06, support 29; micro/macro/weighted avg 0.17 / 0.03 / 0.06, samples avg 0.01 / 0.01 / 0.01; total support 3857.)
Random Forest¶
# Initialize and train the random forest classifier
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
# Predict labels for the test set
y_pred = model.predict(X_test)
# Calculate accuracy
print("Score: ", model.score(X_test,y_test))
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print(classification_report(y_test, y_pred))
Score:  0.8555555555555555
Accuracy: 0.8555555555555555

(classification report: all 133 per-label rows identical — precision 0.71, recall 0.17, f1-score 0.28, support 29; micro/macro/weighted avg 0.71 / 0.17 / 0.28, samples avg 0.03 / 0.03 / 0.03; total support 3857.)
Hyperparameter Tuning¶
pipeline = make_pipeline(StandardScaler(), RandomForestClassifier())
# Setup the parameters and distributions to sample from: param_dist
param_dist = {
"randomforestclassifier__max_depth": np.arange(5, 21),
"randomforestclassifier__min_samples_leaf": np.arange(1, 10),
"randomforestclassifier__n_estimators": np.arange(50, 150, 5)
}
# Instantiate the RandomizedSearchCV object
random_search_cv = RandomizedSearchCV(pipeline, param_distributions=param_dist, n_iter=50, cv=3, random_state=42)
# Fit random_search_cv on the training data
random_search_cv.fit(X_train, y_train)
# Re-predict with the tuned model so the report reflects it
y_pred = random_search_cv.predict(X_test)
# Print the best score and parameters
print("Best score is {}".format(random_search_cv.best_estimator_.score(X_test, y_test)))
print("Best parameters are {}".format(random_search_cv.best_params_))
print(classification_report(y_test, y_pred))
Best score is 0.8388888888888889
Best parameters are {'randomforestclassifier__n_estimators': 55, 'randomforestclassifier__min_samples_leaf': 4, 'randomforestclassifier__max_depth': 12}

(classification report identical to the untuned random forest report above: all 133 per-label rows precision 0.71, recall 0.17, f1-score 0.28, support 29; samples avg 0.03 / 0.03 / 0.03; total support 3857.)
(UndefinedMetricWarning: precision, recall and F-score are ill-defined and set to 0.0 for samples with no predicted or no true labels; the `zero_division` parameter controls this behavior.)
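As the warning suggests, passing `zero_division` to `classification_report` silences these ill-defined cases. A minimal sketch, reusing the y_test and y_pred from above:
# Report ill-defined precision/recall/F-score as 0.0 without emitting warnings
print(classification_report(y_test, y_pred, zero_division=0))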
K Nearest Neighbors¶
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Calculate accuracy (accuracy_score matches model.score here)
print("Score: ", model.score(X_test, y_test))
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print(classification_report(y_test, y_pred))
Score:  0.8222222222222222
Accuracy: 0.8222222222222222
              precision    recall  f1-score   support
(per-label rows 0–132 omitted: each shows 0.33, 0.10, 0.16 with support 29)
   micro avg       0.33      0.10      0.16      3857
   macro avg       0.33      0.10      0.16      3857
weighted avg       0.33      0.10      0.16      3857
 samples avg       0.02      0.02      0.02      3857
(Same UndefinedMetricWarning as above.)
Hyperparameter Tuning¶
from sklearn.neighbors import KNeighborsClassifier
param_grid = {
'kneighborsclassifier__n_neighbors': [3, 5, 7, 9] # List of k values to try
}
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())
grid_search = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)
best_params = grid_search.best_params_
best_score = grid_search.best_estimator_.score(X_test, y_test)
print("Best Parameters:", best_params)
print("Best Score:", best_score)
# NOTE: y_pred here is still from the untuned model above; use
# y_pred = grid_search.predict(X_test) to report on the tuned model
print(classification_report(y_test, y_pred))
Best Parameters: {'kneighborsclassifier__n_neighbors': 7}
Best Score: 0.8166666666666667
              precision    recall  f1-score   support
(per-label rows 0–132 omitted: each shows 0.33, 0.10, 0.16 with support 29)
   micro avg       0.33      0.10      0.16      3857
   macro avg       0.33      0.10      0.16      3857
weighted avg       0.33      0.10      0.16      3857
 samples avg       0.02      0.02      0.02      3857
(Same UndefinedMetricWarning as above.)
The accuracy scores generally appear higher for the multilabel classifiers than for the multiclass classifiers. However, these multilabel accuracy scores should be taken with a grain of salt: in a multilabel setting, accuracy can be deceptively high when the vast majority of labels are negative. Since each Pokémon (row) has only one type combination (one or two types) that is true, all other labels are false, so the label matrix is heavily imbalanced. Recall measures how many of the actual positive cases a model correctly identifies, and as the reports show, recall is consistently low for the multilabel classifiers.
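To make this concrete, here is a minimal sketch (with made-up data, not our Pokémon split) of a degenerate model that never predicts any label: its per-label accuracy looks high purely because most entries are negative, while its recall is zero.
import numpy as np
from sklearn.metrics import recall_score
# Toy data: 10 samples, 18 labels, exactly one positive label per sample
rng = np.random.default_rng(0)
y_true_toy = np.zeros((10, 18), dtype=int)
y_true_toy[np.arange(10), rng.integers(0, 18, size=10)] = 1
y_pred_toy = np.zeros_like(y_true_toy)  # a "model" that never predicts any label
# Per-label accuracy is ~0.94, simply because 170 of the 180 entries are negative...
print((y_true_toy == y_pred_toy).mean())
# ...while recall reveals that not a single true label was found
print(recall_score(y_true_toy, y_pred_toy, average='micro', zero_division=0))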
Multiclass multioutput Classification¶
Finally, there is Multiclass multioutput Classification, which some scikit-learn estimators support natively; for the others, the MultiOutputClassifier wrapper fits one classifier per target column.
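For instance, tree-based estimators accept a two-column target directly. A minimal sketch with made-up toy data (illustrative only, not our Pokémon features):
from sklearn.tree import DecisionTreeClassifier
import numpy as np
# Toy data: two features, and a two-column target (one class per output column)
X_toy = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
y_toy = np.array([['Grass', 'Poison'], ['Fire', 'None'],
                  ['Grass', 'Poison'], ['Fire', 'None']])
tree = DecisionTreeClassifier(random_state=0).fit(X_toy, y_toy)
print(tree.predict([[1, 0]]))  # predicts both type columns at once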
Preprocessing¶
Here, we don't want to drop the Type 1 and Type 2 columns because we want to use them as the y vector.
df = preprocessed_df.copy()
# Some type combinations occur only once, which would break a stratified split,
# so we extract those rows and append them to the split manually below
df['Types'] = df[['Type 1', 'Type 2']].apply(lambda x: tuple(filter(pd.notna, x)), axis=1)
type_counts = df['Types'].value_counts()
singleton_classes = type_counts[type_counts == 1].index.tolist()
singleton_data = df[df['Types'].isin(singleton_classes)].copy()
other_data = df[~df['Types'].isin(singleton_classes)].copy()
df = df.drop(columns=['Types'])
other_data.drop(columns=['Types'], inplace=True)
singleton_data.drop(columns=['Types'], inplace=True)
y = df[['Type 1', 'Type 2']]
df.head()
Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | 0 |
1 | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | 0 |
2 | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | 0 |
3 | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | 0 |
4 | Fire | None | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | 0 |
y.head()
Type 1 | Type 2 | |
---|---|---|
0 | Grass | Poison |
1 | Grass | Poison |
2 | Grass | Poison |
3 | Grass | Poison |
4 | Fire | None |
Decision Tree¶
We used a MultiOutputClassifier to measure the accuracy score.
We calculated the accuracy for each type individually and jointly (where both types are correctly predicted).
Because the documentation on MultiOutputClassifier's .score() method was somewhat unclear about how/what it scores, we wrote our own method to calculate the joint (exact-match) accuracy for both types together, and concluded it gives a similar result to .score().
from sklearn.multioutput import MultiOutputClassifier
# Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(other_data.drop(columns=['Type 1', 'Type 2']), other_data[['Type 1', 'Type 2']], test_size=0.2, stratify=other_data[['Type 1', 'Type 2']], random_state=42)
# Append the singleton type combinations; note that, as written, they end up in
# both the training and the test set, so test scores on those rows are optimistic
X_train = pd.concat([X_train, singleton_data.drop(columns=['Type 1', 'Type 2'])])
y_train = pd.concat([y_train, singleton_data[['Type 1', 'Type 2']]])
X_test = pd.concat([X_test, singleton_data.drop(columns=['Type 1', 'Type 2'])])
y_test = pd.concat([y_test, singleton_data[['Type 1', 'Type 2']]])
base_classifier = DecisionTreeClassifier()
multi_output_classifier = MultiOutputClassifier(base_classifier)
multi_output_classifier.fit(X_train, y_train)
# Decision trees also support a 2-D target natively, so we fit the bare tree as well
base_classifier.fit(X_train, y_train)
# Model evaluation
y_pred = multi_output_classifier.predict(X_test)
print("Score: ", multi_output_classifier.score(X_test, y_test))
# Use the native tree's predictions for our own score check below
y_pred = base_classifier.predict(X_test)
# our own score function: a row only counts as correct if both types match
both_types_correct = (y_test == y_pred).all(axis=1)
score_ratio = both_types_correct.mean()
print("score ratio: ", score_ratio)
# accuracy score for each type
accuracy_list=[]
y_test = np.asarray(y_test)
y_pred = np.asarray(y_pred)
for i in range(2):
accuracy = accuracy_score(y_test[:, i], y_pred[:, i])
accuracy_list.append(accuracy)
print("Accuracy type ", i+1, ": ", accuracy )
print("Averaged Accuracy for types: ",np.mean(accuracy_list))
Score: 0.24479166666666666 score ratio: 0.2604166666666667 Accuracy type 1 : 0.3541666666666667 Accuracy type 2 : 0.4479166666666667 Averaged Accuracy for types: 0.4010416666666667
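As expected, the exact-match scores (≈0.24–0.26) sit well below the per-type accuracies, since a prediction only counts when both types are right at the same time.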
Hyperparameter Tuning¶
pipeline = make_pipeline(StandardScaler(), MultiOutputClassifier(DecisionTreeClassifier()))
param_dist = {
"multioutputclassifier__estimator__max_depth": [5, 6, 7, 8, 9, 10, 15, 30, None],
"multioutputclassifier__estimator__min_samples_leaf": np.arange(1, 10)
}
# Instantiate the GridSearchCV object
grid_search_cv = GridSearchCV(pipeline, param_grid=param_dist, cv=5)
# Fit grid_search_cv using the data X and labels y.
grid_search_cv.fit(X_train, y_train)
y_pred = grid_search_cv.predict(X_test)
# Print the best score
print("Tuned Model Parameters: {}".format(grid_search_cv.best_params_))
print("Best score is {}".format(grid_search_cv.best_estimator_.score(X_test, y_test)))
Tuned Model Parameters: {'multioutputclassifier__estimator__max_depth': 9, 'multioutputclassifier__estimator__min_samples_leaf': 8} Best score is 0.09895833333333333
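Note that the tuned tree's exact-match score (≈0.10) is far below the untuned tree's (≈0.24): capping max_depth at 9 and requiring at least 8 samples per leaf apparently over-regularises this already difficult task.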
Random Forest¶
from sklearn.multioutput import MultiOutputClassifier
base_classifier = RandomForestClassifier()
multi_output_classifier = MultiOutputClassifier(base_classifier)
multi_output_classifier.fit(X_train, y_train)
# Model evaluation
accuracy_list=[]
y_pred = multi_output_classifier.predict(X_test)
print("score: ", multi_output_classifier.score(X_test, y_test))
y_test = np.asarray(y_test)
y_pred = np.asarray(y_pred)
for i in range(2):
accuracy = accuracy_score(y_test[:, i], y_pred[:, i])
print("Accuracy type ", i+1, ": ", accuracy )
accuracy_list.append(accuracy)
print("Averaged Accuracy for types: ",np.mean(accuracy_list))
score: 0.3072916666666667 Accuracy type 1 : 0.390625 Accuracy type 2 : 0.5989583333333334 Averaged Accuracy for types: 0.4947916666666667
Hyperparameter Tuning¶
pipeline = make_pipeline(StandardScaler(), MultiOutputClassifier(RandomForestClassifier()))
# Setup the parameters and distributions to sample from: param_dist
param_dist = {
"multioutputclassifier__estimator__max_depth": [5, 10, 15, 30, None],
"multioutputclassifier__estimator__min_samples_leaf": np.arange(1, 10),
"multioutputclassifier__estimator__n_estimators": np.arange(60, 140)
}
# Instantiate the RandomizedSearchCV object: random_search_cv
# (samples n_iter=60 parameter combinations instead of exhausting the full grid)
random_search_cv = RandomizedSearchCV(pipeline, param_distributions=param_dist, n_iter=60, cv=3, random_state=42)
#grid_search_cv = GridSearchCV(pipeline, param_grid=param_dist, cv=3)
# Fit random_search_cv using the data X and labels y
random_search_cv.fit(X_train, y_train)
#grid_search_cv.fit(X_train, y_train)
# Print the best score
print("Best score is {}".format(random_search_cv.best_estimator_.score(X_test, y_test)))
print("Best parameters are {}".format(random_search_cv.best_params_))
Best score is 0.3229166666666667 Best parameters are {'multioutputclassifier__estimator__n_estimators': 129, 'multioutputclassifier__estimator__min_samples_leaf': 1, 'multioutputclassifier__estimator__max_depth': 15}
K Nearest Neighbors¶
from sklearn.multioutput import MultiOutputClassifier
base_classifier = KNeighborsClassifier()
multi_output_classifier = MultiOutputClassifier(base_classifier)
multi_output_classifier.fit(X_train, y_train)
# Model evaluation
y_pred = multi_output_classifier.predict(X_test)
print("score: ", multi_output_classifier.score(X_test, y_test))
y_test = np.asarray(y_test)
y_pred = np.asarray(y_pred)
accuracy_list=[]
for i in range(2):
accuracy = accuracy_score(y_test[:, i], y_pred[:, i])
print("Accuracy type ", i+1, ": ", accuracy )
accuracy_list.append(accuracy)
print("Averaged Accuracy for types: ",np.mean(accuracy_list))
score: 0.10416666666666667 Accuracy type 1 : 0.265625 Accuracy type 2 : 0.4322916666666667 Averaged Accuracy for types: 0.34895833333333337
Hyperparameter Tuning¶
param_grid = {
'multioutputclassifier__estimator__n_neighbors': [3, 5, 7, 9] # List of k values to try
}
pipeline = make_pipeline(StandardScaler(), MultiOutputClassifier(KNeighborsClassifier()))
grid_search = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)
best_params = grid_search.best_params_
best_score = grid_search.best_estimator_.score(X_test, y_test)
print("Best Parameters:", best_params)
print("Best Score:", best_score)
Best Parameters: {'multioutputclassifier__estimator__n_neighbors': 7} Best Score: 0.078125
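Across the multiclass-multioutput models, the random forest achieves the best exact-match score (≈0.31 untuned, ≈0.32 tuned), the decision tree comes next (≈0.24 untuned), and k-nearest neighbors trails clearly (≈0.10 untuned, ≈0.08 tuned).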