This project:
- Scrapes Dotabuff to build a dataset of match data
- Uses several machine learning / deep learning techniques to predict the outcome of given matches

Dependencies:
- BeautifulSoup4
- lxml
- TensorFlow
- Pandas
- scikit-learn
- NumPy
TensorFlow does not yet support the most recent Python release (currently Python 3.11), so use an older interpreter. You can change the Python version at the beginning of my files.
- The first few lines import the libraries the script needs, including os, tensorflow, pandas, and sklearn.
- The script splits the data and results datasets into training and testing sets using the train_test_split function from sklearn.model_selection.
- The script then imports the RandomForestClassifier class from sklearn.ensemble and creates an instance of it called rfc. The rfc model is then fit to the training data using the fit method.
- The script uses the predict method of the rfc model to generate predictions for both the test split and the separate test_data dataset. The accuracy_score function from sklearn.metrics is then used to calculate the accuracy of the model's predictions on both.
- The script repeats this process for two other machine learning models: a decision tree classifier and a support vector classifier. For each, it imports the necessary class, creates an instance, fits it to the training data, generates predictions, and calculates their accuracy.
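The steps above can be sketched as follows. This is a minimal illustration, not the repository's actual script: the real code reads the scraped Dotabuff CSVs, so synthetic data from make_classification stands in here to keep the sketch self-contained.

```python
# Hypothetical sketch of the classical-ML script described above.
# Synthetic data replaces the scraped match CSVs so the example runs on its own.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# The same fit / predict / score loop for all three models the script compares.
for model in (RandomForestClassifier(random_state=42),
              DecisionTreeClassifier(random_state=42),
              SVC()):
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{type(model).__name__}: {acc:.3f}")
```

In the real script the accuracy is computed twice per model: once on the held-out split and once on the separate test_data dataset.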
- The first few lines import various libraries that are needed for the script, including os, tensorflow, pandas, and sklearn.
- The script sets some environment variables related to TensorFlow, which is a library for machine learning.
- The script loads a dataset from a CSV file using pandas.read_csv and processes it by removing duplicates, dropping rows with missing values, and removing a column called 'Unnamed: 0'.
- The script creates two variables, X and y, to store the feature data and target data from the dataset: X holds the values of all columns except the 'Result' column, and y holds the values of the 'Result' column.
- The script loads another dataset called test_data from a CSV file and processes it in the same way. It also creates a test_results variable to store the values of the 'Result' column from the test_data dataset.
- The script creates an instance of the MinMaxScaler class from sklearn.preprocessing and uses it to scale the values in X.
- The script then splits the scaled X and y datasets into training and testing sets using the train_test_split function from sklearn.model_selection.
- The script creates a neural network model using the Sequential class from tensorflow.keras. The model consists of several fully connected (Dense) layers with various activation functions and regularization techniques applied.
- The script compiles the model using the compile method, specifying the optimizer, loss function, and metrics to be used during training.
- The script creates an EarlyStopping callback object and passes it to the fit method of the model to interrupt training if the validation loss does not improve after a certain number of epochs.
- The script trains the model on the training data using the fit method, specifying the number of epochs, batch size, and validation split to be used.
- The script evaluates the model on the test data using the evaluate method and prints the test accuracy.
- The script generates predictions for the test data and the test_data dataset using the predict method of the model.
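The neural-network pipeline described above might look roughly like this. Layer sizes, the dropout rate, and the training settings are illustrative guesses, and random data stands in for the scraped match CSVs so the sketch runs on its own.

```python
# Hypothetical sketch of the neural-network script described above.
import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"  # quiet TensorFlow logging, as the script does

import numpy as np
import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.random((300, 20))       # stand-in for the feature columns
y = rng.integers(0, 2, 300)     # stand-in for the 'Result' column

# Scale features to [0, 1], then split into train and test sets.
X_scaled = MinMaxScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42)

# A small Sequential model of Dense layers; the real script uses its own
# architecture and regularization choices.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(X.shape[1],)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Stop early if validation loss stops improving.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True)

model.fit(X_train, y_train, epochs=5, batch_size=32,
          validation_split=0.2, callbacks=[early_stop], verbose=0)

loss, acc = model.evaluate(X_test, y_test, verbose=0)
preds = (model.predict(X_test, verbose=0) > 0.5).astype(int)
print(f"test accuracy: {acc:.3f}")
```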
- The first few lines import various libraries that are needed for the script, including os, tensorflow, pandas, and sklearn.
- The script sets some environment variables related to TensorFlow, which is a library for machine learning.
- The script loads a dataset from a CSV file using pandas.read_csv and processes it by removing duplicates, dropping rows with missing values, and removing a column called 'Unnamed: 0'.
- The script creates two variables, X and y, to store the feature data and target data from the dataset: X holds the values of all columns except the 'Result' column, and y holds the values of the 'Result' column.
- The script loads another dataset called test_data from a CSV file and processes it in the same way. It also creates a test_results variable to store the values of the 'Result' column from the test_data dataset.
- The script creates an instance of the MinMaxScaler class from sklearn.preprocessing and uses it to scale the values in X.
- The script creates a KFold object from sklearn.model_selection with a specified number of splits and shuffle options.
- The script enters a loop over the splits generated by the KFold object. On each iteration, it splits the scaled X and y datasets into training and testing sets using the train_index and test_index variables.
- The script creates a neural network model using the Sequential class from tensorflow.keras. The model consists of several fully connected (Dense) layers with various activation functions and regularization techniques applied.
- The script compiles the model using the compile method, specifying the optimizer, loss function, and metrics to be used during training.
- The script creates an EarlyStopping callback object and passes it to the fit method of the model to interrupt training if the validation loss does not improve after a certain number of epochs.
- The script trains the model on the training data using the fit method, specifying the number of epochs, batch size, and validation split to be used.
- The script evaluates the model on the test data using the evaluate method and prints the test loss and accuracy.
- The script generates predictions for the test data and the test_data dataset using the predict method of the model.
- The script converts the predictions to binary values using a threshold of 0.5 and calculates the accuracy of the predictions against the test_results dataset using the accuracy_score function from sklearn.metrics.
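The K-fold loop described above can be sketched as below. The fold count, model shape, and epoch count are illustrative, and random data replaces the scraped CSVs so the sketch is self-contained; the real script rebuilds and retrains the network on each fold exactly as shown here.

```python
# Hypothetical sketch of the K-fold cross-validation script described above.
import numpy as np
import tensorflow as tf
from sklearn.model_selection import KFold
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = MinMaxScaler().fit_transform(rng.random((200, 20)))  # stand-in features
y = rng.integers(0, 2, 200)                              # stand-in 'Result' labels

fold_accuracies = []
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

for train_index, test_index in kfold.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # A fresh model is built for every fold.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu", input_shape=(X.shape[1],)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    model.fit(X_train, y_train, epochs=3, batch_size=32, verbose=0)

    # Threshold the sigmoid outputs at 0.5, as the script does.
    preds = (model.predict(X_test, verbose=0) > 0.5).astype(int).ravel()
    fold_accuracies.append(accuracy_score(y_test, preds))

print(f"mean fold accuracy: {np.mean(fold_accuracies):.3f}")
```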
NOTE: Datasets can be updated by using the fetch-script