Machine learning has become an integral part of many industries. To meet the growing demand for automated machine learning (AutoML), AWS offers Amazon SageMaker Autopilot, a managed AutoML service that simplifies building and deploying ML models. Autopilot's ensembling mode is powered by AutoGluon, an open-source AutoML library developed by AWS.
SageMaker Autopilot automates the main steps of an ML workflow, including data preparation, feature selection, model training, and hyperparameter optimization, and exposes them through a point-and-click interface in SageMaker Studio. This lets users build competitive models even without prior programming or data science expertise.
The service builds, trains, and tunes models based on the dataset you provide. The workflow is straightforward: upload a tabular dataset (such as a CSV file) to S3, select the target column to predict, and let Autopilot explore candidate solutions to find the best model. You then review the results and deploy the chosen model for production use.
Autopilot goes beyond single off-the-shelf models. In ensembling mode, it uses AutoGluon to fit a variety of models, ranging from boosted trees to custom neural networks, and combines them by stacking: models are arranged in multiple layers and trained layer by layer, with each layer learning from the out-of-fold predictions of the layer below. This design aims to turn raw data into high-quality predictions within a given time budget, while the careful tracking of out-of-fold examples helps mitigate overfitting. Autopilot supports regression, binary classification, and multiclass classification problems.
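To make the stacking idea concrete, here is a toy two-layer sketch written for this article, not AutoGluon's implementation. Two deliberately simple base learners (a mean predictor and a one-feature line, both assumptions chosen for brevity) are trained with k-fold bagging; only their out-of-fold predictions feed the second layer, which picks a blend weight on out-of-fold error. AutoGluon uses real learners, can stack more layers, and also passes the original features to upper layers.

```python
from typing import Callable, List, Sequence, Tuple

# A fitted model maps one input value to one prediction.
Predictor = Callable[[float], float]
Fitter = Callable[[Sequence[float], Sequence[float]], Predictor]


def fit_mean(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Baseline learner: always predicts the mean of the training targets."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Second learner: one-feature least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = cov / var if var else 0.0
    return lambda x: my + slope * (x - mx)


def bag(fit: Fitter, xs: Sequence[float], ys: Sequence[float], k: int) -> Tuple[List[float], Predictor]:
    """k-fold bagging: returns out-of-fold predictions plus a predictor that
    averages the k fold models (each trained without its held-out fold)."""
    oof = [0.0] * len(xs)
    models: List[Predictor] = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        models.append(model)
        for i in range(fold, len(xs), k):  # rows held out in this fold
            oof[i] = model(xs[i])
    return oof, lambda x: sum(m(x) for m in models) / len(models)


def mse(preds: Sequence[float], ys: Sequence[float]) -> float:
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def stack(xs: Sequence[float], ys: Sequence[float], k: int = 5) -> Tuple[float, Predictor]:
    # Layer 1: bagged base models. Only out-of-fold predictions move up,
    # so layer 2 never learns from predictions made on a model's own training rows.
    oof_a, model_a = bag(fit_mean, xs, ys, k)
    oof_b, model_b = bag(fit_line, xs, ys, k)

    # Layer 2: a weighted blend whose weight is chosen on out-of-fold error.
    def blend_error(w: float) -> float:
        return mse([w * a + (1 - w) * b for a, b in zip(oof_a, oof_b)], ys)

    w = min((i / 10 for i in range(11)), key=blend_error)
    return w, lambda x: w * model_a(x) + (1 - w) * model_b(x)
```

On perfectly linear data, the second layer gives all the weight to the line model, because its out-of-fold error is zero.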
How to Use Autopilot
In SageMaker Studio, open AutoML and choose Create AutoML experiment.
![](https://sonne.technology/media/23082023/image-20230814-182318.png)
Step 1: Uploading the Dataset to S3 - Data Loading
To begin, upload the dataset to an S3 bucket. Autopilot reads its training data from S3, so the file must be stored there before you create the experiment.
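If you prefer to script the upload, a minimal sketch with boto3 could look like the following. The client is passed in (for example `boto3.client("s3")`), and the bucket, prefix, and file names are hypothetical placeholders.

```python
from pathlib import Path
from typing import Any


def upload_dataset(s3_client: Any, local_path: str, bucket: str, prefix: str = "") -> str:
    """Upload a local dataset file to S3 and return its s3:// URI.

    `s3_client` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    name = Path(local_path).name
    prefix = prefix.strip("/")
    key = f"{prefix}/{name}" if prefix else name
    # S3.Client.upload_file(Filename, Bucket, Key); large files are uploaded in parts automatically.
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"
```

For example, `upload_dataset(boto3.client("s3"), "train.csv", "my-bucket", "autopilot/input")` would return `s3://my-bucket/autopilot/input/train.csv`. Passing the client in keeps the helper easy to test without AWS credentials.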
Step 2: Specifying Input and Output Locations - Configuration
Specify the location of the input file in the S3 bucket. Optionally, you can also specify the S3 location where Autopilot should store its output. Then click the "Next" button to proceed.
![](https://sonne.technology/media/23082023/image-20230814-182440.png)
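The same choices can be expressed through the API. The sketch below builds the `InputDataConfig` and `OutputDataConfig` arguments of the boto3 SageMaker client's `create_auto_ml_job` call; the URIs and the `churn` target name are hypothetical. Note that in the API the target column (chosen in the next step in the console) is part of the input channel.

```python
from typing import Any, Dict, List


def io_config(input_uri: str, output_uri: str, target: str) -> Dict[str, Any]:
    """Build the InputDataConfig/OutputDataConfig arguments of create_auto_ml_job."""
    if not (input_uri.startswith("s3://") and output_uri.startswith("s3://")):
        raise ValueError("Autopilot input and output locations must be S3 URIs")
    input_data_config: List[Dict[str, Any]] = [{
        # "S3Prefix" means every object whose key starts with S3Uri is used as input.
        "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_uri}},
        "TargetAttributeName": target,  # the column the model should predict
    }]
    return {
        "InputDataConfig": input_data_config,
        "OutputDataConfig": {"S3OutputPath": output_uri},
    }
```

The returned dictionary can be unpacked into the call, e.g. `sagemaker.create_auto_ml_job(AutoMLJobName=..., RoleArn=..., **io_config(...))`.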
Step 3: Specify the Target Column and Select Features (if necessary)
In this step, choose the target column that you want your model to predict. Autopilot infers the problem type from the values in this column: a continuous numeric target leads to regression, a target with two distinct values to binary classification, and a target with a limited set of discrete classes to multiclass classification. You can also set the problem type explicitly instead of relying on this inference.
If you have specific features that you believe are relevant for the prediction task, you can selectively choose them. However, if you are unsure or want to consider all available features, you can leave them all selected by default. Once you have made your selections, proceed by clicking the "Next" button.
![](https://sonne.technology/media/23082023/image-20230814-182536.png)
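The following rough heuristic illustrates the kind of reasoning behind problem-type inference. It is a simplified sketch, not Autopilot's actual rule set, and the `max_classes` cut-off is a hypothetical value. The returned strings are the values accepted by the `ProblemType` parameter of `create_auto_ml_job`, which is how you would set the type explicitly.

```python
from typing import Sequence


def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def guess_problem_type(target: Sequence[str], max_classes: int = 20) -> str:
    """Rough guess of the problem type from raw (string) target values.

    Illustration only; this does not reproduce Autopilot's inference rules.
    `max_classes` is a hypothetical cut-off between multiclass and regression.
    """
    values = [v.strip() for v in target if v.strip()]
    distinct = set(values)
    if len(distinct) < 2:
        raise ValueError("the target column needs at least two distinct values")
    if len(distinct) == 2:
        return "BinaryClassification"
    if all(_is_number(v) for v in values) and len(distinct) > max_classes:
        return "Regression"
    return "MulticlassClassification"
```

A numeric column with only a few distinct values (such as ratings 1 to 5) is ambiguous, which is one reason to set the problem type explicitly when you know it.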
Step 4: Select “Auto” to let Autopilot choose the training method.
![](https://sonne.technology/media/23082023/image-20230814-182628.png)
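According to the SageMaker documentation at the time of writing, "Auto" chooses hyperparameter optimization (HPO) for datasets larger than 100 MB and ensembling otherwise. The sketch below mirrors that rule; how "MB" is counted (2^20 bytes here) is an assumption, so check the current documentation before relying on the exact boundary.

```python
# Threshold from the documented behavior of Mode="AUTO"; treating 1 MB as
# 2**20 bytes is an assumption of this sketch.
AUTO_MODE_THRESHOLD_BYTES = 100 * 1024 * 1024


def resolve_auto_mode(dataset_bytes: int) -> str:
    """Approximate which training mode Autopilot picks when Mode="AUTO"."""
    if dataset_bytes < 0:
        raise ValueError("dataset size cannot be negative")
    if dataset_bytes > AUTO_MODE_THRESHOLD_BYTES:
        return "HYPERPARAMETER_TUNING"
    return "ENSEMBLING"


# In the API, the "Auto" choice corresponds to this part of AutoMLJobConfig.
AUTO_JOB_CONFIG = {"Mode": "AUTO"}
```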
Next, decide whether to deploy the best model automatically when the job finishes or to deploy it yourself later. It is also worth reviewing the advanced settings.
![](https://sonne.technology/media/23082023/image-20230814-182704.png)
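In the API, the deployment decision maps to the optional `ModelDeployConfig` argument of `create_auto_ml_job`. A minimal sketch, assuming that omitting the argument means "deploy later" and that the endpoint name shown is hypothetical:

```python
from typing import Any, Dict, Optional


def deploy_kwargs(deploy_now: bool, endpoint_name: Optional[str] = None) -> Dict[str, Any]:
    """Extra create_auto_ml_job arguments for the deployment decision."""
    if not deploy_now:
        # Deploy later: omit ModelDeployConfig and deploy the chosen candidate yourself.
        return {}
    if endpoint_name:
        # A custom endpoint name requires AutoGenerateEndpointName to be False.
        return {"ModelDeployConfig": {"AutoGenerateEndpointName": False, "EndpointName": endpoint_name}}
    return {"ModelDeployConfig": {"AutoGenerateEndpointName": True}}
```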
One useful group of advanced settings is the runtime limits: you can cap both the maximum time for each training job and the total runtime of the Autopilot job.
![](https://sonne.technology/media/23082023/image-20230814-182748.png)
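These runtime limits correspond to the `CompletionCriteria` block inside `AutoMLJobConfig`. The sketch below builds it; the check that a single training job does not exceed the total budget is a sanity check added here, not a documented API rule, and the example values are arbitrary.

```python
from typing import Dict, Optional


def completion_criteria(max_job_seconds: int, max_per_training_job_seconds: int,
                        max_candidates: Optional[int] = None) -> Dict[str, int]:
    """Build AutoMLJobConfig["CompletionCriteria"] for create_auto_ml_job."""
    if max_job_seconds <= 0 or max_per_training_job_seconds <= 0:
        raise ValueError("time limits must be positive")
    # Sanity check of this sketch: one training job should fit in the total budget.
    if max_per_training_job_seconds > max_job_seconds:
        raise ValueError("a training job cannot be allowed more time than the whole job")
    criteria = {
        "MaxAutoMLJobRuntimeInSeconds": max_job_seconds,  # total Autopilot job runtime
        "MaxRuntimePerTrainingJobInSeconds": max_per_training_job_seconds,  # per training job
    }
    if max_candidates is not None:
        criteria["MaxCandidates"] = max_candidates  # optional cap on candidates tried
    return criteria
```

For example, `completion_criteria(4 * 3600, 1800)` limits the whole job to four hours and each training job to 30 minutes.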
When the job finishes, SageMaker Autopilot prepares its reports and shows the best model alongside the other candidates.
![](https://sonne.technology/media/23082023/image-20230814-182822.png)
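The best model can also be read programmatically from the response of the boto3 SageMaker client's `describe_auto_ml_job(AutoMLJobName=...)`. A minimal sketch that extracts the relevant fields from that response dictionary (the sample values in any test data are made up):

```python
from typing import Any, Dict, Tuple


def best_candidate_summary(describe_response: Dict[str, Any]) -> Tuple[str, str, float]:
    """Return (candidate name, metric name, metric value) of the best candidate."""
    status = describe_response.get("AutoMLJobStatus")
    if status != "Completed":
        raise RuntimeError(f"Autopilot job not completed (status: {status})")
    best = describe_response["BestCandidate"]
    metric = best["FinalAutoMLJobObjectiveMetric"]
    return best["CandidateName"], metric["MetricName"], float(metric["Value"])
```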
For each model, Autopilot stores its predictions and model details in S3.
The leaderboard compares all models.
![](https://sonne.technology/media/23082023/image-20230814-182857.png)
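A leaderboard like this can be rebuilt from the `Candidates` list returned by `list_candidates_for_auto_ml_job` (which can also sort server-side with `SortBy="FinalObjectiveMetricValue"`). The sketch below ranks candidates locally and respects the metric's `Type`, since for metrics such as MSE lower is better. Defaulting to "Maximize" when `Type` is absent is an assumption.

```python
from typing import Any, Dict, List, Tuple


def leaderboard(candidates: List[Dict[str, Any]]) -> List[Tuple[str, float]]:
    """Rank candidates from best to worst by their final objective metric."""
    scored = [c for c in candidates if "FinalAutoMLJobObjectiveMetric" in c]
    if not scored:
        return []
    # "Maximize" means higher is better, "Minimize" means lower is better.
    # Falling back to "Maximize" when Type is missing is an assumption of this sketch.
    maximize = scored[0]["FinalAutoMLJobObjectiveMetric"].get("Type", "Maximize") == "Maximize"
    ranked = sorted(scored, key=lambda c: c["FinalAutoMLJobObjectiveMetric"]["Value"], reverse=maximize)
    return [(c["CandidateName"], c["FinalAutoMLJobObjectiveMetric"]["Value"]) for c in ranked]
```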
For each candidate, both its predictions and its model quality report are saved to the S3 bucket.
![](https://sonne.technology/media/23082023/image-20230814-182929.png)
![](https://sonne.technology/media/23082023/image-20230814-183000.png)
![](https://sonne.technology/media/23082023/image-20230814-183026.png)
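To locate these reports without browsing the bucket, you can read each candidate's `CandidateProperties.CandidateArtifactLocations` from the candidate descriptions. A minimal sketch, assuming the `ModelInsights` entry holds the model quality report prefix and `Explainability` the explainability report prefix:

```python
from typing import Any, Dict, List


def report_locations(candidates: List[Dict[str, Any]]) -> Dict[str, Dict[str, str]]:
    """Map each candidate name to the S3 prefixes of its reports."""
    locations: Dict[str, Dict[str, str]] = {}
    for cand in candidates:
        artifacts = cand.get("CandidateProperties", {}).get("CandidateArtifactLocations", {})
        # Keep only the report entries; candidates without artifacts are skipped.
        found = {k: artifacts[k] for k in ("ModelInsights", "Explainability") if k in artifacts}
        if found:
            locations[cand["CandidateName"]] = found
    return locations
```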
Autopilot also saves a data exploration notebook to the S3 bucket. The notebook contains several sections for analyzing the input data:
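The notebook's location is listed under `AutoMLJobArtifacts.DataExplorationNotebookLocation` in the `describe_auto_ml_job` response. This sketch splits that location into a bucket and key for downloading, assuming it is a full `s3://` object URI; the sample path used in testing is made up.

```python
from typing import Any, Dict, Tuple


def data_exploration_notebook(describe_response: Dict[str, Any]) -> Tuple[str, str]:
    """Return (bucket, key) of the data exploration notebook."""
    uri = describe_response["AutoMLJobArtifacts"]["DataExplorationNotebookLocation"]
    if not uri.startswith("s3://"):
        raise ValueError(f"unexpected notebook location: {uri}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key
```

The result can then be passed to `boto3.client("s3").download_file(bucket, key, local_filename)` to open the notebook locally.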
Results
Dataset summary:
![](https://sonne.technology/media/23082023/image-20230814-183419.png)
Target analysis:
![](https://sonne.technology/media/23082023/image-20230814-183506.png)
Data sample and prediction power:
![](https://sonne.technology/media/23082023/image-20230814-183551.png)
Duplicate rows and cross-column statistics:
![](https://sonne.technology/media/23082023/image-20230814-183635.png)
Anomalies:
![](https://sonne.technology/media/23082023/image-20230814-183713.png)
Missing values:
![](https://sonne.technology/media/23082023/image-20230814-183744.png)
Cardinality:
![](https://sonne.technology/media/23082023/image-20230814-183859.png)
Statistics:
![](https://sonne.technology/media/23082023/image-20230814-183933.png)
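Several of these sections report simple column statistics. The standard-library sketch below computes three of them (missing values, cardinality, and duplicate rows) for a small CSV string to show what the numbers mean. It is an illustration, not the notebook's code, and does not cover prediction power or anomaly detection.

```python
import csv
import io
from collections import Counter
from typing import Dict


def profile_csv(text: str) -> Dict[str, object]:
    """Compute row count, missing values, cardinality, and duplicate rows."""
    reader = csv.DictReader(io.StringIO(text))
    columns = list(reader.fieldnames or [])
    rows = list(reader)

    def is_missing(value) -> bool:
        # Empty cells (or cells absent from short rows) count as missing.
        return value is None or value.strip() == ""

    missing = {c: sum(1 for r in rows if is_missing(r[c])) for c in columns}
    cardinality = {c: len({r[c] for r in rows if not is_missing(r[c])}) for c in columns}
    # Every extra copy of an identical row counts as one duplicate.
    counts = Counter(tuple(r[c] for c in columns) for r in rows)
    duplicates = sum(n - 1 for n in counts.values())
    return {"rows": len(rows), "missing": missing, "cardinality": cardinality, "duplicate_rows": duplicates}
```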
In summary, SageMaker Autopilot lets developers and non-developers alike harness automated machine learning, paving the way for faster and more efficient model development.