AutoML — TPOT, H2O & AutoKeras Complete Guide
This tutorial introduces AutoML through three frameworks: TPOT, H2O AutoML, and AutoKeras. It covers the key concepts, runnable examples, a side-by-side comparison, and common mistakes to avoid.
AutoML automates the end-to-end machine learning pipeline (data preprocessing, feature engineering, model selection, hyperparameter tuning, and ensembling), which can cut the time from raw data to a production-ready model from weeks to hours.
What You'll Learn
You'll learn how three leading AutoML frameworks — TPOT (tree-based pipeline optimization), H2O AutoML (distributed AutoML), and AutoKeras (neural architecture search) — automate model development, and how to select the right tool for your problem type.
Why It Matters
By common estimates, data scientists spend 60-80% of their time on repetitive tasks: trying different algorithms, tuning hyperparameters, and fixing preprocessing pipelines. AutoML frees them to focus on problem formulation, domain-driven feature engineering, and business impact. DodaTech's security analytics team uses AutoML to rapidly prototype anomaly detection models for Durga Antivirus Pro, reducing model development time from two weeks to two days.
Real-World Use
A fraud detection team at a payment processor uses H2O AutoML to train models on new merchant categories. The AutoML system evaluates 20+ algorithms with various preprocessing strategies overnight, producing a leaderboard of the best models. The winning ensemble model is automatically registered and deployed, achieving 99.2% AUC without manual intervention. TPOT handles smaller datasets where interpretability matters, while AutoKeras handles image-based fraud document analysis.
AutoML Landscape
flowchart TD
A[Raw Data] --> B{AutoML Framework}
B --> C[TPOT]
B --> D[H2O AutoML]
B --> E[AutoKeras]
C --> F[Genetic Programming]
C --> G[Scikit-learn Pipelines]
D --> H[Distributed Training]
D --> I[Stacked Ensembles]
E --> J[Neural Architecture Search]
E --> K[Keras Models]
F --> L[Best Pipeline]
H --> L
J --> L
L --> M[Deployment Model]
style B fill:#4a90d9,color:#fff
style L fill:#2ecc71,color:#fff
TPOT — Genetic Pipeline Search
TPOT uses genetic programming to evolve machine learning pipelines. It starts with a population of random pipelines (preprocessors plus a model), keeps the best performers, and creates each new generation through crossover and mutation. TPOT is best suited to structured (tabular) data, and its pipelines are built from scikit-learn components.
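# Targets the classic TPOT API (0.x releases); newer releases rename some arguments
from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the 8x8 handwritten digits dataset and hold out 20% for testing
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target,
    test_size=0.2, random_state=42
)

# Evolve a population of 20 pipelines for 5 generations,
# scoring each candidate by 3-fold cross-validated accuracy
tpot = TPOTClassifier(
    generations=5,
    population_size=20,
    cv=3,
    scoring='accuracy',
    verbosity=2,
    random_state=42,
    n_jobs=-1
)
tpot.fit(X_train, y_train)

# Evaluate the best pipeline on the held-out test set
accuracy = accuracy_score(y_test, tpot.predict(X_test))
print(f"Test accuracy: {accuracy:.4f}")
print(f"Best pipeline:\n{tpot.fitted_pipeline_}")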
Example output (scores and the selected pipeline will vary with library versions):
Generation 1 - Best score: 0.9583
Generation 2 - Best score: 0.9645
Generation 3 - Best score: 0.9708
Generation 4 - Best score: 0.9729
Generation 5 - Best score: 0.9750
Test accuracy: 0.9722
Best pipeline:
Pipeline(steps=[('pca', PCA(n_components=0.95)),
('kneighborsclassifier', KNeighborsClassifier(n_neighbors=5))])
TPOT exports the best pipeline as Python code:
tpot.export('best_pipeline.py')
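The exported file is a plain Python script that rebuilds the best pipeline from scikit-learn components. Replace its placeholder data path with your own before running it. Depending on the pipeline found, the script may still import a few helpers from the tpot package, so check its imports before deploying to an environment without TPOT.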
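To make the selection, crossover, and mutation loop concrete, here is a minimal, standard-library-only sketch. It is not TPOT's implementation: the search space, the `toy_fitness` function (a stand-in for a cross-validated score), and the simple "keep the top half" selection rule are all assumptions made for illustration.

```python
import random

# Hypothetical search space: each "pipeline" is a (preprocessor, model, setting) triple
PREPROCESSORS = ["none", "standard_scaler", "pca"]
MODELS = ["knn", "decision_tree", "logistic_regression"]
SETTINGS = list(range(1, 11))
GENES = [PREPROCESSORS, MODELS, SETTINGS]


def random_pipeline(rng):
    """Draw one random value for every gene."""
    return tuple(rng.choice(options) for options in GENES)


def toy_fitness(pipeline):
    """Made-up score standing in for cross-validation; a real search would train a model here."""
    prep, model, setting = pipeline
    score = {"none": 0.0, "standard_scaler": 0.1, "pca": 0.2}[prep]
    score += {"knn": 0.3, "decision_tree": 0.1, "logistic_regression": 0.2}[model]
    score += 0.5 - 0.05 * abs(setting - 5)  # best at setting == 5
    return score


def crossover(parent_a, parent_b, rng):
    """Build a child by taking each gene from one of the two parents at random."""
    return tuple(rng.choice(pair) for pair in zip(parent_a, parent_b))


def mutate(pipeline, rng, rate=0.2):
    """Resample each gene with probability `rate`."""
    return tuple(
        rng.choice(options) if rng.random() < rate else gene
        for gene, options in zip(pipeline, GENES)
    )


def evolve(generations=5, population_size=20, seed=42):
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(population_size)]
    best_scores = []
    for _ in range(generations):
        # Selection: keep the top half unchanged
        ranked = sorted(population, key=toy_fitness, reverse=True)
        survivors = ranked[: population_size // 2]
        # Crossover + mutation: refill the population with children of survivors
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
        best_scores.append(toy_fitness(max(population, key=toy_fitness)))
    return max(population, key=toy_fitness), best_scores


best_pipeline, best_scores = evolve()
print("Best pipeline:", best_pipeline)
print("Best score per generation:", [round(s, 2) for s in best_scores])
```

Because the survivors carry over unchanged, the best score can never drop from one generation to the next. That mirrors the steadily improving per-generation scores a genetic search reports.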
H2O AutoML — Distributed AutoML
H2O AutoML runs in a distributed Java-based engine, making it suitable for large datasets. It trains multiple algorithms — GLM, GBM, Random Forest, XGBoost, Deep Learning — and creates a stacked ensemble that combines their predictions.
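import h2o
from h2o.automl import H2OAutoML

# Start (or connect to) a local H2O cluster with a 4 GB JVM heap
h2o.init(max_mem_size='4G')

df = h2o.import_file('https://h2o-public-test-data.s3.amazonaws.com/'
                     'smalldata/higgs/higgs_train_5k.csv')

# Use every column except the target as a predictor
y = 'response'
x = [col for col in df.columns if col != y]
df[y] = df[y].asfactor()  # categorical target, so H2O treats this as classification

train, test = df.split_frame(ratios=[0.8], seed=42)

# Stop after 20 models or 120 seconds, whichever comes first
aml = H2OAutoML(
    max_models=20,
    seed=42,
    max_runtime_secs=120,
    sort_metric='AUC',
    nfolds=3
)
aml.train(x=x, y=y, training_frame=train)

# With no leaderboard frame supplied, models are ranked by cross-validated metrics
lb = aml.leaderboard
print(lb.head(10))

predictions = aml.leader.predict(test)
print(f"\nLeader model: {aml.leader.model_id}")
# xval=True reports the cross-validated AUC; the default would be the training AUC
print(f"Leader AUC: {aml.leader.auc(xval=True):.4f}")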
Example output (model IDs and scores will vary between runs):
model_id                                     auc     logloss
StackedEnsemble_AllModels_AutoML_20260624    0.8123  0.5214
GBM_grid_1_AutoML_20260624_model_1           0.8078  0.5289
XGBoost_1_AutoML_20260624                    0.8056  0.5310
DRF_1_AutoML_20260624                        0.7912  0.5487
DeepLearning_1_AutoML_20260624               0.7834  0.5612
GLM_1_AutoML_20260624                        0.7645  0.5834

Leader model: StackedEnsemble_AllModels_AutoML_20260624
Leader AUC: 0.8123
H2O AutoML's stacked ensemble typically outperforms any single model by combining their strengths. The leaderboard shows every model trained, ranked by the chosen metric.
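The stacking idea can be tried outside H2O as well. The sketch below uses scikit-learn's `StackingClassifier` rather than H2O's own implementation. The dataset, the two base models, and the logistic-regression meta-learner are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small binary classification dataset bundled with scikit-learn
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Base models: their cross-validated predictions become the meta-learner's inputs
base_models = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
]

# Meta-learner: a logistic regression trained on the base models' out-of-fold predictions
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)
stack.fit(X_train, y_train)

# Compare the ensemble with each base model on the held-out test set
stack_auc = roc_auc_score(y_test, stack.predict_proba(X_test)[:, 1])
print(f"Stacked ensemble test AUC: {stack_auc:.4f}")
for name, model in stack.named_estimators_.items():
    base_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name} test AUC: {base_auc:.4f}")
```

On a dataset this small the ensemble may only match its best base model. The gain from stacking tends to show up when the base models make different kinds of errors.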
AutoKeras — Neural Architecture Search
AutoKeras automates neural network design by searching over architectures (number of layers, layer types, filter sizes, dropout rates, and more) with strategies such as Bayesian optimization. It supports images, text, and structured data.
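# Targets the AutoKeras 1.x API; check that your installed version
# still provides StructuredDataClassifier
import autokeras as ak
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Treat the 64 pixel values of each digit as structured (tabular) features
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target,
    test_size=0.2, random_state=42
)

# Try up to 10 candidate architectures; overwrite=True discards earlier search results
clf = ak.StructuredDataClassifier(
    max_trials=10,
    overwrite=True,
    seed=42
)

# Each trial trains for up to 10 epochs, validating on 20% of the training data
clf.fit(
    X_train, y_train,
    epochs=10,
    validation_split=0.2,
    verbose=2
)

# evaluate() returns the loss followed by the accuracy metric
test_loss, test_acc = clf.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {test_acc:.4f}")
print(f"Test loss: {test_loss:.4f}")

# summary() prints the table itself and returns None, so it is not wrapped in print()
print("Best model architecture:")
clf.export_model().summary()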
Example output (the architecture found and the metrics will vary between runs):
Test accuracy: 0.9750
Test loss: 0.1245
Best model architecture:
Model: "functional_1"
_________________________________________________________________
 Layer (type)                     Output Shape          Param #
=================================================================
 input_1 (InputLayer)             [(None, 64)]          0
 normalization (Normalization)    (None, 64)            129
 dense (Dense)                    (None, 256)           16640
 re_lu (ReLU)                     (None, 256)           0
 dropout (Dropout)                (None, 256)           0
 dense_1 (Dense)                  (None, 128)           32896
 re_lu_1 (ReLU)                   (None, 128)           0
 dropout_1 (Dropout)              (None, 128)           0
 classification_head_1            (None, 10)            1290
=================================================================
Total params: 50,955
Trainable params: 50,826
Non-trainable params: 129
_________________________________________________________________
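The parameter counts in a summary like this can be checked by hand. The short sketch below redoes the arithmetic for the layer sizes shown above. It assumes that a dense layer holds one weight per input-output pair plus one bias per unit, and that the normalization layer stores a mean and a variance per feature plus a single sample count, none of them trainable. The helper names are hypothetical.

```python
def dense_params(inputs, units):
    """Weights (inputs x units) plus one bias per unit."""
    return inputs * units + units


def normalization_params(features):
    """Mean and variance per feature plus one sample count; none are trainable."""
    return 2 * features + 1


# (layer name, parameter count, trainable?) for the layers that hold parameters
layers = [
    ("normalization", normalization_params(64), False),
    ("dense", dense_params(64, 256), True),
    ("dense_1", dense_params(256, 128), True),
    ("classification_head_1", dense_params(128, 10), True),
]

total = sum(count for _, count, _ in layers)
trainable = sum(count for _, count, is_trainable in layers if is_trainable)

for name, count, _ in layers:
    print(f"{name}: {count}")
print(f"Total params: {total}")                     # 50955
print(f"Trainable params: {trainable}")             # 50826
print(f"Non-trainable params: {total - trainable}")  # 129
```

The ReLU and dropout layers add no parameters, which is why they show 0 in the summary.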
AutoML Framework Comparison
| Feature | TPOT | H2O AutoML | AutoKeras |
|---|---|---|---|
| Data type | Structured | Structured | Image, text, structured |
| Search method | Genetic programming | Random grid search + stacked ensembles | Bayesian NAS |
| Output model | scikit-learn pipeline | H2O model/ensemble | Keras model |
| Scalability | Single machine | Distributed (multi-node) | Single GPU/multi-GPU |
| Interpretability | High (standard models) | Medium (ensemble) | Low (Deep Learning) |
| Speed | Slow (many generations) | Fast (parallel) | Medium (trial-based) |
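The comparison table can be turned into a rough selection rule. The function below is a hypothetical helper, not part of any of the three libraries. Its priorities (data type first, then scale, then interpretability) are one reasonable reading of the table rather than a definitive rule.

```python
def choose_framework(data_type, needs_multi_node=False, needs_interpretability=False):
    """Suggest an AutoML framework based on the comparison table (heuristic only)."""
    data_type = data_type.lower()
    if data_type in ("image", "text"):
        # Only AutoKeras in this comparison handles unstructured inputs
        return "AutoKeras"
    if data_type != "structured":
        raise ValueError(f"Unsupported data type: {data_type!r}")
    if needs_multi_node:
        # Scale takes priority: H2O is the distributed option in the table
        return "H2O AutoML"
    if needs_interpretability:
        # TPOT yields standard scikit-learn pipelines that are easier to inspect
        return "TPOT"
    # Default for structured data: the table rates H2O AutoML as the fastest
    return "H2O AutoML"


print(choose_framework("image"))
print(choose_framework("structured", needs_interpretability=True))
print(choose_framework("structured", needs_multi_node=True))
```

In practice other constraints, such as whether a Java runtime is available in production, may outweigh these rules.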
Common Errors and Mistakes
| Mistake | Why It Happens | How to Fix |
|---|---|---|
| Too few generations | The TPOT search stops before the population converges | Use 10+ generations for good results |
| Ignoring leaderboard variance | Model ranking is unstable | Use cross-validation within AutoML |
| AutoKeras overfitting | Architecture too complex for data | Set max_trials low for small datasets |
| H2O memory errors | Large datasets in JVM | Increase max_mem_size or use data sampling |
| Deploying without testing | AutoML finds patterns, not causes | Always validate on holdout test set |
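The leaderboard-variance row is worth checking in practice. The sketch below uses two plain scikit-learn models as stand-ins for leaderboard entries, scores them with repeated cross-validation, and asks whether the gap between them is larger than the fold-to-fold noise. The dataset, the candidate models, and the "gap smaller than one standard deviation" rule of thumb are all assumptions for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)

# Stand-ins for two models near the top of an AutoML leaderboard
candidates = {
    "knn": KNeighborsClassifier(n_neighbors=5),
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

# 3 folds repeated 3 times gives 9 scores per model
cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=3, random_state=42)

results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results[name] = (scores.mean(), scores.std())
    print(f"{name}: {scores.mean():.4f} +/- {scores.std():.4f}")

# Rank by mean score, then compare the gap with the leader's spread
ranked = sorted(results.items(), key=lambda item: item[1][0], reverse=True)
(best_name, (best_mean, best_std)), (second_name, (second_mean, _)) = ranked
gap = best_mean - second_mean

if gap < best_std:
    print(f"{best_name} and {second_name} are within noise of each other")
else:
    print(f"{best_name} leads {second_name} by more than one standard deviation")
```

When the gap is within noise, a simpler or faster model is often the safer choice, and the final decision should still rest on a held-out test set.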
Practice Questions
- What search algorithm does TPOT use?
Answer: TPOT uses genetic programming — it evolves pipelines through selection, crossover, and mutation over multiple generations.
- How does H2O AutoML create its final model?
Answer: H2O AutoML trains multiple individual models (GBM, XGBoost, Random Forest, GLM, Deep Learning) and combines them into a stacked ensemble that typically outperforms any single model.
- What is neural architecture search in AutoKeras?
Answer: AutoKeras uses Bayesian optimization to search over neural network architectures — layer types, sizes, connections — finding the best design for the given data.
- When would you choose TPOT over H2O AutoML?
Answer: When you are working with smaller datasets, need interpretable scikit-learn pipelines, or are deploying to environments without Java/H2O dependencies.
- What is the main risk of using AutoML without validation?
Answer: AutoML finds patterns in training data, including noise and spurious correlations. Without proper holdout validation, deployed models may fail on new data.
Challenge
Use all three AutoML frameworks on the same dataset (Kaggle's Titanic or House Prices). Compare TPOT (50 generations), H2O AutoML (30 models), and AutoKeras (20 trials) on test performance (accuracy for Titanic, which is a classification task; RMSE for House Prices, which is a regression task) and training time. Document which framework produces the best model and which produces the most interpretable pipeline.
Real-World Task
Design an AutoML pipeline for a medical diagnostic system that must process structured lab results, medical imaging, and free-text doctor notes. Choose the right AutoML framework for each data modality and create an ensemble that combines predictions from all three into a final diagnosis score.
Next Steps
Now that you understand AutoML, learn about Hyperparameter Tuning with Optuna for manual fine-tuning, and MLflow for managing the models AutoML produces.
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.