AutoML — TPOT, H2O & AutoKeras Complete Guide

DodaTech Updated 2026-06-24 6 min read

This tutorial introduces AutoML through three frameworks (TPOT, H2O AutoML, and AutoKeras), covering the key concepts, practical examples, and best practices you need to apply them.

AutoML automates much of the end-to-end machine learning pipeline (data preprocessing, feature engineering, model selection, hyperparameter tuning, and ensembling), which can cut the time from raw data to a candidate production model from weeks to hours.

What You'll Learn

You'll learn how three leading AutoML frameworks — TPOT (tree-based pipeline optimization), H2O AutoML (distributed AutoML), and AutoKeras (neural architecture search) — automate model development, and how to select the right tool for your problem type.

Why It Matters

Data scientists are often estimated to spend 60-80% of their time on repetitive tasks: trying different algorithms, tuning hyperparameters, and fixing preprocessing pipelines. AutoML frees them to focus on problem formulation, feature engineering from domain knowledge, and business impact. DodaTech's security analytics team uses AutoML to rapidly prototype anomaly detection models for Durga Antivirus Pro, reducing model development time from two weeks to two days.

Real-World Use

A fraud detection team at a payment processor uses H2O AutoML to train models on new merchant categories. The AutoML system trains 20+ candidate models with various preprocessing strategies overnight, producing a leaderboard of the best models. The winning ensemble model is automatically registered and deployed, achieving an AUC of 0.992 without manual intervention. TPOT handles smaller datasets where interpretability matters, while AutoKeras handles image-based fraud document analysis.

AutoML Landscape

flowchart TD
  A[Raw Data] --> B{AutoML Framework}
  B --> C[TPOT]
  B --> D[H2O AutoML]
  B --> E[AutoKeras]
  C --> F[Genetic Programming]
  C --> G[Scikit-learn Pipelines]
  D --> H[Distributed Training]
  D --> I[Stacked Ensembles]
  E --> J[Neural Architecture Search]
  E --> K[Keras Models]
  F --> L[Best Pipeline]
  H --> L
  J --> L
  L --> M[Deployment Model]
  style B fill:#4a90d9,color:#fff
  style L fill:#2ecc71,color:#fff

TPOT: Genetic Programming

TPOT uses genetic programming to evolve machine learning pipelines. It starts with a population of random pipelines (preprocessors plus a model), keeps the best performers, and creates each new generation through crossover and mutation. TPOT is best suited to structured data and to workflows that need scikit-learn compatibility.

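# Assumes the classic TPOT 0.x API (for example `verbosity`);
# newer TPOT releases may use different argument names.
from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the 8x8 digits dataset and hold out 20% for testing
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target,
    test_size=0.2, random_state=42
)

# Evolve 5 generations of 20 pipelines, scored by 3-fold CV accuracy
tpot = TPOTClassifier(
    generations=5,
    population_size=20,
    cv=3,
    scoring='accuracy',
    verbosity=2,
    random_state=42,
    n_jobs=-1
)

tpot.fit(X_train, y_train)

# Evaluate the best pipeline found on the held-out test set
accuracy = accuracy_score(y_test, tpot.predict(X_test))
print(f"Test accuracy: {accuracy:.4f}")
print(f"Best pipeline:\n{tpot.fitted_pipeline_}")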

Example output (exact scores and the selected pipeline depend on the TPOT and scikit-learn versions):

Generation 1 - Best score: 0.9583
Generation 2 - Best score: 0.9645
Generation 3 - Best score: 0.9708
Generation 4 - Best score: 0.9729
Generation 5 - Best score: 0.9750

Test accuracy: 0.9722
Best pipeline:
Pipeline(steps=[('pca', PCA(n_components=0.95)),
                ('kneighborsclassifier', KNeighborsClassifier(n_neighbors=5))])

TPOT exports the best pipeline as Python code:

tpot.export('best_pipeline.py')

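The exported file is a plain scikit-learn script for the best pipeline. Replace its data-loading placeholder with your own data before running it. Depending on the pipeline, the script may still import small helpers from TPOT, so check its imports before deploying to an environment without TPOT installed.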
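To make the evolutionary loop described above more concrete, here is a minimal sketch of selection, crossover, and mutation over a toy search space. It is not TPOT's implementation: TPOT evolves tree-structured scikit-learn pipelines, while this sketch evolves fixed-length triples, and the search space, the `fitness` scores, and all function names are hypothetical stand-ins chosen for illustration.

```python
import random

# Hypothetical search space: a "pipeline" is a (preprocessor, model, depth) triple.
PREPROCESSORS = ["none", "scale", "pca"]
MODELS = ["knn", "tree", "logreg"]
DEPTHS = [1, 2, 3, 4, 5]


def random_pipeline(rng):
    return (rng.choice(PREPROCESSORS), rng.choice(MODELS), rng.choice(DEPTHS))


def fitness(pipeline):
    """Made-up stand-in for a cross-validation score (no model is trained)."""
    pre, model, depth = pipeline
    score = {"none": 0.0, "scale": 0.1, "pca": 0.2}[pre]
    score += {"knn": 0.3, "tree": 0.1, "logreg": 0.2}[model]
    score += 0.1 * (3 - abs(depth - 3))  # toy optimum at depth 3
    return score


def crossover(a, b, rng):
    # Each component of the child comes from one of the two parents.
    return tuple(rng.choice(pair) for pair in zip(a, b))


def mutate(pipeline, rng, rate=0.3):
    # With probability `rate`, replace a component with a random alternative.
    choices = (PREPROCESSORS, MODELS, DEPTHS)
    return tuple(
        rng.choice(options) if rng.random() < rate else gene
        for gene, options in zip(pipeline, choices)
    )


def evolve(generations=5, population_size=20, seed=42):
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(population_size)]
    for _ in range(generations):
        # Selection: keep the better half unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]
        # Variation: refill the population with mutated offspring.
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


best = evolve()
print("Best pipeline:", best, "score:", round(fitness(best), 2))
```

In a real search the `fitness` call is the expensive part, since every candidate has to be cross-validated. That is why the `generations` and `population_size` settings dominate TPOT's running time.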

H2O AutoML — Distributed AutoML

H2O AutoML runs in a distributed Java-based engine, making it suitable for large datasets. It trains multiple algorithms — GLM, GBM, Random Forest, XGBoost, Deep Learning — and creates a stacked ensemble that combines their predictions.

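import h2o
from h2o.automl import H2OAutoML

# Start (or connect to) a local H2O cluster with a 4 GB JVM heap
h2o.init(max_mem_size='4G')

df = h2o.import_file('https://h2o-public-test-data.s3.amazonaws.com/'
                     'smalldata/higgs/higgs_train_5k.csv')
y = 'response'
# Use every column except the target as a predictor
x = [col for col in df.columns if col != y]
df[y] = df[y].asfactor()  # categorical target, so H2O treats this as classification

train, test = df.split_frame(ratios=[0.8], seed=42)

# Stop after 20 models or 120 seconds, whichever comes first
aml = H2OAutoML(
    max_models=20,
    seed=42,
    max_runtime_secs=120,
    sort_metric='AUC',
    nfolds=3
)

aml.train(x=x, y=y, training_frame=train)

# Leaderboard metrics come from 3-fold cross-validation on the training frame
lb = aml.leaderboard
print(lb.head(10))

# Score the leader on the held-out test frame rather than on its training data
predictions = aml.leader.predict(test)
test_perf = aml.leader.model_performance(test)
print(f"\nLeader model: {aml.leader.model_id}")
print(f"Leader AUC: {test_perf.auc():.4f}")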

Example output (illustrative; model IDs and scores vary by run):

model_id                                              auc    logloss
StackedEnsemble_AllModels_AutoML_20260624  0.8123  0.5214
GBM_grid_1_AutoML_20260624_model_1         0.8078  0.5289
XGBoost_1_AutoML_20260624                  0.8056  0.5310
DRF_1_AutoML_20260624                      0.7912  0.5487
DeepLearning_1_AutoML_20260624             0.7834  0.5612
GLM_1_AutoML_20260624                      0.7645  0.5834

Leader model: StackedEnsemble_AllModels_AutoML_20260624
Leader AUC: 0.8123

H2O AutoML's stacked ensemble typically outperforms any single model by combining their strengths. The leaderboard shows every model trained, ranked by the chosen metric.
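The idea of combining base-model predictions can be sketched without H2O. The example below is a simplified two-model weighted blend, not H2O's stacking procedure: it picks the blend weight that minimizes log loss on held-out predictions. The labels, the prediction values, and the names `gbm_pred` and `glm_pred` are all hypothetical.

```python
import math

# Hypothetical held-out labels and predicted probabilities from two base models.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
gbm_pred = [0.9, 0.4, 0.6, 0.7, 0.2, 0.6, 0.8, 0.1]
glm_pred = [0.6, 0.2, 0.8, 0.4, 0.3, 0.3, 0.7, 0.4]


def log_loss(y, p, eps=1e-15):
    """Mean binary cross-entropy; lower is better."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)  # avoid log(0)
        total += -(yi * math.log(pi) + (1 - yi) * math.log(1 - pi))
    return total / len(y)


def blend(p_a, p_b, w):
    # Weighted average of the two models' probabilities.
    return [w * a + (1 - w) * b for a, b in zip(p_a, p_b)]


def fit_blend_weight(y, p_a, p_b, steps=20):
    """Pick the weight in [0, 1] that minimizes log loss on held-out predictions."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: log_loss(y, blend(p_a, p_b, w)))


w = fit_blend_weight(y_true, gbm_pred, glm_pred)
ensemble_pred = blend(gbm_pred, glm_pred, w)
print(f"Weight on first model: {w:.2f}")
print(f"Base losses: {log_loss(y_true, gbm_pred):.4f}, {log_loss(y_true, glm_pred):.4f}")
print(f"Blend loss:  {log_loss(y_true, ensemble_pred):.4f}")
```

Because weights 0 and 1 are both candidates, the blend can never score worse than the better base model on the data used to fit the weight. That guarantee does not carry over to new data, which is why the combiner should be fitted on predictions the base models did not train on.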

AutoKeras: Neural Architecture Search

AutoKeras automates neural network design by searching over architectures (number of layers, layer types, filter sizes, dropout rates, and more), using Bayesian optimization to guide the search. It supports images, text, and structured data.

# Assumes an AutoKeras release that ships StructuredDataClassifier (the 1.x series);
# it may be absent from newer releases.
import autokeras as ak
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target,
    test_size=0.2, random_state=42
)

# Try up to 10 candidate architectures; overwrite=True discards earlier search results
clf = ak.StructuredDataClassifier(
    max_trials=10,
    overwrite=True,
    seed=42
)

clf.fit(
    X_train, y_train,
    epochs=10,
    validation_split=0.2,
    verbose=2
)

# evaluate() returns [loss, accuracy] for the best model found
test_loss, test_acc = clf.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {test_acc:.4f}")
print(f"Test loss: {test_loss:.4f}")
print("Best model architecture:")
clf.export_model().summary()  # summary() prints directly and returns None

Example output (illustrative; the architecture found and its scores vary by run):

Test accuracy: 0.9750
Test loss: 0.1245
Best model architecture:
Model: "functional_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, 64)]              0
normalization (Normalization (None, 64)                129
dense (Dense)                (None, 256)               16640
re_lu (ReLU)                 (None, 256)               0
dropout (Dropout)            (None, 256)               0
dense_1 (Dense)              (None, 128)               32896
re_lu_1 (ReLU)               (None, 128)               0
dropout_1 (Dropout)          (None, 128)               0
classification_head_1        (None, 10)                1290
=================================================================
Total params: 50,955
Trainable params: 50,826
Non-trainable params: 129
_________________________________________________________________
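The parameter counts in the summary above can be checked by hand. A dense layer has one weight per input-unit pair plus one bias per unit. The sketch below assumes the normalization layer stores a mean and a variance per feature plus a single count value, and that the classification head behaves like a dense layer. Both assumptions are consistent with the numbers shown, and the helper names are hypothetical.

```python
def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair plus one bias per unit.
    return n_inputs * n_units + n_units


def normalization_params(n_features):
    # Assumption: a mean and a variance per feature, plus one count scalar.
    return 2 * n_features + 1


layers = {
    "normalization": normalization_params(64),
    "dense": dense_params(64, 256),
    "dense_1": dense_params(256, 128),
    "classification_head_1": dense_params(128, 10),
}

total = sum(layers.values())
non_trainable = layers["normalization"]  # statistics, not learned by backpropagation
trainable = total - non_trainable

for name, count in layers.items():
    print(f"{name:<24}{count:>8}")
print(f"total={total}, trainable={trainable}, non-trainable={non_trainable}")
```

The ReLU and dropout layers contribute no parameters, so they are left out of the tally.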

AutoML Framework Comparison

Feature | TPOT | H2O AutoML | AutoKeras
Data type | Structured | Structured | Image, text, structured
Search method | Genetic programming | Random grid search + stacked ensembles | Bayesian NAS
Output model | scikit-learn pipeline | H2O model/ensemble | Keras model
Scalability | Single machine | Distributed (multi-node) | Single GPU/multi-GPU
Interpretability | High (standard models) | Medium (ensemble) | Low (deep learning)
Speed | Slow (many generations) | Fast (parallel) | Medium (trial-based)
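One way to read the comparison is as a small decision rule. The function below is a rule-of-thumb sketch that encodes the table's rows, not an official guideline from any of the three projects. The function name, its parameters, and the default choice for structured data are assumptions made for illustration.

```python
def suggest_framework(data_type, needs_multi_node=False, needs_interpretability=False):
    """Rule-of-thumb mapping from the comparison table to a framework name."""
    data_type = data_type.lower()
    if data_type in {"image", "text"}:
        # Only AutoKeras lists image and text support in the table.
        return "AutoKeras"
    if data_type != "structured":
        raise ValueError(f"unknown data type: {data_type!r}")
    if needs_multi_node:
        # H2O AutoML is the only distributed option listed.
        return "H2O AutoML"
    if needs_interpretability:
        # TPOT outputs standard scikit-learn pipelines.
        return "TPOT"
    # Assumed default: the table lists H2O AutoML as the fastest option.
    return "H2O AutoML"


print(suggest_framework("image"))
print(suggest_framework("structured", needs_interpretability=True))
print(suggest_framework("structured", needs_multi_node=True))
```

In practice these criteria interact: a large structured dataset that also needs an interpretable model might call for H2O AutoML with a single-model leader rather than the ensemble.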

Common Errors and Mistakes

Mistake | Why It Happens | How to Fix
Too few generations | TPOT doesn't converge | Use 10+ generations for good results
Ignoring leaderboard variance | Model ranking is unstable | Use cross-validation within AutoML
AutoKeras overfitting | Architecture too complex for data | Set max_trials low for small datasets
H2O memory errors | Large datasets in JVM | Increase max_mem_size or use data sampling
Deploying without testing | AutoML finds patterns, not causes | Always validate on holdout test set
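The leaderboard-variance row is easier to appreciate with a small experiment. The sketch below splits a dataset into k folds by hand and reports the spread of per-fold scores for a deliberately trivial majority-class "model". The labels, helper names, and the toy model are hypothetical. Real AutoML tools do this internally through settings such as `cv` or `nfolds`.

```python
import random
import statistics


def kfold_indices(n_samples, k, seed=42):
    """Shuffle indices once, then yield (train_idx, val_idx) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, val_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, val_idx


def majority_accuracy(labels, train_idx, val_idx):
    """Toy 'model': predict the most common training label, score it on the fold."""
    train_labels = [labels[i] for i in train_idx]
    majority = max(set(train_labels), key=train_labels.count)
    return sum(labels[i] == majority for i in val_idx) / len(val_idx)


# Hypothetical labels: 60% class 0, 40% class 1
labels = [0] * 30 + [1] * 20
scores = [
    majority_accuracy(labels, train_idx, val_idx)
    for train_idx, val_idx in kfold_indices(len(labels), k=5)
]
print(f"Fold scores: {scores}")
print(f"mean={statistics.mean(scores):.3f}, stdev={statistics.stdev(scores):.3f}")
```

If two models on a leaderboard differ by less than the fold-to-fold standard deviation, their ranking should be treated as a tie rather than a result.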

Practice Questions

  1. What search algorithm does TPOT use?

Answer: TPOT uses genetic programming — it evolves pipelines through selection, crossover, and mutation over multiple generations.

  2. How does H2O AutoML create its final model?

Answer: H2O AutoML trains multiple individual models (GBM, XGBoost, RF, GLM, Deep Learning) and combines them into a stacked ensemble that typically outperforms any single model.

  3. What is neural architecture search in AutoKeras?

Answer: AutoKeras uses Bayesian optimization to search over neural network architectures — layer types, sizes, connections — finding the best design for the given data.

  4. When would you choose TPOT over H2O AutoML?

Answer: When working with smaller datasets, needing interpretable scikit-learn pipelines, or deploying in environments without Java/H2O dependencies.

  5. What is the main risk of using AutoML without validation?

Answer: AutoML finds patterns in training data, including noise and spurious correlations. Without proper holdout validation, deployed models may fail on new data.

Challenge

Use all three AutoML frameworks on the same dataset (Titanic or Kaggle's House Prices). Compare TPOT (50 generations), H2O AutoML (30 models), and AutoKeras (20 trials) on held-out test performance (accuracy for Titanic, which is a classification task, or RMSE for House Prices, which is a regression task) and on training time. Document which framework produces the best model and which produces the most interpretable pipeline.

Real-World Task

Design an AutoML pipeline for a medical diagnostic system that must process structured lab results, medical imaging, and free-text doctor notes. Choose the right AutoML framework for each data modality and create an ensemble that combines predictions from all three into a final diagnosis score.

Next Steps

Now that you understand AutoML, learn about Hyperparameter Tuning with Optuna for manual fine-tuning, and MLflow for managing the models AutoML produces.

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
