Hugging Face Transformers: BERT, GPT & Model Hub Guide

DodaTech Updated 2026-06-22 7 min read

Hugging Face Transformers is an open-source library that provides thousands of pretrained models through a unified API, so you can run state-of-the-art NLP with minimal code. Version 4 of the library supports three major frameworks (PyTorch, TensorFlow, and JAX), while version 5 concentrates on PyTorch. The Model Hub hosts community-contributed models for almost every NLP task, from text classification to translation to question answering.

What You'll Learn

In this tutorial, you'll learn how to use the Hugging Face ecosystem: loading pretrained models from the Hub, building inference pipelines, working with tokenizers, fine-tuning BERT for classification, and generating text with GPT-2.

Why It Matters

The Hugging Face ecosystem has become the standard way to use transformer models. With a few lines of code, you can access models that would take weeks to train from scratch. The Model Hub hosts over 500,000 models for NLP, computer vision, and audio. Understanding Hugging Face is essential for modern applied ML in Python.

Real-World Use

A security company uses Hugging Face's zero-shot classification pipeline to categorize phishing emails. Without any fine-tuning, the pipeline sorts emails into "phishing", "legitimate", or "suspicious" with 90% accuracy by using a pretrained BART model from the Hub.

Using Pipelines

Pipelines are the easiest way to use pretrained models. They handle preprocessing and postprocessing for you. Under the hood, a pipeline loads the model and its tokenizer, applies the appropriate text preprocessing, runs inference, and decodes the model outputs into human-readable results. Because of this abstraction, you can swap in another model trained for the same task by changing only the model name string.

from transformers import pipeline

# Load a sentiment classifier together with its matching tokenizer
classifier = pipeline(
    'sentiment-analysis',
    model='distilbert-base-uncased-finetuned-sst-2-english'
)

# Keep the inputs in a list so they can be paired with the results below
texts = [
    "I love this product! It's amazing.",
    "This is the worst experience I've ever had."
]
results = classifier(texts)

for text, result in zip(texts, results):
    print(f"Text: {text}")
    print(f"  Label: {result['label']}, Score: {result['score']:.4f}")

Expected output:

Text: I love this product! It's amazing.
  Label: POSITIVE, Score: 0.9998
Text: This is the worst experience I've ever had.
  Label: NEGATIVE, Score: 0.9996
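
The postprocessing step mentioned above (turning raw model outputs into a label and a score) can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: the `postprocess` helper, the example logits, and the `id2label` mapping are made up to mirror what a two-class sentiment model produces.

```python
import math

def softmax(logits):
    """Convert raw model scores (logits) into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def postprocess(logits, id2label):
    """Hypothetical helper: pick the most likely class and report its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}

# Made-up logits for one input; index 0 = NEGATIVE, index 1 = POSITIVE.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
result = postprocess([-4.2, 4.3], id2label)
print(f"Label: {result['label']}, Score: {result['score']:.4f}")
```

A large gap between the two logits gives a score close to 1, which is why confident predictions show scores such as 0.99 and above.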

Multiple Pipeline Types

from transformers import pipeline

# The task name selects the pre- and postprocessing; the model name selects the weights
zero_shot = pipeline('zero-shot-classification', model='facebook/bart-large-mnli')
generator = pipeline('text-generation', model='gpt2')
summarizer = pipeline('summarization', model='facebook/bart-large-cnn')

text = "Hugging Face is a company that develops tools for natural language processing."
candidates = ['technology', 'finance', 'healthcare']
result = zero_shot(text, candidate_labels=candidates)

# Labels come back sorted from most to least likely
print(f"Zero-shot classification: {result['labels'][0]}")

Expected output:

Zero-shot classification: technology
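
For a single input, the zero-shot pipeline returns a dictionary with `labels` and `scores` sorted from most to least likely. A small sketch of how one might add a confidence cutoff on top of that result is shown here. The `top_label` helper, the 0.5 threshold, and the `uncertain` fallback are illustrative assumptions, and the dictionaries are hand-written stand-ins with made-up scores rather than real model output.

```python
def top_label(result, threshold=0.5, fallback="uncertain"):
    """Return the best label, or `fallback` when the top score is below the threshold."""
    label, score = result["labels"][0], result["scores"][0]
    return label if score >= threshold else fallback

# Hand-written results in the same shape as the pipeline's output (scores are invented).
confident = {
    "sequence": "Hugging Face develops tools for natural language processing.",
    "labels": ["technology", "finance", "healthcare"],
    "scores": [0.95, 0.03, 0.02],
}
unsure = {
    "sequence": "The quarterly numbers look interesting.",
    "labels": ["finance", "technology", "healthcare"],
    "scores": [0.40, 0.35, 0.25],
}

print(top_label(confident))
print(top_label(unsure))
```

A cutoff like this is one way to route low-confidence items to a human reviewer instead of trusting the top label blindly.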

Tokenization with Hugging Face

Different pretrained models use different tokenization strategies. BERT uses WordPiece, GPT-2 uses byte-level Byte-Pair Encoding (BPE), and the original Llama models use a SentencePiece tokenizer. The AutoTokenizer class selects the correct tokenizer for your model. Tokenization produces input IDs (token indices into the vocabulary), an attention mask (1 for real tokens, 0 for padding), and, for some models, token type IDs (which separate the two segments in sentence-pair tasks). Understanding these outputs is essential for debugging model inputs.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

text = "Hugging Face Transformers is amazing!"
tokens = tokenizer(
    text,
    padding='max_length',   # pad up to max_length; padding=True would not pad a single text
    truncation=True,        # cut off anything beyond max_length
    max_length=10,
    return_tensors='pt'     # return PyTorch tensors instead of Python lists
)

print(f"Input IDs: {tokens['input_ids'][0]}")
print(f"Attention mask: {tokens['attention_mask'][0]}")
print(f"Decoded: {tokenizer.decode(tokens['input_ids'][0])}")

Expected output:

Input IDs: tensor([  101, 17662,  2227, 19081,  2003,  6429,   999,   102,     0,     0])
Attention mask: tensor([1, 1, 1, 1, 1, 1, 1, 1, 0, 0])
Decoded: [CLS] hugging face transformers is amazing! [SEP] [PAD] [PAD]
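
To make the relationship between padding, truncation, and the attention mask concrete, here is a minimal pure-Python sketch. It is a simplification rather than the tokenizer's actual code: the `pad_and_truncate` helper and the token IDs are invented for illustration, and real tokenizers also preserve special tokens such as [SEP] when truncating, which this sketch ignores.

```python
def pad_and_truncate(batch, max_length, pad_id=0):
    """Truncate each ID list to max_length, then right-pad shorter ones with pad_id."""
    input_ids, attention_mask = [], []
    for ids in batch:
        ids = ids[:max_length]                       # truncation
        n_pad = max_length - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)     # padding
        attention_mask.append([1] * len(ids) + [0] * n_pad)  # 1 = real token, 0 = padding
    return input_ids, attention_mask

# Two toy sequences of different lengths (the IDs are arbitrary).
batch = [[5, 8, 9], [5, 8, 9, 12, 7]]
ids, mask = pad_and_truncate(batch, max_length=4)

print(ids)   # [[5, 8, 9, 0], [5, 8, 9, 12]]
print(mask)  # [[1, 1, 1, 0], [1, 1, 1, 1]]
```

Every row ends up the same length, which is what allows a batch to be stacked into one tensor, and the mask records which positions the model should ignore.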

Fine-Tuning BERT

Fine-tuning adapts a pretrained BERT model to your specific task. The Trainer API handles batching, optimization, evaluation, and checkpointing. You define TrainingArguments that control the number of epochs, batch size, learning rate, and evaluation strategy. Because the pretrained weights provide a strong starting point, training typically needs only 2-5 epochs. On simple tasks, fine-tuning can work with as few as 100 labeled examples, although harder tasks need more data. The example below sets up a model, tokenized data, and training arguments for binary text classification.

from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = 'bert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(model_name)
# num_labels=2 adds a new, randomly initialized two-class head on top of BERT
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2
)

train_texts = [
    "This product is fantastic",
    "Terrible service, would not recommend",
    "I love this, highly recommend",
    "Waste of money, very disappointed"
]
train_labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# Pad to the longest text in the batch and return PyTorch tensors
train_encodings = tokenizer(
    train_texts, truncation=True, padding=True, return_tensors='pt'
)

training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=2,
    save_steps=500,
    logging_steps=100,
    report_to='none'   # disable external experiment trackers
)

# Inspect the configuration before training
print(f"Model type: {model.config.model_type}")
print(f"Max length: {model.config.max_position_embeddings}")
print(f"Number of labels: {model.config.num_labels}")

Expected output:

Model type: bert
Max length: 512
Number of labels: 2
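
The setup above stops before training starts. Trainer needs a dataset object that yields one example per index, so a common next step is a small wrapper like the sketch below. The `ReviewDataset` name and the toy encodings are assumptions for illustration; the sketch relies on Trainer's data loading accepting any object with `__len__` and `__getitem__`, and on the default collator turning the returned values into tensors.

```python
class ReviewDataset:
    """Map-style dataset: sized and indexable, returning one training example per index."""

    def __init__(self, encodings, labels):
        # encodings is a dict-like object, e.g. {'input_ids': [...], 'attention_mask': [...]}
        self.encodings = encodings
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # Take row idx from every field and attach the matching label
        item = {key: values[idx] for key, values in self.encodings.items()}
        item["labels"] = self.labels[idx]
        return item

# Toy encodings standing in for real tokenizer output (the IDs are arbitrary).
toy_encodings = {
    "input_ids": [[101, 11, 12, 102], [101, 13, 14, 102]],
    "attention_mask": [[1, 1, 1, 1], [1, 1, 1, 1]],
}
toy_dataset = ReviewDataset(toy_encodings, [1, 0])

print(len(toy_dataset))
print(toy_dataset[0])
```

With the objects from the tutorial code, one would then create `Trainer(model=model, args=training_args, train_dataset=ReviewDataset(train_encodings, train_labels))` and call `trainer.train()`. Four examples are far too few to learn anything useful, so treat this as a wiring check only.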

Text Generation with GPT-2

Text generation models predict the next token given the previous tokens. GPT-2 is a decoder-only transformer trained on a large corpus of web text. The temperature parameter controls randomness when sampling: lower values produce more deterministic, repetitive outputs, while higher values produce more varied outputs. The max_length parameter caps the total sequence length, prompt included; use max_new_tokens to cap only the newly generated tokens. Generation continues until the model predicts the end-of-sequence token or reaches the maximum length. GPT-2 produces reasonably coherent short text, which makes it a convenient small model for experimenting with content creation, code generation, and interactive applications.

from transformers import GPT2LMHeadModel, GPT2Tokenizer

gpt_tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
gpt_model = GPT2LMHeadModel.from_pretrained('gpt2')

prompt = "The future of artificial intelligence is"
inputs = gpt_tokenizer(prompt, return_tensors='pt')

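outputs = gpt_model.generate(
    **inputs,                                 # passes input_ids and attention_mask
    max_length=50,                            # total length, prompt included
    temperature=0.7,                          # only has an effect when do_sample=True
    do_sample=True,
    pad_token_id=gpt_tokenizer.eos_token_id   # GPT-2 has no padding token of its own
)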

generated = gpt_tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Prompt: {prompt}")
print(f"Generated: {generated}")

Example output (sampling is random, so your generated text will differ):

Prompt: The future of artificial intelligence is
Generated: The future of artificial intelligence is bright, with advancements in machine learning, natural language processing, and computer vision driving innovation across industries.
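
The effect of temperature can be seen without loading a model. The sketch below divides a set of logits by the temperature before applying softmax, then samples one token index. It is a simplified illustration: the three-token vocabulary and its logits are made up, and `softmax_with_temperature` and `sample_token` are hypothetical helper names, not library functions.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then normalize them into probabilities."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng):
    """Draw one token index according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.5]  # made-up scores for a three-token vocabulary

for temperature in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])

rng = random.Random(0)  # seeded so repeated runs draw the same sample
print(sample_token(logits, 0.7, rng))
```

At low temperature most of the probability mass sits on the highest-scoring token, while at high temperature the distribution flattens and less likely tokens are sampled more often.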

The Hugging Face Ecosystem

flowchart TD
  A[Hugging Face Hub] --> B[Model Hub]
  A --> C[Dataset Hub]
  A --> D[Spaces]
  B --> E[AutoModel]
  B --> F[Pipeline]
  B --> G[Trainer API]
  E --> H[NLP Models]
  E --> I[Vision Models]
  E --> J[Audio Models]

Common Errors and Mistakes

Mistake	Why It Happens	How to Fix
Wrong tokenizer for model	Mismatched vocabulary	Always load AutoTokenizer with the same model name as the model
Batch size too large	GPU out of memory	Reduce batch size or use gradient accumulation
Not setting pad_token_id	GPT-2 has no padding token, so generate() logs a warning and batched inputs cannot be padded	Set pad_token_id=tokenizer.eos_token_id
Forgetting to set return_tensors	Output is Python lists instead of tensors	Add return_tensors='pt' for PyTorch
Not truncating long texts	BERT has a 512-token limit	Set truncation=True with max_length

Practice Questions

What is the Hugging Face Model Hub?

Answer: The Model Hub is a central repository hosting over 500,000 pretrained models for NLP, vision, audio, and multimodal tasks. Models can be loaded with AutoModel.from_pretrained().

How does a Hugging Face pipeline simplify model usage?

Answer: Pipelines wrap tokenization, model inference, and output decoding into a single callable object. They automatically handle preprocessing and postprocessing for specific tasks like sentiment analysis.

What is the purpose of the attention mask in transformer inputs?

Answer: The attention mask tells the model which tokens are real content (1) and which are padding (0). This prevents the model from attending to padding tokens during self-attention computation.

How do you fine-tune a pretrained model using the Trainer API?

Answer: Define TrainingArguments (output directory, epochs, batch size), create a Trainer with the model, training dataset, and arguments, then call trainer.train(). The Trainer handles batching, optimization, and evaluation.

What is the difference between BERT and GPT architectures?

Answer: BERT uses an encoder-only architecture with bidirectional self-attention, ideal for understanding tasks (classification, QA). GPT uses a decoder-only architecture with causal (unidirectional) attention, ideal for generation tasks.
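
The difference between bidirectional and causal attention comes down to which positions each token is allowed to look at. A minimal sketch of the two patterns as 0/1 matrices follows. The helper names are illustrative, and real models apply these patterns inside the attention computation rather than as standalone lists.

```python
def bidirectional_mask(n):
    """BERT-style: every position may attend to every other position."""
    return [[1] * n for _ in range(n)]

def causal_mask(n):
    """GPT-style: position i may attend only to positions 0..i (itself and earlier tokens)."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

print("Bidirectional:")
for row in bidirectional_mask(4):
    print(row)

print("Causal:")
for row in causal_mask(4):
    print(row)
```

The causal matrix is lower triangular, which is what prevents a generation model from seeing the tokens it is supposed to predict.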

Challenge

Fine-tune a DistilBERT model on the IMDb movie review dataset for sentiment analysis. Use the Trainer API with early stopping and learning rate scheduling. Compare the fine-tuned model's performance against the zero-shot pipeline from Hugging Face. Report accuracy, precision, recall, and F1-score on the test set.
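
The four metrics requested in the challenge can be computed directly from predictions and true labels. The following is a minimal pure-Python sketch for the binary case. The `binary_metrics` helper and the sample labels are made up for illustration, and in practice a library such as scikit-learn would usually be used instead.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # of predicted positives, how many were right
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # of actual positives, how many were found
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up labels: 1 = positive review, 0 = negative review.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Running the same function on the fine-tuned model's predictions and on the zero-shot pipeline's predictions gives a like-for-like comparison.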

Real-World Task

Build a system that uses Hugging Face models to analyze customer support emails. Use a zero-shot classifier to categorize emails into topics (billing, technical, account), a summarization model to create brief summaries, and a sentiment analysis model to flag angry customers. The system should process emails in batch and output a structured report for the support team.
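
One possible starting skeleton for this task is sketched below. It takes the three models as plain callables so the report logic can be tested without downloading anything. The `build_report` function, the stub models, and the rule that a NEGATIVE sentiment label means an angry customer are all assumptions for illustration. The stubs return hand-written results in the shapes the corresponding pipelines use for a single input.

```python
def build_report(emails, classify, summarize, sentiment,
                 topics=("billing", "technical", "account")):
    """Run three models over each email and collect one structured row per email."""
    report = []
    for email in emails:
        topic_result = classify(email, candidate_labels=list(topics))
        summary_result = summarize(email)
        sentiment_result = sentiment(email)
        report.append({
            "topic": topic_result["labels"][0],                    # labels are sorted best-first
            "summary": summary_result[0]["summary_text"],
            "angry": sentiment_result[0]["label"] == "NEGATIVE",   # simplifying assumption
        })
    return report

# Stub models standing in for real pipelines (keyword rules, not real predictions).
def fake_classify(text, candidate_labels):
    label = "billing" if "invoice" in text else "technical"
    return {"sequence": text, "labels": [label], "scores": [1.0]}

def fake_summarize(text):
    return [{"summary_text": text[:30]}]

def fake_sentiment(text):
    label = "NEGATIVE" if "furious" in text else "POSITIVE"
    return [{"label": label, "score": 1.0}]

emails = [
    "My invoice is wrong and I am furious.",
    "The app crashes on startup, thanks for looking into it.",
]
report = build_report(emails, fake_classify, fake_summarize, fake_sentiment)
for row in report:
    print(row)
```

Swapping the stubs for real pipeline objects should require no change to `build_report`, since pipelines are callable in the same way.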

Next Steps

Now master LLM Prompt Engineering to effectively use generative models, and learn RAG Systems for retrieving information from your own documents.

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
