Using Hugging Face Transformers — Pretrained Models in Python

DodaTech 7 min read

In this tutorial, you'll learn how to use Hugging Face Transformers to run pretrained models in Python. We cover the key concepts, practical examples, common errors, and best practices so you can apply the library to your own text data.

What You'll Learn

Use Hugging Face's Transformers library to load powerful pretrained models and apply them to text tasks with minimal code.

Why It Matters

You don't need to train models from scratch. Hugging Face gives you access to 500,000+ pretrained models that work out of the box.

Real-World Use

Sentiment analysis for customer feedback, text summarization for reports, named entity recognition for document processing.

Installation

pip install transformers torch

The transformers library provides the model architectures, tokenizers, and the high-level pipeline API. torch (PyTorch) is the deep learning backend that runs the model computations. Older Transformers releases also support TensorFlow as a backend, but recent releases focus on PyTorch and are phasing out TensorFlow support, so PyTorch is the safer choice. The first time you load a model, Transformers downloads its weights (typically 100MB to 1GB, depending on the model), which can take a minute or two. Later runs load the weights from the local cache.

Sentiment Analysis

from transformers import pipeline

classifier = pipeline("sentiment-analysis")
result = classifier("I love this product! It's amazing.")
print(result)

Expected output:

[{'label': 'POSITIVE', 'score': 0.9998}]

The pipeline() function is the simplest entry point. Pass it a task name and it automatically downloads and loads a default model for that task. Here, "sentiment-analysis" loads a DistilBERT model fine-tuned on the SST-2 dataset. The result is a list of dicts — for classification tasks, each dict contains a label (POSITIVE or NEGATIVE) and a score (confidence between 0 and 1). A score of 0.9998 means the model is 99.98% confident.

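You can pass multiple texts at once: classifier(["Great!", "Terrible.", "Okay."]) returns a list with one result dict per input. Pipelines do not batch inputs by default. To process them in batches, pass batch_size, for example classifier(texts, batch_size=8). Batching mainly speeds things up on a GPU and is not guaranteed to help on a CPU.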

What the model does internally: The text is tokenized (split into word pieces), converted to input IDs, fed through 6 transformer layers, passed through a classification head, and softmax produces the final scores. All of this happens in the background — you just see the label and score.
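To make the tokenization step concrete, here is a minimal sketch of greedy longest-match-first splitting, the idea behind WordPiece. The vocabulary and the wordpiece and encode helpers are hypothetical toys. Real tokenizers use vocabularies of tens of thousands of entries, apply more normalization, and add special tokens such as [CLS] and [SEP].

```python
# Toy vocabulary (hypothetical); real models ship far larger vocabularies.
VOCAB = {"[UNK]": 0, "i": 1, "love": 2, "this": 3, "product": 4,
         "amaz": 5, "##ing": 6, "!": 7}


def wordpiece(word, vocab):
    """Greedy longest-match-first split of one word into sub-word pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry ##
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return ["[UNK]"]  # no valid split: the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


def encode(text, vocab):
    """Lowercase, split on whitespace and '!', then map pieces to IDs."""
    words = text.lower().replace("!", " ! ").split()
    tokens = [p for w in words for p in wordpiece(w, vocab)]
    return tokens, [vocab[t] for t in tokens]


tokens, ids = encode("I love this product! Amazing", VOCAB)
print(tokens)  # ['i', 'love', 'this', 'product', '!', 'amaz', '##ing']
print(ids)     # [1, 2, 3, 4, 7, 5, 6]
```

The unknown word "amazing" is not rejected; it is rebuilt from the pieces "amaz" and "##ing", which is how a fixed vocabulary covers words it never stored whole.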

Text Generation

generator = pipeline("text-generation", model="gpt2")
result = generator(
    "Python is a versatile programming language because",
    max_length=50,
    num_return_sequences=1
)
print(result[0]["generated_text"])

Text generation models predict the next token (word or subword) given a prompt, then feed the result back as input to generate the next token, and so on. The max_length=50 parameter limits the total length of the prompt plus generated text to 50 tokens; to limit only the newly generated part, use max_new_tokens instead. num_return_sequences=1 controls how many alternative continuations to generate.
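The feedback loop described above can be sketched in a few lines. The NEXT_TOKEN table is a hypothetical stand-in for a real model (which scores every vocabulary entry), and the loop always takes the single predicted token, i.e. greedy decoding. The point is how the output feeds back in and how a max_length-style limit counts prompt and generated tokens together.

```python
# Hypothetical "model": maps the last token to its most likely successor.
NEXT_TOKEN = {"Python": "is", "is": "easy", "easy": "to",
              "to": "learn", "learn": "."}


def generate(prompt_tokens, max_length):
    """Greedy autoregressive loop: append the predicted token and repeat.

    Like max_length in the pipeline, the limit counts prompt + generated tokens.
    """
    tokens = list(prompt_tokens)
    while len(tokens) < max_length:
        next_tok = NEXT_TOKEN.get(tokens[-1])
        if next_tok is None:  # stand-in for an end-of-sequence token
            break
        tokens.append(next_tok)  # the output becomes part of the next input
    return tokens


print(generate(["Python", "is"], max_length=5))
# ['Python', 'is', 'easy', 'to', 'learn']
```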

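We pass model="gpt2" explicitly so the example always loads the same checkpoint. Without it, the pipeline picks a default model for the task and prints a warning, and defaults can change between releases (at the time of writing, the default for text generation is also GPT-2). GPT-2 is a 124M-parameter model from OpenAI that generates fluent English text. For faster inference, try "distilgpt2", a smaller distilled version; for more coherent output, try a larger model such as "EleutherAI/gpt-neo-1.3B".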

Key parameters: temperature controls randomness (lower means more deterministic, higher means more creative; default 1.0). top_k limits sampling to the K most likely next tokens. top_p (nucleus sampling) keeps the smallest set of most likely tokens whose cumulative probability reaches p. These settings only take effect when sampling is enabled (do_sample=True); with greedy decoding the model always picks the single most likely token. Experiment with temperature=0.7 for creative writing or temperature=0.2 for factual completions.
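The sketch below shows, in plain Python, what temperature, top_k, and top_p do to a next-token distribution. It is an illustrative reimplementation under simplifying assumptions (one list of raw logits, filters applied in the order temperature, top-k, top-p), not the code Transformers runs. In the library you pass these as keyword arguments instead, for example generator(prompt, do_sample=True, temperature=0.7, top_p=0.9).

```python
import math
import random


def sample_next(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Pick a token index from raw logits using temperature, top-k, and top-p."""
    # Temperature rescales logits: < 1 sharpens the distribution, > 1 flattens it.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]  # subtract max for stability
    total = sum(exps)
    probs = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)

    if top_k is not None:
        probs = probs[:top_k]  # keep only the K most likely tokens
    if top_p is not None:
        kept, cumulative = [], 0.0
        for p, i in probs:  # smallest prefix whose mass reaches top_p
            kept.append((p, i))
            cumulative += p
            if cumulative >= top_p:
                break
        probs = kept

    # Renormalize over the surviving tokens and draw one of them.
    mass = sum(p for p, _ in probs)
    r = rng.random() * mass
    for p, i in probs:
        r -= p
        if r <= 0:
            return i
    return probs[-1][1]


rng = random.Random(0)
logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
print(sample_next(logits, temperature=0.7, top_k=3, top_p=0.9, rng=rng))
```

With top_k=1 (or a tiny top_p) only the most likely token survives, which is why very restrictive settings behave like greedy decoding.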

Summarization

summarizer = pipeline("summarization")

text = """
Machine learning is a subset of artificial intelligence that enables systems
to learn and improve from experience without being explicitly programmed.
It focuses on developing computer programs that can access data and use it
to learn for themselves. The process of learning begins with observations or
data, such as examples, direct experience, or instruction.
"""

result = summarizer(text, max_length=50, min_length=20)
print(result[0]["summary_text"])

The summarization pipeline uses a sequence-to-sequence model (by default, sshleifer/distilbart-cnn-12-6) that reads the full input and generates a shorter version. max_length=50 and min_length=20 bound the summary length in tokens, not words: the summary will be at most 50 and at least 20 tokens long. The model produces abstractive summaries, meaning it can rephrase content rather than only copying sentences.

Note: Summarization requires the input to fit within the model's maximum input length (1,024 tokens for BART-based models such as the default). Longer inputs can fail with an indexing error unless you pass truncation=True, which cuts off everything past the limit. For documents exceeding the limit, split the text into sections, summarize each section, then summarize the summaries. This technique is called hierarchical summarization.
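Here is a minimal sketch of the split, summarize, then summarize-the-summaries approach. The helper names and the 400-word chunk size are assumptions for illustration. Word counts are only a rough proxy for tokens, and summarize can be any function from text to text, so the logic can be tested without downloading a model.

```python
def split_into_chunks(text, max_words=400):
    """Split text into consecutive chunks of at most max_words words.

    Words are a rough stand-in for tokens; a stricter version would count
    tokens with the model's tokenizer.
    """
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def hierarchical_summary(text, summarize, max_words=400):
    """Summarize each chunk, then summarize the joined chunk summaries.

    `summarize` is any callable str -> str (hypothetical wiring below).
    """
    chunks = split_into_chunks(text, max_words)
    if len(chunks) == 1:
        return summarize(chunks[0])
    partial = [summarize(chunk) for chunk in chunks]
    return summarize(" ".join(partial))


# Hypothetical wiring to the summarizer pipeline from the example above:
# summarize = lambda t: summarizer(t, max_length=60, min_length=20,
#                                  truncation=True)[0]["summary_text"]
```

This sketch stops after two levels. For very long documents the joined summaries may still exceed the limit, in which case you would repeat the process.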

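The default model works well for news articles and general text. For scientific papers and other long documents, try "allenai/led-base-16384", which accepts inputs up to 16,384 tokens (as a base checkpoint, it may need fine-tuning to summarize well). For dialogues and chat transcripts, "philschmid/bart-large-cnn-samsum" is fine-tuned on the SAMSum dataset of conversation summaries.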

Named Entity Recognition

ner = pipeline("ner", aggregation_strategy="simple")
text = "Apple Inc. was founded by Steve Jobs in Cupertino, California."
result = ner(text)

for entity in result:
    print(f"{entity['word']}: {entity['entity_group']} (score: {entity['score']:.2f})")

Expected output:

Apple Inc.: ORG (score: 0.99)
Steve Jobs: PER (score: 0.99)
Cupertino: LOC (score: 0.99)
California: LOC (score: 0.99)

Named entity recognition finds spans of text that refer to real-world entities and classifies them into categories. The default NER model (a BERT model fine-tuned on the CoNLL-2003 dataset) recognizes four categories: ORG (organizations), PER (people), LOC (locations), and MISC (miscellaneous). Models trained on other datasets, such as OntoNotes, add types like DATE, TIME, MONEY, and PERCENT.

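aggregation_strategy="simple" merges token-level predictions back into whole entities. Without it, the pipeline returns one prediction per token, so "Apple Inc." comes back as separate pieces (for example "Apple", "Inc", and ".") each with its own tag, such as I-ORG. The "simple" strategy groups adjacent tokens that belong to the same entity under the BIO tagging scheme (a B- tag starts an entity, and following I- tags of the same type extend it) and rejoins WordPiece sub-words, which carry a ## prefix, into readable text. The word-level strategies "first", "average", and "max" additionally guarantee that a single word is never split across two entities.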
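The grouping idea can be sketched as follows. This is a simplified approximation of the "simple" strategy, not the pipeline's actual code, and the token and tag sequence (including splitting "Cupertino" into "Cup", "##ert", "##ino") is hypothetical. It merges I- tags into the preceding entity of the same type and glues ## continuation pieces on without a space.

```python
def group_entities(tagged_tokens):
    """Merge (token, tag) pairs tagged with the BIO scheme into entity spans.

    B-X starts an entity of type X, I-X continues it, O is outside any entity.
    An I-X that does not follow an entity of type X starts a new one.
    """
    entities = []
    current = None
    for token, tag in tagged_tokens:
        if tag == "O":
            current = None
            continue
        prefix, label = tag.split("-", 1)
        piece = token[2:] if token.startswith("##") else token
        if prefix == "I" and current is not None and current["entity_group"] == label:
            # WordPiece continuations attach directly; new words get a space.
            current["word"] += piece if token.startswith("##") else " " + piece
        else:
            current = {"entity_group": label, "word": piece}
            entities.append(current)
    return entities


tagged = [("Steve", "B-PER"), ("Jobs", "I-PER"), ("founded", "O"),
          ("Apple", "B-ORG"), ("in", "O"),
          ("Cup", "B-LOC"), ("##ert", "I-LOC"), ("##ino", "I-LOC")]
print(group_entities(tagged))
# [{'entity_group': 'PER', 'word': 'Steve Jobs'},
#  {'entity_group': 'ORG', 'word': 'Apple'},
#  {'entity_group': 'LOC', 'word': 'Cupertino'}]
```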

Use cases: NER powers information extraction pipelines — pull all company names from a news article, extract all dates from a contract, or identify all person names in a support ticket to route to the right agent.
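For extraction tasks like these, a small post-processing step usually follows the pipeline call. The sketch below assumes results shaped like the output of pipeline("ner", aggregation_strategy="simple") (dicts with "entity_group", "word", and "score" keys). The entities_by_type name, the 0.8 threshold, and the sample data are hypothetical. It keeps confident hits and collects unique names per entity type.

```python
from collections import defaultdict


def entities_by_type(results, min_score=0.8):
    """Collect unique entity strings per entity_group, dropping weak hits."""
    grouped = defaultdict(set)
    for ent in results:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].add(ent["word"])
    return {label: sorted(words) for label, words in grouped.items()}


sample = [  # hand-written data mimicking the pipeline's output format
    {"entity_group": "ORG", "word": "Apple Inc.", "score": 0.99},
    {"entity_group": "PER", "word": "Steve Jobs", "score": 0.99},
    {"entity_group": "LOC", "word": "Cupertino", "score": 0.99},
    {"entity_group": "LOC", "word": "California", "score": 0.99},
    {"entity_group": "ORG", "word": "Apple Inc.", "score": 0.97},
    {"entity_group": "MISC", "word": "iPhone", "score": 0.42},
]
print(entities_by_type(sample))
# {'ORG': ['Apple Inc.'], 'PER': ['Steve Jobs'], 'LOC': ['California', 'Cupertino']}
```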

Using a Model from the Hub

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

inputs = tokenizer("This tutorial is very helpful!", return_tensors="pt")
with torch.no_grad():  # inference only, so skip gradient tracking
    outputs = model(**inputs)
print(outputs.logits)

When you need more control than the pipeline offers, use the Auto* classes directly. AutoTokenizer converts text into the inputs the model expects: token IDs and an attention mask, plus token type IDs for models that use them (BERT does; DistilBERT does not). AutoModelForSequenceClassification loads the model together with a sequence classification head.

return_tensors="pt" returns PyTorch tensors (use "tf" for TensorFlow). The **inputs unpacks the tokenizer output dict into the model's forward pass — the model receives input_ids, attention_mask, and any other required fields automatically.

The raw outputs.logits are unnormalized scores for each class. For the SST-2 model (2 classes: negative, positive), a higher first logit means negative, higher second means positive. Convert to probabilities with torch.nn.functional.softmax(outputs.logits, dim=1).
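The logits-to-label step is easy to reproduce by hand. The sketch below uses made-up logits rather than real model output, and a hand-written id2label dict. The real model stores its label mapping in model.config.id2label. Subtracting the maximum before exponentiating is the standard trick to keep exp() from overflowing and does not change the result.

```python
import math


def softmax(logits):
    """Turn unnormalized scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits; index 0 = NEGATIVE, index 1 = POSITIVE (SST-2 order).
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
logits = [-2.0, 2.0]
probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(id2label[best], round(probs[best], 4))  # POSITIVE 0.982
```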

Why use Auto* classes instead of pipeline? You get access to intermediate layer outputs, explicit control over the device (model.to("cuda")), your own batching logic, gradients for fine-tuning, and the ability to inspect or modify model internals. The pipeline is great for quick experiments and straightforward inference; the Auto* classes are the better fit when you need that level of control.

Common Errors

1. Model download fails

OSError: Can't load tokenizer for 'nonexistent-model'

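The model name must match a valid repository on the Hugging Face Hub; check the spelling on huggingface.co/models. Private or gated models also require you to log in first (huggingface-cli login). If you are behind a corporate proxy, set the standard HTTPS_PROXY environment variable or pass a proxies dict to from_pretrained(). To download from a mirror instead of the main Hub, set the HF_ENDPOINT environment variable, for example export HF_ENDPOINT=https://hf-mirror.com (a third-party mirror).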

2. Out of memory (CUDA or CPU)

RuntimeError: CUDA out of memory

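Large models (GPT-2 large, BERT large, LLaMA) require significant RAM/VRAM. Use a smaller variant, such as distilgpt2 instead of gpt2. Pass device_map="auto" to from_pretrained() to spread the model across the available GPUs and CPU RAM (this requires the accelerate package). Or load the weights in half precision with torch_dtype=torch.float16, which roughly halves the memory the weights need compared with the default float32.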
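A quick back-of-the-envelope check shows why half precision helps. Weight memory is roughly the number of parameters times the bytes per parameter. The sketch below applies that formula to GPT-2's 124M parameters. It counts weights only; activations, caches, and framework overhead come on top, so treat the numbers as a lower bound.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype="float32"):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("float32", "float16"):
    print(dtype, weight_memory_gb(124_000_000, dtype))
# float32 0.496
# float16 0.248
```

The same arithmetic explains why a 7-billion-parameter model needs about 14 GB for its weights even in float16.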

3. Pipeline returns no output or empty results

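For summarization, if min_length or max_length is larger than the input itself, the pipeline logs a warning and the output may simply echo the input; lower both values for short texts. For NER, an empty list means the model found no entities. Check that the text contains entity types the model was trained on (for the default model: people, organizations, locations, and miscellaneous names).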

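4. Input longer than the model's maximum length

Token indices sequence length is longer than the specified maximum sequence length for this model (600 > 512). Running this sequence through the model will result in indexing errors

The tokenizer only warns; the model itself then fails on inputs longer than its maximum length (512 tokens for BERT and DistilBERT). Pass truncation=True to the tokenizer to cut the input down to the model's maximum length, or use a model that accepts longer inputs.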
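Truncation throws away everything past the limit. When the tail matters, a common alternative is to cut the token sequence into overlapping windows and process each one. The sketch below works on plain lists of integers standing in for token IDs. It ignores the special tokens a real tokenizer reserves room for, and the function names are hypothetical. Fast tokenizers offer built-in support for this through return_overflowing_tokens=True together with stride.

```python
def truncate(ids, max_length):
    """Conceptually what truncation=True does: keep the first max_length IDs."""
    return ids[:max_length]


def sliding_windows(ids, max_length, stride):
    """Split a long ID sequence into overlapping windows so nothing is dropped.

    Consecutive windows share `stride` tokens of context.
    """
    if not 0 <= stride < max_length:
        raise ValueError("stride must be in [0, max_length)")
    step = max_length - stride
    windows = []
    for start in range(0, len(ids), step):
        windows.append(ids[start:start + max_length])
        if start + max_length >= len(ids):
            break  # this window already reaches the end
    return windows


ids = list(range(600))  # stand-in for 600 token IDs
print(len(truncate(ids, 512)))                                  # 512
print([len(w) for w in sliding_windows(ids, 512, stride=128)])  # [512, 216]
```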

5. Tensor type mismatch (PyTorch vs TensorFlow)

If you installed TensorFlow but use return_tensors="pt", you get an error about missing PyTorch. Use return_tensors="tf" for TensorFlow, or install PyTorch (pip install torch) to use "pt".
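To avoid this mismatch, you can check which backend is installed before choosing return_tensors. The sketch below uses only the standard library: importlib.util.find_spec reports whether a package can be imported without actually importing it. The helper names are hypothetical.

```python
import importlib.util


def available_backends():
    """Report which deep learning backends are importable, without importing them."""
    return {name: importlib.util.find_spec(name) is not None
            for name in ("torch", "tensorflow")}


def pick_return_tensors(backends):
    """Choose the return_tensors value that matches an installed backend."""
    if backends.get("torch"):
        return "pt"
    if backends.get("tensorflow"):
        return "tf"
    raise RuntimeError("Install PyTorch (pip install torch) or TensorFlow first.")


print(available_backends())
```

Preferring "pt" when both are installed matches the PyTorch-first setup used throughout this tutorial.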

Practice

Sentiment on custom data: Run sentiment analysis on 5 different sentences — include sarcasm ("Oh great, another meeting"), neutral statements ("The sky is blue"), and technical content ("The API returned status code 404"). Compare the confidence scores across different types.
Compare generation parameters: Generate text with temperature=0.2, temperature=1.0, and temperature=1.5 using the same prompt. Observe how creativity changes. At very high temperatures (>2.0), the output often becomes gibberish.
Batch NER processing: Create a list of 5 sentences about technology companies and pass them all to the NER pipeline in one call. Count unique organization names across all results.
Challenge: Build a text-processing script that reads a news article from a URL, uses the summarization pipeline to produce a 2-sentence summary, and then runs NER on the summary to extract key entities. Display the summary alongside the entities.

Summary

You used Hugging Face Transformers for four core NLP tasks (sentiment analysis, text generation, summarization, and named entity recognition) and loaded a model directly from the Hub. The pipeline API runs any of these tasks in about three lines of code, while the AutoTokenizer and AutoModel* classes give you full control when you need it. With over 500,000 pretrained models available, you can apply state-of-the-art NLP to your data without training a model from scratch.
