Biopython GenBank Read/Write Fix

DodaTech Updated 2026-06-26 3 min read

You will learn how to parse GenBank format files and extract feature information.

The Problem

Reading and writing GenBank files with Biopython's Bio.SeqIO module is frequently done incorrectly, leading to runtime errors, incorrect results, or inefficient code. This quick-fix guide shows the correct usage and the common pitfalls to avoid when working with GenBank files in Python.

The Wrong Way

The most common mistake is calling SeqIO.read() on every GenBank file without checking how many records it contains, and stopping at a summary instead of inspecting the features. Here is what that typically looks like:

from Bio import SeqIO
# read() expects exactly one record in the file
record = SeqIO.read('sequence.gb', 'genbank')
print(record.id, len(record.features))

What happens (for a single-record file; values are illustrative):

NC_000001 500  # Record ID and feature count

This works only when the file holds exactly one record. SeqIO.read() raises a ValueError if the file contains no records or more than one, and the printed feature count tells you nothing about the type, location, or strand of each feature.

The Right Way

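The correct approach reads the single record and then iterates over record.features, printing each feature's type and location. Here is the fixed version:

from Bio import SeqIO
record = SeqIO.read('sequence.gb', 'genbank')  # single-record file
for feature in record.features[:5]:  # first five features
    print(feature.type, feature.location.start, feature.location.end)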

Expected output (feature type, start, end; values are illustrative, and Biopython reports 0-based start positions):

source 0 248956422
gene 10000 15000
CDS 10000 15000

Step-by-Step Fix

1. Understand the objects you are working with

Before applying any operation, check what SeqIO returned. SeqIO.read() gives a single SeqRecord, and each entry in record.features is a SeqFeature with a type, a location, and a qualifiers dictionary.

# Always inspect your data first
print(type(record))
print(len(record.seq), len(record.features))
print(sorted({feature.type for feature in record.features}))

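2. Apply the correct method with proper arguments

Use the corrected code shown above. Pay special attention to the arguments of SeqIO.read() and SeqIO.parse(): the file name or handle comes first and the format string 'genbank' (or its alias 'gb') second. Use parse() when the file may contain several records.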

3. Verify the result

Always validate that the output matches expectations before proceeding:

# Verification pattern
cds_features = [f for f in record.features if f.type == 'CDS']
assert cds_features, "No CDS features found in record"
print(f"Success: {len(cds_features)} CDS features in {record.id}")

Prevention Tips

Use SeqIO.read(file, 'genbank') for single-record GenBank files
Access features via record.features list of SeqFeature objects
Feature qualifiers are in feature.qualifiers dict
Feature location via feature.location object (start, end, strand)
Write GenBank: SeqIO.write(record, 'out.gb', 'genbank')
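
The tips above can be sketched as a small round trip. This is a minimal illustration, assuming Biopython 1.78 or later (where the GenBank writer needs a molecule_type annotation). The record ID TOY0001, the gene name toyA, and the sequence are invented for the example, and an in-memory buffer stands in for 'out.gb'.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Hypothetical toy record: 24 bases with one CDS on the forward strand.
record = SeqRecord(
    Seq("ATGGCCATTGTAATGGGCCGCTGA"),
    id="TOY0001",
    name="TOY0001",
    description="toy record",
    annotations={"molecule_type": "DNA"},  # needed by the GenBank writer
)
record.features.append(
    SeqFeature(
        FeatureLocation(0, 24, strand=1),
        type="CDS",
        qualifiers={"gene": ["toyA"]},
    )
)

# Write GenBank text to a buffer; SeqIO.write returns the record count.
buffer = StringIO()
written = SeqIO.write(record, buffer, "genbank")

# Read the single record back and inspect its first feature.
round_trip = SeqIO.read(StringIO(buffer.getvalue()), "genbank")
feature = round_trip.features[0]
gene_names = feature.qualifiers.get("gene", [])  # qualifier values are lists
print(feature.type, feature.location.start, feature.location.end,
      feature.location.strand, gene_names)
```

With a real file, pass the path (for example 'out.gb') in place of the buffer.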

Common Mistakes

Trying to parse GenBank files with multiple LOCUS lines using read() (use parse() instead)
Not checking strand information on features for directional operations
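
The first mistake can be reproduced with a short sketch, assuming Biopython 1.78 or later. The two toy records (REC1, REC2) and the make_record helper are invented for illustration and are built in memory, so no input file is needed.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def make_record(name, bases):
    """Hypothetical helper: build a minimal DNA record for the demo."""
    return SeqRecord(
        Seq(bases),
        id=name,
        name=name,
        description="toy record",
        annotations={"molecule_type": "DNA"},
    )


# Produce GenBank text containing two LOCUS entries.
buffer = StringIO()
SeqIO.write(
    [make_record("REC1", "ATGGCCTAA"), make_record("REC2", "ATGAAATAG")],
    buffer,
    "genbank",
)
multi_text = buffer.getvalue()

# read() insists on exactly one record, so it raises ValueError here.
try:
    SeqIO.read(StringIO(multi_text), "genbank")
    read_failed = False
except ValueError:
    read_failed = True

# parse() returns an iterator and handles any number of records.
records = list(SeqIO.parse(StringIO(multi_text), "genbank"))
record_names = [rec.name for rec in records]
print(read_failed, record_names)
```

For large multi-record files, iterating over SeqIO.parse() directly avoids holding every record in memory at once.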

These mistakes appear frequently in real-world bioinformatics code. DodaTech's contributors have identified these patterns through analysis of open-source projects, production systems, and community forums like Stack Overflow.

Practice Exercise

Parse a GenBank file, extract all CDS features, and write their protein translations to a FASTA file.

This exercise reinforces the concepts covered in this guide. Try implementing it before checking online solutions. This hands-on approach ensures you retain the knowledge and can apply it independently.
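
For comparison after your own attempt, here is one possible outline, not a definitive solution. It assumes each CDS can be translated with the standard genetic code from its first base; real GenBank records may carry transl_table and codon_start qualifiers that this sketch ignores. The cds_proteins helper and the toy record are invented for illustration.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord


def cds_proteins(record):
    """Return one protein SeqRecord per CDS feature (hypothetical helper)."""
    proteins = []
    for index, feature in enumerate(record.features):
        if feature.type != "CDS":
            continue
        nucleotides = feature.extract(record.seq)  # strand-aware slice
        protein = nucleotides.translate(to_stop=True)  # standard table
        label = feature.qualifiers.get("gene", [f"cds_{index}"])[0]
        proteins.append(SeqRecord(protein, id=label, description=""))
    return proteins


# Toy stand-in for a record parsed from a GenBank file.
toy = SeqRecord(Seq("ATGGCCATTGTAATGGGCCGCTGA"), id="TOY0001")
toy.features.append(
    SeqFeature(
        FeatureLocation(0, 24, strand=1),
        type="CDS",
        qualifiers={"gene": ["toyA"]},
    )
)

# Write the translations as FASTA; a file path would work the same way.
fasta = StringIO()
SeqIO.write(cds_proteins(toy), fasta, "fasta")
fasta_text = fasta.getvalue()
print(fasta_text)
```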

FAQ

What information does GenBank store?

Annotated sequence: coding regions, regulatory elements, references, and cross-references.

What is a SeqFeature?

A region of a sequence with type (CDS, gene, exon) and location (start, end, strand).

How do I extract feature sequence?

Use feature.extract(record.seq) to get the sub-sequence at the feature location.
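
A minimal sketch of why strand matters when extracting, assuming Biopython is installed. The sequence and the two features are invented for illustration: both cover the same coordinates but sit on opposite strands.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature

seq = Seq("AAATGCCCGGG")

# Same span (0-based, end-exclusive), opposite strands.
forward = SeqFeature(FeatureLocation(2, 5, strand=1), type="misc_feature")
reverse = SeqFeature(FeatureLocation(2, 5, strand=-1), type="misc_feature")

forward_seq = forward.extract(seq)  # plain slice of the parent sequence
reverse_seq = reverse.extract(seq)  # slice, then reverse complement

print(forward_seq, reverse_seq)
```

Slicing record.seq by start and end alone would return the forward-strand bases for both features, which is the strand mistake listed under Common Mistakes.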

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro. DodaTech tools integrate seamlessly with Python Data Science workflows for enhanced productivity and security.
