Workflow Automation with Python Scripts â Automate Repetitive Tasks
In this tutorial, you'll learn about Workflow Automation with Python Scripts. We cover key concepts, practical examples, and best practices to help you understand and apply this topic effectively.
Automate repetitive tasks with Python scripts: file processing, email handling, API interactions, scheduled jobs, and system monitoring with practical automation examples.
What You'll Learn
You will learn to write Python scripts that automate file operations, interact with APIs, send email notifications, schedule recurring tasks, and monitor system resources -- turning hours of manual work into a single command.
Why It Matters
Every developer spends a significant portion of their day on repetitive tasks: renaming files, processing data, sending reports, checking server status, and deploying code. Automating these tasks with Python scripts saves hours each week, reduces human error, and frees you to focus on creative problem-solving.
Real-World Use
The Durga Antivirus Pro team uses a Python automation script that runs every night at 2 AM: it downloads the latest virus definitions from the update server, verifies their cryptographic signatures, compiles them into the engine format, runs a test scan against a sample dataset, and generates a report. If any step fails, the script sends an alert to the on-call engineer. This Process runs entirely without human intervention.
Your Learning Path
flowchart LR
A[Python Basics] --> B[Automation Scripts]
B --> C[Monitoring Alerting]
C --> D[Infrastructure Automation]
D --> E[AI Code Generation]
B --> F{You Are Here}
style F fill:#f90,color:#fff
Prerequisites: Basic Python knowledge (variables, functions, loops, file I/O). Familiarity with the Terminal & Command Line for Beginners is helpful.
Automating File Operations
File operations are the most common automation target. Python's pathlib, shutil, and os modules handle most file tasks.
from pathlib import Path
import shutil
from datetime import datetime, timedelta
def cleanup_old_logs(log_dir, days_old=30):
"""Remove log files older than specified days."""
log_path = Path(log_dir)
cutoff = datetime.now() - timedelta(days=days_old)
removed_count = 0
freed_bytes = 0
for log_file in log_path.glob("*.log"):
mtime = datetime.fromtimestamp(log_file.stat().st_mtime)
if mtime < cutoff:
freed_bytes += log_file.stat().st_size
log_file.unlink()
removed_count += 1
return removed_count, freed_bytes
removed, freed = cleanup_old_logs("/var/log/myapp")
print(f"Cleaned {removed} files ({freed / 1048576:.2f} MB freed)")
Expected output: Cleaned 15 files (23.45 MB freed)
Renaming Files in Bulk
from pathlib import Path
def bulk_rename(directory, prefix, extension=None):
"""Rename all files in directory with a prefix and sequential number."""
path = Path(directory)
files = list(path.iterdir())
if extension:
files = [f for f in files if f.suffix == extension]
for index, file in enumerate(sorted(files), start=1):
new_name = f"{prefix}_{index:03d}{file.suffix}"
file.rename(path / new_name)
print(f"Renamed: {file.name} -> {new_name}")
# Usage: prefix all images with "screenshot"
bulk_rename("./screenshots", "screenshot", ".png")
Expected output:
Renamed: snap1.png -> screenshot_001.png
Renamed: snap2.png -> screenshot_002.png
Renamed: snap3.png -> screenshot_003.png
Automating API Interactions
Many workflows involve fetching data from APIs, processing it, and taking action based on the results.
import requests
import json
from pathlib import Path
API_KEY = "your-api-key-here"
BASE_URL = "https://api.example.com/v1"
def fetch_and_save_alerts(status="open", output_file="alerts.json"):
"""Fetch open alerts from an API and save to a JSON file."""
headers = {"Authorization": f"Bearer {API_KEY}"}
params = {"status": status, "limit": 100}
response = requests.get(
f"{BASE_URL}/alerts",
headers=headers,
params=params,
timeout=30
)
response.raise_for_status()
alerts = response.json()
output = Path(output_file)
output.write_text(json.dumps(alerts, indent=2))
print(f"Saved {len(alerts)} {status} alerts to {output_file}")
# Count by severity
severity_counts = {}
for alert in alerts:
sev = alert.get("severity", "unknown")
severity_counts[sev] = severity_counts.get(sev, 0) + 1
for sev, count in sorted(severity_counts.items()):
print(f" {sev}: {count}")
fetch_and_save_alerts()
Expected output:
Saved 42 open alerts to alerts.json
critical: 3
high: 8
medium: 18
low: 13
Scheduled Task Automation
Use the schedule library to run Python functions at specific times.
pip install schedule
import schedule
import time
from pathlib import Path
def backup_database():
"""Simulate database backup."""
timestamp = time.strftime("%Y%m%d_%H%M%S")
backup_file = Path(f"/backups/db_dump_{timestamp}.sql")
# In real code: subprocess.run(["pg_dump", ...])
backup_file.write_text("-- simulated database dump")
print(f"Backup created: {backup_file.name}")
def check_disk_space():
"""Check disk usage and alert if low."""
import shutil
usage = shutil.disk_usage("/")
percent_used = (usage.used / usage.total) * 100
print(f"Disk usage: {percent_used:.1f}%")
if percent_used > 90:
print("WARNING: Disk usage above 90%!")
def send_daily_report():
"""Generate and send daily summary."""
print("Generating daily report...")
# In real code: generate report and email it
# Schedule jobs
schedule.every().day.at("02:00").do(backup_database)
schedule.every().hour.do(check_disk_space)
schedule.every().day.at("08:00").do(send_daily_report)
print("Scheduler started. Press Ctrl+C to stop.")
while True:
schedule.run_pending()
time.sleep(60)
Expected behavior: The scheduler runs each function at its configured interval. backup_database runs daily at 2 AM. check_disk_space runs every hour. send_daily_report runs every day at 8 AM.
Email Notifications from Scripts
import smtplib
from email.message import EmailMessage
from pathlib import Path
SMTP_SERVER = "smtp.example.com"
SMTP_PORT = 587
SMTP_USER = "alerts@example.com"
SMTP_PASS = "your-password"
def send_alert(subject, body, attachment_path=None):
"""Send an email alert, optionally with an attachment."""
msg = EmailMessage()
msg["Subject"] = subject
msg["From"] = SMTP_USER
msg["To"] = "team@example.com"
msg.set_content(body)
if attachment_path:
path = Path(attachment_path)
if path.exists():
with open(path, "rb") as f:
data = f.read()
msg.add_attachment(
data,
maintype="application",
subtype="octet-stream",
filename=path.name
)
with smtplib.SMTP(SMTP_SERVER, SMTP_PORT) as server:
server.starttls()
server.login(SMTP_USER, SMTP_PASS)
server.send_message(msg)
print(f"Alert sent: {subject}")
send_alert(
"Deployment Complete - v2.3.1",
"The deployment of version 2.3.1 finished successfully.\n\n"
"Changes:\n- Fixed login timeout issue\n- Updated dependencies\n- Added new API endpoint"
)
Expected behavior: The script connects to the SMTP server, authenticates, and sends an email with the specified subject and body. If an attachment path is provided, it is included as an attachment.
Common Automation Script Mistakes
1. Hardcoding File Paths
Absolute paths like /home/alice/scripts/data.csv break when the script runs on another machine or from a different directory. Use relative paths, environment variables, or configuration files instead.
2. Not Handling Errors Gracefully
An unhandled exception crashes the entire automation. Wrap network calls and file operations in try-except blocks, and log errors instead of letting the script fail silently.
3. Ignoring Rate Limits
APIs enforce rate limits. Sending too many requests in quick succession gets your IP blocked. Add time.sleep() delays or use exponential backoff.
4. Running Without Logging
When a script runs at 2 AM and fails, you need to know why. Always log key actions, errors, and timing information to a file.
5. Not Testing on Sample Data
Running an untested cleanup script on production data can delete important files. Always test on a copy of the data first, and include a --dry-run flag.
6. Forgetting to Handle File Locks
A script that tries to read a file being written by another Process will fail. Check file availability, use temporary copies, or implement retry logic.
7. Storing Secrets in Scripts
API keys and passwords in plain text are a security risk. Use environment variables, .env files, or a secrets manager like HashiCorp Vault.
Practice Questions
1. What is the best practice for storing API keys in Python automation scripts?
Use environment variables (via os.environ) or a .env file with the python-dotenv library. Never hardcode secrets in the script source.
2. How can you test a cleanup script safely before running it on real data?
Implement a --dry-run flag that prints what would be deleted without actually removing anything. Test on a copy of the data directory first.
3. Why should automation scripts include logging? Logs provide a record of what the script did, when it ran, and where it failed. Without logs, debugging a failed overnight automation is nearly impossible.
4. What library schedules Python functions to run at specific times?
The schedule library provides a simple API for running functions at intervals or specific times of day. For production, consider APScheduler or system-level Cron Jobs.
5. Challenge: Write a Python script that monitors a directory for new .csv files, processes each one (calculate row count and column names), moves processed files to an archive folder, and sends a summary email at the end.
Mini Project: Automated Log Archiver
Write a Python script that scans a specified directory for log files, compresses files older than 7 days using gzip, moves them to an archive directory, deletes archives older than 90 days, and sends a daily report with storage statistics. Schedule it to run daily via cron or the schedule library.
from pathlib import Path
import gzip
import shutil
from datetime import datetime, timedelta
LOG_DIR = Path("/var/log/myapp")
ARCHIVE_DIR = Path("/var/log/myapp/archive")
RETENTION_DAYS = 7
ARCHIVE_RETENTION = 90
def rotate_logs():
ARCHIVE_DIR.mkdir(parents=True, exist_ok=True)
cutoff = datetime.now() - timedelta(days=RETENTION_DAYS)
archive_cutoff = datetime.now() - timedelta(days=ARCHIVE_RETENTION)
stats = {"compressed": 0, "deleted": 0, "size_saved": 0}
for log_file in LOG_DIR.glob("*.log"):
mtime = datetime.fromtimestamp(log_file.stat().st_mtime)
if mtime < cutoff:
archive_path = ARCHIVE_DIR / f"{log_file.stem}.gz"
with open(log_file, "rb") as f_in:
with gzip.open(archive_path, "wb") as f_out:
shutil.copyfileobj(f_in, f_out)
stats["size_saved"] += log_file.stat().st_size
stats["compressed"] += 1
log_file.unlink()
for archive in ARCHIVE_DIR.glob("*.gz"):
mtime = datetime.fromtimestamp(archive.stat().st_mtime)
if mtime < archive_cutoff:
archive.unlink()
stats["deleted"] += 1
print(f"Compressed {stats['compressed']} logs")
print(f"Deleted {stats['deleted']} old archives")
print(f"Freed {stats['size_saved'] / 1048576:.2f} MB")
if __name__ == "__main__":
rotate_logs()
Expected behavior: The script compresses log files older than 7 days into .gz archives, moves them to the archive folder, and deletes archives older than 90 days.
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro