Sitemap & SEO for Static Sites
In this tutorial, you'll learn about Sitemap & SEO for Static Sites. We cover key concepts, practical examples, and best practices.
Sitemaps and SEO configuration for static sites ensure search engines can discover, crawl, and index every page efficiently, maximizing organic traffic through proper technical SEO foundations and structured data markup.
Why It Matters
Technical SEO is the foundation of organic search visibility. An incorrectly configured sitemap can leave thousands of pages unindexed. Missing canonical URLs let duplicate versions of a page compete with each other and split ranking signals between them. Without structured data, pages are not eligible for rich results in SERPs. For static sites deployed on CDNs, proper SEO configuration is particularly important because there is no server-side logic to generate sitemaps, redirects, or meta tags on request; everything has to be produced at build time. At DodaTech, our sitemap includes all 2,900+ pages with proper lastmod dates, and our structured data generates rich results for tutorials and FAQs in Google Search.
Real-World Use
A documentation site discovers that 40% of its pages are not indexed because the sitemap excludes paginated pages. An e-commerce site loses rankings after a domain migration because canonical URLs were not updated. A recipe blog gains 200% more organic traffic after adding Recipe structured data that generates rich results with cooking time and ratings.
SEO Architecture for Static Sites
flowchart LR
    A[Hugo Build] --> B[Sitemap XML]
    A --> C[Robots.txt]
    A --> D[Canonical URLs]
    A --> E[Structured Data JSON-LD]
    A --> F[Open Graph Meta]
    A --> G[Twitter Cards]
    B --> H[Google Search Console]
    D --> H
    E --> I[Rich Results SERP]
    F --> J[Social Media Previews]
    G --> J
    style A fill:#f90,color:#fff
Sitemap Configuration
A sitemap is an XML file that lists all URLs on your site with metadata about each page's importance and last update time.
Hugo Sitemap Template
{{- /* html/template would escape a literal "<?", so print the XML declaration with safeHTML */ -}}
{{ printf "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\"?>" | safeHTML }}
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml">
{{- range .Site.Pages -}}
{{- if and .Permalink (not .Params.sitemap_exclude) -}}
<url>
<loc>{{ .Permalink }}</loc>
{{- if not .Lastmod.IsZero -}}
{{- /* safeHTML keeps the "+" of a UTC offset from being escaped */ -}}
<lastmod>{{ .Lastmod.Format "2006-01-02T15:04:05-07:00" | safeHTML }}</lastmod>
{{- end -}}
{{- with .Params.sitemap_priority -}}
<priority>{{ . }}</priority>
{{- else -}}
<priority>{{ if .IsHome }}1.0{{ else if .IsSection }}0.8{{ else }}0.5{{ end }}</priority>
{{- end -}}
{{- if .IsHome -}}
<changefreq>daily</changefreq>
{{- else -}}
<changefreq>weekly</changefreq>
{{- end -}}
{{- if .IsTranslated -}}
{{- /* .AllTranslations includes the current page, so every URL also lists itself */ -}}
{{- range .AllTranslations -}}
<xhtml:link rel="alternate" hreflang="{{ .Language.Lang }}" href="{{ .Permalink }}"/>
{{- end -}}
{{- end -}}
</url>
{{- end -}}
{{- end -}}
</urlset>
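Expected behavior: The sitemap lists every page with its full URL, last modified date, priority (home=1.0, sections=0.8, pages=0.5), change frequency, and hreflang links for multilingual pages. Pages with sitemap_exclude: true in frontmatter are left out. Google ignores the priority and changefreq values and uses lastmod only when it is consistently accurate, so accurate dates matter more than tuned priorities; other search engines may still read all three fields.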
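To see the XML the template produces without running Hugo, here is a minimal Go sketch using the standard library's encoding/xml. It is an illustration under assumptions, not Hugo's implementation: the Page type, its fields, and the example.com URLs are hypothetical stand-ins for Hugo's page data, and only loc and lastmod are modeled. The lastmod layout is the same reference-time string the template passes to .Format, because .Lastmod in Hugo is a Go time.Time.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"time"
)

// Page is a hypothetical stand-in for the page data Hugo exposes to templates.
type Page struct {
	Permalink      string
	Lastmod        time.Time
	SitemapExclude bool // mirrors the sitemap_exclude front matter flag
}

// urlEntry and urlSet mirror the <url> and <urlset> elements of the sitemap protocol.
type urlEntry struct {
	Loc     string `xml:"loc"`
	Lastmod string `xml:"lastmod,omitempty"`
}

type urlSet struct {
	XMLName xml.Name   `xml:"urlset"`
	Xmlns   string     `xml:"xmlns,attr"`
	URLs    []urlEntry `xml:"url"`
}

// w3cDatetime is Go's reference-time layout, the same string used in the Hugo template.
const w3cDatetime = "2006-01-02T15:04:05-07:00"

// collectEntries skips pages without a permalink or marked as excluded,
// and leaves lastmod empty (so it is omitted) when the date is the zero time.
func collectEntries(pages []Page) []urlEntry {
	var entries []urlEntry
	for _, p := range pages {
		if p.Permalink == "" || p.SitemapExclude {
			continue
		}
		entry := urlEntry{Loc: p.Permalink}
		if !p.Lastmod.IsZero() {
			entry.Lastmod = p.Lastmod.Format(w3cDatetime)
		}
		entries = append(entries, entry)
	}
	return entries
}

// buildSitemap serializes the entries and prepends the standard XML declaration.
func buildSitemap(pages []Page) ([]byte, error) {
	set := urlSet{Xmlns: "http://www.sitemaps.org/schemas/sitemap/0.9", URLs: collectEntries(pages)}
	body, err := xml.MarshalIndent(set, "", "  ")
	if err != nil {
		return nil, err
	}
	return append([]byte(xml.Header), body...), nil
}

// samplePages returns hypothetical pages for the demonstration.
func samplePages() []Page {
	return []Page{
		{Permalink: "https://example.com/", Lastmod: time.Date(2024, 5, 1, 9, 30, 0, 0, time.UTC)},
		{Permalink: "https://example.com/drafts/", SitemapExclude: true},
		{Permalink: "https://example.com/about/"},
	}
}

func main() {
	out, err := buildSitemap(samplePages())
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

With a UTC time, the "-07:00" part of the layout renders as "+00:00", which is why the template needs safeHTML around that value.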
Hugo Sitemap Config
# hugo.toml -- Sitemap configuration
baseURL = "https://tutorials.dodatech.com"
[sitemap]
changefreq = "weekly"
filename = "sitemap.xml"
priority = 0.5
[params]
# Custom parameter: Hugo ignores it unless one of your templates reads it
sitemap_exclude_kinds = ["404", "robotsTXT"]
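Expected behavior: Hugo generates /sitemap.xml at the site root with a default change frequency of weekly and a default priority of 0.5. sitemap_exclude_kinds is a custom parameter, not a built-in Hugo setting, so it only keeps the 404 page and robots.txt out of the sitemap if your sitemap template checks it. After deploying, submit the sitemap URL in Google Search Console so it can be processed for indexing.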
Robots.txt Configuration
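Robots.txt tells search engines which URLs they may crawl and which to avoid. Hugo renders a layouts/robots.txt template only when enableRobotsTXT = true is set in hugo.toml.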
Hugo Robots.txt Template
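User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /pagefind/
Disallow: /tags/
Disallow: /categories/
# Matches every paginated archive page (/section/page/2/, /section/page/3/, ...)
Disallow: /*/page/
# Googlebot ignores Crawl-delay; Bing and some other crawlers honor it
Crawl-delay: 10
{{ if hugo.IsProduction -}}
Sitemap: {{ "sitemap.xml" | absURL }}
{{ end -}}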
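Expected behavior: Well-behaved crawlers fetch /robots.txt before crawling and follow the rules for their user agent. In production builds, the sitemap URL is included for discovery. Admin pages, API endpoints, and thin content pages (tags, categories, paginated archives) are excluded from crawling. Note that robots.txt controls crawling, not indexing: a blocked URL can still appear in search results (without a description) if other pages link to it, and a crawler cannot see a noindex tag on a page it is not allowed to fetch.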
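Allow and Disallow rules overlap here (Allow: / matches everything). Under Google's documented robots.txt rules, the matching rule with the longest path wins and Allow wins a tie; * matches any run of characters and a trailing $ anchors the end. The Go sketch below models that precedence for a single user-agent group so you can reason about a rule set before deploying it. It is a simplified model, not a full robots.txt parser, and the rule list (including the /tags/featured/ exception) is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Rule is one Allow or Disallow line from a single user-agent group.
type Rule struct {
	Allow bool
	Path  string
}

// patternToRegexp turns a robots.txt path pattern into an anchored regexp:
// "*" matches any run of characters and a trailing "$" pins the end of the path.
func patternToRegexp(pattern string) *regexp.Regexp {
	anchorEnd := strings.HasSuffix(pattern, "$")
	pattern = strings.TrimSuffix(pattern, "$")
	expr := "^" + strings.ReplaceAll(regexp.QuoteMeta(pattern), `\*`, ".*")
	if anchorEnd {
		expr += "$"
	}
	return regexp.MustCompile(expr)
}

// IsAllowed applies longest-match precedence: the matching rule with the longest
// pattern wins, Allow wins a tie, and a path that matches no rule is allowed.
func IsAllowed(rules []Rule, path string) bool {
	bestLen := -1
	allowed := true
	for _, r := range rules {
		if r.Path == "" || !patternToRegexp(r.Path).MatchString(path) {
			continue // an empty Disallow value matches nothing
		}
		n := len(r.Path)
		if n > bestLen || (n == bestLen && r.Allow) {
			bestLen, allowed = n, r.Allow
		}
	}
	return allowed
}

// sampleRules is a hypothetical rule set modeled on the template above.
func sampleRules() []Rule {
	return []Rule{
		{Allow: true, Path: "/"},
		{Allow: false, Path: "/admin/"},
		{Allow: false, Path: "/tags/"},
		{Allow: true, Path: "/tags/featured/"},
		{Allow: false, Path: "/*/page/"},
	}
}

func main() {
	rules := sampleRules()
	for _, p := range []string{"/docs/intro/", "/tags/go/", "/tags/featured/", "/blog/page/4/"} {
		fmt.Printf("%-18s allowed=%v\n", p, IsAllowed(rules, p))
	}
}
```

Because the longer Disallow: /tags/ beats Allow: /, tag pages are blocked, while the even longer hypothetical Allow: /tags/featured/ reopens one of them.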
Canonical URLs
Canonical URLs tell search engines which version of a page is the authoritative one, preventing duplicate content issues.
Canonical URL Implementation
{{ if .Params.canonicalURL -}}
<link rel="canonical" href="{{ .Params.canonicalURL }}">
{{ else -}}
<link rel="canonical" href="{{ .Permalink }}">
{{ end -}}
Expected behavior: Every page includes a <link rel="canonical"> tag pointing to its own URL. If a page has a canonicalURL frontmatter parameter (useful for syndicated content), that takes precedence.
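The precedence rule is small enough to express directly. The Go sketch below mirrors it with net/url and adds one check the template does not perform: it rejects a canonicalURL that is not an absolute http(s) URL. Google recommends absolute URLs for rel="canonical", and a relative value copied into syndicated front matter would resolve against the wrong host. The function name and example URLs are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// canonicalFor mirrors the template's precedence: a canonicalURL front matter
// value wins, otherwise the page's own permalink is used. It also rejects
// anything that is not an absolute http(s) URL.
func canonicalFor(permalink, override string) (string, error) {
	candidate := permalink
	if override != "" {
		candidate = override
	}
	u, err := url.Parse(candidate)
	if err != nil {
		return "", err
	}
	if (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
		return "", errors.New("canonical URL is not absolute: " + candidate)
	}
	return u.String(), nil
}

func main() {
	for _, override := range []string{"", "https://blog.example.org/original-post/", "/original-post/"} {
		got, err := canonicalFor("https://tutorials.example.com/guide/", override)
		fmt.Printf("override=%q -> %q (err: %v)\n", override, got, err)
	}
}
```

A check like this fits naturally into a pre-deploy script that reads front matter, so a bad override fails the build instead of silently pointing search engines elsewhere.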
Structured Data (JSON-LD)
Structured data helps search engines understand your content and generate rich results.
Article Schema for Tutorials
{{ if eq .Kind "page" -}}
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Article",
"headline": "{{ .Title }}",
"description": "{{ .Description }}",
"datePublished": "{{ .Date.Format "2006-01-02" }}",
"dateModified": "{{ .Lastmod.Format "2006-01-02" }}",
"author": {
"@type": "Organization",
"name": "DodaTech",
"url": "https://dodatech.com"
},
"publisher": {
"@type": "Organization",
"name": "DodaTech",
"logo": {
"@type": "ImageObject",
"url": "{{ "images/logo.png" | absURL }}"
}
},
"mainEntityOfPage": {
"@type": "WebPage",
"@id": "{{ .Permalink }}"
},
"image": {
"@type": "ImageObject",
"url": "{{ with .Params.image }}{{ . | absURL }}{{ else }}{{ "images/default-og.png" | absURL }}{{ end }}"
}{{/* emit keywords only when tags exist: delimit fails on a missing list */}}{{ with .Params.tags }},
"keywords": "{{ delimit . ", " }}"{{ end }}
}
</script>
{{ end -}}
Expected behavior: Google can read the JSON-LD and use it to show better title, image, and date information for the page in search results. Valid structured data makes a page eligible for enhanced results but does not guarantee them; check the markup with Google's Rich Results Test.
BreadcrumbList Schema
{{ if .Ancestors -}}
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "BreadcrumbList",
"itemListElement": [
{{ range $i, $page := .Ancestors.Reverse }}
{
"@type": "ListItem",
"position": {{ add $i 1 }},
"name": "{{ $page.Title }}",
"item": "{{ $page.Permalink }}"
}{{ if ne (add $i 1) (len $.Ancestors) }},{{ end }}
{{ end }}
]
}
</script>
{{ end -}}
Expected behavior: Google may show the breadcrumb trail in search results in place of the raw URL, which helps users understand the site hierarchy before they click.
SEO Tool Comparison
| Feature | Hugo | Yoast SEO (WordPress) | Rank Math (WordPress) |
|---|---|---|---|
| Sitemap | Built-in | Yes | Yes |
| Canonical URLs | Manual template | Automatic | Automatic |
| Structured data | Manual template | Automatic | Automatic |
| Meta tags | Frontmatter + manual template | Automatic | Automatic |
| Open Graph | Embedded or manual template | Automatic | Automatic |
| Robots.txt | Built-in (enableRobotsTXT) | Yes | Yes |
| Breadcrumbs | Manual template (.Ancestors) | Yes | Yes |
| Content analysis | No | Yes | Yes |
| Schema generator | No | Limited | Advanced |
Common Errors
1. Sitemap Exceeding 50,000 URLs
The sitemap protocol limits a single sitemap file to 50,000 URLs and 50 MB uncompressed; a file that exceeds either limit must be split. For larger sites, divide the URLs across several sitemaps (for example, by section or content type) and list them in a sitemap index file.
2. Missing lastmod or Incorrect Dates
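Google uses lastmod only when it is consistently and verifiably accurate. If every page reports a recent date regardless of real changes (for example, because lastmod is set to the build time on every deploy), Google may ignore the values and lose a useful signal for deciding what to recrawl. Make lastmod reflect when each page's content actually changed; in Hugo, enableGitInfo = true lets .Lastmod come from the last Git commit that touched each content file.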
3. noindex on Important Pages
Accidentally adding <meta name="robots" content="noindex"> to important pages prevents them from appearing in search results. Always verify that indexable pages are not marked as noindex.
{{ if .Params.noindex -}}
<meta name="robots" content="noindex, nofollow">
{{ else -}}
<meta name="robots" content="index, follow, max-image-preview:large">
{{ end -}}
4. Blocking CSS and JS in Robots.txt
Blocking CSS and JS files prevents Google from rendering the page correctly, potentially hurting rankings. Only block admin pages and API endpoints, never static assets.
5. Missing Hreflang Tags for Multilingual Sites
Without hreflang annotations, Google may show the wrong language version in search results. Use the xhtml:link element in the sitemap and <link rel="alternate"> tags in the page head.
{{ if .IsTranslated -}}
{{/* .AllTranslations includes the current page: every version must also list itself */}}
{{ range .AllTranslations -}}
<link rel="alternate" hreflang="{{ .Language.Lang }}" href="{{ .Permalink }}">
{{ end -}}
<link rel="alternate" hreflang="x-default" href="{{ .Site.BaseURL }}">
{{ end -}}
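Google's hreflang rules require each language version to list every version of the page, including itself, and the annotations must be reciprocal; if two pages do not point to each other, the tags may be ignored. Rendering the same complete set on every version satisfies both. The Go sketch below builds that set from a map of language code to URL. The function name, language codes, and URLs are hypothetical, and the x-default target is passed in explicitly rather than assumed to be the home page.

```go
package main

import (
	"fmt"
	"html"
	"sort"
	"strings"
)

// hreflangLinks renders one alternate link per language version (the current
// page included) plus an x-default entry. Codes are sorted for stable output.
func hreflangLinks(versions map[string]string, xDefault string) string {
	codes := make([]string, 0, len(versions))
	for code := range versions {
		codes = append(codes, code)
	}
	sort.Strings(codes)

	var b strings.Builder
	for _, code := range codes {
		fmt.Fprintf(&b, "<link rel=\"alternate\" hreflang=\"%s\" href=\"%s\">\n",
			html.EscapeString(code), html.EscapeString(versions[code]))
	}
	fmt.Fprintf(&b, "<link rel=\"alternate\" hreflang=\"x-default\" href=\"%s\">\n", html.EscapeString(xDefault))
	return b.String()
}

func main() {
	versions := map[string]string{
		"en": "https://example.com/en/guide/",
		"fr": "https://example.com/fr/guide/",
	}
	// The same output goes into the <head> of both the English and the French page.
	fmt.Print(hreflangLinks(versions, "https://example.com/en/guide/"))
}
```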
6. Over-Optimizing Title Tags
Google truncates long titles in search results based on display width, so titles much longer than about 60 characters are often cut off. Titles that are too short miss the chance to describe the page and include relevant keywords. Aim for roughly 50 to 60 characters, with the primary keyword near the beginning.
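Because truncation depends on pixel width, a character count is only a proxy, but it is easy to automate at build time. The Go sketch below checks the full rendered title, including a " | Site Title" suffix like the one the SEO partial in the mini project appends, against the 50 to 60 character range from the text. It counts runes rather than bytes so non-ASCII titles are measured by the characters readers see. The function names and sample titles are hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

const (
	minTitleChars = 50
	maxTitleChars = 60
)

// renderedTitle mirrors a "Page Title | Site Title" pattern; the suffix counts toward length.
func renderedTitle(pageTitle, siteTitle string) string {
	return pageTitle + " | " + siteTitle
}

// titleLengthStatus classifies a title by its character (rune) count, not its UTF-8 byte count.
func titleLengthStatus(title string) string {
	n := utf8.RuneCountInString(title)
	switch {
	case n > maxTitleChars:
		return "too long"
	case n < minTitleChars:
		return "short"
	default:
		return "ok"
	}
}

func main() {
	for _, page := range []string{
		"Sitemap & SEO for Static Sites",
		"Sitemap & SEO for Static Sites: A Practical Guide",
		"A Complete Guide to Sitemaps, Robots.txt, and Structured Data for Hugo Sites",
	} {
		full := renderedTitle(page, "DodaTech")
		fmt.Printf("%-8s %2d chars  %s\n", titleLengthStatus(full), utf8.RuneCountInString(full), full)
	}
}
```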
Practice Questions
1. How many URLs can a single sitemap contain and what should you do for larger sites?
A single sitemap can contain at most 50,000 URLs and be at most 50 MB uncompressed. For larger sites, create a sitemap index file that references multiple sitemap files, optionally divided by section or content type.
2. What is the purpose of the hreflang attribute in sitemaps and page headers?
hreflang tells search engines which language version of a page to show in search results for a given locale. Without it, multilingual sites risk showing the wrong language version to users.
3. How does a canonical URL prevent duplicate content problems?
When identical or very similar content appears at multiple URLs, the canonical tag tells search engines which URL is the preferred version. Search engines treat it as a strong hint rather than a directive and consolidate ranking signals, such as links, onto the chosen URL.
4. What is the difference between the index, follow and noindex, nofollow robots directives?
index, follow allows the page to be indexed and the links on it to be followed. noindex, nofollow prevents indexing and link following. Use noindex for thin content pages and admin pages; for duplicate content, a canonical tag is usually the better tool because it consolidates signals instead of discarding the page.
5. Challenge: Set up Google Search Console for a Hugo site and verify the sitemap is processed correctly.
Add the site to Google Search Console, verify ownership (for example with a DNS TXT record, an uploaded HTML file, or an HTML meta tag), submit the sitemap URL, and monitor the Page indexing report (formerly the Coverage report) for indexing errors.
Mini Project: Comprehensive SEO Setup for a Hugo Static Site
Implement a complete technical SEO foundation for a Hugo site:
- Sitemap: Customize the sitemap template to include priority, changefreq, lastmod, and hreflang for all pages
- Robots.txt: Create a robots.txt that allows crawling of content while excluding admin and thin content
- Canonical URLs: Add canonical link tags to every page with a frontmatter override option
- Structured data: Add Article schema to all tutorial pages and BreadcrumbList schema globally
- Open Graph: Add Open Graph and Twitter Card meta tags for social media previews
- Meta tags: Create a partial for dynamic title, description, and robots meta tags based on frontmatter
- Verification: Add Google Search Console and Bing Webmaster Tools verification meta tags
{{/* layouts/partials/seo.html - Complete SEO partial */}}
{{/* Title: the home page shows only the site title, avoiding "Site | Site" */}}
<title>{{ if and .Title (not .IsHome) }}{{ .Title }} | {{ end }}{{ .Site.Title }}</title>
<meta name="description" content="{{ .Description }}">
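{{/* Canonical, honoring the canonicalURL frontmatter override */}}
<link rel="canonical" href="{{ with .Params.canonicalURL }}{{ . }}{{ else }}{{ .Permalink }}{{ end }}">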
{{/* Robots */}}
{{ if .Params.noindex -}}
<meta name="robots" content="noindex, nofollow">
{{ else -}}
<meta name="robots" content="index, follow, max-image-preview:large">
{{ end -}}
{{/* Open Graph */}}
<meta property="og:title" content="{{ .Title }}">
<meta property="og:description" content="{{ .Description }}">
<meta property="og:url" content="{{ .Permalink }}">
<meta property="og:type" content="{{ if .IsPage }}article{{ else }}website{{ end }}">
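<meta property="og:site_name" content="{{ .Site.Title }}">
{{ $image := "images/default-og.png" | absURL -}}
{{ with .Params.image }}{{ $image = . | absURL }}{{ end -}}
<meta property="og:image" content="{{ $image }}">
{{/* Twitter Cards: summary_large_image needs an image */}}
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="{{ .Title }}">
<meta name="twitter:description" content="{{ .Description }}">
<meta name="twitter:image" content="{{ $image }}">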
{{/* Structured Data */}}
{{ partial "schema.html" . }}
{{ partial "breadcrumb-schema.html" . }}
{{/* Verification: emitted only when the site params are set */}}
{{ with .Site.Params.googleVerification }}<meta name="google-site-verification" content="{{ . }}">{{ end }}
{{ with .Site.Params.bingVerification }}<meta name="msvalidate.01" content="{{ . }}">{{ end }}
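Test the implementation by:
- Running hugo server and inspecting the HTML source for all SEO tags (hugo server uses the development environment by default, so add --environment production to see the Sitemap line in robots.txt)
- Submitting the sitemap to Google Search Console
- Using Google's Rich Results Test tool to verify structured data
- Checking the robots.txt at /robots.txt
- Verifying social media previews with an Open Graph debugger such as Facebook's Sharing Debugger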
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.