Build a Link Preview Service with Node.js
In this tutorial, you'll learn how to build a link preview service with Node.js, covering the key concepts, practical examples, and best practices along the way.
Build a link preview service with Node.js that scrapes Open Graph metadata from URLs, generates rich preview cards, and caches results in Redis for fast repeated lookups.
What You'll Build
You'll build an API that accepts any URL and returns a preview object containing the page title, description, image, and site name. The results are cached in Redis, so subsequent requests for the same URL return almost immediately. Social media platforms like Twitter and Facebook use the same technique when you paste a link.
Why Link Previews Matter
Every time you paste a URL into Slack, WhatsApp, or a CMS, the app fetches a preview card automatically. Building your own preview service teaches you HTTP scraping, HTML parsing, Open Graph protocol handling, and caching strategy. Security teams use similar scraping to inspect links before users click them; Doda Browser's URL safety checker uses the same pattern to analyse links before loading.
Prerequisites
- Node.js 18+ installed
- Basic Express.js knowledge
- Redis installed locally or via Docker
Step 1: Setup
mkdir link-previewer
cd link-previewer
npm init -y
npm install express cheerio axios ioredis helmet
cheerio parses HTML like jQuery on the server. ioredis is a robust Redis client. helmet adds security headers to protect against common web vulnerabilities.
Step 2: Open Graph Scraper
// scraper.js
const axios = require('axios');
const cheerio = require('cheerio');

async function scrape(url) {
  const response = await axios.get(url, {
    timeout: 5000,
    headers: { 'User-Agent': 'LinkPreviewer/1.0' },
    validateStatus: status => status < 400
  });
  const $ = cheerio.load(response.data);
  const meta = {};

  // Prefer Open Graph tags, then fall back to standard HTML metadata.
  // .first() avoids concatenating extra <title> elements (e.g. inside inline SVG).
  meta.title = $('meta[property="og:title"]').attr('content')
    || $('title').first().text().trim()
    || '';
  meta.description = $('meta[property="og:description"]').attr('content')
    || $('meta[name="description"]').attr('content')
    || '';
  meta.image = $('meta[property="og:image"]').attr('content')
    || $('meta[property="og:image:url"]').attr('content')
    || '';
  meta.siteName = $('meta[property="og:site_name"]').attr('content')
    || new URL(url).hostname;
  meta.url = url;
  return meta;
}

module.exports = scrape;
We prioritize Open Graph tags because they're specifically designed for link previews, and social media platforms all use them. If OG tags are missing, we fall back to <title> and <meta name="description">. This fallback chain means most pages yield a usable title and description; a field is an empty string only when neither source exists.
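One gap the fallback chain does not cover: some pages set og:image to a relative path such as /img/cover.png, which a client on another origin cannot load. A minimal sketch of one way to handle this, assuming a hypothetical helper named resolveImage (it is not part of scraper.js above), resolves the value against the page URL with Node's built-in URL class:

```javascript
// resolveImage is a hypothetical helper, not part of scraper.js above.
// It turns a possibly-relative og:image value into an absolute URL.
function resolveImage(image, pageUrl) {
  if (!image) return '';
  try {
    // The second argument is the base used for relative inputs;
    // absolute inputs keep their own origin.
    return new URL(image, pageUrl).href;
  } catch {
    return ''; // unparsable value: treat it as "no image"
  }
}
```

Inside scrape, this could be applied as meta.image = resolveImage(meta.image, url) just before returning.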
Step 3: Caching Layer and Server
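// server.js
const express = require('express');
const helmet = require('helmet');
const Redis = require('ioredis');
const scrape = require('./scraper');

const app = express();
const redis = new Redis(); // defaults to localhost:6379
const CACHE_TTL = 3600; // 1 hour in seconds

// Helmet's default Content-Security-Policy blocks inline scripts and
// third-party images, both of which the Step 4 demo page relies on,
// so those directives are relaxed here for the demo.
app.use(helmet({
  contentSecurityPolicy: {
    directives: {
      'script-src': ["'self'", "'unsafe-inline'"],
      'script-src-attr': ["'unsafe-inline'"],
      'img-src': ["'self'", 'data:', 'https:'],
    },
  },
}));

// Serve public/index.html (Step 4) from the same origin as the API
app.use(express.static('public'));

app.get('/api/preview', async (req, res) => {
  const { url } = req.query;
  if (!url || typeof url !== 'string') {
    return res.status(400).json({ error: 'url query parameter required' });
  }

  // Validate URL format
  let parsed;
  try {
    parsed = new URL(url);
  } catch {
    return res.status(400).json({ error: 'Invalid URL format' });
  }

  // Block dangerous URL schemes (file:, ftp:, data:, ...) by allowing only
  // http: and https:. The URL parser lowercases the protocol, so "FILE:"
  // cannot slip through the way it could with a startsWith() check.
  if (!['http:', 'https:'].includes(parsed.protocol)) {
    return res.status(400).json({ error: 'URL scheme not allowed' });
  }

  try {
    // Check cache first
    const cached = await redis.get(`preview:${url}`);
    if (cached) {
      return res.json({ ...JSON.parse(cached), cached: true });
    }

    // Scrape and cache
    const preview = await scrape(url);
    await redis.set(`preview:${url}`, JSON.stringify(preview), 'EX', CACHE_TTL);
    res.json({ ...preview, cached: false });
  } catch (err) {
    // axios reports a timeout with the ECONNABORTED error code
    if (err.code === 'ECONNABORTED') {
      return res.status(504).json({ error: 'Request timeout' });
    }
    res.status(502).json({ error: 'Failed to fetch URL' });
  }
});

const PORT = process.env.PORT || 4000;
app.listen(PORT, () => console.log(`Link previewer running on port ${PORT}`));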
The server validates the URL, blocks dangerous schemes like file:// (which could leak local files), checks the Redis cache first, scrapes on a cache miss, then stores the result for future requests. The helmet middleware sets security headers that help mitigate attacks such as clickjacking and XSS.
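Because the cache key is built from the raw query string, https://Example.com and https://example.com/ end up as two separate cache entries for the same page. A small sketch of one way to normalise the key, assuming a hypothetical helper named cacheKey (the preview: prefix matches the server code; the helper itself is not part of it):

```javascript
// cacheKey is a hypothetical helper; only the 'preview:' prefix comes
// from server.js.
function cacheKey(rawUrl) {
  const u = new URL(rawUrl); // throws on invalid input, so validate first
  u.hash = '';               // fragments are never sent to the target server
  // href is the parser's normalised form: lowercase scheme and host,
  // and a trailing slash added to a bare origin.
  return `preview:${u.href}`;
}
```

Query strings are deliberately left alone here, since they often select different content.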
Step 4: Frontend Preview Card
<!-- public/index.html -->
<!DOCTYPE html>
<html>
<head>
  <title>Link Preview Demo</title>
  <style>
    * { margin: 0; padding: 0; box-sizing: border-box; }
    body { font-family: system-ui; max-width: 600px; margin: 0 auto; padding: 20px; }
    input, button { padding: 10px; font-size: 16px; }
    input { flex: 1; border: 1px solid #ddd; border-radius: 6px 0 0 6px; }
    button { background: #007bff; color: white; border: none; border-radius: 0 6px 6px 0; cursor: pointer; }
    .card { margin-top: 20px; border: 1px solid #ddd; border-radius: 12px; overflow: hidden; }
    .card img { width: 100%; height: 200px; object-fit: cover; }
    .card-body { padding: 16px; }
    .card-body h2 { font-size: 18px; margin-bottom: 8px; }
    .card-body p { color: #555; font-size: 14px; line-height: 1.5; }
    .card-footer { padding: 8px 16px; background: #f5f5f5; font-size: 12px; color: #888; }
  </style>
</head>
<body>
  <h1>Link Preview API</h1>
  <div style="display: flex; margin-top: 16px">
    <input type="url" id="urlInput" placeholder="Paste a URL..." value="https://example.com">
    <button onclick="preview()">Preview</button>
  </div>
  <div id="result"></div>
  <script>
    // Scraped values come from arbitrary third-party pages, so escape
    // them before placing them into innerHTML.
    function escapeHtml(value) {
      return String(value ?? '').replace(/[&<>"']/g, ch => ({
        '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;'
      }[ch]));
    }

    async function preview() {
      const result = document.getElementById('result');
      const url = document.getElementById('urlInput').value;
      const res = await fetch(`/api/preview?url=${encodeURIComponent(url)}`);
      const data = await res.json();
      if (data.error) {
        result.innerHTML =
          `<div style="color: red; margin-top: 12px">${escapeHtml(data.error)}</div>`;
        return;
      }
      result.innerHTML = `
        <div class="card">
          ${data.image ? `<img src="${escapeHtml(data.image)}" alt="" loading="lazy">` : ''}
          <div class="card-body">
            <h2>${escapeHtml(data.title)}</h2>
            <p>${escapeHtml(data.description)}</p>
          </div>
          <div class="card-footer">
            ${escapeHtml(data.siteName)} ${data.cached ? '• Cached' : ''}
          </div>
        </div>
      `;
    }

    preview();
  </script>
</body>
</html>
Expected output: Paste a URL like https://github.com and the page fetches the preview and displays a card with the site's title, description, and OG image. Paste the same URL again and you'll see "Cached" in the footer. With a local Redis instance, response time typically drops from around 500ms to under 5ms.
Architecture
sequenceDiagram
Client->>API: GET /api/preview?url=https://example.com
API->>Redis: GET preview:https://example.com
alt Cache Hit
Redis-->>API: Cached Data
API-->>Client: Preview (cached: true)
else Cache Miss
API->>example.com: HTTP GET with User-Agent
example.com-->>API: HTML Page
API->>Cheerio: Parse Open Graph Tags
Cheerio-->>API: Extracted Metadata
API->>Redis: SET preview:url EX 3600
API-->>Client: Preview (cached: false)
end
Common Errors
1. Timeout on large pages
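Some pages load slowly or serve huge HTML. The scraper sets the axios timeout to 5 seconds, so pages that don't respond in time return a 504 error rather than hanging your server. The timeout does not limit response size; axios's maxContentLength option can cap that separately.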
2. SSRF vulnerabilities
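Without validation, an attacker can make your server request internal addresses like http://localhost:3000/admin. Block private IP ranges and dangerous schemes. Our code blocks file:, ftp:, and data: URLs, but it does not yet block private IP ranges, so a request for an internal http:// address would still be fetched.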
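The private-range check mentioned above can be sketched with Node's built-in net and dns modules. The helper names isPrivateAddress and assertPublicHost are hypothetical, and the range list is illustrative rather than exhaustive:

```javascript
const net = require('node:net');
const dns = require('node:dns').promises;

// Hypothetical helper: true for loopback, private, and link-local
// addresses. Illustrative list only; it errs on the side of blocking.
function isPrivateAddress(ip) {
  if (net.isIPv4(ip)) {
    const [a, b] = ip.split('.').map(Number);
    return (
      a === 0 || a === 10 || a === 127 ||   // "this network", 10/8, loopback
      (a === 169 && b === 254) ||           // link-local 169.254/16
      (a === 172 && b >= 16 && b <= 31) ||  // 172.16/12
      (a === 192 && b === 168)              // 192.168/16
    );
  }
  if (net.isIPv6(ip)) {
    const lower = ip.toLowerCase();
    return (
      lower === '::' || lower === '::1' ||  // unspecified, loopback
      /^f[cd]/.test(lower) ||               // fc00::/7 (prefix match, may over-block)
      /^fe[89ab]/.test(lower) ||            // fe80::/10 link-local
      lower.startsWith('::ffff:')           // IPv4-mapped: blocked conservatively
    );
  }
  return true; // not an IP literal: reject to be safe
}

// Hypothetical helper: resolve the hostname and reject if any resolved
// address is private. URL.hostname wraps IPv6 literals in brackets,
// so strip them before the lookup.
async function assertPublicHost(hostname) {
  const host = hostname.replace(/^\[|\]$/g, '');
  const addresses = await dns.lookup(host, { all: true });
  if (addresses.some(({ address }) => isPrivateAddress(address))) {
    throw new Error('Target resolves to a private address');
  }
}
```

In the route handler this could run as await assertPublicHost(new URL(url).hostname) before scraping. It is only a first line of defence: redirects and DNS answers that change between the check and the actual request can still reach internal hosts, so redirect targets need the same check.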
3. Redis connection refused
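If Redis isn't running, ioredis reports connection errors and keeps retrying in the background, and commands issued in the meantime wait and eventually reject. Start Redis with redis-server or use Docker: docker run -p 6379:6379 redis:7. Handle connection errors with a fallback that skips caching.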
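A fallback that skips caching might look like the sketch below. getPreview is a hypothetical wrapper, not part of server.js; it assumes cache is any object with get(key) and set(key, value, 'EX', seconds), which an ioredis client provides, and scrape is the function from Step 2:

```javascript
// getPreview is a hypothetical wrapper around the cache-then-scrape flow.
// Cache failures are logged and ignored; scrape failures still propagate.
async function getPreview(url, { cache, scrape, ttl = 3600 }) {
  const key = `preview:${url}`;

  try {
    const hit = await cache.get(key);
    if (hit) return { ...JSON.parse(hit), cached: true };
  } catch (err) {
    console.warn('Cache read failed, scraping directly:', err.message);
  }

  const preview = await scrape(url);

  try {
    await cache.set(key, JSON.stringify(preview), 'EX', ttl);
  } catch (err) {
    console.warn('Cache write failed:', err.message);
  }

  return { ...preview, cached: false };
}
```

With ioredis specifically, commands issued while disconnected are queued and retried before they reject, so this fallback only kicks in quickly if the client is configured to fail fast, for example through its maxRetriesPerRequest option.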
Practice Questions
1. Why do we prioritize Open Graph tags over regular meta tags?
OG tags are explicitly designed for link previews. They contain curated content — a page's <title> might be "Home" while its og:title is "DodaTech - Security Tools for Everyone". Social media platforms and messaging apps all use OG tags first.
2. What is the purpose of the User-Agent header in the scraper?
Some servers block requests without a User-Agent or return different content for bots vs browsers. Setting a descriptive User-Agent identifies the scraper honestly, so site operators can recognise its traffic instead of mistaking it for an anonymous bot.
3. How does Redis caching improve performance?
Without caching, every request requires an HTTP request to the target site (500ms-2000ms). With Redis, repeated lookups return in under 5ms. The 1-hour TTL limits how stale a preview can get while dramatically reducing latency.
4. Challenge: Rate Limiting with Redis
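Add a rate limiter that allows 10 requests per minute per IP. Use redis.incr on a per-IP key, set a 60-second expiry with redis.expire when the counter is first created, and return 429 once the count exceeds 10. Calling redis.setex on every request would reset the expiry each time, so the window would never close. The limiter prevents abuse of your preview service.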
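One possible shape for a solution is a fixed-window counter. In this sketch, rateLimit is a hypothetical Express middleware factory, and store is assumed to expose incr(key) and expire(key, seconds), as an ioredis client does; the ratelimit: key prefix is also an assumption:

```javascript
// rateLimit is a hypothetical middleware factory implementing a
// fixed-window counter per client IP.
function rateLimit(store, { limit = 10, windowSeconds = 60 } = {}) {
  return async (req, res, next) => {
    try {
      const key = `ratelimit:${req.ip}`;
      const count = await store.incr(key);
      // First request in the window: start the countdown.
      if (count === 1) await store.expire(key, windowSeconds);
      if (count > limit) {
        return res.status(429).json({ error: 'Too many requests' });
      }
      next();
    } catch (err) {
      next(err); // let the app's error handler decide what to do
    }
  };
}
```

It could be mounted with app.use('/api/preview', rateLimit(redis)). The two Redis calls are not atomic, so a crash between them would leave a counter without an expiry; a production version might wrap them in a transaction or script.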
Next Steps
- Add Puppeteer integration for JavaScript-rendered pages
- Explore API security best practices for rate limiting and authentication
- Try building the API Client with Electron for a desktop tool that consumes APIs
- Learn about Redis caching patterns for high-performance applications
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro