Build a Collaborative Editor with CRDTs (Step by Step)
Build a real-time collaborative text editor using Conflict-Free Replicated Data Types (CRDTs) with WebSockets that lets multiple users edit the same document simultaneously without merge conflicts or a central lock.
What You'll Build
You'll implement a simple CRDT-based text editor where each keystroke becomes an operation with a unique identifier. Operations from different users are merged automatically — no "locking," no "last writer wins," no merge conflicts. Two users typing at the same character position both see their text preserved in a deterministic order.
Why CRDTs Matter
Traditional collaborative editors typically rely on Operational Transformation (OT), the approach behind Google Docs, which in its common deployments needs a central server to transform concurrent operations against each other. CRDTs take a different approach: operations are designed to commute, so every replica can apply them in any order and still converge to the same state. This makes CRDTs a good fit for peer-to-peer editing, offline-first apps, and distributed databases. At DodaTech, CRDT-like merge strategies help Doda Browser synchronize bookmarks and settings across devices without data loss.
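To make "operations commute" concrete, here is a minimal sketch using a grow-only set, one of the simplest CRDTs. The GSet class and its method names are illustrative only and are not part of the editor built in this tutorial.

```python
class GSet:
    """Grow-only set: the only operation is add, and merge is set union."""

    def __init__(self):
        self.items = set()

    def add(self, item):
        self.items.add(item)

    def merge(self, other):
        # Union is commutative, associative, and idempotent,
        # so merge order and repeated merges do not matter.
        self.items |= other.items


a, b = GSet(), GSet()
a.add("x")
b.add("y")

# Replica 1 merges a then b; replica 2 merges b, then a twice.
r1, r2 = GSet(), GSet()
r1.merge(a)
r1.merge(b)
r2.merge(b)
r2.merge(a)
r2.merge(a)

print(r1.items == r2.items)  # True
```

Both replicas hold the same two items even though they merged in a different order and one of them merged a duplicate.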
Prerequisites
- Python 3.10+ installed
- Basic JavaScript for the browser client
- Familiarity with WebSocket basics from the chat app tutorial
- FastAPI and uvicorn installed
Step 1: Project Setup
mkdir crdt-editor && cd crdt-editor
python -m venv venv
source venv/bin/activate
pip install fastapi uvicorn websockets
Create this structure:
crdt-editor/
├── server.py # FastAPI + WebSocket server
├── crdt.py # CRDT data type implementation
├── static/
│ └── editor.html # Browser-based editor
Step 2: Understanding the CRDT Approach
We'll implement a simple sequence CRDT that only ever grows: characters are never removed from the underlying list, and a deleted character is kept as a tombstone (a marker that hides it from the visible text). Every character inserted into the document gets:
- A unique ID combining a site ID (unique per user) and a local counter
- A position in the document, represented as a fractional index between two neighbors
When two users insert at the same position, one gets a slightly different fractional index. Since both indices are unique and ordered, the document converges to the same sequence on all replicas.
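The fractional-index idea can be sketched as follows. This is an illustration under stated assumptions, not the code used in the rest of the tutorial: the sentinel positions 0 and 1, the position_between helper, and the use of the site ID as a tie-breaker when two users compute the same fraction are all choices made for this example.

```python
from fractions import Fraction


def position_between(left, right):
    """Return a position strictly between two neighbouring positions."""
    return (left + right) / 2


# Assumed sentinel positions for the start and end of the document.
START, END = Fraction(0), Fraction(1)

# "A" and "C" are already in the document.
pos_a = position_between(START, END)   # 1/2
pos_c = position_between(pos_a, END)   # 3/4

# Two users concurrently insert between "A" and "C". Both compute 5/8,
# so the site ID is appended to the key to break the tie.
key_alice = (position_between(pos_a, pos_c), "alice")
key_bob = (position_between(pos_a, pos_c), "bob")

entries = [
    ((pos_a, ""), "A"),
    ((pos_c, ""), "C"),
    (key_bob, "Y"),
    (key_alice, "X"),
]

# Every replica sorts by the same key, so every replica sees the same text.
text = "".join(value for _, value in sorted(entries, key=lambda e: e[0]))
print(text)  # AXYC
```

Exact fractions avoid the precision limits of floats, at the cost of keys that grow as more characters are squeezed into the same gap.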
# crdt.py
import itertools


class Char:
    def __init__(self, id_, value, is_deleted=False):
        self.id = id_
        self.value = value
        self.is_deleted = is_deleted

    def __repr__(self):
        return self.value if not self.is_deleted else ""


class CRDT:
    def __init__(self, site_id):
        self.site_id = site_id
        self.counter = itertools.count()
        self.chars = []

    def generate_id(self):
        return (next(self.counter), self.site_id)

    def insert(self, index, value, id_=None):
        if id_ is None:
            id_ = self.generate_id()
        char = Char(id_, value)
        self.chars.insert(index, char)
        return id_

    def delete(self, index):
        if 0 <= index < len(self.chars):
            self.chars[index].is_deleted = True

    def get_text(self):
        return "".join(c.value for c in self.chars if not c.is_deleted)
Each Char stores an ID tuple (counter, site_id) that is globally unique. The counter is per-site, and site_id distinguishes which replica created it.
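A small standalone sketch of how these ID tuples behave across two sites. The make_id_generator helper and the site names are hypothetical; they mirror generate_id without needing the full class.

```python
import itertools


def make_id_generator(site_id):
    """Return a function that yields (counter, site_id) tuples for one site."""
    counter = itertools.count()
    return lambda: (next(counter), site_id)


next_id_a = make_id_generator("site-a")
next_id_b = make_id_generator("site-b")

ids = [next_id_b(), next_id_a(), next_id_b(), next_id_a()]

# Both sites start counting at 0, but the site_id keeps the pairs distinct.
print(len(set(ids)) == len(ids))  # True

# Tuples compare element by element: counter first, then site_id.
print(sorted(ids))
```

Because the counter is compared first, an ID order like this follows creation order within each counter value, with the site ID deciding ties.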
Step 3: Merging Operations from Multiple Clients
The critical property: when we receive operations from another client, we insert each character at the correct position based on its ID, ensuring all replicas converge:
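# crdt.py, inside class CRDT
    def merge_insert(self, char):
        # Ignore an operation that was already applied (same ID seen before).
        if any(c.id == char.id for c in self.chars):
            return
        pos = 0
        while pos < len(self.chars):
            existing = self.chars[pos]
            if char.id < existing.id:
                self.chars.insert(pos, char)
                return
            pos += 1
        self.chars.append(char)

    def merge_delete(self, char_id):
        # Deleting by ID is idempotent: a second delete finds nothing to do.
        for char in self.chars:
            if char.id == char_id and not char.is_deleted:
                char.is_deleted = True
                break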
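The merge rule uses ID ordering: characters with smaller (counter, site_id) tuples appear earlier. Since IDs are unique and totally ordered, every replica that places characters with merge_insert ends up with the same sequence. Note that ordering by ID alone sorts characters by creation order, not by where the user typed them; to preserve intended positions, the sort key needs a position component such as the fractional index described in Step 2.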
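The convergence claim can be checked by brute force. The sketch below is a simplified, list-based stand-in for merge_insert (plain tuples instead of Char objects, and a duplicate check added) that applies the same four operations in every possible arrival order.

```python
import itertools


def merge_insert(chars, char_id, value):
    """Insert (char_id, value) so the list stays sorted by char_id."""
    pos = 0
    while pos < len(chars) and chars[pos][0] < char_id:
        pos += 1
    if pos < len(chars) and chars[pos][0] == char_id:
        return  # already applied; ignore the duplicate
    chars.insert(pos, (char_id, value))


# Site "a" types "H" then "i"; site "b" types "W" then "o".
ops = [((0, "a"), "H"), ((1, "a"), "i"), ((0, "b"), "W"), ((1, "b"), "o")]

results = set()
for order in itertools.permutations(ops):
    replica = []
    for char_id, value in order:
        merge_insert(replica, char_id, value)
    results.add("".join(value for _, value in replica))

print(results)  # {'HWio'}
```

All 24 arrival orders produce the same text, which is the convergence property. The result also shows the two users' characters interleaved, because an order based only on (counter, site_id) follows creation order rather than typing position.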
Step 4: Operation Log
We need to broadcast every operation to all connected clients. Each operation is a JSON message:
import json

# crdt.py, inside class CRDT
    def insert_op(self, index, value):
        id_ = self.generate_id()
        self.insert(index, value, id_)
        return {"type": "insert", "id": id_, "value": value, "index": index}

    def delete_op(self, index):
        if 0 <= index < len(self.chars):
            char_id = self.chars[index].id
            self.delete(index)
            return {"type": "delete", "id": char_id}
        return None
Each operation carries enough information for any replica to apply it locally and converge to the same state.
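One detail worth seeing in isolation: JSON has no tuple type, so an ID tuple arrives on the other side as a list. A minimal sketch of the round trip, using a made-up operation:

```python
import json

op = {"type": "insert", "id": (3, "site-a"), "value": "x", "index": 0}

wire = json.dumps(op)          # what goes over the WebSocket
received = json.loads(wire)    # what the other replica sees

print(type(received["id"]).__name__)  # list
print(received["id"] == op["id"])     # False: a list never equals a tuple

# Convert back so the ID compares and hashes like locally generated IDs.
received["id"] = tuple(received["id"])
print(received["id"] == op["id"])     # True
```

Skipping the conversion tends to surface later as IDs that never match during merge_delete or duplicate checks.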
Step 5: The WebSocket Server
The server runs one CRDT instance per document and broadcasts operations to all connected clients:
# server.py
import json

from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from fastapi.responses import HTMLResponse

from crdt import CRDT

app = FastAPI()
clients = set()
doc = CRDT(site_id="server")


@app.get("/")
async def get():
    with open("static/editor.html") as f:
        return HTMLResponse(f.read())


@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    clients.add(websocket)
    try:
        while True:
            data = await websocket.receive_text()
            op = json.loads(data)
            if op["type"] == "insert":
                # The browser client sends no ID, so let the server assign one.
                id_ = tuple(op["id"]) if "id" in op else None
                op["id"] = doc.insert(op["index"], op["value"], id_)
            elif op["type"] == "delete":
                doc.delete(op["index"])
            # Iterate over a copy: the set can change while we await.
            for client in list(clients):
                if client != websocket:
                    await client.send_json(op)
    except WebSocketDisconnect:
        clients.discard(websocket)
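The handler above passes the browser's index straight to doc.insert and doc.delete. Because tombstones stay in doc.chars, an index into the visible text and an index into the list can drift apart once anything has been deleted. A sketch of the translation, using a hypothetical helper named visible_to_internal and plain (value, is_deleted) pairs in place of Char objects:

```python
def visible_to_internal(chars, visible_index):
    """Map an index into the visible text to an index into the full list.

    chars is a list of (value, is_deleted) pairs.
    Returns len(chars) when visible_index is past the last visible character.
    """
    seen = 0
    for i, (_, is_deleted) in enumerate(chars):
        if is_deleted:
            continue
        if seen == visible_index:
            return i
        seen += 1
    return len(chars)


chars = [("a", False), ("b", True), ("c", False)]  # visible text: "ac"

print(visible_to_internal(chars, 0))  # 0 -> "a"
print(visible_to_internal(chars, 1))  # 2 -> "c", skipping the tombstone
print(visible_to_internal(chars, 2))  # 3 -> append at the end
```

Applying a mapping like this before calling insert or delete keeps the server's list aligned with what users see in the textarea.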
When a client connects, they receive the full document state for initial sync:
@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    clients.add(websocket)
    # Send full document to new client
    sync_msg = {
        "type": "sync",
        "chars": [{"id": c.id, "value": c.value, "is_deleted": c.is_deleted}
                  for c in doc.chars]
    }
    await websocket.send_json(sync_msg)
    # ... rest of handler
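A replica that wants to merge later operations by ID needs to keep the IDs and tombstones from the sync message, not just the visible text. The sketch below shows one way to build and consume such a message; build_sync and apply_sync are hypothetical helper names, and characters are modelled as (id, value, is_deleted) triples rather than Char objects.

```python
import json


def build_sync(chars):
    """Build a sync message from (id, value, is_deleted) triples."""
    return {
        "type": "sync",
        "chars": [
            {"id": list(id_), "value": value, "is_deleted": deleted}
            for id_, value, deleted in chars
        ],
    }


def apply_sync(message):
    """Rebuild replica state and visible text from a sync message."""
    chars = [
        (tuple(c["id"]), c["value"], c["is_deleted"])
        for c in message["chars"]
    ]
    text = "".join(value for _, value, deleted in chars if not deleted)
    return chars, text


server_chars = [
    ((0, "a"), "H", False),
    ((1, "a"), "x", True),   # tombstone: hidden from the text, kept in state
    ((2, "a"), "i", False),
]

wire = json.dumps(build_sync(server_chars))
chars, text = apply_sync(json.loads(wire))

print(text)                    # Hi
print(chars == server_chars)   # True
```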
Step 6: The Browser Client
The editor captures every keystroke, sends it to the server, and applies incoming operations from other users:
<!-- static/editor.html -->
<!DOCTYPE html>
<html>
<head>
  <title>CRDT Collaborative Editor</title>
  <style>
    * { margin: 0; padding: 0; box-sizing: border-box; }
    body { font-family: 'Courier New', monospace; max-width: 900px; margin: 0 auto; padding: 20px; }
    h1 { margin-bottom: 12px; font-size: 1.3em; }
    textarea { width: 100%; height: 500px; font-size: 16px; padding: 16px; border: 1px solid #ddd; border-radius: 8px; resize: vertical; }
    .status { margin-top: 8px; font-size: 0.85em; color: #888; }
    .users { margin-bottom: 12px; font-size: 0.9em; color: #555; }
  </style>
</head>
<body>
  <h1>Collaborative Editor</h1>
  <div class="users" id="userCount">Connected: 1</div>
  <textarea id="editor" placeholder="Start typing..."></textarea>
  <div class="status" id="status">Connected</div>
  <script>
    const ws = new WebSocket(`ws://${location.host}/ws`);
    const editor = document.getElementById('editor');
    // True while a remote change is being written into the textarea.
    let isLocalChange = false;
    ws.onopen = () => document.getElementById('status').textContent = 'Connected';
    ws.onmessage = function(event) {
      const data = JSON.parse(event.data);
      if (data.type === 'sync') {
        let text = '';
        for (const char of data.chars) {
          if (!char.is_deleted) text += char.value;
        }
        isLocalChange = true;
        editor.value = text;
        isLocalChange = false;
        return;
      }
      if (data.type === 'insert') {
        isLocalChange = true;
        const before = editor.value.substring(0, data.index);
        const after = editor.value.substring(data.index);
        editor.value = before + data.value + after;
        isLocalChange = false;
      }
      if (data.type === 'delete') {
        isLocalChange = true;
        const before = editor.value.substring(0, data.index);
        const after = editor.value.substring(data.index + 1);
        editor.value = before + after;
        isLocalChange = false;
      }
    };
    editor.addEventListener('input', function(e) {
      // Only forward typed text; deletions are sent by the keydown listener.
      if (isLocalChange || !e.data) return;
      ws.send(JSON.stringify({
        type: 'insert',
        index: e.target.selectionStart - e.data.length,
        value: e.data
      }));
    });
    editor.addEventListener('keydown', function(e) {
      if (e.key === 'Backspace' && !isLocalChange) {
        const pos = editor.selectionStart;
        if (pos > 0) {
          ws.send(JSON.stringify({ type: 'delete', index: pos - 1 }));
        }
      }
    });
    ws.onclose = () => document.getElementById('status').textContent = 'Disconnected';
  </script>
</body>
</html>
Start the server with uvicorn server:app --reload, then open http://localhost:8000 in two browser tabs side by side. Type in one tab and the text appears in the other tab almost immediately. Try typing at the exact same position in both tabs: both characters appear without overwriting each other.
Architecture
flowchart LR
C1[Client 1] -->|Insert op| WS[WebSocket Server]
C2[Client 2] -->|Insert op| WS
C3[Client 3] -->|Insert op| WS
WS -->|Broadcast ops| C1
WS -->|Broadcast ops| C2
WS -->|Broadcast ops| C3
subgraph Server
CRDT[CRDT Document State]
WS --> CRDT
CRDT --> WS
end
Common Errors
1. Characters appear in wrong order on different clients
This happens when the ID ordering rule is not applied uniformly. Ensure every client compares IDs as (counter, site_id) tuples. Python tuple comparison handles this correctly — just make sure both elements are the same type.
2. Text cursor jumps to the end on every remote edit
The browser client resets cursor position when it replaces editor.value. Save and restore the cursor position before and after applying remote changes, or use a contenteditable div with finer-grained DOM updates.
3. Duplicate characters on merge
If the merge_insert function does not check for duplicate IDs before inserting, the same character from two broadcast paths gets inserted twice. Always check if a character with that ID already exists before inserting.
4. Site IDs collide
If two clients generate the same site_id, their counter sequences collide and IDs are no longer unique. Use a UUID or a server-assigned ID for each client connection.
5. Tombstones accumulate indefinitely
Deleted characters are marked with is_deleted = True but stay in the list. For long-lived documents, memory usage keeps growing. Implement periodic garbage collection: once all clients have acknowledged a deletion, remove the tombstoned character entirely.
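The garbage-collection idea from item 5 might look like the sketch below. It assumes something this tutorial does not build: a record of which sites have acknowledged each deletion. The collect_garbage function, the acked_by mapping, and the all_sites set are hypothetical names for that bookkeeping.

```python
def collect_garbage(chars, acked_by, all_sites):
    """Drop tombstones whose deletion every site has acknowledged.

    chars: list of dicts with "id", "value", and "is_deleted".
    acked_by: dict mapping a char ID to the set of sites that acked its delete.
    all_sites: set of every site ID currently sharing the document.
    """
    return [
        c for c in chars
        if not (c["is_deleted"] and acked_by.get(c["id"], set()) >= all_sites)
    ]


chars = [
    {"id": (0, "a"), "value": "H", "is_deleted": False},
    {"id": (1, "a"), "value": "x", "is_deleted": True},
    {"id": (2, "a"), "value": "y", "is_deleted": True},
]
acked_by = {(1, "a"): {"a", "b"}, (2, "a"): {"a"}}
all_sites = {"a", "b"}

kept = collect_garbage(chars, acked_by, all_sites)
print([c["id"] for c in kept])  # [(0, 'a'), (2, 'a')]
```

The tombstone for (2, "a") survives because site "b" has not acknowledged it yet; removing it early would let a late or replayed insert bring the character back.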
Practice Questions
1. Why do CRDTs not need a central lock for concurrent edits?
Each operation carries a globally unique ID, and the merge rule orders IDs deterministically, so every replica converges to the same sequence regardless of the order in which operations arrive. No coordination is needed: just broadcast and merge.
2. How does the ID tuple (counter, site_id) guarantee uniqueness?
The counter increments locally on each site, so no two operations from the same site share a counter value. The site ID distinguishes operations from different sites. Together they form a globally unique pair.
3. What is a tombstone and why is it necessary?
A tombstone marks a character as deleted without removing it from the list. If we removed it immediately, a remote replica that hasn't yet received the delete operation might later re-insert the character, causing undeletion. Tombstones prevent this.
4. Challenge: Add undo support
Implement undo by storing a stack of inverse operations. For an insert, the inverse is a delete of the same character ID (positions can shift while other users edit, but IDs do not). For a delete, the inverse is a re-insert. Broadcast undo operations like regular operations so all clients revert together.
5. Challenge: Implement cursor presence
Broadcast each user's cursor position alongside their edits. Display remote cursors as colored vertical bars in the editor. Use a separate operation type cursor that does not affect the CRDT state.
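As a starting point for the undo challenge, here is one possible shape for the inverse-operation stack. It is a sketch, not a full solution: the UndoStack class is hypothetical, and it models the inverse of a delete as an "undelete" operation that clears the tombstone, which is an assumed operation type that the server and clients in this tutorial do not handle yet.

```python
class UndoStack:
    """Record the inverse of each local operation, keyed by character ID."""

    def __init__(self):
        self._inverses = []

    def record(self, op):
        if op["type"] == "insert":
            self._inverses.append({"type": "delete", "id": op["id"]})
        elif op["type"] == "delete":
            # Assumed op type: clears is_deleted on the character with this ID.
            self._inverses.append({"type": "undelete", "id": op["id"]})

    def undo(self):
        """Return the operation to apply and broadcast, or None if empty."""
        return self._inverses.pop() if self._inverses else None


stack = UndoStack()
stack.record({"type": "insert", "id": (0, "a"), "value": "H", "index": 0})
stack.record({"type": "delete", "id": (0, "a")})

first = stack.undo()
second = stack.undo()
third = stack.undo()
print(first)   # {'type': 'undelete', 'id': (0, 'a')}
print(second)  # {'type': 'delete', 'id': (0, 'a')}
print(third)   # None
```

Keying inverses by ID rather than by index means an undo still targets the right character after other users have inserted or deleted text around it.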
Next Steps
- Add SQLite persistence so document state survives server restarts
- Explore advanced CRDT algorithms like RGA (Replicated Growable Array) for better performance
- Combine with the WebSocket chat app to add side-channel messaging between editors
- Study Yjs, a widely used open-source CRDT framework for collaborative applications
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro