
Build a Collaborative Editor with CRDTs (Step by Step)

DodaTech Updated 2026-06-21 9 min read

In this tutorial, you'll build a collaborative editor with CRDTs step by step. We cover the key concepts, practical examples, and best practices you need to apply this technique effectively.

Build a real-time collaborative text editor using Conflict-Free Replicated Data Types (CRDTs) and WebSockets. Multiple users can edit the same document simultaneously without merge conflicts or a central lock.

What You'll Build

You'll implement a simple CRDT-based text editor where each keystroke becomes an operation with a unique identifier. Operations from different users are merged automatically — no "locking," no "last writer wins," no merge conflicts. Two users typing at the same character position both see their text preserved in a deterministic order.

Why CRDTs Matter

The traditional approach to conflict resolution is Operational Transformation (OT), the technique behind Google Docs. OT typically relies on a central server to transform concurrent operations against each other. CRDTs take a different approach: operations are designed to commute, so every replica can apply them in any order and still converge to the same state. This makes CRDTs a good fit for peer-to-peer editing, offline-first apps, and distributed databases. At DodaTech, CRDT-like merge strategies help Doda Browser synchronize bookmarks and settings across devices without data loss.

Prerequisites

Step 1: Project Setup

mkdir crdt-editor && cd crdt-editor
python -m venv venv
source venv/bin/activate
pip install fastapi uvicorn websockets

Create this structure:

crdt-editor/
├── server.py        # FastAPI + WebSocket server
├── crdt.py          # CRDT data type implementation
├── static/
│   └── editor.html  # Browser-based editor

Step 2: Understanding the CRDT Approach

We'll implement a simple CRDT based on a grow-only set with tombstones. Characters are only ever added, and a deleted character stays in the set as a tombstone marker. Every character inserted into the document gets:

  • A unique ID combining a site ID (unique per user) and a local counter
  • A position in the document, represented as a fractional index between two neighbors

When two users insert between the same two neighbors, they may pick the same fractional index, so the character ID (site ID plus counter) breaks the tie. Because the combined (position, ID) keys are unique and totally ordered, the document converges to the same sequence on all replicas.
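The positioning scheme described above can be sketched on its own. This is separate from the crdt.py module that follows, which tracks only IDs. The sketch assumes a position is a tuple of (Fraction, site_id) components, compared lexicographically the way Python compares tuples. Python's fractions.Fraction gives exact midpoints, so there is no floating-point precision limit. When two sites pick the same fraction, a deeper component is added. The function name position_between is hypothetical.

```python
from fractions import Fraction


def position_between(left, right, site_id):
    """Return a position strictly between left and right (a sketch).

    Positions are tuples of (Fraction, site_id) pairs compared lexicographically.
    left=None means the start of the document; right=None means the end.
    Assumes left < right and that the two are adjacent in the full list
    (tombstones included), which keeps every generated position unique.
    """
    if left is None and right is None:
        return ((Fraction(1), site_id),)
    if right is None:
        # After the last character: a larger first component sorts later
        return ((left[0][0] + 1, site_id),)
    if left is None:
        # Before the first character: a smaller first component sorts earlier
        return ((right[0][0] - 1, site_id),)
    for i, (l, r) in enumerate(zip(left, right)):
        if l == r:
            continue  # shared prefix, keep scanning
        if l[0] < r[0]:
            # Room between the fractions: take the exact midpoint at this depth
            return left[:i] + (((l[0] + r[0]) / 2, site_id),)
        # Same fraction, different sites: left[i] < right[i] already holds,
        # so any extension of left still sorts before right
        return left + ((Fraction(0), site_id),)
    # left is a prefix of right: go one level deeper, just below right
    return left + ((right[len(left)][0] - 1, site_id),)
```

Two users who insert between the same neighbors get, for example, ((3/2, "alice"),) and ((3/2, "bob"),). Both are distinct and ordered, and a later insert between them goes one level deeper.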

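# crdt.py
import itertools


class Char:
    def __init__(self, id_, value, is_deleted=False):
        self.id = id_                 # (counter, site_id), globally unique
        self.value = value
        self.is_deleted = is_deleted  # True marks a tombstone

    def __repr__(self):
        return self.value if not self.is_deleted else ""


class CRDT:
    def __init__(self, site_id):
        self.site_id = site_id
        self.counter = itertools.count()  # per-site counter: 0, 1, 2, ...
        self.chars = []                   # live characters and tombstones, in order

    def generate_id(self):
        return (next(self.counter), self.site_id)

    def _full_index(self, visible_index):
        # Clients count only visible characters, but self.chars also holds
        # tombstones, so translate a visible index into a list index.
        seen = 0
        for i, c in enumerate(self.chars):
            if not c.is_deleted:
                if seen == visible_index:
                    return i
                seen += 1
        return len(self.chars)

    def insert(self, index, value, id_=None):
        # index is a position in the visible text
        if id_ is None:
            id_ = self.generate_id()
        self.chars.insert(self._full_index(index), Char(id_, value))
        return id_

    def delete(self, index):
        # index is a position in the visible text; the character stays as a tombstone
        i = self._full_index(index)
        if i < len(self.chars):
            self.chars[i].is_deleted = True

    def get_text(self):
        return "".join(c.value for c in self.chars if not c.is_deleted)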

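Each Char stores an ID tuple (counter, site_id) that is globally unique. The counter is per-site, and site_id identifies the replica that created the character. To keep this first version short, the class addresses characters by index and ID only. It does not yet store the fractional position described above.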
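ID generation can be seen in isolation with two sites. This minimal sketch mirrors generate_id using a hypothetical id_generator helper. It shows that IDs stay unique across sites and that Python orders them by counter first, then by site_id.

```python
import itertools


def id_generator(site_id):
    """Return a function producing (counter, site_id) IDs for one site."""
    counter = itertools.count()  # 0, 1, 2, ... local to this site
    return lambda: (next(counter), site_id)


alice = id_generator("alice")
bob = id_generator("bob")
ids = [alice(), bob(), alice(), bob()]
print(ids)  # [(0, 'alice'), (0, 'bob'), (1, 'alice'), (1, 'bob')]
# Tuples compare element by element: counter first, then site_id
print(sorted([(1, "alice"), (0, "bob"), (0, "alice")]))  # [(0, 'alice'), (0, 'bob'), (1, 'alice')]
```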

Step 3: Merging Operations from Multiple Clients

The critical property: when we receive an operation from another client, we place each character at a position determined by its ID, so that all replicas converge.

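# Add these methods to the CRDT class in crdt.py.

def merge_insert(self, char):
    # Skip characters we already have (for example, delivered twice)
    if any(c.id == char.id for c in self.chars):
        return
    # Insert before the first character with a larger ID
    for pos, existing in enumerate(self.chars):
        if char.id < existing.id:
            self.chars.insert(pos, char)
            return
    self.chars.append(char)

def merge_delete(self, char_id):
    for char in self.chars:
        if char.id == char_id:
            char.is_deleted = True  # idempotent: a repeated delete is harmless
            break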

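The merge rule uses ID ordering: characters with smaller (counter, site_id) tuples appear earlier. Because IDs are unique and totally ordered, every replica that merges the same set of characters ends up with the same sequence. This simplified rule has a limitation. It orders characters by when they were created, not by where the user typed them. It also converges only if every replica, including the local one, places characters by this rule rather than by list index. A full sequence CRDT sorts by the position described in Step 2 and uses the ID only to break ties.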
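Sorting by position first and using the ID only as a tie-breaker can be sketched as a small replica class. For brevity, this sketch assumes each operation already carries a position and uses a plain Fraction for it. Any totally ordered position value would work the same way. The names Replica and apply_op and the tuple operation format are hypothetical. The example delivers the same operations to two replicas in different orders, with duplicates, and checks that they converge.

```python
import bisect
from fractions import Fraction


class Replica:
    """Sequence CRDT sketch: characters kept sorted by (position, id)."""

    def __init__(self):
        self.keys = []        # sorted (position, id) keys, tombstones included
        self.values = {}      # id -> character value
        self.deleted = set()  # ids that have been deleted (tombstones)

    def merge_insert(self, position, id_, value):
        key = (position, id_)
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return  # duplicate delivery: we already have this character
        self.keys.insert(i, key)
        self.values[id_] = value

    def merge_delete(self, id_):
        # Recording the ID works even if the insert has not arrived yet
        self.deleted.add(id_)

    def text(self):
        return "".join(self.values[id_] for _, id_ in self.keys
                       if id_ not in self.deleted)


def apply_op(replica, op):
    if op[0] == "insert":
        _, position, id_, value = op
        replica.merge_insert(position, id_, value)
    else:
        replica.merge_delete(op[1])


ops = [
    ("insert", Fraction(1), (0, "alice"), "h"),
    ("insert", Fraction(2), (1, "alice"), "i"),
    ("insert", Fraction(3), (0, "bob"), "!"),    # Bob and Alice both chose
    ("insert", Fraction(3), (2, "alice"), "?"),  # position 3: the ID breaks the tie
    ("delete", (1, "alice")),
]

r1, r2 = Replica(), Replica()
for op in ops:
    apply_op(r1, op)
for op in reversed(ops + ops):  # different order, every op delivered twice
    apply_op(r2, op)
print(r1.text(), r2.text())  # h!? h!?
```

Because the delete is recorded by ID, it is safe for it to arrive before the insert it targets. This is the same reason the tutorial keeps tombstones instead of removing characters.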

Step 4: Operation Log

We need to broadcast every operation to all connected clients. Each operation is a JSON message:

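# Add these methods to the CRDT class in crdt.py.
# Each returns a dict that send_json() serializes as a JSON message.

def insert_op(self, index, value):
    id_ = self.generate_id()
    self.insert(index, value, id_)
    return {"type": "insert", "id": id_, "value": value, "index": index}

def delete_op(self, index):
    # index counts visible characters only; tombstones are skipped
    visible = [c for c in self.chars if not c.is_deleted]
    if 0 <= index < len(visible):
        char = visible[index]
        char.is_deleted = True  # keep the character as a tombstone
        return {"type": "delete", "id": char.id}
    return None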

Each operation carries enough information for any replica to apply it locally and converge to the same state.
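One detail of the JSON encoding matters for ID comparisons: JSON has no tuple type, so an ID tuple is sent as an array and parsed back as a Python list. This is why the server converts incoming IDs with tuple(op["id"]). Comparing a list with a tuple raises TypeError. The helper names encode_op and decode_op in this minimal sketch are hypothetical.

```python
import json


def encode_op(op):
    """Serialize an operation dict for the WebSocket."""
    return json.dumps(op)


def decode_op(text):
    """Parse an operation and restore its ID to a tuple."""
    op = json.loads(text)
    if op.get("id") is not None:
        op["id"] = tuple(op["id"])  # JSON arrays come back as lists
    return op


op = {"type": "insert", "id": (3, "alice"), "value": "x", "index": 0}
wire = encode_op(op)
print(wire)  # {"type": "insert", "id": [3, "alice"], "value": "x", "index": 0}
restored = decode_op(wire)
print(restored["id"] < (4, "bob"))  # True
```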

Step 5: The WebSocket Server

The server keeps a single CRDT instance for the shared document and broadcasts each operation to every other connected client:

# server.py
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from fastapi.responses import HTMLResponse
from crdt import CRDT
import json

app = FastAPI()
clients = set()
doc = CRDT(site_id="server")


@app.get("/")
async def get():
    with open("static/editor.html") as f:
        return HTMLResponse(f.read())


@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    clients.add(websocket)

    try:
        while True:
            data = await websocket.receive_text()
            op = json.loads(data)

            if op["type"] == "insert":
                # The browser client sends no ID, so the server assigns one;
                # a client-supplied ID arrives as a JSON list, not a tuple.
                id_ = tuple(op["id"]) if op.get("id") else None
                op["id"] = doc.insert(op["index"], op["value"], id_)
            elif op["type"] == "delete":
                doc.delete(op["index"])

            # Iterate over a copy: other connections may join or leave
            # while this coroutine is suspended in send_json()
            for client in list(clients):
                if client is not websocket:
                    await client.send_json(op)
    except WebSocketDisconnect:
        clients.discard(websocket)

When a client connects, they receive the full document state for initial sync:

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    clients.add(websocket)

    # Send full document to new client
    sync_msg = {
        "type": "sync",
        "chars": [{"id": c.id, "value": c.value, "is_deleted": c.is_deleted}
                  for c in doc.chars]
    }
    await websocket.send_json(sync_msg)
    # ... rest of handler
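Broadcasting can be factored into a helper that also handles clients that disappear mid-broadcast. Looping directly over a set while awaiting can raise "RuntimeError: Set changed size during iteration" if a connection joins or leaves during the await. A send to a socket that has already closed may raise inside the sender's handler. This sketch assumes that any exception from send_json() means the connection is gone. The name broadcast is hypothetical.

```python
async def broadcast(clients, op, sender=None):
    """Send op to every client except sender; drop clients whose send fails."""
    dead = []
    for client in list(clients):  # snapshot: the set may change while we await
        if client is sender:
            continue
        try:
            await client.send_json(op)
        except Exception:  # assumption: any send failure means the socket is gone
            dead.append(client)
    for client in dead:
        clients.discard(client)
```

Inside the handler, await broadcast(clients, op, sender=websocket) would replace the inline for loop.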

Step 6: The Browser Client

The editor captures every keystroke, sends it to the server, and applies incoming operations from other users:

<!-- static/editor.html -->
<!DOCTYPE html>
<html>
<head>
    <title>CRDT Collaborative Editor</title>
    <style>
        * { margin: 0; padding: 0; box-sizing: border-box; }
        body { font-family: 'Courier New', monospace; max-width: 900px; margin: 0 auto; padding: 20px; }
        h1 { margin-bottom: 12px; font-size: 1.3em; }
        textarea { width: 100%; height: 500px; font-size: 16px; padding: 16px; border: 1px solid #ddd; border-radius: 8px; resize: vertical; }
        .status { margin-top: 8px; font-size: 0.85em; color: #888; }
        .users { margin-bottom: 12px; font-size: 0.9em; color: #555; }
    </style>
</head>
<body>
    <h1>Collaborative Editor</h1>
    <div class="users" id="userCount">Connected: 1</div>
    <textarea id="editor" placeholder="Start typing..."></textarea>
    <div class="status" id="status">Connected</div>

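    <script>
        // Use wss:// when the page is served over HTTPS
        const protocol = location.protocol === 'https:' ? 'wss' : 'ws';
        const ws = new WebSocket(`${protocol}://${location.host}/ws`);
        const editor = document.getElementById('editor');
        const statusEl = document.getElementById('status');
        // Editor text after the last applied change; used to diff user edits
        let lastValue = editor.value;

        ws.onopen = () => statusEl.textContent = 'Connected';
        ws.onclose = () => statusEl.textContent = 'Disconnected';

        // Apply a remote edit (remove `removed` characters at `index`, then
        // insert `inserted` there) while keeping the local selection in place.
        function applyRemote(index, removed, inserted) {
            let start = editor.selectionStart;
            let end = editor.selectionEnd;
            const text = editor.value;
            editor.value = text.slice(0, index) + inserted + text.slice(index + removed);
            const shift = inserted.length - removed;
            if (index < start) start = Math.max(index, start + shift);
            if (index < end) end = Math.max(index, end + shift);
            editor.setSelectionRange(start, end);
            lastValue = editor.value;
        }

        ws.onmessage = function(event) {
            const data = JSON.parse(event.data);

            if (data.type === 'sync') {
                // Full document state: skip tombstoned characters
                editor.value = data.chars
                    .filter(c => !c.is_deleted)
                    .map(c => c.value)
                    .join('');
                lastValue = editor.value;
            } else if (data.type === 'insert') {
                applyRemote(data.index, 0, data.value);
            } else if (data.type === 'delete') {
                applyRemote(data.index, 1, '');
            }
        };

        // Assigning editor.value does not fire 'input', so this handler only
        // sees user edits (typing, Enter, Backspace, Delete, paste, cut).
        // It diffs the new text against lastValue and sends one op per character.
        editor.addEventListener('input', function() {
            const next = editor.value;
            let prefix = 0;
            while (prefix < lastValue.length && prefix < next.length &&
                   lastValue[prefix] === next[prefix]) prefix++;
            let suffix = 0;
            while (suffix < lastValue.length - prefix && suffix < next.length - prefix &&
                   lastValue[lastValue.length - 1 - suffix] === next[next.length - 1 - suffix]) suffix++;

            const removed = lastValue.length - prefix - suffix;
            const inserted = next.slice(prefix, next.length - suffix);

            // Delete from the end first so the remaining indices stay valid
            for (let i = removed - 1; i >= 0; i--) {
                ws.send(JSON.stringify({ type: 'delete', index: prefix + i }));
            }
            for (let i = 0; i < inserted.length; i++) {
                ws.send(JSON.stringify({ type: 'insert', index: prefix + i, value: inserted[i] }));
            }
            lastValue = next;
        });
    </script>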
</body>
</html>

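Start the server with uvicorn server:app --reload, then open http://localhost:8000 in two browser tabs side by side. Type in one tab, and the text appears in the other tab almost immediately. Try typing at the same position in both tabs: both characters appear without overwriting each other. This relay forwards plain visible indices rather than the position identifiers from Step 2. As a result, edits made at the same spot at the same moment can still end up in a different order in each tab. Reloading a tab re-syncs it with the server's copy.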

Architecture

flowchart LR
    C1[Client 1] -->|Insert op| WS[WebSocket Server]
    C2[Client 2] -->|Insert op| WS
    C3[Client 3] -->|Insert op| WS
    WS -->|Broadcast ops| C1
    WS -->|Broadcast ops| C2
    WS -->|Broadcast ops| C3
    subgraph Server
        CRDT[CRDT Document State]
        WS --> CRDT
        CRDT --> WS
    end

Common Errors

1. Characters appear in the wrong order on different clients. This happens when the ID ordering rule is not applied uniformly. Every replica must compare IDs as (counter, site_id) tuples with consistent types: an int counter and a str site ID. Python compares tuples element by element, but it raises TypeError when it has to compare an int with a str, or a list with a tuple. JSON turns tuples into lists, so convert incoming IDs back with tuple() before comparing them.

2. The text cursor jumps to the end on every remote edit. Assigning to editor.value moves the cursor to the end of the text. Before applying a remote change, save selectionStart and selectionEnd. Shift them by the length of any text inserted or removed before them, then restore them with setSelectionRange(). Alternatively, use a contenteditable div with finer-grained DOM updates.

3. Duplicate characters on merge. If merge_insert does not check for an existing ID before inserting, a character that arrives by two broadcast paths is inserted twice. Always check whether a character with that ID already exists, and skip it if so.

4. Site IDs collide. If two clients use the same site_id, their counter sequences overlap and IDs are no longer unique. Generate a UUID for each client (for example, uuid.uuid4().hex), or have the server assign an ID to each connection.

5. Tombstones accumulate indefinitely. Deleted characters are marked with is_deleted = True but stay in the list, so memory use grows with every deletion. Implement periodic garbage collection. Once every replica has acknowledged a deletion, and no in-flight operation can still refer to that character, remove the tombstone entirely.
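The garbage-collection rule can be sketched as acknowledgment tracking. This minimal sketch assumes the set of replicas is known and fixed, and that each replica reports which deletes it has applied. The TombstoneGC class and its ack/collect methods are hypothetical. A replica that joins later would need a fresh sync rather than replaying old operations.

```python
from dataclasses import dataclass


@dataclass
class Char:
    id: tuple
    value: str
    is_deleted: bool = False


class TombstoneGC:
    """Hypothetical helper: forget a tombstone once every known replica
    has acknowledged the delete that created it."""

    def __init__(self, replica_ids):
        self.replica_ids = set(replica_ids)
        self.acks = {}  # char id -> replica ids that have applied the delete

    def ack(self, char_id, replica_id):
        self.acks.setdefault(char_id, set()).add(replica_id)

    def collect(self, chars):
        """Return chars without the tombstones every replica has acknowledged."""
        done = {cid for cid, seen in self.acks.items() if self.replica_ids <= seen}
        for cid in done:
            del self.acks[cid]
        return [c for c in chars if not (c.is_deleted and c.id in done)]
```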

Practice Questions

1. Why do CRDTs not need a central lock for concurrent edits? Each operation carries a globally unique ID, and the merge rule orders characters deterministically. Merging is commutative and idempotent, so replicas that receive the same operations in different orders still end up with the same document. No coordination is needed: just broadcast and merge.

2. How does the ID tuple (counter, site_id) guarantee uniqueness? The counter increments locally on each site, so no two operations from the same site share a counter value. The site ID distinguishes operations from different sites. As long as the site IDs themselves are unique, the pair is globally unique.

3. What is a tombstone and why is it necessary? A tombstone marks a character as deleted without removing it from the list. Removing the character immediately would cause two problems. A delayed or duplicated copy of its insert operation could arrive later and bring it back. Concurrent inserts positioned relative to it would also lose their reference point. Keeping the tombstone lets a replica recognize the ID and ignore the stale insert.

4. Challenge: Add undo support. Implement undo by storing a stack of inverse operations. For an insert, the inverse is a delete of the same character ID, not the same index, since other edits may have shifted it. For a delete, the inverse re-inserts the removed value. Broadcast undo operations like regular operations so that all clients revert together.

5. Challenge: Implement cursor presence. Broadcast each user's cursor position alongside their edits, and display remote cursors as colored vertical bars in the editor. Use a separate operation type, "cursor", that does not affect the CRDT state.
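A starting point for the undo challenge is a stack of local operations and a function that inverts them. This sketch assumes a locally recorded delete keeps the removed value and its visible index at deletion time. The delete_op message in Step 4 sends only type and id, so those extra fields are an assumption. The names inverse and UndoStack are hypothetical. When a delete is undone, the character comes back under a fresh ID, so older stack entries are retargeted to that ID.

```python
def inverse(op, new_id):
    """Return the operation that undoes op (hypothetical op shapes)."""
    if op["type"] == "insert":
        # Delete by ID, not index: other users' edits may have shifted the index
        return {"type": "delete", "id": op["id"]}
    if op["type"] == "delete":
        # Assumes the recorded delete kept the removed value and its index
        return {"type": "insert", "id": new_id,
                "value": op["value"], "index": op["index"]}
    raise ValueError(f"cannot invert {op['type']!r}")


class UndoStack:
    def __init__(self):
        self._ops = []

    def record(self, op):
        self._ops.append(op)  # call after each local edit

    def undo(self, new_id):
        """Pop the latest local op and return its inverse to broadcast, or None."""
        if not self._ops:
            return None
        op = self._ops.pop()
        if op["type"] == "delete":
            # The character is re-inserted as new_id; older ops must follow it
            self._ops = [{**o, "id": new_id} if o["id"] == op["id"] else o
                         for o in self._ops]
        return inverse(op, new_id)
```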

FAQ

How do CRDTs differ from Operational Transformation (OT)?

OT transforms each operation against concurrent operations so that all replicas reach the same final state. CRDTs avoid transformation by designing operations that commute naturally. Most deployed OT systems rely on a central server to order operations, whereas CRDTs also work in peer-to-peer settings.

Does this CRDT scale to hundreds of users?

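The simple list-based CRDT in this tutorial scans the list, so each insertion costs O(n), where n is the document length. For large documents with many collaborators, use an optimized sequence CRDT instead. One option is RGA (Replicated Growable Array) combined with a balanced-tree index for O(log n) lookups. Another is a block-based design that stores runs of consecutive characters as single items to cut per-character overhead.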

Can I use CRDTs for data other than text?

Yes. CRDTs exist for counters (increment/decrement), sets (add/remove), maps, and lists. Distributed databases such as Riak and the Active-Active replication in Redis Enterprise use them, as do libraries such as Automerge for JSON-like data structures.
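A counter is the smallest example of a non-text CRDT. This sketch shows a state-based PN-Counter, a standard construction. Each replica keeps its own increment and decrement totals, and merging takes the per-replica maximum. That merge is commutative, associative, and idempotent, so replicas converge no matter how often or in what order they merge. The class name and the assumption that amounts are non-negative are specific to this sketch.

```python
class PNCounter:
    """State-based PN-Counter: increment and decrement tallies per replica."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.p = {}  # replica id -> total increments made by that replica
        self.n = {}  # replica id -> total decrements made by that replica

    def increment(self, amount=1):  # amount is assumed to be non-negative
        self.p[self.replica_id] = self.p.get(self.replica_id, 0) + amount

    def decrement(self, amount=1):  # amount is assumed to be non-negative
        self.n[self.replica_id] = self.n.get(self.replica_id, 0) + amount

    def value(self):
        return sum(self.p.values()) - sum(self.n.values())

    def merge(self, other):
        # Element-wise max: safe to repeat and to apply in any order
        for mine, theirs in ((self.p, other.p), (self.n, other.n)):
            for rid, count in theirs.items():
                mine[rid] = max(mine.get(rid, 0), count)


a, b = PNCounter("a"), PNCounter("b")
a.increment(3)
b.increment()
b.decrement(2)
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 2 2
```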

Next Steps

  • Add SQLite persistence so document state survives server restarts
  • Explore advanced CRDT algorithms like RGA (Replicated Growable Array) for better performance
  • Combine with the WebSocket chat app to add side-channel messaging between editors
  • Study Yjs, a production CRDT framework for building collaborative applications
