Building Domain-Specific Languages (DSLs) — Complete Guide
In this tutorial, you'll learn about building domain-specific languages (DSLs). We cover key concepts, practical examples, and best practices to help you understand and apply this topic effectively.
Domain-specific languages are programming languages designed for a specific problem domain rather than general-purpose use, offering concise notation, domain-specific abstractions, and automated analysis or code generation for specialized tasks.
What You'll Learn & Why It Matters
In this tutorial, you will learn how to design and implement DSLs, from internal DSLs embedded in host languages to external DSLs with custom parsers and code generators. A well-scoped DSL can substantially improve productivity in its target domain, because domain experts state rules directly instead of translating them into general-purpose code.
Real-world use: Durga Antivirus Pro uses an internal DSL for describing malware detection rules, allowing security analysts to write complex behavioral patterns in a concise language without understanding the underlying binary analysis engine.
Prerequisites
You should understand parsing from the parser generators tutorial. Familiarity with the interpreter design tutorial is helpful. Experience with Python or JavaScript is assumed.
Internal vs External DSLs
| Aspect | Internal DSL | External DSL |
|---|---|---|
| Definition | Built within a host language | Custom language with own syntax |
| Implementation | Library + API design | Custom parser + interpreter/compiler |
| Learning curve | Low (uses host language) | High (new syntax to learn) |
| Flexibility | Limited by host language | Full control over syntax and semantics |
| Tooling | Host language tools | Need custom tools |
| Examples | Ruby on Rails, LINQ, jQuery | SQL, Regex, GraphQL |
graph TD
    subgraph "Internal DSL"
        I1[Host Language Code] --> I2[Fluent API]
        I2 --> I3[Domain Operations]
    end
    subgraph "External DSL"
        E1[DSL Source] --> E2[Custom Parser]
        E2 --> E3[AST]
        E3 --> E4[Interpreter]
        E3 --> E5[Code Generator]
        E5 --> E6[Host Language Code]
    end
    style I1 fill:#4CAF50,color:#fff
    style E1 fill:#2196F3,color:#fff
    style E6 fill:#FF9800,color:#fff
Building an Internal DSL in Python
Internal DSLs use the host language's syntax features to create a domain-specific vocabulary.
class QueryBuilder:
    """Fluent builder: every method except build() returns self so calls can be chained."""

    def __init__(self):
        self._table = None
        self._fields = []
        self._conditions = []
        self._order_by = None

    def select(self, *fields):
        self._fields = list(fields)
        return self

    def from_table(self, table):
        self._table = table
        return self

    def where(self, condition):
        # Repeated where() calls are joined with AND in build().
        self._conditions.append(condition)
        return self

    def order_by(self, field, direction="ASC"):
        self._order_by = f"{field} {direction}"
        return self

    def build(self):
        if self._table is None:
            raise ValueError("from_table() must be called before build()")
        # Values are interpolated as-is, which is fine for a demo but not for untrusted input.
        query = f"SELECT {', '.join(self._fields) if self._fields else '*'}"
        query += f" FROM {self._table}"
        if self._conditions:
            query += " WHERE " + " AND ".join(self._conditions)
        if self._order_by:
            query += f" ORDER BY {self._order_by}"
        query += ";"
        return query


query = (QueryBuilder()
         .select("name", "email", "age")
         .from_table("users")
         .where("age > 18")
         .where("status = 'active'")
         .order_by("name")
         .build())
print(query)
Expected output:
SELECT name, email, age FROM users WHERE age > 18 AND status = 'active' ORDER BY name ASC;
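Method chaining is one host-language feature an internal DSL can lean on; operator overloading is another. The sketch below is illustrative only: `Field` and `Condition` are hypothetical helpers that are not part of `QueryBuilder`, and `repr()` is used as a stand-in for real SQL quoting.

```python
class Condition:
    """A fragment of a WHERE clause (hypothetical helper for illustration)."""

    def __init__(self, sql):
        self.sql = sql

    def __and__(self, other):
        return Condition(f"({self.sql} AND {other.sql})")

    def __or__(self, other):
        return Condition(f"({self.sql} OR {other.sql})")

    def __str__(self):
        return self.sql


class Field:
    """A column name whose comparison operators build Condition objects."""

    def __init__(self, name):
        self.name = name

    def __gt__(self, value):
        # repr() is a demo shortcut, not safe SQL escaping.
        return Condition(f"{self.name} > {value!r}")

    def __eq__(self, value):
        # Overriding __eq__ this way makes Field unhashable, which is acceptable here.
        return Condition(f"{self.name} = {value!r}")


age = Field("age")
status = Field("status")
condition = (age > 18) & (status == "active")
print(condition)  # (age > 18 AND status = 'active')
```

The parentheses around each comparison are required because `&` binds more tightly than `>` and `==` in Python, a small example of host-language rules showing through an internal DSL.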
Building an External DSL
Step 1: Design the Language
Design a DSL for network configuration:
server web-server {
    host "192.168.1.10"
    port 8080
    ssl true
}
route /api/* {
    target web-server
    rate_limit 100
}
firewall {
    allow tcp/80
    allow tcp/443
    deny all
}
Step 2: Implement the Lexer
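import re


class ConfigDSLLexer:
    # Token specifications, tried in order. The name differs from the instance's
    # token list (self.tokens) so that the two do not shadow each other.
    TOKEN_SPECS = [
        ("IDENTIFIER", r'[a-zA-Z_][a-zA-Z0-9_-]*'),  # hyphens allowed, e.g. web-server
        ("STRING", r'"[^"]*"'),
        ("NUMBER", r'\d+'),
        ("LBRACE", r'\{'),
        ("RBRACE", r'\}'),
        ("SLASH", r'/'),
        ("DOT", r'\.'),
        ("STAR", r'\*'),
        ("NEWLINE", r'\n'),
        ("SKIP", r'[ \t\r]+'),
    ]

    def __init__(self, source):
        self.source = source
        self.tokens = []
        self.tokenize()

    def tokenize(self):
        # Compile each pattern once instead of on every loop iteration.
        specs = [(name, re.compile(pattern)) for name, pattern in self.TOKEN_SPECS]
        pos = 0
        while pos < len(self.source):
            for name, regex in specs:
                match = regex.match(self.source, pos)
                if match:
                    # Whitespace and newlines carry no meaning in this grammar.
                    if name not in ("SKIP", "NEWLINE"):
                        self.tokens.append((name, match.group(0), pos))
                    pos = match.end()
                    break
            else:
                # No pattern matched at this position.
                raise SyntaxError(f"Unexpected character {self.source[pos]!r} at position {pos}")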
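An alternative to trying each pattern in turn is to join them into one regular expression with named groups and let `re` report which group matched. This is a minimal sketch of that approach with a reduced, hypothetical token table (`TOKEN_SPEC`, `MASTER`, and `tokenize` are illustrative names), not a replacement for the class above.

```python
import re

# Reduced, hypothetical token table; order matters because alternatives are tried left to right.
TOKEN_SPEC = [
    ("STRING", r'"[^"]*"'),
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_-]*"),
    ("LBRACE", r"\{"),
    ("RBRACE", r"\}"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),  # catches any character no other pattern accepts
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Yield (type, text, position) tuples, skipping whitespace."""
    for match in MASTER.finditer(source):
        kind = match.lastgroup  # name of the alternative that matched
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(
                f"Unexpected character {match.group()!r} at position {match.start()}"
            )
        yield kind, match.group(), match.start()


tokens = list(tokenize("server web-server { port 8080 }"))
for token in tokens:
    print(token)
```

Because the `MISMATCH` alternative matches any leftover character, every position in the input is accounted for and stray characters surface as an explicit error rather than being skipped silently.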
Step 3: Implement the Parser and Interpreter
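class ConfigDSLParser:
    def __init__(self, tokens):
        # Newlines are not significant in this grammar; drop any the lexer emitted.
        self.tokens = [t for t in tokens if t[0] != "NEWLINE"]
        self.pos = 0
        self.config = {"servers": {}, "routes": [], "firewall": []}

    def parse(self):
        blocks = {"server": self.parse_server, "route": self.parse_route,
                  "firewall": self.parse_firewall}
        while self.pos < len(self.tokens):
            kind, text, _ = self.tokens[self.pos]
            if kind != "IDENTIFIER" or text not in blocks:
                raise SyntaxError(f"Expected server, route or firewall, got {text!r}")
            self.pos += 1  # consume the block keyword
            blocks[text]()
        return self.config

    def parse_server(self):
        name = self.expect("IDENTIFIER")
        server = {"host": "", "port": 0, "ssl": False}
        server.update(self.parse_settings())
        self.config["servers"][name] = server

    def parse_route(self):
        # The path is every token up to the opening brace, e.g. "/", "api", "/", "*".
        path = ""
        while self.peek()[0] != "LBRACE":
            path += self.peek()[1]
            self.pos += 1
        self.config["routes"].append({"path": path, **self.parse_settings()})

    def parse_firewall(self):
        self.expect("LBRACE")
        while self.peek()[0] != "RBRACE":
            action = self.expect("IDENTIFIER")  # allow or deny
            target = self.expect("IDENTIFIER")  # protocol name, or "all"
            if self.peek()[0] == "SLASH":
                self.pos += 1
                target += "/" + self.expect("NUMBER")
            self.config["firewall"].append({"action": action, "target": target})
        self.expect("RBRACE")

    def parse_settings(self):
        # Parse "{ key value ... }" into a dict.
        settings = {}
        self.expect("LBRACE")
        while self.peek()[0] != "RBRACE":
            key = self.expect("IDENTIFIER")
            kind, text, _ = self.peek()
            self.pos += 1
            if kind == "STRING":
                settings[key] = text.strip('"')
            elif kind == "NUMBER":
                settings[key] = int(text)
            elif text in ("true", "false"):
                settings[key] = text == "true"
            elif kind == "IDENTIFIER":
                settings[key] = text  # bare name, e.g. a route's target server
            else:
                raise SyntaxError(f"Expected a value for {key}, got {text!r}")
        self.expect("RBRACE")
        return settings

    def peek(self):
        if self.pos >= len(self.tokens):
            raise SyntaxError("Unexpected end of input")
        return self.tokens[self.pos]

    def expect(self, token_type):
        kind, text, _ = self.peek()
        if kind != token_type:
            raise SyntaxError(f"Expected {token_type}, got {text!r}")
        self.pos += 1
        return text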
source = '''
server web-server {
    host "192.168.1.10"
    port 8080
    ssl true
}
'''
lexer = ConfigDSLLexer(source)
parser = ConfigDSLParser(lexer.tokens)
config = parser.parse()
for name, settings in config["servers"].items():
    print(f"Server: {name}")
    for key, val in settings.items():
        print(f"  {key}: {val}")
Expected output:
Server: web-server
  host: 192.168.1.10
  port: 8080
  ssl: True
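Parsing only checks syntax. An interpreter for a configuration DSL usually also checks that the result makes sense in the domain. The following is a minimal sketch of such a validation pass. It assumes routes are stored as dictionaries with `path` and `target` keys, which is an assumption about the parsed structure, and `validate` is a hypothetical helper name.

```python
def validate(config):
    """Return a list of human-readable problems found in a parsed config."""
    problems = []
    for name, server in config["servers"].items():
        if not server.get("host"):
            problems.append(f"server '{name}' has no host")
        port = server.get("port", 0)
        if not isinstance(port, int) or not 1 <= port <= 65535:
            problems.append(f"server '{name}' has invalid port {port}")
    for route in config["routes"]:
        target = route.get("target")
        if target not in config["servers"]:
            problems.append(
                f"route '{route.get('path')}' targets unknown server '{target}'"
            )
    return problems


# Hand-written sample in the assumed shape, with one deliberate mistake.
sample_config = {
    "servers": {"web-server": {"host": "192.168.1.10", "port": 8080, "ssl": True}},
    "routes": [
        {"path": "/api/*", "target": "web-server", "rate_limit": 100},
        {"path": "/admin/*", "target": "admin-server"},
    ],
    "firewall": [],
}
for problem in validate(sample_config):
    print(problem)  # route '/admin/*' targets unknown server 'admin-server'
```

Collecting every problem in a list, rather than stopping at the first one, lets the user fix a whole file in a single pass.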
DSL Code Generation
Many DSLs generate code in a general-purpose language:
class DSLCodeGenerator:
    def generate(self, config):
        code = ["# Generated configuration", ""]
        for name, settings in config["servers"].items():
            code.append(f"class {self.class_name(name)}Server:")
            for key, val in settings.items():
                code.append(f"    {key} = {repr(val)}")
            code.append("")
        return "\n".join(code)

    @staticmethod
    def class_name(name):
        # "web-server" -> "WebServer": hyphens are not valid in Python identifiers.
        parts = name.replace("-", "_").split("_")
        return "".join(part.capitalize() for part in parts)


generator = DSLCodeGenerator()
print(generator.generate(config))
Expected output:
# Generated configuration

class WebServerServer:
    host = '192.168.1.10'
    port = 8080
    ssl = True
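Generated code is only useful if it is valid in the target language, and names taken from DSL source (such as `web-server`) are a common way for it to go wrong. Below is a sketch of two cheap checks, assuming Python is the target language. `to_class_name` and `check_generated` are hypothetical helpers.

```python
import ast
import keyword


def to_class_name(name):
    """Turn a DSL name such as 'web-server' into a valid Python class name."""
    parts = name.replace("-", "_").split("_")
    candidate = "".join(part.capitalize() for part in parts if part)
    if not candidate.isidentifier() or keyword.iskeyword(candidate):
        raise ValueError(f"cannot derive a Python identifier from {name!r}")
    return candidate


def check_generated(code):
    """Return True if the generated text parses as Python source."""
    try:
        ast.parse(code)
    except SyntaxError:
        return False
    return True


print(to_class_name("web-server"))                              # WebServer
print(check_generated("class Web-server:\n    port = 8080\n"))  # False
print(check_generated("class WebServer:\n    port = 8080\n"))   # True
```

Running a parse check like this in the generator's test suite catches invalid output before it reaches users.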
Common Errors in DSL Design
Error 1: Over-Abstracting
Making the DSL too abstract hides important details. The DSL should express domain concepts naturally without forcing users to work around the abstraction.
Error 2: Poor Error Messages
DSL users may not understand compiler terminology. Error messages should speak the domain language, not parser jargon.
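One way to make errors friendlier is to report a line and column, show the offending line, and phrase the hint in domain terms. This is a minimal sketch. `line_and_column` and `describe_error` are hypothetical helpers, and the hint text is something the DSL author would supply.

```python
def line_and_column(source, pos):
    """Convert a 0-based character offset into a 1-based (line, column) pair."""
    line = source.count("\n", 0, pos) + 1
    line_start = source.rfind("\n", 0, pos) + 1  # rfind returns -1 on the first line
    return line, pos - line_start + 1


def describe_error(source, pos, hint):
    """Format an error with its location, the offending line, and a caret pointer."""
    line, column = line_and_column(source, pos)
    lines = source.splitlines()
    bad_line = lines[line - 1] if line <= len(lines) else ""
    pointer = " " * (column - 1) + "^"
    return f"Line {line}, column {column}: {hint}\n  {bad_line}\n  {pointer}"


source = 'server web {\n  port eighty\n}\n'
pos = source.index("eighty")
print(describe_error(source, pos, "'port' needs a number such as 8080"))
```

The message names the setting and shows a valid example instead of reporting a token type, which is usually what a domain expert needs in order to fix the file.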
Error 3: Host Language Leakage
Internal DSLs often leak host language syntax, confusing users. Design the API so domain code reads naturally without requiring knowledge of the host language's quirks.
Error 4: Not Having a Clear Scope
DSLs that try to do everything become general-purpose languages. Define the scope narrowly and resist feature creep.
Error 5: Neglecting Tooling
A DSL without syntax highlighting, autocomplete, and error checking is frustrating to use. Invest in editor support and validation tools.
Practice Questions
Question 1
What is the difference between an internal and external DSL?
Answer: An internal DSL is embedded in a host language using its syntax features (method chaining, operator overloading). An external DSL has its own syntax and requires a custom parser and interpreter.
Question 2
What are the advantages of internal DSLs?
Answer: Internal DSLs reuse the host language's tooling (editors, debuggers, package managers), have a lower learning curve, and are faster to implement. They automatically inherit the host language's ecosystem.
Question 3
When should you build an external DSL instead of an internal one?
Answer: Build an external DSL when the domain syntax differs significantly from the host language, when non-programmers need to write in the DSL, or when you need full control over syntax, error messages, and tooling.
Question 4
How do you evaluate a DSL design?
Answer: Evaluate DSLs on expressiveness (how concisely domain concepts are expressed), readability (how naturally domain experts can read the code), and tooling support (error messages, completion, validation).
Question 5
What is a fluent interface?
Answer: A fluent interface uses method chaining where each method returns the object itself (return self), enabling a natural reading flow. It is the most common technique for building internal DSLs in object-oriented languages.
Challenge
Design and implement a small DSL for describing automated security scan rules. The DSL should support patterns for file signatures, network connections, and process behavior. Implement a parser, an interpreter that validates rules, and a code generator that outputs JSON configuration for a security scanner.