How it works
Overview
kemlang-py is a tree-walking interpreter. This section explains exactly how it turns a .jsk source file into running output - from character scanning all the way to executing statements.
What is a programming language, really?
A programming language is a convention. The source file you write is just text - a sequence of Unicode characters sitting on disk. Nothing in the hardware understands bhai bol. The interpreter is the program that reads that text and figures out what to do with it.
Every interpreter or compiler does the same fundamental job: transform source text into behavior. The strategies differ enormously in complexity and performance, but the goal is always the same.
The spectrum of language implementations
Different languages take different approaches to turning source into execution. The main strategies are compiled native code, bytecode + virtual machine, and tree-walking interpretation.
Source text
│
▼
┌───────────────────────────────────────────────────────────────────┐
│ COMPILED (C, Rust, Go) │
│ │
│ Source ──▶ Compiler ──▶ Machine code (.exe / .o) ──▶ CPU runs │
│ │
│ + Fastest possible execution (direct CPU instructions) │
│ - Compilation takes time; separate step before running │
└───────────────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────────┐
│ BYTECODE VM (Python, Java, Lua) │
│ │
│ Source ──▶ Compiler ──▶ Bytecode ──▶ Virtual Machine ──▶ output │
│ (.pyc) (CPython, JVM) │
│ │
│ + Faster than tree-walking; portable across platforms │
│ - VM adds complexity; bytecode is an intermediate layer │
└───────────────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────────┐
│ TREE-WALKING (kemlang-py, early Ruby, many scripting languages) │
│ │
│ Source ──▶ Lexer ──▶ Parser ──▶ AST ──▶ walk & execute │
│ │
│ + Simplest implementation; easy to debug and extend │
│ - Slowest; each node is re-evaluated on every visit │
└───────────────────────────────────────────────────────────────────┘
kemlang-py is a tree-walking interpreter. It never produces compiled output - it reads your source file and executes it directly by walking the parsed tree. This makes the implementation small (~1000 lines across three core files), readable, and easy to modify.
The pipeline
Every time you run kem run hello.jsk, the source file travels through three sequential stages. Each stage receives the output of the previous one. Nothing is shared between stages except that single transformed value.
┌──────────────────────────────────────────────────────────────────┐
│ Source file (hello.jsk) │
│ │
│ kem bhai │
│ bhai bol "kem cho, duniya!" │
│ aavjo bhai │
└────────────────────────────┬─────────────────────────────────────┘
│ raw UTF-8 text
▼
┌──────────────────────────────────────────────────────────────────┐
│ Stage 1: Lexer (kemlang/lexer.py) │
│ │
│ Scans characters left-to-right. Groups them into tokens. │
│ Handles multi-word Gujarati keywords. Skips whitespace. │
└────────────────────────────┬─────────────────────────────────────┘
│ List[Token]
│
│ KEM_BHAI 'kem bhai' 1:0
│ BHAI_BOL 'bhai bol' 2:2
│ STRING '"kem cho, duniya!"' 2:10
│ AAVJO_BHAI 'aavjo bhai' 3:0
│ EOF '' 4:0
▼
┌──────────────────────────────────────────────────────────────────┐
│ Stage 2: Parser (kemlang/parser.py) │
│ │
│ Consumes tokens one at a time. Checks grammar rules. │
│ Builds a tree of dataclass nodes (the AST). │
└────────────────────────────┬─────────────────────────────────────┘
│ Program (AST)
│
│ Program
│ └── Print
│ └── Literal("kem cho, duniya!")
▼
┌──────────────────────────────────────────────────────────────────┐
│ Stage 3: Interpreter (kemlang/interpreter.py) │
│ │
│ Walks the AST recursively. Executes each node. Manages │
│ variable scope via Environment. Handles I/O and errors. │
└────────────────────────────┬─────────────────────────────────────┘
│
▼
stdout: kem cho, duniya!
exit code: 0
Stage 1: Lexer
The lexer (also called a scanner or tokenizer) reads the source string one character at a time and groups characters into tokens. A token is the smallest meaningful unit of the language - a keyword, a number, a string literal, an operator, or an identifier.
kemlang-py's lexer handles something unusual: multi-word keywords. Most languages use single reserved words (if, while, print). kemlang-py uses natural Gujarati phrases like bhai bol (print) and aavjo bhai (end of program). The lexer checks for these multi-word sequences before checking single-word keywords.
Input: raw source text as a Python str. Output: List[Token], each token carrying its type, lexeme (the original text), line number, and column offset.
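The keyword-first scanning order can be sketched as below. This is a minimal illustration, not the code in kemlang/lexer.py: the Token fields follow the description above, the token type names come from the pipeline diagram, and MULTI_WORD_KEYWORDS and the tokenize function are hypothetical names. It assumes single-line strings and handles only the tokens that hello.jsk needs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    type: str     # e.g. "BHAI_BOL"
    lexeme: str   # the original source text
    line: int     # 1-based line number
    column: int   # 0-based column offset


# Hypothetical lookup table: phrase -> token type.
MULTI_WORD_KEYWORDS = {
    "kem bhai": "KEM_BHAI",
    "bhai bol": "BHAI_BOL",
    "aavjo bhai": "AAVJO_BHAI",
}


def tokenize(source: str) -> List[Token]:
    tokens: List[Token] = []
    line, col, i = 1, 0, 0
    while i < len(source):
        ch = source[i]
        if ch == "\n":                      # newline: advance the line counter
            line, col, i = line + 1, 0, i + 1
            continue
        if ch.isspace():                    # skip other whitespace
            col, i = col + 1, i + 1
            continue
        # Try the multi-word phrases before anything else.
        for phrase, token_type in MULTI_WORD_KEYWORDS.items():
            end = i + len(phrase)
            # Require a word boundary so "bhai bolo" is not read as "bhai bol".
            if source.startswith(phrase, i) and not source[end:end + 1].isalnum():
                tokens.append(Token(token_type, phrase, line, col))
                col, i = col + len(phrase), end
                break
        else:
            # No phrase matched: this sketch only knows string literals.
            if ch != '"':
                raise SyntaxError(f"unexpected character {ch!r} at {line}:{col}")
            close = source.find('"', i + 1)
            if close == -1:
                raise SyntaxError(f"unterminated string at {line}:{col}")
            lexeme = source[i:close + 1]
            tokens.append(Token("STRING", lexeme, line, col))
            col, i = col + len(lexeme), close + 1
    tokens.append(Token("EOF", "", line, col))
    return tokens
```

A real lexer would continue past the phrase table into single-word keywords, identifiers, numbers, and operators; the point here is only the ordering, with the longer phrases tried first.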
Stage 2: Parser
The parser takes the flat token stream and builds a tree structure called an Abstract Syntax Tree (AST). The tree represents the grammatical structure of your program - nesting, operator precedence, and the parent-child relationships between statements and expressions.
kemlang-py uses a hand-written recursive-descent parser. Each grammar rule has a corresponding method (statement(), if_statement(), expression(), etc.). These methods call each other recursively, naturally mirroring the nested structure of the grammar.
Input: List[Token] (with NEWLINE tokens stripped). Output: a Program dataclass containing a list of Stmt nodes, each of which may contain Expr nodes.
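To show how one method per grammar rule yields operator precedence, here is a small recursive-descent sketch for arithmetic expressions. It is an illustration under assumptions, not the contents of kemlang/parser.py: tokens are simplified to (type, lexeme) tuples, the token type names (NUMBER, PLUS, MINUS, STAR, SLASH, LPAREN, RPAREN) and the helper methods term() and primary() are hypothetical, and only the node names Literal and Binary come from this page.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Literal:
    value: int


@dataclass
class Binary:
    left: "Expr"
    op: str
    right: "Expr"


Expr = Union[Literal, Binary]


class Parser:
    def __init__(self, tokens: List[Tuple[str, str]]):
        self.tokens = tokens  # must end with an ("EOF", "") token
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos][0]

    def advance(self) -> Tuple[str, str]:
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expression(self) -> Expr:
        # expression -> term (("+" | "-") term)*
        node = self.term()
        while self.peek() in ("PLUS", "MINUS"):
            op = self.advance()[1]
            node = Binary(node, op, self.term())
        return node

    def term(self) -> Expr:
        # term -> primary (("*" | "/") primary)*
        # Called from expression(), so "*" and "/" bind tighter than "+" and "-".
        node = self.primary()
        while self.peek() in ("STAR", "SLASH"):
            op = self.advance()[1]
            node = Binary(node, op, self.primary())
        return node

    def primary(self) -> Expr:
        # primary -> NUMBER | "(" expression ")"
        if self.peek() == "NUMBER":
            return Literal(int(self.advance()[1]))
        if self.peek() == "LPAREN":
            self.advance()
            node = self.expression()  # recursion mirrors the nesting
            if self.peek() != "RPAREN":
                raise SyntaxError("expected ')'")
            self.advance()
            return node
        raise SyntaxError(f"unexpected token {self.peek()}")
```

Parsing the tokens for 1 + 2 * 3 with this sketch puts the multiplication in the right-hand subtree of the addition, because expression() delegates to term() before it looks for a + or -.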
Stage 3: Interpreter
The interpreter walks the AST recursively. For each node it visits, it calls the appropriate handler. Statement nodes (Print, Declaration, If, While) produce side effects - they print to stdout, define variables, branch, or loop. Expression nodes (Binary, Variable, Literal) return a KemValue, which is one of the five built-in Python types that kemlang-py uses as runtime values.
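The split between statements that cause side effects and expressions that return values can be sketched as two mutually supporting functions. This is a simplified illustration, not the code in kemlang/interpreter.py: the node names Program, Print, Binary, and Literal come from this page, while the field names, the execute/evaluate/interpret function names, and the use of RuntimeError are assumptions.

```python
import sys
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Literal:
    value: Any


@dataclass
class Binary:
    left: Any
    op: str
    right: Any


@dataclass
class Print:
    expr: Any


@dataclass
class Program:
    statements: List[Any]


def evaluate(node: Any) -> Any:
    """Expression nodes return a value."""
    if isinstance(node, Literal):
        return node.value
    if isinstance(node, Binary):
        left = evaluate(node.left)    # recurse into the children first
        right = evaluate(node.right)
        if node.op == "+":
            return left + right
        if node.op == "*":
            return left * right
        raise RuntimeError(f"unknown operator {node.op!r}")
    raise RuntimeError(f"cannot evaluate {type(node).__name__}")


def execute(stmt: Any) -> None:
    """Statement nodes produce side effects."""
    if isinstance(stmt, Print):
        print(evaluate(stmt.expr))
    else:
        raise RuntimeError(f"cannot execute {type(stmt).__name__}")


def interpret(program: Program) -> int:
    """Run every statement; return 0 on success, 1 on error."""
    try:
        for stmt in program.statements:
            execute(stmt)
    except RuntimeError as err:
        print(f"error: {err}", file=sys.stderr)
        return 1
    return 0
```

Every visit to a Binary node re-evaluates both children, which is the cost noted for tree-walking in the comparison above.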
Variable scope is managed through a chain of Environment objects. Each block (if body, while body) gets its own environment that holds a reference to its parent. When a variable lookup fails in the current environment, it walks up the chain.
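A parent-linked scope chain of this kind might look like the sketch below. Only the class name Environment comes from this page; the define and get method names, the values dictionary, and the choice of NameError are assumptions made for illustration.

```python
from typing import Any, Dict, Optional


class Environment:
    def __init__(self, parent: Optional["Environment"] = None):
        self.values: Dict[str, Any] = {}
        self.parent = parent  # None for the outermost (global) scope

    def define(self, name: str, value: Any) -> None:
        # New variables always land in the current scope.
        self.values[name] = value

    def get(self, name: str) -> Any:
        # Look in the current scope, then walk up through the parents.
        env: Optional[Environment] = self
        while env is not None:
            if name in env.values:
                return env.values[name]
            env = env.parent
        raise NameError(f"undefined variable '{name}'")


global_scope = Environment()
global_scope.define("x", 1)

block_scope = Environment(parent=global_scope)  # e.g. the body of an if
block_scope.define("y", 2)

outer_value = block_scope.get("x")  # found by walking up to global_scope
```

Because the link only points upward, a variable defined inside a block is invisible to the enclosing scope once the block's environment is discarded.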
Input: Program (root AST node). Output: stdout text + an integer exit code (0 = success, 1 = error).
What the CLI actually does
The kem run command in kemlang/cli.py calls these three stages in sequence. The entire pipeline is five lines:
source = Path(file).read_text(encoding="utf-8")
tokens = Lexer(source).tokenize()          # str -> List[Token]
ast = Parser(tokens).parse()               # tokens -> Program
exit_code = Interpreter().interpret(ast)   # AST -> stdout + int
raise typer.Exit(exit_code)
Explore each stage in detail
The Lexer
How characters become tokens. Multi-word keywords, the scanning loop, what gets rejected and why.
The Parser
How tokens become an AST. Context-free grammars, recursive descent, operator precedence, and the full BNF grammar.
The Interpreter
How the AST gets executed. Tree-walking, environment scopes, control flow via exceptions, and I/O.
Runtime and Types
The five runtime types, truthiness, type coercion, the full execution lifecycle, and error propagation.