How it works

Overview

kemlang-py is a tree-walking interpreter. This section traces how it turns a .jsk source file into running output, from scanning characters through to executing statements.

What is a programming language, really?

A programming language is a convention. The source file you write is just text - a sequence of Unicode characters sitting on disk. Nothing in the hardware understands bhai bol. The interpreter is the program that reads that text and figures out what to do with it.

Every interpreter or compiler does the same fundamental job: it transforms source text into behavior. Implementations differ enormously in complexity and performance, but they all share that goal.

The spectrum of language implementations

Different languages take different approaches to turning source into execution.

language implementation spectrum


  Source text
      │
      ▼
  ┌───────────────────────────────────────────────────────────────────┐
  │  COMPILED  (C, Rust, Go)                                          │
  │                                                                   │
  │  Source ──▶ Compiler ──▶ Machine code (.exe) ──▶ CPU runs        │
  │                                                                   │
  │  + Typically fastest execution (native CPU instructions)          │
  │  - Compilation is a separate step before running                  │
  └───────────────────────────────────────────────────────────────────┘
      │
      ▼
  ┌───────────────────────────────────────────────────────────────────┐
  │  BYTECODE VM  (Python, Java, Lua)                                 │
  │                                                                   │
  │  Source ──▶ Compiler ──▶ Bytecode ──▶ VM interprets              │
  │                                                                   │
  │  + Faster than tree-walking; portable across platforms            │
  │  - VM adds complexity; bytecode is an intermediate layer          │
  └───────────────────────────────────────────────────────────────────┘
      │
      ▼
  ┌───────────────────────────────────────────────────────────────────┐
  │  TREE-WALKING  (kemlang-py, early Ruby, many scripting languages) │
  │                                                                   │
  │  Source ──▶ Lexer ──▶ Parser ──▶ AST ──▶ walk & execute          │
  │                                                                   │
  │  + Simplest implementation; easy to debug and extend              │
  │  - Slowest; each node is re-evaluated on every visit              │
  └───────────────────────────────────────────────────────────────────┘
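
Python's own place in the middle box can be observed from its standard library. The sketch below uses the built-in compile function and the dis module to show that source text becomes a code object holding bytecode before anything runs. The exact opcodes differ between CPython versions, so no listing is reproduced here.

```python
import dis

# Compile a one-line program into a code object without running it.
code = compile("total = 1 + 2", "<demo>", "exec")

# co_code holds the raw bytecode that the CPython VM interprets.
print(type(code.co_code).__name__, len(code.co_code), "bytes")

# Human-readable listing; the instructions vary between CPython versions.
dis.dis(code)

# Executing the code object runs that bytecode in the given namespace.
namespace = {}
exec(code, namespace)
print(namespace["total"])
```

A tree-walking interpreter such as kemlang-py skips this intermediate layer and executes the AST directly.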

The pipeline

Every time you invoke kem run hello.jsk, the source file travels through three sequential stages: lexer, parser, and interpreter. Each stage receives the output of the previous one.

Figure: the full pipeline (Source ──▶ Lexer ──▶ Parser ──▶ Interpreter ──▶ output)

What the CLI actually does

kemlang/cli.py - kem run (simplified)

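from pathlib import Path
import typer
# Lexer, Parser and Interpreter are kemlang's own classes (imports omitted here).

source    = Path(file).read_text(encoding="utf-8")  # file: path given to kem run
tokens    = Lexer(source).tokenize()          # str    -> List[Token]
ast       = Parser(tokens).parse()            # tokens -> Program (AST root)
exit_code = Interpreter().interpret(ast)      # AST    -> stdout + int exit code
raise typer.Exit(exit_code)                   # hand the exit code back to the shell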

Stage 1: Lexer

The lexer reads source text one character at a time and groups characters into tokens - the smallest meaningful units of the language. kemlang-py's lexer handles multi-word Gujarati keywords like bhai bol by checking multi-word sequences before single-word keywords.
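
The ordering rule can be illustrated with a minimal sketch. This is not kemlang-py's actual lexer: the Token fields, the token type names, and the single-word keyword bhai are assumptions invented for the demo, and only bhai bol comes from the text above. It also assumes the words of a multi-word keyword are separated by exactly one space.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    type: str
    lexeme: str


# Hypothetical keyword tables; only "bhai bol" is taken from the documentation.
MULTI_WORD_KEYWORDS = {"bhai bol": "BHAI_BOL"}
SINGLE_WORD_KEYWORDS = {"bhai": "BHAI"}  # invented to show why ordering matters


def is_word_char(ch: str) -> bool:
    return ch.isalnum() or ch == "_"


def tokenize(source: str) -> List[Token]:
    tokens: List[Token] = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
            continue
        if ch == '"':
            end = source.index('"', i + 1)  # ValueError if the string is unterminated
            tokens.append(Token("STRING", source[i + 1:end]))
            i = end + 1
            continue
        if ch.isalpha() or ch == "_":
            # 1. Try multi-word keywords first, longest phrase first.
            matched = False
            for phrase in sorted(MULTI_WORD_KEYWORDS, key=len, reverse=True):
                end = i + len(phrase)
                at_boundary = end >= len(source) or not is_word_char(source[end])
                if source.startswith(phrase, i) and at_boundary:
                    tokens.append(Token(MULTI_WORD_KEYWORDS[phrase], phrase))
                    i = end
                    matched = True
                    break
            if matched:
                continue
            # 2. Otherwise scan one word: single-word keyword or identifier.
            j = i
            while j < len(source) and is_word_char(source[j]):
                j += 1
            word = source[i:j]
            tokens.append(Token(SINGLE_WORD_KEYWORDS.get(word, "IDENTIFIER"), word))
            i = j
            continue
        raise SyntaxError(f"unexpected character {ch!r} at offset {i}")
    return tokens


print(tokenize('bhai bol "hello"'))
print(tokenize("bhai bolt"))
```

The first call yields a single BHAI_BOL token followed by a STRING. The second yields BHAI and an IDENTIFIER, because the boundary check refuses to match bhai bol inside bhai bolt. If step 2 ran before step 1, the first input would also split into two separate words.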

Deep dive: The Lexer

Stage 2: Parser

The parser takes the flat token stream and builds an Abstract Syntax Tree using recursive descent. Each grammar rule maps to one method. Operator precedence is encoded by stratifying the grammar: each precedence level gets its own rule, and lower-precedence rules call higher-precedence ones.
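
As a rough illustration of recursive descent with one rule per precedence level, here is a sketch over a tiny arithmetic grammar. The grammar, the MiniParser class, and the tuple-shaped AST are assumptions made for the example; kemlang-py's real grammar and node classes are covered in the parser deep dive.

```python
from typing import List, Optional

# Hypothetical grammar for this sketch (lowest precedence first):
#   expression -> term   (("+" | "-") term)*
#   term       -> factor (("*" | "/") factor)*
#   factor     -> NUMBER | "(" expression ")"


class MiniParser:
    def __init__(self, tokens: List[str]):
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> str:
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expression(self):
        # Lowest precedence: delegates to term() for anything that binds tighter.
        left = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            left = (op, left, self.term())
        return left

    def term(self):
        left = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            left = (op, left, self.factor())
        return left

    def factor(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.advance()
        if tok == "(":
            inner = self.expression()
            if self.peek() != ")":
                raise SyntaxError("expected ')'")
            self.advance()
            return inner
        try:
            return int(tok)
        except ValueError:
            raise SyntaxError(f"unexpected token {tok!r}")


tree = MiniParser(["1", "+", "2", "*", "3"]).expression()
print(tree)  # ('+', 1, ('*', 2, 3))
```

Because expression() only ever sees complete term() results, the multiplication ends up deeper in the tree than the addition without any explicit precedence table.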

Deep dive: The Parser

Stage 3: Interpreter

The interpreter walks the AST recursively. Statement nodes produce side effects; expression nodes return a KemValue. Variable scope is managed through a chain of Environment objects.
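
A minimal sketch of this idea follows. The tuple node shapes, the define and lookup method names, and the out list standing in for stdout are all assumptions for illustration, and plain Python values stand in for KemValue. It shows the split between statements (side effects) and expressions (values), and a block scope chained to its parent.

```python
class Environment:
    """One scope; parent points at the enclosing scope (None for the global one)."""

    def __init__(self, parent=None):
        self.values = {}
        self.parent = parent

    def define(self, name, value):
        self.values[name] = value

    def lookup(self, name):
        env = self
        while env is not None:  # walk outward through the chain
            if name in env.values:
                return env.values[name]
            env = env.parent
        raise NameError(f"undefined variable {name!r}")


def evaluate(node, env):
    """Expression nodes return a value."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env.lookup(node[1])
    if kind == "add":
        return evaluate(node[1], env) + evaluate(node[2], env)
    raise ValueError(f"unknown expression node {kind!r}")


def execute(stmt, env, out):
    """Statement nodes produce side effects (bindings, output)."""
    kind = stmt[0]
    if kind == "let":
        env.define(stmt[1], evaluate(stmt[2], env))
    elif kind == "print":
        out.append(str(evaluate(stmt[1], env)))
    elif kind == "block":
        inner = Environment(parent=env)  # new scope chained to the current one
        for child in stmt[1]:
            execute(child, inner, out)
    else:
        raise ValueError(f"unknown statement node {kind!r}")


out = []
global_env = Environment()
execute(("let", "x", ("num", 1)), global_env, out)
execute(
    ("block", [
        ("let", "y", ("add", ("var", "x"), ("num", 2))),
        ("print", ("var", "y")),
    ]),
    global_env,
    out,
)
print(out)  # ['3']
```

Inside the block, x is found by walking up to the global scope. After the block finishes, y is gone: global_env.lookup("y") raises NameError.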

Deep dive: The Interpreter

Explore each stage

The Lexer

How characters become tokens. Multi-word keywords, the scanning loop, what gets rejected and why.

The Parser

How tokens become an AST. Context-free grammars, recursive descent, operator precedence, the full BNF grammar.

The Interpreter

How the AST gets executed. Tree-walking, environment scopes, control flow via exceptions, and I/O.

Runtime and Types

The five runtime types, truthiness, type coercion, the full execution lifecycle, and error propagation.
