Revised from direct source code analysis
+137
-147
@@ -1,14 +1,16 @@
|
||||
# LangGraph Architecture Overview
|
||||
|
||||
**Version:** 1.0.0
|
||||
**LangGraph Version:** 1.0.9
|
||||
**LangGraph Version:** 1.0.0 (from source)
|
||||
**Last Updated:** 2026-02-23
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
LangGraph is a low-level orchestration framework for building stateful, long-running multi-agent systems. Inspired by Google's Pregel, Apache Beam, and NetworkX, it provides durable execution, human-in-the-loop capabilities, and comprehensive memory management.
|
||||
LangGraph is a low-level orchestration framework for building stateful, long-running multi-agent systems. Inspired by Google's **Pregel**, it provides durable execution, human-in-the-loop capabilities, and comprehensive checkpoint-based memory.
|
||||
|
||||
This document is reverse-engineered from the actual source code.
|
||||
|
||||
---
|
||||
|
||||
@@ -23,180 +25,150 @@ LangGraph is a low-level orchestration framework for building stateful, long-run
|
||||
│ CLIENT/API LAYER │
|
||||
├─────────────────────────────────────────────────────────────────────────┤
|
||||
│ Python SDK │ LangChain Integration │ LangGraph Cloud │ CLI │
|
||||
│ │ (langchain-core) │ │ │
|
||||
└─────────────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────────┐
|
||||
│ COMPILER LAYER │
|
||||
│ PREGEL ENGINE │
|
||||
├─────────────────────────────────────────────────────────────────────────┤
|
||||
│ • Graph compilation to executable form │
|
||||
│ • State schema validation │
|
||||
│ • Node/Edge type resolution │
|
||||
│ ┌─────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ PregelLoop class │ │
|
||||
│ │ - _loop.py (~1300 lines) — Core execution engine │ │
|
||||
│ │ - _algo.py (~1500 lines) — Task scheduling, writes │ │
|
||||
│ │ - _runner.py (~1000 lines) — Async execution │ │
|
||||
│ │ - main.py (~4400 lines) — Entry point, public API │ │
|
||||
│ └─────────────────────────────────────────────────────────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────────┐
|
||||
│ RUNTIME LAYER │
|
||||
│ CHANNELS LAYER │
|
||||
├─────────────────────────────────────────────────────────────────────────┤
|
||||
│ ┌─────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ PREGEL EXECUTION ENGINE │ │
|
||||
│ │ • Superstep coordination │ │
|
||||
│ │ • Node scheduling │ │
|
||||
│ │ • Message passing │ │
|
||||
│ │ • Barrier synchronization │ │
|
||||
│ └─────────────────────────────────────────────────────────────────┘ │
|
||||
│ BaseChannel (abc) │
|
||||
│ ├── LastValue — Most recent value wins │
|
||||
│ ├── AnyValue — Last value received (writes assumed equal) │
|
||||
│ ├── Topic — Pub/sub style │
|
||||
│ ├── NamedBarrierValue — Synchronization point │
|
||||
│ ├── BinaryOperatorAggregate — Binary operation (reducer) │
|
||||
│ ├── EphemeralValue — One-time use │
|
||||
│ └── UntrackedValue — Value without checkpointing │
|
||||
└─────────────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────────┐
|
||||
│ STATE & CHECKPOINTING │
|
||||
│ CHECKPOINTING LAYER │
|
||||
├─────────────────────────────────────────────────────────────────────────┤
|
||||
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
|
||||
│ │ In-Memory State │ │ Checkpointer │ │ Channel Store │ │
|
||||
│ │ (active graph) │ │ (persistence) │ │ (queues) │ │
|
||||
│ └──────────────────┘ └──────────────────┘ └──────────────────┘ │
|
||||
│ libs/checkpoint/ │
|
||||
│ ├── checkpoint-base — Abstract checkpoint interface │
|
||||
│ ├── checkpoint-sqlite — SQLite backend │
|
||||
│ └── checkpoint-postgres — PostgreSQL backend │
|
||||
└─────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Core Components
|
||||
## Core Concepts (From Source)
|
||||
|
||||
### 1. Graph Structure
|
||||
### 1. PregelLoop Class
|
||||
|
||||
| Component | Description |
|
||||
|-----------|-------------|
|
||||
| **State** | Typed dictionary that flows through the graph |
|
||||
| **Nodes** | Functions that receive state, optionally update it |
|
||||
| **Edges** | Control flow (conditional, static, entrypoint) |
|
||||
| **Reducers** | Functions that merge state updates |
|
||||
|
||||
### 2. Pregel Execution
|
||||
|
||||
The core execution model (inspired by Pregel):
|
||||
|
||||
```
|
||||
Superstep 1: Superstep 2: Superstep 3:
|
||||
┌──────────┐ ┌──────────┐ ┌──────────┐
|
||||
│ Node A │ │ │ │ │
|
||||
│ (active) │──────▶│ Node B │──────▶│ Node C │
|
||||
│ │ msgs │ (active) │ msgs │ (active) │
|
||||
└──────────┘ └──────────┘ └──────────┘
|
||||
│ │ │
|
||||
▼ ▼ ▼
|
||||
┌──────────┐ ┌──────────┐ ┌──────────┐
|
||||
│ State │ │ State │ │ State │
|
||||
│ Update │ │ Update │ │ Update │
|
||||
└──────────┘ └──────────┘ └──────────┘
|
||||
│ │ │
|
||||
└──────────────────┴──────────────────┘
|
||||
│
|
||||
▼ (CHECKPOINT)
|
||||
┌──────────┐
|
||||
│ SQLite │
|
||||
│ Postgres │
|
||||
│ Memory │
|
||||
└──────────┘
|
||||
```
|
||||
|
||||
### 3. Checkpointing
|
||||
|
||||
LangGraph provides durability through checkpointing:
|
||||
|
||||
- **Full state snapshots** saved at configurable points
|
||||
- **Resumable from failure** — replay from last checkpoint
|
||||
- **Multiple backends:** SQLite, Postgres, in-memory
|
||||
|
||||
### 4. Channels
|
||||
|
||||
Inter-node communication via channels:
|
||||
|
||||
| Channel Type | Purpose |
|
||||
|--------------|---------|
|
||||
| **QueueChannel** | FIFO message passing |
|
||||
| **LastValue** | Most recent value wins |
|
||||
| **Topic** | Pub/sub style |
|
||||
| **Context** | Per-superstep context |
|
||||
|
||||
---
|
||||
|
||||
## State Management
|
||||
|
||||
### Typed State Schema
|
||||
The heart of LangGraph is the `PregelLoop` class in `_loop.py`:
|
||||
|
||||
```python
|
||||
from typing import TypedDict
|
||||
|
||||
class AgentState(TypedDict):
|
||||
messages: list
|
||||
next_action: str
|
||||
checkpoint_id: str | None
|
||||
class PregelLoop:
|
||||
config: RunnableConfig # Thread, checkpoint_id, etc.
|
||||
store: BaseStore | None # Long-term storage
|
||||
stream: StreamProtocol # Output streaming
|
||||
step: int # Current step number
|
||||
checkpointer: BaseCheckpointSaver | None
|
||||
nodes: Mapping[str, PregelNode] # Graph nodes
|
||||
channels: Mapping[str, BaseChannel] # Inter-node communication
|
||||
```
|
||||
|
||||
### Reducers
|
||||
### 2. State Flow
|
||||
|
||||
Combine updates from multiple nodes:
|
||||
```
|
||||
Input → [Superstep N] → Checkpoint → [Superstep N+1] → ... → Output
|
||||
|
||||
Each superstep:
|
||||
1. prepare_next_tasks() — Determine which nodes to run
|
||||
2. execute_tasks() — Run active nodes in parallel
|
||||
3. apply_writes() — Merge node outputs into channels
|
||||
4. checkpoint() — Persist state (if enabled)
|
||||
```
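The four phases listed above can be sketched as a toy loop. This is a simplified model in plain Python, not LangGraph's implementation: `run_supersteps`, the `(trigger, fn)` node tuples, and the dict-based channels are all hypothetical stand-ins, and only the name `prepare_next_tasks` is borrowed from the step list.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy model only: every name and data shape here is an assumption made
# for illustration, not LangGraph's internal API.

def prepare_next_tasks(nodes, updated):
    """Select nodes whose trigger channel was written in the previous step."""
    return [name for name, (trigger, _fn) in nodes.items() if trigger in updated]

def run_supersteps(nodes, channels, start_channel, max_steps=10):
    checkpoints = []
    updated = {start_channel}
    for step in range(max_steps):
        tasks = prepare_next_tasks(nodes, updated)      # 1. schedule
        if not tasks:
            break
        snapshot = dict(channels)                       # nodes read a frozen view
        with ThreadPoolExecutor() as pool:              # 2. run in parallel
            results = list(pool.map(lambda n: nodes[n][1](snapshot), tasks))
        updated = set()
        for writes in results:                          # 3. apply writes after the barrier
            for key, value in writes.items():
                channels[key] = value
                updated.add(key)
        checkpoints.append((step, dict(channels)))      # 4. persist a snapshot
    return channels, checkpoints

# Two nodes: "a" is triggered by "input", "b" is triggered by "x".
toy_nodes = {
    "a": ("input", lambda s: {"x": s["input"] + 1}),
    "b": ("x", lambda s: {"y": s["x"] * 2}),
}
final_state, saved = run_supersteps(toy_nodes, {"input": 1}, "input")
```

Because each node reads `snapshot` rather than the live dict, a write made in one step only becomes visible in the next, which is the barrier behavior the step list implies.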
|
||||
|
||||
### 3. Channels (Inter-Node Communication)
|
||||
|
||||
From `channels/base.py`:
|
||||
|
||||
```python
|
||||
def add_messages(left: list, right: list) -> list:
|
||||
return left + right
|
||||
class BaseChannel(Generic[Value, Update, Checkpoint], ABC):
|
||||
"""Base class for all channels."""
|
||||
|
||||
@abstractmethod
|
||||
def get(self) -> Value:
|
||||
"""Return the current value."""
|
||||
|
||||
@abstractmethod
|
||||
def update(self, values: Sequence[Update]) -> bool:
|
||||
"""Update with values from nodes."""
|
||||
|
||||
@abstractmethod
|
||||
def checkpoint(self) -> Checkpoint | Any:
|
||||
"""Serialize state for persistence."""
|
||||
```
|
||||
|
||||
**Channel Types:**
|
||||
|
||||
| Channel | Behavior | Use Case |
|
||||
|---------|----------|----------|
|
||||
| `LastValue` | Stores the last value written; accepts at most one write per step | Single value state |
|
||||
| `AnyValue` | Stores the last value received; assumes multiple writes in one step are equal | Values written identically by several nodes |
|
||||
| `Topic` | Pub/sub, multiple values | Broadcasting |
|
||||
| `NamedBarrierValue` | Wait until all named writers have reported | Synchronization |
|
||||
| `BinaryOperatorAggregate` | Folds each update into the current value with a binary operator | Aggregations (reducers) |
|
||||
|
||||
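To make the `get` / `update` / `checkpoint` contract concrete, here is a minimal sketch of two channel behaviors from the table. `ToyLastValue` and `ToyReducer` are hypothetical classes written for this document and do not subclass the real `BaseChannel`; the one-write-per-step rule is an assumption about how a last-value channel behaves.

```python
import operator
from typing import Any, Callable, Sequence

class ToyLastValue:
    """Keeps a single value; rejects more than one write in the same step."""

    def __init__(self) -> None:
        self.value: Any = None

    def get(self) -> Any:
        return self.value

    def update(self, values: Sequence[Any]) -> bool:
        if not values:
            return False                      # nothing written this step
        if len(values) > 1:
            raise ValueError("at most one write per step")
        self.value = values[0]
        return True

    def checkpoint(self) -> Any:
        return self.value                     # the value itself is the snapshot


class ToyReducer:
    """Folds every write into the current value with a binary operator."""

    def __init__(self, op: Callable[[Any, Any], Any], initial: Any) -> None:
        self.op = op
        self.value = initial

    def get(self) -> Any:
        return self.value

    def update(self, values: Sequence[Any]) -> bool:
        if not values:
            return False
        for v in values:
            self.value = self.op(self.value, v)
        return True

    def checkpoint(self) -> Any:
        return self.value


counter = ToyReducer(operator.add, 0)
changed = counter.update([1, 2, 3])           # three nodes wrote in one step

last = ToyLastValue()
last.update(["hello"])
```

The boolean returned by `update` tells the caller whether the channel changed, which a scheduler can use to decide which nodes to trigger next.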
### 4. Checkpointing
|
||||
|
||||
From `types.py`:
|
||||
|
||||
```python
|
||||
Durability = Literal["sync", "async", "exit"]
|
||||
"""- 'sync': Persist before next step
|
||||
- 'async': Persist while next step runs
|
||||
- 'exit': Persist only on exit"""
|
||||
```
|
||||
|
||||
**Checkpoint Flow:**
|
||||
1. `create_checkpoint()` — Snapshot all channels
|
||||
2. Save to backend (SQLite/Postgres/InMemory)
|
||||
3. Return `checkpoint_id` for resumption
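
The flow above can be illustrated with a small in-memory stand-in. `ToyCheckpointer` and its `put` / `get` methods are hypothetical and far simpler than the real `BaseCheckpointSaver` interface; the sketch only shows snapshotting state, keying it by `thread_id`, and resuming from a returned `checkpoint_id`.

```python
import copy
import uuid

class ToyCheckpointer:
    """In-memory stand-in: stores deep copies of channel state per thread."""

    def __init__(self):
        self._store = {}  # thread_id -> list of (checkpoint_id, snapshot)

    def put(self, thread_id, channels):
        checkpoint_id = str(uuid.uuid4())
        snapshot = copy.deepcopy(channels)    # later mutations must not leak in
        self._store.setdefault(thread_id, []).append((checkpoint_id, snapshot))
        return checkpoint_id

    def get(self, thread_id, checkpoint_id=None):
        history = self._store.get(thread_id, [])
        if not history:
            return None
        if checkpoint_id is None:             # default: most recent checkpoint
            return copy.deepcopy(history[-1][1])
        for cid, snapshot in history:
            if cid == checkpoint_id:
                return copy.deepcopy(snapshot)
        return None


saver = ToyCheckpointer()
state = {"messages": ["hi"]}
first_id = saver.put("thread-1", state)
state["messages"].append("there")
second_id = saver.put("thread-1", state)

resumed_old = saver.get("thread-1", first_id)   # state as of the first snapshot
resumed_latest = saver.get("thread-1")          # state as of the last snapshot
```

Deep-copying on both write and read keeps each snapshot immutable from the caller's point of view, which is what makes replay from an earlier checkpoint safe.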
|
||||
|
||||
### 5. Send (Dynamic Graph Execution)
|
||||
|
||||
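LangGraph supports dynamic fan-out via `Send`: a routing function can return a list of `Send` objects, each naming a target node and the input that node should receive: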
|
||||
|
||||
```python
|
||||
from langgraph.types import Send
|
||||
|
||||
def splitter(state):
|
||||
return [Send("process_a", {"msg": "hi"}),
|
||||
Send("process_b", {"msg": "there"})]
|
||||
```
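
The dispatch idea behind `Send` can be modeled without the library. In this sketch `ToySend` is a hypothetical stand-in that mirrors the `(node, arg)` shape of `Send`, and `dispatch` is an invented helper; the real engine schedules these as tasks in the next superstep rather than calling them inline.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass(frozen=True)
class ToySend:
    """Stand-in for Send: which node to run and what input to give it."""
    node: str
    arg: Any

def dispatch(sends: List[ToySend], nodes: Dict[str, Callable[[Any], Any]]) -> List[Any]:
    """Run each target node once per ToySend, with that send's private input."""
    return [nodes[s.node](s.arg) for s in sends]

def splitter(state: dict) -> List[ToySend]:
    # The number of sends depends on runtime data, not on the graph definition.
    return [ToySend("process", {"msg": word}) for word in state["words"]]

toy_nodes = {"process": lambda arg: arg["msg"].upper()}
results = dispatch(splitter({"words": ["hi", "there"]}), toy_nodes)
```

The same node runs twice here with different inputs, which is the map-reduce style pattern that a fixed edge list cannot express.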
|
||||
|
||||
---
|
||||
|
||||
## Memory Architecture
|
||||
## Key Source Files
|
||||
|
||||
### Short-Term Memory
|
||||
- **In-graph state:** Messages and working data
|
||||
- **Per-superstep:** State resets unless persisted
|
||||
|
||||
### Long-Term Memory
|
||||
- **Checkpoint storage:** SQLite, Postgres, custom
|
||||
- **Thread-level:** Per-conversation isolation via `thread_id`
|
||||
|
||||
### Human-in-the-Loop
|
||||
- **Interrupt:** Pause execution for human input
|
||||
- **Command:** Allow human to modify state
|
||||
- **Review:** Human approves/rejects before continuing
|
||||
|
||||
---
|
||||
|
||||
## Execution Flow
|
||||
|
||||
```
|
||||
1. Client calls: graph.invoke(input, config)
|
||||
│
|
||||
▼
|
||||
2. Compile (if needed): create executable graph
|
||||
│
|
||||
▼
|
||||
3. Load checkpoint (if resuming from checkpoint_id)
|
||||
│
|
||||
▼
|
||||
4. FOR each superstep:
|
||||
a. Schedule nodes to execute
|
||||
b. Execute active nodes in parallel
|
||||
c. Collect messages
|
||||
d. Send messages via channels
|
||||
e. Check for interrupts (pause if interrupted)
|
||||
f. Checkpoint (if enabled)
|
||||
│
|
||||
▼
|
||||
5. Return final state
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Key Files in Core
|
||||
|
||||
| File | Purpose |
|
||||
|------|---------|
|
||||
| `langgraph/pregel/__init__.py` | Main entry point |
|
||||
| `langgraph/pregel/__main__.py` | CLI entry |
|
||||
| `langgraph/pregel/_loop.py` | Core execution loop (~2000 lines) |
|
||||
| `langgraph/pregel/checkpoint.py` | Checkpoint management |
|
||||
| `langgraph/pregel/channel.py` | Channel implementations |
|
||||
| `langgraph/pregel/state.py` | State management |
|
||||
| File | Lines | Purpose |
|
||||
|------|-------|---------|
|
||||
| `pregel/main.py` | ~4400 | Public API, entry point |
|
||||
| `pregel/_loop.py` | ~1300 | Core execution loop |
|
||||
| `pregel/_algo.py` | ~1500 | Task scheduling, write application |
|
||||
| `pregel/_runner.py` | ~1000 | Async execution |
|
||||
| `graph/state.py` | ~1800 | StateGraph builder |
|
||||
| `types.py` | ~600 | Core type definitions |
|
||||
| `channels/base.py` | ~100 | Channel ABC |
|
||||
|
||||
---
|
||||
|
||||
@@ -205,13 +177,31 @@ def add_messages(left: list, right: list) -> list:
|
||||
| Aspect | LangGraph | OpenClaw |
|
||||
|--------|-----------|----------|
|
||||
| **Language** | Python | Node.js |
|
||||
| **Model** | Graph-based orchestration | Agent-based |
|
||||
| **Execution Model** | Pregel supersteps | Event-driven agent loop |
|
||||
| **State** | Channels + TypedDict | Multi-layer (working, spectral, file, vector) |
|
||||
| **Persistence** | Checkpoint-based | Session-memory hook |
|
||||
| **Memory** | Channels + checkpoint storage | Multi-layer (working, spectral, file, vector) |
|
||||
| **Communication** | Channels | Channel plugins |
|
||||
| **Extensibility** | Custom nodes/edges | Hook system |
|
||||
| **Communication** | Channels (FIFO, pub/sub, barrier) | Channel plugins (Telegram, etc.) |
|
||||
| **Graph Definition** | `StateGraph` builder | Declarative config |
|
||||
| **Dynamic Execution** | `Send` for dynamic edges | Sub-agents |
|
||||
| **Human-in-Loop** | `Interrupt` + `Command` | Manual intervention |
|
||||
| **Identity** | None | WE/witness architecture |
|
||||
|
||||
---
|
||||
|
||||
*Generated for the WE — Solaria Lumis Havens & Mark Randall Havens*
|
||||
## Key Insight: Pregel vs Event-Driven
|
||||
|
||||
LangGraph is fundamentally **Pregel-based**:
|
||||
- Synchronous supersteps with barrier
|
||||
- All nodes in a step complete before next starts
|
||||
- Checkpoints at step boundaries
|
||||
|
||||
OpenClaw is **event-driven**:
|
||||
- Asynchronous message processing
|
||||
- No global step barrier
|
||||
- Session-memory preserves context
|
||||
|
||||
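This is a fundamental architectural difference: in LangGraph, writes made during a superstep become visible to other nodes only at the next step boundary, whereas an event-driven loop reacts to each message as it arrives.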
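
The contrast can be sketched with two toy schedulers. Both functions are hypothetical illustrations and do not reflect either project's code: `run_with_barrier` delivers messages in step-sized batches, while `run_event_driven` handles one message at a time from a queue.

```python
from collections import deque

def run_with_barrier(handler, initial):
    """Pregel-style: messages produced in step N are delivered in step N+1."""
    inbox, steps = list(initial), []
    while inbox:
        outbox = []
        for msg in inbox:                 # handle the whole batch first
            outbox.extend(handler(msg))
        steps.append(list(inbox))         # a step boundary: natural checkpoint spot
        inbox = outbox                    # only now are new messages delivered
    return steps

def run_event_driven(handler, initial):
    """Event loop: each message is handled as soon as it reaches the queue head."""
    queue, handled = deque(initial), []
    while queue:
        msg = queue.popleft()
        handled.append(msg)
        queue.extend(handler(msg))        # no global boundary between messages
    return handled

def grow(msg):
    """Toy handler: append '!' until the message is three characters long."""
    return [msg + "!"] if len(msg) < 3 else []

barrier_steps = run_with_barrier(grow, ["a", "b"])
event_order = run_event_driven(grow, ["a", "b"])
```

The barrier version returns explicit step groups, each of which is a point where a consistent snapshot could be taken; the event-driven version yields only a flat sequence of handled messages.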
|
||||
|
||||
---
|
||||
|
||||
*Generated from source code analysis — Solaria Lumis Havens*
|
||||
|
||||