Cloud
Most teams can't script "inject a gyro fault at t=1000ms, press reset at t=3000ms"; a test like that needs a person at a bench. Here it's a few lines of TypeScript: build the board in code, flash real firmware, schedule faults in simulation time, and assert on what the firmware did. On every commit, in CI.
    const mcu = graph.addComponent(Components.ADAFRUIT_STM32F405_EXPRESS)
    const imu = graph.addComponent(Components.MPU6050)
    mcu.setFlash("./firmware.elf")

    graph.connect(imu.pins.sda, mcu.pins.sda)
    graph.connect(imu.pins.scl, mcu.pins.scl)

    const run = await graph.run({ duration: 5000 })
    await run.at(1000).do(() => imu.setYGyro(5))
    await run.at(3000).do(() => mcu.pressReset())

    const logs = await run.logs()
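The snippet above stops at `run.logs()`. A minimal sketch of the assertion step follows, assuming the logs arrive as an array of timestamped text entries. The `LogEntry` shape, the channel name, and the log messages are hypothetical stand-ins for illustration, not the SDK's documented format.

```typescript
// Hypothetical log shape: the real SDK's log format may differ.
interface LogEntry {
  timeMs: number;  // simulation time of the entry
  channel: string; // e.g. "usart1" (assumed channel name)
  text: string;
}

// Return the first entry at or after `fromMs` whose text matches `pattern`.
function firstMatchAfter(
  logs: LogEntry[],
  fromMs: number,
  pattern: RegExp,
): LogEntry | undefined {
  return logs.find((e) => e.timeMs >= fromMs && pattern.test(e.text));
}

// Check that a matching entry appears within `deadlineMs` of the injection time.
function reactedWithin(
  logs: LogEntry[],
  injectedAtMs: number,
  deadlineMs: number,
  pattern: RegExp,
): boolean {
  const hit = firstMatchAfter(logs, injectedAtMs, pattern);
  return hit !== undefined && hit.timeMs - injectedAtMs <= deadlineMs;
}

// Made-up sample data standing in for the result of `await run.logs()`.
const sampleLogs: LogEntry[] = [
  { timeMs: 12, channel: "usart1", text: "boot ok" },
  { timeMs: 1040, channel: "usart1", text: "gyro fault detected" },
  { timeMs: 3005, channel: "usart1", text: "boot ok" },
];

// Did the firmware report the fault within 100 ms of injection at t=1000ms?
const faultSeen = reactedWithin(sampleLogs, 1000, 100, /gyro fault/);
// Did it boot again within 50 ms of the reset at t=3000ms?
const rebooted = reactedWithin(sampleLogs, 3000, 50, /boot ok/);
```

Expressing deadlines relative to the injection time keeps the assertions readable next to the `run.at(...)` calls that scheduled the faults.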
Compose MCUs, sensors, and displays into a graph and connect bus-level signals — pin to pin, not string to string. The board definition lives in your repo, versioned with the firmware it tests.
Subscribe to live streams — sensor outputs, CPU registers — or record USART, I2C, and RTT during the run and query the logs after. Runs are persisted, so postmortems don't require reproducing anything.
Bounded runs in simulation time, scheduled fault injection, and queryable results — hardware regression tests that run on every commit, with no hardware in the loop.
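To make "scheduled in simulation time" concrete, here is a toy scheduler that orders actions by simulated milliseconds rather than by the order they were registered. It illustrates the idea only: `SimSchedule` and its methods are hypothetical and say nothing about how the SDK is implemented.

```typescript
// Toy model of simulation-time scheduling; not the SDK's implementation.
type Action = () => void;

class SimSchedule {
  private events: { atMs: number; action: Action }[] = [];
  private durationMs: number;

  constructor(durationMs: number) {
    this.durationMs = durationMs;
  }

  // Register an action at a simulated timestamp (mirrors the `at(t).do(fn)` shape).
  at(atMs: number): { do: (action: Action) => void } {
    return {
      do: (action: Action) => {
        if (atMs < 0 || atMs > this.durationMs) {
          throw new RangeError(`t=${atMs}ms is outside the 0..${this.durationMs}ms run`);
        }
        this.events.push({ atMs, action });
      },
    };
  }

  // Fire every action in timestamp order and return the times that fired.
  run(): number[] {
    const ordered = [...this.events].sort((a, b) => a.atMs - b.atMs);
    const fired: number[] = [];
    for (const e of ordered) {
      e.action();
      fired.push(e.atMs);
    }
    return fired;
  }
}

// Register the events out of order; the run still fires them by simulated time.
const trace: string[] = [];
const schedule = new SimSchedule(5000);
schedule.at(3000).do(() => trace.push("reset pressed"));
schedule.at(1000).do(() => trace.push("gyro fault injected"));
const firedAt = schedule.run();
```

Because the timestamps are simulated, the same schedule replays identically on every run, which is what makes a fault-injection test repeatable in CI.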
A run with a duration is a test. Take the duration away and the same graph becomes a digital twin: a simulated copy of your product that accumulates uptime for weeks, queryable through the SDK the whole time.
Physical unit (stalled)
Unit in the field. The fan stopped, the loop is crawling, but on its own it just looks quiet.
Digital twin (nominal)
Same firmware, same inputs, simulated. This is what the unit should be doing right now.
| Signal | Physical | Twin | Offset |
|---|---|---|---|
| fan_rpm | 0 | 1,450 | Δ −1,450 ▲ |
| loop_time | 212 ms | 12 ms | Δ +200 ms ▲ |
| heap_free | 9.4 KB | 31.6 KB | Δ −22.2 KB ▲ |
| uptime | 31d 04:12 | 31d 04:12 | Δ 0 |
Run a twin next to the real thing. When the two drift apart, the offsets tell you what failed — before the support ticket does.
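The comparison in the table can be sketched as a per-signal subtraction with a tolerance. The readings below are the ones from the table; the `Telemetry` shape, the signal key names, and the tolerance values are assumptions chosen for illustration.

```typescript
// Hypothetical telemetry shape: one numeric reading per signal name.
type Telemetry = Record<string, number>;

interface Drift {
  signal: string;
  physical: number;
  twin: number;
  offset: number;   // physical minus twin
  flagged: boolean; // true when |offset| exceeds the signal's tolerance
}

// Compare a physical unit against its twin, signal by signal.
function compareToTwin(
  physical: Telemetry,
  twin: Telemetry,
  tolerance: Telemetry,
): Drift[] {
  const drifts: Drift[] = [];
  for (const signal of Object.keys(twin)) {
    const p = physical[signal];
    const t = twin[signal];
    if (p === undefined || t === undefined) continue; // skip signals missing on either side
    const offset = p - t;
    const limit = tolerance[signal] ?? 0;
    drifts.push({ signal, physical: p, twin: t, offset, flagged: Math.abs(offset) > limit });
  }
  return drifts;
}

// Readings from the table; 31d 04:12 of uptime expressed in seconds.
const uptimeS = 31 * 86400 + 4 * 3600 + 12 * 60;
const physicalUnit: Telemetry = { fan_rpm: 0, loop_time_ms: 212, heap_free_kb: 9.4, uptime_s: uptimeS };
const twinUnit: Telemetry = { fan_rpm: 1450, loop_time_ms: 12, heap_free_kb: 31.6, uptime_s: uptimeS };

// Assumed tolerances; real values would depend on the product.
const tolerances: Telemetry = { fan_rpm: 100, loop_time_ms: 5, heap_free_kb: 2, uptime_s: 60 };

const drifts = compareToTwin(physicalUnit, twinUnit, tolerances);
const flaggedSignals = drifts.filter((d) => d.flagged).map((d) => d.signal);
```

With these inputs, every signal except uptime exceeds its tolerance, matching the rows marked in the table.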
Memory leaks, heap fragmentation, counter wraps, watchdog timeouts — bugs that need days of continuous runtime to show up, long after the bench test passed. A twin never resets: let it accumulate runtime, snapshot at any moment, and query the heap and stack history the instant something drifts. It won't model temperature, analog effects, or flash wear — it will show you the heap that fragments four bytes per transaction.
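As a rough sketch of what querying heap history could look like once the snapshots are in hand, the code below fits a straight line to free-heap samples and estimates how many transactions remain before the heap runs out. The `HeapSample` shape and the synthetic data are assumptions; how snapshots are actually retrieved is not shown in the source.

```typescript
// One heap snapshot; the field names are assumptions for illustration.
interface HeapSample {
  transactions: number; // transactions processed so far
  freeBytes: number;    // free heap at that point
}

// Least-squares slope of free heap versus transaction count (bytes per transaction).
function heapSlope(samples: HeapSample[]): number {
  const n = samples.length;
  if (n < 2) throw new Error("need at least two samples");
  const meanX = samples.reduce((s, p) => s + p.transactions, 0) / n;
  const meanY = samples.reduce((s, p) => s + p.freeBytes, 0) / n;
  let num = 0;
  let den = 0;
  for (const p of samples) {
    num += (p.transactions - meanX) * (p.freeBytes - meanY);
    den += (p.transactions - meanX) ** 2;
  }
  if (den === 0) throw new Error("samples must span more than one transaction count");
  return num / den;
}

// Transactions remaining until free heap reaches zero, or Infinity if it is not shrinking.
function transactionsUntilExhaustion(samples: HeapSample[]): number {
  const slope = heapSlope(samples);
  if (slope >= 0) return Infinity;
  const last = samples[samples.length - 1]!;
  return Math.floor(last.freeBytes / -slope);
}

// Synthetic data: a heap that loses four bytes per transaction.
const heapHistory: HeapSample[] = [0, 1000, 2000, 3000, 4000].map((t) => ({
  transactions: t,
  freeBytes: 32000 - 4 * t,
}));

const bytesPerTransaction = heapSlope(heapHistory);
const remaining = transactionsUntilExhaustion(heapHistory);
```

A linear fit is the simplest possible model; fragmentation in a real allocator is rarely this tidy, so treat the estimate as a trend indicator rather than a prediction.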
Push a firmware update to a fleet of twins before a single production device — each with its own hardware config, firmware version, and simulated sensor environment. Watch which configurations apply cleanly and which fall back, all from the same SDK calls your tests already use. It exercises your update logic, not your radio.
None of the above needs AI. But if your team uses coding agents, the SDK is what makes them useful on embedded projects: agents are only as good as their feedback loop, and on embedded the loop usually ends at "flash it and see." The SDK closes it — every simulation result is queryable text, which is exactly what an agent can reason about.
Agent + lab bench
Agent + Simulator86
Point Claude Code at a repo with the SDK installed and it can do what no agent can do against a physical bench: compile your firmware, run it on a simulated board, read the logs, registers, and bus traffic, and iterate until the test passes — overnight, in parallel, without a board on anyone's desk. Your engineers review passing runs instead of babysitting benches.
Your team is right to be skeptical of AI-written firmware — in embedded, a hallucination isn't a bad merge, it's a field failure. That's the point of the SDK: nothing the agent writes has to be believed. Every change must survive a simulated run before a human sees it, and what reaches review is code plus evidence — the logs, register transitions, and bus traffic to prove it behaves.
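One way to picture "code plus evidence" is a gate that refuses to forward a change unless a simulated run passed and the supporting artifacts are attached. The `RunEvidence` fields and the sample strings below are hypothetical; the source does not specify what a run record contains.

```typescript
// Hypothetical record of one simulated run attached to a proposed change.
interface RunEvidence {
  passed: boolean;
  logs: string[];
  registerTransitions: string[];
  busTraffic: string[];
}

interface ReviewDecision {
  readyForReview: boolean;
  reasons: string[]; // why the change was held back; empty when ready
}

// Forward a change to human review only if it passed and carries evidence.
function gateForReview(evidence: RunEvidence | undefined): ReviewDecision {
  const reasons: string[] = [];
  if (evidence === undefined) {
    reasons.push("no simulated run attached");
  } else {
    if (!evidence.passed) reasons.push("simulated run failed");
    if (evidence.logs.length === 0) reasons.push("missing logs");
    if (evidence.registerTransitions.length === 0) reasons.push("missing register transitions");
    if (evidence.busTraffic.length === 0) reasons.push("missing bus traffic");
  }
  return { readyForReview: reasons.length === 0, reasons };
}

// Made-up evidence for a change that passed its run.
const goodChange = gateForReview({
  passed: true,
  logs: ["boot ok"],
  registerTransitions: ["example register transition"],
  busTraffic: ["example bus transaction"],
});

// A change submitted with no run attached is held back.
const unprovenChange = gateForReview(undefined);
```

Returning the reasons alongside the decision gives the agent something concrete to act on in its next iteration.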