From Specifications to Formal Models: Autoformalizing Memory Chips with DRAMBench
Manual translation of chip standards into verification artifacts is a major bottleneck in chip design. DRAMBench, developed by Normal Computing and Fraunhofer IESE, benchmarks AI systems that turn natural language DRAM specifications into timed Petri net models. This post explains how DRAMBench measures performance of AI-driven verification workflows for modern memory chips.
Chip design verification is one of the most critical and labor-intensive steps in the semiconductor industry, estimated to consume up to 70% of the overall design effort. Before a chip can be manufactured, engineers must ensure that the design implementation exactly matches its specification and is free of functional errors, because a bug found after fabrication can mean months of delay and millions of dollars in respins. Today, this verification process demands extensive manual effort: engineers read through hundreds of pages of natural language specifications and painstakingly translate them into testable representations (both formal and random), a process that is slow, error-prone, and does not scale as specifications grow in complexity.
Auto-formalization is the automated translation of natural language into a machine-checkable intermediate representation. The idea has gained significant momentum from recent advances in large language models: systems like DeepMind's AlphaProof have demonstrated advanced mathematical reasoning when presented with natural language problems. At Normal Computing, in collaboration with Fraunhofer IESE, we are exploring how AI auto-formalization can accelerate hardware engineering methodology. As part of this effort, we are releasing DRAMBench, an open benchmark designed to measure how well AI systems can auto-formalize memory chip specifications.
The Problem: Specifications Are Written for Humans, Not Machines
Industry standards bodies like JEDEC publish detailed specifications for memory chips such as DDR, LPDDR, GDDR, and HBM. These documents define how such chips work: which commands a memory chip must support, the legal orderings of those commands, and the precise timing constraints between them. A single DRAM standard can contain hundreds of timing parameters and complex state-dependent rules governing how commands interact across banks, ranks, and channels.
Today, translating these specifications into something a verification tool can consume is almost entirely manual. Verification engineers read the spec, interpret its intent, and encode it into SystemVerilog assertions, stimulus sequences, and other artifacts. This translation step is where spec misinterpretations enter the design and where schedules tend to slip. While AI has made progress in isolated design verification tasks, current approaches typically focus on narrow problems rather than full end-to-end specification compliance of modern chip designs, failing to capture the complexity of real-world verification workflows.
Our Approach: Timed Petri Nets as a Formal Target
Rather than jumping directly from natural language to low-level verification artifacts, we introduce an intermediate formal representation based on timed Petri nets. Petri nets are a well-established mathematical formalism for modeling concurrent systems. They capture states, transitions, and the constraints governing when transitions may fire. By extending them with timing, we can faithfully represent the temporal requirements that are central to DRAM protocols.
Our framework, DRAMPyML, models each DRAM standard as a timed Petri net where:
- Places represent device states (idle, active, precharging, refreshing)
- Transitions represent commands (ACT, RD, WR, PRE, REF, and many more)
- Arcs encode the structural and timing relationships between them, including inhibitor arcs for mutual exclusion, reset arcs for state resets, and timed arcs that enforce minimum delays such as tRCD, tRAS, and tRP
The result is a compact, executable, and mathematically precise model of a DRAM protocol. From this single representation, we can derive SystemVerilog assertions, generate valid command sequences for stimulus, and perform formal analysis. The Petri net serves as a single source of truth from which downstream verification artifacts flow.
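To make the place/transition/timed-arc structure above concrete, here is a minimal sketch of a timed Petri net in plain Python. This is illustrative only, not the DRAMPyML API: the class, its methods, and the tRCD value of 18 cycles are all made up for the example. It models a toy DDR-like fragment where ACT moves a bank from idle to active, and RD is legal only once the bank is active and at least tRCD cycles have passed since ACT.

```python
class TimedPetriNet:
    """Minimal timed Petri net sketch (illustrative; not the DRAMPyML API)."""

    def __init__(self):
        self.marking = {}      # place -> token count
        self.transitions = {}  # name -> (inputs, outputs, min_delay, after)
        self.last_fired = {}   # transition name -> time of last firing

    def add_place(self, place, tokens=0):
        self.marking[place] = tokens

    def add_transition(self, name, inputs, outputs, min_delay=0, after=None):
        # `after` names a prior transition whose firing starts the delay clock
        self.transitions[name] = (inputs, outputs, min_delay, after)

    def enabled(self, name, now):
        inputs, _, min_delay, after = self.transitions[name]
        # Structural check: every input place must hold a token
        if any(self.marking.get(p, 0) < 1 for p in inputs):
            return False
        # Timing check: the minimum delay since `after` must have elapsed
        if after is not None:
            t0 = self.last_fired.get(after)
            if t0 is None or now - t0 < min_delay:
                return False
        return True

    def fire(self, name, now):
        if not self.enabled(name, now):
            raise ValueError(f"{name} not enabled at t={now}")
        inputs, outputs, _, _ = self.transitions[name]
        for p in inputs:
            self.marking[p] -= 1
        for p in outputs:
            self.marking[p] = self.marking.get(p, 0) + 1
        self.last_fired[name] = now


# Toy DDR-like fragment with an example tRCD of 18 cycles
net = TimedPetriNet()
net.add_place("idle", tokens=1)
net.add_place("active")
net.add_transition("ACT", inputs=["idle"], outputs=["active"])
net.add_transition("RD", inputs=["active"], outputs=["active"],
                   min_delay=18, after="ACT")

net.fire("ACT", now=0)
print(net.enabled("RD", now=10))  # False: tRCD not yet satisfied
print(net.enabled("RD", now=18))  # True
```

Inhibitor and reset arcs from the list above would extend `enabled` and `fire` in the same style; the core idea is that legality is a joint function of marking and elapsed time.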
DRAMBench: A Benchmark for Hardware Auto-formalization
To evaluate how well AI models can perform this translation, from natural language specification to formal Petri net, we are releasing DRAMBench. It includes ground truth Petri net models for a broad range of JEDEC DRAM standards: DDR2 through DDR4, LPDDR2 through LPDDR4, GDDR5 and GDDR6, and HBM2.
DRAMBench is designed to be a living benchmark. As new DRAM generations emerge, previously withheld standards are released into the benchmark, and newer ones take their place. This keeps DRAMBench rigorous over time by preventing training data contamination while ensuring the dataset grows with the industry.
To evaluate how closely an AI-generated model matches the ground truth, the benchmark compares legal command sequences. Given two Petri nets, we enumerate their legal k-step command sequences and compute the Jaccard index of the resulting sequence sets. This gives a single, interpretable similarity score that captures structural and behavioral correctness. Timing correctness is evaluated separately through a dedicated timing constraint recall metric, measuring what fraction of the specification's timing requirements the generated model recovers.
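The sequence-comparison step can be sketched in a few lines. The sequence sets below are hypothetical, chosen only to show the arithmetic: each k-step sequence is a tuple of commands, and the score is the size of the intersection over the size of the union.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| over two collections of sequences."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical 2-step command sequences (k = 2) from a ground-truth
# model and an AI-generated one
truth = {("ACT", "RD"), ("ACT", "WR"), ("ACT", "PRE")}
generated = {("ACT", "RD"), ("ACT", "WR"), ("RD", "PRE")}

print(jaccard(truth, generated))  # 2 shared of 4 total -> 0.5
```

A generated model that both omits legal sequences and admits illegal ones is penalized on both sides of the ratio, which is what makes the single score interpretable.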
A Call to Two Communities
DRAMBench sits at the intersection of two fields that do not talk to each other enough: machine learning and hardware engineering.
To the ML community: Hardware specification documents are a rich, under-explored domain for auto-formalization research. Unlike mathematical theorem proving, where significant benchmark infrastructure already exists, hardware specs present unique challenges - ambiguous natural language, implicit domain knowledge, complex timing semantics, and hierarchical structure. DRAMBench provides a concrete, well-defined task with quantitative evaluation metrics. We hope it serves as a stepping stone toward broader hardware auto-formalization benchmarks. We would welcome an alternative auto-formalization approach that does not rely on Petri nets as intermediate artifacts.
To the hardware community: The cost of manual specification interpretation grows with each generation. The jump from DDR4 to DDR5 alone nearly quadrupled the page count of the JEDEC spec. Each new DRAM generation adds commands, timing parameters, and architectural features. Formal models like timed Petri nets offer a more compact and interpretable way to capture protocol behavior than verbose verification collateral. And if AI can help generate these models, the verification bottleneck loosens considerably.
Beyond Memory: A Vision for All Chip Specifications
While we demonstrate our approach on DRAM protocols, the underlying idea is not limited to memory. Any hardware specification that defines states, commands, and timing constraints is a candidate for the same treatment. Communication protocols, bus interfaces, and controllers all fit this pattern. Timed Petri nets are general enough to capture a wide range of protocol behaviors, and the auto-formalization pipeline we describe is architecture-agnostic.
Our hope is that this work contributes to a future where chip releases are accompanied not just by PDF datasheets, but by formal specifications: compact, executable, and machine-readable representations that verification teams can use directly. A formal model shipping alongside a specification would dramatically reduce the time from spec release to verified implementation. It would make verification more accessible to smaller teams and create a foundation for AI-assisted verification tools that can reason about specs end-to-end.
Get Involved
DRAMBench and DRAMPyML are open source under the Apache 2.0 license. The framework is Python-based, built on rustworkx for high-performance graph operations, and designed to be easy to extend with new standards, new formal targets, or new evaluation metrics.
We welcome contributions from both communities, whether that means adding new DRAM standard models, building auto-formalization agents that target DRAMBench, extending the framework to non-memory protocols, or developing new evaluation metrics for formal model comparison.
The gap between natural language specifications and formal verification is one of the biggest bottlenecks in chip design. We think closing it is a problem worth solving together.
DRAMPyML and DRAMBench are developed by Normal Computing and Fraunhofer IESE. For more details, see our papers: "DRAMPyML: A Formal Description of DRAM Protocols with Timed Petri Nets" (DV-Con Europe 2026), "A Formal Description of Communication Protocols Using Petri-Nets" (MBMV 2026), and "Autoformalizing Memory Device Specifications using Agents" (ICLR VerifAI Workshop 2026).