Binary Layout and Linking
This chapter describes how solx models deploy/runtime bytecode objects, dependency data, and post-compilation linking.
Contract Object Model
EVM contracts have two code segments:
- Deploy code (init code): runs only during contract creation.
- Runtime code: returned by deploy code and stored as the contract's permanent code.
Deploy code typically builds runtime bytes in memory and executes RETURN(offset, size).
solc JSON Assembly Layout
In legacy assembly JSON, the object is split into top-level deploy code and nested runtime code:
- Top-level .code: deploy instruction stream.
- .data["0"]: runtime object.
- .data[<hex>]: additional referenced data objects (for example, constructor-time dependencies).
Conceptually:
{
  ".code": [ /* deploy instructions */ ],
  ".data": {
    "0": { /* runtime assembly object */ },
    "ab12...": { /* dependency object or hash */ }
  }
}
The EVM assembly layer exposes this as Assembly { code, data }, with runtime_code() reading data["0"].
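To make the nesting concrete, the following is a minimal Python sketch of such an object model. It is an illustration only: the class, its from_json helper, and the instruction dictionaries are hypothetical stand-ins, not the actual solx types, and only the runtime_code() lookup of data["0"] is taken from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

# A data entry is either a nested assembly object or a plain string
# (for example a hash). This union is an assumption made for the sketch.
DataEntry = Union["Assembly", str]


@dataclass
class Assembly:
    """Hypothetical stand-in for the Assembly { code, data } structure."""

    code: List[dict] = field(default_factory=list)
    data: Dict[str, DataEntry] = field(default_factory=dict)

    def runtime_code(self) -> Optional["Assembly"]:
        """Return the nested runtime object stored under key "0", if any."""
        entry = self.data.get("0")
        return entry if isinstance(entry, Assembly) else None

    @classmethod
    def from_json(cls, obj: dict) -> "Assembly":
        """Build the model from a legacy assembly JSON dictionary."""
        data = {
            key: cls.from_json(value) if isinstance(value, dict) else value
            for key, value in obj.get(".data", {}).items()
        }
        return cls(code=list(obj.get(".code", [])), data=data)
```

An object without a "0" entry yields None here, which keeps the deploy-only case explicit rather than raising.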
Dependencies and CREATE / CREATE2
Factory-style deploy code can reference other contract objects. In assembly, this is represented via data entries and push-style aliases:
- PUSH [$] (PUSH_DataOffset) for object offset
- PUSH #[$] (PUSH_DataSize) for object size
- PUSH data (PUSH_Data) for raw dependency chunks
These operands are resolved during assembly preprocessing before LLVM lowering.
Deploy Stub Shape
The minimal deploy stub pattern is:
- Load runtime size (datasize).
- Load runtime offset (dataoffset).
- Copy bytes from code section to memory.
- Return copied bytes.
The EVM codegen emits this canonical form in minimal_deploy_code() using:
- llvm.evm.datasize(metadata !"...")
- llvm.evm.dataoffset(metadata !"...")
- llvm.memcpy from addrspace(4) (code) to addrspace(1) (heap)
- llvm.evm.return
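At the bytecode level, the same four steps can be sketched in Python as below. This is not what minimal_deploy_code() emits (that function produces LLVM IR). It is a hand-assembled illustration that assumes PUSH2 operands, memory offset 0 as the copy destination, and the runtime placed immediately after the stub. The opcode values are the standard EVM encodings; the function name is hypothetical.

```python
PUSH1, PUSH2, DUP1 = 0x60, 0x61, 0x80
CODECOPY, RETURN = 0x39, 0xF3

STUB_LEN = 13  # fixed length of the stub emitted below


def minimal_deploy_stub(runtime: bytes) -> bytes:
    """Return init code that copies `runtime` to memory offset 0 and returns it."""
    size = len(runtime)
    if size > 0xFFFF:
        raise ValueError("runtime too large for PUSH2 operands in this sketch")
    size_be = size.to_bytes(2, "big")
    offset_be = STUB_LEN.to_bytes(2, "big")
    stub = bytes(
        [PUSH2, *size_be,      # datasize: runtime size
         DUP1,                 # keep a copy of the size for RETURN
         PUSH2, *offset_be,    # dataoffset: runtime starts right after the stub
         PUSH1, 0x00,          # destination offset in memory
         CODECOPY,             # memory[0:size] = code[offset:offset+size]
         PUSH1, 0x00,          # memory offset to return from
         RETURN]               # return memory[0:size]
    )
    assert len(stub) == STUB_LEN
    return stub + runtime
```

Using fixed-width PUSH2 operands keeps the stub length constant, so the runtime offset is known before the runtime size is.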
datasize / dataoffset Builtins
Yul builtins datasize(<object>) and dataoffset(<object>) lower to EVM intrinsics with metadata object names.
In solx, these are translated to LLVM intrinsics:
- llvm.evm.datasize
- llvm.evm.dataoffset
This is how deploy stubs reference embedded runtime/dependency objects without hardcoding absolute byte offsets.
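The Python sketch below illustrates why name-based references are convenient: offsets and sizes only become known once the objects are laid out, so a table keyed by object name can be filled in late. The back-to-back layout and the function name are assumptions made for illustration; the real ordering is decided by the assembler and linker.

```python
from typing import Dict, List, Tuple


def layout_objects(
    deploy_len: int, objects: List[Tuple[str, bytes]]
) -> Dict[str, Tuple[int, int]]:
    """Map each object name to (dataoffset, datasize) in the final blob.

    Assumes objects are appended back to back after the deploy code.
    """
    table: Dict[str, Tuple[int, int]] = {}
    cursor = deploy_len
    for name, body in objects:
        table[name] = (cursor, len(body))
        cursor += len(body)
    return table
```

A lookup such as table["C.runtime"] then plays the role of dataoffset and datasize for that object name.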
Metadata Hash and CBOR Tail
Runtime bytecode may include CBOR metadata appended at the end.
- The payload can include compiler version info and optional metadata hash fields.
- Hash behavior is configurable with --metadata-hash (for example ipfs).
- CBOR appending can be disabled with --no-cbor-metadata.
In the build pipeline, metadata bytes are appended to runtime objects before final assembly/linking.
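As an illustration of how tooling might separate the tail from the code, the Python sketch below assumes the solc convention in which the final two bytes hold the big-endian length of the CBOR payload. Whether a given solx build follows exactly this framing should be checked against its output; both helper names are hypothetical, and the payload is treated as opaque bytes rather than decoded.

```python
from typing import Tuple


def append_cbor_tail(code: bytes, payload: bytes) -> bytes:
    """Append a CBOR payload followed by its two-byte big-endian length."""
    return code + payload + len(payload).to_bytes(2, "big")


def split_cbor_tail(runtime: bytes) -> Tuple[bytes, bytes]:
    """Split runtime bytecode into (code, cbor_payload).

    Assumes the final two bytes hold the big-endian payload length.
    This is a heuristic: code without metadata may end in bytes that
    happen to look like a small length.
    """
    if len(runtime) < 2:
        return runtime, b""
    length = int.from_bytes(runtime[-2:], "big")
    if length + 2 > len(runtime):
        return runtime, b""  # suffix is not a plausible length
    payload_end = len(runtime) - 2
    payload_start = payload_end - length
    return runtime[:payload_start], runtime[payload_start:payload_end]
```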
Library Linking
Library references are resolved at link time:
- The linker patches linker symbols with final addresses.
- If a symbol is unresolved, solx records its offsets and emits placeholders in hex output.
- Placeholder format follows the common pattern __$<keccak-256-digest>$__.
Standard JSON output reports unresolved positions through evm.*.linkReferences so external tooling can link later.
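A later linking step performed by external tooling could look roughly like the Python sketch below. It assumes the solc-style placeholder width (34 hex digits between the delimiters, 40 characters in total, standing in for a 20-byte address) and reports leftovers as start/length pairs in the spirit of linkReferences. The function name and the digest-keyed library map are hypothetical, and the digest is taken as given rather than computed.

```python
import re
from typing import Dict, List, Tuple

# Assumption: a placeholder is "__$" + 34 hex digits + "$__" (40 characters),
# so it occupies exactly the 20 bytes of an address in the hex string.
PLACEHOLDER = re.compile(r"__\$([0-9a-fA-F]{34})\$__")


def link_hex(
    code_hex: str, libraries: Dict[str, str]
) -> Tuple[str, List[Dict[str, int]]]:
    """Patch known placeholders and report byte positions of the rest.

    `libraries` maps a placeholder digest to a 40-hex-digit address (no 0x).
    """
    unresolved: List[Dict[str, int]] = []

    def patch(match: "re.Match[str]") -> str:
        address = libraries.get(match.group(1).lower())
        if address is None:
            # Two hex characters per byte, so halve the string position.
            unresolved.append({"start": match.start() // 2, "length": 20})
            return match.group(0)  # keep the placeholder for later linking
        if len(address) != 40:
            raise ValueError("address must be 20 bytes of hex")
        return address.lower()

    return PLACEHOLDER.sub(patch, code_hex), unresolved
```

Because a placeholder and an address have the same width, patching never shifts the offsets of later references.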
Dependency Resolution and Path Aliasing
The assembly preprocessor performs a normalization pass over all contracts before lowering:
- Hash deploy and runtime sub-objects.
- Build hash -> full contract path mapping.
- Rewrite .data entries from embedded objects to stable path references (Data::Path).
- Build index mappings for deploy and runtime dependency tables.
- Replace instruction aliases (PUSH_DataOffset, PUSH_DataSize, PUSH_Data) with resolved identifiers.
Two details are important:
- Entry "0" is always treated as runtime code and mapped to <contract>.runtime.
- Hex indices are normalized to 32-byte (64 hex char) aliases before lookup, so short keys and padded keys resolve consistently.
This path aliasing step gives deterministic dependency identifiers for later object assembly and linking.
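The two details above can be sketched in Python as follows. The sketch assumes that normalization means left-padding with zeros and lowercasing, which the text does not state explicitly. The function names and the shape of the hash-to-path table are hypothetical and only illustrate the lookup.

```python
from typing import Dict

HEX_DIGITS = "0123456789abcdef"


def normalize_index(key: str) -> str:
    """Left-pad a hex index with zeros to 64 hex characters (32 bytes)."""
    digits = key.lower()
    if len(digits) > 64 or any(c not in HEX_DIGITS for c in digits):
        raise ValueError(f"not a 32-byte hex index: {key!r}")
    return digits.rjust(64, "0")


def resolve_data_key(contract: str, key: str, hash_to_path: Dict[str, str]) -> str:
    """Resolve a .data key to a stable path identifier."""
    if key == "0":
        return f"{contract}.runtime"  # entry "0" is always the runtime object
    return hash_to_path[normalize_index(key)]
```

With this normalization, "ab12" and its zero-padded 64-character form map to the same table entry.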