wasm linker: aggressive rewrite towards Data-Oriented Design #22220

andrewrk · 2024-12-13T02:20:10Z

The goals of this branch are to:

* compile faster when using the wasm linker and backend
* enable saving compiler state by directly copying in-memory linker state to disk
* use compiler memory more efficiently
* introduce integer type safety to wasm linker code
* generate better WebAssembly code
* fully participate in incremental compilation
* do as much work as possible outside of flush(), while continuing to do linker garbage collection
* avoid unnecessary heap allocations
* avoid unnecessary indirect function calls

To accomplish these goals, this branch removes the ZigObject abstraction, as well as Symbol and Atom. These abstractions resulted in overly generic code, unnecessary work, and needless complications that simply go away by creating a better in-memory data model and emitting more things lazily.

For example, wasm codegen now emits MIR, which is then lowered to wasm code during linking with optimal function indexes and so on; relocations are emitted only when outputting an object. Previously, codegen always emitted relocations, which are entirely unnecessary when emitting an executable and which required every function call to use the maximum-size LEB encoding.
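To make the encoding cost concrete, here is a minimal Python sketch comparing a minimal-length unsigned LEB128 encoding with a fixed-width, padded 5-byte form that can be patched in place once the final index is known. This is an illustration only, not the linker's code; all function names are hypothetical.

```python
def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as minimal-length unsigned LEB128."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_uleb128_padded(value: int, width: int = 5) -> bytes:
    """Encode as unsigned LEB128 padded to exactly `width` bytes."""
    if value < 0 or value >= 1 << (7 * width):
        raise ValueError("value does not fit in the requested width")
    out = bytearray()
    for i in range(width):
        byte = (value >> (7 * i)) & 0x7F
        if i < width - 1:
            byte |= 0x80  # continuation bit on every byte except the last
        out.append(byte)
    return bytes(out)


def decode_uleb128(data: bytes) -> int:
    """Decode unsigned LEB128; accepts both minimal and padded forms."""
    result = 0
    for position, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * position)
        if not byte & 0x80:
            break
    return result
```

Both forms decode to the same value, but a call to function index 3 needs one operand byte in the minimal form and five in the padded form.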

This branch introduces the concept of the "prelink" phase, which occurs after all object files have been parsed but before any Zcu updates are sent to the linker. This allows the linker to fully parse all objects into a compact memory model, which is guaranteed to be complete when Zcu code is generated.

Merge Checklist

* data_segments state needs to be reset on update
* call the gc mark functions in updateFunc
* implement the prelink phase in the frontend
* fix regressions / get the tests passing again
* eliminate TODOs
* track function import ref count for optimal leb encoding
* sort undef data segments separately and memset them at runtime
* only emit referenced navs in release modes
* repeat prelink when linker input changes occur; this must redo all linker input tasks and then redo all functions and navs

Demo: Incremental Compilation

After this branch is ready to merge, I'll put a demo here.

Demo: Serializing and Deserializing Linker State

After this branch is ready to merge, I'll put a demo here.

Perf Data Point: hello world

Benchmark 1 (1510 runs): 0.14.0-dev.2548+0f17cbfc6/bin/zig build-exe ../test/standalone/simple/hello_world/hello.zig -target wasm32-wasi -fno-llvm -fno-lld -fno-compiler-rt
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          33.1ms ± 4.29ms    21.8ms … 45.9ms          0 ( 0%)        0%
  peak_rss           92.6MB ±  659KB    90.8MB … 95.1MB         27 ( 2%)        0%
  cpu_cycles         52.5M  ± 1.02M     49.4M  … 59.6M          20 ( 1%)        0%
  instructions       68.8M  ± 8.58K     68.8M  … 68.8M           5 ( 0%)        0%
  cache_references   3.69M  ± 35.4K     3.60M  … 4.12M          23 ( 2%)        0%
  cache_misses        549K  ± 17.1K      494K  …  605K          12 ( 1%)        0%
  branch_misses       383K  ± 4.31K      369K  …  403K          29 ( 2%)        0%
Benchmark 2 (1510 runs): 0.14.0-dev.2611+50897fc04/bin/zig build-exe ../test/standalone/simple/hello_world/hello.zig -target wasm32-wasi -fno-llvm -fno-lld -fno-compiler-rt
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          33.1ms ± 4.17ms    21.4ms … 47.2ms          1 ( 0%)          +  0.1% ±  0.9%
  peak_rss           92.1MB ±  672KB    90.0MB … 94.4MB         12 ( 1%)          -  0.5% ±  0.1%
  cpu_cycles         51.3M  ± 1.01M     48.0M  … 57.4M          28 ( 2%)        ⚡-  2.3% ±  0.1%
  instructions       67.4M  ± 6.50K     67.4M  … 67.5M          16 ( 1%)        ⚡-  2.0% ±  0.0%
  cache_references   3.60M  ± 35.3K     3.50M  … 3.79M          17 ( 1%)        ⚡-  2.5% ±  0.1%
  cache_misses        543K  ± 16.5K      498K  …  595K          14 ( 1%)          -  1.2% ±  0.2%
  branch_misses       367K  ± 3.95K      358K  …  385K          22 ( 1%)        ⚡-  4.1% ±  0.1%

Followup

After landing this branch I plan to set a firm release date for the 0.14.0 tag.

ELF, COFF, and MachO need the same treatment. I started with Wasm because it is significantly fewer lines of code. Some strategies can be shared there; however, I don't expect to keep as much in memory with those linkers, since the total object file size could be enormous.

Post-Merge Roadmap:

1. One month of QA for 0.14.0
2. Release 0.14.0
3. Enhance wasm linker enough to pass LLD's test suite for Wasm.
4. Remove dependency on LLD for Wasm.
5. Repeat steps 3-4 for ELF
6. Repeat steps 3-4 for COFF
7. Repeat steps 3-4 for MachO
8. Rework ELF linker code with respect to incremental compilation goals
9. Rework COFF linker code with respect to incremental compilation goals
10. Rework MachO linker code with respect to incremental compilation goals

This commit is not a complete implementation of all these goals; it is not even passing semantic analysis.

Makes linker functions have small error sets and requires them to report diagnostics properly, rather than having one massive error set with a lot of codes. Other linker implementations are not ported yet. Also, the branch is not passing semantic analysis yet.

Mainly, rework how relocations work. This is the point at which symbol indexes are known, not before. And don't emit unnecessary relocations! They're only needed when emitting an object file.

Changes wasm linker to keep MIR around long-lived so that fixups can be reapplied after linker garbage collection.

Use labeled switch while we're at it.

Recognize three distinct phases:

* before prelink ("object phase")
* after prelink, before flush ("zcu phase")
* during flush ("flush phase")

With this setup, we create data structures during the object phase, then mutate them during the zcu phase, and then further mutate them during the flush phase. In order to make the flush phase repeatable, the data structures are copied just before starting the flush phase. Further Zcu updates occur against the non-copied data structures.

What's not implemented is frontend garbage collection, in which case some more changes will be needed in this linker logic to achieve a valid state with data invariants intact.
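The copy-before-flush idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the linker's actual data structures: `LinkerState`, its fields, and the "index:name" output format are all hypothetical and exist only to show why copying makes flush repeatable.

```python
import copy
from dataclasses import dataclass, field
from enum import Enum, auto


class Phase(Enum):
    OBJECT = auto()  # before prelink
    ZCU = auto()     # after prelink, before flush


@dataclass
class LinkerState:
    """Hypothetical stand-in for the linker's long-lived tables."""
    phase: Phase = Phase.OBJECT
    object_functions: list = field(default_factory=list)
    zcu_functions: dict = field(default_factory=dict)

    def add_object_function(self, name: str) -> None:
        if self.phase is not Phase.OBJECT:
            raise RuntimeError("objects must be parsed before prelink")
        self.object_functions.append(name)

    def prelink(self) -> None:
        # From here on, the object tables are complete.
        self.phase = Phase.ZCU

    def update_func(self, name: str, body: str) -> None:
        if self.phase is not Phase.ZCU:
            raise RuntimeError("Zcu updates arrive only after prelink")
        self.zcu_functions[name] = body

    def flush(self) -> list:
        # Flush phase: mutate a copy so that flush-only changes never
        # leak into the state that later Zcu updates are applied to.
        scratch = copy.deepcopy(self)
        names = scratch.object_functions
        names.extend(sorted(scratch.zcu_functions))
        return [f"{index}:{name}" for index, name in enumerate(names)]
```

Because `flush` only touches the copy, calling it twice in a row yields the same output, and a Zcu update between two flushes is applied to the untouched long-lived state.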

andrewrk force-pushed the wasm-linker branch from c9bf6eb to 4154612 on December 14, 2024 22:04

alexrp mentioned this pull request Dec 15, 2024

compiler: Switch to DWARF 5 by default for zig cc and the LLVM backend. #22235

Draft

andrewrk force-pushed the wasm-linker branch 6 times, most recently from 5c37f96 to 26c93f4 on December 24, 2024 02:41

andrewrk added 22 commits December 26, 2024 18:17

remove "FIXME" from codebase

c91f212

See #363. Please file issues rather than making TODO comments.

macho linker conforms to explicit error sets, again

747ab0f

elf linker: conform to explicit error sets

d329e74

rework error handling in the backends

c720e0d

compiler: add type safety for export indices

3e37a3c

std.array_list: tiny refactor for pleasure

c63464c

wasm codegen: fix some compilation errors

d3b23c6

wasm: implement errors_len as a MIR opcode with no linker involvement

76cffc2

wasm codegen: switch on bool instead of int

1d85b8c

wasm codegen: rename func: CodeGen to cg: CodeGen

0a145a6

wasm: move error_name lowering to Emit phase

6cd0792

wasm: use call_intrinsic MIR instruction

674d275

switch to ArrayListUnmanaged for machine code

546fe9e

wasm: fix many compilation errors

3e2f6da

Still, the branch is not yet passing semantic analysis.

wasm linker: support export section as implicit symbols

e1eeb4a

frontend: add const to more Zcu pointers

1995cb4

wasm linker: implement name, module name, and type for function imports

3327eb4

wasm linker: flush implemented up to the export section

7def410

wasm linker: flush export section

79ae3b1

andrewrk added 29 commits December 26, 2024 18:20

wasm linker: add __zig_error_name_table data when needed

29f83f5

wasm codegen: fix extra index not relative

e7fda5c

wasm linker: fix calling imported functions

174fce2

and more disciplined type safety for output function indexes

std.ArrayHashMap: allow passing empty values array

e123d51

in which case the values array is set to undefined

wasm linker: handle extern functions in updateNav

3ec8290

wasm linker: allow undefined imports when lib name is provided

3254bed

and expose object_host_name as an option for setting the lib name for object files, since the wasm linking standards don't specify a way to do it.

wasm codegen: fix call_indirect

dbb8872

wasm linker: fix eliding empty data segments

5761bab

wasm linker: implement data fixups

1c05115

one hash table lookup per fixup

wasm linker: avoid recursion in lowerZcuData

c35b415

instead of recursion, callers of the function are responsible for checking the respective tables that might have new entries in them and then calling lowerZcuData again.
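A rough Python sketch of this caller-driven pattern follows. It is not the actual lowerZcuData logic: the `deps` mapping, the `table`/`seen` pair, and the function names are hypothetical. The point is only that lowering one item records newly discovered entries in a table, and the caller keeps going until the table stops growing.

```python
def lower_one(name, deps, table, seen):
    """Lower a single item; record, but do not lower, its dependencies."""
    for dep in deps.get(name, ()):
        if dep not in seen:
            seen.add(dep)
            table.append(dep)  # new entry for the caller to pick up later
    return f"lowered({name})"


def lower_all(roots, deps):
    """Caller-side loop: re-check the table after every call."""
    table = list(dict.fromkeys(roots))  # ordered, de-duplicated roots
    seen = set(table)
    output = []
    cursor = 0
    while cursor < len(table):  # the table may grow while we iterate
        output.append(lower_one(table[cursor], deps, table, seen))
        cursor += 1
    return output
```

Since nothing recurses, dependency chains of any depth use constant stack space, and cycles terminate because each entry enters the table at most once.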

wasm linker: also call lowerZcuData in updateFunc

402f84c

codegen can generate zcu data dependencies that need to be populated

wasm linker: initialize the data segments table in flush

27150e9

it cannot be done earlier since ids are not stable yet

wasm linker: zcu data fixups are already applied

6d0057b

implement error table and error names data segments

bf63c57

wasm linker: fix data section in flush

d58c323

implement the prelink phase in the frontend

c87373e

this strategy uses a "postponed" queue to handle codegen tasks that spawn too early. there's probably a better way.
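A minimal Python sketch of such a queue is shown below. It assumes that "too early" means "before prelink has finished"; the class and method names are hypothetical and do not come from the compiler.

```python
from collections import deque


class TaskRunner:
    """Toy scheduler: codegen tasks wait until prelink has completed."""

    def __init__(self):
        self.prelink_done = False
        self.postponed = deque()
        self.log = []

    def submit_codegen(self, func_name):
        if not self.prelink_done:
            # Spawned too early: park the task instead of running it.
            self.postponed.append(func_name)
            return
        self._run_codegen(func_name)

    def finish_prelink(self):
        self.prelink_done = True
        self.log.append("prelink")
        # Drain postponed tasks in the order they were submitted.
        while self.postponed:
            self._run_codegen(self.postponed.popleft())

    def _run_codegen(self, func_name):
        self.log.append(f"codegen {func_name}")
```

Tasks submitted after `finish_prelink` run immediately; tasks submitted before it run in submission order as soon as prelink completes.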

wasm linker: implement stack pointer global

8e11532

std.io: remove the "temporary workaround" for stage2_aarch64

12752db

wasm linker: implement indirect function calls

01266ab

fix stack pointer initialized to wrong vaddr

50f8c0a

use fixed writer in more places

c89139c

wasm linker: fix missing function type entry for import

0897dd2

wasm linker: fix active data segment offset value

eb88e1d

Compilation: account for C objects and resources in prelink

cbf6200

wasm linker: fix relocation parsing

365edce

wasm linker: fix crashes when parsing compiler_rt

869b7b2

fix missing missing entry symbol error when no zcu

eb7d938

resolve merge conflicts

9885cb1

with 497592c

andrewrk force-pushed the wasm-linker branch from 26c93f4 to 9885cb1 on December 27, 2024 02:35
