wasm linker: aggressive rewrite towards Data-Oriented Design #22220

Open: 68 commits to be merged into master

andrewrk (Member) commented Dec 13, 2024

The goals of this branch are to:

  • compile faster when using the wasm linker and backend
  • enable saving compiler state by directly copying in-memory linker state to disk
  • use compiler memory more efficiently
  • introduce integer type safety to wasm linker code
  • generate better WebAssembly code
  • fully participate in incremental compilation
  • do as much work as possible outside of flush(), while continuing to do linker garbage collection.
  • avoid unnecessary heap allocations
  • avoid unnecessary indirect function calls

To accomplish these goals, this branch removes the ZigObject abstraction, as well as Symbol and Atom. These abstractions resulted in overly generic code, unnecessary work, and needless complications that simply go away with a better in-memory data model and more things emitted lazily.

For example, wasm codegen now emits MIR, which is lowered to wasm code during linking with optimal function indexes and so on; relocations are emitted only when outputting an object. Previously, codegen always emitted relocations, which are entirely unnecessary when emitting an executable and which required all function calls to use the maximum-size LEB encoding.
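To make the size difference concrete, here is a minimal Python sketch of unsigned LEB128. It is not code from this branch, and the function names are made up for illustration. A relocatable 32-bit index is written in a fixed 5-byte padded form so that any value can be patched in place later, while an index that is already final can use the shortest form.

```python
def uleb128(value: int) -> bytes:
    """Shortest unsigned LEB128 encoding of a non-negative integer."""
    if value < 0:
        raise ValueError("unsigned LEB128 requires a non-negative value")
    out = bytearray()
    while True:
        byte = value & 0x7F          # low 7 bits of the remaining value
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def uleb128_padded(value: int, width: int = 5) -> bytes:
    """Fixed-width unsigned LEB128 (hypothetical helper).

    Every byte except the last carries the continuation bit, so the
    field is always `width` bytes long and can be overwritten in place.
    """
    if value < 0 or value >= 1 << (7 * width):
        raise ValueError("value does not fit in the requested width")
    out = bytearray()
    for i in range(width):
        byte = (value >> (7 * i)) & 0x7F
        if i < width - 1:
            byte |= 0x80
        out.append(byte)
    return bytes(out)


# Function index 3, as it could appear after a `call` opcode:
print(uleb128(3).hex())         # 03          (1 byte)
print(uleb128_padded(3).hex())  # 8380808000  (5 bytes)
```

With final indexes known at link time, a call to a low-numbered function costs one index byte instead of five.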

This branch introduces a "prelink" phase, which occurs after all object files have been parsed but before any Zcu updates are sent to the linker. This lets the linker fully parse all objects into a compact memory model that is guaranteed to be complete by the time Zcu code is generated.

Merge Checklist

  • data_segments state needs to be reset on update
  • call the gc mark functions in updateFunc
  • implement the prelink phase in the frontend
  • fix regressions / get the tests passing again
  • eliminate TODOs
  • track function import ref count for optimal leb encoding
  • sort undef data segments separately and memset them at runtime
  • only emit referenced navs in release modes
  • repeat prelink when linker inputs change; this must redo all linker input tasks and then redo all functions and navs

Demo: Incremental Compilation

After this branch is ready to merge, I'll put a demo here.

Demo: Serializing and Deserializing Linker State

After this branch is ready to merge, I'll put a demo here.

Perf Data Point: hello world

Benchmark 1 (1510 runs): 0.14.0-dev.2548+0f17cbfc6/bin/zig build-exe ../test/standalone/simple/hello_world/hello.zig -target wasm32-wasi -fno-llvm -fno-lld -fno-compiler-rt
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          33.1ms ± 4.29ms    21.8ms … 45.9ms          0 ( 0%)        0%
  peak_rss           92.6MB ±  659KB    90.8MB … 95.1MB         27 ( 2%)        0%
  cpu_cycles         52.5M  ± 1.02M     49.4M  … 59.6M          20 ( 1%)        0%
  instructions       68.8M  ± 8.58K     68.8M  … 68.8M           5 ( 0%)        0%
  cache_references   3.69M  ± 35.4K     3.60M  … 4.12M          23 ( 2%)        0%
  cache_misses        549K  ± 17.1K      494K  …  605K          12 ( 1%)        0%
  branch_misses       383K  ± 4.31K      369K  …  403K          29 ( 2%)        0%
Benchmark 2 (1510 runs): 0.14.0-dev.2611+50897fc04/bin/zig build-exe ../test/standalone/simple/hello_world/hello.zig -target wasm32-wasi -fno-llvm -fno-lld -fno-compiler-rt
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          33.1ms ± 4.17ms    21.4ms … 47.2ms          1 ( 0%)          +  0.1% ±  0.9%
  peak_rss           92.1MB ±  672KB    90.0MB … 94.4MB         12 ( 1%)          -  0.5% ±  0.1%
  cpu_cycles         51.3M  ± 1.01M     48.0M  … 57.4M          28 ( 2%)        ⚡-  2.3% ±  0.1%
  instructions       67.4M  ± 6.50K     67.4M  … 67.5M          16 ( 1%)        ⚡-  2.0% ±  0.0%
  cache_references   3.60M  ± 35.3K     3.50M  … 3.79M          17 ( 1%)        ⚡-  2.5% ±  0.1%
  cache_misses        543K  ± 16.5K      498K  …  595K          14 ( 1%)          -  1.2% ±  0.2%
  branch_misses       367K  ± 3.95K      358K  …  385K          22 ( 1%)        ⚡-  4.1% ±  0.1%

Followup

After landing this branch I plan to set a firm release date for the 0.14.0 tag.

ELF, COFF, and MachO need the same treatment. I started with Wasm because it is significantly fewer lines of code. Some strategies can be shared there; however, I don't expect to keep as much in memory with those linkers, since the total object file size could be enormous.

Post-Merge Roadmap:

  1. One month of QA for 0.14.0
  2. Release 0.14.0
  3. Enhance wasm linker enough to pass LLD's test suite for Wasm.
  4. Remove dependency on LLD for Wasm.
  5. Repeat steps 3-4 for ELF
  6. Repeat steps 3-4 for COFF
  7. Repeat steps 3-4 for MachO
  8. Rework ELF linker code with respect to incremental compilation goals
  9. Rework COFF linker code with respect to incremental compilation goals
  10. Rework MachO linker code with respect to incremental compilation goals

This commit is not a complete implementation of all these goals; it is
not even passing semantic analysis.

Makes linker functions have small error sets, which is required to
report diagnostics properly, rather than one massive error set that has
a lot of codes.

Other linker implementations are not ported yet.

Also, the branch is not passing semantic analysis yet.

See #363. Please file issues rather than making TODO comments.

Mainly, rework how relocations work. This is the point at which symbol
indexes are known, not before. And don't emit unnecessary relocations!
They're only needed when emitting an object file.

Changes wasm linker to keep MIR around long-lived so that fixups can be
reapplied after linker garbage collection.

use labeled switch while we're at it

Still, the branch is not yet passing semantic analysis.

and more disciplined type safety for output function indexes

in which case the values array is set to undefined

Recognize three distinct phases:
* before prelink ("object phase")
* after prelink, before flush ("zcu phase")
* during flush ("flush phase")

With this setup, we create data structures during the object phase, then
mutate them during the zcu phase, and then further mutate them during
the flush phase. In order to make the flush phase repeatable, the data
structures are copied just before starting the flush phase.

Further Zcu updates occur against the non-copied data structures.
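The phase split can be illustrated with a toy model. This is a hedged Python sketch, not the linker's actual data layout: `ToyLinker`, `LinkerState`, and all method names are hypothetical, and a deep copy stands in for whatever cheap copy the real data structures allow. The point it shows is that flush mutates only a copy, so it can run repeatedly while Zcu updates keep landing on the original.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class LinkerState:
    # Created during the object phase, mutated during the zcu phase.
    object_funcs: list = field(default_factory=list)
    zcu_funcs: dict = field(default_factory=dict)
    # Stand-in for the results of linker garbage collection marking.
    referenced: set = field(default_factory=set)


class ToyLinker:
    def __init__(self):
        self.state = LinkerState()
        self.prelinked = False

    def add_object_func(self, name):
        # Object phase: inputs are only accepted before prelink.
        assert not self.prelinked, "object inputs must arrive before prelink"
        self.state.object_funcs.append(name)

    def prelink(self):
        # From here on, the set of object inputs is complete.
        self.prelinked = True

    def update_func(self, name, body, referenced=True):
        # Zcu phase: updates are only accepted after prelink.
        assert self.prelinked, "Zcu updates must arrive after prelink"
        self.state.zcu_funcs[name] = body
        if referenced:
            self.state.referenced.add(name)
        else:
            self.state.referenced.discard(name)

    def flush(self):
        # Flush phase: copy first, then mutate only the copy.
        scratch = copy.deepcopy(self.state)
        for name in list(scratch.zcu_funcs):
            if name not in scratch.referenced:
                del scratch.zcu_funcs[name]  # destructive, but on the copy
        return scratch.object_funcs + sorted(scratch.zcu_funcs)


linker = ToyLinker()
linker.add_object_func("memcpy")
linker.prelink()
linker.update_func("main", b"\x0b")
linker.update_func("unused", b"\x0b", referenced=False)
first = linker.flush()
second = linker.flush()  # same result: the long-lived state was untouched
```

After both flushes, `linker.state.zcu_funcs` still contains `unused`, so a later update that starts referencing it does not need to regenerate it.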

What's not implemented is frontend garbage collection; once it is, some
more changes will be needed in this linker logic to achieve a valid
state with data invariants intact.

and expose object_host_name as an option for setting the lib name for
object files, since the wasm linking standards don't specify a way to do
it.

one hash table lookup per fixup

instead of recursion, callers of the function are responsible for
checking the respective tables that might have new entries in them and
then calling lowerZcuData again.
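The caller-driven pattern described here can be sketched as a loop over a table that may grow while it is being walked. This is an illustrative Python sketch under assumptions: `lower_one` merely stands in for lowerZcuData, whose real signature and tables are not shown in this PR, and `deps` is a made-up dependency map.

```python
def lower_one(name, deps, table, seen):
    """Lower a single entry without recursing.

    Newly discovered dependencies are only appended to `table`;
    it is the caller's job to notice them and call again.
    """
    for dep in deps.get(name, ()):
        if dep not in seen:
            seen.add(dep)
            table.append(dep)
    return f"lowered({name})"


def lower_all(roots, deps):
    table = list(dict.fromkeys(roots))  # de-duplicate, keep order
    seen = set(table)
    out = []
    i = 0
    while i < len(table):  # re-check the length: the table may have grown
        out.append(lower_one(table[i], deps, table, seen))
        i += 1
    return out


deps = {"a": ["b", "c"], "b": ["c", "d"]}
print(lower_all(["a"], deps))
# ['lowered(a)', 'lowered(b)', 'lowered(c)', 'lowered(d)']
```

Compared with recursion, stack depth stays constant however deep the dependency chain is, and each entry is lowered exactly once even when dependencies form a cycle.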
codegen can generate zcu data dependencies that need to be populated

it cannot be done earlier since ids are not stable yet

this strategy uses a "postponed" queue to handle codegen tasks that
spawn too early. there's probably a better way.