Improve performance of flush_delayed_lets for large expressions

We're seeing quadratic behavior when compiling large expressions, particularly list and array literals (see the generated `uucp_case_map_data.ml` in `uucp` for an example). Profiling points to `To_cmm_env.flush_delayed_lets`, which uses the unoptimized O(n log n) `Patricia_tree.filter_map`.

This patch implements `filter_map` in O(n) time in the obvious way. It also introduces `filter_map_sharing`, which has the same asymptotic cost but saves a good deal more work when most of the map is returned unchanged. In testing, the improved `filter_map` brought `flush_delayed_lets` down from ~70% (!) of execution time to ~60%, and switching to `filter_map_sharing` brought it down to ~30%. (That still seems high, but on the other hand this is a file with almost nothing interesting to compile.)
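
To illustrate why sharing helps when most bindings survive unchanged, here is a minimal sketch of the idea. It is not the `Patricia_tree` code from this patch: the `tree` type below is a simplified, hypothetical stand-in (bindings at the leaves, no prefix or bit information in the branches), and the function only mirrors the name `filter_map_sharing`. The point it shows is that a subtree whose contents are all kept physically unchanged is returned as-is rather than rebuilt.

```ocaml
(* Hypothetical simplified tree: NOT flambda2's Patricia_tree. *)
type 'a tree =
  | Empty
  | Leaf of int * 'a
  | Branch of 'a tree * 'a tree

(* Visits every binding once. A node is reallocated only if something
   beneath it was dropped or replaced; otherwise the original node is
   returned, so the result shares structure with the input. *)
let rec filter_map_sharing (f : int -> 'a -> 'a option) (t : 'a tree) : 'a tree =
  match t with
  | Empty -> Empty
  | Leaf (k, v) -> (
    match f k v with
    | None -> Empty
    | Some v' -> if v' == v then t else Leaf (k, v'))
  | Branch (l, r) -> (
    let l' = filter_map_sharing f l in
    let r' = filter_map_sharing f r in
    if l' == l && r' == r
    then t (* nothing changed below: reuse the existing node *)
    else
      match l', r' with
      | Empty, t' | t', Empty -> t' (* collapse a branch with an empty side *)
      | _, _ -> Branch (l', r'))

(* Usage: keeping every binding returns the very same tree. *)
let example = Branch (Branch (Leaf (1, "a"), Leaf (2, "b")), Leaf (3, "c"))
let unchanged = filter_map_sharing (fun _ v -> Some v) example
let () = assert (unchanged == example)
```

Under these assumptions the traversal still costs O(n), but in the mostly-unchanged case it allocates almost nothing, which is consistent with the gap between the ~60% and ~30% figures reported above.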
lukemaurer · Nov 26, 2024 · bf4a909
1 parent 1f09efd
Showing 1 changed file with 1 addition and 1 deletion.
diff --git a/middle_end/flambda2/to_cmm/to_cmm_env.ml b/middle_end/flambda2/to_cmm/to_cmm_env.ml
@@ -962,7 +962,7 @@ let flush_delayed_lets ~mode env res =
     bindings_to_flush := M.add b.order binding !bindings_to_flush
   in
   let bindings_to_keep =
-    Variable.Map.filter_map
+    Variable.Map.filter_map_sharing
       (fun _ (Binding b as binding) ->
         match b.inline with
         | Do_not_inline ->