- Adding `norm_to_scale_identity_weight_per_block` to the `multiply` and `update_cache` methods of the estimator. This allows the `identity_weight` to be scaled differently for each block, according to a norm (or norm-like function) of that block's curvature. #873
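As a rough illustration of the idea, the sketch below applies a damped curvature product `(C_b + w_b * I) v_b` block by block, where the identity weight `w_b` is multiplied by a norm of the block. It is not the estimator's actual implementation: `damped_multiply`, `frobenius_norm`, and `norm_fn` are hypothetical names, and the Frobenius norm is an assumed choice of norm.

```python
import math


def frobenius_norm(block):
    """Frobenius norm of a dense block given as a list of rows (assumed norm choice)."""
    return math.sqrt(sum(x * x for row in block for x in row))


def damped_multiply(blocks, vectors, identity_weight, norm_fn=None):
    """Compute (C_b + w_b * I) v_b for every block b.

    Hypothetical helper, not the library API. With norm_fn=None every block
    uses the same identity_weight; otherwise w_b = identity_weight * norm_fn(C_b).
    """
    results = []
    for block, vec in zip(blocks, vectors):
        scale = norm_fn(block) if norm_fn is not None else 1.0
        weight = identity_weight * scale
        results.append([
            sum(c * v for c, v in zip(row, vec)) + weight * vec[i]
            for i, row in enumerate(block)
        ])
    return results


# Two diagonal curvature blocks whose magnitudes differ by a factor of 10.
blocks = [[[3.0, 0.0], [0.0, 4.0]], [[0.3, 0.0], [0.0, 0.4]]]
vectors = [[1.0, 1.0], [1.0, 1.0]]

uniform = damped_multiply(blocks, vectors, identity_weight=0.1)
scaled = damped_multiply(blocks, vectors, identity_weight=0.1, norm_fn=frobenius_norm)
```

With a uniform weight, the same damping is added to both blocks regardless of their magnitude. With the norm-based scaling, the damping added to each block is proportional to the size of that block's curvature.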
| Job | Run time |
|---|---|
| | 3m 29s |
| | 3m 33s |
| Total | 7m 2s |