Separate NNUE nets progress tests #5066
Replies: 3 comments
-
Not much point, IMO. You can't really treat search and eval separately, because eval changes affect search. So if you replace the latest net with an ancient one, the engine will probably lose much more Elo than the net itself is actually worse by.
-
If so, I suggest running two tests: the first between the two nets on the latest development version, and the second between the same nets on the first development version that supports the latest net architecture. I think the average Elo difference will be very close to the real one.
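To make the averaging concrete, here is a minimal sketch in Python. It assumes the standard logistic Elo model and that each test reports plain win/draw/loss counts. The function names and the sample numbers are hypothetical placeholders for illustration, not real test results.

```python
import math


def elo_from_score(score: float) -> float:
    """Convert an average match score in (0, 1) to an Elo difference (logistic model)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_from_wdl(wins: int, draws: int, losses: int) -> float:
    """Elo difference of the new net over the old net from one match result."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    # A draw counts as half a point for each side.
    score = (wins + 0.5 * draws) / games
    return elo_from_score(score)


def average_elo(results: list[tuple[int, int, int]]) -> float:
    """Plain mean of the per-test Elo differences."""
    if not results:
        raise ValueError("at least one test result is required")
    return sum(elo_from_wdl(w, d, l) for w, d, l in results) / len(results)


if __name__ == "__main__":
    # Hypothetical (wins, draws, losses) for the new net, made up for illustration:
    on_latest_dev = (300, 500, 200)    # both nets run on the latest dev version
    on_first_dev = (260, 520, 220)     # both nets run on the first compatible dev version
    print(f"latest dev: {elo_from_wdl(*on_latest_dev):+.1f} Elo")
    print(f"first dev:  {elo_from_wdl(*on_first_dev):+.1f} Elo")
    print(f"average:    {average_elo([on_latest_dev, on_first_dev]):+.1f} Elo")
```

This only averages point estimates. It ignores error bars, so both tests would need enough games for the mean to be meaningful.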
-
There's not much sense in tracking such things independently, and it's also hard to do across net architecture changes.
-
Currently, regression tests are run only against SF 16.
A separate measurement of NNUE net improvement might also be interesting.
My suggestion is to test the latest net against the first net of the latest NNUE generation that was accepted into an abrok version, and/or against the net of the latest official version.