Separate NNUE nets progress tests #5066

PavelChess · 2024-02-23T09:46:42Z

PavelChess
Feb 23, 2024

Currently, regression tests are run only against SF 16.
Measuring NNUE net improvement separately might also be interesting.

I suggest testing the latest net against the first net of the latest NNUE generation that was accepted into an abrok build, and/or against the net of the latest official release.

XInTheDark · 2024-02-23T09:51:31Z

XInTheDark
Feb 23, 2024

Not much point IMO. You can't really treat search and eval separately: eval changes affect search. So if you replace the latest net with an ancient one, it will probably lose much more Elo than the net itself is actually worse by.

0 replies

PavelChess · 2024-02-24T16:51:16Z

PavelChess
Feb 24, 2024
Author

If so, I suggest running two tests: the first between the two nets on the latest development version, and the second between the same two nets on the first development version that used the latest net architecture.

I think the average of the two Elo differences would be very close to the real one.
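
As a rough illustration of the arithmetic, here is a minimal sketch that converts each test's win/draw/loss counts into an Elo difference with the standard logistic formula and then averages the two. The function names and the game counts are hypothetical placeholders, not real test results, and this is not how fishtest itself reports Elo.

```python
import math


def elo_from_wdl(wins: int, draws: int, losses: int) -> float:
    """Elo difference implied by a match result (standard logistic model)."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / games
    if not 0.0 < score < 1.0:
        raise ValueError("score of 0% or 100% gives an unbounded Elo difference")
    return -400.0 * math.log10(1.0 / score - 1.0)


def average_elo(results: list[tuple[int, int, int]]) -> float:
    """Plain mean of the Elo differences from several (wins, draws, losses) results."""
    if not results:
        raise ValueError("no results given")
    return sum(elo_from_wdl(*r) for r in results) / len(results)


if __name__ == "__main__":
    # Hypothetical counts, from the new net's point of view (assumption, not real data).
    on_latest_dev = (2600, 5200, 2200)       # both nets on the latest development version
    on_first_dev_of_arch = (2450, 5300, 2250)  # both nets on the first version with this architecture
    print(f"latest dev:        {elo_from_wdl(*on_latest_dev):+.2f} Elo")
    print(f"first dev of arch: {elo_from_wdl(*on_first_dev_of_arch):+.2f} Elo")
    print(f"average:           {average_elo([on_latest_dev, on_first_dev_of_arch]):+.2f} Elo")
```

The plain mean weights both tests equally; whether that average really reflects the net's own contribution is the assumption being proposed here, not something the code can confirm.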

0 replies

vondele · 2024-03-03T11:17:45Z

vondele
Mar 3, 2024
Maintainer

There is not much sense in tracking such things independently, and it is also hard to do across net architecture changes.

0 replies
