Replies: 2 comments
-
That's an excellent question, and very much a research question at this point :)
-
Implementing #479 may make it easier to answer this question. Note that you can already get the logits at each step by building the sequence generator with …
-
Hi everyone,
I am interested in the calibration of LLMs as a way to ensure their trustworthiness. The GPT-4 paper showed, for instance, that RLHF degraded the model's calibration.
I am interested in measuring the calibration of models that were trained on general text generation rather than on classification specifically, e.g. GPT-4.
For instance, imagine I want the model to do classification by using outlines to constrain the output to "Cat" or "Dog".
Can I use outlines to get an accurate measure of the output probability without having to fine-tune the model for classification? In other words, is masking out (zeroing) all tokens that don't correspond to one of the classes I'm interested in, and renormalizing over the remaining ones, a good way to obtain output probabilities comparable to what I would get from a fine-tuned classification head?
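To make the masking idea concrete, here is a minimal sketch of the computation I have in mind — not outlines' actual API, just the math on a toy vocabulary: set every logit outside the allowed class tokens to minus infinity and renormalize, which is equivalent to a softmax over the class logits alone. In a real setup the logits would come from the model's forward pass, and the token ids for "Cat"/"Dog" are made-up here (real class labels may also span multiple tokens, which this ignores).

```python
import math

def class_probabilities(logits, class_token_ids):
    """Renormalize next-token logits over the allowed class tokens only.

    Equivalent to masking (setting to -inf) every logit outside
    class_token_ids and applying a softmax over the full vocabulary.
    """
    class_logits = [logits[i] for i in class_token_ids]
    m = max(class_logits)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in class_logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy example: vocabulary of 5 tokens; pretend "Cat" is token id 1, "Dog" is id 3.
logits = [0.2, 2.0, -1.0, 1.0, 0.5]
p_cat, p_dog = class_probabilities(logits, [1, 3])
```

The resulting pair always sums to 1, so the question is really whether this renormalized distribution is well calibrated, i.e. whether `p_cat` tracks the empirical accuracy on cat examples.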
Thanks for the help :)