• Justin@lemmy.jlh.name · 1 month ago (edited)

    I guess that makes sense, but I wonder how hard it would be to get clean data out of the per-token confidence values. A low-confidence token could mean the LLM is hallucinating, or it could just mean it's producing bad grammar. It already seems hard enough to get LLMs to distinguish between “killing processes” and murder, but maybe some novel training and inference techniques will come out of this.
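
    To make the idea concrete, here's a rough sketch of what reading those per-token confidence values can look like with Hugging Face transformers (the model name and the 0.5 threshold are just placeholder choices, not anyone's actual method). It prints the probability the model assigned to each token it emitted, and the ambiguity is exactly the problem above: a low value could point at a hallucinated fact or just an unusual but perfectly valid word choice.

    ```python
    # Sketch: inspect per-token confidence of a generated continuation.
    # Assumptions: any causal LM from Hugging Face works; "gpt2" and the
    # 0.5 cutoff are only illustrative.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    prompt = "The capital of Australia is"
    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=10,
            do_sample=False,
            output_scores=True,          # keep the logits for each generated step
            return_dict_in_generate=True,
        )

    # out.scores holds one logit tensor per generated token; softmax gives the
    # probability the model assigned to the token it actually emitted.
    generated = out.sequences[0, inputs["input_ids"].shape[1]:]
    for tok_id, logits in zip(generated, out.scores):
        probs = torch.softmax(logits[0], dim=-1)
        conf = probs[tok_id].item()
        flag = "  <-- low confidence?" if conf < 0.5 else ""
        print(f"{tokenizer.decode(tok_id)!r}: {conf:.2f}{flag}")
    ```

    The catch is that the flagged tokens don't tell you *why* confidence was low, which is why turning this into a clean hallucination signal probably needs training the model to separate the two cases rather than just thresholding the raw numbers.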