incomingpain 17 hours ago

I legitimately don't understand why anyone would want a 4B model.

They might as well name all models 4B and smaller after psychedelics, because they be hallucinating.

  • mdaniel 15 hours ago

    Plus, this specific use case is also "to detect legally relevant text like license declarations in code and documentation", so I guess they really bought into that regex adage about "and now you have two problems" and thought they'd introduce some floating point math instead.
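
    A minimal sketch of the regex route, for contrast (the pattern list here is made up and far from exhaustive; a real tool would lean on the full SPDX identifier set):

        import re

        # Illustrative patterns for a few common license declarations.
        LICENSE_PATTERNS = [
            r"SPDX-License-Identifier:\s*[\w.+-]+",
            r"Licensed under the Apache License,? Version 2\.0",
            r"\bMIT License\b",
            r"GNU (Lesser )?General Public License",
        ]

        def find_license_declarations(text: str) -> list[str]:
            """Return every substring that looks like a license declaration."""
            hits = []
            for pattern in LICENSE_PATTERNS:
                hits += [m.group(0) for m in re.finditer(pattern, text, re.IGNORECASE)]
            return hits

        print(find_license_declarations("# SPDX-License-Identifier: Apache-2.0"))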

  • hammyhavoc 16 hours ago

    And yet so many people on HN are adamant that the more tokens, the better, that it's all just a matter of throwing more money at it, and that it's inevitable it will somehow "get better" because there's "so much money riding on it".

    I wonder when the penny will drop.

    • incomingpain 13 hours ago

      I was just testing this a bit more.

      I grabbed qwen3:4b and cranked the context window to its 32k-token max.

      It's fast, to be sure, and I'm struggling to get it to hallucinate, but it gives me a ton of "The provided context does not include specifics" responses.

      Resource-wise it's like running a 12-16B model, but faster. But as soon as you expand the 12B to around 10k tokens of context, it's clearly better for barely any more resources.
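
      For reference, the test is roughly shaped like this (a sketch assuming LM Studio's OpenAI-compatible server on its default port; the document path and the question are placeholders, and the 32k context length is set when loading the model, not per request):

          from openai import OpenAI

          # LM Studio serves an OpenAI-compatible API locally; the key is
          # required by the client but not checked by the local server.
          client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

          context = open("docs/example.txt").read()  # placeholder document

          resp = client.chat.completions.create(
              model="qwen3:4b",
              temperature=0.2,
              messages=[
                  {"role": "system",
                   "content": "Answer only from the provided context. "
                              "If the context does not cover it, say so."},
                  {"role": "user",
                   "content": f"Context:\n{context}\n\nQuestion: <placeholder>"},
              ],
          )
          print(resp.choices[0].message.content)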

    • incomingpain 15 hours ago

      I wonder if my understanding is flawed. I've tested this using LM Studio, and lots of dials are involved.

alganet 16 hours ago

This actually sounds like a cool use case.