• 0 Posts
  • 131 Comments
Joined 3 years ago
Cake day: August 10th, 2023





  • Because in my personal experience from actually using them, 25% doesn’t seem quite right.

    Besides, these companies have a monetary incentive to ensure LLMs show high numbers on these tests. One of the most widely used tests (bench verified) is itself a curated selection of problems. In real-world usage the failure rate is going to be much higher.

    A rational person trusts but verifies, and at least for me the verification doesn’t hold up to even a tiny bit of scrutiny, so having doubts is a perfectly healthy thing to do.

    Just because someone disagrees with a data source does not make them irrational. There are some extremely well-verified truths that are irrational to dismiss, but not all data sources / studies have had that amount of rigor applied to them. Data can tell a story, but it doesn’t always tell the whole truth. People manipulate data to their own benefit.

    People confuse the scientific method and academic research with “this one academic source says this, so it must be true” when really you need more than that.







  • Gunna be honest, if you really feel that way you’d fire the vibe coder. I don’t give a shit if it’s broken if it’s due to the negligence of management and other employees.

    Poor planning on your part does not constitute an emergency on mine. And ofc this varies based on the service, as all things do. But it’s rarely important enough that I’d stay over. Like if lives were at risk or it’s a “bring the company down” bug, maybe I’d stay. But there would be hell to pay later for whoever broke it. Either that or the company would lose me as an employee.