@Natanael

Natanael@slrpnk.net · 1 day ago

You get used to it. I don’t even see the code

Natanael@slrpnk.net · 2 days ago

That’s what the Brits did to get Germans to talk in WW2

Natanael@slrpnk.net · 3 days ago

Already broken

https://en.wikipedia.org/wiki/8K_resolution

8K resolution refers to an image or display resolution with a width of approximately 8,000 pixels. 8K UHD (7680 × 4320)

Natanael@slrpnk.net · 3 days ago

https://www.benq.com/en-ap/knowledge-center/knowledge/what-is-resolution-of-monitor-full-hd-vs-2k-vs-4k.html

The number refers to the horizontal resolution. FHD is nearly 2K pixels wide, just as 4K resolutions are nearly 4K pixels wide, although FHD is the typical term for the resolution and QHD is more commonly called 2K rather than FHD
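A quick sketch of that naming rule in Python: round the horizontal pixel count to the nearest thousand. The widths are the standard ones for each format; the `k_label` helper is just a made-up name for illustration.

```python
# Horizontal x vertical pixel counts for common display formats.
RESOLUTIONS = {
    "FHD": (1920, 1080),
    "QHD": (2560, 1440),
    "4K UHD": (3840, 2160),
    "8K UHD": (7680, 4320),
}

def k_label(width: int) -> str:
    """Round the horizontal pixel count to the nearest thousand, e.g. 1920 -> '2K'."""
    return f"{round(width / 1000)}K"

for name, (width, height) in RESOLUTIONS.items():
    print(f"{name}: {width}x{height} -> {k_label(width)}")
```

By plain rounding, FHD lands on 2K while QHD (2560 wide) lands closer to 3K, so calling QHD "2K" is a marketing convention rather than arithmetic.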

Natanael@slrpnk.net · 4 days ago

Tapes themselves are cheaper, but the cost of the drive (and potentially the operating cost?) can definitely be higher for the industrial stuff

Natanael@slrpnk.net · 4 days ago

And my TV is still a cheap full HD (2K) screen from 2011, so I’ve got no reason to buy media in higher quality

Natanael@slrpnk.net · 5 days ago

Orange is the new black

Natanael@slrpnk.net · 8 days ago

And while they tried to DRM it, the DVD standard still ended up having to maintain compatibility across all readers and discs. For Blu-ray, though, they regularly deprecate older readers, which can no longer play newer movies because new releases use new encryption keys that the old readers don’t have access to (and for this reason the PS consoles are the best Blu-ray movie players, because Sony keeps them updated)

Natanael@slrpnk.net · 9 days ago

Because few people know what’s realistic for LLMs

Natanael@slrpnk.net · edit-2 10 days ago

Humans learn a lot through repetition, so there’s no reason to believe that LLMs wouldn’t benefit from reinforcement of higher quality information. Especially because seeing the same information in different contexts helps map the links between those contexts and helps dispel incorrect assumptions. But like I said, the only viable method they have for this kind of emphasis at scale is incidental replication of more popular works in the training samples. And when something is duplicated too much, the model overfits instead.

To fix this conflict they need to fundamentally change big parts of how training happens and how the algorithm learns. In particular it will need a lot more “introspective” training stages to refine what it has learned, and pretty much nobody does anything even slightly similar on large models because they don’t know how, and it would be insanely expensive anyway.

Natanael@slrpnk.net · edit-2 11 days ago

Yes, but should big companies with business models designed to be exploitative be allowed to act hypocritically?

My problem isn’t with ML as such, or with learning over such large sets of works, etc, but these companies are designing their services specifically to push the people whose works they rely on out of work.

The irony of overfitting is that having numerous copies of common works is a problem AND removing the duplicates would also be a problem. They need an understanding of what’s representative for language, etc, but the training algorithms can’t learn that on their own, it’s not feasible to have humans teach it, and the training algorithm can’t effectively detect duplicates and “tune down” their influence to stop replicating them exactly. Also, trying to do that latter thing algorithmically will ALSO break things, as it would break the model’s understanding of stuff like standard legalese and boilerplate language, etc.
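To illustrate why the naive fix backfires, here is a minimal Python sketch of "tuning down" duplicates. It is my own toy, not how any real training pipeline works: it only catches exact duplicates after lowercasing and whitespace normalization, and the `fingerprint` and `duplicate_weights` names are hypothetical.

```python
import hashlib
from collections import Counter

def fingerprint(text: str) -> str:
    """Hash of the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def duplicate_weights(samples: list[str]) -> list[float]:
    """Weight each sample by 1 / (number of copies), so each distinct text totals 1.0."""
    counts = Counter(fingerprint(s) for s in samples)
    return [1.0 / counts[fingerprint(s)] for s in samples]

samples = [
    "All rights reserved.",
    "all rights  reserved.",
    "ALL RIGHTS RESERVED.",
    "A one-off sentence that appears only once.",
]
weights = duplicate_weights(samples)
print(weights)
```

Note that the boilerplate line is exactly what gets down-weighted here, even though its frequency is legitimate signal about how the phrase is used. A near-duplicate (one changed word) would slip through entirely.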

The current generation of generative ML doesn’t do what it says on the box, AND the companies running them deserve to get screwed over.

And yes, I understand the risk of screwing up fair use, which is why my suggestion is not to hinder learning, but to require the companies to track the copyright status of samples and inform end users of the licensing status when the system detects that a sample is substantially replicated in the output. This will not hurt anybody training on public domain or fairly licensed works, nor hurt anybody who tracks authorship when crawling for samples, and will also not hurt anybody who has designed their ML system to be sufficiently transformative that it never replicates copyrighted samples. It just hurts exploitative companies.
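A rough Python sketch of what that kind of check could look like, assuming the provider keeps a license label next to each training sample. Everything here (the `Sample` record, the 5-word n-grams, the 50% threshold) is a hypothetical choice for illustration, and a real system would need an index rather than a linear scan.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sample:
    text: str
    license: str  # hypothetical label, e.g. "public-domain" or "all-rights-reserved"

def ngrams(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the set of word n-grams in text (lowercased, whitespace-split)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def replicated_samples(output: str, samples: list[Sample],
                       n: int = 5, threshold: float = 0.5) -> list[Sample]:
    """Return samples for which at least `threshold` of their n-grams appear in output."""
    out_grams = ngrams(output, n)
    hits = []
    for sample in samples:
        grams = ngrams(sample.text, n)
        if grams and len(grams & out_grams) / len(grams) >= threshold:
            hits.append(sample)
    return hits

corpus = [
    Sample("it was the best of times it was the worst of times", "public-domain"),
    Sample("all your base are belong to us said the captain", "all-rights-reserved"),
]
output = "The model wrote: it was the best of times it was the worst of times indeed"
for s in replicated_samples(output, corpus):
    print("Output substantially replicates a sample licensed as:", s.license)
```

A system that never reproduces long runs of its samples would never trigger the notice, which is the point: the requirement only bites when the output stops being transformative.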

Natanael@slrpnk.net · edit-2 11 days ago

Remember when media companies tried to sue switch manufacturers, arguing that because their routers held copies of packets in RAM they needed licensing for that?

https://www.eff.org/deeplinks/2006/06/yes-slashdotters-sira-really-bad

Training an AI can end up leaving copies of copyrightable segments of the originals in the model; look up sample recovery attacks. If it had worked as advertised, the outputs would be transformative derivative works with fair use protection, but in reality it often doesn’t work that way

See also

https://curia.europa.eu/juris/liste.jsf?nat=or&mat=or&pcs=Oor&jur=C%2CT%2CF&for=&jge=&dates=&language=en&pro=&cit=none%252CC%252CCJ%252CR%252C2008E%252C%252C%252C%252C%252C%252C%252C%252C%252C%252Ctrue%252Cfalse%252Cfalse&oqp=&td=%3BALL&avg=&lgrec=en&parties=Football%2BAssociation%2BPremier%2BLeague&lg=&page=1&cid=10711513

Natanael@slrpnk.net · edit-2 11 days ago

Math and formal logic are effectively equivalent, and philosophy without conditional logic is useless. Scientifically useful philosophy is just “explorative logic” or something like it

Natanael@slrpnk.net · 14 days ago

As you turn around a bigger joke arrives

Natanael@slrpnk.net · edit-2 18 days ago

Quantum mechanics still has endless ratios which aren’t discrete, especially ratios between stuff like wavelengths, particle states, and more

Natanael@slrpnk.net · 18 days ago

Complex numbers, and a bunch more things too

Natanael@slrpnk.net · 20 days ago

But you can’t detect such things without either server side scanning (kills E2EE dead) or client side scanning (will always be limited in what it can detect, and it’s easy to patch out of clients, AND there’s still the risk of govs maliciously pushing detection of banned media)

Natanael@slrpnk.net · 24 days ago

Not fully encrypted unless you enable lockdown mode (and lose various features)

Natanael@slrpnk.net · 24 days ago

The perceptual hash algorithm was broken in hours, then so thoroughly broken that images modified to collide were visually indistinguishable from the unmodified ones, so you could send people images whose hash values match flagged photos.
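For anyone unfamiliar with the idea, here is a toy perceptual hash in Python. It is a simple average hash over a tiny grayscale grid, not the algorithm discussed above, and the pixel values are made up. It shows the property that matters: small changes to the image are meant to leave the hash unchanged.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Toy perceptual hash: one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [11, 21, 201, 211],
]
# Nudge every pixel slightly, as recompression or a small edit might.
modified = [[min(255, p + 3) for p in row] for row in original]

print(hamming(average_hash(original), average_hash(modified)))
```

That tolerance is a deliberate design goal, but it also means many different images share a hash, which gives an attacker room to nudge an innocent-looking image toward a target value.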

Then there’s also the risk of various jurisdictions pushing to add detection of other banned content.

Natanael@slrpnk.net · 24 days ago

But once a process is running, it’s trivial to get weeks of extremely detailed history and lots of secrets you thought were ephemeral