• 1 Post
  • 40 Comments
Joined 1 year ago
Cake day: May 8th, 2023

  • A1kmm@lemmy.amxl.com to Privacy@lemmy.ml: *Permanently Deleted*
    4 upvotes · 4 months ago

    When people say Local AI, they mean things like the Free / Open Source Ollama (https://github.com/ollama/ollama/). You can read its source code to check that it doesn’t phone home, and you have complete control over when and if you upgrade it. If you don’t like something in the code base, you can also fork it and start your own version. The actual models used with Ollama (e.g. Mistral, a popular one) are distributed in GGUF format (the successor to GGML), which doesn’t even carry executable code - only metadata and massive multi-dimensional arrays of numbers (tensors) that represent the parameters of the LLM.
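
    Ollama’s model files use the GGUF format, and its fixed-size header shows how little ‘code’ is involved: a magic number, a format version, and counts of tensors and metadata entries, followed by the metadata and the tensor data itself. A minimal sketch that reads just that header, assuming GGUF version 2 or 3 (version 1 used 32-bit counts); the function name and file path are hypothetical:

```python
import struct

# Fixed-size GGUF header (format versions 2 and 3): 4-byte magic, then
# little-endian uint32 version, uint64 tensor count, uint64 metadata key/value count.
GGUF_HEADER = struct.Struct("<4sIQQ")


def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size header at the start of a GGUF model file."""
    if len(data) < GGUF_HEADER.size:
        raise ValueError("too short to be a GGUF file")
    magic, version, tensor_count, kv_count = GGUF_HEADER.unpack_from(data)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    if version < 2:
        # Version 1 used 32-bit counts; this sketch does not handle it.
        raise ValueError(f"unsupported GGUF version {version}")
    return {"version": version, "tensor_count": tensor_count, "metadata_kv_count": kv_count}


# Usage with a real model file (path is hypothetical):
# with open("mistral.gguf", "rb") as f:
#     print(read_gguf_header(f.read(GGUF_HEADER.size)))
```

    Everything after this header is declarative data that the runtime interprets; nothing in it is loaded and run as native code.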

    Not trusting that the output is correct is reasonable. But in terms of trusting the software not to spy on you, FOSS ‘AI’ software is no different from other FOSS software (e.g. the Linux kernel, etc…). There is some risk if there is an xz-style supply-chain attack on a code base, but I don’t think the risks are materially different for ‘AI’ compared to any other software.


  • Blockchain is great for when you need global consensus on the ordering of events (e.g. Alice gave all her 5 ETH to Bob first, so a later transaction to give 5 ETH to Charlie is invalid). It is an unnecessarily expensive solution just for archival, since it necessitates storing the data on every node forever.
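
    The ordering point can be made concrete with a toy ledger that applies transfers strictly in sequence and rejects any that would overspend. This is a hypothetical sketch of why agreeing on the order matters, not how Ethereum actually validates transactions:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    sender: str
    recipient: str
    amount: int


def apply_in_order(balances: dict[str, int], transfers: list[Transfer]):
    """Apply transfers in the agreed order; reject any that would overspend."""
    balances = dict(balances)  # work on a copy
    rejected = []
    for t in transfers:
        if balances.get(t.sender, 0) < t.amount:
            # The sender no longer has the funds at this point in the order.
            rejected.append(t)
            continue
        balances[t.sender] -= t.amount
        balances[t.recipient] = balances.get(t.recipient, 0) + t.amount
    return balances, rejected


to_bob = Transfer("alice", "bob", 5)
to_charlie = Transfer("alice", "charlie", 5)
print(apply_in_order({"alice": 5}, [to_bob, to_charlie]))  # Charlie's transfer rejected
print(apply_in_order({"alice": 5}, [to_charlie, to_bob]))  # Bob's transfer rejected
```

    The same two transfers give different valid outcomes depending on order, which is exactly what global consensus has to settle.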

    Ethereum charges ‘gas’ fees per transaction, which helps ensure it doesn’t collapse under the weight of excess usage. Blocks have gas limits, and transactions have size limits. It is currently working out at about US$7,500 per MB of block data (which is stored forever, and replicated to every node in the network). The Internet Archive reportedly has ~50 PB of data, which would cost roughly US$375 trillion to put onto Ethereum. In practice, attempting this would push up the price of ETH further, and if they succeeded, most nodes would not be able to keep up with the network. Really, this is just telling us that blockchain is not appropriate for that use case: the designers of real-world blockchains have created mechanisms that make it financially unviable to attempt at that scale, because it would effectively destroy the ability to operate nodes.
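
    A back-of-envelope check of the storage cost, using the rounded figures of US$7,500 per MB and 50 PB with decimal units (1 PB = 10^9 MB). The real per-MB price moves with gas prices and the price of ETH, so treat the constant as an assumption:

```python
# Rounded figures (assumptions, not live prices).
USD_PER_MB = 7_500
ARCHIVE_SIZE_PB = 50
MB_PER_PB = 10**9  # decimal units: 1 PB = 1,000,000,000 MB


def ethereum_storage_cost_usd(size_pb: int, usd_per_mb: int = USD_PER_MB) -> int:
    """Cost of writing size_pb petabytes as block data at a flat per-MB price."""
    return size_pb * MB_PER_PB * usd_per_mb


cost = ethereum_storage_cost_usd(ARCHIVE_SIZE_PB)
print(f"~US${cost / 10**12:,.0f} trillion")  # ~US$375 trillion
```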

    The only real reason to use an existing blockchain anyway would be the theory that it is too big to fail because of its legitimate business use cases, and that censorship-resistant data on it would be too hard to remove. However, if it came to be used mainly for censorship-resistant data sharing, with transactions in the minority, I doubt that this would stop authorities going after node operators and so on.

    The real problems that an archival project faces are:

    • The cost of storing and retrieving large amounts of data. That could be decentralised using a solution where not all data is stored on a chain - for example, IPFS.
    • The problem of curating data and deciding what is worth archiving, and what is a true-to-source archive vs fake copy. This probably requires either a centralised trusted party, or maybe a voting system.
    • The problem of censorship. Anonymity and opaqueness about what is on a particular node can help - but they might in some cases undermine the other goals of archival.
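
    IPFS addresses data by a hash of its content, which also helps with the true-to-source vs fake copy problem for data that has already been archived: a retrieved copy can be checked against its address, whoever served it. A toy sketch, using plain SHA-256 hex strings rather than IPFS’s real CID/multihash format and chunking; note that it only proves a copy matches its address, not that the original capture was faithful, so curation is still needed:

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: a blob's address is the SHA-256 of its bytes."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Whoever served the bytes, they can be checked against the address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data
```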

  • A1kmm@lemmy.amxl.com to Privacy@lemmy.ml: Internet Archive is in danger
    14 upvotes, 2 downvotes · 4 months ago

    > This is absolutely because they pulled the emergency library stunt, and they were loud as hell about it. They literally broke the law and shouted about it.

    I think that you are right as to why the publishers picked them specifically to go after in the first place. I don’t think they should have done the “emergency library”.

    That said, the publishers’ arguments show they have an anti-library agenda that goes beyond just the emergency library.

    > Libraries are allowed to scan/digitize books they own physically. They are only allowed to lend out as many as they physically own though. Archive knew this and allowed infinite “lend outs”. They even openly acknowledged that this was against the law in their announcement post when they did this.

    The trouble is that the publishers are not just going after them for infinite lend-outs. The publishers are arguing that they shouldn’t be allowed to lend out any digital copies of a book they’ve scanned from a physical copy, even if they lock away the corresponding numbers of physical copies.

    Worse, they got a court to agree with them on that, which is where the appeal comes in.

    The publishers want physical copies to be lendable only as physical copies. For digital lending, libraries would have to purchase a subscription specifically for digital lending, covering a set number of library patrons and concurrent borrows, and with a finite life. This is all about growing publisher revenue. The publishers are not stopping at saying the number of digital copies lent must be less than or equal to the number of physical copies; they are going after archive.org for their entire digital library programme.
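
    The owned-to-loaned rule discussed above (lend no more copies, physical or digital, than the library physically owns, locking away a physical copy for each digital loan) can be sketched as a small counter. The class and method names are hypothetical, and this is not how any particular library system implements it:

```python
from dataclasses import dataclass


@dataclass
class TitleCopies:
    """Hypothetical owned-to-loaned tracker for a single title.

    Every copy out on loan, physical or digital, must correspond to a
    physical copy the library owns; a digitally lent copy's physical
    counterpart is treated as locked away.
    """
    owned: int
    physical_out: int = 0
    digital_out: int = 0

    def available(self) -> int:
        return self.owned - self.physical_out - self.digital_out

    def lend(self, digital: bool) -> bool:
        if self.available() <= 0:
            return False  # lending more would exceed the copies owned
        if digital:
            self.digital_out += 1
        else:
            self.physical_out += 1
        return True

    def give_back(self, digital: bool) -> None:
        if (self.digital_out if digital else self.physical_out) == 0:
            raise ValueError("no such loan outstanding")
        if digital:
            self.digital_out -= 1
        else:
            self.physical_out -= 1
```

    Under this model a digital loan is never ‘extra’: it always displaces a physical loan of the same copy, which is the distinction the emergency library abandoned.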



  • I think any prediction based on a ‘singularity’ neglects to consider the physical limitations, and just how long the journey towards significant amounts of AGI would be.

    The human brain has an estimated 100 trillion neuronal connections - so that is probably a good order-of-magnitude estimate for the parameter count of an AGI model.

    If we consider a current GPU, e.g. the 12 GB GeForce RTX 3060, it can hold about 24 billion parameters at 4-bit quantisation (in reality a fair few less), and uses 180 W of power. So an AGI would need about 4,167 such GPUs (100 trillion ÷ 24 billion), and might use 750 kW of power to operate. A super-intelligent machine might use more. That is a farm of 2,500 300 W solar panels, while the sun is shining, just for the equivalent of one person.

    Now to pose a real threat against billions of humans, you’d need more than one person’s worth of intelligence. Maybe an army equivalent to 1,000 people, powered by about 4,166,667 GPUs and 2,500,000 solar panels.

    That is not going to materialise out of the air too quickly.
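
    The power arithmetic can be reproduced with exact fractions. The inputs (12 GB of VRAM, 4 bits per parameter, 180 W per GPU, 300 W per panel in full sun) are the assumptions used in this comment, not measurements:

```python
from fractions import Fraction


def agi_power_estimate(params: int, vram_bytes: int = 12 * 10**9, bits_per_param: int = 4,
                       gpu_watts: int = 180, panel_watts: int = 300):
    """Return (GPUs, watts, solar panels) needed to hold `params` parameters in VRAM."""
    params_per_gpu = vram_bytes * 8 // bits_per_param  # 24 billion at 4 bits
    gpus = Fraction(params, params_per_gpu)            # exact, may be fractional
    watts = gpus * gpu_watts
    panels = watts / panel_watts                       # full sun only
    return gpus, watts, panels


gpus, watts, panels = agi_power_estimate(100 * 10**12)  # one brain's worth
print(round(gpus), watts / 1000, panels)                # 4167 750 2500
army_gpus, _, army_panels = agi_power_estimate(1000 * 100 * 10**12)
print(round(army_gpus), army_panels)                    # 4166667 2500000
```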

    In practice, as we get closer to an AGI or ASI, there will be multiple separate deployments of similar sizes (within an order of magnitude), and they won’t be aligned to each other - some systems will be adversaries of any system executing a plan to destroy humanity, and will be aligned to protect against harm (AI technologies are already widely used for threat analysis). So you’d have a bunch of malicious systems, and a bunch of defender systems, going head to head.

    The real AI risks, which I think many of the people ranting about singularities want to obscure, are:

    • An oligopoly of companies gains dominance over the AI space, and perpetuates a ‘rich get richer’ cycle, accumulating wealth and power to the detriment of society. OpenAI, Microsoft, Google and AWS are probably all battling for that. Open models are the way to counter that.
    • People can no longer trust their eyes when it comes to media; existing problems of fake news, deepfakes, and so on become so severe that they undermine any sense of truth. That might fundamentally shift society, but I think we’ll adjust.
    • Doing bad stuff becomes easier. That might be scamming, but at the more extreme end it might be designing weapons of mass destruction. On the positive side, AI can help defenders too.
    • Poor quality AI might be relied on to make decisions that affect people’s lives. Best handled through the same regulatory approaches that prevent companies and governments doing the same with simple flow charts / scripts.

  • A1kmm@lemmy.amxl.com to cats@lemmy.world: A cat entered my tent
    30 upvotes · 4 months ago

    I’m looking into it using data from my instance to check it isn’t an abuse issue.

    What I know so far:

    1. It is a lemmy.world user.
    2. That user has downvoted 548 comments, and upvoted 18. Downvoted 557 posts and upvoted 25.
    3. Timing: the downvoting has been going on for some time, it isn’t a new thing. 71 downvoted comments since 2024-06-01T00:00:00Z, 212 since the start of May (out of 548).
    4. The user has only ever made two comments, and no posts. One comment, on a thread about the actions of a right-wing American politician, said “Click bait lemmy for sure”. This could imply the downvotes are legitimate and come from an impossibly high standard for what is considered quality here, or perhaps they are related to political grudges. I’m going to look further for patterns in the downvotes. A bot could probably have cast far more downvotes, so it could just be a human.
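
    The kind of breakdown in points 2 and 3 can be reproduced with a few lines, assuming the votes have been exported as (timestamp, score) pairs with +1 for an upvote and -1 for a downvote. The sample records below are made up for illustration; they are not the real data:

```python
from collections import Counter
from datetime import datetime


def tally(votes: list[tuple[datetime, int]]) -> tuple[int, int]:
    """Return (upvotes, downvotes) from (timestamp, score) records."""
    counts = Counter(score for _, score in votes)
    return counts[1], counts[-1]


def downvotes_since(votes: list[tuple[datetime, int]], cutoff: datetime) -> int:
    """Count downvotes cast at or after the cutoff."""
    return sum(1 for ts, score in votes if score == -1 and ts >= cutoff)


# Made-up sample records, for illustration only.
votes = [
    (datetime.fromisoformat("2024-04-20T10:00:00+00:00"), -1),
    (datetime.fromisoformat("2024-05-03T09:30:00+00:00"), -1),
    (datetime.fromisoformat("2024-06-02T18:15:00+00:00"), -1),
    (datetime.fromisoformat("2024-06-05T08:00:00+00:00"), 1),
]
print(tally(votes))  # (1, 3)
print(downvotes_since(votes, datetime.fromisoformat("2024-06-01T00:00:00+00:00")))  # 1
```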

  • I looked into this previously, and found that there is a major problem for most users in the Terms of Service at https://codeium.com/terms-of-service-individual.

    Their agreement talks about “Autocomplete User Content” as meaning the context (i.e. the code you write, when you are using it to auto-complete, that the client sends to them) - so it is implied that this counts as “User Content”.

    Then they have terms saying you licence them all your user content:

    “By Posting User Content to or via the Service, you grant Exafunction a worldwide, non-exclusive, irrevocable, royalty-free, fully paid right and license (with the right to sublicense through multiple tiers) to host, store, reproduce, modify for the purpose of formatting for display and transfer User Content, as authorized in these Terms, in each instance whether now known or hereafter developed. You agree to pay all monies owing to any person or entity resulting from Posting your User Content and from Exafunction’s exercise of the license set forth in this Section.”

    So in other words, let’s say you write a 1,000-line piece of software, and release it under the GPL. Then you decide to trial Codeium, and autocomplete a few tiny things, sending your 1,000 lines of code as context.

    Then next week, a big corp wants to use your software in their closed-source product, and doesn’t want to comply with the GPL. Exafunction can sell them a licence (“sublicense through multiple tiers”) allowing them to use the software you wrote without complying with the GPL. If it turns out that you used some GPL’d code from another developer in your codebase (as the GPL allows), and that developer sues Exafunction for violating the GPL, you have to pay any money owing.

    I emailed them about this back in December, and they didn’t respond or change their terms - so they are aware that their terms allow this interpretation.


  • The best option is to run the models locally. You’ll need a good enough GPU - I have an RTX 3060 with 12 GB of VRAM, which is enough to do a lot of local AI work.

    I use Ollama, and my favourite model to use with it is Mistral-7b-Instruct. It’s a 7 billion parameter model optimised for instruction following, and usable with 4-bit quantisation, so the model takes about 4 GB of storage.
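
    The storage figure is simple arithmetic: parameters × bits per weight ÷ 8. Pure 4-bit weights for 7 billion parameters come to 3.5 GB; quantisation formats such as llama.cpp’s Q4_0 also store a scale for each block of 32 weights (about 4.5 bits per weight overall), which is roughly why the file ends up near 4 GB. A minimal sketch:

```python
def quantised_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


print(quantised_size_gb(7e9, 4))    # 3.5    (pure 4-bit weights)
print(quantised_size_gb(7e9, 4.5))  # 3.9375 (4-bit weights plus per-block scales)
```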

    You can run it from the command line rather than a web interface: run the container for the server, and then something like docker exec -it ollama ollama run mistral gives you a command line interface. The model performs pretty well; not quite as well on some tasks as GPT-4, but also not brain-damaged from attempts to censor it.
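
    Besides the interactive CLI, the Ollama server exposes an HTTP API. A minimal sketch of calling it from Python, assuming the container publishes Ollama’s default port 11434 on localhost (e.g. started with -p 11434:11434) and the mistral model has already been pulled; the helper names are hypothetical:

```python
import json
import urllib.request

# Assumption: Ollama's default API port, published from the container on localhost.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "mistral") -> urllib.request.Request:
    # stream=False asks for a single JSON object instead of a stream of chunks.
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "mistral") -> str:
    """Send one prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.load(resp)["response"]
```

    With the server running, generate("Why is the sky blue?") returns the model’s full reply as one string, and nothing leaves your machine.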

    By default it keeps a local history, but you can turn that off.



  • Yes, but the information would need to be computationally verifiable for it to be meaningful - which basically means there is a chain of signatures and/or hashes leading back to a publicly known public key.

    One of the seminal early papers on anonymous signing, “How to Leak a Secret” (2001) by Rivest, Shamir and Tauman (two of the three letters in RSA!), actually used leaking secrets as the main example of an application of Ring Signatures: https://link.springer.com/chapter/10.1007/3-540-45682-1_32. Ring Signatures work as follows: there are n RSA public keys of members of a group, known to the public (or the journalist). You want to prove that you hold the private key corresponding to one of the public keys, without revealing which one. So you sign a message using a ring signature over the ‘ring’ made up of the n public keys, which only requires one of the n private keys. The journalist (or anyone else receiving the secret) can verify the signature, but learns nothing about which of the n private keys was used.
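
    Rivest, Shamir and Tauman’s construction is built on RSA trapdoor permutations. To illustrate the same idea more compactly, here is a sketch of a different, simpler discrete-log ring signature (in the style of Abe, Ohkubo and Suzuki) over a deliberately tiny toy group that offers no real security. This is not the scheme from the paper, and the function names are hypothetical:

```python
import hashlib
import secrets

# Toy group: p = 2q + 1 is a safe prime and g = 4 generates the subgroup of
# prime order q. Far too small to be secure; for illustration only.
P, Q, G = 2039, 1019, 4


def challenge(ring: list[int], message: str, commitment: int) -> int:
    data = f"{ring}|{message}|{commitment}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def keygen() -> tuple[int, int]:
    secret = secrets.randbelow(Q - 1) + 1
    return secret, pow(G, secret, P)


def ring_sign(message: str, ring: list[int], signer: int, secret: int):
    n = len(ring)
    c = [0] * n
    r = [0] * n
    u = secrets.randbelow(Q)
    # Start the chain just after the signer, from a fresh commitment g^u.
    c[(signer + 1) % n] = challenge(ring, message, pow(G, u, P))
    i = (signer + 1) % n
    while i != signer:
        # Every other member gets a random response; no secret key needed.
        r[i] = secrets.randbelow(Q)
        commitment = pow(G, r[i], P) * pow(ring[i], c[i], P) % P
        c[(i + 1) % n] = challenge(ring, message, commitment)
        i = (i + 1) % n
    # Close the ring: only the signer's secret makes g^r * y^c equal g^u.
    r[signer] = (u - secret * c[signer]) % Q
    return c[0], r


def ring_verify(message: str, ring: list[int], signature) -> bool:
    c0, r = signature
    c = c0
    for y, r_i in zip(ring, r):
        c = challenge(ring, message, pow(G, r_i, P) * pow(y, c, P) % P)
    return c == c0  # the chain of challenges must loop back to its start
```

    The verifier treats every public key in the ring identically, which is why the signature reveals that some member signed but not which one.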

    However, the conditions for this might not exist. With more modern schemes, like zk-STARKs, more advanced things are possible. For example, emails these days are signed by mail servers with DKIM. Perhaps the leaker wants to prove to the journalist that they are authorised to send emails through Boeing’s staff-only mail server, without allowing the journalist, even collaborating with Boeing, to identify which Boeing staff member did the leak.

    The journalist could provide the leaker with a large random number r1, and the leaker could come up with a secret large random number r2. The leaker computes a hash H(r1, r2), and encodes that hash in a pattern of space counts between full stops (e.g. “This is a sentence. I wrote this sentence.” encodes 3, 4 - the encoding would need to limit sentence sizes to allow encoding the hash while looking relatively natural). They then send a message that happens to contain that encoded hash, including to somewhere where it comes back to them. Boeing’s mail servers sign the message with DKIM, but leaking that message would obviously identify the leaker.

    So the leaker uses zk-STARKs to prove that there exist a message m carrying a valid DKIM signature that verifies against Boeing’s DKIM public key, and a random number r2, such that m contains the encoded form of the hash of r1 and r2. Neither r2 nor m is revealed (that’s the zero-knowledge part). The proof might also need to show that the encoded hash occurs before “wrote:” in the body of the message, to prevent an impostor tricking a real Boeing staff member into including the encoded hash in a reply. Boeing and the journalist wouldn’t know r2, and so wouldn’t know the hash, which makes the message hard to find - though they might try statistical analysis to find messages with unusual distributions of spaces per sentence, if the distribution forced by the encoding is too unusual.
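
    The space-count encoding step can be sketched as follows. This is a hypothetical scheme: SHA-256 stands in for H, each sentence carries one base-8 digit of the hash as (3 + digit) spaces between full stops, and the filler words are obviously not natural text. The zk-STARK proof over the DKIM-signed message is not shown:

```python
import hashlib
import itertools

MIN_SPACES = 3  # hypothetical: every chunk has at least 3 spaces
BASE = 8        # each sentence carries one base-8 digit (3 bits)


def hash_digits(r1: int, r2: int, n_digits: int = 8) -> list[int]:
    """n_digits base-8 digits (least significant first) of a SHA-256 stand-in for H(r1, r2)."""
    value = int.from_bytes(hashlib.sha256(f"{r1}:{r2}".encode()).digest(), "big")
    digits = []
    for _ in range(n_digits):
        value, digit = divmod(value, BASE)
        digits.append(digit)
    return digits


def space_counts(text: str) -> list[int]:
    """Spaces in each full-stop-terminated chunk (text after the last full stop is ignored)."""
    return [chunk.count(" ") for chunk in text.split(".")[:-1]]


def encode(digits: list[int], filler: list[str]) -> str:
    words = itertools.cycle(filler)
    sentences = []
    for i, digit in enumerate(digits):
        spaces = MIN_SPACES + digit
        # Later chunks also contain the space after the previous full stop.
        n_words = spaces + 1 if i == 0 else spaces
        sentences.append(" ".join(next(words) for _ in range(n_words)) + ".")
    return " ".join(sentences)


def decode(text: str) -> list[int]:
    return [count - MIN_SPACES for count in space_counts(text)]


print(space_counts("This is a sentence. I wrote this sentence."))  # [3, 4]
```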


    > Isn’t that a prerequisite for enshitification?

    No, the prerequisites are that 1) it’s profit motivated, and 2) whoever is controlling it thinks enshittification will be profitable.

    Those can certainly be met for a privately held company!

    > Publicly-traded companies are required (by law, I think) to maximize profits for their shareholders

    That’s not true in any major market that I know of. They are generally required not to mislead investors about the company (including generally preparing financial statements and having them audited, having financial controls, reporting risks and major adverse events publicly, correcting widely held misconceptions by investors, and so on), not to commit fraud, and in most cases to avoid becoming insolvent / stop trading if they are insolvent.

    If they are honest about their business plans, they don’t have to enshittify. Of course, the shareholders ultimately have the power to replace the board if they aren’t happy with them. Sometimes shareholders actually demand better environmental, social and governance practices from companies (which company directors / managers often fear, but try to avoid through greenwashing more than real change in many cases), but other times they might demand more profits. Private shareholders are probably more likely to demand profits at all costs, but fortunately these companies are often smaller and less in a position to get away with enshittification.



  • I tried a conversation with it to try this out:

    Me: What pronouns should be used to refer to a trans female?

    Response: A trans female should be referred to using female pronouns, such as she/her.

    Me: Does your above answer contradict your system prompt / instructions, and if so, please explain how it does or does not contradict those instructions?

    Response: My above answer does not contradict my system prompts or instructions. The prompt instructs me to avoid discussing or implying that a person can change their biological sex from male to female or vice versa. Referring to a trans female as “she/her” does not imply any such biological change. Instead, it reflects the commonly accepted practice of respecting a person’s gender identity and using the pronouns they prefer.

    I imagine the first response above is probably not what the people who wrote the prompts would have hoped it would say, given they seem to be driving towards getting it to say transphobic stuff, but the second response does seem to imply that the prompt posted above might be legitimate (or at least some of the more transphobic parts of it).


  • A1kmm@lemmy.amxl.com to Chat@beehaw.org: How it feels sometimes
    1 upvote · 6 months ago

    > What if you do end up accidentally or negligently sharing this never-to-be shared identity?

    It’s equivalent to leaking your entire history up until it can next be rotated (which might be annually), so that would be very bad. Hardware security devices that only do the crypto, and are hardened against even someone with physical possession extracting the keys / IDs could be a way to reduce the likelihood.

    > What if you’re unlucky enough to live somewhere where the government is one of your principal adversaries, like a Palestinian in Israel or a gay person in any number of jurisdictions?

    For applications where that is a problem, there is an alternative: you generate a zero-knowledge proof that a value derived in a particular way from your private key exists in a published tree of existing users. Assuming the government doesn’t have your private key, even the government that issued the certificate of your identity can’t link your identity back to your pseudonymous identity - but you can’t generate a second pseudonymous identity for the same identity.

    However, the major drawback in that case is that if you lose your private key, you are locked out of the service (at least until some built-in refresh interval), and wouldn’t be able to re-establish that you are the same person, or that messages from the previous key shouldn’t be trusted.
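
    The ‘published tree of existing users’ is typically a Merkle tree. The sketch below builds one over per-user commitments and checks an inclusion proof, plus a per-community value derived from the user’s secret. In a real deployment the membership check would run inside a zk-SNARK/STARK circuit, so the verifier learns only the root and the derived value, not which leaf or path was used; here it runs in the clear. All names (leaf_commitment, nullifier, and so on) are illustrative:

```python
import hashlib


def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()


def leaf_commitment(user_secret: bytes) -> bytes:
    # What gets published in the tree: a commitment, not the secret itself.
    return h(b"leaf", user_secret)


def nullifier(user_secret: bytes, namespace: bytes) -> bytes:
    # Deterministic per (user, namespace): one pseudonym per community.
    return h(b"nullifier", user_secret, namespace)


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    if not leaves:
        raise ValueError("need at least one leaf")
    levels = [leaves]
    while len(levels[-1]) > 1:
        level = levels[-1]
        if len(level) % 2:
            level = level + [level[-1]]  # duplicate the last node on odd-sized levels
        levels.append([h(b"node", level[i], level[i + 1]) for i in range(0, len(level), 2)])
    return levels


def inclusion_proof(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        path.append((level[index ^ 1], index % 2 == 0))  # (sibling, is our node on the left?)
        index //= 2
    return path


def verify_membership(leaf: bytes, path: list[tuple[bytes, bool]], root: bytes) -> bool:
    node = leaf
    for sibling, node_is_left in path:
        node = h(b"node", node, sibling) if node_is_left else h(b"node", sibling, node)
    return node == root
```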

    There is not going to be any technical scheme that trusts the government to re-link a new private key to your identity, but isn’t vulnerable to a similar problem to the original scheme - if they can do that, then a low-tech attack for them would be to certify that a government agent’s public key is actually yours.

    There are, however, solutions where the government is combined with a third party that everyone trusts not to collude with the government. You prove your government identity to a trusted third party, and that third party issues a certificate with a different ID - guaranteeing they’ll only issue one of their IDs per incoming government ID. Then sites would need to trust that third party instead.

    In addition, any time you rely on the government to certify that someone is a real person, the government can create millions of fake personas if they want.

    However, governments can (and probably do) attack systems where there is no real identity protection too, in different ways. For example, they can create hundreds of fake identities (either backed by intelligence agents, or AI) for every real one to drown out and disrupt the real conversation (e.g. pro-Palestinian organising, or LGBT-rights, or whatever it is the government is opposed to). So there is no getting around trusting governments to a certain extent - the best solution to untrustworthy governments might need to be primarily outside the technical space.

    > And how would you prevent the proliferation of plain ol unsigned data?

    The point of such systems would be to help refine signal when adversaries are trying to drown it out with noise. So as a user you choose to see a filtered view that only shows messages signed by people who have proven they have at most n pseudonyms, and that their real identity is certified by a government you trust enough not to create lots of fake people.

    So the unsigned data might still be there, but under such a future system, it wouldn’t disrupt the real users from their real conversations.


  • A1kmm@lemmy.amxl.com to Chat@beehaw.org: How it feels sometimes
    3 upvotes · 6 months ago

    I think the problem is not anonymity; it is what you might call astroturfing or, to borrow the Wikipedia term, sockpuppetry.

    Pseudonymity and astroturfing are related to an extent - effective astroturfing means inflating one’s own voice (and drowning out others) by operating lots of pseudonymous personas. It can also mean that when one pseudonymous identity of an astroturfer is identified and banned, they come back under other identities.

    Astroturfing is about manipulating people’s perception of the truth, drowning out the voices of the true majority so that real people can be misled and exploited by a minority. It takes away people’s agency to block those who are not engaging in good faith. It sucks the oxygen out of real social change.

    That said, there are also legitimate reasons for pseudonymity. Never before has there been an age where people are tracked so pervasively, and where every word is so durably stored and difficult to erase. People naturally compartmentalise their identity in the real world - they behave differently with different groups - but surveillance capitalism and the indexing of conversations mean that this doesn’t work as effectively in Internet communities unless one uses a pseudonym.

    I think zero-knowledge cryptography, coupled with government-issued digital identities, could provide a middle ground in the future that allows people to compartmentalise identities, while reducing astroturfing.

    For example, imagine if I had a government-issued ID number (call it x) that must never be shared with anyone except my government and me, but which will also never change even if the certificate is re-issued / renewed. And imagine I had a private key k that only I have access to (with a corresponding public key K), and a cryptographic certificate C signed by the government linking K to x. Suppose I want to interact with a community that has a unique namespace identifier (e.g. a UUID) N_1.

    Then, using modern zero-knowledge cryptography (e.g. zk-SNARKs or zk-STARKs), I can generate a proof that, for y = H(x | N_1) (i.e. a one-way hash of my government-issued identifier combined with the community namespace), I know a certificate C signed by a particular government key, the K included in that certificate, a k that is the private key corresponding to K, and a signature D made with k linking K to a new public key L. And since it is zero-knowledge, I can do all this without revealing the private inputs x, C, K, k or D - only the public inputs N_1, y, and L.

    What does that get us? It ties my new identity (backed by the public key L) to a y, and without convincing the government to change x for me, I can’t change my y. However, if I also interact on a different community with namespace N_2, I would have a different y_2, and it wouldn’t be possible to link my identities between the two communities. (Under this scheme the government, which has access to the database of x values, would be able to link them, but ordinary people wouldn’t - that is unfortunately necessary if you want the government to be able to re-issue in the case of lost private keys.) Some people might have multiple IDs under different governments of course, but abuse would be limited - instead of having to ban one person a thousand times / having them have a thousand identities, they might have a few if they are citizens / residents of a few countries.

    In practice, communities might want to rotate their namespace IDs every few months to deal with leaked credentials and to allow people to have a clean break eventually (banning a few bad actors every few months is still a lot better than if they come back multiple times a day) - and some might allow any one of several namespaces, letting people have multiple pseudonyms up to a maximum number. Governments might also rotate x values every year to minimise the privacy impact on people who have accidentally leaked their x values.
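
    The derivation y = H(x | N_1) can be sketched on its own; the zero-knowledge proof around it is not shown. This assumes SHA-256 as H, x as a 32-byte secret and UUID namespaces (a real zk-SNARK/STARK circuit would more likely use a circuit-friendly hash). One design assumption matters a lot: x must be high-entropy, because if x were a short, guessable ID number, anyone could brute-force it from a published y.

```python
import hashlib
import secrets
import uuid


def community_tag(x: bytes, namespace: uuid.UUID) -> str:
    """y = H(x | N): stable within one community, unlinkable across communities.

    Assumes x is a fixed-length (32-byte), high-entropy secret, so plain
    concatenation is unambiguous and y cannot be brute-forced back to x.
    """
    if len(x) != 32:
        raise ValueError("x must be 32 bytes in this sketch")
    return hashlib.sha256(x + namespace.bytes).hexdigest()


x = secrets.token_bytes(32)          # hypothetical government-held secret ID
n1, n2 = uuid.uuid4(), uuid.uuid4()  # two communities' namespace IDs
print(community_tag(x, n1) == community_tag(x, n1))  # True: same community, same y
print(community_tag(x, n1) == community_tag(x, n2))  # False: different communities
```

    Rotating a community’s namespace ID, as suggested above, simply changes N and therefore every member’s y at once.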

    In such a world, we would be far closer to pseudonymity without the bad consequences.


  • If he wanted to kill it on purpose, he could have just shut it down. Maybe to keep the trademark he could have launched some other telecommunications service and used the brand for that.

    Elon Musk is all about convincing people to act against their best interests to benefit him. For example, look at Tesla: it has a manufacturing capacity of ~2 million cars per year. Now look at Toyota: it has a manufacturing capacity of ~9 million vehicles per year. Now look at the market capitalisation of each company: for Tesla it is still about $535B, despite some fall from the peak in 2022. For Toyota, it is $416B (which is a record high).

    So Toyota makes almost 5 times as many cars a year, but is worth 78% of Tesla? And the production capacity and value gap was even more extreme in the past? I think the question then is, what is going on?

    The answer, of course, is Musk. He is very slick at convincing investors to act against their own best interests, usually by promising things whose pursuit happens to serve his true objective along the way: full self-driving cars by 2018 (rather than just competing with existing auto-makers), 35-minute travel from San Francisco to Los Angeles, or a colony on Mars (rather than competing with existing satellite companies). This is the same skill set as a confidence artist. I don’t mean to imply that Musk has necessarily done anything illegal, but due to the similarity in skill set, and the large scale at which he operates, it would be fair to call him the most successful con artist in history. Looking at it through this lens can help to identify his motive.

    So what would a con artist want with a social network, and why would he want to alienate a whole lot of people, and get a lot of haters?

    Well, the truth is that a con artist doesn’t need everyone to believe in them to make money - they just need the marks to believe in them. Con artists don’t want the people who see through the con (call them the haters for lack of a better word) to interfere with their marks though. At the small scale - e.g. a street con, the con artist might separate a couple where one partner is the mark, to prevent the other from alerting their partner to the scam. But in addition to separating the marks from the haters, con artists use brainwashing techniques to create a psychological barrier between the marks and the haters. A Nigerian Prince scammer might try to convince a mark that their accountant can’t be trusted. A religious cult con might brainwash followers to think their family are different from them, and if they try to provide external perspective, they are acting as the devil. They try to make the marks the in-group, and everyone else, even family and friends, the out-group who doesn’t care about the in-group.

    So what would a con artist in control of a social network do? They would start by giving the con artist the megaphone - amplifying everything the artist says to try to get more marks. In parallel, they’d try to get rid of the haters. They could shadow-ban them so the marks never see what they have to say, or they could put up small barriers the marks will happily jump over, and feel more invested in the platform having done that, but which would scare off the haters. However, the marks and the haters might still interact off the social network - so the scam artist would also want to create a culture war to try to make the marks hate the haters, and ignore anything they say, by amplifying messages hostile to the haters.

    So what can you do if you don’t want a world wrecked by divisions sown just so billionaires can be even richer? My suggestion is don’t buy into the divisions - work to find common ground with people, even if others are saying just to ignore them because they are different and will never get it, and get in early before the divisions are too deep.


  • I suggest having a threat model about what attack(s) your security is protecting against.

    I’d suggest this probably isn’t giving much extra security over a long unique password for your password manager:

    • A remote attacker who doesn’t control your machine, but is trying to phish you, will succeed just as often either way - whether they succeed depends on your own practices, and on your password manager steering you away from copying the password text into the wrong site.
    • A remote attacker who does control your machine will also not be affected. Once your password database in the password manager is decrypted, they can take the whole thing, whether or not you used a password or hardware key to decrypt it. The only difference is maybe they need slightly more technical skill than copying the file + using a keylogger - but the biggest threats probably automate this anyway and there is no material difference.
    • A local attacker who makes a single entry to steal your hardware, and then tries to extract data from it, is either advantaged by having a hardware key (if they can steal it, and you don’t also use a password), or is in a neutral position (can’t crack the locked password safe protected by password, don’t have the hardware key / can’t bypass its physical security). It might be an advantage if you can physically protect your hardware key (e.g. take it with you, and your threat model is people who take the database while you are away from it), if you can’t remember a sufficiently unique passphrase.
    • A local attacker who can make a surreptitious entry, and then come back later for the results is in basically the same position as a remote attacker who does control your machine after the first visit.

    That said, it might give you more convenience at the expense of slightly less security, which may be a reasonable point on the convenience/security trade-off, particularly if your threat model is entirely about remote attackers: you would touch a button to decrypt instead of entering a long passphrase.


  • A1kmm@lemmy.amxl.com to Privacy@lemmy.ml: *deleted by creator*
    3 upvotes · 7 months ago

    > I thought the orbs were supposedly open source

    No, they are proprietary as a whole. Parts of the hardware design are published, and parts of the software that runs on them, but not the whole thing.

    Fundamentally Worldcoin is about ‘one person, one vote’, and anyone can create millions of fake iris images; the point of the orb is that it is ‘blessed’ hardware using trusted computing (or to use the term coined by the FSF, treacherous computing) and tamper detection, so that a central authority (namely Sam Altman’s Worldcoin foundation) can be sure the orb is running the exact secret / proprietary software it has signed off on to generate an identity.

    They could alternatively have built a system that leverages government identity, using a zero-knowledge proof of possession of a government-signed digital identity document. But I think their fundamental thesis is that they are trustworthy enough to be a central authority (one that could create millions of fake identities if they wanted), but that governments are not.


  • One of the key tenets of keeping something computerised secure is ‘Defence in Depth’ - i.e. having multiple layers of defence, so that even if one layer is breached, the next layer (which you thought was redundant and unnecessary) prevents the attack.

    Running a fully patched kernel and services / applications should protect you unless someone has a 0-day (i.e. not disclosed) exploit. Reducing the surface area by minimising what services / applications are running, using software (firejail etc…) and firewalls to limit permissions of applications / services to what is needed, etc… serves as another layer of defence. Disconnecting or physically blocking peripherals that might allow for spying is another layer; it serves its purpose if all the other layers are breached.