• FauxLiving@lemmy.world · +9/-3 · edited · 55 minutes ago

    People cheering for this have no idea of the consequence of their copyright-maximalist position.

    If using images, text, etc. to train a model is copyright infringement, then there will be NO open models, because open-source model creators could not possibly obtain licensing for every piece of written or visual media in the Common Crawl dataset, which is what most of these models are trained on.

    As it stands now, corporations don’t have a monopoly on AI specifically because copyright doesn’t apply to AI training. Everyone has access to Common Crawl and the other large, public datasets made by crawling the public Internet, so anyone can train a model on their own without worrying about obtaining billions of different licenses from every single individual who has ever written a word or drawn a picture.
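    To make the "everyone has access" point concrete, here is a minimal sketch of querying the public Common Crawl CDX index for captures of a site. The index server at index.commoncrawl.org is real; the crawl ID below is an illustrative assumption and should be replaced with a current one listed at index.commoncrawl.org.

```python
# Sketch: building a query against the public Common Crawl CDX index.
# No license negotiation or API key is involved; the index and the WARC
# archives it points to are openly accessible.
from urllib.parse import urlencode

def cdx_query(crawl_id: str, url_pattern: str) -> str:
    """Build a CDX index query URL for one crawl of the Common Crawl corpus."""
    params = urlencode({"url": url_pattern, "output": "json"})
    return f"https://index.commoncrawl.org/{crawl_id}-index?{params}"

# Anyone can fetch this URL (e.g. with urllib.request) and then download the
# referenced WARC records from Common Crawl's public bucket.
print(cdx_query("CC-MAIN-2024-10", "example.com/*"))
```

    Fetching the resulting URL returns one JSON record per captured page, which is the raw material most open models are trained on.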

    If there is a ruling that training violates copyright, then the only entities that could possibly afford to train LLMs or diffusion models are companies that own a large amount of copyrighted material. Sure, one company will lose a lot of money and/or be destroyed, but the legal precedent would be set so that it is impossible for anyone who doesn’t have billions of dollars to train AI.

    People are shortsightedly seeing this as a victory for artists or some other nonsense. It’s not. This is a fight where large copyright holders (Disney and other large publishing companies) want to completely own the ability to train AI because they own most of the large stores of copyrighted material.

    If the copyright holders win this then the open source training material, like Common Crawl, would be completely unusable to train models in the US/the West because any person who has ever posted anything to the Internet in the last 25 years could simply sue for copyright infringement.

    • LustyArgonian@lemmy.world · +2/-1 · 44 minutes ago

      Copyright is a leftover mechanism from slavery and it will be interesting to see how it gets challenged here, given that the wealthy view AI as an extension of themselves and not as a normal employee. Genuinely think the copyright cases from AI will be huge.

    • JustARaccoon@lemmy.world · +4 · 1 hour ago

      In theory sure, but in practice who has the resources to do large scale model training on huge datasets other than large corporations?

      • FauxLiving@lemmy.world · +2/-1 · edited · 52 minutes ago

        Distributed computing projects, large non-profits, people in the near future with much more powerful and cheaper hardware, governments which are interested in providing public services to their citizens, etc.

        Look at other large technology projects. The Human Genome Project spent $3 billion to sequence the first genome but now you can have it done for around $500. This cost reduction is due to the massive, combined effort of tens of thousands of independent scientists working on the same problem. It isn’t something that would have happened if Purdue Pharma owned the sequencing process and required every scientist to purchase a license from them in order to do research.
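        The scale of that cost reduction can be sanity-checked with quick arithmetic, using the figures quoted above:

```python
# Rough cost-reduction factor for genome sequencing, using the figures
# quoted in the comment above (~$3 billion then, ~$500 now).
initial_cost = 3_000_000_000  # Human Genome Project, first genome
current_cost = 500            # approximate price of sequencing today
factor = initial_cost / current_cost
print(f"Sequencing is now ~{factor:,.0f}x cheaper")
```

        That is roughly a six-million-fold reduction, driven by open access to the underlying method.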

        LLM and diffusion models are trained on the works of everyone who’s ever been online. This work, generated by billions of human-hours, is stored in the Common Crawl datasets and is freely available to anyone who wants it. This data is both priceless and owned by everyone. We should not be cheering for a world where it is illegal to use this dataset that we all created and, instead, we are forced to license massive datasets from publishing companies.

        The amount of progress on these types of models would immediately stop, and there would be 3-4 corporations that could afford the licenses. They would have a de facto monopoly on LLMs and could enshittify them without fear of competition.

  • jsomae@lemmy.ml · +2/-1 · 1 hour ago

    Would really love to see IP law get taken down a notch out of this.

  • PushButton@lemmy.world · +43/-1 · 5 hours ago

    Let’s go baby! The law is the law, and it applies to everybody

    If the “genie doesn’t go back in the bottle”, make him pay for what he’s stealing.

    • Zetta@mander.xyz · +25/-1 · 5 hours ago

      The law absolutely does not apply to everybody, and you are well aware of that.

    • SugarCatDestroyer@lemmy.world · +5 · 5 hours ago

      I just remembered a movie where a real genie was released from its bottle and threw the world into chaos by freeing its own kind; if it weren’t for the power of the plot, I’m afraid people would have ended up enslaved or extinct.

      Although here you’d already have to file a lawsuit for theft of the soul, in the literal sense of the word.

        • SugarCatDestroyer@lemmy.world · +1 · 3 hours ago

          Damn, what were you watching those masterpieces on? What were you smoking back then? Although I don’t know what secret materials you’re talking about; maybe I watched the wrong thing… And which episode?

      • BussyCat@lemmy.world · +5/-2 · 4 hours ago

        It’s not, because they would only train on things they own, which is an absolutely tiny fraction of everything that everyone owns. It’s like complaining that a rich person gets to enjoy their lavish estate when the alternative is that they get to use every home in the world.

          • BussyCat@lemmy.world · +3 · 3 hours ago

            They have about $0.2T in assets; the world has around $660T, which, as I said before, makes theirs a tiny fraction. Obviously both figures include a lot of assets that aren’t useful for AI training, such as theme parks, but consider that a single movie worth millions or billions has the same benefit for AI training as another movie worth thousands. The amount of assets Disney owns is not nearly as relevant as you are making it out to be.

  • SugarCatDestroyer@lemmy.world · +22/-1 · edited · 5 hours ago

    Unfortunately, this will probably lead to nothing: in our world, only the poor seem to be punished for stealing. Corporations always get away with everything, so we sit on the couch and shout “YES!!!” at whatever they offer to console us.

    • Modern_medicine_isnt@lemmy.world · +3 · 3 hours ago

      This issue is not so cut and dried. The AI companies are stealing from other companies more than from individual people. Publishing companies are owned by some very rich people, and they want their cut.

      This case may have started out with authors, but it is mentioned that it could turn into publishing companies vs. AI companies.

  • Deflated0ne@lemmy.world · +26/-1 · 5 hours ago

    Good. Burn it down. Bankrupt them.

    If it’s so “critical to national security” then nationalize it.

    • A Wild Mimic appears!@lemmy.dbzer0.com · +6/-1 · 4 hours ago

      The “burn it down” variant would only lead to the scenario where the copyright holders become the AI companies, since they have the content to train on. AI will not go away; it might just change ownership to someone worse, though.

      Nationalizing sounds better; better still would be to put it under UNESCO stewardship.

  • A Wild Mimic appears!@lemmy.dbzer0.com · +10 · edited · 4 hours ago

    So, the US now has a choice: rescue AI and fix their fucked up copyright system, or rescue the fucked up copyright system and fuck up AI companies. I’m interested in the decision.

    I’d personally say that the copyright system needs to be fixed anyway, because it’s currently just a club for the RIAA and MPAA to wield against everyone (remember the lawsuits against individuals, with alleged damages in the millions, for downloading a few songs? Or the current attempts to fuck over the Internet Archive?). Should the copyright side win, we can say goodbye to things like the Internet Archive or open-source AI; the copyright holders will then become the AI companies, since they have the content.

  • Treczoks@lemmy.world · +24/-1 · 6 hours ago

    Well, theft has never been the best foundation for a business, has it?

    While I completely agree that copyright terms are completely overblown, they are valid law that other people suffer under, so it is 100% fair to make them suffer the same. Or worse, as they all broke the law for commercial gain.

  • Null User Object@lemmy.world · +159/-4 · 8 hours ago

    threatens to “financially ruin” the entire AI industry

    No. Just the LLM industry and the AI slop image and video generation industries. The legitimate uses of AI (drug discovery, solar panel improvements, self-driving vehicles, etc.) are completely immune from this lawsuit, because they’re not dependent on stealing other people’s work.

  • halcyoncmdr@lemmy.world · +103/-1 · 8 hours ago

    As Anthropic argued, it now “faces hundreds of billions of dollars in potential damages liability at trial in four months” based on a class certification rushed at “warp speed” that involves “up to seven million potential claimants, whose works span a century of publishing history,” each possibly triggering a $150,000 fine.

    So you knew what stealing the copyrighted works could result in, and your defense is that you stole too much? That’s not how that works.

    • zlatko@programming.dev · +26 · 6 hours ago

      Actually that usually is how it works. Unfortunately.

      “Too big to fail” was probably made up by the big ones.

    • Rivalarrival@lemmy.today · +15/-14 · 6 hours ago

      The purpose of copyright is to drive works into the public domain. Works are only supposed to remain exclusive to the artist for a very limited time, not a “century of publishing history”.

      The copyright industry should lose this battle. Copyright exclusivity should be shorter than patent exclusivity.

        • Rivalarrival@lemmy.today · +7/-17 · 6 hours ago

          Their winning of the case reinforces a harmful precedent.

          At the very least, the claims of those members of the class that are based on >20-year copyrights should be summarily rejected.

          • snooggums@lemmy.world · +11/-3 · 5 hours ago

            Copyright owners winning the case maintains the status quo.

            The AI companies winning the case means anything leaked on the internet or even just hosted by a company can be used by anyone, including private photos and communication.

  • westingham@sh.itjust.works · +31 · 8 hours ago

    I was reading the article and thinking “suck a dick, AI companies” but then it mentions the EFF and ALA filed against the class action. I have found those organizations to be generally reputable and on the right side of history, so now I’m wondering what the problem is.

      • Jason2357@lemmy.ca · +10 · 5 hours ago

        Take scraping. Companies like Clearview will tell you that scraping is legal under copyright law. They’ll tell you that training a model with scraped data is also not a copyright infringement. They’re right.

        I love Cory’s writing, but while he does a masterful job of defending scraping, and makes a good argument that in most cases, it’s laws other than Copyright that should be the battleground, he does, kinda, trip over the main point.

        That is: training models on creative works and then selling access to the derivative “creative” works those models output falls squarely within the domain of copyright, on either side of a grey line we usually call “fair use” that hasn’t really been tested in the courts.

        Let’s take two absurd extremes to make the point. Say I train an LLM directly on Marvel movies and then sell movies (or maybe movie scripts) that are almost identical to existing Marvel movies (maybe with a few key names and features altered). I don’t think anyone would argue that is not a derivative work, or that it falls under “fair use.” However, if I used literature to train my LLM to read, and used that to read street signs for my self-driving car, then yeah, that might be something you could argue is “fair use” to sell. It’s not producing copy-cat literature.

        I agree with Cory that scraping, per se, is absolutely fine, and even re-distributing the results in some ways that are in the public interest or fall under “fair use”, but it’s hard to justify the slop machines as not a copyright problem.

        In the end, yeah, fuck both sides anyway. Copyright was extended too far and used for far too much, and the AI companies are absolute thieves. I have no illusions that this type of court case will do anything more than shift wealth from one robber baron to another; it won’t help artists and authors.

        • FauxLiving@lemmy.world · +1 · 2 hours ago

          Say I train an LLM directly on Marvel movies, and then sell movies (or maybe movie scripts) that are almost identical to existing Marvel movies (maybe with a few key names and features altered). I don’t think anyone would argue that is not a derivative work, or that falls under “fair use.”

          I think you’re failing to differentiate between a work, which is protected by copyright, vs a tool which is not affected by copyright.

          Say I use Photoshop and Adobe Premiere to create a script and movie which are almost identical to existing Marvel movies. I don’t think anyone would argue that is not a derivative work, or that falls under “fair use”.

          The important part here is that the subject of this sentence is ‘a work which has been created which is substantially similar to an existing copyrighted work’. This situation is already covered by copyright law. If a person draws a Mickey Mouse and tries to sell it then Disney will sue them, not their pencil.

          Specific works are copyrighted and copyright laws create a civil liability for a person who creates works that are substantially similar to a copyrighted work.

          Copyright doesn’t allow publishers to go after Adobe because a person used Photoshop to make a fake Disney poster. This is why things like Bittorrent can legally exist despite being used primarily for copyright violation. Copyright laws apply to people and the works that they create.

          A generated Marvel movie is substantially similar to a copyrighted Marvel movie, and so copyright law applies to it. A diffusion model is not substantially similar to any copyrighted work by Disney, and so copyright law does not apply to the model itself.

          • Glog78@digitalcourage.social · +1/-1 · edited · 1 hour ago

            @FauxLiving @Jason2357

            I take a bold stand on the whole topic:

            I think AI is a big scam (pattern matching has nothing to do with intelligence!!!).

            And this scam might end like the dot-com bubble of the late 90s (https://en.wikipedia.org/wiki/Dot-com_bubble), including the huge economic impact, because too many people have invested in an “idea,” not in a proven technology.

            And, as with the dot-com bubble, once the AI bubble has been cleaned up, machine learning and vector databases will stay forever (and maybe some other parts of the tech).

            Neither needs copyright changes, because they will never try to be one solution for everything: a small model to transform text to speech, a small model to translate, a full-text search using a vector DB to index all local documents, a small tool to summarize text.

        • kibiz0r@midwest.social · +2 · 5 hours ago

          I agree, and I think your points line up with Doctorow’s other writing on the subject. It’s just hard to cover everything in one short essay.

    • peoplebeproblems@midwest.social · +7/-2 · 5 hours ago

      I disagree with the EFF and ALA on this one.

      These were entire bodies of writing consumed and reworked into poor-quality data without respecting their licenses.

      Honestly, I wouldn’t be surprised if copyright weren’t the only problem here, but intellectual property more broadly as well. In that case, the EFF probably has an interest in that instead. Regardless, I really think it needs to be brought through the courts.

      LLMs are harmful, full stop. Most other machine learning systems are trained on licensed data. In the case of software as a medical device, such as image-analysis AI, that data is protected by HIPAA, and special attention is already paid to how it is used.

    • pelya@lemmy.world · +6 · 7 hours ago

      AI coding tools are using the exact same backends as AI fiction writing tools, so it would hurt the fledgling vibe coder profession (which according to proper software developers should not be allowed to exist at all).

  • guyincognito@piefed.social · +28/-1 · edited · 8 hours ago

    I thought it was hilarious how there was a quote in the article that said

    immense harm not only to a single AI company, but to the entire fledgling AI industry and to America’s global technological competitiveness

    It will only do this because all these idiotic American companies fired their employees to replace them with AI. Hire them back and the edge won’t dull. But we all know they won’t; they’ll just cry and point fingers, wondering how they ever lost a technology race.

    Edited because it’s my first time using quotes and I don’t know how to use them properly haha

  • chaosCruiser@futurology.today · +57/-4 · 10 hours ago

    Oh no! Building a product with stolen data was a rotten idea after all. Well, at least the AI companies can use their fabulously genius PhD level LLMs to weasel their way out of all these lawsuits. Right?

    • Rooskie91@discuss.online · +40/-2 · 9 hours ago

      I propose that anyone defending themselves in court over AI stealing data must be represented exclusively by AI.

    • thesohoriots@lemmy.world · +6 · 6 hours ago

      PhD-level LLM = paying people with MAs $21/hr to write summaries of paragraphs for the model to improve on. Google Gemini outsourced work like this, so I assume everyone else did too.

  • 9point6@lemmy.world · +27/-4 · 9 hours ago

    Probably would have been cheaper to license everything you stole, eh, Anthropic?

  • HugeNerd@lemmy.ca · +3/-4 · 3 hours ago

    Too late. The systems we are building as a species will soon become sentient. We’ll have aliens right here, no UFOs required. Where the music comes from will no longer be relevant.

    • explodicle@sh.itjust.works · +2/-1 · 3 hours ago

      Ok perfect so since AGI is right around the corner and this is all irrelevant, then I’m sure the AI companies won’t mind paying up.

      • HugeNerd@lemmy.ca · +1/-2 · 2 hours ago

        That’s not the way it works. Do you think the Roman Empire just picked a particular Tuesday to collapse? It’s a process and will take a while.