We Asked A.I. to Create the Joker. It Generated a Copyrighted Image.::Artists and researchers are exposing copyrighted material hidden within A.I. tools, raising fresh legal questions.

  • dragontamer@lemmy.world
    English
    arrow-up
    94
    arrow-down
    25
    ·
    edit-2
    9 months ago

    Because this proves that the “AI”, at some level, is storing the data of the Joker movie screenshot somewhere inside of its training set.

    Likely because the “AI” was trained upon this image at some point. This has repercussions with regards to copyright law. It means the training set contains copyrighted data and the use of said training set could be argued as piracy.

    Legal discussions on how to talk about generative AI are only happening now, now that people can experiment with the technology. But it’s not like our laws have changed: copyright infringement is copyright infringement. If the training data is obviously copyright infringement, then the model must be retrained in a more appropriate manner.

    • abhibeckert@lemmy.world
      English
      arrow-up
      46
      arrow-down
      17
      ·
      edit-2
      9 months ago

      But where is the infringement?

      This NYT article includes the same several copyrighted images and they surely haven’t paid any license. It’s obviously fair use in both cases and NYT’s claim that “it might not be fair use” is just ridiculous.

      Worse, the NYT also includes exact copies of the images, while the AI ones are just very close to the original. That’s like the difference between uploading a video of yourself playing a Taylor Swift cover and actually uploading one of Taylor Swift’s own music videos to YouTube.

      Even worse the NYT intentionally distributed the copyrighted images, while Midjourney did so unintentionally and specifically states it’s a breach of their terms of service. Your account might be banned if you’re caught using these prompts.

      • jacksilver@lemmy.world
        English
        arrow-up
        34
        arrow-down
        6
        ·
        9 months ago

        You do realize that newspapers typically do pay licensing fees for images? It’s how things like Getty Images exist.

        On the flip side, OpenAI (and other companies) are charging people for access to their models, which then return copyrighted images without paying the original creators.

        That’s why situations like this keep getting talked about: you have a third party charging people for copyrighted materials. We can argue that it’s a tool, so you aren’t really “selling” copyrighted data, but that’s the issue generally being discussed in these kinds of articles and court cases.

        • ApollosArrow@lemmy.world
          English
          arrow-up
          3
          ·
          9 months ago

          Mostly playing devil’s advocate here (since I don’t think ai should be used commercially), but I’m actually curious about this, since I work in media… You can get away using images or footage for free if it falls under editorial or educational purposes. I know this can vary from place to place, but with a lot of online news sites now charging people to view their content, they could potentially be seen as making money off of copyrighted material, couldn’t they?

          • jacksilver@lemmy.world
            English
            arrow-up
            3
            ·
            9 months ago

            It’s not a topic that I’m super well versed in, but here is a thread from a photography forum indicating that news organizations can’t take advantage of fair use https://www.dpreview.com/forums/thread/4183940.

            I think these kinds of stringent rules are why so many are up in arms about how AI is being used. It’s effectively a way for big players to circumvent paying the people who put all the work into the art, music, voice acting, etc. The models would be nothing without the copyrighted material, yet no one seems to want to pay those people.

            It gets more interesting when you realize that long term we still need people creating lots of content if we want these models to be able to create things around concepts that don’t yet exist (new characters, genres of music, etc.)

      • dragontamer@lemmy.world
        English
        arrow-up
        6
        arrow-down
        2
        ·
        9 months ago

        But where is the infringement?

        Do Training weights have the data? Are the servers copying said data on a mass scale, in a way that the original copyrighters don’t want or can’t control?

        • orclev@lemmy.world
          English
          arrow-up
          14
          arrow-down
          6
          ·
          9 months ago

          Data is not copyrighted, only the image is. Furthermore you can not copyright a number, even though you could use a sufficiently large number to completely represent a specific image. There’s also the fact that copyright does not protect possession of works, only distribution of them. If I obtained a copyrighted work no matter the means chosen to do so, I’ve committed no crime so long as I don’t duplicate that work. This gets into a legal grey area around computers and the fundamental way they work, but it was already kind of fuzzy if you really think about it anyway. Does viewing a copyrighted image violate copyright? The visual data of that image has been copied into your brain. You have the memory of that image. If you have the talent you could even reproduce that copyrighted work so clearly a copy of it exists in your brain.
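          As a rough illustration of the point about numbers, here is a minimal sketch showing that the raw bytes of an image can be read as one large integer and turned back into the same bytes. The four-pixel "image" is made-up data, not a real file format, and the sketch says nothing about how the law treats such a number.

```python
# A tiny 2x2 grayscale "image": four pixel values, one byte each (hypothetical data).
pixels = bytes([0, 128, 255, 64])

# Interpret the raw bytes as a single big-endian integer.
as_number = int.from_bytes(pixels, "big")

# Turn the integer back into bytes. The length has to be supplied separately,
# because leading zero bytes are not recoverable from the number alone.
restored = as_number.to_bytes(len(pixels), "big")

print(as_number)
print(restored == pixels)
```

          A real image would simply produce a much larger integer; the round trip works the same way.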

          • dragontamer@lemmy.world
            English
            arrow-up
            5
            arrow-down
            5
            ·
            edit-2
            9 months ago

            only distribution of them.

            Yeah. And the hard drives and networks that pass Midjourney’s network weights around?

            That’s distribution. Did Midjourney obtain a license from the artists to allow large numbers of “Joker” copyrighted data to be copied on a ton of servers in their data-center so that Midjourney can run? They’re clearly letting the public use this data.

            • orclev@lemmy.world
              English
              arrow-up
              7
              arrow-down
              3
              ·
              9 months ago

              Because they’re not copying around images of Joker, they’re copying around a work derived from many many things including images of Joker. Copying a derived work does not violate the copyright of the work it was derived from. The wrinkle in this case is that you can extract something very similar to the original works back out of the derived work after the fact. It would be like if you could bake a cake, pass it around, and then down the line pull a whole egg back out of it. Maybe not the exact egg you started with, but one very similar to it. This is a situation completely unlike anything that’s come before it which is why it’s not actually covered by copyright. New laws will need to be drafted (or at a bare minimum legal judgements made) to decide how exactly this situation should be handled.

              • archomrade [he/him]@midwest.social
                English
                arrow-up
                5
                arrow-down
                2
                ·
                9 months ago

                Someone already downvoted you but this is exactly the topic of debate surrounding this issue.

                Other recognized fair-use exemptions have similar interpretations: a computer model analyzes a large corpus of copyrighted work for the purposes of being able to search their contents and retrieve relevant snippets and works based on semantic and abstract similarities. The computer model that is the representation of those works for that purpose is fair use: it contains only factual information about those works. It doesn’t matter if the works used for that model were unlicensed: the model is considered fair use.

                AI models operate by a very similar method, albeit one with a lot more complexity. But the model doesn’t contain copyrighted works, it is only itself a collection of factual information about the copyrighted works. The novel part of this case is that it can be used to re-construct expressions very similar to the original (it should be pointed out that the fidelity is often very low, and the more detailed the output the less like the original it becomes). It isn’t settled yet if that fact changes this interpretation, but regardless I think copyright is already not the right avenue to pursue, if the goal is to remediate or prevent harm to creators and encourage novel expressions.

                • orclev@lemmy.world
                  English
                  arrow-up
                  3
                  arrow-down
                  1
                  ·
                  edit-2
                  9 months ago

                  Right, you’re basically making the same points as me, although technically the model itself is a copyrighted work. Part of the problem we’re running into these days is that copyright, patent, trademark, and trade secret, all date from a time when the difference between those things was fairly intuitive. With our modern digital world with things like 3D printers and the ease with which you can rapidly change the formats and encodings of arbitrary pieces of data the lines all start to blur together.

                  If you have a 3D scan of a statue of Pikachu, what rights are involved there? What if you print it? What if you use that model to generate a PNG? What if you print that PNG? What if you encode the model file using base64 and embed it in the middle of a GIF of Rick Astley?
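                  To make the last question concrete, here is a minimal sketch of how easily one piece of data can be re-encoded and tucked inside another. Everything in it is hypothetical: the "model" is a placeholder byte string, the "GIF" is only a stand-in header rather than a valid image, the `--payload--` marker is an invented convention, and the payload is appended rather than placed in the middle.

```python
import base64

# Placeholder for the bytes of a 3D model file (hypothetical data).
model_bytes = b"hypothetical 3D model data"

# Stand-in for a carrier file: a GIF signature plus padding, not a valid GIF.
carrier = b"GIF89a" + b"\x00" * 16

# Invented marker so the payload can be located again later.
marker = b"--payload--"

# Re-encode the model as base64 text and attach it to the carrier.
encoded = base64.b64encode(model_bytes)
combined = carrier + marker + encoded

# Recover the original bytes from the combined blob.
_, _, tail = combined.partition(marker)
recovered = base64.b64decode(tail)

print(recovered == model_bytes)
```

                  The recovered bytes are identical to the original, which is the crux of the question: the format changed twice, the content did not.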

                  Corporations have already utterly fucked all our IP laws, it might be time to go back to the drawing board and reevaluate the whole thing, because what we have now often feels like it has more cracks than actual substance.

                  • archomrade [he/him]@midwest.social
                    English
                    arrow-up
                    4
                    ·
                    9 months ago

                    Yea, sorry if it wasn’t clear, but I was agreeing with you (defending against the downvote).

                    There are a lot of things at play here, even if there seems to be a clear way to interpret copyright law (that’s untested, but still) that would determine the models being a fair use. I think people are rightfully angry/frustrated with the size of these companies building the models, and the risk posed by private ownership over them. If I were inclined to be idealistic, I would say that the models should be in the public domain and the taxes should be used so as to provide a UBI to counter any job loss/efficiencies provided by the automation, but that’s a tall order.

              • dragontamer@lemmy.world
                English
                arrow-up
                1
                arrow-down
                1
                ·
                9 months ago

                derived

                https://www.law.cornell.edu/wex/derivative_work

                Copyrights allow their owners to decide how their works can be used, including creating new derivative works off of the original product. Derivative works can be created with the permission of the copyright owner or from works in the public domain. In order to receive copyright protection, a derivative work must add a sufficient amount of change to the original work.

                Are you just making shit up?

        • abhibeckert@lemmy.world
          English
          arrow-up
          3
          arrow-down
          2
          ·
          9 months ago

          Do Training weights have the data?

          The answer to that question is extensively documented by thousands of research papers - it’s not up for debate.

        • Auli@lemmy.ca
          English
          arrow-up
          1
          arrow-down
          1
          ·
          9 months ago

          Their response will be: we don’t know, we can’t understand what it’s doing.

          • dragontamer@lemmy.world
            English
            arrow-up
            3
            arrow-down
            7
            ·
            edit-2
            9 months ago

            Their response will be: we don’t know, we can’t understand what it’s doing.

            What the fuck is this kind of response? It’s just a fucking neural network running on GPUs with convolutional kernels. For fuck’s sake, turn on your damn brain.

            Generative AI is actually one of the easier subjects to comprehend here. It’s just calculus: derivatives are used to backpropagate the error and adjust the weights in a way that minimizes it. Lather, rinse, repeat for a billion iterations on a mass of GPUs (i.e., 20 TFLOP compute systems) for several weeks.

            Come on, this stuff is well understood by Comp. Sci by now. Not only 20 years ago when I learned about this stuff, but today now that AI is all hype, more and more people are understanding the basics.
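            The "derivatives that minimize error" idea above can be sketched in a few lines. This is a toy illustration only: it fits a single weight to made-up data with a hand-derived gradient, whereas real generative models backpropagate through many layers and billions of weights. The data, learning rate, and step count are all assumptions chosen for the example.

```python
# Fit y = w * x to toy data by gradient descent on mean squared error.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # toy targets generated with w = 2

w = 0.0               # initial guess for the weight
learning_rate = 0.01  # assumed step size

for step in range(1000):
    # For L = mean((w*x - y)^2), the derivative dL/dw is mean(2 * (w*x - y) * x).
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    # Move the weight a small step against the gradient to reduce the error.
    w -= learning_rate * grad

print(round(w, 3))
```

            The loop is the "lather, rinse, repeat" part; scaling it up changes the size of the computation, not the basic idea.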

            • Mirodir@discuss.tchncs.de
              English
              arrow-up
              9
              arrow-down
              2
              ·
              9 months ago

              Understanding the math behind it doesn’t immediately mean understanding the decision process during forward propagation. Of course you can mathematically follow it, but you’re quickly gonna lose the overview with that many weights. There’s a reason XAI is an entire subfield in Machine Learning.

              • dragontamer@lemmy.world
                English
                arrow-up
                1
                ·
                9 months ago

                Understanding the math behind it doesn’t immediately mean understanding the decision process during forward propagation.

                Ummm… it’s lossy compressed data from the training set.

                Is it a perfect copy? No. But copyright law covers “derivative data” so whatever, the law remains clear on this situation.
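                For readers unfamiliar with the term, here is a minimal sketch of what "lossy compression" means in general: information is deliberately thrown away, so the restored data is close to the original but not identical. This uses simple quantization on made-up pixel values; it is not a description of how any AI model stores information, which is exactly what is disputed in this thread.

```python
# Made-up 8-bit pixel values (hypothetical data).
original = [12, 200, 37, 255, 90]

# Quantize: keep only which bucket of width `step` each value falls into.
step = 32
compressed = [v // step for v in original]

# Restore by mapping each bucket back to its midpoint.
restored = [c * step + step // 2 for c in compressed]

# Worst-case difference between original and restored values.
max_error = max(abs(a - b) for a, b in zip(original, restored))

print(restored)
print(max_error)
```

                Each value now needs 3 bits instead of 8, at the cost of an error of up to half a bucket.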

    • GenderNeutralBro@lemmy.sdf.org
      English
      arrow-up
      21
      arrow-down
      5
      ·
      9 months ago

      Because this proves that the “AI”, at some level, is storing the data of the Joker movie

      I don’t think that’s a justified conclusion.

      If I watched a movie, and you asked me to reproduce a simple scene from it, then I could do that if I remembered the character design, angle, framing, etc. None of this would require storing the image, only remembering the visual meaning of it and how to represent that with the tools at my disposal.

      If I reproduced it that closely (or even not-nearly-that-closely), then yes, my work would be considered a copyright violation. I would not be able to publish and profit off of it. But that’s on me, not on whoever made the tools I used. The violation is in the result, not the tools.

      The problem with these claims is that they are shifting the responsibility for copyright violation off of the people creating the art, and onto the people making the tools used to create the art. I could make the same image in Photoshop; are they going after Adobe, too? Of course not. You can make copyright-violating work in any medium, with any tools. Midjourney is a tool with enough flexibility to create almost any image you can imagine, just like Photoshop.

      Does it really matter if it takes a few minutes instead of hours?

      • rambaroo@lemmy.world
        English
        arrow-up
        4
        arrow-down
        4
        ·
        edit-2
        9 months ago

        AIs are not humans my dude. I don’t know why people keep using this argument. They specifically designed this thing to scrape copyrighted material, it’s not like an artist who was just inspired by something.

        • GenderNeutralBro@lemmy.sdf.org
          English
          arrow-up
          6
          arrow-down
          3
          ·
          9 months ago

          Photoshop is not human. AutoTune is not human. Cameras are not human. Microphones are not human. Paintbrushes are not human. Etc.

          AI did not create this. A HUMAN created this with AI. The human is responsible for creating it. The human is responsible for publishing it.

          Please stop anthropomorphizing AI!

        • archomrade [he/him]@midwest.social
          English
          arrow-up
          4
          arrow-down
          2
          ·
          9 months ago

          It isn’t human, but that IS how it works.

          It’s analyzing material and extracting data about it, not compiling the data itself. In much the same way that TDM (text and data mining) analyzes text and extracts information about it for the purposes of search and classification, sentiment analysis, etc., an “AI” model analyzes material and extracts information on how to construct new language or visual media that relates to text prompts.

          It’s important to understand this because it’s core to the fair use defence getting claimed. The models are derived from copyrighted works, but they aren’t themselves infringing. There is precedent for similar cases being fair use.

    • archomrade [he/him]@midwest.social
      English
      arrow-up
      15
      arrow-down
      4
      ·
      9 months ago

      I’ve had this discussion before, but that’s not how copyright exceptions work.

      Right or wrong (it hasn’t been litigated yet), AI models are being claimed as fair use exceptions to the use of copyrighted material. Similar to other fair uses, the argument goes something like:

      “The AI model is simply a digital representation of facts gleaned from the analysis of copyrighted works, and since factual data cannot be copyrighted (e.g. a description of the Mona Lisa vs the painting itself), the model itself is fair use”

      I think it’ll boil down to whether the models can be easily used as replacements to the works being claimed, and honestly I think that’ll fail. That the models are quite good at reconstructing common expressions of copyrighted work is novel to the case law, though, and worthy of investigation.

      But as someone who thinks ownership of expressions is bullshit anyway, I tend to think copyright is not the right way to go about penalizing or preventing the harm caused by the technology.

      • rottingleaf@lemmy.zip
        English
        arrow-up
        7
        ·
        9 months ago

        “The AI model is simply a digital representation of facts gleaned from the analysis of copyrighted works, and since factual data cannot be copyrighted (e.g. a description of the Mona Lisa vs the painting itself), the model itself is fair use”

        So selling fan fiction and fan-made game continuations and modifications should be legal?

        • archomrade [he/him]@midwest.social
          English
          arrow-up
          5
          arrow-down
          2
          ·
          9 months ago

          It should, but also that is significantly different from what an AI model is.

          It would be more like a list of facts and information about the structure of another work, plus facts and patterns about lots of other similar works. That list of facts can easily be used to create other, very similar works, but it can also be used to create entirely new works that follow patterns from the other works.

          Because the model can be used to create infringing works, but is not one itself, this is similar to other cases where a platform or tool can be used in infringing ways. In such cases, if the platform or tool takes responsibility for reasonable protections against such uses, then it isn’t held liable itself. Think YouTube DMCA, Facebook content moderation, or even Google Books search. I think this is likely the way this goes; there is just too strong a case (with precedent) that the model is fair use.

        • Womble@lemmy.world
          English
          arrow-up
          6
          arrow-down
          3
          ·
          9 months ago

          Not the OP, but yes it absolutely should. The idea that you can legally block someone’s creative expression because they are using elements of culture you have obtained a monopoly on is obscene.

          • rottingleaf@lemmy.zip
            English
            arrow-up
            3
            arrow-down
            1
            ·
            9 months ago

            I know it should. Only then we’d have no IP remaining. As it should be, the only case where it’s valid is punishing somebody impersonating the author or falsely claiming authorship, and that’s frankly just fraud.

      • kromem@lemmy.world
        English
        arrow-up
        7
        arrow-down
        5
        ·
        9 months ago

        Copyright law is the right tool, but the companies are chasing the wrong side of the equation.

        Training should not and I suspect will not be found to be infringement. If old news articles from the NYT can teach a model language in ways that help it review medical literature to come up with novel approaches to cure cancer, there’s a whole host of features from public good to transformational use going on.

        What they should be throwing resources at is policing usage not training. Make the case that OpenAI is liable for infringing generation. Ensure that there needs to be copyright checking on outputs. In many ways this feels like a repeat of IP criticisms around the time Google acquired YouTube which were solved with an IP tagging system.

        • azuth@sh.itjust.works
          English
          arrow-up
          6
          arrow-down
          2
          ·
          9 months ago

          Should Photoshop check your image for copyright infringement? Should Adobe be liable for copyright-infringing or offensive images that users of its program create?

          • kromem@lemmy.world
            English
            arrow-up
            4
            arrow-down
            2
            ·
            9 months ago

            If it’s contributing creatively to your work, yeah, totally.

            If you ask Photoshop fill to add an Italian plumber and you’ve been living under a rock your whole life so you don’t realize it’s Mario, when you get sued by Nintendo for copyright infringement it’d be much better policy if it was Adobe on the hook for adding copyrighted material and not the end user.

            A better analogy is: if you hired a graphic designer and they gave you copyrighted material, who is liable?

            • azuth@sh.itjust.works
              English
              arrow-up
              1
              arrow-down
              1
              ·
              9 months ago

              If it’s contributing creatively to your work, yeah, totally.

              AI is not contributing creatively though, programs do not create.

              If you ask Photoshop fill to add an Italian plumber and you’ve been living under a rock your whole life so you don’t realize it’s Mario, when you get sued by Nintendo for copyright infringement it’d be much better policy if it was Adobe on the hook for adding copyrighted material and not the end user.

              I am speaking of Photoshop used as a non-AI tool as it has been used to commit copyright infringement for decades before Photoshop fill was a thing. Should it check if your image infringes on copyright?

              A better analogy is: if you hired a graphic designer and they gave you copyrighted material, who is liable?

              The graphic designer. If you went ahead and redistributed it, you would also be liable. Whatever program they used, or its developer, wouldn’t be liable.

              • kromem@lemmy.world
                English
                arrow-up
                3
                arrow-down
                1
                ·
                9 months ago

                AI is not contributing creatively though, programs do not create.

                You and I will have to agree to disagree on that Kool-aid, and it’s that disagreement which is core to the model provider being liable for introducing copyright infringement.

          • wildginger@lemmy.myserv.one
            English
            arrow-up
            2
            arrow-down
            1
            ·
            9 months ago

            Did photoshop create a portion of my image? Did adobe add a “generate the picture I asked for, for me, without my input beyond a typed prompt” as a feature?

            Because if they did, 100% yeah, theyre liable.

            • azuth@sh.itjust.works
              English
              arrow-up
              1
              arrow-down
              2
              ·
              9 months ago

              They actually are not, whether you use a prompt to generate the picture or digitally paint it with a tablet.

              The user would be the one committing copyright infringement.

        • ryathal@sh.itjust.works
          English
          arrow-up
          4
          arrow-down
          2
          ·
          9 months ago

          There’s no money for them in that angle though. It’s much easier to sue xerox for enabling copyright violations than the person who used the machine to violate copyright.

          Courts have already handled this with copy machines. AI isn’t terribly different, so it’s unlikely these suits against model creators will succeed.

          • kromem@lemmy.world
            English
            arrow-up
            1
            arrow-down
            1
            ·
            edit-2
            9 months ago

            There’s money (and more importantly, survival) if they can ensure liability of Xerox for infringement on the use of their centralized copiers.

            There actually isn’t survival as a company even if they succeed on training but not the other, which I don’t think they realize yet.

            As an aside, one of the worst legal takes I read on this was from a GC at the Copyright office during the 70s who extensively used poor analogies to copiers to justify an infringement argument.

    • orclev@lemmy.world
      English
      arrow-up
      7
      arrow-down
      1
      ·
      9 months ago

      Wasn’t that known? Has Midjourney ever claimed they didn’t use copyrighted works? There’s also an ongoing argument about the legality of that in general. One recent court case ruled that copyright does not protect a work from being used to train an AI. I’m sure that’s far from the final word on the topic, but it does mean this is a legal grey area at the moment.

      • dragontamer@lemmy.world
        English
        arrow-up
        4
        arrow-down
        10
        ·
        edit-2
        9 months ago

        If it is known, then it is copyright infringement to download the training sets and therefore a crime to do so. You cannot reproduce a copy of the works without the express permission of the copyright holder.

        How many computers did Midjourney copy its training weights to? Has Midjourney (and the IT team behind it) paid royalties for every copyrighted image in its training set to have a proper copyright license to copy all of this data from computer to computer?

        I’m guessing no. Which means the Midjourney team (if what you say is true) is committing copyright infringement every time they spin up a new server with these weights.


        Pro-AI side will obviously argue that the training weights do not contain the data of these copyrighted works. A claim that is looking more-and-more laughable as these experiments happen.

        • db0@lemmy.dbzer0.com
          English
          arrow-up
          9
          arrow-down
          2
          ·
          9 months ago

          No, it’s not illegal to download publicly available content; it’s a copyright violation to republish it.

    • Jilanico@lemmy.world
      English
      arrow-up
      11
      arrow-down
      5
      ·
      9 months ago

      Because this proves that the “AI”, at some level, is storing the data of the Joker movie screenshot somewhere inside of its training set.

      Is it tho? Honest question.

        • Jilanico@lemmy.world
          English
          arrow-up
          4
          arrow-down
          2
          ·
          9 months ago

          It’s too hard to type up how generative AIs work, but look up a video on “how stable diffusion works” or something like that. I seriously doubt they have a massive database with every image from the Internet inside it, with the AI just spitting those pics out, but I’m no expert.

        • Schmidtster@lemmynsfw.com
          English
          arrow-up
          2
          arrow-down
          1
          ·
          edit-2
          9 months ago

          I posted it on my website as fan art and it scraped it. I just used a different filter which falls under fair use.

      • ryannathans@aussie.zone
        English
        arrow-up
        3
        arrow-down
        3
        ·
        9 months ago

        Sure, but so is your memory, you could study the originals and re-draw them a similar way.

        • Jilanico@lemmy.world
          English
          arrow-up
          4
          arrow-down
          1
          ·
          9 months ago

          I agree, but I don’t think these generative AIs actually store image files off the Internet in a massive database. I could be wrong.

          • ryannathans@aussie.zone
            English
            arrow-up
            5
            ·
            edit-2
            9 months ago

            That’s correct. The structure of the information isn’t remotely similar to a file or database. Pixel-by-pixel information isn’t stored; the model more loosely remembers correlations, similarities, and facts about the content, as opposed to storing and copying it.

        • Jilanico@lemmy.world
          English
          arrow-up
          4
          arrow-down
          2
          ·
          9 months ago

          So stable diffusion, midjourney, etc., all have massive databases with every picture on the Internet stored in them? I know the AI models are trained on lots of images, but are the images actually stored? I’m skeptical, but I’m no expert.

          • QubaXR@lemmy.world
            9 months ago

            These models were trained on datasets that, without compensating the authors, used their work as training material. It’s not every picture on the net, but a lot of it comes from scraping websites, portfolios and social networks wholesale.

            A similar situation happens with large language models. Recently Meta admitted to using pirated books (the Books3 database, to be precise) to train their LLM without any plans to compensate the authors, or even so much as paying for a single copy of each book used.

            • Jilanico@lemmy.world
              9 months ago

              Most of the stuff that inspires me probably wasn’t paid for. I just randomly saw it online or on the street, much like an AI.

              AI using straight up pirated content does give me pause tho.

              • QubaXR@lemmy.world
                9 months ago

                I was on the same page as you for the longest time. I cringed at the whole “No AI” movement and artists’ protest. I used the very same idea: Generations of artists honed their skills by observing the masters, copying their techniques and only then developing their own unique style. Why should AI be any different? Surely AI will not just copy works wholesale and instead learn color, composition, texture and other aspects of various works to find its own identity.

                It was only when my very own prompts started producing results that I recognized as “homages” at best and “rip-offs” at worst that I was given pause.

                I suspect that earlier generations of text to image models had better moderation of training data. As the arms race heated up and pace of development picked up, companies running these services started rapidly incorporating whatever training data they could get their hands on, ethics, copyright or artists’ rights be damned.

                I remember when Midjourney introduced Niji (their anime model) and I could often identify the manga and characters used to train it. The imagery Niji produced kept certain distinct and unique elements of character designs from that training data - as a result a lot of characters exhibited “Chainsaw Man” pointy teeth and a stuck-out tongue - without so much as a mention of the source material or even the themes.

            • archomrade [he/him]@midwest.social
              9 months ago

              These models were trained on datasets that, without compensating the authors, used their work as training material.

              Couple things:

              • This doesn’t answer OP’s question about how the information is stored. In fact, OP is right that the images and source material are NOT stored in a database within the model; it basically just stores metadata about the source material as a whole in order to construct new material from text descriptions.

              • the use of copyrighted works in the training isn’t necessarily infringing if the model is found to be a fair use, and there is a very strong fair use argument here.

              • QubaXR@lemmy.world
                9 months ago

                “metadata” is such a pretty word. How about “recipe” instead? It stores all information necessary to reproduce work verbatim or grab any aspect of it.

                The legal issue of copyright is a tricky one, especially in the US, where copyright is often weaponized by corporations. The gist of it is: The training model itself was an academic endeavor and therefore falls under fair use. Companies like StabilityAI or OpenAI then used these datasets and monetized products built on them, which in my understanding sits in a legal gray zone.

                If these private for-profit companies simply took the same data and built their own, identical dataset, they would be liable to pay the authors for use of their work in a commercial product. They get around it by using the existing model, originally created for research and not commercial use.

                Lemmy is full of open source and FOSS enthusiasts, I’m sure someone can explain it better than I do.

                All in all I don’t argue about the legality of AI, but as a professional creative I highlight ethical (plagiarism) risks that are beginning to arise in majority of the models. We all know Joker, Marvel superheroes, popular Disney and WB cartoon characters - and can spot when “our” generations cross the line of copying someone else’s work. But how many of us are familiar with Polish album cover art, Brazilian posters, Chinese film superheroes or Turkish logos? How sure can we be that the work “we” produced using AI is truly original and not a perfect copy of someone else’s work? Does our ignorance excuse this second-hand plagiarism? Or should the companies releasing AI models stop adding features and fix that broken foundation first?

                • archomrade [he/him]@midwest.social
                  9 months ago

                  “metadata” is such a pretty word. How about “recipe” instead?

                  Well, isn’t “recipe” another one of those pretty words? “Metadata” is specific to other precedents that deal with computer programs that gather data about works (see Authors Guild, Inc. v. HathiTrust and Authors Guild v. Google), but you’re welcome to challenge the verbiage if you don’t like it. Regardless, what we’re discussing is objectively something that describes copyrighted works, not copies or a copy of the works themselves. A computer program that is very good at analyzing textual/pixelated data is still only analyzing data; it is itself a novel, non-expressive factual representation of other expressive works, and because of this, it cannot be considered infringement on its own.

                  It stores all information necessary to reproduce work verbatim or grab any aspect of it.

                  This isn’t really true, at least not for the majority of works analyzed by the model, but granted. If a person uses a tool to copy the work of another person, it is the person who is doing the copying, not the tool. I think it is far more reasonable to hold an individual who uses an AI model to infringe on a copyright responsible. If someone chooses to author a work with the use of a tool that does the work for them (in part or in whole), it is more than reasonable to expect that individual to check the work that is being produced.

                  All in all I don’t argue about the legality of AI, but as a professional creative I highlight ethical (plagiarism) risks that are beginning to arise in majority of the models.

                  As a professional creative myself, I think this is a load of horseshit. We always hold individual authors responsible for the work that they publish, and it should be no different here. That some choose to be lazy and careless is more of a reflection of them.

                  How sure can we be that the work “we” produced using AI is truly original and not a perfect copy of someone else’s work?

                  If you have the words to describe to the model a desired image/text response that produces a ‘perfect copy of someone else’s work’, then you have the words to search for that work, too.

                  Or should the companies releasing AI models stop adding features and fix that broken foundation first?

                  How about we stop expanding the scope of an already broken copyright law and fix that broken foundation first?

    • orclev@lemmy.world
      9 months ago

      If the training data is obviously copyright infringement, then the data must be retrained in a more appropriate manner.

      This is the crux of the issue, it isn’t obviously copyright infringement. Currently copyright is completely silent on the matter one way or another.

      The thing that makes this particularly interesting is that the traditional copyright maximalists, the ones responsible for ballooning copyright durations from their original reasonable limit of 14 years (plus one renewal) to the current absurd duration of 95 years, also stand to benefit greatly from generative works. Instead of the usual full-court press we tend to see from the major corporations around anything copyright related, we’re seeing them take a rather hands-off approach.

      • dragontamer@lemmy.world
        9 months ago

        This is the crux of the issue, it isn’t obviously copyright infringement. Currently copyright is completely silent on the matter one way or another.

        It’s clear that the training weights have the data on recreating this Joker scene. It’s also clear that if the training data didn’t contain this image, then the copy of the image would never have ended up in the weights that have been copy/pasted everywhere.

        • orclev@lemmy.world
          9 months ago

          Except it isn’t a perfect copy. It’s very similar, but not exact. Additionally, for every example you can find where it spits out a nearly identical image, you can also find one where it produces nothing like it. Even more complicated, you can get images generated that very closely match other copyrighted works, but which the model was never trained on. Does that mean copying the model violates the copyright of a work that it literally couldn’t have included in its data?

          You’re making a lot of assumptions and arguments that copyright covers things that it very much does not cover or at a minimum that it hasn’t (yet) been ruled to cover.

          Legally, as things currently stand, an AI model trained on a copyrighted work is not a copy of that work as far as copyright is concerned. That’s today’s legal reality. That might change in the future, but that’s far from certain, and is a far more nuanced and complicated problem than you’re making it out to be.

          Any legal decision that ruled an AI model is a copy of all the works used to train it would also likely have very far-reaching and complicated ramifications. That’s why this needs to be argued out in court, but until then what Midjourney is doing is perfectly legal.

          • dragontamer@lemmy.world
            9 months ago

            https://www.law.cornell.edu/wex/derivative_work

            Copyrights allow their owners to decide how their works can be used, including creating new derivative works off of the original product. Derivative works can be created with the permission of the copyright owner or from works in the public domain. In order to receive copyright protection, a derivative work must add a sufficient amount of change to the original work.

            The law is very clear on the nature of derivative works of copyrighted material.

            • orclev@lemmy.world
              9 months ago

              Not sure where they’re getting the bit about copyright disallowing derived works as that’s just not true. You can get permission to create a derived work, but you don’t need permission to create a derived work so long as the final result does not substantially consist of the original work.

              Unfortunately what constitutes “substantially” is somewhat vague. Various rulings have been made around that point, but I believe a common figure used is 30%. By that metric any given image represents substantially less than 30% of any AI model so the model itself is a perfectly legal derived work with its own copyright separate from the various works that were combined to create it.

              Ultimately, though, the issue here is that the wrong tool is being used. Copyright just doesn’t cover this case; it’s just what people are most familiar with (not to mention most people are very poorly educated about it), so that’s what everyone reaches for by default.

              With generative AI what we have is a tool that can be used to trivially produce works that are substantially similar to existing copyrighted works. In this regard it’s less like a photocopier, and more like Photoshop, but with the critical difference that no particular talent is necessary to create the reproduction. Because it’s so easy to use people keep focusing on trying to kill the tool rather than trying to police the people using it. But they’re going about it all wrong, copyright isn’t the right weapon if that’s your goal. Copyright can be used to go after the people using generative AI tools, but not the people creating the tools.

              • dragontamer@lemmy.world
                9 months ago

                Because it’s so easy to use people keep focusing on trying to kill the tool rather than trying to police the people using it. But they’re going about it all wrong, copyright isn’t the right weapon if that’s your goal. Copyright can be used to go after the people using generative AI tools, but not the people creating the tools.

                Why? If the training weights are created and distributed in violation of copyright laws, it seems appropriate to punish those illegal training weights.

                In fact, all that people really are asking for, is for a new set of training weights to be developed but with appropriate copyright controls. IE: With express permission from the artists and/or entities who made the work.

                • orclev@lemmy.world
                  9 months ago

                  Why? If the training weights are created and distributed in violation of copyright laws, it seems appropriate to punish those illegal training weights.

                  Because they aren’t illegal and they don’t violate copyright. People keep wanting them to be against copyright, but that’s just not how copyright works. There would need to be amendments to copyright law in order to cover this case, but those changes would need to be very carefully tailored. It would be way too easy to make something that’s either overly broad and applies to a bunch of situations it wasn’t intended to, or way too narrow, allowing for easy circumvention.

                  In fact, all that people really are asking for, is for a new set of training weights to be developed but with appropriate copyright controls. IE: With express permission from the artists and/or entities who made the work.

                  While that might appease some people, it wouldn’t appease everyone. There are a lot of workers in the creative fields that are feeling incredibly threatened by generative AI right now. Some of these fears are certainly overblown, but it’s also true corporations are going to be as shitty as possible and so some regulation is probably in order. That said, once again, copyright just doesn’t seem to be the right tool for the job here.

                  • dragontamer@lemmy.world
                    9 months ago

                    Because they aren’t illegal and they don’t violate copyright

                    Because they are illegal and they do violate copyright? People keep wanting them to be copyright free, but that’s not how copyright works. There don’t need to be amendments to copyright law in order to cover this case.

                    I mean, it’s obviously heading to the courts one way or the other, but I don’t think just making assertions like that is a very good way of arguing. The training weights here have clearly been proven to contain copyrighted data as per this article. I’m not sure you’re making any kind of serious case that shows otherwise; you’re instead just making a bunch of assertions that I could easily reverse.

    • CyberSeeker@discuss.tchncs.de
      9 months ago

      So let’s say I ask a talented human artist the same thing.

      Doesn’t this prove that a human, at some level, is storing the data of the Joker movie screenshot somewhere inside of their memory?

      • dragontamer@lemmy.world
        9 months ago

        So let’s say I ask a talented human artist the same thing.

        Artists don’t have hard drives or solid state drives that accept training weights.

        When you have a hard drive (or other object that easily creates copies), then the law that follows is copyright, with regards to the use and regulation of those copies. It doesn’t matter if you use a Xerox machine, VHS tape copies, or a Hard Drive. All that matters is that you’re easily copying data from one location to another.

        And yes. When a human recreates a copy of a scene clearly inspired by copyrighted data, it’s copyright infringement btw. Even if you recreate it from memory. It doesn’t matter how I draw Pikachu: if everyone knows and recognizes it as Pikachu, I’m infringing upon Nintendo’s copyright (and probably their trademark as well).

      • Auli@lemmy.ca
        9 months ago

        Nope, humans don’t store data perfectly with perfect recall.

        • abhibeckert@lemmy.world
          9 months ago

          Humans can get pretty close to perfect recall with enough practice - show a human that exact Joker image hundreds of thousands of times, and they’re going to be able to remember every detail.

          That’s what happened here - the example images weren’t just in the training set once, they are in the training set over and over and over again across hundreds of thousands of websites.

          If someone wants these images, they’re not going to use AI to access them - they’ll just do a Google image search. There is no way Warner Brothers is harmed in any way by this, which is a strong fair use defence.

        • Jilanico@lemmy.world
          9 months ago

          Some do. Should we jail all the talented artists with photographic memories?

          • dragontamer@lemmy.world
            9 months ago

            If they’re copying copyrighted works, usually it’s a fine, especially if they’re making money from it.

            You know that performance artists get sued when they replicate a song in public from memory, right?

            • Jilanico@lemmy.world
              9 months ago

              I don’t think anyone is advocating to legalize the sale of copyrighted material made via AI.

    • Schmidtster@lemmynsfw.com
      9 months ago

      I mean anyone can use copyrighted material as inspiration for their work and it’s fair use and not a concern at all.

      Is Ai only bad since it can do what a human does better/faster? If that’s the case, then they don’t actually have an issue with the fact it’s copyrighted, or I wouldn’t be able to use it for inspiration either.

      • Match!!@pawb.social
        9 months ago

        In these cases it’s bad because it can do what a human does with no ethics, empathy, or regard for the law. If it had those things, it would be worse because we’d then be encroaching on the rights of sentient beings.

        • Auli@lemmy.ca
          9 months ago

          Problem is the AI didn’t do anything. People told the program to go scrape the internet. So humans still made the decision with no regard for the laws.

      • dragontamer@lemmy.world
        9 months ago

        Is Ai only bad since it can do what a human does better/faster?

        Legally speaking, AI is not anything. It’s just a computer program. What you’re asking is a complete red herring.

        The question here is whether the training weights constitute copyright infringement. Now look at any clip-art set. Most clip-art is so-called “royalty free”, as in you can copy it from computer to computer without any copyright issues, because the author specifically said that it’s royalty free.

        But if you have a copyrighted font, then even copying that font from one computer to another constitutes copyright infringement. (IE: Literally, you aren’t allowed to copy this unless you have the permission of the author).

        So, when you download Midjourney’s training weights, does that act in and of itself constitute a copy that violates the copyright of the authors of the “Joker” movie? As far as I can tell, yes. Because the training weights clearly contain Joker images.

        • Jilanico@lemmy.world
          9 months ago

          Looking at a copyrighted font with your computer means the font is in your computer’s memory. Do I go to jail for every site I visit that uses a fancy font?

          Font files ≠ framebuffer

          Images ≠ neural network weights

          • dragontamer@lemmy.world
            9 months ago

            Do I go to jail for every site I visit that uses a fancy font?

            If it’s a fancy copyrighted font without a license to copy… the website owner gets sued. Because the website owner is the one making mass copies of said font.

            Do… you know what copyrights are? They relate to the copying of data.

            • Jilanico@lemmy.world
              9 months ago

              The framebuffer on your computer copies the data to display the font to you. That’s my point. Not every form of copying infringes on copyright.

              • dragontamer@lemmy.world
                9 months ago

                And my argument is that Midjourney’s servers are engaged in illegal copying, not the web browsers downloading images. So I think your point is moot.

                The movie Joker’s image is being copied each time the training weights are copied to a new server. Is that not an illegal copy?

                • Jilanico@lemmy.world
                  9 months ago

                  When you look at a picture of the joker online, your browser is caching an image file of the joker on your computer. Is that not an illegal copy?

                  • dragontamer@lemmy.world
                    9 months ago

                    What the hell is this non-sequitur?

                    What do browser caches have to do with Midjourney servers and training weights?

                    I get that you wanna change the subject. But I dunno if it’s because you don’t understand my argument, or if you’ve realized that my argument is solid and therefore you have no actual counterargument.

                    The copies that people care about are the ones on the web servers. That’s why, when you run BitTorrent, the MPAA or RIAA sue the people serving the data, not the people who use the data. Have you followed any copyright case in the last two or three decades? In this case, it’d be a copyright case vs Midjourney servers.

        • Schmidtster@lemmynsfw.com
          9 months ago

          How’s it a red herring to point out we are allowed to use copyrighted materials already? It’s not the concern here, yet it’s what they are using as the concern for their arguments against it.

          • dragontamer@lemmy.world
            9 months ago

            Because copyright law is clear in that computers can’t own a copyright.

            The humans at play are:

            1. The artist who created the original work.

            2. The computer IT team who are copying the data behind the scenes between servers.

            3. You who uses Midjourney to recreate “Joker” movie artwork, likely using the data in #2 which falls under copyright infringement.

            It doesn’t matter how #2 works. It doesn’t matter if it’s H.265 or MPEG2 or from VHS tapes, or if it’s a neural network using the latest-and-greatest training weights from a GPU-based datasystem. It’s just a computer. The ones doing the copyright infringement are the people copying data from place to place.

            • orclev@lemmy.world
              9 months ago

              The AI model is not a copy of the set of data used to train it; it’s a derivative work. As such, copyright as it currently stands does not apply. It’s possible, likely even, that copyright will be modified in some way soon to account for this, but the situation today says nope, not copyright infringement.

              • Dkarma@lemmy.world
                9 months ago

                They’re really trying so hard cuz they absolutely want this to be infringement, but it simply isn’t on any legal level.

    • rsuri@lemmy.world
      9 months ago

      But its not like our laws have changed

      And that’s the problem. The internet has drastically reduced the cost of copying information, to the point where entirely new uses like this one are now possible. But those new uses are stifled by copyright law that originates from a time when the only cost was that people with Gutenberg presses would be prohibited from printing slightly cheaper books. And there’s no discussion of changing it because the people who benefit from those laws literally are the media.

      • dragontamer@lemmy.world
        9 months ago

        Copyright was literally invented because it’s cheap and easy to copy information (i.e. the printing press).

        When copies are easy, you screw over the original artist. A large scale regulation of copies must be enforced by the central authorities to make sure small artists get the payments that they deserve. It doesn’t matter if you use a printing press, a xerox machine, a photograph, a phonograph, a record, a CD-ROM copy, a tape recorder, or the newest and fanciest AI to copy someone’s work. It’s a copy, and therefore falls under the copyright regulations.

    • LainTrain@lemmy.dbzer0.com
      9 months ago

      By that logic I am also storing that image in my dataset, because I know and remember this exact image. I can reproduce it from memory too.

      • dragontamer@lemmy.world
        9 months ago

        You ever try to do a public performance of a copyrighted work, like “Happy Birthday to You” ??

        You get sued. Even if it’s from memory. Welcome to copyright law. There’s a reason why every restaurant had to make up a new “Happy Happy Birthday, from the Birthday Crew” song.

        • LainTrain@lemmy.dbzer0.com
          9 months ago

          Yeah, but until I perform it without a license for profit, I don’t get sued.

          So it’s up to the user to make sure that if any material that is generated is copyright infringing, it should not be used.

          • dragontamer@lemmy.world
            9 months ago

            Otakon anime music videos have no profits but they explicitly get a license from RIAA to play songs in public.

            • LainTrain@lemmy.dbzer0.com
              9 months ago

              So? I’m not saying those are fair terms, I would also prefer if that were not the case, but AI isn’t performing in public any more than having a guitar with you in public is ripping off Metallica.

              • dragontamer@lemmy.world
                link
                fedilink
                English
                arrow-up
                1
                ·
                edit-2
                9 months ago

                You don’t need to perform “for profit” to get sued for copyright infringement.

                but AI isn’t performing in public any more than having a guitar with you in public is ripping off Metallica.

                Is the Joker image in that article derivative of, or substantially similar to, a copyrighted work? Is the query available to anyone who uses Midjourney? Are the training weights being copied from server to server behind the scenes? Were the training weights derived from copyrighted data?

                • LainTrain@lemmy.dbzer0.com
                  link
                  fedilink
                  English
                  arrow-up
                  1
                  arrow-down
                  1
                  ·
                  9 months ago

                  Yes and none of that matters in the slightest. By that logic the Library of Babel is also copyright infringement. By that logic my memory of the movie is copyright infringing even if I don’t do anything with it.

                  • dragontamer@lemmy.world
                    link
                    fedilink
                    English
                    arrow-up
                    1
                    ·
                    9 months ago

                    You’re taking a fictional work and trying to apply real world laws to it?

                    Copyright law assumes that the Library of Babel would take up so much space that it’d be impossible to create.

                    Which is true. Every possible combination of letters, spaces, and characters would never fit on anything in today’s universe (be it a 24 TB hard drive, or even a collection of thousands of them).
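
                    A rough back-of-the-envelope sketch of that size claim. The book dimensions are assumptions taken from Borges’s story (25 symbols, 410 pages of 40 lines of 80 characters), and one byte per character with a decimal 24 TB drive is also an assumption, so treat the numbers as illustrative only:

```python
import math

# Assumptions from Borges's story, not from this thread:
SYMBOLS = 25                     # distinct characters in the library's alphabet
CHARS_PER_BOOK = 410 * 40 * 80   # pages * lines per page * characters per line

# Assumption: decimal terabytes and one byte per character.
DRIVE_BYTES = 24 * 10**12


def digits_in_book_count(symbols=SYMBOLS, chars=CHARS_PER_BOOK):
    """Number of decimal digits in symbols ** chars (the count of distinct books)."""
    return math.floor(chars * math.log10(symbols)) + 1


def books_per_drive(drive_bytes=DRIVE_BYTES, chars=CHARS_PER_BOOK):
    """How many whole books fit on one drive at one byte per character."""
    return drive_bytes // chars


if __name__ == "__main__":
    print("characters per book:", CHARS_PER_BOOK)
    print("digits in the number of distinct books:", digits_in_book_count())
    print("books that fit on one 24 TB drive:", books_per_drive())
```

                    Under those assumptions one drive holds on the order of tens of millions of books, while the number of distinct books has well over a million digits, so no realistic pile of drives gets close.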

                    Secondly: any computer-generated work is automatically non-copyrighted as per US Law.

        • LainTrain@lemmy.dbzer0.com
          link
          fedilink
          English
          arrow-up
          1
          arrow-down
          4
          ·
          edit-2
          9 months ago

          What’s the difference? I could be just some code in the simulation

          Edit: downvoted by people who unironically stan Ted Kaczynski