Maven, a new social network backed by OpenAI’s Sam Altman, found itself in controversy today when it imported a huge number of posts and profiles from the Fediverse, and then ran AI analysis to alter the content.

  • lunarul@lemmy.world · 2 months ago

    I was confused why a package manager would need to import posts from a social network.

    Why name a new product the same as a very popular existing product?

  • threelonmusketeers@sh.itjust.works · 2 months ago

    I was confused about what they were trying to accomplish, and even after reading the article I am still somewhat confused.

    Instead, when a user posts something, the algorithm automatically reads the content and tags it with relevant interests so it shows up on those pages. Users can turn up the serendipity slider to branch out beyond their stated interests, and the algorithm running the platform connects users with related interests.

    Perhaps I’m in the minority, but I don’t see myself getting much utility out of this. I already know what my interests are, and don’t have much interest in growing them algorithmically. If a topic is really interesting, I’ll eventually find out about it via an actual human.
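
    For what it’s worth, the mechanism the article describes boils down to something like this toy sketch (purely illustrative: the keyword lists and the tagPost/feedTopics helpers are made up here, and this is nothing like Maven’s actual code):

    ```typescript
    // Toy sketch: auto-tag a post with interests, then let a "serendipity"
    // knob mix in topics beyond the matched ones. Keyword lists are invented.
    const INTEREST_KEYWORDS: Record<string, string[]> = {
      astronomy: ["telescope", "nebula", "moon"],
      cooking: ["recipe", "sourdough", "oven"],
      cycling: ["bike", "gravel", "derailleur"],
    };

    function tagPost(text: string): string[] {
      const lower = text.toLowerCase();
      return Object.entries(INTEREST_KEYWORDS)
        .filter(([, words]) => words.some((w) => lower.includes(w)))
        .map(([interest]) => interest);
    }

    // serendipity in [0, 1]: 0 = only matched interests, 1 = everything else too.
    function feedTopics(post: string, serendipity: number): string[] {
      const matched = tagPost(post);
      const others = Object.keys(INTEREST_KEYWORDS).filter((i) => !matched.includes(i));
      return [...matched, ...others.slice(0, Math.round(serendipity * others.length))];
    }

    console.log(feedTopics("Baked a sourdough loaf under a full moon", 1));
    // -> ["astronomy", "cooking", "cycling"]
    ```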

    • technomad@slrpnk.net · 2 months ago

      Yeah, we’re trying to get the fuck away from algorithms. That’s what makes the fediverse such a big draw currently, for me.

      • FaceDeer@fedia.io · 2 months ago

        You’re on slrpnk.net; I assume it’s not implementing any of this stuff. As long as you don’t sign up for Maven, I don’t see how this is going to affect you.

        • technomad@slrpnk.net · 2 months ago

          I mean, yeah, maybe it won’t affect me directly; I like the instance I’m on and it’s a pretty respectable one. However, indirectly, this is very relevant to any Fediverse user, regardless of the instance or platform they’re using. Allowing abuses like this to happen without any pushback is a surefire way of turning this place into a shithole just like the rest of the internet. I appreciate the fact that, at least for now, it’s different here.

          Also, maybe this isn’t my only homebase? Just saying.

  • verstra@programming.dev · 2 months ago

    Oh shit, the persona guy was right! We should all be adding licenses to our comments, so they could not legally train models that are then used for commercial purposes.

    • Pennomi@lemmy.world · 2 months ago

      The easiest way is a sitewide NoAI meta tag, since it’s the current standard. Researchers are much more likely to respect a common standard, and extremely unlikely to respect a single user’s personal solution of adding a link to their comments.
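
      For reference, the commonly cited form is a robots meta directive such as noai. A minimal sketch of how a well-behaved scraper could honour it before ingesting a page (the pageAllowsAiUse helper and the example.social URL are hypothetical, and the exact tag form is an assumption):

      ```typescript
      // Minimal sketch: skip pages that opt out via a robots meta tag such as
      // <meta name="robots" content="noai, noimageai">. Crude regex check only;
      // a real crawler would parse the DOM properly.
      async function pageAllowsAiUse(url: string): Promise<boolean> {
        const html = await (await fetch(url)).text();
        const meta = html.match(/<meta[^>]+name=["']robots["'][^>]+content=["']([^"']*)["']/i);
        return !/\bnoai\b/i.test(meta?.[1] ?? "");
      }

      // Hypothetical usage:
      pageAllowsAiUse("https://example.social/").then((ok) =>
        console.log(ok ? "no opt-out tag found" : "page opts out via noai"),
      );
      ```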

      • onlinepersona@programming.dev · 2 months ago

        Why do you think it won’t hold water legally? There’s a case going on right now against GitHub Copilot for scraping GPL-licensed code, even spitting it back out verbatim, and not making “open” AI actually open.

        Creative Commons is not a joke licence. It actually is used by artists, authors, and other creative types.

        Imagine Maven or another company doing the same shit they just did and it coming to light that there was a bunch of noncommercially licensed content in there. The authors could band together for a class action lawsuit and sue their asses. Given the reaction of users here and on Mastodon, I wouldn’t even be surprised if it did happen.

        Anti Commercial-AI license

          • Venia Silente@lemm.ee · 2 months ago

            Don’t we also need a critical mass of people adding licenses to posts, so that a class action suit can be launched? Because it would be unviable, and a very rapid path to self-defeat, if people started trying to individually sue big corpo.

            Also, I’m missing a way to automatically add this to my posts, something like a browser extension (rough sketch of the idea below).

            This post is licensed under CC BY-NC-SA 4.0.
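
            For illustration, roughly what such an extension or userscript could do (the selectors and FOOTER text are hypothetical; every Lemmy/Mastodon frontend uses different markup, so treat this as a sketch):

            ```typescript
            // Append a license footer to any comment box when its form is submitted.
            // Selectors are hypothetical; adapt them to the frontend actually in use.
            const FOOTER = "\n\nThis post is licensed under CC BY-NC-SA 4.0.";

            document.addEventListener("submit", (event) => {
              const form = event.target as HTMLFormElement;
              const box = form.querySelector<HTMLTextAreaElement>("textarea");
              if (box && !box.value.includes("CC BY-NC-SA 4.0")) {
                box.value += FOOTER;
              }
            });
            ```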

              • Venia Silente@lemm.ee · 2 months ago

                > Also for me I’m using a text expander so that after I type a shortcut it automatically adds the rest of the text for me.

                I request of you, show me your ways!

                • Danterious@lemmy.dbzer0.com · 2 months ago

                  Well, in the Firefox/Chrome extension stores you can search for “text expander” and choose an extension that works for you.

                  Or if you are using a phone, you can do the same on the app store, and I think there should be a few options.

                  Once you download one of them it should give instructions on how to use it, but in general it asks you to define the longer phrase you want inserted and a shorter trigger phrase that is automatically replaced with the longer one.

                  For example:

                  long phrase: The quick brown fox jumped over the moon.

                  short phrase: /qfox

                  and every time you typed /qfox it would replace it with “The quick brown fox jumped over the moon.”

                  Anti Commercial-AI license (CC BY-NC-SA 4.0)

    • onlinepersona@programming.dev · 2 months ago

      It’s especially for these kinds of dumb cases where they simply copy content wholesale and boast about it. With more people licensing their content as non-commercial, the “hot water” these companies get into could be not just trivial but actually legal.

      It would be great if web and mobile clients supported signatures, or a “licence” field from which signatures were generated. Even better would be if people smarter than me added a feature to poison AI training data. That could also be done via a signature or some other method.

      Anti Commercial-AI license

  • Grandwolf319@sh.itjust.works · 2 months ago

    Genuine question: do instances not have a GPL license on their content? With that license, anyone can use all the data, but only for open source software.

    • jackalope@lemmy.ml · 2 months ago

      I don’t think you can use the GPL for anything but code. A Creative Commons license would be more appropriate.

    • doctortofu@reddthat.com · 2 months ago

      The wildest part is that he’s surprised that Mastodon peeps would react negatively to their posts being scraped without consent or even notification and fed into an AI model. Like, are you for real, dude? Have you spent more than 4 seconds on Mastodon and noticed their (our?) general attitude towards AI? Come the hell on…

      • Etterra@lemmy.world · 2 months ago

        He’s not surprised. He’s acting surprised because he got caught. It’s pretty standard for these jerkass tech bros. “Move fast and break things” is code for “break laws and be unethical”. As I think we’ve all seen, if you do it often and fast enough you can stay way ahead of any kind of accountability, because everybody else is playing catch-up while the last thing has already filtered out of the news cycle.

      • danc4498@lemmy.world · 2 months ago

        People can complain, but the Fediverse is built to make consuming users’ data easy. If you don’t want AI using your data, don’t put it on such an easily “scrapable” network.

        • bbuez@lemmy.world · 2 months ago

          Alternatively, use a closed ecosystem susceptible to data rot and loss.

          “Want to contribute to our open source project? Join our Discord.”

          Would you want art to be unfindable because scraping for AI image generation happens? It’s a solution looking for problems.

        • Scrubbles@poptalk.scrubbles.tech · 2 months ago

          This is what I’ve been saying the entire time. It sucks, and it’s wrong, but the fediverse is built from the ground up as an open sharing platform, where our data is shared with anyone. It shouldn’t be, and it’s wrong, but there is nothing to stop anyone from doing it. To change that would alter federation at a core level.

            • Scrubbles@poptalk.scrubbles.tech · 2 months ago

              I’ve had this argument with other people, but essentially at this point there is no licensing beyond server ownership here, and most servers don’t have any licenses defined. Even if they do, then sure, they did something wrong… but how would you ever prove it or enforce it? The only way to actually disallow them is to switch from open federation to closed, which goes against what we’re trying to build with federation.

              • Grandwolf319@sh.itjust.works · 2 months ago

                There have been instances before where LLMs gave up clues as to what sources they used. When that happens, they can be sued.

                I’m okay with people using our data for whatever, since it’s all open and it should be. But I’d rather put in a little bit of effort to make for-profit use technically illegal. It’s better than nothing.

            • bamboo@lemm.ee · 2 months ago

              If it ends up being ruled that training an LLM is fair use so long as the LLM doesn’t reproduce the works it is trained on verbatim, then licensing becomes irrelevant.

        • Grandwolf319@sh.itjust.works · 2 months ago

          Just because our data is accessible doesn’t mean it’s legally licensed to be used by a for-profit company. Free doesn’t mean you can do what you want with it; it just means no cost.

          • danc4498@lemmy.world · 2 months ago

            I don’t disagree. I’m just saying that so long as you’re putting content on this platform, you are powerless to stop any service from using the features of the platform in whatever way they want.

            It was built for easy and open consumption of user content by other services.

        • lambalicious@lemmy.sdf.org · 2 months ago

          > People can complain, but the Fediverse is built to make consuming users’ data easy

          Correction: it is built to make consuming users’ data not easy, but more human.

          What you are thinking of is AP, not “Fediverse”, and even then that’s a stretch.

          • danc4498@lemmy.world · 2 months ago

            > Correction: it is built to make consuming users’ data not easy, but more human.

            What does that even mean?

            > What you are thinking of is AP, not “Fediverse”, and even then that’s a stretch.

            Honestly, I think the Fediverse is inseparable from AP (or some similar protocol). You can split hairs if you want, but the thing that makes it different from all other social media services is that it allows the content created by users on one service to be imported into a different service.

            You can hope and dream that it is only services like Lemmy consuming user content from services like Mastodon, but this same protocol makes it easy for services like ChatGPT to consume the same data.

      • FaceDeer@fedia.io · 2 months ago

        It sounds like the posts weren’t “being fed into an AI model” as in being used as training material; they were just being evaluated by an AI model. However…

        > Have you spent more than 4 seconds on Mastodon and noticed their (our?) general attitude towards AI?

        Yeah, the general attitude of wild witch-hunts and instant zero-to-11 rage at the slightest mention of it. It doesn’t matter what you’re actually doing with AI; the moment the mob thinks it scents blood, the avalanche is rolling.

        It sounds like Maven wants to play nice, but if the “general attitude” means that playing nice is impossible, why should they even bother to try?

        • xavier666@lemm.ee · 2 months ago

          > Yeah, the general attitude of wild witch-hunts and instant zero-to-11 rage at the slightest mention of it. It doesn’t matter what you’re actually doing with AI; the moment the mob thinks it scents blood, the avalanche is rolling.

          This wasn’t always the case. A lot of NLP research in the 2010s used scraped social media posts. People never had a problem with that (at least the outrage wasn’t visible back then). The problem now is that our content is being used to create an AI product with zero consent taken from the end user.

          Source: My research colleagues used to work on NLP

          • jackalope@lemmy.ml · 2 months ago

            Consent isn’t legally required if it’s fair use. Whether it’s fair use remains to be ruled on by the courts.