Maven, a new social network backed by OpenAI’s Sam Altman, sparked controversy today when it imported a huge number of posts and profiles from the Fediverse and then ran AI analysis to alter the content.

    • doctortofu@reddthat.com
      3 months ago

      The wildest part is that he’s surprised that Mastodon peeps would react negatively to their posts being scraped without consent or even notification and fed into an AI model. Like, are you for real, dude? Have you spent more than 4 seconds on Mastodon and noticed their (our?) general attitude towards AI? Come the hell on…

      • FaceDeer@fedia.io

        It sounds like they weren’t “being fed into an AI model” as in being used as training material, they were just being evaluated by an AI model. However…

        Have you spent more than 4 seconds on Mastodon and noticed their (our?) general attitude towards AI?

        Yeah, the general attitude of wild witch-hunts and instant zero-to-11 rage at the slightest mention of it. Doesn’t matter what you’re actually doing with AI, the moment the mob thinks they scent blood the avalanche is rolling.

        It sounds like Maven wants to play nice, but if the “general attitude” means that playing nice is impossible why should they even bother to try?

        • xavier666@lemm.ee

          Yeah, the general attitude of wild witch-hunts and instant zero-to-11 rage at the slightest mention of it. Doesn’t matter what you’re actually doing with AI, the moment the mob thinks they scent blood the avalanche is rolling.

          This wasn’t always the case. A lot of NLP research in the 2010s used scraped social media posts, and people never had a problem with that (at least the outrage wasn’t visible back then). The problem now is that our content is being used to create an AI product with zero consent from the end user.

          Source: My research colleagues used to work on NLP

          • jackalope@lemmy.ml

            Consent isn’t legally required if it’s fair use. Whether it’s fair use remains to be ruled on by the courts.

      • danc4498@lemmy.world

        People can complain, but the Fediverse is built to make consuming users’ data easy. If you don’t want AI using your data, don’t put it on such an easily scrapable network.

        • bbuez@lemmy.world

          Alternatively, use a closed ecosystem susceptible to data rot and loss.

          Want to contribute to our open source project? Join our discord

          Would you want art to be unfindable because scraping for AI image generation happens? It’s a solution looking for problems.

        • Scrubbles@poptalk.scrubbles.tech

          This is what I’ve been saying the entire time. It sucks, and it’s wrong, but the Fediverse is built from the ground up as an open sharing platform, where our data is shared with anyone. It shouldn’t be, and it’s wrong, but there is nothing to stop anyone from doing it. To change that would alter federation at a core level.

            • bamboo@lemm.ee

              If it ends up being ruled that training an LLM is fair use so long as the LLM doesn’t reproduce the works it is trained on verbatim, then licensing becomes irrelevant.

            • Scrubbles@poptalk.scrubbles.tech

              I’ve had this argument with other people, but essentially at this point there is no licensing beyond server ownership here, and most servers don’t have any licenses defined. Even if they do, then sure they did something wrong… but how would you ever prove it or enforce it? The only way to actually disallow them is to switch from open federation to closed - which goes against what we’re trying to build with federation.

              • Grandwolf319@sh.itjust.works

                There have been instances before where LLMs gave up clues as to what sources they were trained on. When that happens, they can be sued.

                I’m okay with people using our data for whatever, since it’s all open and it should be. But I’d rather put a little bit of effort into making for-profit use technically illegal. It’s better than nothing.

        • Grandwolf319@sh.itjust.works

          Just because our data is accessible doesn’t mean it’s legally licensed for use by a for-profit company. Free doesn’t mean you can do what you want with it; it just means no cost.

          • danc4498@lemmy.world

            I don’t disagree. I’m just saying that so long as you’re putting content on this platform, you are powerless to stop any service from using the features of the platform in whatever way they want.

            It was built for easy and open consumption of user content by other services.

        • lambalicious@lemmy.sdf.org

          People can complain, but the Fediverse is built to make consuming user’s data easy

          Correction: it is built to make consuming users’ data not easy, but more human.

          What you are thinking of is AP (ActivityPub), not the “Fediverse”, and even then that’s a stretch.

          • danc4498@lemmy.world

            Correction: it is built to make consuming users’ data not easy, but more human.

            What does that even mean?

            What you are thinking of is AP, not “Fediverse”, and even then that’s a stretch.

            Honestly, I think Fediverse is inseparable from AP (or some similar protocol). You can split hairs if you want, but the thing that makes it different from all other social media services is that it allows the content created by users on one service to be imported into a different service.

            You can hope and dream that it is only services like Lemmy consuming user content from services like Mastodon, but this same protocol makes it easy for services like ChatGPT to consume the same data.
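[Editor’s note: to make the “easy to consume” point concrete, ActivityPub serves public posts as plain ActivityStreams JSON, which any service can parse in a few lines of code. A minimal sketch in Python, using a hypothetical inline sample of an outbox page (real servers expose similar data at a public `/outbox` URL without authentication; the field names follow the ActivityStreams vocabulary):]

```python
import json

# Hypothetical sample of what a public ActivityPub outbox page looks like.
# Real instances serve comparable JSON at https://<instance>/users/<name>/outbox.
sample_outbox = json.loads("""
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "OrderedCollectionPage",
  "orderedItems": [
    {
      "type": "Create",
      "object": {
        "type": "Note",
        "attributedTo": "https://example.social/users/alice",
        "content": "<p>Hello, Fediverse!</p>"
      }
    }
  ]
}
""")

def extract_posts(outbox: dict) -> list[str]:
    """Pull the text content out of every Note in an outbox page."""
    posts = []
    for activity in outbox.get("orderedItems", []):
        obj = activity.get("object", {})
        if isinstance(obj, dict) and obj.get("type") == "Note":
            posts.append(obj.get("content", ""))
    return posts

print(extract_posts(sample_outbox))  # → ['<p>Hello, Fediverse!</p>']
```

Nothing here requires an API key or a login; that openness is exactly what federation depends on, and exactly what makes scraping trivial.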

      • Etterra@lemmy.world

        He’s not surprised. He’s acting surprised because he got caught. It’s pretty standard for these jerkass tech bros. “Move fast and break things” is code for “break laws and be unethical.” As I think we’ve all seen, if you do it often and fast enough you can keep way ahead of any kind of accountability, because by the time everybody else is done playing catch-up, the last thing has already filtered out of the news cycle.