Camera makers and pencil makers (and the users of those devices) aren’t making massive server farms that spy on every drop of information they can get ahold of.
If AI has the means to generate inappropriate material, then that means the developers have allowed it to train from inappropriate material.
Now if that’s the case, well, where did the devs get the training data…? 🤔
That’s not how generative AI works. It’s capable of creating images that include novel elements that weren’t in the training set.
Go ahead and ask one to generate a bonkers image description that doesn’t exist in its training data and there’s a good chance it’ll be able to make one for you. The classic example is an “avocado chair”, which an early image generator was able to produce many plausible images of despite only having been trained on images of avocados and chairs. It understood the two general concepts and was able to figure out how to meld them into a common depiction.
Yes, I’ve tried similar silly things. I’ve asked AI to render an image of Mr. Bean hugging Pennywise the clown. And it delivered: something randomly silly-looking, but still not far off base.
But when it comes to inappropriate material, well the AI shouldn’t be able to generate any such thing in the first place, unless the developers have allowed it to train from inappropriate sources…
The trainers didn’t train the image generator on images of Mr. Bean hugging Pennywise, and yet it’s able to generate images of Mr. Bean hugging Pennywise. Yet you insist that it can’t generate inappropriate images without having been specifically trained on inappropriate images? Why is that suddenly different?
The trainers taught it what Mr. Bean looks like and what Pennywise looks like; it took those concepts and combined them to create your image. To make CSAM it was, unfortunately, trained on CSAM: https://cyber.fsi.stanford.edu/news/investigation-finds-ai-image-generation-models-trained-child-abuse
3,226 suspected images out of 5.8 billion. About 0.00006%. And probably mislabeled to boot, or it would have been caught earlier. I doubt it had any significant impact on the model’s capabilities.
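For what it’s worth, the proportion quoted above checks out (a quick sanity check, using the two figures from the comment):

```python
# Sanity-check the quoted proportion: 3,226 suspected images
# out of roughly 5.8 billion total training images.
suspected = 3_226
total = 5_800_000_000

percentage = suspected / total * 100
print(f"{percentage:.5f}%")  # → 0.00006%
```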
Who is responsible then? Cuz the devs basically gotta let the AI go to town on many websites and documents for any sort of training set.
So you mean to say, you can’t blame the developers, because they just made a tool (one that scrapes data from everywhere possible), can’t blame the tool (don’t mind that AI is scraping all your data), and can’t blame the end users, because some dirty minded people search or post inappropriate things…?
So where’s the blame go?
First, you need to figure out exactly what it is that the “blame” is for.
If the problem is the abuse of children, well, none of that actually happened in this case so there’s no blame to begin with.
If the problem is possession of CSAM, then that’s on the guy who generated them, since they didn’t exist at any point before then. The trainers wouldn’t have needed to have any of that in the training set, so if you want to blame them you’re going to need a completely separate investigation into that; the ability of the AI to generate images like that doesn’t prove anything on its own.
If the problem is the creation of CSAM, then again, it’s the guy who generated them.
If it’s the provision of general-purpose art tools that were later used to create CSAM, then sure, the AI trainers are in trouble. As are the camera makers and the pencil makers, as I mentioned sarcastically in my first comment.
You obviously don’t understand squat about AI.
AI only knows what has gone through its training data, both from the developers and the end users.
Hell, back in 2003 I wrote an adaptive AI for optical character recognition (OCR). I designed it for English, but also with a crude ability to learn.
I could have taught that thing hieroglyphics if I wanted to. But AI will never generate things that it’s never seen before.
Funny that AI has an easier time rendering inappropriate material than it does human hands…
Ha.
Yes, and as I’ve said repeatedly, it’s able to synthesize novel images from the things it has learned.
If you train an AI with pictures of green cars and pictures of red apples, it’ll be able to figure out how to generate images of red cars and green apples for you.
Exactly. And if you ask it for the opposite of an older MILF, then how does it know what younger ladies look like?
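The green-cars/red-apples point above can be illustrated with a toy sketch. To be clear, this is an analogy, not how an actual diffusion model works internally; the sets and names here are made up purely for illustration of why separately learned concepts compose into combinations never seen in training:

```python
# Toy analogy: if a model learns "color" and "object" as separate
# concepts, every combination becomes expressible, including pairs
# that were absent from the training data.
training_set = [("green", "car"), ("red", "apple")]

colors = {color for color, _ in training_set}
objects = {obj for _, obj in training_set}

all_combinations = {(c, o) for c in colors for o in objects}
novel = all_combinations - set(training_set)
print(sorted(novel))  # → [('green', 'apple'), ('red', 'car')]
```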