• ricecake@sh.itjust.works · 6 months ago

    In the eyes of the law, intent does matter, as does how it's responded to.
    For CSAM, you have to have knowingly possessed it or sought to possess it.

    The AI companies use a project that indexes everything on the Internet, much like Google does, except its output is publicly available for free.

    https://commoncrawl.org/

    They use this data via another project, https://laion.ai/ , which uses the data to find images with descriptions attached, does some checks to validate that the descriptions make sense, and then publishes a list of "location of the image, description of the image" pairs.
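
    Conceptually, each entry in that list is just a pair like this (an illustrative sketch only; the real releases are parquet files with more columns, and the field names here are assumptions):

    ```python
    # One LAION-style entry: the image itself is NOT included,
    # only a pointer to where it lives plus its alt-text style caption.
    record = {
        "url": "https://example.com/some-image.jpg",       # location of the image
        "caption": "a red bicycle leaning against a wall",  # description of the image
    }
    ```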

    The AI companies use that list to grab the images and train an AI on them in conjunction with the descriptions, roughly as sketched below.
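
    A minimal Python sketch of that step, assuming a simple CSV index with "url" and "caption" columns (not any particular company's actual pipeline):

    ```python
    import csv
    import urllib.request

    def iter_training_pairs(index_path):
        """Yield (image_bytes, caption) pairs by fetching each listed URL."""
        with open(index_path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                try:
                    with urllib.request.urlopen(row["url"], timeout=10) as resp:
                        yield resp.read(), row["caption"]
                except Exception:
                    # Many listed URLs are dead or changed by fetch time; skip them.
                    continue
    ```

    The skip-on-failure part matters: a lot of the listed URLs are gone or altered by the time anyone fetches them, so what a trainer actually downloads can differ from what was originally indexed.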

    So, people at Stanford were doing research on the LAION dataset when they found the instances of CSAM. The LAION project pulled its datasets from availability while things were checked and new safeguards were put in place.
    The AI companies also pulled their models (if public) while the images were removed from the dataset and new safeguards were implemented.
    Most of the CSAM images in the dataset were already gone by the time the AI companies would have attempted to access them, but some were not.

    A very obvious lack of intent to acquire the material (in fact, a lack of awareness the material was possessed at all), transparency in response, steps to prevent further distribution, and action to prevent it from happening again both provide a defense against accusations and make anyone interested less likely to want to bring those accusations.

    On the other hand, the people who generated the images were knowingly doing so, which is a no-no.