• Marxine@lemmy.ml
    10 up · 5 down · 1 year ago

    VC-backed AI makers and billionaire-run corporations should definitely pay for the data they use to train their models. The common user should definitely check the licences of the data they use as well.

    • SixTrickyBiscuits@lemmy.world
      4 up · 7 down · 1 year ago

      That is essentially impossible. How are they going to pay each Reddit user whose comment the AI analyzed? Or each website it analyzed? We’re talking about terabytes of text data taken from a huge variety of sources.

      • CannaVet@lemmy.world
        9 up · 4 down · edited · 1 year ago

        Then it should be treated as what it is: an illegal venture based on theft. I don’t get a legal pass to steal just because the groceries I stole got cooked into a meal and are therefore no longer the groceries I stole.

        • azuth@lemmy.world
          4 up · 7 down · 1 year ago

          Firstly, copyright infringement is not theft. It’s not theft because the grocer still has the groceries. It is a lesser crime, which obviously hurts the victim less, and in some cases not at all.

          A summary is also not copyright infringement; it’s fair use. Of course, copyright holders would love to copyright-strike bad reviews (they already do, even though such reviews are not illegal).

      • Marxine@lemmy.ml
        4 up · 1 down · 1 year ago

        Billionaires can spend and burn their whole net worth for all I care. Datasets should be either:

        • Paid for, with the payment going to the provider platform and each original content creator getting a share (e.g. the platform keeps 10% of the sale price for hosting costs, and the remaining 90% is distributed to content creators according to the size and quality of the data provided)
        • Consciously donated by the content creators (e.g. an OPT-IN term in the platform about donating agreed-upon data for non-profit research), but the dataset must never be sold or used for profit. Publicly available research purposes only.
        • “Rented” out by the users and platform in an OPT-IN manner, with both receiving royalties/payments for each purchase/usage of the dataset.

        The way things are currently done only favours venture capitalists (wage thieves), shareholders (also wage thieves) and billionaire C-suite executives (wage thieves as well).