• echo64@lemmy.world
    9 months ago

    AI actually has huge problems with this. If you feed AI-generated data into models, the new training falls apart extremely quickly; it's the equivalent of AI inbreeding. There does not appear to be any good solution for this.
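
    A toy sketch of that feedback loop, assuming a drastically simplified setup: the "model" is just a Gaussian fitted to its data, and each generation trains only on samples drawn from the previous fit. The function name and parameters are hypothetical and for illustration only; this is not how any real LLM is trained, but it shows how the spread of the data can shrink when a model keeps learning from its own output.

```python
import random
import statistics


def simulate_generations(n_generations=500, sample_size=10, seed=0):
    """Fit a Gaussian to the data, then replace the data with samples from the fit.

    Returns the fitted standard deviation at each generation.
    """
    rng = random.Random(seed)
    # Generation 0: stand-in for "human" data, drawn from a standard normal.
    data = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
    stdevs = []
    for _ in range(n_generations):
        # "Train" the toy model: estimate mean and spread from the current data.
        mu = statistics.fmean(data)
        sigma = statistics.stdev(data)
        stdevs.append(sigma)
        # The next generation sees only data generated by the fitted model.
        data = [rng.gauss(mu, sigma) for _ in range(sample_size)]
    return stdevs


if __name__ == "__main__":
    history = simulate_generations()
    print("first generation stdev:", history[0])
    print("last generation stdev: ", history[-1])
```

    With small samples the estimation noise compounds from one generation to the next, so the fitted spread tends to drift toward zero and the tails of the original distribution are lost first.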

    This is the primary reason why most AI models aren’t trained on anything past 2021. The internet is just too full of AI-generated data.

    • givesomefucks@lemmy.world
      9 months ago

      There does not appear to be any good solution for this

      Pay intelligent humans to train AI.

      Like, have grad students talk to it in their area of expertise.

      But that’s expensive, so capitalist companies will always take the cheaper/shittier routes.

      So it’s not that there’s no solution; there’s just no profitable solution. Which is why innovation should never be solely in the hands of people whose only concern is profits.

    • Ultraviolet@lemmy.world
      9 months ago

      This is why LLMs have no future. No matter how much the technology improves, they can never have training data past 2021, which becomes more and more of a problem as time goes on.

    • T156@lemmy.world
      9 months ago

      And unlike with images, where it might be possible to embed a watermark that can be filtered out, it’s much harder to pinpoint whether text is AI-generated, especially if you have bots masquerading as users.