• sad_detective_man@leminal.space
    10 days ago

    I was under the impression that LLMs are being trained on Reddit as user accounts separate from Reddit itself. Are you saying that Reddit is botting itself at the administrative level? Because that doesn’t seem as profitable as letting outside interests pose as users and inflate the traffic and ad revenue.

    • Angry_Autist (he/him)@lemmy.world
      9 days ago

      Not all LLMs are being trained on API data. It is a lot more resource-efficient just to give them a copy of the comment database, which stores a record of every edit.

      • sad_detective_man@leminal.space
        9 days ago

        Would an LLM operator pay for that, or care about unedited data? The redditor in me assumes they would prefer it from the source and are aware of Redact, but my better judgement thinks they’re probably just average capitalists with a shiny toy who don’t really care what it reads. Hence why they are using it on Reddit.

        • Angry_Autist (he/him)@lemmy.world
          8 days ago

          Reddit was once considered the highest-quality source of user-generated data on current topics, and that archived data is valuable even if the last few years are unsellable.