• FlexibleToast@lemmy.world
    English
    arrow-up
    44
    arrow-down
    3
    ·
    1 day ago

    No, its value is that users have upvoted and downvoted good and bad data. This gives the AI training data that has been crowdsourced by humans. Bots manipulating votes would destroy the value of Reddit.

    • Tollana1234567@lemmy.today
      English
      arrow-up
      2
      ·
      12 hours ago

      Haven’t they already destroyed the value of Reddit? The bots already have massive hordes of downvoting/upvoting going on.

    • rafoix@lemmy.zip
      English
      arrow-up
      49
      ·
      24 hours ago

      Bots have been controlling votes on Reddit for years. It costs about $50 to get to the front page of Reddit.

      • NuXCOM_90Percent@lemmy.zip
        English
        arrow-up
        12
        arrow-down
        2
        ·
        edit-2
        21 hours ago

        Which is also generally very detectable (if you actually care) and is generally used to push (monetized) social media and not the answer to “what is the difference between normal and merino wool?” and so forth.

        The vast majority of the “user” interaction and memes is not the product at play (it IS useful for societal manipulation, but there are better platforms for that). It is all those useful questions and answers that people get pissy about folk deleting.

        Because people, generally, weren’t searching for “team edward or jacob reddit” but instead “the fuck is a renesme reddit” or “LED versus fluorescent bulbs reddit” and so forth. And the community labeling of the latter is generally REALLY good.

        Things DO get messy when the question becomes “best synthetic boxer briefs reddit” but… it is incredibly rare for an astroturfing campaign to be strong enough to get a genuinely bad product “on top”. You tend to just get an overly expensive branded white label in the top slot, which may actually be identical to the “real” best one anyway. But considering you are relying on human opinion regardless, that isn’t far off from what you were gonna get without said astroturfing.

        • Michael@slrpnk.net
          English
          arrow-up
          14
          ·
          edit-2
          20 hours ago

          As a user of the website for many years, it’s entirely obvious to me that a majority of the website is gamed (besides its niche corners). Reddit’s bot prevention mechanisms just fuck VPN users - they are wholly inadequate.

          That thread is evidence of an organized bot campaign that they had 12 days to clean up (and didn’t). It’s naive to believe that the rest of the website isn’t similarly (and less obviously) affected by bots - with vote manipulation still standing.

          • plyth@feddit.org
            English
            arrow-up
            3
            ·
            12 hours ago

            They could sell the cleaned votes to AI companies and keep the dirty data public for the scrapers.

            • Michael@slrpnk.net
              English
              arrow-up
              3
              ·
              edit-2
              12 hours ago

              Meta/OpenAI openly pirating everything they can to train their LLMs is a good example of how data-hungry these AI companies are.

              Is it plausible that companies ask Reddit to narrow down data, e.g. by demographic, geographic location, or likelihood of being a real person, and request that data for purchase? Sure, but the LLMs seemingly require all the data these companies can get their hands on. Given the scale of data being consumed (and data theft being committed), I highly doubt that the big players care too much about Reddit data being tainted. If anything, it might even be desirable to them.

            • Michael@slrpnk.net
              English
              arrow-up
              4
              ·
              edit-2
              18 hours ago

              Are you somebody invested financially in Reddit? Genuine question.

              Those niche subreddits can have their moments, too. Maybe it’s not bots, but plenty of shills have been caught in various niche subreddits I’ve frequented over the years (thanks to unpaid moderators).

              • FlexibleToast@lemmy.world
                English
                arrow-up
                1
                ·
                18 hours ago

                No, I’m not. I don’t care at all if they’re successful or go under.

                Sure, but again it’s not likely to be most. You don’t seem to realize how hard it is to get data that is already classified. That stuff is gold to people developing AI. Most of the work in data science is cleaning data and getting it into a usable form.

                • Michael@slrpnk.net
                  English
                  arrow-up
                  1
                  ·
                  edit-2
                  13 hours ago

                  It’s noise, a very large part of it. Reddit is financially motivated to make the data appear as if it is signal. It isn’t - they have taken extremely minimal steps to ensure actual human participation.

                  This doesn’t matter to AI companies, but it only warps that technology more and more. AI is a sinking ship with current methodologies. Reddit will die when the AI bubble bursts and those involved with Reddit already cashed out enough to be filthy rich.

                  • FlexibleToast@lemmy.world
                    English
                    arrow-up
                    1
                    ·
                    18 hours ago

                    At this point we’re just speculating. We don’t have evidence either way on whether it’s mostly good or mostly bad data.