• Deceptichum@quokk.au
    3 days ago

    Damn, guess if you want Reddit data to train your AI, you’ll need to pay Spez for access.

    • tal@lemmy.today
      3 days ago

      It’s important for people writing papers and such who need to cite material.

      I wonder if there’s some way to use the TLS certificate to get a cryptographically-signed copy of a webpage, with a timestamp, that someone could later validate as having been downloaded on that date. I don’t know if existing TLS libraries are capable of that. Something like a Web browser menu option, “Store cryptographically-signed webpage”. Absent a later certificate compromise, I’d think that that’d at least provide people a way to credibly say “this is really what was on that webpage on August 15th, 2026”. You’d have to save a copy of the TLS session and then have libraries that could read and validate an already-generated session. The timestamp is already embedded in the session.

      Some protocols, like OTR, are designed to specifically not allow that, but AFAIK, TLS could.

      EDIT: Well, technically the timestamp is gonna be during the handshake, not tied to the HTTP request internal to the TLS session. It might be possible to game that by establishing a TLS session, holding it open without activity, and issuing a request much later. I’d think that that’d potentially be disallowed by Web servers one way or another, since otherwise you could probably do a denial-of-service attack by holding open a lot of sessions for a long time.

      EDIT2: Oh, wait, no, shouldn’t be an issue, because the HTTP Date response header is gonna have a timestamp tied to the response.
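For reference, a minimal sketch of pulling that timestamp out of a saved response, assuming the headers were stored as a plain dict. The `response_time` helper and the sample header value are made up for illustration; only `email.utils.parsedate_to_datetime` is a real standard-library call.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def response_time(headers):
    """Return the HTTP Date header as a datetime, or None if it is absent."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "date":
            return parsedate_to_datetime(value)
    return None


# Hypothetical saved headers, not taken from a real capture.
saved = {"Content-Type": "text/html", "Date": "Sat, 15 Aug 2026 12:00:00 GMT"}
print(response_time(saved))
```

This only reads the header. Whether anyone else should believe the value depends on the rest of the saved session being verifiable.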

        • tal@lemmy.today
          3 days ago

          Unfortunately, it’ll be more than that, as that’ll be saving the plaintext files transferred internal to the TLS connection. The information that would need to be saved will normally just be thrown out, as it’ll be the TLS connection itself.

          On second thought, though, I don’t think that it’d be viable. The way that something like this normally works is to use (slow) public key cryptography just to establish a symmetric session key, and to then use (fast) symmetric encryption on the bulk data. Both ends hold that session key, so once you have a copy of it, you could forge whatever you want with it. This would only work if you were using asymmetric encryption on the data in the connection itself.

          kagis

          https://www.cloudflare.com/learning/ssl/what-is-a-session-key/

          What is a session key? Session keys and TLS handshakes

          The TLS (historically known as “SSL”) protocol uses both asymmetric/public key and symmetric cryptography, and new keys for symmetric encryption have to be generated for each communication session. Such keys are called “session keys.”

          Yeah. Oh, well. It was a happy thought for a moment.
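A rough sketch of the session-key problem described above, with HMAC-SHA256 standing in for TLS record protection (real TLS uses AEAD ciphers such as AES-GCM, so this is an analogy and not the actual record format). The `tag` and `verify` helpers and the sample payloads are hypothetical.

```python
import hashlib
import hmac
import os


def tag(session_key, record):
    """Authenticate a record with the shared symmetric session key."""
    return hmac.new(session_key, record, hashlib.sha256).digest()


def verify(session_key, record, mac):
    """Check a record the way either endpoint (or a later auditor) would."""
    return hmac.compare_digest(tag(session_key, record), mac)


# After the handshake, the client and the server both hold this key.
session_key = os.urandom(32)

genuine = b"HTTP/1.1 200 OK\r\n\r\nwhat the server really sent"
genuine_mac = tag(session_key, genuine)

# The client also holds the key, so it can authenticate anything it likes.
forged = b"HTTP/1.1 200 OK\r\n\r\nwhatever the client wants to claim"
forged_mac = tag(session_key, forged)

print(verify(session_key, genuine, genuine_mac))
print(verify(session_key, forged, forged_mac))
```

Both checks pass, so a saved session plus its key can’t show a third party which side produced a given record. Only something signed with the server’s private key could do that.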

    • PastafARRian@lemmy.dbzer0.com
      3 days ago

      Don’t forget, Reddit is legally allowed to train on your content, but not the other way around. It’s consistent with US law, where corporate tax is half of income tax.