This is a question that comes to mind every time I spend a few days focusing on the fediverse. Normally I’m on the microblogging side, but now that I have a Lemmy account, I figured posting here might start a proper discussion.

So, to the point: pretty much every fedi platform has the same problem, with small servers taking a beating whenever a post goes viral. This ends up costing the server owner a lot of money to keep their server alive while thousands of instances try to pull large static files from the original host. It recently prompted this call to action on this forum.

I’ve never seen the question of torrents answered, and it feels like a lot of effort (and a bit self-entitled) to get the ear of fedi software devs to implement torrents as a solution, so I’m putting this here.

If media files were turned into torrents when a post was created, an extra object could be added to post objects, like:

"torrentcdn": {
  "https://imagePathAsKey.jpg": {
    "infohash": "ba618eab...",
    "torrentLocation": "https://directlinkto.torrent",
    "webseed": "https://imagePathAsKey.jpg",
    ...
  }
}

This would not break compatibility, as it would simply be ignored by anything not looking for a ‘torrentcdn’ object, while up-to-date instances could use it instead of pulling the static files directly.
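To make the compatibility point concrete, here is a minimal Python sketch of how an up-to-date instance might consume the proposed object. It assumes attachments arrive as an ActivityStreams-style `attachment` list whose entries carry a `url`; the `torrentcdn` key and its fields come from the proposal above, while `MediaSource` and `media_sources` are hypothetical names, not an existing Mastodon, Lemmy or ActivityPub API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class MediaSource:
    url: str                           # direct HTTP URL, always available
    infohash: Optional[str] = None     # set only when the post advertises a torrent
    torrent_url: Optional[str] = None
    webseed: Optional[str] = None


def media_sources(post: dict) -> list[MediaSource]:
    """Prefer the (hypothetical) 'torrentcdn' entry for each attachment and
    fall back to a plain HTTP download when there isn't one."""
    cdn = post.get("torrentcdn", {})   # absent on posts from servers without support
    sources = []
    for attachment in post.get("attachment", []):
        url = attachment["url"]
        entry = cdn.get(url)
        if entry is None:
            sources.append(MediaSource(url=url))
        else:
            sources.append(MediaSource(
                url=url,
                infohash=entry.get("infohash"),
                torrent_url=entry.get("torrentLocation"),
                webseed=entry.get("webseed", url),  # default to the original file
            ))
    return sources


post = {
    "type": "Note",
    "attachment": [
        {"type": "Document", "url": "https://example.social/media/a.jpg"},
        {"type": "Document", "url": "https://example.social/media/b.jpg"},
    ],
    "torrentcdn": {
        "https://example.social/media/a.jpg": {
            "infohash": "placeholder-infohash",   # not a real infohash
            "torrentLocation": "https://example.social/media/a.torrent",
        },
    },
}

for source in media_sources(post):
    print(source.url, "torrent" if source.infohash else "http")
```

Posts without the key, or attachments missing from it, take the plain HTTP path, which is exactly the backward-compatibility property described above.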

This would benefit instances because, when a post goes viral, the load would be distributed amongst all the instances attempting to download the file.

This could also benefit clients and instances, as larger files like short videos could be distributed using WebTorrent, massively reducing the load on the server when many people are watching the same video.

Thoughts?

  • key@lemmy.keychat.org · English · 44 upvotes · 5 months ago

    Mastodon and Lemmy handle this in slightly different ways. Mastodon (according to the link) replicates media on every instance, while Lemmy (mostly) only replicates thumbnails. That means a popular post doesn’t concentrate load on one server on Mastodon, but does on Lemmy. However, Mastodon has a higher aggregate cost due to all the replicated data, which is what the linked proposal addresses by making it sublinear.

    If the torrent is instance-to-instance, I don’t see any real benefit (and instance-to-client is infeasible). On the Mastodon side, you still have data duplication driving storage costs and bandwidth usage, regardless of whether it’s delivered via direct HTTP or torrent. On the Lemmy side, it wouldn’t gain much (the asymmetric load is driven by subscription count, so it isn’t very bursty), but it would add a lot of non-determinism and complexity to the already fragile federation process.

    Conventional solutions like caching, a CDN, or object storage, or switching to a shared hosting solution decoupled from instances (as your link proposes), seem like a more feasible way to address this.

    • manicdave@feddit.uk (OP) · 11 upvotes · 5 months ago

      This is a good answer.

      I’m not sure I’d agree that instance-to-client is infeasible, though. PeerTube does it OK.

      • key@lemmy.keychat.org · English · 8 upvotes · 5 months ago

        Data size and user expectations are the main differences. It’s possible, but there’d be a lot of latency and overhead just scrolling down a page with a bunch of images. Maybe there’s fancy stuff you could do by batching images together and reusing connection pools, but it feels Sisyphean.

        • manicdave@feddit.uk (OP) · 3 upvotes · 5 months ago

          The point would be that it’s a failover. It takes about two seconds for the video here to start streaming from the webseed, and that’s probably just the wait for enough video to load before it can render. The standard peers don’t really become load-bearing until the server is struggling.
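One way to read “failover” here, sketched in Python under the assumption that each source can be modelled as a function that returns a piece's bytes, or `None` when it can't serve it: ask the webseed (the original server) first, and only fall back to swarm peers when it stops answering. This is a hypothetical policy for illustration, and hash checks are left out for brevity; real clients such as WebTorrent generally treat a webseed as one more source alongside peers and request from several at once, so the shift in load is more gradual than this.

```python
from __future__ import annotations

from typing import Callable, Optional, Sequence

# A fetcher takes a piece index and returns that piece's bytes, or None when
# the source can't serve it (e.g. the origin is overloaded or timing out).
Fetcher = Callable[[int], Optional[bytes]]


def fetch_piece(index: int, webseed: Fetcher, peers: Sequence[Fetcher]) -> tuple[bytes, str]:
    """Hypothetical failover policy: the webseed carries the load while it is
    healthy, and swarm peers are only asked once it stops answering."""
    data = webseed(index)
    if data is not None:
        return data, "webseed"
    for i, peer in enumerate(peers):
        data = peer(index)
        if data is not None:
            return data, f"peer{i}"
    raise LookupError(f"piece {index} is unavailable from every source")


pieces = [b"aa", b"bb", b"cc"]

def healthy_server(i: int) -> Optional[bytes]:
    return pieces[i]

def overloaded_server(i: int) -> Optional[bytes]:
    return None  # simulates timeouts or 503 responses

def peer_with_copy(i: int) -> Optional[bytes]:
    return pieces[i]

print(fetch_piece(1, healthy_server, [peer_with_copy]))     # (b'bb', 'webseed')
print(fetch_piece(1, overloaded_server, [peer_with_copy]))  # (b'bb', 'peer0')
```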

  • atzanteol@sh.itjust.works · English · 28 upvotes · 5 months ago

    BitTorrent is optimized for “lots of users downloading large files over low-bandwidth connections”, which is sorta the opposite of this scenario.

    My understanding of the BitTorrent protocol is that it’s not very efficient for “small” files or low latency. The overhead of connecting could add latency that affects the end-user experience.

    • manicdave@feddit.uk (OP) · 6 upvotes, 2 downvotes · edited · 5 months ago

      I’m thinking in terms of what happens when someone on a $5 VPS hosting plan uploads a large image or small video and a thousand other instances want to grab it. The latency of a torrent isn’t as much of a problem as the server falling over. This is for propagation between servers rather than for when a user requests a file.
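The back-of-the-envelope arithmetic behind that worry, as a small Python sketch. The 5 MB file size is an illustrative assumption, and the swarm figure is an idealised floor (the origin has to upload every piece at least once) rather than a prediction of what a real swarm would achieve.

```python
def origin_egress_direct(file_bytes: int, instances: int) -> int:
    """Every instance pulls the whole file from the origin over HTTP."""
    return file_bytes * instances


def origin_egress_swarm_floor(file_bytes: int, instances: int) -> int:
    """Idealised floor: the origin uploads each piece at least once, and in the
    best case instances exchange everything else among themselves."""
    return file_bytes if instances > 0 else 0


FILE_BYTES = 5_000_000   # illustrative 5 MB image or short video
INSTANCES = 1_000        # "a thousand other instances"

print(f"direct HTTP: {origin_egress_direct(FILE_BYTES, INSTANCES) / 1e9:.3f} GB")       # direct HTTP: 5.000 GB
print(f"swarm floor: {origin_egress_swarm_floor(FILE_BYTES, INSTANCES) / 1e9:.3f} GB")  # swarm floor: 0.005 GB
```

A real swarm lands somewhere between the two numbers, with the webseed still serving a noticeable share, especially early on before peers hold many pieces.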

  • Admiral Patrick@dubvee.org · English · 19 upvotes, 1 downvote · 5 months ago

    I just put a caching layer in front of my /pictrs path. Problem solved.

    I use Nginx for that, but some instances use Cloudflare which can be configured similarly.
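The idea behind a caching layer, sketched in Python rather than as Nginx configuration: repeated requests for the same path are answered from the cache until the entry expires, so a viral image costs the upstream (pict-rs, in Lemmy's case) roughly one fetch per expiry window instead of one per request. `TTLCache` and `fetch_from_pictrs` are hypothetical names; in Nginx itself this is the job of the `proxy_cache` family of directives.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Read-through cache: the first request for a path goes upstream, and
    repeats are served from memory until the entry's TTL runs out."""

    def __init__(self, fetch: Callable[[str], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}  # path -> (expires_at, body)

    def get(self, path: str) -> bytes:
        now = self._clock()
        cached = self._entries.get(path)
        if cached is not None and now < cached[0]:
            return cached[1]                      # hit: upstream untouched
        body = self._fetch(path)                  # miss or expired: go upstream
        self._entries[path] = (now + self._ttl, body)
        return body


upstream_calls = []

def fetch_from_pictrs(path: str) -> bytes:
    """Hypothetical stand-in for the real image server behind /pictrs."""
    upstream_calls.append(path)
    return b"image bytes for " + path.encode()


cache = TTLCache(fetch_from_pictrs, ttl_seconds=3600)
for _ in range(1000):            # a burst of requests for one viral image
    cache.get("/pictrs/image/viral.jpg")
print(len(upstream_calls))       # 1
```

Unlike a real cache, this one has no size limit or eviction, so it only illustrates why the origin stops taking the hit.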

    • manicdave@feddit.uk (OP) · 6 upvotes · 5 months ago

      You could just have a standard PeerTube instance hidden away on the backend, use the PeerTube embed code to insert videos into your microblog, and pretend the PeerTube instance doesn’t exist.

      I’ve played with PeerTube a lot, and as long as your cross-site permissions are set up correctly, you can access the player API from your host site.

  • TCB13@lemmy.world · English · 6 upvotes · 5 months ago

    I believe what you’re suggesting is more along the lines of IPFS and I2P, but for large media files WebTorrent could be a great solution.

    • rmuk@feddit.uk · English · 4 upvotes · 5 months ago

      IPFS was my first thought. I’ve only recently started using it, but it’s pleasantly surprised me so far.

    • manicdave@feddit.uk (OP) · 3 upvotes · 5 months ago

      I wish IPFS were a solution, but it’s just broken. I’ve got GoToSocial running on a Raspberry Pi on a residential connection. If I try to run IPFS, my router crashes, as it seems to try to connect to every peer on the network.

  • pe1uca@lemmy.pe1uca.dev · 5 upvotes · 5 months ago

    How do torrents validate the files being served?

    Recently I read a post where the OP said they were transcoding torrents in place and still seeding them, and they asked whether this was possible, since the files were no longer the same.
    A comment said yes, the torrent was being seeded with the new files and they were “poisoning” the torrent.

    So, how could this be prevented if torrents were implemented as a CDN?
    And in general, how is this possible? I thought torrents could only use the original files, maybe checked with a hash, and would prevent any other data from being sent.

    • manicdave@feddit.uk (OP) · 11 upvotes · 5 months ago

      I don’t know what that post is about. It’s not possible to change the contents of a torrent. The torrent file itself contains a list of checksums that validate byte ranges within the files being downloaded. If a client downloads a poisoned piece, it discards it and deprioritises the seed it got it from. Perhaps they’re transcoding a file whilst still seeding the original.

      Torrents can work as a CDN for static files, because the downloader has to validate that the file is the same one as on the server using the checksums in the torrent file.
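A minimal Python sketch of the piece check described above, following the BitTorrent v1 layout where the `.torrent` file stores one 20-byte SHA-1 digest per fixed-size piece. The 16 KiB piece length and the synthetic 51,200-byte payload are illustrative assumptions. A single changed byte is enough for a piece to be rejected; a real client would then discard it and count it against the peer that sent it.

```python
from __future__ import annotations

import hashlib


def piece_hashes(data: bytes, piece_length: int) -> list[bytes]:
    """SHA-1 digest of each fixed-size piece (the last one may be shorter),
    i.e. what a v1 .torrent concatenates into its 'pieces' field."""
    return [hashlib.sha1(data[i:i + piece_length]).digest()
            for i in range(0, len(data), piece_length)]


def verify_piece(index: int, piece: bytes, hashes: list[bytes]) -> bool:
    """A downloader keeps a piece only if it hashes to the expected digest."""
    return hashlib.sha1(piece).digest() == hashes[index]


PIECE_LENGTH = 16 * 1024
original = bytes(range(256)) * 200             # 51,200 bytes standing in for an image
hashes = piece_hashes(original, PIECE_LENGTH)

good = original[PIECE_LENGTH:2 * PIECE_LENGTH]
poisoned = bytes([good[0] ^ 0xFF]) + good[1:]  # one altered byte, e.g. after re-encoding

print(len(hashes))                                                        # 4
print(verify_piece(1, good, hashes), verify_piece(1, poisoned, hashes))   # True False
```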

      • pe1uca@lemmy.pe1uca.dev · 3 upvotes · 5 months ago

        Yeah, I just searched a bit and found this https://stackoverflow.com/questions/28348678/what-exactly-is-the-info-hash-in-a-torrent-file

        The torrent file contains the hash of each piece, and its info hash is computed over the file information together with those piece hashes, so the client gets complete validation of both the content and the number of files being received.
        I wonder if clients validate only when receiving or also when sending the data. That way, seeding could be stopped if the file has been corrupted, instead of relying on the tracker or other clients to distrust someone who made a mistake like the OP of that post.
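The info hash itself can be reproduced in a few lines of Python: bencode the `info` dictionary (BEP 3 encoding, with keys sorted as raw byte strings) and take its SHA-1. This sketch covers only a single-file v1 torrent with the `name`, `length`, `piece length` and `pieces` keys; real `.torrent` files can carry more keys, and BitTorrent v2 uses SHA-256 and a different layout. Because the piece digests live inside `info`, changing any byte of the file changes the info hash.

```python
from __future__ import annotations

import hashlib


def bencode(value) -> bytes:
    """Minimal BEP 3 bencoding for ints, str/bytes, lists and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])      # keys must be sorted as raw byte strings
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_hash(name: str, data: bytes, piece_length: int) -> str:
    """SHA-1 of the bencoded 'info' dict for a single-file v1 torrent."""
    pieces = b"".join(hashlib.sha1(data[i:i + piece_length]).digest()
                      for i in range(0, len(data), piece_length))
    info = {"name": name, "length": len(data),
            "piece length": piece_length, "pieces": pieces}
    return hashlib.sha1(bencode(info)).hexdigest()


print(bencode({"cow": "moo", "spam": "eggs"}))   # b'd3:cow3:moo4:spam4:eggse'

image = bytes(range(256)) * 200
tampered = b"\x01" + image[1:]
print(info_hash("a.jpg", image, 16 * 1024) == info_hash("a.jpg", tampered, 16 * 1024))  # False
```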

      • William@lemmy.world · 7 upvotes · 5 months ago

        A torrent link won’t either? In either situation, the site needs to seed their own data, at a minimum.

        • manicdave@feddit.uk (OP) · 3 upvotes · 5 months ago

          A torrent file and a webseed are enough. The client uses the torrent file to validate the download from a standard HTTP source.

          The webseed can be the same source as the file your browser would normally download.

          So yeah the site needs to seed the file, but not necessarily using a torrent client.
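How a client might pull one piece from a plain HTTP webseed and check it, sketched in Python with only the standard library. It assumes a single-file torrent and BEP 19-style webseeding, where the client sends an HTTP `Range` request for the piece's byte span and verifies the result against the SHA-1 from the `.torrent` file. The function names are hypothetical, and real clients often request larger spans covering several pieces.

```python
from __future__ import annotations

import hashlib
import urllib.request


def piece_range(index: int, piece_length: int, total_length: int) -> tuple[int, int]:
    """Inclusive byte span of piece `index` in a single-file torrent."""
    start = index * piece_length
    if index < 0 or start >= total_length:
        raise IndexError(f"piece {index} is out of range")
    end = min(start + piece_length, total_length) - 1   # HTTP ranges are inclusive
    return start, end


def range_header(index: int, piece_length: int, total_length: int) -> dict[str, str]:
    start, end = piece_range(index, piece_length, total_length)
    return {"Range": f"bytes={start}-{end}"}


def fetch_verified_piece(webseed_url: str, index: int, piece_length: int,
                         total_length: int, expected_sha1: bytes) -> bytes:
    """Download one piece from an ordinary HTTP server and check it against
    the digest from the .torrent file before accepting it."""
    request = urllib.request.Request(
        webseed_url, headers=range_header(index, piece_length, total_length))
    with urllib.request.urlopen(request, timeout=10) as response:
        piece = response.read()
    if hashlib.sha1(piece).digest() != expected_sha1:
        raise ValueError(f"piece {index} failed its hash check; discarding it")
    return piece


# A 51,200-byte file with 16 KiB pieces: the last piece is the 2,048-byte tail.
print(range_header(0, 16 * 1024, 51_200))   # {'Range': 'bytes=0-16383'}
print(range_header(3, 16 * 1024, 51_200))   # {'Range': 'bytes=49152-51199'}
```

If the server ignores `Range` and returns the whole file, the hash check fails and the data is discarded, so a misbehaving webseed costs bandwidth but can't corrupt the download.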

  • steventrouble@programming.dev · 5 upvotes · edited · 5 months ago

    Great question! A distributed systems expert is the person you want to ask.

    This sounds like distributed systems 101, specifically the multicast problem, which has a lot of possible solutions, none of which Lemmy is using. In particular, gossip seems like a good candidate.

    Basically, when trying to download a media file from an instance, Lemmy servers could ping a few other nodes with less traffic first to see if they have that file before trying to fetch it from the origin server.
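A minimal Python sketch of the “ask a few quieter nodes first” idea: sort known neighbours by a load estimate, query up to `fanout` of them, and only hit the origin if none has the file. This is the lookup step only, not a full gossip protocol (nothing here spreads knowledge of who holds what); the neighbour callables, the load figures and `fetch_media` are all hypothetical.

```python
from __future__ import annotations

from typing import Callable, Dict, Optional

Lookup = Callable[[str], Optional[bytes]]   # returns cached bytes, or None on a miss


def fetch_media(url: str, origin: Callable[[str], bytes],
                neighbours: Dict[str, Lookup], load: Dict[str, float],
                fanout: int = 3) -> tuple[bytes, str]:
    """Ask up to `fanout` of the least-loaded neighbours for `url`, and only
    fetch from the origin server if none of them has it."""
    candidates = sorted(neighbours, key=lambda name: load.get(name, 0.0))[:fanout]
    for name in candidates:
        data = neighbours[name](url)
        if data is not None:
            return data, name
    return origin(url), "origin"


URL = "https://small.example/pictrs/image/viral.png"
origin_hits = []

def origin(url: str) -> bytes:
    origin_hits.append(url)
    return b"png bytes"

neighbours = {
    "busy.example": lambda url: None,
    "has-copy.example": {URL: b"png bytes"}.get,
    "quiet.example": lambda url: None,
}
load = {"busy.example": 0.9, "has-copy.example": 0.2, "quiet.example": 0.1}

print(fetch_media(URL, origin, neighbours, load))            # (b'png bytes', 'has-copy.example')
print(fetch_media(URL, origin, neighbours, load, fanout=1))  # (b'png bytes', 'origin')
```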

    (Lemmy isn’t big enough that we should need to use gossip yet, though. I agree with the other post saying that some basic caching would go a long way.)

    Internally, BitTorrent uses similar strategies to spread out traffic, but because it’s focused on longer-term storage of larger files, it would be too wasteful and slow to use for individual PNG files.

  • thingsiplay@beehaw.org · 3 upvotes · 5 months ago

    I’m not sure if this is relevant, but my first thought reading this was: aren’t the IPs shared too? I mean, everyone connecting to the torrent/page would be able to see the IP address of everyone else connecting to it. But again, I have no idea if this applies in such a case. It’s just what I think of when reading “torrent”; that’s the extent of my knowledge.