- cross-posted to:
- technology@beehaw.org
- news@lemmy.world
cross-posted from: https://feddit.org/post/2958203
There is an interesting study (May 2024), also linked in the article: When Online Content Disappears
Historians of the future may struggle to understand fully how we lived our lives in the early 21st Century. That’s because of a potentially history-deleting combination of how we live our lives digitally – and a paucity of official efforts to archive the world’s information as it’s produced these days.
However, an informal group of organisations are pushing back against the forces of digital entropy – many of them operated by volunteers with little institutional support. None is more synonymous with the fight to save the web than the Internet Archive, an American non-profit based in San Francisco, started in 1996 as a passion project by internet pioneer Brewster Kahle. The organisation has embarked on what may be the most ambitious digital archiving project of all time, gathering 866 billion web pages, 44 million books, 10.6 million videos of films and television programmes and more. Housed in a handful of data centres scattered across the world, the collections of the Internet Archive and a few similar groups are the only things standing in the way of digital oblivion.
“The risks are manifold. Not just that technology may fail, but that certainly happens. But more important, that institutions fail, or companies go out of business. News organisations are gobbled up by other news organisations, or more and more frequently, they’re shut down,” says Mark Graham, director of the Internet Archive’s Wayback Machine, a tool that collects and stores snapshots of websites for posterity. There are numerous incentives to put content online, he says, but there’s little pushing companies to maintain it over the long term.
Despite the Internet Archive’s achievements thus far, the organisation and others like it face financial threats, technical challenges, cyberattacks and legal battles from businesses who dislike the idea of freely available copies of their intellectual property. And as recent court losses show, the project of saving the internet could be just as fleeting as the content it’s trying to protect.
“More and more of our intellectual endeavours, more of our entertainment, more of our news, and more of our conversations exist only in a digital environment,” Graham says. “That environment is inherently fragile.”
What?
Yes, I know about the web archive. And I know that you can pull data without being logged in. But 100% of that data can be DMCAed at any point.
Wanna watch a trick?
https://tinyurl.com/missingf35
You can follow that link, it’s perfectly safe, and rather funny no less. It links to the archive…
https://web.archive.org/web/20230919001454if_/https://charleston.craigslist.org/avo/d/mount-pleasant-stealth-fighter/7667184419.html
Note the if_ after the date/time code. That bypasses their banner. None of my links are anywhere on the frontend of the archive, you literally have to know every link to find my archives.
And most of my archives aren’t even of websites, most of them are direct file downloads of older operating systems and games and stuff. Not like I’m about to share any of those here though.
I’ve been doing that for years and they haven’t found or removed a single thing I’ve archived. If they ever do, well so be it, but none of it is on the frontend, and the links are so obscure that there’s basically zero chance of anyone just randomly guessing them.
Somebody can always just get an offline copy of that data, which never hits the internet, so companies won’t know where it is and it can’t be DMCA’d.
A local copy on a single person’s storage that isn’t available to future researchers isn’t exactly meeting the requirements of this article.
I have a copy of slashdot when they turned it pink for April fools day. Does anyone know that? No. Could someone find it if they wanted to read it? No. Is that helpful for preservation? No. To be helpful I’d have to make it available and searchable. You know what that does? Makes it so it can be DMCA’d.
They can always make a torrent of it and share it like that if they are in a country with barely any DMCA laws.
Big “if” though, and that would be contingent on the fact that the data is desirable enough that other people are willing and able to host it long-term, even before being able to find a country like that, and set up a torrent. I’ve a few torrents that are dead now, for example, because people weren’t that interested in keeping a copy of what they pointed to/the tracker no longer works.
You’d still need to share the torrent to spread it anyhow, and that runs into the DMCA issue all over again. The Pirate Bay only hosted torrents and magnet links, but it still got shut down for piracy, way back when. “Facilitating pervasive online infringement [of copyright]” is something that can get you shut down, as LimeWire found out.
Actually no. They make it difficult and “don’t allow” people to download data from the wayback machine.
Funny you’d say that. If you manipulate the link and add if_ or fw_ after the date code, you can most certainly download files directly from the wayback machine.
Oo, can I have an example?
https://web.archive.org/web/20230919001454if_/https://charleston.craigslist.org/avo/d/mount-pleasant-stealth-fighter/7667184419.html
Edit: Yes that’s clearly an archived website and not a file to direct download, but that same if_ banner bypass trick works just the same for individual file downloads.
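For anyone who wants to script the link rewrite described above, here is a minimal sketch. It assumes the URL carries a 14-digit date code like the one in the example link, and that any existing modifier is two lowercase letters plus an underscore (as with if_ and fw_). The function name with_modifier is made up for illustration, and this only edits the string; it doesn't contact the archive or guarantee what the server returns.

```python
import re

# Assumed URL shape: https://web.archive.org/web/<14-digit date code>[modifier]/<original URL>
# The optional modifier group covers forms like "if_" and "fw_" mentioned in the thread.
_WAYBACK_URL = re.compile(
    r"^(https?://web\.archive\.org/web/)(\d{14})([a-z]{2}_)?/(.+)$"
)


def with_modifier(url: str, modifier: str = "if_") -> str:
    """Return the Wayback Machine URL with `modifier` placed after the date code.

    Any modifier already present is replaced, so calling this twice is harmless.
    """
    match = _WAYBACK_URL.match(url)
    if match is None:
        raise ValueError(f"not a recognised Wayback Machine snapshot URL: {url}")
    prefix, date_code, _old_modifier, original = match.groups()
    return f"{prefix}{date_code}{modifier}/{original}"


if __name__ == "__main__":
    plain = (
        "https://web.archive.org/web/20230919001454/"
        "https://charleston.craigslist.org/avo/d/"
        "mount-pleasant-stealth-fighter/7667184419.html"
    )
    print(with_modifier(plain))          # date code followed by if_
    print(with_modifier(plain, "fw_"))   # date code followed by fw_
```

Passing "fw_" as the second argument gives the other variant mentioned earlier in the thread.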
Thank you