There really should be more private trackers on I2P. Once you get over the slower speeds, there are a lot of benefits, like not needing to port forward to be connectable (though you should anyway, to support the I2P network).
You probably aren't connectable from the outside. You need to port forward to support the network.
IPFS is kind of like the BitTorrent DHT. Every file is addressed by a cryptographic hash of its content, and that hash can point to anything. There is no general way to search IPFS itself, but you can build an index on top of it and search that, the same way torrent search engines work.
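To illustrate the idea with a toy sketch (this is not real IPFS code, just the concept of content addressing plus a separate search index layered on top):

```python
import hashlib

# Toy model of content addressing: the "address" of a blob is derived
# purely from its bytes. (Real IPFS CIDs use multihash/CID encoding,
# but the principle is the same.)
def content_address(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# A separate, searchable index mapping keywords to content addresses.
# This is the part IPFS itself does not provide; someone has to build
# and host it, just like torrent search engines do for infohashes.
index = {}  # keyword -> set of content addresses

def add_to_index(keywords, data: bytes) -> str:
    addr = content_address(data)
    for kw in keywords:
        index.setdefault(kw.lower(), set()).add(addr)
    return addr

def search(keyword: str) -> set:
    return index.get(keyword.lower(), set())

# "Publish" a document, then find its address again by keyword.
addr = add_to_index(["debian", "iso"], b"...file bytes...")
print(search("debian"))  # -> {addr}
```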
You don't need a VPN if you use I2P!
Yes, exactly why I wanted to start this project. It's nice to have the Internet Archive but we cannot trust that content won't be taken down eventually. Even just storage costs might become an issue in the future for data that gets maybe 30 total views over many years. But it is nice to hear some of the data you were looking at is coming back.
Long term, it would be nice for a community of users to create a decentralized index of Internet Archive metadata so it cannot be taken down, including the torrent files for the content so people can share it and participate in seeding the content they care about. The Internet Archive might cooperate to make this easier, for example by using BitTorrent v2, which would help us detect file duplication and remove the need for padding files, since all files are aligned to pieces in v2.
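As a rough sketch of why v2 helps (the hashes below are made up; in practice you would read each file's "pieces root" out of real v2 .torrent files with a bencode parser):

```python
from collections import defaultdict

# Why BitTorrent v2 helps with deduplication: each file in a v2 torrent
# has its own Merkle "pieces root" (BEP 52), so identical files produce
# identical roots no matter which torrent they appear in. In v1, piece
# boundaries span files, so the same file can hash differently depending
# on its neighbours (hence the padding-file hack).
#
# Hypothetical data; real roots would come from the "file tree" of
# actual v2 .torrent files.
torrents = {
    "item_A_archive.torrent": {"photos/cat.jpg": "ab12...", "notes.txt": "ff00..."},
    "item_B_archive.torrent": {"backup/cat.jpg": "ab12..."},
}

seen = defaultdict(list)  # pieces root -> [(torrent, path), ...]
for torrent, files in torrents.items():
    for path, root in files.items():
        seen[root].append((torrent, path))

for root, locations in seen.items():
    if len(locations) > 1:
        print(f"duplicate file (pieces root {root}): {locations}")
```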
Currently there is little incentive for people to seed Internet Archive content, but no doubt it will become more important to do so in the future.
Update: Downloading all archive.org metadata
Following up from my previous post.
I used the API at https://archive.org/developers/changes.html to enumerate all the item names in the archive. Currently there are over 256 million item names. However, I went through a sample of them and noted the following:
- Many do not have the .torrent available because some of the files are locked due to copyright concerns, like their music collection. Ex: https://archive.org/details/lp_le-sonate-per-pianoforte-vol-1_carl-maria-von-weber-dino-ciani_0
- A lot of items have been removed from public access completely, and possibly deleted even on their storage backend. Ex: https://archive.org/details/0-5-1-0-hernan-hernandez
A huge number of items have been removed from the archive, far more than I expected. If you have critical data, of course, the Internet Archive should never be your only backup.
I don't know the distribution of metadata and .torrent file sizes, since I have not tried downloading them yet.
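For anyone curious, my enumeration loop looks roughly like this. The endpoint, auth, and field names are my reading of the changes API docs linked above, so treat them as assumptions and double-check against the docs:

```python
import requests

# Rough shape of the enumeration loop. NOTE: the endpoint, auth, and field
# names below are assumptions based on the changes API docs (it needs
# S3-style keys), so verify them against the docs before relying on this.
ENDPOINT = "https://be.archive.org/services/changes.json"  # assumed
params = {"access": "YOUR_ACCESS_KEY", "secret": "YOUR_SECRET_KEY"}

item_names = set()
token = None
while True:
    if token:
        params["token"] = token  # resume from where the previous page ended
    resp = requests.post(ENDPOINT, data=params, timeout=60)
    resp.raise_for_status()
    page = resp.json()
    for change in page.get("changes", []):
        item_names.add(change["identifier"])
    token = page.get("next_token")
    if not token:
        break

print(f"enumerated {len(item_names)} item names")
```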
The link to the above release post has the wrong caption for me. Its title says "Ambulance hits Oregon cyclist, rushes him to hospital, then sticks him with $1,800 bill, lawsuit says - Divisions by zero"
Yes, I think so. I'll definitely use the example for downloading some of the files (.torrent, metadata file) once I have some items. But first I need to find all the items ever uploaded.
Thank you for the tips. I am actually interested in enumerating metadata for all the "items" (as defined by the API page) ever uploaded. For example, one item = one ID:
> Archive.org is made up of “items”. An item is a logical “thing” that we represent on one web page on archive.org. An item can be considered as a group of files that deserve their own metadata.
You did cause me to look at the API docs again, though, and I think I found something that does enumerate all item names, and as a bonus, it will keep you updated when changes are made: https://archive.org/developers/changes.html
We'll see how much progress I can make. It might take a while to get through all the millions of them.
Downloading all archive.org metadata
I'd love to know if anyone's aware of a bulk metadata export feature or repository. I would like to have a copy of the metadata and .torrent files of all items.
I guess one way is to use the CLI, but that relies on knowing which item you want, and I don't know if there's a way to get a list of all items.
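For a single known item, the `internetarchive` Python library (the same package that provides the `ia` CLI) can grab just the metadata and the .torrent. A minimal sketch, assuming the usual `<identifier>_archive.torrent` naming convention:

```python
from internetarchive import get_item

# Minimal sketch: fetch metadata and just the .torrent for one known item.
# "some_item_identifier" is a placeholder; the "*_archive.torrent" glob
# assumes the usual naming convention on items, so adjust it if needed.
item = get_item("some_item_identifier")

print(item.metadata.get("title"))  # item metadata is a plain dict

# Download only the torrent file, skipping the actual content files.
item.download(glob_pattern="*_archive.torrent", verbose=True)
```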
I believe downloading via BitTorrent and seeding back is a win-win: it bolsters the Archive's resilience while easing server strain. I'll be seeding the items I download.
Edit: If you want to enumerate all item names in the entire archive.org repository, take a look at https://archive.org/developers/changes.html. This will do that for you!
Whatever happened to DNA-based storage research?
It seems like 6 or 7 years ago there was research into new forms of storage, using crystals or DNA, that promised ultra high density. I know the read/write speeds were not very fast, but I thought there would be more progress in the area by now. Apparently in 2021 a team managed to store a 16GB file in DNA. In the last month, some company (Biomemory) started offering to store 1KB of data in DNA for $1,000, but if you want to read it back, you have to send it to them. I don't understand why you would use that today.
I wonder if it will ever be viable for us to have DNA readers/writers... but I also wonder if there are other new types of data storage coming up that might be just as good.
If you know anything about the DNA research or other new storage forms, what do you think is the most promising one?
This was something I suggested for this instance, since there is even a guide for hosting an onion service: https://lemmy.dbzer0.com/post/135234
Maybe /u/db0 will have more time after the spam settles down, but it seems he's got a lot on his plate at the moment between being an admin and doing AI stuff.
Prediction: AT-style decentralized hoarding of the web
The more that content on the web is "locked down" with stricter API restrictions and identity verification, e.g. Twitter, the more I wonder if I should be archiving every single HTTP request my browser makes. Or, rather, I wonder whether in the future there will be an Archive Team style decentralized network of hoarders who, as they naturally browse the web, collectively establish and maintain an archive, creating a "shadow" database of content. This shadow archive would be owned entirely by the collective, so requests to it would not be subject to the limitations set by the source service.
The main point is that the hoarding is indistinguishable from regular browsing from the perspective of the source website, so the hoarding system can't be shut down without also locking out regular users.
Verification that the content actually came from the real service could probably be done using the HTTPS packets themselves, and some sort of reputation system could prevent the source websites, or anyone else, from poisoning the archive with fake content.
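As a toy sketch of what one node's local "shadow archive" store might look like (entirely hypothetical, and it hand-waves the hard parts, namely the provenance proofs and the reputation system):

```python
import hashlib
import time

# Toy model of one node's local "shadow archive": as the browser fetches
# pages, each response gets recorded, keyed by URL and a content hash.
# Comparing the same URL's hash across many independent nodes is the crude
# reputation signal; real provenance proofs from the HTTPS traffic itself
# are the unsolved (hand-waved) part.
archive = {}  # url -> list of recorded snapshots

def record_response(url: str, body: bytes) -> dict:
    entry = {
        "url": url,
        "fetched_at": int(time.time()),
        "sha256": hashlib.sha256(body).hexdigest(),
        "body": body,
    }
    archive.setdefault(url, []).append(entry)
    return entry

def corroborated(url: str, other_nodes_reports: list) -> bool:
    """Crude check: do other nodes report the same hash for this URL?"""
    ours = {e["sha256"] for e in archive.get(url, [])}
    theirs = {report.get(url) for report in other_nodes_reports}
    return bool(ours & theirs)

# Example: record a fetched page, then cross-check against a peer's report.
record_response("https://example.com/post/1", b"<html>...</html>")
peers = [{"https://example.com/post/1": hashlib.sha256(b"<html>...</html>").hexdigest()}]
print(corroborated("https://example.com/post/1", peers))  # True
```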
Note that H.264 and H.265 are the video compression standards, while x264 and x265 are FOSS video encoder libraries: x264 is developed by VideoLAN and x265 by MulticoreWare.
Warez: Do you pirate software or just use FOSS?
In the past, most software I used was paid and proprietary and would have some sort of limitation that I would try to get around by any means possible. Sometimes that meant resetting the clock on my computer or disabling the internet; other times, downloading a patch.
But in the past few years I've stopped using those things and have focused only on free and open source software (FOSS) to fulfill my needs. I hardly have to worry about privacy problems or trying to lock down a program that calls home. I might be missing out on some things that commercial software delivers, but I'm hardly aware of what they are anymore. It seems like the trend is for commercial software providers to migrate toward online or service models that have the company doing all the computing. I'm opposed to that, since they can take away your service at any time.
What do you do?
Proton is a good service, but their years of reluctance to add more anonymous payment methods such as Monero, and the fact that you can't register an account from an anonymous IP address without providing a phone number, make me question the relative benefit of using them as a VPN.
Neither of these by itself compromises anonymity, as long as Proton is trustworthy and Swiss law still allows them to keep your identity (revealed through payments) separate from your account usage. But regulation and governments tend to become stricter rather than looser over time, and I would demand more from a service you are entrusting with all of your internet traffic.