datahoarder @lemmy.ml

Update: Downloading all archive.org metadata

Following up from my previous post.

I used the API at https://archive.org/developers/changes.html to enumerate all the item names in the archive. Currently there are over 256 million item names. However, I went through a sample of them and noted the following:

Many, many items have been removed from the archive, far more than I expected. If you have critical data, Internet Archive should of course never be your only backup.

I don't know the distribution of metadata and .torrent file sizes since I have not tried downloading them yet. Storage could be substantial for items with many files or large content: if only 50% of the items remain and the average .torrent plus metadata is 20 KB, that is over 2.5 TB. On the other hand, the archive has a lot of random one-off uploads that are not very big, where the metadata is around 800 bytes and the torrent around 3 KB. If the combined average is closer to 5 KB, the same 50% would need only about 640 GB.
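The back-of-the-envelope numbers above can be reproduced with a short sketch. The function name `estimate_storage_bytes` is just illustrative, the 50% survival rate and the per-item sizes are the guesses from this post rather than measurements, and KB/GB/TB are taken as decimal units (1 KB = 1000 bytes).

```python
def estimate_storage_bytes(total_items, surviving_fraction, avg_bytes_per_item):
    """Rough storage estimate: surviving items times average size per item."""
    return int(total_items * surviving_fraction * avg_bytes_per_item)


TOTAL_ITEMS = 256_000_000   # item names enumerated so far (from the post)
SURVIVING_FRACTION = 0.5    # assumption: only half of the items still exist

# Pessimistic case: .torrent + metadata averages 20 KB per item.
high = estimate_storage_bytes(TOTAL_ITEMS, SURVIVING_FRACTION, 20_000)

# Optimistic case: .torrent + metadata averages 5 KB per item.
low = estimate_storage_bytes(TOTAL_ITEMS, SURVIVING_FRACTION, 5_000)

print(f"high estimate: {high / 1e12:.2f} TB")  # 2.56 TB
print(f"low estimate:  {low / 1e9:.0f} GB")    # 640 GB
```

Once a real sample of .torrent and metadata files has been downloaded, the measured average size can be plugged in place of the two guesses.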

3 comments
  • Thanks for the update. In my recent research and backups, I found a lot of content that is no longer available. The entries are still there; only the files are missing. I think some files reappeared later. My assumption is that most of the data is slowly being restored from a backup, if they have one. Torrent and metadata files are generated automatically, so if they were deleted, I assume they would be rebuilt after some time. There is so much data that I have no idea how long this would take...

    I will keep checking some files again and again to see if they come back. Otherwise we have lost a lot of data and history.