Hey guys. About a week ago I posted a thread asking about sources for public domain data, because what good is a search engine if there is nothing to search for! After digging around, it seemed like the Internet Archive was the best bet, but after about 5 minutes I realized that once I do find something to upload to Autonomi, I have to write all the metadata, which is a very tedious, thankless job. If I didn’t want to do it, I knew nobody else would either. What I needed was a way to automate all this, which is what this is!
This is all bundled in the colony-utils repo. Binaries are prebuilt for your convenience for Linux, Windows, and Mac. Grab the latest release here.
ia_downloader - Internet Archive Downloader
A tool for downloading content from the Internet Archive and preparing it for upload to Autonomi.
Key Features:
Downloads files from Internet Archive URLs with specified file extensions
Enhanced metadata extraction using multiple sources (Internet Archive, external APIs, AI via Hugging Face)
Progress tracking
Automatic thumbnail downloading and processing
Generates JSON-LD metadata
Configurable AI-powered metadata enhancement
Output structure for colony_uploader integration
colony_uploader - Bulk Upload Tool
A tool for uploading downloaded Internet Archive content to the Autonomi network via colonyd.
Key Features:
Parallel processing of upload directories (EXPERIMENTAL: doesn’t work for me, let me know if it works for you!)
Real-time progress tracking
Cost tracking (ANT tokens and ETH gas fees)
Metadata upload to colony pods with proper JSON-LD formatting
Upload statistics and timing information
Quick Start
First you need to set up colonyd and have it running in a separate terminal. In colonyd you’ll need to plug in your wallet information so you can pay for uploads. Like anything else, this is new software. I didn’t lose any money running it, but that doesn’t mean you won’t, so don’t plug your lambo-when-moon ANT wallet into this thing; use a wallet with small amounts.
With colonyd set up and running elsewhere:
# create a new pod for stuff to go into
colony add pod "My Stuff"
# Then find a link on IA for something you want to download. Note that this thing is
# currently tuned for ebooks; your mileage may vary with other data types until I get it tuned properly.
# After you've found your link, call ia_downloader, specifying the pod where you want stuff to go and the file extensions you want to download
ia_downloader "My Stuff" "https://archive.org/details/george-orwell-1984_202309" "pdf,epub"
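If you have a whole list of items to fetch, the call above can be scripted. The following is only a sketch: it assumes the three positional arguments shown in the example (pod name, URL, comma-separated extensions) and that the item identifier is the path segment after /details/. The helper names ia_identifier and build_commands are made up for illustration and are not part of colony-utils.

```python
import shlex
from urllib.parse import urlparse


def ia_identifier(url: str) -> str:
    """Return the item identifier from an archive.org /details/ URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2 or parts[0] != "details":
        raise ValueError(f"not an archive.org details URL: {url}")
    return parts[1]


def build_commands(pod: str, urls: list[str], extensions: str) -> list[list[str]]:
    """Build one ia_downloader argv per URL, skipping repeated identifiers."""
    seen = set()
    commands = []
    for url in urls:
        ident = ia_identifier(url)
        if ident in seen:
            continue  # same item listed twice, download it only once
        seen.add(ident)
        # Argument order copied from the example call; an assumption about the CLI.
        commands.append(["ia_downloader", pod, url, extensions])
    return commands


if __name__ == "__main__":
    urls = [
        "https://archive.org/details/george-orwell-1984_202309",
        "https://archive.org/details/tarzanofapes00burruoft",
    ]
    for argv in build_commands("My Stuff", urls, "pdf,epub"):
        # Print a copy-pasteable shell line instead of running anything.
        print(shlex.join(argv))
```

To actually run the downloads, each argv list could be passed to subprocess.run(argv, check=True) with colonyd already running.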
When ia_downloader is done, you’ll see output like this:
🏛️ Internet Archive Downloader
📋 Identifier: george-orwell-1984_202309
📁 Extensions: pdf
📂 Created directory: colony_uploader/george-orwell-1984_202309
⠁ 📋 Downloading files list... 📝 Pod name saved: colony_uploader/george-orwell-1984_202309/pod_name.txt
✅ Downloaded metadata for: 1984
👤 Author: George Orwell
🖼️ Thumbnail: ant://fb995d624684c50d9c5d0101a28729f8f5d003e6d63726f81a43861b8df048ef
🔍 Enhancing metadata... Trying multiple sources + AI
✅ AI enhancement successful (bart-large-cnn)
✅ Metadata enhanced from: Internet Archive (Enhanced) + Open Library + Wikipedia + AI
📁 Found 1 files to download
📂 Created directory: colony_uploader/george-orwell-1984_202309/pdf
✅ George Orwell - 1984.pdf -> ant://48bd835e4eef2631c66a840904a5cc114ca2403f2aa63b3d00aa789f6f1f2632
📊 Download Summary:
📁 Files downloaded: 1
💾 Total size: 1. MB (1644307 bytes)
🔍 Metadata source: Internet Archive (Enhanced) + Open Library + Wikipedia + AI
🖼️ Thumbnail: Downloaded
🎉 Download completed! Files saved to: colony_uploader/george-orwell-1984_202309
It will also create a colony_uploader directory in the directory where you’re working (your PWD). In it you’ll find the files you downloaded and one or more metadata.json files that look like this:
{
  "@context": {
    "schema": "http://schema.org/"
  },
  "@id": "ant://48bd835e4eef2631c66a840904a5cc114ca2403f2aa63b3d00aa789f6f1f2632",
  "@type": "schema:Book",
  "schema:alternateName": "1984",
  "schema:author": "George Orwell",
  "schema:contentSize": "1644307",
  "schema:datePublished": "1949",
  "schema:description": "The story takes place in an imagined future in an unknown year believed to be 1984. Great Britain, now known as Airstrip One, has become a province of the totalitarian superstate Oceania. Orwell modelled the authoritarian state in the novel on the Soviet Union in the era of Stalinism, and Germany under the Third Reich.",
  "schema:encodingFormat": "application/pdf",
  "schema:image": "ant://fb995d624684c50d9c5d0101a28729f8f5d003e6d63726f81a43861b8df048ef",
  "schema:inLanguage": "eng",
  "schema:keywords": "ebook, pdf, orwell, 1984",
  "schema:name": "George Orwell - 1984.pdf"
}
What I found is that the Internet Archive metadata actually sucks. So I also search Wikipedia and Open Library, but due to redirects and bad IA descriptions of what the book actually is, that typically sucks too. So I can also use a Hugging Face API token to query one of the AI models and optimize the schema description. It isn’t perfect, but it sure beats doing this by hand! At this point feel free to edit whatever metadata you want in here, or if there are multiple copies of stuff, you can delete those as well. It’s good to have a human touch in this process. Later, I’ll add plugins for MusicBrainz and TMDB to get better metadata for music and video.
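If you end up with a lot of items, a quick script can point out which metadata files deserve that human touch. This is a rough sketch, not part of colony-utils: the list of expected keys is simply taken from the sample record, treating them as required is my assumption, and the 40-character description threshold is arbitrary. It looks at every .json file under colony_uploader that has an @context key.

```python
import json
from pathlib import Path

# Keys copied from the sample record; treating them as required is an assumption.
EXPECTED_KEYS = (
    "@id",
    "@type",
    "schema:name",
    "schema:author",
    "schema:description",
    "schema:encodingFormat",
)


def check_record(record: dict) -> list[str]:
    """Return human-readable problems found in one JSON-LD record."""
    problems = [f"missing {key}" for key in EXPECTED_KEYS if not record.get(key)]
    if record.get("@id") and not str(record["@id"]).startswith("ant://"):
        problems.append("@id is not an ant:// address")
    description = record.get("schema:description", "")
    if description and len(description) < 40:  # arbitrary threshold
        problems.append("description looks too short to be useful")
    return problems


def review_tree(root: Path) -> dict[str, list[str]]:
    """Check every JSON-LD file under root; map each path to its problems."""
    report = {}
    for path in sorted(root.rglob("*.json")):
        try:
            record = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as err:
            report[str(path)] = [f"invalid JSON: {err}"]
            continue
        # Only look at objects that carry a JSON-LD context.
        if isinstance(record, dict) and "@context" in record:
            report[str(path)] = check_record(record)
    return report


if __name__ == "__main__":
    for path, problems in review_tree(Path("colony_uploader")).items():
        status = "ok" if not problems else "; ".join(problems)
        print(f"{path}: {status}")
```

Anything it flags still needs a person to decide what the right value is; the script only tells you where to look.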
So now you’ve got some stuff and you want to upload it. All you need to do is run this:
colony_uploader
Depending on how much stuff you have, this will probably run for a while. It will upload all of the files, the metadata, and upload your pod to Autonomi.
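Since uploads cost real tokens, it can be worth previewing what is sitting in the colony_uploader directory before kicking it off. Here is a small sketch that lists each item directory with its pod name, file count, and total size. It relies on the pod_name.txt file that the downloader output shows being saved in each item directory; exactly which directories and files colony_uploader itself picks up is an assumption on my part, and preview_uploads is a made-up helper name.

```python
from pathlib import Path


def preview_uploads(root: Path) -> list[tuple[str, str, int, int]]:
    """Return (directory, pod name, file count, total bytes) per item directory."""
    rows = []
    for item in sorted(p for p in root.iterdir() if p.is_dir()):
        pod_file = item / "pod_name.txt"
        if pod_file.is_file():
            pod = pod_file.read_text(encoding="utf-8").strip()
        else:
            pod = "(no pod_name.txt)"
        # Count everything except the pod marker itself.
        files = [f for f in item.rglob("*") if f.is_file() and f.name != "pod_name.txt"]
        total = sum(f.stat().st_size for f in files)
        rows.append((item.name, pod, len(files), total))
    return rows


if __name__ == "__main__":
    root = Path("colony_uploader")
    if root.is_dir():
        for name, pod, count, size in preview_uploads(root):
            print(f"{name}: pod={pod!r} files={count} bytes={size}")
```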
While colony_uploader is running, it looks like this (I’m currently uploading a collection of public domain “banned” books and a couple of Charlie Chaplin films):
🏛️ Colony Uploader
🌐 Server: 127.0.0.1:3000
🧵 Threads: 1
📁 Directory: colony_uploader
🔐 Enter colonyd password: [hidden]
🔑 Authenticating with colonyd...
✅ Authentication successful
💰 Initial wallet balance: 0.010900 ETH
🔍 Scanning for uploader directories...
📁 Found 26 uploader directories
⠤ [################>-----------------------] 11/26 Processing directories...
✅ george-orwell-1984_202309 Success
✅ tarzanofapes00burruoft Success
✅ candide00volt_1 Success
✅ wonderfulwizardo00baumiala Success
✅ ulyssesshake1922_1large Success
✅ pointcounterpoin0000aldo_k3u2 Success
✅ jungle00sinc Success
✅ wealthofnations00smituoft Success
✅ beautifuldamned00fitzrich Success
✅ warofworlds00welluoft Success
✅ 6edoriginspecies00darwuoft Success
⠁ 🔄 charlie-chaplin-his-prehistoric-past-1914 📤 Charlie Chaplin - His Prehistoric Past (1914).ia.mp4
⠁ ⏳ callofthewild00lond Pending...
⠁ ⏳ in.ernet.dli.2015.238114 Pending...
⠁ ⏳ holybiblecontain00philuoft Pending...
⠁ ⏳ socialismutopian39257gut Pending...
⠁ ⏳ madamebovary00flau_5 Pending...
⠁ ⏳ adventuresoftoms00twaiiala Pending...
⠁ ⏳ fairytalesofbrot00grim Pending...
⠁ ⏳ plainliteraltran10burtuoft Pending...
⠁ ⏳ thestateandrevolutionbyv.i.lenin Pending...
⠁ ⏳ bostoniansnovel00jamerich Pending...
⠁ ⏳ the-communist-manifesto_202507 Pending...
⠁ ⏳ blackbeauty00seweiala Pending...
When it’s done it will tell you how long it took, how much data was uploaded, the ANT cost, and the ETH gas cost.
Then pat yourself on the back. Thank you for populating data to Autonomi (and making our node runners work for their ANT).
Adventures in vibe coding (opinion part)
After some discussions on the forum about using AI agents to write code, I decided this would be a good vehicle to test whether I could make something practical without coding it myself. No, I did not code either the colony_uploader or ia_downloader programs at all. These were built from a clean, thought-out description of exactly what I wanted and a few iterations to perfect the UI. I thought of the idea Friday evening, kicked off the agent Saturday afternoon, finished it by Sunday, and today I’m uploading stuff to main net. Based on the stopwatch, I spent 8 hours 42 minutes scoping out the architecture from scratch, interacting with the AI agent, and debugging/reading through the resultant code. That is 2046 lines of agent-generated stuff that accomplishes exactly what I set out to do, all generated while I was busy refactoring all of the wallet handling code in colonyd, the colony CLI, and some tweaks to colonylib. Is the code pretty? Nope. Could a well-seasoned Rust coder have done it better by hand? Absolutely. Could I have built this over a weekend by myself? Not a chance. I wouldn’t have even tried. And that, I think, is the point. Because AI agents exist, we have this tool; without this technology, we wouldn’t.
(end opinion)
For those that are curious, I’ve been using the free tier of AugmentCode for this stuff.
Hope you all find this useful! If you use the tool, please share your pod addresses so we can find your uploads!
What about Colony GUI?
I’m busy polishing it with my frontend guy. We’ll be making the official release Wednesday, stay tuned!