:IF: Colony - Internet Archive -> Autonomi bulk transfer tool

Hey guys. About a week ago I posted asking about sources for public domain data, because what good is a search engine if there is nothing to search for! After digging around, the Internet Archive seemed like the best bet, but after about 5 minutes I realized that once I do find something to upload to Autonomi, I have to write all the metadata by hand: a tedious, thankless job, and if I didn’t want to do it, I knew nobody else would either. What I needed was a way to automate all of this, and that’s exactly what this is! It’s all bundled in the colony-utils repo, with prebuilt binaries for Linux, Windows, and Mac for your convenience. Grab the latest release here.

:inbox_tray: ia_downloader - Internet Archive Downloader

A tool for downloading content from the Internet Archive and preparing it for upload to Autonomi.

Key Features:

:classical_building: Downloads files from Internet Archive URLs with specified file extensions
:magnifying_glass_tilted_left: Enhanced metadata extraction using multiple sources (Internet Archive, external APIs, AI via Hugging Face)
:bar_chart: Progress tracking
:framed_picture: Automatic thumbnail downloading and processing
:memo: Generates JSON-LD metadata
:bullseye: Configurable AI-powered metadata enhancement
:file_folder: Output structure for colony_uploader integration

:outbox_tray: colony_uploader - Bulk Upload Tool

A tool for uploading downloaded Internet Archive content to the Autonomi network via colonyd.

Key Features:

:high_voltage: Parallel processing of upload directories (EXPERIMENTAL: doesn’t work for me, let me know if it works for you!)
:bar_chart: Real-time progress tracking
:money_bag: Cost tracking (ANT tokens and ETH gas fees)
:memo: Metadata upload to colony pods with proper JSON-LD formatting
:chart_increasing: Upload statistics and timing information

:rocket: Quick Start

First you need to set up colonyd and have it running in a separate terminal. In colonyd you’ll need to plug in your wallet information so you can pay for uploads. Like anything else, this is new software: I didn’t lose any money running it, but that doesn’t mean you won’t, so don’t plug your lambo-when-moon ANT wallet into this thing; use small amounts.

With colonyd setup and running elsewhere:

# create a new pod for your stuff to go into
colony add pod "My Stuff"

# Then find a link on IA to something you want to download. Note that this tool is
# currently tuned for ebooks; your mileage may vary with other data types until I get it tuned properly.

# After you've found your link, call ia_downloader, specifying the pod where you want stuff to go and the file extensions you want to download
ia_downloader "My Stuff" "https://archive.org/details/george-orwell-1984_202309" "pdf,epub"

When it’s done you’ll see output like this:

🏛️ Internet Archive Downloader

📋 Identifier: george-orwell-1984_202309
📁 Extensions: pdf
📂 Created directory: colony_uploader/george-orwell-1984_202309
⠁ 📋 Downloading files list...
📝 Pod name saved: colony_uploader/george-orwell-1984_202309/pod_name.txt
✅ Downloaded metadata for: 1984
👤 Author: George Orwell
🖼️ Thumbnail: ant://fb995d624684c50d9c5d0101a28729f8f5d003e6d63726f81a43861b8df048ef
🔍 Enhancing metadata... Trying multiple sources + AI
✅ AI enhancement successful (bart-large-cnn)
✅ Metadata enhanced from: Internet Archive (Enhanced) + Open Library + Wikipedia + AI
📁 Found 1 files to download
📂 Created directory: colony_uploader/george-orwell-1984_202309/pdf
✅ George Orwell - 1984.pdf -> ant://48bd835e4eef2631c66a840904a5cc114ca2403f2aa63b3d00aa789f6f1f2632

📊 Download Summary: 
   📁 Files downloaded: 1
   💾 Total size: 1.64 MB (1644307 bytes)
   🔍 Metadata source: Internet Archive (Enhanced) + Open Library + Wikipedia + AI
   🖼️ Thumbnail: Downloaded

🎉 Download completed! Files saved to: colony_uploader/george-orwell-1984_202309

It will also create a colony_uploader directory in the PWD where you’re working. In this directory you’ll find the files you downloaded and one or more metadata.json files that look like this:

{
  "@context": {
    "schema": "http://schema.org/"
  },
  "@id": "ant://48bd835e4eef2631c66a840904a5cc114ca2403f2aa63b3d00aa789f6f1f2632",
  "@type": "schema:Book",
  "schema:alternateName": "1984",
  "schema:author": "George Orwell",
  "schema:contentSize": "1644307",
  "schema:datePublished": "1949",
  "schema:description": "The story takes place in an imagined future in an unknown year believed to be 1984. Great Britain, now known as Airstrip One, has become a province of the totalitarian superstate Oceania. Orwell modelled the authoritarian state in the novel on the Soviet Union in the era of Stalinism, and Germany under the Third Reich.",
  "schema:encodingFormat": "application/pdf",
  "schema:image": "ant://fb995d624684c50d9c5d0101a28729f8f5d003e6d63726f81a43861b8df048ef",
  "schema:inLanguage": "eng",
  "schema:keywords": "ebook, pdf, orwell, 1984",
  "schema:name": "George Orwell - 1984.pdf"
}

What I found is that the Internet Archive metadata actually sucks. So I also search Wikipedia and Open Library, but due to redirects and bad IA descriptions of what the book actually is, that typically sucks too. So I can also use a Hugging Face API token to query one of the AI models and optimize the schema description. It isn’t perfect, but it sure beats doing this by hand! At this point, feel free to edit whatever metadata you want in here, and if there are multiple copies of something you can delete those as well. It’s good to have a human touch in this process. Later, I’ll add plugins for MusicBrainz and TMDB to get better metadata for music and video.
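For the curious, the AI step boils down to a single summarization call. Here is a rough sketch against the standard Hugging Face Inference API, using the bart-large-cnn endpoint mentioned later in this thread; everything else (the function name, payload parameters) is illustrative, not the tool’s actual code:

```python
import json
import urllib.request

# Endpoint is the default model URL quoted later in the thread; the payload
# shape is the standard Hugging Face Inference API summarization request.
HF_URL = "https://api-inference.huggingface.co/models/facebook/bart-large-cnn"

def build_summary_request(description: str, api_key: str) -> urllib.request.Request:
    """Build a request that asks the model to condense a noisy IA description."""
    payload = {
        "inputs": description,
        "parameters": {"max_length": 130, "min_length": 30},
    }
    return urllib.request.Request(
        HF_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# To actually call it (needs a valid token and network access):
# with urllib.request.urlopen(build_summary_request(desc, token)) as resp:
#     summary = json.load(resp)[0]["summary_text"]
```

The summary that comes back replaces the messy scraped description in `schema:description` before you give it the final human once-over.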

So now you’ve got some stuff and you want to upload it. All you need to do is run this:

colony_uploader

Depending on how much stuff you have, this will probably run for a while. It will upload all of the files and metadata, then upload your pod to Autonomi.
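Under the hood it scans for the directories ia_downloader prepared. A rough Python sketch of that scan, assuming the layout shown above (a pod_name.txt plus metadata*.json files per item; the function name is mine, not colony_uploader’s actual internals):

```python
from pathlib import Path

# Layout assumed from the ia_downloader output shown above:
#   colony_uploader/<item>/pod_name.txt
#   colony_uploader/<item>/<ext>/metadata*.json
# Illustrative sketch only; colony_uploader's real code may differ.
def scan_uploader_dirs(root: str = "colony_uploader"):
    """Yield (item name, pod name, metadata files) for each prepared directory."""
    for item in sorted(Path(root).iterdir()):
        pod_file = item / "pod_name.txt"
        if not item.is_dir() or not pod_file.is_file():
            continue  # not a directory ia_downloader prepared
        yield item.name, pod_file.read_text().strip(), sorted(item.rglob("metadata*.json"))
```

Each yielded entry corresponds to one spinner line in the progress display below.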

While running it looks like this (I’m uploading a collection of public domain “banned” books and a couple Charlie Chaplin films currently :smile:):

🏛️ Colony Uploader

🌐 Server: 127.0.0.1:3000
🧵 Threads: 1
📁 Directory: colony_uploader

🔐 Enter colonyd password: [hidden]
🔑 Authenticating with colonyd...
✅ Authentication successful
💰 Initial wallet balance: 0.010900 ETH
🔍 Scanning for uploader directories...
📁 Found 26 uploader directories

⠤ [################>-----------------------] 11/26 Processing directories...
  ✅ george-orwell-1984_202309 Success
  ✅ tarzanofapes00burruoft Success
  ✅ candide00volt_1 Success
  ✅ wonderfulwizardo00baumiala Success
  ✅ ulyssesshake1922_1large Success
  ✅ pointcounterpoin0000aldo_k3u2 Success
  ✅ jungle00sinc Success
  ✅ wealthofnations00smituoft Success
  ✅ beautifuldamned00fitzrich Success
  ✅ warofworlds00welluoft Success
  ✅ 6edoriginspecies00darwuoft Success
⠁ 🔄 charlie-chaplin-his-prehistoric-past-1914 📤 Charlie Chaplin - His Prehistoric Past (1914).ia.mp4
⠁ ⏳ callofthewild00lond Pending...
⠁ ⏳ in.ernet.dli.2015.238114 Pending...
⠁ ⏳ holybiblecontain00philuoft Pending...
⠁ ⏳ socialismutopian39257gut Pending...
⠁ ⏳ madamebovary00flau_5 Pending...
⠁ ⏳ adventuresoftoms00twaiiala Pending...
⠁ ⏳ fairytalesofbrot00grim Pending...
⠁ ⏳ plainliteraltran10burtuoft Pending...
⠁ ⏳ thestateandrevolutionbyv.i.lenin Pending...
⠁ ⏳ bostoniansnovel00jamerich Pending...
⠁ ⏳ the-communist-manifesto_202507 Pending...
⠁ ⏳ blackbeauty00seweiala Pending...

When it’s done it will tell you how long it took, how much data was uploaded, the ANT cost, and the ETH gas cost.

Then pat yourself on the back. Thank you for populating Autonomi with data (and making our node runners work for their ANT :laughing:)

Adventures in vibe coding (opinion part)

After some discussions on the forum about using AI agents to write code, I decided this would be a good vehicle for a test: could I make something practical without coding it myself? No, I did not write any of the code in colony_uploader or ia_downloader. These were built from a clean, thought-out description of exactly what I wanted and a few iterations to perfect the UI. I thought of the idea Friday evening, kicked off the agent Saturday afternoon, finished it by Sunday, and today I’m uploading stuff to mainnet. Based on the stopwatch, I spent 8 hours 42 minutes scoping out the architecture from scratch, interacting with the AI agent, and debugging/reading through the resultant code: 2046 lines of agent-generated stuff that accomplishes exactly what I set out to do, all generated while I was busy refactoring the wallet handling code in colonyd and the colony CLI, plus some tweaks to colonylib. Is the code pretty? Nope. Could a well-seasoned Rust coder have done it better by hand? Absolutely. Could I have built this over a weekend by myself? Not a chance; I wouldn’t have even tried. And that, I think, is the point. Because AI agents exist, we have this tool; without this technology, we wouldn’t.

(end opinion)

For those that are curious, I’ve been using the free tier of AugmentCode for this stuff.

Hope you all find this useful! If you use the tool, please share your pod addresses so we can find your uploads!

What about Colony GUI?

I’m busy polishing it with my frontend guy. We’ll be making the official release Wednesday, stay tuned!

19 Likes

Great to see big dataset onramping start to become a reality, especially as many of them (IA included) start to censor their datasets.

8 Likes

Thanks for this! It would be great to see what the inputs/outputs looked like, i.e. ‘showing the working’. I’d certainly appreciate the input on what programming with AI might look like in a practical sense.

I appreciate you may not have this data still, but it would be great to see.

2 Likes

Once this is tested a bit, and someone can create a video demo of retrieving IA content this would be a great way to introduce the idea to Brewster (IA founder).

He’s responsive on Mastodon and has told me he’s interested in Autonomi for this. He’s also had conversations with David dating back to at least 2018.

@Gill_McLaughlin is this something Autonomi want to take the lead on or should the community raise it once we have something to show him?

I don’t know if he’s still active on his X account.

11 Likes

With @zettawatt contributing tooling maybe Autonomi would contribute ANT to IA? Free uploading of their data, + free retrieval and serving of the data through their archive.org domain using @Traktion’s anttpd proxy, would dramatically reduce their operating costs while still allowing them to solicit donations. Could be very appealing to them and would be a ‘killer app’ for Autonomi, valuable for a forthcoming marketing campaign.

5 Likes

@zettawatt is it possible to adapt the code to also crawl arXiv papers (https://arxiv.org/)? If so, that may be more manageable (size-wise + ease of use) than websites, and with better metadata too

3 Likes

I saved my original ‘spec’, just in case someone asked :smile:. I wrote my thoughts in an org file (emacs markdown) that looked like this:

* High level description
  - want a fast and easy way to bulk download items from the Internet Archive and upload them to Autonomi and add relevant metadata using colony
  - minimal inputs required from the user. Must be as automated as possible
  - command line interface should be well documented and interactive using the same libraries as used in colonyd.rs and colony.rs
  - should leverage the colonyd REST API to handle colony metadata transactions
* Program details
  - Broken into 2 separate rust programs: ia_downloader.rs and colony_uploader.rs
    - create both of these in the src/bin directory
  - For now, just do the ia_downloader.rs, the colony_uploader.rs will come next
** ia_downloader.rs
*** arguments
    - pod name or address to record metadata in, for example: "genesis pod" or "12345bcdabcdef123467890a5678ef123456789090abcdef7890abcdef123456"
      - the pod must already have been created by the colony program for it to be found
    - html link where the content lives on the internet archive, for example: https://archive.org/details/george-orwell-1984_202309
    - comma separated list of file extensions to get, for example: "pdf,txt,epub"
    - optional uploader directory, default to colony_uploader and create this directory in the PWD if it doesn't exist and the argument is not specified
*** functionality
    - For the link given, create a new sub-directory in the uploader directory with the name of the internet archive object being downloaded
      - For example, the given link: https://archive.org/details/george-orwell-1984_202309 would have a sub-directory called "george-orwell-1984_202309"
    - For the link given, download the metadata file into the newly created sub-directory.
      - For the example link above, it looks like this: https://archive.org/download/george-orwell-1984_202309/george-orwell-1984_202309_meta.xml
    - For the link given, download the files list XML file in the newly created sub-directory
      - For the example link above, it looks like this: https://archive.org/download/george-orwell-1984_202309/george-orwell-1984_202309_files.xml
    - Parse the files list XML file for the file names with the extensions listed
      - The XML file will have a "file name" identifier. To find the 'pdf' file extension file, the XML line would look
        something like this: <file name="George Orwell - 1984.pdf" source="original">
        - only the string after the `<file name="` is what should be checked for the file extension
      - if a file with the matching extension is not found, throw a warning and gracefully continue
    - For each extension found, create sub-directories in the object directory using the suffix as the directory name
      - For example, for the 'pdf' extension, the path would be "george-orwell-1984_202309/pdf"
    - Download each of the files found in the XML based on the requested file extensions into their respective extension directory
      - Multiple files exist on this page. Download the files that match the file extensions in the file extension list. For example:
        - https://archive.org/download/george-orwell-1984_202309/George%20Orwell%20-%201984.pdf would download into "george-orwell-1984_202309/pdf"
        - https://archive.org/download/george-orwell-1984_202309/George%20Orwell%20-%201984_djvu.txt would download into "george-orwell-1984_202309/txt"
        - https://archive.org/download/george-orwell-1984_202309/George%20Orwell%20-%201984.epub would download into "george-orwell-1984_202309/epub"
    - For each downloaded file, run the Autonomi command to get the address where it will exist on the public network
      - See the 'data_put_public()' function in the Autonomi APIs here: https://github.com/maidsafe/autonomi/blob/main/autonomi/src/client/high_level/data/public.rs#L38
        This function returns the data address. Should be able to use the same calculation this code uses to get the map_xor_name value
    - Next write the JSON-LD description from the metadata file and data that can come from the file itself
      - Write the JSON-LD formatted file into a new file called metadata.json and place it into the respective extension directory
      - The schema used for the JSON-LD comes from schema.org
        - The "schema:name" is the name of the file
        - The "schema:author" is the "creator" from the XML file
        - The "schema:contentSize" is the size of the file in bytes
        - The "schema:encodingFormat" is IANA listed format based on the file extension
          - is there a way to list all of these or do a lookup? Unsure the best way to query for these and apply the right value from the file extension.
        - The "schema:description" is the "description" text from the XML file
        - The "@type" is "schema:Book" because the XML lists this as an 'ebook'
          - This mapping is unique for books. This should be some kind of a list that is easy to add to as we add additional type mappings in the future.
        - The "@id" is the Autonomi address calculated in the previous step prefixed by "ant://"
      - As an example, the JSON-LD file should look like this:
        ```
{
  "@context": {"schema": "http://schema.org/"},
  "@type": "schema:Book",
  "@id": "ant://dcb90722cd6c7a3c66527fd8401970cad21cfc61f17e37abd421414ca26900f6",
  "schema:name": "George Orwell - 1984.pdf",
  "schema:description": "1984 is a dystopian social science fiction novel and cautionary tale by English writer George Orwell. It was published on 8 June 1949 by Secker &amp; Warburg as Orwell's ninth and final book completed in his lifetime. Thematically, it centres on the consequences of totalitarianism, mass surveillance and repressive regimentation of people and behaviours within society. Orwell, a democratic socialist, modelled the authoritarian state in the novel on the Soviet Union in the era of Stalinism, and Germany under the Third Reich. More broadly, the novel examines the role of truth and facts within societies and the ways in which they can be manipulated.<br /><br />The story takes place in an imagined future in an unknown year believed to be 1984, when much of the world is in perpetual war. Great Britain, now known as Airstrip One, has become a province of the totalitarian superstate Oceania, which is led by Big Brother, a dictatorial leader supported by an intense cult of personality manufactured by the Party's Thought Police. Through the Ministry of Truth, the Party engages in omnipresent government surveillance, historical negationism, and constant propaganda to persecute individuality and independent thinking.<br /><br />The protagonist, Winston Smith, is a diligent mid-level worker at the Ministry of Truth who secretly hates the Party and dreams of rebellion. Smith keeps a forbidden diary. He begins a relationship with a colleague, Julia, and they learn about a shadowy resistance group called the Brotherhood. However, their contact within the Brotherhood turns out to be a Party agent, and Smith and Julia are arrested. He is subjected to months of psychological manipulation and torture by the Ministry of Love and is released once he has come to love Big Brother.<br /><br />Nineteen Eighty-Four has become a classic literary example of political and dystopian fiction. 
It also popularised the term "Orwellian" as an adjective, with many terms used in the novel entering common usage, including "Big Brother", "doublethink", "Thought Police", "thoughtcrime", "Newspeak", and "2 + 2 = 5". Parallels have been drawn between the novel's subject matter and real life instances of totalitarianism, mass surveillance, and violations of freedom of expression among other themes. Orwell described his book as a "satire", and a display of the "perversions to which a centralised economy is liable," while also stating he believed "that something resembling it could arrive." Time included the novel on its list of the 100 best English-language novels which were published from 1923 to 2005, and it was placed on the Modern Library's 100 Best Novels list, reaching number 13 on the editors' list and number 6 on the readers' list. In 2003, it was listed at number eight on The Big Read survey by the BBC",
  "schema:author": "George Orwell",
  "schema:contentSize": "1624768",
  "schema:encodingFormat": "application/pdf"
}
        ```
        when the '*_meta.xml' downloaded from the internet archive looks like this:
        ```
<metadata>
<script/>
<identifier>george-orwell-1984_202309</identifier>
<mediatype>texts</mediatype>
<collection>opensource</collection>
<creator>George Orwell</creator>
<date>1949-06-08</date>
<description><b><u><i>1984</i></u></b> is a dystopian social science fiction novel and cautionary tale by English writer George Orwell. It was published on 8 June 1949 by Secker &amp; Warburg as Orwell's ninth and final book completed in his lifetime. Thematically, it centres on the consequences of totalitarianism, mass surveillance and repressive regimentation of people and behaviours within society. Orwell, a democratic socialist, modelled the authoritarian state in the novel on the Soviet Union in the era of Stalinism, and Germany under the Third Reich. More broadly, the novel examines the role of truth and facts within societies and the ways in which they can be manipulated.<br /><br />The story takes place in an imagined future in an unknown year believed to be 1984, when much of the world is in perpetual war. Great Britain, now known as Airstrip One, has become a province of the totalitarian superstate Oceania, which is led by Big Brother, a dictatorial leader supported by an intense cult of personality manufactured by the Party's Thought Police. Through the Ministry of Truth, the Party engages in omnipresent government surveillance, historical negationism, and constant propaganda to persecute individuality and independent thinking.<br /><br />The protagonist, Winston Smith, is a diligent mid-level worker at the Ministry of Truth who secretly hates the Party and dreams of rebellion. Smith keeps a forbidden diary. He begins a relationship with a colleague, Julia, and they learn about a shadowy resistance group called the Brotherhood. However, their contact within the Brotherhood turns out to be a Party agent, and Smith and Julia are arrested. He is subjected to months of psychological manipulation and torture by the Ministry of Love and is released once he has come to love Big Brother.<br /><br />Nineteen Eighty-Four has become a classic literary example of political and dystopian fiction. 
It also popularised the term "Orwellian" as an adjective, with many terms used in the novel entering common usage, including "Big Brother", "doublethink", "Thought Police", "thoughtcrime", "Newspeak", and "2 + 2 = 5". Parallels have been drawn between the novel's subject matter and real life instances of totalitarianism, mass surveillance, and violations of freedom of expression among other themes. Orwell described his book as a "satire", and a display of the "perversions to which a centralised economy is liable," while also stating he believed "that something resembling it could arrive." Time included the novel on its list of the 100 best English-language novels which were published from 1923 to 2005, and it was placed on the Modern Library's 100 Best Novels list, reaching number 13 on the editors' list and number 6 on the readers' list. In 2003, it was listed at number eight on The Big Read survey by the BBC.<br /></description>
<language>eng</language>
<scanner>Internet Archive HTML5 Uploader 1.7.0</scanner>
<subject>ebook</subject>
<subject>pdf</subject>
<subject>orwell</subject>
<subject>1984</subject>
<title>1984</title>
<uploader>mmima191@gmail.com</uploader>
<publicdate>2023-09-03 17:40:03</publicdate>
<addeddate>2023-09-03 17:40:03</addeddate>
<curation>[curator]validator@archive.org[/curator][date]20230903174529[/date][comment]checked for malware[/comment]</curation>
<identifier-access>http://archive.org/details/george-orwell-1984_202309</identifier-access>
<identifier-ark>ark:/13960/s2qk0zmwcfj</identifier-ark>
<ppi>300</ppi>
<ocr>tesseract 5.3.0-3-g9920</ocr>
<ocr_parameters>-l eng</ocr_parameters>
<ocr_module_version>0.0.21</ocr_module_version>
<ocr_detected_script>Latin</ocr_detected_script>
<ocr_detected_script_conf>1.0000</ocr_detected_script_conf>
<ocr_detected_lang>en</ocr_detected_lang>
<ocr_detected_lang_conf>1.0000</ocr_detected_lang_conf>
<page_number_confidence>100</page_number_confidence>
<page_number_module_version>1.0.3</page_number_module_version>
</metadata>
        ```
        - Note that the mapping between XML and JSON-LD should be easy to add to as this is just one file type. Later we'll add videos, images, audio, etc.
    - Use the indicatif library to indicate status
    - Use icons like the colony.rs program to make the application visually appealing
** colony_uploader.rs
*** arguments
   - --server to specify the colonyd server location, default to 127.0.0.1
   - --port to specify the colonyd port, default to 3000
   - --threads to specify the number of uploader directories to process in parallel, default to 10
   - --keep to keep the uploader directories after processing
*** functionality
   - on start, enter the password for colonyd
     - if colonyd is not running, throw an error explaining that colonyd must be running first and exit
   - gets a colonyd token and has a watchdog timer thread that automatically refreshes the token every 9 minutes so we don't lose access
   - walks through each uploader directory
     - uploads the downloaded files to the Autonomi network as public files
     - uploads the downloaded thumbnails to the Autonomi network as public files
     - reads the metadata.json and metadata_[\d].json files and writes this subject data to the colony pod specified in the pod_name.txt using the colonyd REST API
       - if the specified pod name does not exist, throw an error on this uploader directory, but do not panic the application
   - each of the above uploader directory operations runs concurrently
     - show an indicatif spinner for each concurrently running uploader directory operation. Use the uploader directory path as the line text
     - show a checkbox once the uploader directory operation has completed successfully
     - show an X icon on uploader directory operations that fail
     - show a pending icon for uploader directories that have not yet started processing
     - on uploader directory success delete the associated working directory and content unless the --keep option is provided
   - when all uploader directories are processed, call the colonyd upload REST API endpoint to upload all of the metadata updates to Autonomi
     - perform this operation as long as there was at least one success
   - keep track of success and failure
     - on completion, report the total number of files uploaded, the total size of the uploads, the total cost in ANT tokens and the total cost in ETH gas fees

and then told the agent this:

Follow the instructions in this file to build a new rust program file called ia_downloader.rs. Then stop to enable iteration before proceeding to the colony_uploader implementation.
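As an aside, the encodingFormat lookup the spec leaves open (“is there a way to list all of these or do a lookup?”) has a ready-made answer in most standard libraries. A Python illustration of the idea, mapping a file extension to an IANA media type (the function name is mine, not what colony-utils actually uses):

```python
import mimetypes

# One way to fill schema:encodingFormat from a file's extension.
# Illustrative only; the generated Rust code may solve this differently.
def encoding_format(filename: str) -> str:
    """Best-effort IANA media type for the given file name."""
    guessed, _ = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"  # fallback for unknown extensions
```

Rust has comparable crates for the same lookup, so the agent had several reasonable options here.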
7 Likes

Some of the iteration inputs

I have a few changes I want to make:
- The file size calculation is wrong, it always returns 0. Use the size of the file after it is downloaded to update this attribute
- When downloading music or other file types, there won't always be an author. For music it could be set to 'artist' or for movies it could be 'director' or 'writer'. This should be more flexible
- For the http path, not only should it accept the 'details' path, but also the 'downloads' path for content
Some more updates:
-  use the 'title' from the XML metadata as the 'schema:name'
- If there are multiple objects of the same type, make separate metadata.json files and index them. For example, if there are 2 pdf files, make metadata_1.json and metadata_2.json
 I want to also associate an image with each file. Download the '__ia_thumb.jpg' file (could also be in other image formats), get the Autonomi address for this file, and list the address in the JSON-LD metadata as 'schema:image'
Now lets get much higher quality metadata. Is there a way to download a summary of what each of these is about? Ideally, we'd get the basic information from internet archive and query some other API to get high level information about the media we're downloading. Or even if there was a free way to query an AI somewhere to fill in the necessary metadata. What is available to do this?

I didn’t know what to do here :rofl: It figured out several solutions so I picked one:

Go with the configurable system that tries multiple systems. The end goal is to get the highest quality metadata possible for each item.

now bug fixes:

This function is incorrect. We need to duplicate the method used in the actual Autonomi library to get the proper address value. For example, this function returned an address value of d53c42029872588bad85208db8a72bc3c18e93c6dba5e7cfaff14734b35db229 for this file: /home/system/colony-utils/test_files/George Orwell - 1984.pdf . The properly calculated address value is 48bd835e4eef2631c66a840904a5cc114ca2403f2aa63b3d00aa789f6f1f2632 . We will need to import the latest autonomi crate and use the 'map_xor_name' address as defined here: https://github.com/maidsafe/autonomi/blob/main/autonomi/src/client/high_level/data/public.rs#L49
 I added a $HOME/.config/ia_downloader/config.json configuration file with my huggingface_api_key. I set the enable_ai_enhancement to true as well as passed the --ai-enhance argument, but this option does not appear to be working, what is the problem?
I'm able to use the API key now. Lets make the model a variable in the configuration file. Set the default to "https://api-inference.huggingface.co/models/facebook/bart-large-cnn"
 I forgot to use the pod name information provided in the argument. Write a new file called pod_name.txt next to the metadata XML files and write the provided pod string in this file.
Now that the ia_downloader is in good shape, let's move on to the colony_uploader. Please implement this program by following the instructions in this highlighted section.

Then I followed the same process to do the uploader. All the same iteration loop. Kick it off, go do something else, review the output, test, iterate again.

This is my raw workflow to get the program I wanted. It doesn’t feel like programming anymore; it’s more like directing a new college grad employee to go make something for me. The goal seems to be to eliminate assumptions, explain in fine-grained detail, and keep a close eye on outputs to make sure the thing doesn’t go off the tracks. Basically the same thing I do as a front-line manager.

2 Likes

yea, absolutely. I separated the ia_downloader and colony_uploader scripts for this reason so I could make a separate downloader but use the same uploader backend. Any other sites you guys want, let me know and I can put together some for that as well.

3 Likes

That would be great! I’ll make a short demo video soon. I plan to make one for the Colony GUI as well for the Thursday IF ‘go live’ milestone. Probably can make one for this at the same time.

7 Likes

My upload overnight finished. Here is an example output result:

📊 Upload Summary:
   📁 Files uploaded: 63
   💾 Total size: 1011.97 MB (1061129332 bytes)
   ✅ Successful directories: 25
   ❌ Failed directories: 1
   💰 Total cost (ANT): 0.0004
   ⛽ Total cost (ETH): 0.0027
   💰 Final wallet balance: 0.0081 ETH
   📤 Metadata uploaded to Autonomi
   ⏱️ Total time: 9h 9m 58s

⚠️ Upload completed with 1 failures

Looks like my Charlie Chaplin movie had chunk errors, but it may be OK; I’ll need to download it to see if it’s all there. Have some tuning to do on error handling. About $8/GB in gas for those 63 (mostly little) files :face_vomiting:
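For a sanity check on that figure: at an assumed ETH price of around $3000 (illustrative; the actual rate at upload time isn’t given in the thread), the numbers in the summary above do work out to roughly the quoted $8/GB:

```python
# Back-of-the-envelope check on the $/GB figure from the upload summary above.
eth_gas = 0.0027            # ETH spent on gas (from the summary)
total_bytes = 1061129332    # total upload size in bytes (from the summary)
eth_usd = 3000.0            # ASSUMED ETH price in USD, for illustration only

usd_per_gb = eth_gas * eth_usd / (total_bytes / 1e9)
print(f"${usd_per_gb:.2f}/GB")
```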

9 Likes

So currently 0.000016 USD to Autonomi nodes and 8 USD to ETH validators per GB :sweat_smile:

7 Likes

this is supercool! :tada:

ps: can we create a separate topic for moaning about majority of upload cost being fees? it’s not like this is any news and it really is annoying to read it again and again as if it was a great new discovery …

7 Likes

My main concern is getting the network running smoothly, costs be damned. It’s not great now, but I am confident we will settle on a good solution. Don’t let perfect be the enemy of good enough.

9 Likes

Great use-case and implementation :saluting_face:

5 Likes

Indeed!!! we are approaching something!

4 Likes

This is fantastic.

It doesn’t sound far from being able to ‘pin’ torrents as discussed in the other thread. E.g. a similar concept, but instead of downloading IA stuff & uploading to Autonomi, it downloads torrents & uploads to Autonomi (and the small task of building that into a torrent client with a wallet!).

This kind of tool will open the gates to serious data going onto the network, and being findable thanks to Colony :clap: :clap: :clap:

10 Likes

Great work with this! I will likely use it myself.

9 Likes

Btw @zettawatt , what is your favourite AI for coding?

I’m going to give it a go when I have a suitable app to write! :sweat_smile:

2 Likes

I have been using AugmentCode lately. It seems to do the best job in terms of real work. They have a free tier and a trial period for the professional service.

2 Likes