Just an update on the Proposal: Tarchive experiment.
I’ve implemented a POC locally that uses a public archive to wrap an archive.tar and an archive.tar.idx file. The former is a regular tar file and the latter is an index file generated with tarindexer.
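To illustrate the lookup side, here’s a minimal sketch in Rust. It assumes the index is plain text with one whitespace-separated `name offset size` line per entry (worth verifying against tarindexer’s actual output format), and `read_from_tar` is a hypothetical helper rather than AntTP’s real code:

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom};

/// Look up `wanted` in the index text and read its bytes straight out of the tar.
/// Assumed index format: one `name offset size` line per entry (unverified).
fn read_from_tar(tar_path: &str, idx_text: &str, wanted: &str) -> std::io::Result<Option<Vec<u8>>> {
    for line in idx_text.lines() {
        let mut parts = line.split_whitespace();
        let (Some(name), Some(off), Some(len)) = (parts.next(), parts.next(), parts.next()) else {
            continue; // skip malformed lines
        };
        if name != wanted {
            continue;
        }
        let (Ok(off), Ok(len)) = (off.parse::<u64>(), len.parse::<usize>()) else {
            continue;
        };
        // One seek + one read serves the file without unpacking the archive.
        let mut tar = File::open(tar_path)?;
        tar.seek(SeekFrom::Start(off))?;
        let mut buf = vec![0u8; len];
        tar.read_exact(&mut buf)?;
        return Ok(Some(buf));
    }
    Ok(None)
}
```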
This builds on the LFU caching added in the prior release, so reading the same chunk multiple times is very fast within AntTP.
I used IMIM as the use case, which has about 12 key files to fetch when loading a blog. These include the public archive, the app-config.json (for routing), the CSS and JS files, the index.html, and a couple of fonts. While Angular could do better at squashing these into fewer files, it does a decent job here.
For 12 files, at 4 chunks each (3x data, 1x datamap for all files under about 12 MB), that’s 48 chunks to download.
Using the tarchive POC, at most 12 chunks were downloaded: the public archive (4), the archive.tar (4) and the archive.tar.idx (4). Loading time was substantially reduced (especially while the network is rather slow).
While caching masks subsequent requests either way, whether 12 or 48 chunks, the initial load is much more responsive. As soon as I have the index.html file, the rest take about 1 ms each to serve (i.e. as fast as AntTP can pull them from cache).
So, when will it be released? Well, with the POC showing good results, I wanted to make some further improvements first.
Instead of using a public archive to contain the tar + idx, I want to just upload the whole tarchive as a single file. For ease of implementation, the strategy will be (see the sketch after this list):
- Create a tar file for the target directory
- Run tarindex to generate the index
- Concatenate the index to the tar file to create a tar.idx.img (or whatever name)
- Upload tar.idx.img to Autonomi
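Steps 1 and 2 are plain `tar` and tarindexer invocations; step 3 is just byte concatenation. A minimal sketch of that step, with illustrative file names:

```rust
use std::fs::File;
use std::io::{self, copy};

/// Append the index onto the tar to produce the single file to upload.
fn build_tarchive(tar_path: &str, idx_path: &str, out_path: &str) -> io::Result<()> {
    let mut out = File::create(out_path)?;
    copy(&mut File::open(tar_path)?, &mut out)?; // tar payload first
    copy(&mut File::open(idx_path)?, &mut out)?; // index appended at the end
    Ok(())
}

// e.g. build_tarchive("archive.tar", "archive.tar.idx", "tar.idx.img")
```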
Why not just create a tar file with everything in it? Well, there is no index at the start of a tar, which means I would need to navigate to the end of the file and then figure out where the index starts. I could reserve a fixed index size, or make the last line an index size, or some such, but I figured I could make my life easier.
How? Well, a tar file includes its size in its header, always at the same offset. So, it’s trivial to retrieve this value and use it as an offset into the full/concatenated file to return the index. No need to reinvent the wheel, right? The network doesn’t care what is in the chunk, so it should work well.
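For reference, in the POSIX ustar header layout the size field is a 12-byte octal ASCII string at offset 124 of the 512-byte header block (ignoring the GNU base-256 extension used for very large entries). A minimal parse, as a hypothetical helper:

```rust
/// Parse the octal `size` field (offset 124, 12 bytes) from a tar header block.
fn entry_size(header: &[u8; 512]) -> u64 {
    let field = &header[124..136];
    let text = std::str::from_utf8(field).unwrap_or("");
    let text = text.trim_matches(|c: char| c == '\0' || c == ' ');
    u64::from_str_radix(text, 8).unwrap_or(0)
}
```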
So, I just need to implement the above and then I can read everything from one set of chunks - that’s 4 in total (for archives up to about 12 MB). For larger archives, seeking to the end of the file may pull another 4 chunks, but that’s still pretty efficient. For IMIM, the whole app fits easily within 12 MB, so it should be very suitable.
What about modifications? Well, the original tar file could be appended to, then reindexed/concatenated. Rinse and repeat.
I have visitors over the next few days, so I may not have time until next week to look at it. However, I thought I’d share the results, as they’re pretty exciting for performance. Getting a whole app in a mere 4 chunks should result in speedy loading times!
Edit: realised that the size in the header is only for the first file in the archive, not the whole tar. No bother, I’ll read through to the last file of the archive instead!
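One way to do that, sketched below: walk the entry headers until the end-of-archive marker, using the `entry_size` helper above. Each entry’s data is padded to 512-byte blocks and two all-zero blocks terminate the tar, so (assuming the index was concatenated directly after a standard-terminated tar) the index starts right after those:

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom};

/// Walk entry headers to find where the tar payload ends, i.e. where the
/// concatenated index begins within the tar.idx.img file.
fn index_offset(path: &str) -> std::io::Result<u64> {
    let mut file = File::open(path)?;
    let mut header = [0u8; 512];
    let mut offset = 0u64;
    loop {
        file.seek(SeekFrom::Start(offset))?;
        file.read_exact(&mut header)?;
        if header.iter().all(|&b| b == 0) {
            // First of the two all-zero end-of-archive blocks.
            return Ok(offset + 1024);
        }
        let size = entry_size(&header); // octal parse, as sketched earlier
        let padded = (size + 511) / 512 * 512; // data is padded to 512-byte blocks
        offset += 512 + padded; // skip header block + padded data
    }
}
```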