Proposal for more efficiently handling small files

@zettawatt, did you see this thread and my past post? Proposal: Tarchive - #23 by Traktion

Tar files are great for this. tarindexer (a Python app) basically creates a manifest of offsets for the files inside the tar, and it is easy to reimplement.
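To show how little is involved, here is a rough sketch of building such an offset manifest with Python's standard `tarfile` module. The function names `build_index` and `make_tar` are made up for illustration, and the `(offset, size)` layout is only modelled loosely on what tarindexer does, not copied from it.

```python
import io
import tarfile


def build_index(tar_bytes: bytes) -> dict[str, tuple[int, int]]:
    """Map each regular file's name to (data offset, size) inside the tar."""
    index = {}
    # "r:" means read an uncompressed tar; offsets are only useful uncompressed.
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        for member in tar:
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index


def make_tar(files: dict[str, bytes]) -> bytes:
    """Helper for the example: pack in-memory files into an uncompressed tar."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

With the index in hand, a file's content is just `tar_bytes[offset:offset + size]`, so no tar parsing is needed at read time.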

The chunk streamer library already accepts offsets/limits, so it is easy to get the right bits from the right chunks.
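For anyone wondering what "the right bits from the right chunks" means in practice, here is a small sketch of the arithmetic. It assumes fixed-size chunks, which may not match how chunks are really laid out, and `chunks_for_range` is a hypothetical helper, not part of the chunk streamer library.

```python
def chunks_for_range(offset: int, length: int, chunk_size: int) -> list[tuple[int, int, int]]:
    """Return (chunk_index, start, end) slices that together cover the byte range.

    start/end are positions inside that chunk (end is exclusive).
    Assumes every chunk holds exactly chunk_size bytes (an assumption).
    """
    spans = []
    end = offset + length
    pos = offset
    while pos < end:
        chunk_index = pos // chunk_size
        chunk_start = chunk_index * chunk_size
        # Stop at the end of the range or the end of this chunk, whichever is first.
        stop = min(end, chunk_start + chunk_size)
        spans.append((chunk_index, pos - chunk_start, stop - chunk_start))
        pos = stop
    return spans
```

A small file usually lands in one or two chunks, so only those need to be fetched.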

A simple solution is just to wrap files.tar and files.tar.idx in a public archive. On request, check that both exist, parse the index for the filename, then retrieve the correct bytes.

Then you can have ant://xor/filename_in_tar.txt and it will resolve from the tar and return the data.
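Roughly, the lookup could go like the sketch below. It assumes the index is plain text with one `name offset size` line per file (an assumed layout, worth checking against what tarindexer actually writes), and `fetch_range` is a stand-in for whatever the chunk streamer exposes for offset/limit reads. All function names here are hypothetical.

```python
from typing import Callable


def parse_index(text: str) -> dict[str, tuple[int, int]]:
    """Parse 'name offset size' lines into {name: (offset, size)}."""
    index = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split from the right so file names containing spaces survive.
        name, offset, size = line.rsplit(" ", 2)
        index[name] = (int(offset), int(size))
    return index


def split_address(url: str) -> tuple[str, str]:
    """Split 'ant://<archive>/<path>' into (archive, path)."""
    prefix = "ant://"
    if not url.startswith(prefix):
        raise ValueError(f"not an ant:// address: {url}")
    archive, _, path = url[len(prefix):].partition("/")
    return archive, path


def resolve(url: str, index_text: str, fetch_range: Callable[[int, int], bytes]) -> bytes:
    """Look the path up in the index, then read only its bytes from the tar."""
    _, path = split_address(url)
    index = parse_index(index_text)
    if path not in index:
        raise FileNotFoundError(path)
    offset, size = index[path]
    # fetch_range(offset, length) is assumed to return bytes of files.tar.
    return fetch_range(offset, size)
```

The tar itself is never unpacked; the index turns a filename into a single ranged read.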

To be performant, caching the chunks would be ideal though, as many files will be within the same chunks. You don’t want to have to keep downloading them.
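As a rough idea of the caching, here is a minimal LRU sketch. It assumes fixed-size chunks addressed by index and a hypothetical `fetch_chunk(index)` callable that does the actual download; neither is taken from the real library.

```python
from collections import OrderedDict
from typing import Callable


class ChunkCache:
    """Keep recently used chunks in memory so nearby files reuse them."""

    def __init__(self, fetch_chunk: Callable[[int], bytes], chunk_size: int, max_chunks: int = 64):
        self._fetch_chunk = fetch_chunk  # hypothetical downloader
        self._chunk_size = chunk_size    # assumed fixed chunk size
        self._max_chunks = max_chunks
        self._chunks = OrderedDict()     # chunk index -> bytes, oldest first

    def _get(self, index: int) -> bytes:
        if index in self._chunks:
            self._chunks.move_to_end(index)  # mark as most recently used
            return self._chunks[index]
        data = self._fetch_chunk(index)
        self._chunks[index] = data
        if len(self._chunks) > self._max_chunks:
            self._chunks.popitem(last=False)  # evict least recently used
        return data

    def read(self, offset: int, length: int) -> bytes:
        """Return length bytes starting at offset, downloading only uncached chunks."""
        if length <= 0:
            return b""
        first = offset // self._chunk_size
        last = (offset + length - 1) // self._chunk_size
        data = b"".join(self._get(i) for i in range(first, last + 1))
        start = offset - first * self._chunk_size
        return data[start:start + length]
```

Two small files that sit in the same chunk then cost one download instead of two.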

Seems your post mostly aligns with this too?
