Creating an append-only file type from native data types

I’m working on an app that needs an append-only file type. I believe the following algorithm will work, but I’m wondering if there is a better way. It looks something like this:

Start with a pointer to a scratchpad type. The first 4MB if data is written to the scratchpad by the user app.

Once a write will overflow the scratchpad:

  • The scratchpad contents are written to a new chunk
  • Any overflowed data is written into the scratchpad
  • A graphEntry type is created pointing to the newly created chunk and the scratchpad
  • The pointer is updated to point to the graphEntry

This continues and each time the scratchpad overflows, a new chunk is written out, a new graphEntry is created, and the pointer is updated to reference the new graphEntry.

Now my question is how best to handle the graphEntry to point to all the chunks? I see multiple possiblities:

  • Each new chunk creates a new graphEntry that points to the new chunk and the previous graphEntry. This would be the cheapest from a write standpoint, but slowest to read since you have to iteratively pull each graphEntry to build the full list.
  • Each new chunk creates a new graphEntry that points to all previous chunks and the scratchpad. Most expensive write, fastest read.
  • Hybrid of the 2 and create a new iterative graphEntry up to a fixed number, say 20 chunks, and then reference these 20 chunk graphEntries. This could be scaled for the best tradeoff of price vs performance.

Is this the right way to go about this? If so, which approach should I choose? Or am I just making this harder than it needs to be? :slight_smile:

5 Likes

I’m not entirely clear on what you mean by a GE pointing to earlier chunks, but assume you mean the GE points to a chunk that contains the address of those chunks and the current scratchpad content.

Based on that, I think it’s hard to say what’s best because it will depend on both how well this works in practice (read, write) and on the needs of your app.

I’ve wondered about similar questions while building dweb as it’s History type is like appendable data, so think I understand what you are asking.

I haven’t gone further than thinking on this kind of question but it is an interesting problem and good to know others are thinking about this too.

The best way at this point is to build things IMO. Try out ideas and iterate based on the results.

4 Likes

Thanks @happybeing . Here are the 3 phases that I envision in pictorial form:

Does this make sense? See any major red flags with this approach?

3 Likes

I suppose you could link previous graph entries as parents to the latest, then assume only the first linked chunk is relevant (ignorenold scratch pads).

In fact, you could then share the same scratchpad with the latest graph entry, i.e. it is always reused for latest.

Maybe that is more like your option 1 in OP?

I wonder if storing a pointer in each graph entry is a useful line of thought too? This could then be updated to point to the next graph entry when it is created. Ofc, you have a pointer looked too to worry about then, but may be clean.

Indeed, I wonder about using pointers in a native way too. Storing a list of pointers in a scratchpad could be flexible. You could even store them in a chunk. Then when a new data chunk needs appending, you just popular the next free pointer.

Given you have a complete list of pointers, you can scroll forwards or backwards easily. Ofc, you’d want to store the last to avoid traversing to append.

Edit: this actually feels much like a regular datamap, but using pointers for each chunk. Like a mutable datamap.

More deeply, you could derive the pointer keys from a seed key. Then you wouldn’t need to store the pointers, but just derive the next when you need it. The lenght would not be limited by what has been defined (using above) too.

Ofc, you would still need to lookup the chunk at the end of the pointer, but that should be as low cost/latency as possible.

I’ve not tried any of the above, so just spit balling! :sweat_smile:

3 Likes

I think you don’t need pointers at all if you are using GEs. You generate a GE key from a subsequent numbers, then you can address any pointer you want from your chain (oh, were you just proposing using pointers instead of GEs, @traktion?), skip by 10 or by 100 etc. to find the last one. Or keep a pointer to last one (tail) in Scratchpad.

Idea with creating GEs to store 20 chunk addresses is nice, but what if your app terminates unexpectedly? You lose up to 20 pieces of data. If that’s acceptable to your app, then fine. Alternatively, you can always store them on computer’s disk until saved to Autonomi.

You don’t need chunk here, GE has outputs, where you can hold these.


Check out the Dev Forum

1 Like

Yes, pretty much. Just thinking they may be simpler for a list.

1 Like

Thanks for the feedback guys. Sounds like there are a few more options I need to investigate more.

2 Likes

The outputs are supposed to be other GEs but there’s nothing to prevent you doing your own thing. I tend to avoid that kind of hack, but so long as the underlying implementation remains unchanged it should be ok I think.

It’s interesting to see these ideas, and healthy to share them.

BTW as riddim mentioned on another thread, Pointer and Scratchpad types aren’t working yet - on the live network. (I’ve only tried Pointers and GE myself.)

You can build and test ideas on a local testnet though.

2 Likes

They are, but very slowly. Which could be understood as “kinda work” :slight_smile:

EDIT: truly, I did not test Scratchpad, only Pointer.


Check out the Dev Forum

3 Likes

Yeah - pointers seem to work as @loziniak pointed out (probably since the last update - I have to admit I might not have tested this lately)

Scratchpads don’t work for python… Simply because I can fetch them from the network but the current api doesn’t offer a way to read them… They’re (because of the state of the api… Not necessarily the network) write only for python users (which is super dull..) … For you rust engineers the situation should be different

3 Likes

Thanks for clarifying!

Maybe it’s just Pointers that have not been updated recently which are still stuck. Good to know latest is improving. :+1:

5 Likes