Lately I’ve been testing sequence data by doing 50 append operations in a row. I’d expect each append to take about the same time, but they gradually get slower, so I’ve been digging in to understand why.
Logs for the data used in this post are available for the nodes and client.
This is the overall process for all 50 uploads. The time from first append to last append was about 2 minutes.
The timeline shows nodes top-to-bottom, genesis node at the top, node 11 at the bottom. Time runs left-to-right. Each dot corresponds to a log message.
The first append takes 227ms; the 50th takes 1220ms. On average each append takes about 20ms longer than the last (there’s significant variation, but the upward trend is definitely there). What’s causing the slowdown?
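As a quick sanity check on the “about 20ms longer per append” figure, a least-squares slope over the 50 samples gives one number for the trend. This is just a sketch of the measurement, not the client code: the sleep below is a stand-in for the real append call, which isn’t shown here.

```rust
use std::time::Instant;

/// Least-squares slope of ys over the index 0..n,
/// in the same units as ys (here: ms per append).
fn slope(ys: &[f64]) -> f64 {
    let n = ys.len() as f64;
    let mean_x = (n - 1.0) / 2.0;
    let mean_y = ys.iter().sum::<f64>() / n;
    let num: f64 = ys
        .iter()
        .enumerate()
        .map(|(i, y)| (i as f64 - mean_x) * (y - mean_y))
        .sum();
    let den: f64 = (0..ys.len())
        .map(|i| (i as f64 - mean_x).powi(2))
        .sum();
    num / den
}

fn main() {
    let mut samples_ms = Vec::new();
    for i in 0..50u64 {
        let start = Instant::now();
        // Stand-in for the real append call.
        std::thread::sleep(std::time::Duration::from_millis(1 + i / 10));
        samples_ms.push(start.elapsed().as_secs_f64() * 1000.0);
    }
    println!("trend: {:.2} ms per append", slope(&samples_ms));
}
```

Fitting a line rather than eyeballing the chart makes the per-append growth comparable across runs.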
The first append takes 227ms:
Adding in the client connect / append_start / append_end timings shows how the client timing compares to the node timing:
Gaps between dots mean there are no logs at that time: either a) things are happening there which aren’t logged, or b) the node is waiting on a response from an async function. This is logging level 3 of 5 (info); at the most detailed level 5 of 5 (trace) it would probably be a continuous stream of log messages with no gaps.
The last append takes 1220ms:
Looking at the last appends from the client’s perspective, the time between connect and append_start becomes quite significant:
It looks like maybe the red node could be a factor in the slowdown (seen even more clearly in the full timeline), but that’s just a hunch to explore later.
Now we have ‘the vibe of the thing’ we can start to look into some details.
The append operation is a combined process involving various read, payment, and write operations, but in this post I’m zooming in on the very end of the process, where nodes actually write the data to their chunk store. This is about the last 5% of the overall append time, so it’s probably not significant to the total, but within it there are significant aspects to look at.
Nodes take about 11.47 ms to do the write for an append operation (ie the time to run SequenceStorage.write).
In the chart below we can see that each write takes about the same time; there’s no gradual increase in the time taken to write chunks as was seen with the overall append operation. There’s some variation, but no upward trend.
SequenceStorage.chunks.put(&sequence) (source) is taking 10.62ms of those 11.47ms, ie 93% of the work.

Within the chunks.put function, the breakdown is:

- 4.08ms for do_delete (source)
- 3.62ms for used_space.increase (source)
- 3.81ms for file.sync_all (source)

These three calls account for 88% of the time taken to run chunks.put:
The real surprise to me is updating the stats for used space taking 3.62ms. That’s a very long time. But it’s a surprisingly complex operation, see fn increase.
Within do_delete, it looks like the call to used_space.decrease is probably taking most of the time (roughly 3.62 of the 4.08ms, ie 89%), similar to used_space.increase.
The file.sync_all call taking 3.81ms warrants a little exploring too…
The docs for sync_all say: “This function will attempt to ensure that all in-memory data reaches the filesystem before returning.”
Sounds like sync_all is a feature worth keeping. I guess one day someone will compare the failure rate with and without it and decide whether it’s worth the extra work (removing the sync is about 170x faster, see below).
Some further detailed discussion (and links) on sync_all in do we need to clarify the fsync close unixism.
use std::time::Instant;
use std::io::Write;

fn main() {
    let start = Instant::now();
    std::fs::File::create("data.txt").and_then(|mut file| {
        file.write_all("my value".as_bytes()).expect("write failed");
        file.sync_all()
    }).expect("create or sync failed");
    println!("{:?}", start.elapsed().as_nanos());
}
// 4673048 nanoseconds
vs
use std::time::Instant;
use std::io::Write;

fn main() {
    let start = Instant::now();
    let mut file = std::fs::File::create("data.txt").expect("create failed");
    file.write_all("my value".as_bytes()).expect("write failed");
    println!("{:?}", start.elapsed().as_nanos());
}
// 27156 nanoseconds, 172x faster
Even though not syncing is so much faster, the added safety and data integrity seem well worth it, especially at this stage of development: syncing eliminates a whole class of things that could go subtly wrong.
My hope is that if we can find a more efficient way to manage the stats, we can bring this down from ~12ms to ~4ms, a 3x improvement. That’s still not much of a dent in an overall ~300ms append operation, but it’s a big saving within the particular section this post has been looking at.
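One hypothetical shape for cheaper stats (to be clear: this is my invention, not the current node code, and the UsedSpace/increase/decrease names just mirror the calls discussed above): rather than doing expensive work on every put and delete, keep a running total in an atomic counter and adjust it by the size of each chunk as it’s written or removed.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical cached used-space tracker: a running total
/// adjusted per operation instead of recomputed each time.
struct UsedSpace {
    total: AtomicU64,
}

impl UsedSpace {
    fn new() -> Self {
        Self { total: AtomicU64::new(0) }
    }
    // Called when a chunk is written.
    fn increase(&self, bytes: u64) {
        self.total.fetch_add(bytes, Ordering::Relaxed);
    }
    // Called when a chunk is deleted.
    fn decrease(&self, bytes: u64) {
        self.total.fetch_sub(bytes, Ordering::Relaxed);
    }
    fn current(&self) -> u64 {
        self.total.load(Ordering::Relaxed)
    }
}

fn main() {
    let used = UsedSpace::new();
    used.increase(1024);
    used.increase(2048);
    used.decrease(1024);
    println!("{}", used.current()); // prints "2048"
}
```

The trade-off is that a cached counter can drift from the true on-disk usage (e.g. after a crash), so it would likely need to be reconciled against the real store size at startup.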