Network performance has felt somewhat slower than I expected (although testnet6 has been pretty impressive), so to find where the bottlenecks might be, I decided to run a profiler on a private SAFE network.
- Start a private SAFE network (using SafeInABox).
- Start a vault and record a performance profile:
  $ perf record -F 99 ./safe_vault
- Upload a 7MB file.
- View the result:
  $ perf report
13.00% Node thread safe_vault [.] crypto_core_curve25519_ref10_fe_mul
9.48% Node thread safe_vault [.] SHA512_Transform
8.87% Node thread safe_vault [.] SHA256_Transform
8.51% Node thread safe_vault [.] _$LT$std..vec..Vec$LT$T$GT$$u20$as$u20$rustc_serialize..Encodable$GT$::encode::h55f3b8f473db3546
6.50% Node thread safe_vault [.] crypto_core_curve25519_ref10_fe_sq
5.16% Node thread safe_vault [.] _$LT$u8$u20$as$u20$rustc_serialize..Decodable$GT$::decode::he30d85359a60fabe
4.07% Node thread safe_vault [.] _$LT$messages..RoutingMessage$u20$as$u20$rustc_serialize..Encodable$GT$::encode::h7035b51be7c0c236
3.46% Node thread safe_vault [.] _$LT$rustc_serialize..writer..SizeChecker$u20$as$u20$rustc_serialize_crate..Encoder$GT$::emit_u8::h5643e95fac8714ec
3.22% Node thread [unknown] [k] 0xffffffff8182d565
3.10% Node thread safe_vault [.] _$LT$std..vec..Vec$LT$T$GT$$u20$as$u20$rustc_serialize..Decodable$GT$::decode::hca5a9fb63841af69
2.67% Node thread safe_vault [.] crypto_core_curve25519_ref10_fe_cmov
1.34% Node thread [unknown] [k] 0xffffffff810a975d
1.22% Node thread safe_vault [.] _$LT$hash..sip..SipHasher$u20$as$u20$hash..Hasher$GT$::write::h44537f0d2d069bd3
1.09% Node thread safe_vault [.] crypto_core_curve25519_ref10_fe_sub
0.97% Node thread safe_vault [.] crypto_core_curve25519_ref10_fe_sq2
0.67% Node thread safe_vault [.] crypto_core_curve25519_ref10_fe_add
0.67% Node thread safe_vault [.] slide
0.67% Node thread [unknown] [k] 0xffffffff81085ad1
0.55% Node thread libc-2.23.so [.] 0x000000000009eed6
0.43% Node thread safe_vault [.] _$LT$data..Data$u20$as$u20$rustc_serialize..Encodable$GT$::encode::h7eec61c793e1f9ba
0.36% Node thread safe_vault [.] _$LT$authority..Authority$u20$as$u20$std..cmp..PartialOrd$GT$::lt::hec5f4c0023bc1ed8
0.36% Node thread safe_vault [.] cmov
0.36% Node thread safe_vault [.] routing::core::Core::handle_hop_message::hf95bcd68d577c27c
0.30% Node thread safe_vault [.] ._ladder_loop
0.30% Node thread safe_vault [.] _$LT$authority..Authority$u20$as$u20$std..cmp..PartialOrd$GT$::gt::h6f67a70edcc95a01
0.30% Node thread safe_vault [.] sodium_memzero
0.30% Node thread [unknown] [k] 0xffffffff81196658
0.24% Node thread libc-2.23.so [.] 0x000000000009eedb
0.24% Node thread safe_vault [.] _$LT$authority..Authority$u20$as$u20$rustc_serialize..Encodable$GT$::encode::h8eb168b01b9e1607
0.24% Node thread safe_vault [.] _$LT$message_filter..MessageFilter$LT$Message$GT$$GT$::insert::h3909c146fb499331
0.24% Node thread safe_vault [.] crypto_core_curve25519_ref10_fe_tobytes
0.24% Node thread safe_vault [.] crypto_core_curve25519_ref10_ge_double_scalarmult_vartime
0.24% Node thread safe_vault [.] crypto_core_curve25519_ref10_ge_p1p1_to_p2
0.24% Node thread safe_vault [.] crypto_core_curve25519_ref10_sc_muladd
0.18% Node thread libc-2.23.so [.] 0x000000000008e8ae
0.18% Node thread libc-2.23.so [.] 0x000000000009eee1
0.18% Node thread libc-2.23.so [.] 0x000000000009efa0
0.18% Node thread libc-2.23.so [.] 0x000000000009efb1
0.18% Node thread safe_vault [.] crypto_core_curve25519_ref10_ge_p2_dbl
0.18% Node thread safe_vault [.] ge_select
0.18% Node thread safe_vault [.] rallocx
0.18% Node thread safe_vault [.] routing::core::Core::handle_event::hba15935371e0ae4c
Info about test conditions:
- VirtualBox VM running Ubuntu 16.04 and a private instance of the SAFE network.
- The VM is allocated 3 CPUs and 2048MB of RAM.
- Vaults have a group size of 3 and a quorum size of 2.
- The network consists of 6 vaults running in Docker containers on the VM.
- perf was run on a 7th vault running directly on the VM, i.e. not in a Docker container.
My initial thoughts:
- It’s pleasing to see crypto operations consuming the most CPU; that makes intuitive sense. The serialization operations starting from the 4th row are the main red flag: names like _$LT$std..vec..Vec$LT$T$GT$$u20$as$u20$rustc_serialize..Encodable$GT$::encode demangle to <std::vec::Vec<T> as rustc_serialize::Encodable>::encode, and together with the emit_u8 and <u8 as Decodable>::decode entries this suggests message payloads are being encoded and decoded byte by byte. A rough micro-benchmark of that cost follows this list.
- perf is probably not the right tool here. I’ve been playing with automatically inserting timing log messages at the start and end of every function (the second sketch after this list shows the idea), and I think that will generate more useful output than perf. It also allows logging within a consistent network-wide time context, which makes it possible to profile the impact of messaging and bandwidth.
- CPU is probably not the main bottleneck. It may well be while the entire network runs inside a single VM on shared resources, but on a ‘live’ network that’s unlikely to be the case.
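To get a feel for the serialization cost in isolation, here’s a rough micro-benchmark sketch. It uses rustc_serialize’s JSON encoder purely as a stand-in, since that’s the simplest Encodable entry point; the vault actually uses a binary encoding, so treat this as illustrative of the per-element encoding pattern rather than a measurement of the real code path.

```rust
// Sketch only: JSON stands in for the vault's binary encoding.
// Assumes Cargo.toml has: rustc_serialize = "0.3"
extern crate rustc_serialize;

use rustc_serialize::json;
use std::time::Instant;

fn main() {
    // Roughly the size of the uploaded test file.
    let data: Vec<u8> = vec![0u8; 7 * 1024 * 1024];

    let start = Instant::now();
    // Vec<u8> is encoded element by element, which is the same
    // per-byte pattern the profile above hints at.
    let encoded = json::encode(&data).expect("encode failed");
    let elapsed = start.elapsed();

    println!("encoded {} bytes into {} bytes in {}.{:03}s",
             data.len(),
             encoded.len(),
             elapsed.as_secs(),
             elapsed.subsec_nanos() / 1_000_000);
}
```

Comparing that time against a plain copy of the same buffer would show roughly how much overhead the per-element encoding adds.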
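And here’s a minimal sketch of the function-timing idea: a guard struct that logs on entry and, via Drop, on exit, so early returns are covered too. FnTimer and the timed! macro are names I’ve made up for illustration; the real version would be inserted mechanically at the start of every function rather than written by hand.

```rust
use std::time::Instant;

// Logs entry when constructed and exit (with elapsed time) when dropped.
struct FnTimer {
    name: &'static str,
    start: Instant,
}

impl FnTimer {
    fn new(name: &'static str) -> FnTimer {
        println!("ENTER {}", name);
        FnTimer { name: name, start: Instant::now() }
    }
}

impl Drop for FnTimer {
    fn drop(&mut self) {
        let e = self.start.elapsed();
        println!("EXIT  {} after {}.{:09}s", self.name, e.as_secs(), e.subsec_nanos());
    }
}

// The code generator would insert one of these at the top of every function.
macro_rules! timed {
    ($name:expr) => {
        let _timer = FnTimer::new($name);
    };
}

fn handle_hop_message() {
    timed!("routing::core::Core::handle_hop_message");
    // ... real work would happen here ...
}

fn main() {
    handle_hop_message();
}
```

Each vault’s log lines could then be normalised against the shared network-wide time context, though it’s worth remembering that instrumenting every function adds overhead of its own and will distort the numbers it produces.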
It would be great to hear your thoughts if you have experience with profiling or ideas about how to tackle this. Performance is important for user experience, but it’s no use optimizing the wrong things.