Implementation details for the Dec 2020 testnet

With the latest testnet available it’s a good time to look at the code and see how stuff is implemented under the hood.

This is intended to be an exploration so there’s probably a lot of opportunity to improve and add to this.

Reward Amount

The reward amount is scaled by the network size and the age of each node (source).

/// Calculates the reward for a node
/// when it has reached a certain age.
pub async fn reward(&self, age: Age) -> Money {
    let prefix = self.network.our_prefix().await;
    let prefix_len = prefix.bit_count();
    RewardCalc::reward_from(age, prefix_len)
}

fn reward_from(age: Age, prefix_len: usize) -> Money {
    let time = 2_u64.pow(age as u32);
    let nanos = 1_000_000_000;
    let network_size = 2_u64.pow(prefix_len as u32);
    let steepness_reductor = prefix_len as u64 + 1;
    Money::from_nano(time * nanos / network_size * steepness_reductor)
}

A minor subtlety - at a glance the last line looks like it has 2 terms in the numerator and 2 in the denominator, but because `*` and `/` evaluate left to right it actually has 3 in the numerator and 1 in the denominator. It's clearer when expressed as time * nanos * steepness_reductor / network_size (however this is not strictly the same expression due to integer truncation, eg 3*4/5*6 is 12 and 3*4*6/5 is 14).
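A quick way to convince yourself of the difference (plain Rust, not code from the repo):

fn main() {
    // Left-to-right evaluation: (3 * 4 / 5) * 6 = (12 / 5) * 6 = 2 * 6 = 12
    assert_eq!(3 * 4 / 5 * 6, 12);
    // Dividing last loses less to truncation: 3 * 4 * 6 / 5 = 72 / 5 = 14
    assert_eq!(3 * 4 * 6 / 5, 14);
}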

Since network_size and steepness_reductor both depend on the same variable prefix_len, it might be clearer as:

+const NANOS: u64 = 1_000_000_000;
...
-let time = 2_u64.pow(age as u32);
-let nanos = 1_000_000_000;
-let network_size = 2_u64.pow(prefix_len as u32);
-let steepness_reductor = prefix_len as u64 + 1;
-Money::from_nano(time * nanos / network_size * steepness_reductor)
+let node_up_time = 2_u64.pow(age as u32);
+let network_size = 2_u64.pow(prefix_len as u32);
+let steepness_reductor = prefix_len as u64 + 1;
+let adjusted_network_size = network_size / steepness_reductor;
+Money::from_nano(NANOS * node_up_time / adjusted_network_size)

This gives us the following table for adjusted_network_size:

| prefix length | network size | steepness reductor | adjusted network size |
|---|---|---|---|
| 0 | 1 | 1 | 1 |
| 1 | 2 | 2 | 1 |
| 2 | 4 | 3 | 1 |
| 3 | 8 | 4 | 2 |
| 4 | 16 | 5 | 3 |
| 5 | 32 | 6 | 5 |
| 6 | 64 | 7 | 9 |
| 7 | 128 | 8 | 16 |
| 8 | 256 | 9 | 28 |
| 9 | 512 | 10 | 51 |
| 10 | 1024 | 11 | 93 |
| 11 | 2048 | 12 | 170 |
| 12 | 4096 | 13 | 315 |

And the reward amount for various node ages and prefix lengths (values in milli-SNT), eg age 3 with prefix length 7 gives a reward of 500 milli-SNT:

| prefix length | node age 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| 0 | 1000 | 2000 | 4000 | 8000 | 16000 | 32000 | 64000 | 128000 |
| 1 | 1000 | 2000 | 4000 | 8000 | 16000 | 32000 | 64000 | 128000 |
| 2 | 750 | 1500 | 3000 | 6000 | 12000 | 24000 | 48000 | 96000 |
| 3 | 500 | 1000 | 2000 | 4000 | 8000 | 16000 | 32000 | 64000 |
| 4 | 312 | 625 | 1250 | 2500 | 5000 | 10000 | 20000 | 40000 |
| 5 | 187 | 375 | 750 | 1500 | 3000 | 6000 | 12000 | 24000 |
| 6 | 109 | 218 | 437 | 875 | 1750 | 3500 | 7000 | 14000 |
| 7 | 62 | 125 | 250 | 500 | 1000 | 2000 | 4000 | 8000 |
| 8 | 35 | 70 | 140 | 281 | 562 | 1125 | 2250 | 4500 |
| 9 | 19 | 39 | 78 | 156 | 312 | 625 | 1250 | 2500 |
| 10 | 10 | 21 | 42 | 85 | 171 | 343 | 687 | 1375 |
| 11 | 5 | 11 | 23 | 46 | 93 | 187 | 375 | 750 |
| 12 | 3 | 6 | 12 | 25 | 50 | 101 | 203 | 406 |
| 13 | 1 | 3 | 6 | 13 | 27 | 54 | 109 | 218 |
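If anyone wants to reproduce these tables, here's a minimal standalone version of the same formula (my own helper names, not code from sn_node):

fn reward_nanos(age: u32, prefix_len: u32) -> u64 {
    let time = 2_u64.pow(age);
    let nanos = 1_000_000_000;
    let network_size = 2_u64.pow(prefix_len);
    let steepness_reductor = prefix_len as u64 + 1;
    time * nanos / network_size * steepness_reductor
}

fn main() {
    // age 3, prefix length 7 => 500_000_000 nanos = 500 milli-SNT
    assert_eq!(reward_nanos(3, 7), 500_000_000);
    for prefix_len in 0..=13 {
        let row: Vec<u64> = (0..=7)
            .map(|age| reward_nanos(age, prefix_len) / 1_000_000) // milli-SNT
            .collect();
        println!("prefix {}: {:?}", prefix_len, row);
    }
}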

Reward Event

Rewards are accumulated in the Rewards data structure, and paid to the node wallets upon relocation. Payment is from the new section wallet, which happens in activate_node_rewards (source):

/// 3. The old section will send back the wallet id, which allows us
/// to activate it.
/// At this point, we payout a standard reward based on the node age,
/// which represents the work performed in its previous section.
async fn activate_node_rewards(
    &mut self,
    wallet: PublicKey,
    node_id: XorName,
) -> Result<NodeMessagingDuty> {
...
// Send the reward counter to the new section.
// Once received over there, the new section
// will pay out the accumulated rewards to the wallet.

Storecost

This happens here:

/// Get latest StoreCost for the given number of bytes.
/// Also check for Section storage capacity and report accordingly.
async fn get_store_cost(
    &self,
    bytes: u64,
    msg_id: MessageId,
    origin: Address,
) -> Result<NodeOperation> {
...

But it is implemented as a rate limit here:

/// Calculates the rate limit of write operations,
/// as a cost to be paid for a certain number of bytes.
pub async fn from(&self, bytes: u64) -> Money {
    let prefix = self.network.our_prefix().await;
    let prefix_len = prefix.bit_count();
    let section_supply_share = MAX_SUPPLY as f64 / 2_f64.powf(prefix_len as f64);
    let full_nodes = self.capacity.full_nodes();
    let all_nodes = self.network.our_adults().await.len() as u8;
    ...
    let available_nodes = (all_nodes - full_nodes) as f64;
    let supply_demand_factor = 0.001
        + (1_f64 / available_nodes).powf(8_f64)
        + (full_nodes as f64 / all_nodes as f64).powf(88_f64);
    let data_size_factor = (bytes as f64 / MAX_CHUNK_SIZE as f64).powf(2_f64)
        + (bytes as f64 / MAX_CHUNK_SIZE as f64);
    let steepness_reductor = prefix_len as f64 + 1_f64;
    let token_source = steepness_reductor * section_supply_share.powf(0.5_f64);
    let rate_limit = (token_source * data_size_factor * supply_demand_factor).round() as u64;
    Money::from_nano(rate_limit)
}

A consideration - maybe using ceil instead of round would be more suitable? Less risk of floating point discrepancies this way. Or maybe using floor since it benefits the uploader and is closer to an integer-style truncation? Something just feels very off about using both float and round in a money operation. Is there a way to calculate storecost reliably using integers? It’s critical for this to be foolproof across all implementations, languages, architectures etc.
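For anyone wanting to experiment with the rounding question (or to check the numbers later in this thread), here's a standalone transcription of the rate limit formula. The constant values are my assumptions - MAX_CHUNK_SIZE as 1 MiB and MAX_SUPPLY as 2^32 - 1 whole tokens in nanos - so verify them against the source before relying on the output:

// Standalone transcription of the rate limit formula, for experimenting.
// The constants below are my assumptions, not copied from the source.
const MAX_CHUNK_SIZE: u64 = 1024 * 1024; // assumed: 1 MiB
const MAX_SUPPLY: u64 = (u32::MAX as u64) * 1_000_000_000; // assumed: 2^32 - 1 tokens in nanos

fn rate_limit_nanos(bytes: u64, prefix_len: u32, full_nodes: u8, all_nodes: u8) -> u64 {
    let section_supply_share = MAX_SUPPLY as f64 / 2_f64.powf(prefix_len as f64);
    let available_nodes = (all_nodes - full_nodes) as f64;
    let supply_demand_factor = 0.001
        + (1_f64 / available_nodes).powf(8_f64)
        + (full_nodes as f64 / all_nodes as f64).powf(88_f64);
    let data_size_factor = (bytes as f64 / MAX_CHUNK_SIZE as f64).powf(2_f64)
        + (bytes as f64 / MAX_CHUNK_SIZE as f64);
    let steepness_reductor = prefix_len as f64 + 1_f64;
    let token_source = steepness_reductor * section_supply_share.powf(0.5_f64);
    // Swap `round` for `ceil` or `floor` here to compare the behaviours discussed above.
    (token_source * data_size_factor * supply_demand_factor).round() as u64
}

fn main() {
    // Example: a full 1 MiB chunk, prefix length 7, 100 adults of which 40 are full.
    println!("{} nanos", rate_limit_nanos(MAX_CHUNK_SIZE, 7, 40, 100));
}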

The payment is handled in node/elder_duties/key_section/payment/mod.rs.

/// An Elder in a KeySection is responsible for
/// data payment, and will receive write
/// requests from clients.
/// At Payments, a local request to Transfers module
/// will clear the payment, and thereafter the node forwards
/// the actual write request to a DataSection,
/// which would be a section closest to the data
/// (where it is then handled by Elders with Metadata duties).

Section Split

The const RECOMMENDED_SECTION_SIZE is currently set to 10.

The const ELDER_SIZE is currently set to 5.

source

/// Recommended section size. sn_routing will keep adding nodes until the
/// section reaches this size. More nodes might be added if requested by the
/// upper layers.
/// This number also determines when a split happens - if both post-split
/// sections would have at least this number of nodes.
pub const RECOMMENDED_SECTION_SIZE: usize = 10;

/// Number of elders per section.
pub const ELDER_SIZE: usize = 5;

This is used in try_split to see if two valid sections can be created from splitting the current section (source).

// Tries to split our section.
// If we have enough mature nodes for both subsections, returns the elders infos
// of the two subsections. Otherwise returns `None`.
fn try_split(&self, our_name: &XorName) -> Option<(EldersInfo, EldersInfo)> {
...
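Roughly speaking - and this is just my sketch of the condition implied by the doc comments, not the actual sn_routing code - the check boils down to partitioning the current members by the next bit of their name and requiring both halves to be big enough:

// Sketch only: split if both prospective halves would have at least
// RECOMMENDED_SECTION_SIZE (mature) members, where `next_bits` holds, for each
// member, the bit of its name that would distinguish the two new prefixes.
fn can_split(next_bits: &[bool]) -> bool {
    const RECOMMENDED_SECTION_SIZE: usize = 10;
    let ones = next_bits.iter().filter(|&&b| b).count();
    let zeros = next_bits.len() - ones;
    ones >= RECOMMENDED_SECTION_SIZE && zeros >= RECOMMENDED_SECTION_SIZE
}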

There’s a helper structure SplitBarrier in split_barrier.rs for ensuring splits go smoothly.

/// Helper structure to make sure that during splits, our and the sibling
/// sections are updated consistently.
///
/// # Usage
///
/// Each mutation to be applied to our `Section` or `Network` must pass through
/// this barrier first. Call the corresponding handler (`handle_our_section`,
/// `handle_their_key`) and then call `take`. If it returns `Some` for our
/// and/or sibling section, apply it to the corresponding state, otherwise do
/// nothing.

Disallow Rule

Checks if more than 50% of nodes are full (source):

const MAX_NETWORK_STORAGE_RATIO: f64 = 0.5;
...
pub async fn check_network_storage(&self) -> bool {
    info!("Checking network storage");
    let all_nodes = self.network.our_adults().await.len() as f64;
    let full_nodes = self.capacity.full_nodes() as f64;
    let usage_ratio = full_nodes / all_nodes;
    info!("Total number of adult nodes: {:?}", all_nodes);
    info!("Number of Full adult nodes: {:?}", full_nodes);
    info!("Section storage usage ratio: {:?}", usage_ratio);
    usage_ratio > MAX_NETWORK_STORAGE_RATIO
}

Feels like it might be safer and (slightly!) more efficient if this is done with integers, eg this float expression

full_nodes / all_nodes > 1 / 2

has an equivalent integer expression of

full_nodes * 2 > all_nodes

Might also be worth renaming it to reflect what the result means, eg exceeded_max_storage_ratio.
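A sketch of what that could look like, reusing the same helpers as the original function (exceeded_max_storage_ratio is just the suggested name, not anything in the codebase):

pub async fn exceeded_max_storage_ratio(&self) -> bool {
    info!("Checking network storage");
    let all_nodes = self.network.our_adults().await.len() as u64;
    let full_nodes = self.capacity.full_nodes() as u64;
    info!("Total number of adult nodes: {:?}", all_nodes);
    info!("Number of full adult nodes: {:?}", full_nodes);
    // full_nodes / all_nodes > 1/2  is equivalent to  full_nodes * 2 > all_nodes
    full_nodes * 2 > all_nodes
}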

To Do

I’m still finding the code snippets for:

Relocations

Token Transactions

Data Mutations

Age Increase

Anything else?

Versions

I’ll put versions and commit hashes here once the testnet is considered stable.

bls_dkg
bls_signature_aggregator
crdts
qp2p
resource_proof
self_encryption
sn_api
sn_client
sn_data_types
sn_node
sn_routing
sn_transfers
threshold_crypto
xor_name

crdts and threshold_crypto are not maidsafe repositories but they’re a big part of the network so I included them here.

sn_node is compiled using musl. Install instructions for musl on linux can be found in this post.

$ cargo build --release --target x86_64-unknown-linux-musl

sn_api does not compile with musl yet so use gcc (more info on the dev forum in sn_api and openssl dependency).

37 Likes

Great read, @mav! It is good to see the analysis and the associated code pinpointed.

13 Likes

Really useful @mav, bookmarked!

11 Likes

Thank you @mav
Do you think that speed will help you get a bigger reward? It looks like 24/7 runtime is the key measurement for growing over time.

7 Likes

Very insightful post, thank you for sharing this! I'm looking forward to seeing how the age is calculated.

8 Likes

In the testnet, no, it doesn’t look like speed will help get a bigger reward. But I have not looked into the event that causes reward to be accumulated or whether speed matters for it (I expect the reward accumulation event will be triggered by PUT as described in rfc0057).

It is important but only because it affects age (too much downtime leads to reduced age), which also strongly affects reward amount. I should also try to look for where age is penalized and the conditions that cause it.


After a bit of spreadsheeting with the storecost calculations I found something unusual.

Uploading N lots of X KB can be quite a lot cheaper than uploading 1 lot of (N*X) KB - up to 50% cheaper.

It seems to me intuitively that these situations should lead to the same storecost (or maybe larger chunks are slightly cheaper to give an incentive towards efficiency).

This situation seems to arise because the data_size_factor is not linearly scaled and is pegged to the MAX_CHUNK_SIZE.

Working through the algebra: for a full 1 MB chunk the data_size_factor is 1^2 + 1 = 2, while N parts of size 1/N MB together contribute N*((1/N)^2 + 1/N) = 1 + 1/N. So when splitting a 1 MB chunk into N uploads of size 1/N MB the saving is 1 - (1+N)/(2N); eg a chunk split into 2 half-sized pieces gives 1 - (1+2)/(2*2) = 0.25, ie a 25% discount. Splitting into very many parts causes this ratio to approach a 50% saving on storecost.

When splitting a 1 MB chunk into parts the % saved on upload storecost is:

| Parts | % saved |
|---|---|
| 1 | 0 |
| 2 | 25 |
| 3 | 33 |
| 4 | 38 |
| 5 | 40 |
| 6 | 42 |
| 7 | 43 |
| 8 | 44 |
| 9 | 44 |
| 10 | 45 |
| 20 | 48 |
| 50 | 49 |
| 100 | 50 |

To generalize further for chunks smaller than 1 MB: when splitting a chunk into N pieces each of size B, the saving comes to 1 - (B+MAXB)/(N*B+MAXB), where MAXB is 1 MB in bytes, ie 1048576. For example a 500 KiB chunk split into 5 x 100 KiB pieces (N x B) gives a saving of 1 - (102400+1048576)/(5*102400+1048576) = 26% less cost.

Small files gain no real saving from being split into even smaller parts.

When splitting a 10 KiB chunk into parts the % saved on upload storecost is:

| Parts | % saved |
|---|---|
| 1 | 0 |
| 2 | 0 |
| 3 | 1 |
| 4 | 1 |
| 5 | 1 |
| 6 | 1 |
| 7 | 1 |
| 8 | 1 |
| 9 | 1 |
| 10 | 1 |
| 20 | 1 |
| 50 | 1 |
| 100 | 1 |

Of course splitting too much creates more inconvenience and overhead for both uploading and downloading, and eventually the inconvenience of too much splitting probably isn't worth the savings. Even if wrapped in a library there will always be some point where splitting becomes too annoying. But I could easily see it being worthwhile to split a 1 MB chunk into 10 parts if it's 45% cheaper to do that and a library manages it all for the uploader/downloader.

I’d encourage anyone interested to put the storecost calculation into a spreadsheet and confirm this.
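Or, instead of a spreadsheet, a quick check using just the data_size_factor term - the other factors in the rate limit don't depend on the upload size, so they cancel out when comparing split vs unsplit uploads of the same data:

const MAX_CHUNK_SIZE: f64 = 1048576.0; // 1 MiB

// data_size_factor from the rate limit formula; the remaining factors do not
// depend on the upload size, so they cancel when comparing the two strategies.
fn data_size_factor(bytes: f64) -> f64 {
    (bytes / MAX_CHUNK_SIZE).powf(2.0) + (bytes / MAX_CHUNK_SIZE)
}

fn main() {
    let total_bytes = 1048576.0; // 1 MiB of data
    for parts in [1.0, 2.0, 5.0, 10.0, 100.0] {
        let whole = data_size_factor(total_bytes);
        let split = parts * data_size_factor(total_bytes / parts);
        println!("{} parts: {:.0}% saved", parts, 100.0 * (1.0 - split / whole));
    }
}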

Maybe there can be some storecost tests that double as an expression of the underlying intentions, eg

// pseudocode

// splitting a chunk into multiple parts should not be cheaper
let parts = 5;
let part_bytes = 10_000;
assert(parts * store_cost(part_bytes) >= store_cost(parts * part_bytes));

// storage is cheaper for larger networks
let small_network_prefix = 7;
let large_network_prefix = small_network_prefix + 10;
assert(store_cost(large_network_prefix) < store_cost(small_network_prefix));

// storage is more expensive when there's not much spare capacity
let all_nodes = 200;
let many_available_nodes = 110; // 90 full nodes
let few_available_nodes = 10; // 190 full nodes
assert(store_cost(few_available_nodes) > store_cost(many_available_nodes));

// etc...

Also worth pointing out this doesn’t invalidate or reduce the value of the testnet, it will still work fine and achieve the purpose it was intended for with the existing storecost algorithm.

13 Likes

Unusual and illogical. That small chunks pay less than big ones makes sense but that it’s cheaper to split a big chunk doesn’t.

9 Likes

There are more changes than I expected. @oetyng Can you please clarify why splitting a chunk should reduce the StoreCost?

@mav about speed - there should still remain some profit from caching for uploaders, and if rewards and distribution work well it could one day be possible to stream to millions of users at the same time.

2 Likes

Very interesting to see this and I for one will need a few more iterations before I understand it.

A few queries then

  • Expecting that “Network size” is number of nodes; will volume and then used/unused space be known or too much overhead?
  • Above is mention of more simply full_nodes and all_nodes but would we expect to fill nodes or tend towards seeing data distributed evenly across all?..
  • Why would a node become full… how is xorspace distributed relative to nodes?
  • What reward for a full node?
  • Would a network with the same number and age of nodes but less spare capacity not see cost rise and then also reward?
3 Likes

This is the info for these aspects.

When the section experiences churn it triggers a relocation event. The relocated nodes get an increment to their age. The code is in sn_routing/src/relocation.rs.

/// Find all nodes to relocate after a churn event and create the relocate
/// actions for them.
...
// Find the peers that pass the relocation check and take only the oldest ones
// to avoid relocating too many nodes at the same time.
let candidates: Vec<_> = section
    .members()
    .joined()
    .filter(|info| check(info.peer.age(), churn_signature))
    .collect();

The nodes for relocation are chosen as the ones whose age matches the number of trailing zeros in the signature. Note there is no tiebreaker: all nodes matching that age are relocated. Maybe some risk here, but probably not (source).

// Relocation check - returns whether a member with the given age is a candidate
// for relocation on a churn event with the given signature.
pub(crate) fn check(age: u8, churn_signature: &bls::Signature) -> bool {
    // Evaluate the formula `signature % 2^age == 0`, which is the same as
    // checking the signature has at least `age` trailing zero bits.
    trailing_zeros(&churn_signature.to_bytes()[..]) >= age as u32
}

The number of churn events needed to relocate approximately doubles for each age - each extra trailing zero bit is roughly a 50/50 chance, so a node of age a passes the check on any given churn event with probability about 1/2^a - but it may be more or less depending on chance.

On relocation, those nodes' age increments by 1 (source).

impl RelocateDetails {
    pub(crate) fn new(
        ...
        peer.age().saturating_add(1),

Each node from the relocation candidates is sent to a different section. The destination is chosen from the hash of the relocating node's name combined (XORed) with the name of the churned node (source):

// Compute the destination for the node with `relocating_name` to be relocated
// to. `churn_name` is the name of the joined/left node that triggered the
// relocation.
fn destination(relocating_name: &XorName, churn_name: &XorName) -> XorName {
    let combined_name = xor(relocating_name, churn_name);
    XorName(crypto::sha3_256(&combined_name.0))
}

Personally I would prefer to see this include the hash of the churn signature since it cannot be known in advance. If we use just these two node names then nodes can precompute their possible future relocation destinations and to me that seems like an unnecessary risk which could be avoided. Better to have their future destination remain a mystery until the churn happens.
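A sketch of the kind of change I mean (my own illustration, not proposed code in the repo) - mix the churn signature into the hashed input so the destination can't be precomputed from names alone:

// Sketch only: same as the original destination() but with the churn signature
// (which is unknowable in advance) appended to the hashed input.
fn destination(
    relocating_name: &XorName,
    churn_name: &XorName,
    churn_signature: &bls::Signature,
) -> XorName {
    let combined_name = xor(relocating_name, churn_name);
    let mut bytes = combined_name.0.to_vec();
    bytes.extend_from_slice(&churn_signature.to_bytes()[..]);
    XorName(crypto::sha3_256(&bytes))
}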

The higher level details of relocations can be found in approved.rs, eg relocate_peers is called whenever there is a handle_online_event or handle_offline_event.

It will be possible to know to some degree, but it’s not like it will be known network-wide as some sort of ‘globally available piece of information’. Nodes could calculate approximations.

eg elders could look at their list of full vaults and their not-full vaults and do some calcs on how much each vault is storing, what their section prefix is, assume other sections look roughly the same… so it’d be an approximate calc.

I don’t think adults or clients would be able to do this but maybe could derive some info from storecost? There may be a related question here about how uploaders can verify the storecost is legit, some analogy with bitcoin tx fees maybe… not sure and getting a bit far from the original point now!

We expect to see just less than 50% of nodes full most of the time. If there’s more than 50% full nodes new nodes will continuously join until it’s lower. If there’s significantly less than 50% full nodes then the existing nodes will (hopefully) fill fairly quickly since storage will be quite cheap. So it should always be very close to 50% full nodes.

I personally don’t like this mechanism, but for the testnet it’s simple enough and will allow tests to happen.

Data is distributed fairly evenly (more in chunk distribution within sections).

A node will become full if it’s smaller than half the other nodes in the section. If a node can always stay in the top half of the nodes it will probably never become full.

Depends a lot on how to interpret this. Spare capacity isn't measured. Full vs not-full is measured. So if you mean both networks have 50% full nodes but one network has 1 PB spare while the other network has 100 PB spare, they would both have the same storecost and reward, since the spare space is not included in the reward or storecost algorithm.

Spare space (which is not measured directly) has an indirect effect only on new membership: with a lot of spare space it takes longer to reach 50% full nodes, so it also takes longer before new nodes can join.

19 Likes

mav, my hat's off to you!

1 Like

Age is halved when a node leaves then later returns to the section. The node is also relocated when they rejoin (source).

let new_age = cmp::max(MIN_AGE, old_info.value.peer.age() / 2);

if new_age > MIN_AGE { 
    // TODO: consider handling the relocation inside the
    // bootstrap phase, to avoid
    // having to send this `NodeApproval`.
    commands.push(self.send_node_approval(old_info.clone(), their_knowledge)?);
    commands.extend(self.relocate_rejoining_peer(&old_info.value.peer, new_age)?); 

    return Ok(commands);
}

I think there’s a reasonable debate to be had whether age should be reduced by 2 (or even 1) rather than halved. Halving the age penalizes for much more than half the total work done. Halving the age seems like a fairly big punishment.

eg from the table below, a node age 15 has done a total of 4095 ‘units’ of work. If it’s penalized it goes down to age 7. It must do 4080 units of work to get back to age 15, which is 99% of the work they had previously done.

| Initial Age | Total Work | Halved Age | Work Lost | Portion Lost (%) |
|---|---|---|---|---|
| 4 | 1 | 4 | 0 | 0 |
| 5 | 3 | 4 | 2 | 66 |
| 6 | 7 | 4 | 6 | 85 |
| 7 | 15 | 4 | 14 | 93 |
| 8 | 31 | 4 | 30 | 96 |
| 9 | 63 | 4 | 62 | 98 |
| 10 | 127 | 5 | 124 | 97 |
| 11 | 255 | 5 | 252 | 98 |
| 12 | 511 | 6 | 504 | 98 |
| 13 | 1023 | 6 | 1016 | 99 |
| 14 | 2047 | 7 | 2032 | 99 |
| 15 | 4095 | 7 | 4080 | 99 |
| 16 | 8191 | 8 | 8160 | 99 |
| 17 | 16383 | 8 | 16352 | 99 |
| 18 | 32767 | 9 | 32704 | 99 |
| 19 | 65535 | 9 | 65472 | 99 |

My feeling is the network would be entirely nodes age 4-8, since it takes only a single penalty to massively set a node back. Get to age 9 after 63 ‘days’ of work with no penalty, then a single mistake takes the node back to age 4 removing all 63 ‘days’ of work. A node at age 19 with 65535 ‘days’ of work with no penalty has two penalties in a row, suddenly is back to age 4. This obviously makes age not a very accurate measure for our purposes and only allows extremely high uptime nodes to participate in a meaningful way.

There could be less work than doubling to increase age, or the penalty could be less extreme than halving age, or the conditions to trigger a penalty could be quite lenient. This is also discussed in this post: “virtually all age will be roughly within the range 7-20”.
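For anyone wanting to play with the numbers, here's a small sketch of the 'work' metric behind the table above, comparing halving with a flat decrement of 1 (my own illustration, not code from the repo):

const MIN_AGE: u8 = 4;

// Total "units of work" needed to reach `age` starting from MIN_AGE, using the
// doubling-per-age assumption behind the table above: 1 + 2 + 4 + ...
fn total_work(age: u8) -> u64 {
    2_u64.pow((age - MIN_AGE) as u32 + 1) - 1
}

fn main() {
    let age = 15;
    let halved = std::cmp::max(MIN_AGE, age / 2);
    let decremented = std::cmp::max(MIN_AGE, age - 1);
    // Halving: lose 4095 - 15 = 4080 units (~99%).
    println!("halved:      lose {} of {}", total_work(age) - total_work(halved), total_work(age));
    // Decrement by one: lose 4095 - 2047 = 2048 units (50%).
    println!("decremented: lose {} of {}", total_work(age) - total_work(decremented), total_work(age));
}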

When is a member subject to the half age penalty? After 30s of downtime. This is not as clear in the code as other features, but the path is like this:

Nodes declare another node has gone offline using the Event::MemberLeft event, which is broadcast in handle_offline_event (source)

self.send_event(Event::MemberLeft {
    name: *peer.name(),
    age,
});

Agreement among elders is established using Vote::Offline in handle_peer_lost (source)

let info = info.clone().leave()?;
self.vote(Vote::Offline(info))

So when do those bits of code actually get run?

The comment for send_message_to_targets explains it (source)

/// Sends a message to multiple recipients. Attempts to send to
/// `delivery_group_size` recipients out of the `recipients` list. If a send
/// fails, attempts to send to the next peer until `delivery_group_size`
/// successful sends complete or there are no more recipients to try.
///
/// Returns `Ok` if all of `delivery_group_size` sends succeeded and `Err` if
/// less than `delivery_group_size` succeeded. Also returns all the failed
/// recipients which can be used by the caller to identify lost peers.
pub async fn send_message_to_targets(
...

If there’s any error in sending messages to another node, that node is declared as ‘lost’ and we start a vote for them to be declared as Vote::Offline. There’s not really a clean code snippet for this process to paste here, but this is where a failed node is recorded in the failed_recipients variable (source):

Err(_) => {
    failed_recipients.push(*addr);

    if next < recipients.len() {
        tasks.push(send(&recipients[next], msg.clone()));
        next += 1;      
    }
}

What sort of errors can get us to execute this code block? I’m not sure of all of them, we’d need to dig into qp2p and quinn errors to find what can go wrong in the send(recipient, msg) function. But one thing for sure that’s in there is a connection timeout.

Timeout is set to 30s in qp2p/src/peer_config.rs

pub const DEFAULT_IDLE_TIMEOUT_MSEC: u64 = 30_000; // 30secs

But this is a configurable value so nodes can set it to whatever they want. There may be some comms noise if nodes are tweaking this value. We can't really do anything about people changing it, but we can a) set a sensible default and not make it too easy for people to change, and b) maybe introduce some anti-spam measures for Vote::Offline. Could spamming be punished by consensus, or is it something we can only react to locally? How about when a node disconnects repeatedly from just one other node to induce spam to all the others? It's tricky…

It seems as long as nodes don’t drop out for longer than 30s then they’re safe from age demotion. Longer than that and they’ll be voted as being offline and subject to the rejoining penalty with age halved. Maybe there’s some wiggle room if the node can return before the voting reaches consensus, but I can’t imagine that would give much extra time, I wouldn’t imagine any more than 10s between starting and completing the vote.

13 Likes

Agreed. A bit harsh due to the way age is earned. I'm not in love with the integer division. I like this alternative when age is a u8:

let new_age = cmp::max(MIN_AGE, old_info.value.peer.age() - 1);

Edit: To ensure that a demotion of one age unit is a significant enough penalty, the node rewards just need to scale linearly with "work performed" instead of age. I also like the idea of making the age a u64 for finer-grained control. A u64 nodal age would just be equal to your "work performed" metric based on the number of consensus/decision rounds the node has participated in, and the integer division would remain. Representing an event-based nodal age via a u32 might offer a natural lifespan for nodes and minimize entrenchment.

4 Likes

Dependencies like self_update and reqwest are only used by install/update commands which are not essential.

I have forked the sn_api crate to control these commands with a cargo feature. It is an opt-out feature so that the PR I have issued gets a chance of being accepted by Maidsafe.

To be clear:

  • by default install/update commands are enabled and behaviour is unchanged,

  • these commands can be disabled, and in this case the code doesn’t depend on openssl anymore and can be built for MUSL. Additionally a size reduction of up to 30% is observed in the generated binaries:

|  | With self-update | Without self-update | Gain | MUSL version | Gain |
|---|---|---|---|---|---|
| safe | 31021920 | 22771952 | 27% | 21825984 | 30% |
| sn_authd | 21318280 | 16586400 | 22% | 15902848 | 25% |
5 Likes

Just curious about this because I still get an error building sn_api with musl. ring is not building with musl and is a dependency of more than just self-update so cannot be removed.

$ cargo build --release --target x86_64-unknown-linux-musl
error: failed to run custom build command for `ring v0.16.19`
$ cargo tree -i ring
ring v0.16.19
├── quinn-proto v0.6.1
│   └── quinn v0.6.1
...
├── rustls v0.17.0
│   ├── qp2p v0.9.6 (*)
...

I notice in the PR it says “And then just launch cargo build --release.”

Are you building with the flag --target x86_64-unknown-linux-musl?

Yes, this is one of the many possible targets.

Complete session building this target from scratch (note that I use cross instead of cargo for cross compilation):

$ git clone https://github.com/Thierry61/sn_api.git
Cloning into 'sn_api'...
remote: Enumerating objects: 14, done.
remote: Counting objects: 100% (14/14), done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 9938 (delta 2), reused 4 (delta 1), pack-reused 9924
Receiving objects: 100% (9938/9938), 4.21 MiB | 3.10 MiB/s, done.
Resolving deltas: 100% (7181/7181), done.
$ cd sn_api
$ sed -i -e 's/default = \["simulated-payouts", "self-update"\]/default = ["simulated-payouts"]/' sn_authd/Cargo.toml sn_cli/Cargo.toml
$ cross build --release --target x86_64-unknown-linux-musl
   Compiling libc v0.2.81
   Compiling proc-macro2 v1.0.24
   ...
   Compiling sn_authd v0.0.13 (/project/sn_authd)
   Compiling sn_cli v0.17.0 (/project/sn_cli)
    Finished release [optimized] target(s) in 13m 28s
$ ls -l $(find target/x86_64-unknown-linux-musl/release/ -maxdepth 1 -executable -type f)
-rwxr-xr-x 2 ubuntu0 ubuntu0 21825984 janv.  2 10:05 target/x86_64-unknown-linux-musl/release/safe
-rwxr-xr-x 2 ubuntu0 ubuntu0 15902848 janv.  2 10:05 target/x86_64-unknown-linux-musl/release/sn_authd
1 Like

Tough one though - we do want it easy for folks, so update/install is very helpful. Musl is also (for me) a must. The confusion folk would have would be too much. So perhaps the opt-out works well, and your PR is (again) very well done and professional. I am a bit torn on this one. What I mean is, I wonder if we are not just better off fixing self_update to use musl (I thought we had). I suspect that was just a case of using a rustls flag somewhere. If I recall @joshuef sent a PR to this effect, it may have been @lionel.faber though.

Can we try that approach first? Sorry man, I hope that does not detract from your work there, but I would prefer a fix if possible first.

5 Likes

Yeah it looks like self-update is configured to use rustls

https://github.com/maidsafe/sn_api/blob/6e4ea368fdcedb10042b5d8dc94ab02eece47003/sn_authd/Cargo.toml#L37-L40

https://github.com/maidsafe/sn_api/blob/6e4ea368fdcedb10042b5d8dc94ab02eece47003/sn_cli/Cargo.toml#L55-L58

But reqwest needs a feature flag added (more info in sn_api and openssl).

-reqwest = "~0.9.22"
...
+[dependencies.reqwest]
+version = "~0.9.22"
+default-features = false
+features = ["rustls-tls"]

In my experience this removes the openssl dependency (great!) but I am still trying to get it to build with musl… great to see you’ve got it working @tfa, I’ll keep persisting with it.

5 Likes

Making “self-update” opt-in would be easy:

  • remove it from default feature list in sn_authd/Cargo.toml and sn_cli/Cargo.toml: this would open up more possibilities for developer forking the repo.

  • to keep Maidsafe builds unchanged: dynamically add it back in the default features by adding a command like
    sed -i -e 's/default = \["simulated-payouts"\]/default = ["simulated-payouts", "self-update"]/' sn_authd/Cargo.toml sn_cli/Cargo.toml
    just before each cargo build command in Maidsafe workflows (to work around cargo build --features "self-update" not working in the root directory of a cargo workspace)

I can add this in my PR if you like.

Please note that I have added a supplementary commit in my PR to correct new warnings generated by clippy latest version.

6 Likes

My hesitation is this. I am hoping for a working testnet very soon. It will get a load of tweaks quickly (version numbering, wire format, serialisation format and much more). I expect there would be a ton (and I mean a ton) of fast updates, even just to check the update feature. So folk running without auto updates will get really peed off.

Then add an option to the config (human settable) to allow auto updates as we get to V1; before that though (and before safecoin is official) I feel we will update like mad people. Updates will be so important. Therefore I would much prefer to fix the update and make it runtime settable, if that makes sense?

We could add this PR as a temp solution perhaps, but I have seen so many of those tweaks and they are sometimes hard to remove (folk go mental if they think we remove something). At a minimum, the default should always have self_update I feel.

15 Likes