Are there specific ports to port-forward (udp/tcp)?
For now I will just try to use the hard-coded default ports 12000/tcp and 12000/udp?
I think you might be best to wait for testnet V6.1 which sounds like it could arrive tomorrow.
Is there a mechanism in place to handle uploads when a file transfer is stopped halfway through?
E.g. I can imagine some people uploading a file of several GBs and then cancelling because it takes too long.
Will this data be floating around or will the elders know to issue a timeout and then remove it from their storage/memory usage?
It would be best if any nodes that are handling it get paid for their work nonetheless, to prevent spam attacks and put the responsibility on the user for maintaining a stable connection until the file is uploaded.
Option to pay more for higher priority? People who are more patient could pay less, similar to BTC fees (fast/normal/slow).
At the moment each chunk is paid for and stored forever.
This is what we do, but we are looking to have storage contracts. So once a contract is created you can upload the data at any time, even repeatedly, and the contract itself is proof of payment. Then we can batch that really nicely.
Some context for future reference: it looks like there are two main commits for this.
Is there anything else that's worth pointing out for changes to logging?
I tried running a local network with RUST_LOG=off and no logs were generated (not even an empty file), but the performance was not noticeably different to running with logs. Is there anything particular that should be done to the environment, or are all the logging benefits already there in the code?
Actually, I think we just need the udp protocol.
During the previous testnet, with IGD, I saw port forwarding on the udp protocol only.
The log issue was more like a memory leak, so over time the logs grew unbounded. The biggest improvements in memory have been getting rid of unbounded containers and reducing the number of messages required. Right now the whole messaging is again being simplified (a lot), to the point of telling the story or describing the algorithms. The hope is that looking at the messages will allow anyone to see all data flows easily. Especially devs.
The mempool concept with an upload queue (highest fee is next in line after the current action) would stimulate the use of Safe Network Tokens.
Basically FIFO, unless there is a higher attached fee in the current queue.
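For what it's worth, a minimal sketch of that kind of fee-ordered queue (purely illustrative; the types and fields here are made up, not anything in the actual codebase):

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

// Hypothetical pending upload entry; fee and seq are illustrative fields only.
#[derive(PartialEq, Eq)]
struct PendingUpload {
    fee: u64,           // attached fee in tokens
    seq: u64,           // arrival order, to keep FIFO among equal fees
    chunk_id: [u8; 32],
}

impl Ord for PendingUpload {
    fn cmp(&self, other: &Self) -> Ordering {
        // Higher fee wins; on a tie, the earlier arrival (lower seq) wins.
        self.fee
            .cmp(&other.fee)
            .then_with(|| other.seq.cmp(&self.seq))
            .then_with(|| self.chunk_id.cmp(&other.chunk_id))
    }
}

impl PartialOrd for PendingUpload {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

// BinaryHeap is a max-heap, so pop() returns the highest-fee (then oldest) upload.
fn next_upload(queue: &mut BinaryHeap<PendingUpload>) -> Option<PendingUpload> {
    queue.pop()
}
```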
Yes, just udp, and you can use any ports you want to port forward; just make sure the router is set with the same ports and IP as your computer. The port numbers can be the same or different on the router, but the combination of internal and external ports must be the same on the router and the computer.
i.e. These are all Ok
local_ip::12000 : remote_ip::12000
local_ip::20000 : remote_ip::23333
As long as both router and computer are set with the same mapping of local/external it is ok.
It would be interesting to know more details here; it seems very strange for logging to make that much difference.
I tried uploading a 10MiB file, 5 tests with and without logging, new testnet for each test (so it never got very large logs). Maybe this particular test doesn't reproduce the test where you saw the improvements?
With logging (seconds for upload to complete)
38.333
37.182
37.360
38.614
38.768
Without logging - RUST_LOG=off
37.659
29.017 (yes 29, not a typo)
38.064
38.971
38.677
Does appending to large files on disk cause slowdown over time? Seems like it would be a very quick operation, not one that would slow down as the file grows. Would be interesting to get more details and reproduce the no-logging speedup again. 2x-3x is a big improvement.
I had a quick look around to see if there was anyone else with similar experience, didn't find much. The Rust Performance Book - Logging and Debugging doesn't have much to say. Anyone know of any other third-party resources for logging performance (rust or otherwise)?
This reddit thread on env_logger was my source for log-related slowdowns there (they talk about env_logger being sync and needing to retake the write lock for each log write). It seems we're already using `flexi-logger` in the node, though I'm not sure how it stacks up on this front.
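To make the concern concrete, here's a toy sketch (not env_logger's or flexi-logger's actual internals) of the difference between flushing every line to disk under a lock versus letting a buffer soak up the writes:

```rust
use std::fs::File;
use std::io::{BufWriter, Write};
use std::sync::Mutex;

// Toy "sync" logger: every log call takes the lock and does a write syscall.
struct PerLineLogger {
    file: Mutex<File>,
}

impl PerLineLogger {
    fn log(&self, line: &str) {
        let mut f = self.file.lock().unwrap(); // lock retaken on every call
        writeln!(f, "{}", line).unwrap();      // unbuffered write per line
    }
}

// Toy buffered logger: log calls land in an in-memory buffer; the file is
// only hit when the buffer fills (or on flush), amortising the I/O cost.
struct BufferedLogger {
    writer: Mutex<BufWriter<File>>,
}

impl BufferedLogger {
    fn log(&self, line: &str) {
        let mut w = self.writer.lock().unwrap();
        writeln!(w, "{}", line).unwrap();
    }
}
```

Both toys still take a lock per call; the difference is whether each call also pays for a write syscall. An async/offloaded writer (as mentioned with tracing further down) removes even the lock contention from the hot path.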
What we were seeing in general with prior internal testnets (and thus what may have happened here) was the network getting progressively slower. And then we would also be generating massive logfiles (100-200 MB of logs sometimes, depending on log levels). It's still not clear if this is definitely something that's been an issue, but it seems to have had an impact.
My 2-3x number comes from upping a droplet network with no logging turned on (removing -vvvvv from our droplet update) and I saw the increase in speed. Before I was seeing all client tests (cargo test --release in sn_client on a 51 node, post-split network) take ~200-300s on a fresh network; with this change I was getting a reasonably consistent ~100s-130s.
The log rotation used on our last few internal networks has felt more stable over long runs (though without such a wild improvement).
Hopefully we'll see soon enough when we get 6.1 up!
edit: it may also be worth noting that when testing on DO, we'd normally be hitting it from a few computers simultaneously. It's difficult to gauge the numbers / impact here as it's not (yet) super scientific (there would be some nice tests to get in… running a set of tests with "background use" to get more standardised results).
We're slowly building up some more standard checks which will run against standalone networks. I'll hopefully get to looking at hooking our churn test script into the CI flow this week, and then we can start expanding that to get more "real world" (WAN) confidence for each PR (something we couldn't realistically have set up when we had 5/6/7 different repos for what is going into the mono repo now).
And beyond that we can hopefully start to look at other testing frameworks (not sure what will be most relevant here).
It's fun stuff trying to figure out how to test this network tbh!
another edit: our current churn tests, for the curious. Super simple: we start a small network of 11 nodes, upload some data (1-7 MB files here), then increase the node count to 50 or so nodes, and we verify we can retrieve our data.
We then do a few loops dropping two nodes at a time (2, as they could theoretically be elders, so we can't lose more than 2 at one time), checking at each stage for our data again.
It's not the most elegant setup, but it increases our confidence in data retention over churn.
And as we get this hooked up we'll start expanding and testing more scenarios.
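For anyone who just wants the shape of it at a glance, here's a rough, hypothetical outline of that loop (the helper functions are stand-ins that only print, not the real sn_testnet_tool commands):

```rust
// Stand-in helpers; in reality these steps are testnet scripts, not Rust functions.
fn start_network(nodes: usize) { println!("starting {} nodes", nodes); }
fn upload_test_data() { println!("uploading 1-7 MB test files"); }
fn grow_network_to(nodes: usize) { println!("growing network to {} nodes", nodes); }
fn drop_nodes(count: usize) { println!("dropping {} nodes", count); }
fn verify_data() -> bool { println!("fetching and verifying the test files"); true }

fn main() {
    start_network(11);
    upload_test_data();
    grow_network_to(50);
    assert!(verify_data());

    // Drop two nodes at a time (they could both be elders), and check the
    // data is still retrievable after each round of churn.
    for _round in 0..5 {
        drop_nodes(2);
        assert!(verify_data());
    }
}
```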
Jepsen and other simulation frameworks have been mentioned in the last week, but I'm not super familiar with such things yet. But it's all interesting stuff!
Ah, very very interesting, thanks for clarifying!
This has really piqued my curiosity, I want to have a crack at reproducing when I get some time.
I did some early poking around on my laptop: 4-core hyperthreaded, 16 GB memory, 256 GB SSD.
Start 11 node baby-fleming.
Then run:
sn_client (v0.61.1) $ time cargo test --release
Took 5m32s
After the test the node log sizes were between 300 and 800 KiB
Using RUST_LOG='sn_node=debug'
Repeat with RUST_LOG=off
Took 5m36s
No node log files were created
Not reading any meaning into the test, it was just a first toe-in-the-water type of thing. For example, during the test my CPU was at 100% the whole time, so the results could possibly be misleading due to the node:core ratio.
Creating a 300 MiB file took just over 8s on my laptop:
dd if=/dev/urandom of=/tmp/data300.bin bs=1M count=300
8s is not long in the scale of a 5 minute test, and in that test period the logs were not significant, nowhere near 300 MiB total.
I'm not really seeing how logs of hundreds of megabytes could cause significant slowdown. Although, as you say, repeatedly taking a write lock could be an issue.
I came across this java logging performance post where they're regularly talking about millions of messages per second: Log4j – Performance
Anyways… really fascinating stuff, and I'll fire up my DO account when I get a chance and see if I can reproduce it, because it's very curious to me that logging would have that much effect.
Massive logging may create a large number of small objects, which will slow down future memory allocations even after they are freed (just a guess).
I do not believe large log files are the cause either. Unless, of course, the whole file is read, appended to in memory and then written back on each log operation.
The problem is that the lags keep increasing in duration from network start. So quadratic/exponential complexity is hiding somewhere.
Aye. It may not be the logging itself at all. Could be a red herring indeed.
We'll see with the log rotation working properly soon enough (which is assuming flexi works like env logger). The 'tracing' lib seems to be the gold standard these days in Rust and is built for async code, so switching the node code over will likely help (I believe routing was already using this).
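As a rough sketch of why tracing can help here (assuming the tracing-subscriber and tracing-appender crates; this isn't the node's actual setup): log lines go through a non-blocking writer backed by a dedicated worker thread, so the hot path just hands the line off rather than holding a file lock for the write.

```rust
fn main() {
    // Roll the log file daily; the path and file name here are illustrative.
    let file_appender = tracing_appender::rolling::daily("./logs", "sn_node.log");

    // Writes are pushed to a background worker thread; keep the guard alive
    // so buffered lines are flushed when the program exits.
    let (writer, _guard) = tracing_appender::non_blocking(file_appender);

    tracing_subscriber::fmt().with_writer(writer).init();

    tracing::info!("node started");
}
```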
Sounds like music!
Ditto to that. As long as it sounds positive it pacifies a little. But it's just the two of us, so we're the minority here.
Adding a bit more context to the -vvvvv flag:
The -vvvvv log level for nodes is defined in sn_testnet_tool provider L37 and used in sn_testnet_tool node L53 as a flag to the sn_node binary.
sn_node uses the verbosity flag in sn_node utils L112 to create a RUST_LOG filter of sn_node=trace.
The deployment to DigitalOcean uses terraform to start the nodes, not safe node run-baby-fleming.
The equivalent local command to what is running on DO (from a logging point of view) is
export RUST_LOG=sn_node=trace; safe node run-baby-fleming
When I tried running baby-fleming using RUST_LOG=sn_node=trace I saw only sn_node logs, up to the trace level (as expected). All other libraries (e.g. routing, quinn) do not log (as expected). This is important to clarify since quinn is especially verbose at trace level, and it's important that quinn is never set to trace level logging because it really destroys the performance. I tried running baby-fleming with RUST_LOG=trace (i.e. all libraries logging at trace level) and quinn accounts for more than 99% of the log messages.
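One small note on filters: the comma-separated directive syntax lets you combine levels per crate, so you could keep sn_node noisy while pinning quinn down, e.g. RUST_LOG=sn_node=trace,quinn=warn. The same directives work with tracing's EnvFilter (tracing-subscriber with the env-filter feature) if/when the node moves there; a sketch, not the node's actual config:

```rust
use tracing_subscriber::EnvFilter;

fn init_logging() {
    // Same directive syntax as RUST_LOG: sn_node at trace, quinn capped at warn
    // so it can never flood the logs the way quinn=trace does.
    let filter = EnvFilter::new("sn_node=trace,quinn=warn");
    tracing_subscriber::fmt().with_env_filter(filter).init();
}
```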
I did a quick run of sn_client tests with nodes logging set to sn_node=trace and it took 5m22s (compared to ~5m30s with no logging), so it looks like I'm not getting a big slowdown from trace level logging (or speedup from disabling logging).
Still only looking at the surface here, but itās good to keep track of the details along the way.
Yep, good info there.
Also worth mentioning that prior to Friday (just got a fix in on the eve there), the droplet deploy script was writing to stdout and being piped to a logfile, not using the nodes' baked-in logging. I'm not sure if that may account for something there or not.
I was trying a few other testnet things yesterday. I didn't get any wild perf results, but I was able to (I think) see a few issues that may have contributed to our Put/Get woes. Starting some deeper log analysis this morning.
Ah yeh, forgot to highlight this yesterday.
This is definitely something that's been heavy in the network. Before we squashed a lot of unnecessary messages, WireMsg::serialize was our source of memory growth. Massively. Alongside the reduction in messages there were some perf tweaks here (we were also re-serialising the whole message to send essentially the same thing to a different node; now we use a separated-out payload + header), which we already got in and which had a huge impact on per-node mem-usage. There are more changes coming now we're in a mono-repo, unifying messaging across crates and adding a few more useful bits to the header, so we can use this more often.
This should hopefully avoid even more unnecessary deserialize calls.
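A rough illustration of the payload + header idea described above (the names here are made up, not the actual WireMsg API): serialize the payload once into a Bytes handle, then only rebuild the small per-peer header, since cloning Bytes is a reference-count bump rather than a copy.

```rust
use bytes::Bytes;

// Hypothetical outgoing message: a small per-peer header plus a shared payload.
struct OutgoingMsg {
    header: Vec<u8>,
    payload: Bytes, // Bytes::clone just bumps a refcount; the bytes aren't copied
}

fn build_for_recipients(payload: Bytes, recipients: &[&str]) -> Vec<OutgoingMsg> {
    recipients
        .iter()
        .map(|peer| OutgoingMsg {
            // Only the cheap per-peer header is rebuilt for each destination.
            header: format!("dst={}", peer).into_bytes(),
            payload: payload.clone(),
        })
        .collect()
}
```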
I wish there was a way I could tell whether e.g. a put has hung or is still working. Is there?
ps aux | grep safe
isn't enough.