The router seems cool, but it's like they messed with it so nothing works? Vodafone GR.
That's because at the moment the client requires 100% success from the 8 receiving nodes. If we relax that to a majority, it's much cleaner (and lets libp2p do data republishing as it already does). @qi_ma already has an initial draft sorting this, I believe.
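To illustrate the idea (not the actual client code; the constant and function names below are made up), a majority quorum over the 8 receiving nodes could look something like this in Rust:

```rust
/// Hypothetical close-group size; the thread mentions 8 receiving nodes.
const CLOSE_GROUP_SIZE: usize = 8;

/// A PUT counts as successful once a strict majority of the close group
/// has acknowledged it, rather than requiring all 8.
fn put_succeeded(acks: usize) -> bool {
    acks > CLOSE_GROUP_SIZE / 2
}

fn main() {
    assert!(!put_succeeded(4)); // 4 of 8 is not a strict majority
    assert!(put_succeeded(5));  // 5 of 8 is
    assert!(put_succeeded(8));  // the old all-or-nothing case still passes
}
```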
edit:
So it looks like our nodes died. Not sure why as yet, perhaps OOM. They were running on the lowest-sized droplet, so that could well be it.
edit edit:
Hmm, those Dialing 12D3KooWK7wF4J… log lines look loopy to me. We've a
edit edit edit:
So def OOM, which probably led to lots of republishing and all the dialling attempts as it cascaded down the way.
Time to get the disk-backed record store on the go, it seems.
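As a rough idea of what "disk-backed" means here (this is not the libp2p `RecordStore` trait nor the actual safenode implementation, just an illustrative sketch with made-up names and a demo path), records could be written to individual files keyed by their hex-encoded key so they no longer sit in RAM:

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Illustrative disk-backed record store: each record lives in its own file
/// named after the hex-encoded key, so RAM only holds what is in flight.
struct DiskRecordStore {
    dir: PathBuf,
}

impl DiskRecordStore {
    fn new(dir: PathBuf) -> io::Result<Self> {
        fs::create_dir_all(&dir)?;
        Ok(Self { dir })
    }

    fn path_for(&self, key: &[u8]) -> PathBuf {
        // Hex-encode the key so it is a safe file name.
        let name: String = key.iter().map(|b| format!("{b:02x}")).collect();
        self.dir.join(name)
    }

    fn put(&self, key: &[u8], value: &[u8]) -> io::Result<()> {
        fs::write(self.path_for(key), value)
    }

    fn get(&self, key: &[u8]) -> io::Result<Option<Vec<u8>>> {
        match fs::read(self.path_for(key)) {
            Ok(v) => Ok(Some(v)),
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
            Err(e) => Err(e),
        }
    }
}

fn main() -> io::Result<()> {
    // Hypothetical demo path, just for the example.
    let store = DiskRecordStore::new(PathBuf::from("/tmp/record_store_demo"))?;
    store.put(b"key1", b"hello")?;
    assert_eq!(store.get(b"key1")?, Some(b"hello".to_vec()));
    Ok(())
}
```

A real store would also need bounds on disk usage and pruning, but the point is just that a full record set no longer kills the process via RAM.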
Large disk storage will just make the process of bug reproduction slower.
Storage of the same size (1 GB) will free some RAM, but without fixing the memory leak and memory spikes it may not give much benefit.
What is OOM?
Out Of Memory
It's not a surprise, as folk will always flood our testnets with as much data as they possibly can. It's a human thing.
My thinking for a while has been to get pay-for-data uploads in place to:
- Show the real network (uploads will be much slower)
- Prevent spamming data to it
Of course it's Monopoly money to start, and some folk will still go wild in the country, but the network should limit how quickly that happens, and with a full network we will see the payments happen.
It is on us now to get all that in place as well as NAT traversal, as without NAT traversal folk will still try and connect, which will fail, but can cause some issues. That's no problem in a much larger network, but it is in small testnets (another of my woes, as I believe testing with fewer than 2000 nodes to be almost futile).
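Purely as an illustration of the pay-for-data gate described above (all types and names here are hypothetical, not real Safe Network APIs), a node might only accept a chunk whose upload carries a payment covering a notional price:

```rust
/// Purely illustrative type; not the real safenode payment proof.
struct PaymentProof {
    paid_amount: u64,
}

/// Accept a chunk only if it arrives with a payment that covers a
/// hypothetical per-byte price; otherwise reject the upload.
fn store_chunk(chunk: &[u8], proof: Option<&PaymentProof>, price_per_byte: u64) -> Result<(), String> {
    let required = chunk.len() as u64 * price_per_byte;
    match proof {
        Some(p) if p.paid_amount >= required => {
            // Persist the chunk (storage itself omitted in this sketch).
            Ok(())
        }
        Some(_) => Err("payment too small for this chunk".into()),
        None => Err("no payment attached; rejecting upload".into()),
    }
}

fn main() {
    let proof = PaymentProof { paid_amount: 10 };
    assert!(store_chunk(b"hello", Some(&proof), 1).is_ok());         // 5 * 1 <= 10
    assert!(store_chunk(b"hello world!", Some(&proof), 1).is_err()); // 12 * 1 > 10
    assert!(store_chunk(b"hello", None, 1).is_err());                // no payment at all
}
```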
Out Of Memory.
I just tried to check some logs, but my whole safenode folder, which should be located in tmp, is gone. Is this normal? The last thing I did with the node was this, yesterday afternoon:
Some distros wipe unused tmp data frequently. That may have happened here.
Doesn't anyone think that a reliable network must work correctly even when nodes are full?
Intentionally designing such a scenario as undefined behaviour looks like a dangerous thing.
I would imagine we all do. This was a network where nodes stored data in memory and used very small nodes to do so. When memory filled, they collapsed.
Storing on disk is a different beast though.
In terms of the real network getting full, that would mean the payment/incentive system must have failed, not that there is some magic number somewhere that says "hey, stop uploading, we are full".
These nuances are subtle but very important. We cannot code our way past the issue that more data requires more space. That is worth keeping at the forefront of your mind here.
It is important to define whether such a scenario is a bug or not.
If it is a bug, then it needs fixing: by reproducing it more and more, instead of avoiding it with larger storage.
Of course.
At the same time, having a read-only network is better than having a crashed network.
Theoretically, there can be a chance of recovering from read-only mode.
Or at least it would be good to allow people to move their data to another network (or another version of this network) in case of failure.
This was a test. I think there is confusion here, somehow.
This test used small nodes with 2 GB RAM, that's all; no disk use for data. It was a small number of nodes and it had to cope with a lot of folk trying to connect from behind NAT (that part is a bug), which caused a lot of messages/churn.
This was not a situation where disks can fill and the node keeps going. In this situation, when the virtual disk was full, there was no room left for processing data. So this was akin to a tiny network holding some data and getting pummelled by messages.
It did great, but it was expected to fill up memory; that happened (after a long time), and that was a good test.
So we used RAM, we filled RAM, and then we killed the machines. That was correct and right to do; it was the test.
Makes sense!
Writing to disk: will this be a brand new piece of work, or has it been figured out before and now just needs integration?
Thanks
In relation to NAT traversal, are we still in the situation where we require TCP for it to work? Or has libp2p made progress on the QUIC Rust implementation?
That sounds to me like a perfect case for a Bamboo Garden Fund application, btw… someone that can help libp2p implement QUIC NAT traversal in the Rust lib, especially considering that it's already implemented in Go (right?).
What I can't understand is whether such a scenario is planned for at all or not.
For me, RAM and disk storage are not that different.
Operating systems can make disks look like RAM, for example (swapping/paging).
So if a problem can happen with one type of storage, it will happen with the other type too.
However, it may be that I'm missing something, so it may be better to discuss it later, after more obvious problems like NAT are gone.
We have that code already but did not use it. There is a small piece of work integrating with how libp2p would like it, but not complex at all.
So far it would be TCP (unfortunately), but we will jump to QUIC as soon as libp2p allows, or we can help there with engineering resources.
Yes AFAIK that is the case.
For a computer program they are, though.
Yes, that is true, and you can fill that up much faster than disk too. The point is we used RAM only, so we created a situation where all data had to fit in RAM. There is much more disk available than RAM, and when using RAM for storage you eat into processing capability. Etc. etc. etc.
NAT is not a problem for us and not on the radar as such. We are focussed on the Safe Network parts now. So:
- DBC proper
- Pay for data
- Upgrades
NAT will happen; it's a solved problem and needs some Rust catch-up work in rust-libp2p.
I think that is a very good application of the BGF. Nice thinking there, and it makes perfect sense.
In case it helps:
Fedora with VPN, i9 with 32 GB RAM.
When starting, no problem with uploading or downloading files of up to 1 MB.
After several hours I was unable to upload or download; the error message seemed to come from a loss of my section.
As for the node, no problems connecting and working.
RAM consumption ~9 MB; a lot of traffic, with peaks of 7 and 8 GB both download and upload; CPU without problems.
Router wired connection, connecting directly to the VPN on the same router.
After an hour I began to experience delays in the two browsers, ending up unable to find the pages.
After a few minutes they reconnected without problems.
The problem was intermittent and seemingly random.
Few errors in the logs.
Too bad I didn't take screenshots of it all.
I hope my experience is of some help.
Cheer up team, everything will end up working FINE.
When I was running three nodes from a laptop my family complained the internet was slow. Sure it was just a coincidence. Anyway I blamed BT.
If that is something that takes more than a couple of days to sort out, may I suggest another testnet this week, with this issue fixed:
If I understand it right, this alone would make uploading and downloading a much smoother experience. The stability of Joshnet was kind of incredible to witness as it was, but it was a bit abstract and not so much fun when we couldn't really throw any usual files at each other. I wouldn't mind if the testnet were a bit short-lived, now that the reason for that seems more or less known and solvable.
I think that at this point the PR value of somewhat usable, if short-lived, testnets is considerable, and it would do a ton of good for the "marketing department".
When do you expect we will see a test with these numbers?
Not asking for a date, but what needs to be accomplished before we see a 2000-node test?