The router seems cool, but it's like they messed with it so nothing works? Vodafone GR.
That's because at the moment the client requires 100% success from the 8 receiving nodes. If we relax that to a majority, it's much cleaner (and lets libp2p do data republishing as it already does). @qi_ma already has an initial draft sorting this, I believe.
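To illustrate the idea (not the actual client code; the constant and function names below are made up), a majority quorum over the 8 receiving nodes could look something like this in Rust:

```rust
/// Hypothetical close-group size; the thread mentions 8 receiving nodes.
const CLOSE_GROUP_SIZE: usize = 8;

/// A PUT counts as successful once a strict majority of the close group
/// has acknowledged it, rather than requiring all 8.
fn put_succeeded(acks: usize) -> bool {
    acks > CLOSE_GROUP_SIZE / 2
}

fn main() {
    assert!(!put_succeeded(4)); // 4 of 8 is not a strict majority
    assert!(put_succeeded(5));  // 5 of 8 is
    assert!(put_succeeded(8));  // the old all-or-nothing case still passes
}
```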
edit:
So it looks like our nodes died. Not sure why as yet, perhaps OOM. They were running on the lowest-sized droplet, so that could well be it.
edit edit:
Hmm, those Dialing 12D3KooWK7wF4J… log lines look loopy to me. We've a
edit edit edit:
So def OOM, which probably led to lots of republishing and all the dialling attempts as it cascaded down the way.
Time to get the disk-backed record store on the go, it seems.
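As a rough idea of what "disk-backed" means here (this is not the libp2p `RecordStore` trait nor the actual safenode implementation, just an illustrative sketch with made-up names and a demo path), records could be written to individual files keyed by their hex-encoded key so they no longer sit in RAM:

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Illustrative disk-backed record store: each record lives in its own file
/// named after the hex-encoded key, so RAM only holds what is in flight.
struct DiskRecordStore {
    dir: PathBuf,
}

impl DiskRecordStore {
    fn new(dir: PathBuf) -> io::Result<Self> {
        fs::create_dir_all(&dir)?;
        Ok(Self { dir })
    }

    fn path_for(&self, key: &[u8]) -> PathBuf {
        // Hex-encode the key so it is a safe file name.
        let name: String = key.iter().map(|b| format!("{b:02x}")).collect();
        self.dir.join(name)
    }

    fn put(&self, key: &[u8], value: &[u8]) -> io::Result<()> {
        fs::write(self.path_for(key), value)
    }

    fn get(&self, key: &[u8]) -> io::Result<Option<Vec<u8>>> {
        match fs::read(self.path_for(key)) {
            Ok(v) => Ok(Some(v)),
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
            Err(e) => Err(e),
        }
    }
}

fn main() -> io::Result<()> {
    // Hypothetical demo path, just for the example.
    let store = DiskRecordStore::new(PathBuf::from("/tmp/record_store_demo"))?;
    store.put(b"key1", b"hello")?;
    assert_eq!(store.get(b"key1")?, Some(b"hello".to_vec()));
    Ok(())
}
```

A real store would also need bounds on disk usage and pruning, but the point is just that a full record set no longer kills the process via RAM.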
Large disk storage will just make the process of bug reproduction slower.
Storage of the same size (1 GB) will free some RAM, but without fixing the memory leak and memory spikes it may not give much benefit.
What is OOM?
Out Of Memory
It's not a surprise, as folk will always flood our testnets with as much data as they possibly can. It's a human thing.
My thinking for a while has been to get pay-for-data uploads in place to:
- Show the real network (uploads will be much slower)
- Prevent spamming data to it
Of course it's Monopoly money to start, and some folk will still go wild in the country, but the network should limit how quickly that happens, and with a full network we will see the payments happen.
It is on us now to get all that in place as well as NAT traversal, as without NAT traversal folk will still try and connect, which will fail, but can cause some issues. That's no problem in a much larger network, but it is in small testnets (another of my woes, as I believe testing with fewer than 2000 nodes to be almost futile).
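Purely as an illustration of the pay-for-data gate described above (all types and names here are hypothetical, not real Safe Network APIs), a node might only accept a chunk whose upload carries a payment covering a notional price:

```rust
/// Purely illustrative type; not the real safenode payment proof.
struct PaymentProof {
    paid_amount: u64,
}

/// Accept a chunk only if it arrives with a payment that covers a
/// hypothetical per-byte price; otherwise reject the upload.
fn store_chunk(chunk: &[u8], proof: Option<&PaymentProof>, price_per_byte: u64) -> Result<(), String> {
    let required = chunk.len() as u64 * price_per_byte;
    match proof {
        Some(p) if p.paid_amount >= required => {
            // Persist the chunk (storage itself omitted in this sketch).
            Ok(())
        }
        Some(_) => Err("payment too small for this chunk".into()),
        None => Err("no payment attached; rejecting upload".into()),
    }
}

fn main() {
    let proof = PaymentProof { paid_amount: 10 };
    assert!(store_chunk(b"hello", Some(&proof), 1).is_ok());         // 5 * 1 <= 10
    assert!(store_chunk(b"hello world!", Some(&proof), 1).is_err()); // 12 * 1 > 10
    assert!(store_chunk(b"hello", None, 1).is_err());                // no payment at all
}
```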
Out Of Memory.
I just tried to check some logs, but my whole safenode folder, which should be located in tmp, is gone. Is this normal? The last thing I did with the node was this, yesterday afternoon:
Some distros wipe unused tmp data frequently. That may have happened here.
Doesn't anyone think that a reliable network must work correctly even when nodes are full?
Intentionally designing such a scenario as undefined behaviour looks like a dangerous thing.
I would imagine we all do. This was a network where nodes stored data in memory and used very small nodes to do so. When memory filled, they collapsed.
Storing on disk is a different beast though.
In terms of the real network getting full, that would mean the payment/incentive system must have failed, not that there is some magic number somewhere that says "hey, stop uploading, we are full".
These nuances are subtle but very important. We cannot code our way past the issue that more data requires more space. That is worth keeping at the forefront of your mind here.
It is important to define whether such a scenario is a bug or not.
If it is a bug, then it needs fixing: by reproducing it more and more, instead of avoiding it with larger storage.
Of course.
At the same time, having a read-only network is better than having a crashed network.
Theoretically, there can be a chance of recovering from read-only mode.
Or at least it would be good to allow people to move their data to another network (or another version of this network) in case of failure.
This was a test. I think there is confusion here, somehow.
This test used small nodes with 2 GB RAM, that's all; no disk use for data. It was a small number of nodes and it had to cope with a lot of folk trying to connect from behind NAT (that part is a bug), which caused a lot of messages/churn.
This was not a situation where disks can fill and the node keeps going. In this situation, when the virtual disk was full, there was no room left for processing data. So this was akin to a tiny network holding some data and getting pummelled by messages.
It did great, but it was expected to fill up memory; that happened (after a long time), and that was a good test.
So we used RAM, we filled RAM, and then we killed the machines. That was correct and right to do; it was the test.
Makes sense!
Writing to disk: will this be a brand new piece of work, or has it been figured out before and now just needs integration?
Thanks
In relation to NAT traversal, are we still in the situation where we require TCP for it to work? Or has libp2p made progress on the QUIC Rust implementation?
That sounds to me like a perfect case for a Bamboo Garden Fund application, btw… someone that can help libp2p implement QUIC NAT traversal in the Rust lib, especially considering that it's already implemented in Go (right?).
What I can't understand is whether such a scenario is planned for at all or not.
For me, RAM and disk storage are not that different.
Operating systems can make disks look like RAM, for example (swapping/paging).
So if a problem can happen with one type of storage, it will happen with the other type too.
However, it may be that I'm missing something, so it may be better to discuss it later, after more obvious problems like NAT are gone.
We have that code already but did not use it. There is a small piece of work integrating with how libp2p would like it, but not complex at all.
So far it would be TCP (unfortunately), but we will jump to QUIC as soon as libp2p allows, or we can help there with engineering resources.
Yes AFAIK that is the case.
For a computer program they are, though.
Yes, that is true, and you can fill that up much faster than disk too. The point is we used RAM only, so we created a situation where all data had to fit in RAM. There is much more disk available than RAM, and when using RAM for storage you eat into processing capability. Etc. etc. etc.
NAT is not a problem for us and not on the radar as such. We are focussed on the Safe Network parts now. So:
- DBC proper
- Pay for data
- Upgrades
NAT will happen; it's a solved problem and needs some Rust catch-up work in rust-libp2p.
I think that is a very good application of the BGF. Nice thinking there, and it makes perfect sense.
In case it helps:
Fedora with VPN, i9 with 32 GB RAM.
When starting, no problem with uploading or downloading files of up to 1 MB.
After several hours I was unable to upload or download; the error message seemed to come from a loss of my section.
As for the node, no problems connecting and working.
RAM consumption ~9 MB; a lot of traffic, with peaks of 7 and 8 GB both download and upload; CPU without problems.
Router wired connection, connecting directly to the VPN on the same router.
After an hour I began to experience delays in the two browsers, ending up unable to find the pages.
After a few minutes they reconnected without problems.
The problem was intermittent and seemingly random.
Few errors in the logs.
Too bad I didn't take screenshots of it all.
I hope my experience is of some help.
Cheer up team, everything will end up working FINE.
When I was running three nodes from a laptop my family complained the internet was slow. Sure it was just a coincidence. Anyway I blamed BT.
If that is something that takes more than a couple of days to sort out, may I suggest another testnet this week, with this issue fixed:
If I understand it right, this alone would make uploading and downloading a much smoother experience. The stability of Joshnet was kind of incredible to witness as it was, but it was a bit abstract and not so much fun when we couldn't really throw any usual files at each other. I wouldn't mind if the testnet were a bit short-lived, now that the reason for that seems more or less known and solvable.
I think that at this point the PR value of somewhat usable, if short-lived, testnets is considerable, and it would do a ton of good for the "marketing department".
When do you expect we will see a test with these numbers?
Not asking for a date, but what needs to be accomplished before we see a 2000-node test?