Hi everyone. Thank you for your participation in the last testnet. As mentioned earlier today, the Vaults in the testnet were killed by the operating system because of high memory usage. We have been studying the logs, and there is no sign that a sudden spike in memory usage led to the out-of-memory (OOM) error. We suspect that certain components are using more memory than expected.
To help us study the memory usage of the Vault, we will be doing memory profiling on the vault processes. We would like all of you to join the network once again. Both clients and vaults are welcome. The goal is to replicate the OOM error so we can study the results of the memory profiling. In other words, help us take down the network again.
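For anyone who wants to watch a vault's memory on their own machine while the test runs, here is a minimal sketch in Python. It is not the profiling setup the team is using. It assumes Linux, where `/proc/<pid>/status` contains a `VmRSS:` line in kB. The function names and the `spike_ratio` threshold are hypothetical, chosen only to illustrate the difference between a sudden spike and gradual growth.

```python
import re

def parse_vm_rss_kb(status_text):
    """Extract resident set size in kB from the text of /proc/<pid>/status (Linux)."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None

def classify_growth(samples_kb, spike_ratio=2.0):
    """Label a series of RSS samples as 'spike', 'gradual', or 'flat'.

    spike_ratio is an illustrative threshold, not a value from the testnet:
    a sample at least spike_ratio times its predecessor counts as a spike.
    """
    for prev, cur in zip(samples_kb, samples_kb[1:]):
        if prev > 0 and cur / prev >= spike_ratio:
            return "spike"
    if len(samples_kb) >= 2 and samples_kb[-1] > samples_kb[0]:
        return "gradual"
    return "flat"
```

Reading `/proc/<pid>/status` for the vault's process ID every few seconds and passing the collected values to `classify_growth` gives a rough idea of which pattern the process follows.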
Yesterday I could connect; today I can’t. (The router configuration is unchanged.)
INFO 2020-07-17T15:21:54.185662158+02:00 [/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/quic-p2p-0.7.0/src/lib.rs:335] IGD request failed: Could not find the gateway device for IGD - IgdSearch(IoError(Custom { kind: TimedOut, error: "search timed out" }))
ERROR 2020-07-17T15:21:54.185787171+02:00 [src/bin/safe_vault.rs:169] Cannot start vault due to error: Routing(Network(IgdNotSupported))
ERROR 2020-07-17T15:21:54.185811574+02:00 [src/bin/safe_vault.rs:170] Automatic Port forwarding Failed. Check if UPnP is enabled in your router’s settings and try again. Note that not all routers are supported in this testnet. Visit https://forum.autonomi.community for more information.
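When comparing logs like these across machines, it can help to split each line into its parts. The following is a minimal sketch in Python; the pattern is inferred from the three lines above (level, timestamp, `[source:line]`, message) and is an assumption, not a documented log format. `parse_log_line` is a hypothetical helper name.

```python
import re

# Pattern inferred from the pasted log lines; an assumption, not a documented format.
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z]+) (?P<timestamp>\S+) "
    r"\[(?P<source>[^\]]+):(?P<line>\d+)\] (?P<message>.*)$"
)

def parse_log_line(text):
    """Split one log line into level, timestamp, source file, line number, and message."""
    match = LOG_LINE.match(text)
    if match is None:
        return None  # the line does not follow the assumed layout
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    return fields

entry = parse_log_line(
    "ERROR 2020-07-17T15:21:54.185787171+02:00 [src/bin/safe_vault.rs:169] "
    "Cannot start vault due to error: Routing(Network(IgdNotSupported))"
)
print(entry["level"], entry["source"], entry["line"])
```

Filtering on `level == "ERROR"` then makes it quick to see that the vault stopped on `IgdNotSupported` rather than on something later in the startup.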
I am successfully connected.
I uploaded the genesis.json of the Cosmos network, just as a test.
Its size is about 60 MB.
If you want to test download (get) speed, please use it. It took about 20 seconds in my case.
safe files get safe://hnyynyxjgtidjwufr5hiodnkrn33yrq9pco6xoczknq67n3dw55kk4y6wgbnc
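To compare results, the reported numbers can be turned into an average rate. Below is a minimal sketch in Python; both helper names are hypothetical, and the size is taken as roughly 60 MB as stated in the post.

```python
import subprocess
import time

def throughput_mb_per_s(size_mb, seconds):
    """Average transfer rate in MB/s for a download of size_mb that took `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return size_mb / seconds

def time_command(argv):
    """Run a command (given as an argument list) and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start

# Figures reported in the post: about 60 MB in about 20 seconds.
print(throughput_mb_per_s(60, 20))  # 3.0
```

To measure your own fetch, pass the `safe files get` command and the URL above to `time_command` as an argument list, then feed the elapsed time into `throughput_mb_per_s`.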
In a small network it’s likely they will, though whether we notice it is not clear. In a larger network, not so much. A lot depends on the number of sections and concurrent accesses.
I expect we know when the last network booted and when it fell over.
Do we have a countdown for the same on this occasion? I wonder whether some issues might just be time-limited.
I’ll bet all the testsafecoin I have on 3:24 AM BST!
I got this message from the network:
[2020-07-17T15:14:47Z TRACE safe_core::connection_manager::connection_group] 0: Recvd connection failure for 165.22.115.111:12000, Connection cancelled
$ safe cat safe://V
Files of FilesContainer (version 0) at "safe://V":
+---------------+-----------------+------+----------------------+----------------------+-------------------------------------------------------------------+
| Name | Type | Size | Created | Modified | Link |
+---------------+-----------------+------+----------------------+----------------------+-------------------------------------------------------------------+
| /V | inode/directory | 0 | 2020-07-17T14:22:01Z | 2020-07-17T14:22:01Z | |
+---------------+-----------------+------+----------------------+----------------------+-------------------------------------------------------------------+
| /V/index.html | text/html | 1375 | 2020-07-17T14:22:01Z | 2020-07-17T14:22:01Z | safe://hbhybydr9toxojot8i56j183bghdynb59p5i6gwpctrddg1pn69h5jm74m |
+---------------+-----------------+------+----------------------+----------------------+-------------------------------------------------------------------+
| /V/style.css | text/css | 387 | 2020-07-17T14:22:01Z | 2020-07-17T14:22:01Z | safe://hbhyyydcftn8474rp9fdobjr98mdegqh3gba1hurez5ohdwauoeiusiww7 |
+---------------+-----------------+------+----------------------+----------------------+-------------------------------------------------------------------+
| /V/v.gif | image/gif | 3583 | 2020-07-17T14:22:01Z | 2020-07-17T14:22:01Z | safe://hbwyyodwgh3qzadykmety6cjtupc4djfok3kwh9e7eidgu5sdf7oanezk1 |
+---------------+-----------------+------+----------------------+----------------------+-------------------------------------------------------------------+
$ safe files get safe://V
File [1 of 4]: V
File [1 of 4]: V
File [1 of 4]: V
File [1 of 4]: V
File [1 of 4]: V
File [1 of 4]: V
File [1 of 4]: V
File [1 of 4]: V
File [1 of 4]: V
File [1 of 4]: V
File [1 of 4]: V
File [1 of 4]: V
[00:00:00] [----------------------------------------] 0B/1.34KB (0B/s, 0s) FiFile [1 of 4]: V
[00:00:00] [----------------------------------------] 0B/1.34KB (0B/s, 0s) FiFile [1 of 4]: V
[00:00:00] [----------------------------------------] 0B/1.34KB (0B/s, 0s) FiFile [1 of 4]: V
⠁ [00:00:00] [----------------------------------------] 0B/1.34KB (0B/s, 0s) FiFile [2 of 4]: V/index.html
⠁ [00:00:00] [----------------------------------------] 0B/1.34KB (0B/s, 0s) FiFile [2 of 4]: V/index.html
⠁ [00:00:00] [----------------------------------------] 0B/1.34KB (0B/s, 0s) FiFile [2 of 4]: V/index.html
⠉ [00:00:00] [----------------------------------------] 0B/1.34KB (0B/s, 0s) FiFile [2 of 4]: V/index.html
⠉ [00:00:00] [----------------------------------------] 0B/1.34KB (0B/s, 0s) File
⠉ [00:00:01] [----------------------------------------] 0B/5.22KB (0B/s, 0s) Transfer
Still waiting for index.html, but
$ safe cat safe://V/index.html
[2020-07-17T15:33:05Z ERROR safe] safe-cli error: [Error] ContentError - No data found for path "/index.html/" on the FilesContainer at "safe://hnyynyqkjnqqigng479bb9uz6e8751yn5e5hcsk7mcugxg3nnriygt5h8cbnc/index.html?v=0"
suggests a problem.
So shouldn’t safe files get have picked up on that?
Also, the safe files get progress output is not well presented; it spawns many lines.
Thanks for all the help, guys! And great job, I must say. We have been doing quite the bashing and reproduced the OOM issue on some of the nodes. We should have what we need to continue investigating the issue. I’ll leave the remaining nodes running; however, with a reduced number of elders there is no certainty that all requests will go through. The logs from after the nodes fail will help us plan the next steps.
Thanks once again for all your help. Have a good weekend.
Is IP 157.245.43.31:12000 correct?
Can’t connect to it: INFO 2020-07-18T14:57:25.405639900+03:00 [C:\Users\runneradmin\.cargo\registry\src\github.com-1ecc6299db9ec823\quic-p2p-0.7.0\src\lib.rs:543] Node 157.245.43.31:12000 is unresponsive, removing it from bootstrap contacts; 0 contacts left
Has this iteration also crashed, like the previous two?