Another point here, and this one is newly discovered.
We actually had client FIND_CLOSEST calls that timed out and returned nodes THAT WERE NOT CLOSEST, and that is a definite issue. It’s horrible. There are reasons, excuses and complex use-case defences for it, but it’s just wrong. That one will be fixed and will hopefully improve things. @Anselme’s analyser app will also help here: with random sampling now in our grasp, we can check that close nodes really are close nodes, and so on.
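To make the "are close nodes really close nodes" check concrete, here is a rough sketch of the idea, not how the analyser app actually does it. It assumes Kademlia-style XOR distance over integer IDs and assumes the checker has a wider view of the network (`known_nodes`) than the client under test. Every name in it (`xor_distance`, `check_find_closest`, `sample_check`, `ID_BITS`) is made up for illustration.

```python
import random

ID_BITS = 256  # assumed ID width, purely illustrative


def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance between two integer IDs."""
    return a ^ b


def true_closest(target: int, known_nodes: list[int], k: int) -> list[int]:
    """The k nodes from the checker's wider view that are nearest to target."""
    return sorted(known_nodes, key=lambda n: xor_distance(n, target))[:k]


def check_find_closest(target: int, returned: list[int], known_nodes: list[int]) -> list[int]:
    """Return the nodes that should have been in the result but were not."""
    expected = true_closest(target, known_nodes, len(returned))
    got = set(returned)
    return [n for n in expected if n not in got]


def sample_check(known_nodes, find_closest, k, samples, rng, id_bits=ID_BITS):
    """Query random targets and collect (target, missing_nodes) for every bad answer.

    find_closest is the call under test: (target, k) -> list of node IDs.
    """
    failures = []
    for _ in range(samples):
        target = rng.getrandbits(id_bits)
        missing = check_find_closest(target, find_closest(target, k), known_nodes)
        if missing:
            failures.append((target, missing))
    return failures
```

A call that times out and hands back whatever it has so far would show up as a non-empty `missing` list for that target, which is exactly the failure described above.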
As above, there are a couple of issues that are critical, but as I said internally, the amazing thing is that the network holds at all right now, given these issues. So yes, I think the retry logic is great @happybeing, but we are leaning on app devs to go way beyond where they should have to, and we will address it.
There’s a significant effort under way to refactor and simplify ant-networking (check out those mega functions in there, it’s horrible). A couple of the guys, who shall remain nameless (it was @Anselme and Victor), did a wee test of a new client in 100 lines of code instead of the 12000 lines (I did not typo that). So there’s lots to play with here and we will get it done.