Distributed SAFE OS

In the distributed OS, much of the processing can still be done by the clients, since they have direct access to the SAFE network. Processing that needs more security can be moved to backend apps, which offer higher security than client apps. One drawback of backend apps is that each task runs on two separate random nodes. This duplication is for security reasons. It may seem like a waste of computing resources, but it’s similar to how stored data is duplicated (at least 4 copies) for safety reasons.

To break the backend security, both farmers need to be hacked in the same way. One farmer may be in Asia and the other in Europe because of the pseudorandom XOR addressing. When the code runs in a client, only that client’s software needs to be hacked. In this way the backend apps are more secure than the client apps.

Another security advantage of backend apps is that the farmers only run snippets of code (tasks), whereas in frontend apps the client has access to the entire app code. So even if both random farmers happened to be hacked in the same way, they wouldn’t know what changes to make, since the next task would run on two other randomly selected farmers.

@anders your stuff is wonderful. Don’t stop, keep it up.


FYI: I think you will find that a minimum of 3 is needed. If the two disagree, then who is right? Three allows you to have good confidence if at least 2 of the 3 agree.
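The 2-of-3 rule suggested here can be sketched as a simple majority check. This is only an illustration (the function name and the assumption that results are directly comparable values are mine, not from the design):

```javascript
// Return the result reported by at least 2 of the 3 nodes, or null if no quorum.
// Illustrative sketch only; names and types are assumptions.
function majorityResult(results) {
  const counts = new Map();
  for (const r of results) {
    counts.set(r, (counts.get(r) || 0) + 1);
  }
  for (const [result, count] of counts) {
    if (count >= 2) return result; // at least 2 of the 3 agree
  }
  return null; // all three disagree: no confident answer
}
```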

It may seem that if the hash for the task code is used for determining which random nodes should run the backend code, then the same nodes will always run the same code since the hash remains the same. What makes a difference is that the hash is a result of both the backend task code and the JSON object provided as input. The input will always contain different information such as unique client id, different input data etc.

And it’s important for the addressing to be pseudorandom rather than truly random; otherwise nodes could forge their addresses. Because the addressing is deterministic, derived from the hash, it’s possible to verify that the correct nodes have processed the task at each step.

My initial idea was that if the two nodes generate different results, the TaskManager tries with two other random nodes. And if that also fails then an error code is returned.

I’m not sure three nodes will increase the security, and it also means more computation resources are needed (on average). Even with three nodes there is a problem: if (EDIT: at least) two of the nodes are hacked and produce the same result, then that fraudulent result will win (majority) over the correct result (minority) from the third node.
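The retry idea from the previous comment can be sketched like this (runOnNode is a hypothetical function supplied by the caller that dispatches the task to a fresh random node; names are made up):

```javascript
// Sketch of the retry idea: run the task on two random nodes, and if their
// results differ, try again on two other nodes before giving up.
function runReplicated(runOnNode, task, maxAttempts) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const a = runOnNode(task); // first random node
    const b = runOnNode(task); // second random node
    if (a === b) return a;     // results agree: accept
  }
  throw new Error('nodes disagreed on every attempt'); // error code to client
}
```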

The timeout limit of 1 minute can be enough since longer jobs can be distributed into several tasks. And with a fixed reward for each task regardless of execution time, the incentive for farmers is to execute the tasks as fast as possible in order to maximize the aggregated farming reward. And the incentive for developers is to make the tasks run as fast as possible to optimize the performance of the application for the users.

There probably has to be a cost of running a task, because otherwise it would be very easy to consume a lot of computing power on the distributed OS. Tasks can call other tasks. There has to be a check that ensures that the tasks form a tree structure and not arbitrary graphs. Still, even with a tree model, one task can call hundreds of other tasks, which in turn call hundreds of new tasks each. In this way computing resources can quickly be used up by a single user.

Preventing loops (tasks being called recursively) and putting a cost on running tasks provides security against abuse and mistakes. The cost of running a task can be adjusted the same way the cost of storing data on the SAFE network is adjusted, to balance supply and demand of resources.

@Anders if you haven’t already, you should look at how Ethereum handles these issues - charging, loops etc. It’s quite easy to understand how they do this, but I don’t have a link to hand unfortunately.

Yes, it could be worth examining. I don’t know whether contracts in Ethereum can call other contracts, or how they prevent recursive loops. I will do some searches about it.

When it comes to code execution in Ethereum they prevent infinite loops in the contracts by having “gas” that limits the execution of the code:

“In order to prevent accidental or hostile infinite loops or other computational wastage in code, each transaction is required to set a limit to how many computational steps of code execution it can use.” – White Paper · ethereum/wiki Wiki · GitHub
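As an illustration of the gas idea (a sketch of the concept only, not Ethereum’s actual VM; both function names are made up): charge a fixed cost per computational step and abort when the budget is exhausted.

```javascript
// Illustrative gas metering: the meter charges per step and throws when
// the gas limit is exceeded, stopping runaway loops.
function makeGasMeter(gasLimit) {
  let gas = gasLimit;
  return function charge(cost) {
    gas -= cost;
    if (gas < 0) throw new Error('out of gas');
  };
}

// A task instrumented to charge 1 gas per loop iteration.
function meteredSum(maxLoop, charge) {
  let x = 0;
  for (let i = 0; i < maxLoop; i++) {
    charge(1);
    x += 1 / (i + 1);
  }
  return x;
}
```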

In JavaScript, web workers can be terminated (I think with worker.terminate()), which makes it possible to enforce a deadline even when the code contains an infinite loop. I haven’t tested yet whether this is a hard enough termination of the thread.

EDIT: Yes, the terminate() method seems to totally kill the thread: Distributed OS Simulation - JSFiddle - Code Playground And that could be important, since it’s tricky to start to modify a JavaScript engine like V8, and from a security perspective that’s really, really difficult since V8 has been heavily battle tested for years.

I only found that in Ethereum a contract can call another contract if the code is compiled together with the other contracts. If that’s the only way, it seems limited and static.

In the distributed OS tasks can call other tasks dynamically. And infinite loops are prevented by only allowing tasks being called in a tree structure. In practice this can be done by taking the hashes of the tasks and preventing the task call sequence (for each branch in the tree) from containing duplicate hashes. The tasks are stored on the SAFE network as the encrypted JavaScript code for the task plus meta information such as the hash of the task code, owner information etc.
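That duplicate-hash check could look something like this (a minimal sketch; the function names and the representation of the call chain as an array of hashes are my assumptions):

```javascript
// Sketch of loop prevention: each task call carries the chain of task-code
// hashes that led to it; a call is rejected if its own hash already appears
// in the chain, which forces each branch of the call graph to be a tree.
function canCallTask(callChain, taskHash) {
  return !callChain.includes(taskHash);
}

function callTask(callChain, taskHash) {
  if (!canCallTask(callChain, taskHash)) {
    throw new Error('recursive task call rejected');
  }
  return callChain.concat(taskHash); // extended chain passed on to subtasks
}
```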

And the 1 minute deadline prevents tasks from running infinite loops. Note that the task code can still contain infinite loops; it’s just that after 1 minute the task is force-killed by the TaskEngine, and the client will still have to pay for running the task.

Is it unfair that a task taking 1 millisecond to run costs the same as running a task for 1 minute? Not necessarily. It’s similar to how (I assume) it costs the same to store 5 bytes of information as storing 1 megabyte of information on the SAFE network. Just as the storage has a 1 MB granularity, the computing has 1 minute granularity (or 30 seconds or whatever deadline is chosen).

The running times for tasks will probably form a bell curve overall, and the tasks are (pseudo)randomly distributed among the farmers. The farmers have to run all the tasks given to them regardless of running time, or else their ranking will drop. The developers need to make sure that their task code runs within the deadline limit. Farmers run on different equipment with different computation speeds, so developers need to take that into account when estimating the running time for their tasks: they simply need enough margin that most of the slower farmers can run the task within the deadline. (The deadline can remain fixed; the only effect over time is that technological progress will make it possible to squeeze in more and more computation over the years.)


Here is a simple simulation of how tasks can be implemented in JavaScript:

// Task code as a text file from the SAFE network.
var taskCode = "function(event) { \
    var i, x = 0, maxLoop = event.data; \
    for (i = 0; i < maxLoop; i++) { \
        x += 1 / (i + 1); \
    } \
    postMessage(x); \
}";

// JSON inputs from clients
var input1 = '100';
var input2 = '10000000000';

Live demo: Distributed OS Simulation - JSFiddle - Code Playground

The demo dynamically starts web workers that make the task code run in separate threads.


Here is a version where the task code is free from worker specific details. The task code is now simply an ordinary JavaScript function that returns the result:

function(maxLoop) {
    var i, x = 0;
    for (i = 0; i < maxLoop; i++) {
        x += 1 / (i + 1);
    } 
    return x; 
}

Source: Distributed OS Simulation - JSFiddle - Code Playground

Note that the parameter maxLoop can be named anything and the value can be anything from a simple numerical value as in this simple example to a large complex data structure (sent by the client as a JSON string).
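For illustration, here is one way the TaskEngine might turn the stored task text into a callable function and feed it the client’s JSON input (a sketch under my own assumptions; in practice the eval would happen inside a sandboxed worker, never on untrusted code in the main thread):

```javascript
// Task code as stored text: an ordinary function returning the result.
const taskCode = `function (maxLoop) {
  var i, x = 0;
  for (i = 0; i < maxLoop; i++) { x += 1 / (i + 1); }
  return x;
}`;

// Sketch: evaluate the task text and apply it to the parsed JSON input.
function runTask(taskCode, jsonInput) {
  const task = eval('(' + taskCode + ')'); // parentheses make the function parse as an expression
  return task(JSON.parse(jsonInput));      // input can be a number, object, array...
}
```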


Cron jobs in the distributed OS can be stored similarly to how safecoins are stored on the SAFE network. Users can create, start, stop, edit and delete cron jobs. A cron job is a scheduled task that is run periodically at fixed times, dates, or intervals.
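A stored cron job record might look roughly like this (all field and function names here are hypothetical, just to make the idea concrete):

```javascript
// Hypothetical shape of a stored cron job record, plus a helper that
// picks the jobs due to run at a given time.
function makeCronJob(taskHash, intervalMs, now) {
  return { taskHash, intervalMs, nextRun: now + intervalMs, enabled: true };
}

function dueJobs(jobs, now) {
  return jobs.filter((job) => job.enabled && job.nextRun <= now);
}
```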


If the amount of tasks a client can run is unlimited, then there has to be a certain cost in safecoins to run each task. Otherwise it’s way too easy to abuse computing resources.

If on the other hand running tasks is rate limited, then they could, perhaps, be made free! One such rate limit could be that each user account is only allowed to run up to a maximum number of tasks per hour. Then a botnet wouldn’t be of much use, since it’s more beneficial for attackers and abusers to use the botnet for spam and other such things on the Web. Remember, the tasks on the distributed OS can only run on the SAFE network; no external communication with web servers, email servers or the like is possible.
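Such a per-account hourly limit could be sketched like this (the limit value, the fixed one-hour window, and all names are assumptions for illustration):

```javascript
// Sketch: allow a task only if the account has quota left in the
// current one-hour window; each account gets its own counter.
function makeRateLimiter(maxTasksPerHour) {
  const windows = new Map(); // accountId -> { windowStart, count }
  const hour = 60 * 60 * 1000;
  return function allowTask(accountId, now) {
    let w = windows.get(accountId);
    if (!w || now - w.windowStart >= hour) {
      w = { windowStart: now, count: 0 }; // start a fresh window
      windows.set(accountId, w);
    }
    if (w.count >= maxTasksPerHour) return false; // over quota: reject
    w.count += 1;
    return true;
  };
}
```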


There is an automatic balance for both farmers and developers when it comes to the running time for a task. Farmers can run many simultaneous tasks on a single CPU, and the more tasks they run in parallel, the slower each task will run. If a farmer tries to run too many tasks simultaneously on a single CPU, the deadline will start to be missed for many tasks and the farming ranking will drop. So farmers have an incentive not to spawn too many threads, to keep their farming rank from dropping.

Also, the faster farmers can run the tasks, the more farming reward they get, since each task has a fixed farming reward. And spawning a lot of threads to run many simultaneous tasks on a single CPU will not necessarily be faster than running the tasks one by one in sequence, because of the time-sharing of the single CPU. There is probably a performance optimum for the number of tasks a farmer can run on a single CPU, depending on memory, number of cores, bandwidth etc.

And as I mentioned in another comment, developers need to make sure their task code runs within the deadline limit on most farmers.