I think trying the worst-performing option on the test nets will be a great way to quantify this. Part of the problem is that we just don’t know how well the network will cope with sending 1000s of tiny coins about the place.
Ultimately, the data being moved is tiny. We are literally only changing ownership of the coin. I know there is substantial overhead in doing the ‘simple’ thing, and that repeating it over and over isn’t ideal. However, if it is at all feasible, imo it should be strongly considered.
Simple designs are always easier to understand and maintain. That may prove to be more important in the long run.
Maybe this is actually pointing to a use case that could do with attention in general: changing ownership of 1000s of data elements simultaneously. Perhaps the network could group some of these transactions into bundles, allowing some of the work to be done once for the whole bundle? If the calculations can take advantage of parallelism, it could scale much better.
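To make the bundling idea a bit more concrete, here’s a very rough Rust sketch. The types (Coin, OwnershipChange, Bundle) are made up purely for illustration and aren’t the real network data types; the point is just that the bundle-level work is done once, while the cheap per-coin updates are spread across cores (rayon’s parallel iterators used here, but any parallelism would do):

```rust
// Rough sketch only, with made-up types. The idea: validate/authorise
// the bundle once, then apply the cheap per-coin ownership changes in
// parallel across the whole bundle.
use rayon::prelude::*;
use std::collections::HashMap;

#[derive(Clone)]
struct Coin {
    id: u64,
    owner: String,
}

struct OwnershipChange {
    coin_id: u64,
    new_owner: String,
}

struct Bundle {
    changes: Vec<OwnershipChange>,
}

impl Bundle {
    /// Apply every change in the bundle to the coin set.
    fn apply(&self, coins: &mut Vec<Coin>) {
        // Work done once per bundle: e.g. checking the bundle's
        // signature/authority, and indexing the changes by coin id.
        let index: HashMap<u64, &str> = self
            .changes
            .iter()
            .map(|c| (c.coin_id, c.new_owner.as_str()))
            .collect();

        // Work done per coin, but in parallel across the whole bundle.
        coins.par_iter_mut().for_each(|coin| {
            if let Some(new_owner) = index.get(&coin.id) {
                coin.owner = (*new_owner).to_string();
            }
        });
    }
}

fn main() {
    // 1000 tiny coins, all changing hands in one bundle.
    let mut coins: Vec<Coin> = (0..1000)
        .map(|id| Coin { id, owner: "alice".to_string() })
        .collect();

    let bundle = Bundle {
        changes: (0..1000)
            .map(|coin_id| OwnershipChange { coin_id, new_owner: "bob".to_string() })
            .collect(),
    };

    bundle.apply(&mut coins);
    assert!(coins.iter().all(|c| c.owner == "bob"));
    println!("changed ownership of {} coins in one bundle", coins.len());
}
```

Obviously the real cost is in consensus/signing rather than a local loop, but the shape is the same: pay the fixed cost once per bundle, not once per coin.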
In fact, maybe such transaction bundling would help in other areas, where many files/chunks need to be changed frequently. If they could be buffered into bundles too, then the same benefits may apply (along with SIMD features in CPUs/GPUs, etc).
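On the buffering side, the client end could be as simple as something like this (again, made-up types just to show the shape: individual updates queue up locally and only full bundles get sent, so the fixed per-request overhead is paid once per bundle rather than once per chunk):

```rust
// Sketch only: a small client-side buffer that collects chunk updates
// and releases them as bundles, rather than firing one request each.
struct ChunkUpdate {
    chunk_id: u64,
    new_owner: String,
}

struct UpdateBuffer {
    pending: Vec<ChunkUpdate>,
    bundle_size: usize,
}

impl UpdateBuffer {
    fn new(bundle_size: usize) -> Self {
        Self { pending: Vec::new(), bundle_size }
    }

    /// Queue one update; flush automatically once a full bundle is ready.
    fn push(&mut self, update: ChunkUpdate) {
        self.pending.push(update);
        if self.pending.len() >= self.bundle_size {
            self.flush();
        }
    }

    /// Send everything queued so far as a single bundle.
    fn flush(&mut self) {
        if self.pending.is_empty() {
            return;
        }
        let bundle: Vec<ChunkUpdate> = self.pending.drain(..).collect();
        // In a real client this would be one network request carrying
        // the whole bundle; here we just report the batch.
        println!("sending bundle of {} updates", bundle.len());
    }
}

fn main() {
    let mut buffer = UpdateBuffer::new(256);
    for chunk_id in 0..1000 {
        buffer.push(ChunkUpdate { chunk_id, new_owner: "bob".to_string() });
    }
    buffer.flush(); // send the final partial bundle
}
```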
Just thinking aloud without understanding what is technically possible, but maybe worth considering.