@davidpbrown I’ve taken some liberties with your script…
#!/bin/bash
#Simple script to upload random data of a fixed size many times and log result
# thanks to @davidpbrown on https://safenetworkforum.org for the vast bulk of this work
## Setup
#Expects safe baby-fleming to be set up and running
TEST_SIZE=0
RUNS=0
clear
echo "----------------------------------------------------------------------"
echo ""
echo " -- Test baby-fleming network and provide reports --"
echo ""
echo " @davidpbrown and @southside of the SAFE community March 2020"
echo " https://safenetwork.org"
echo ""
echo " Is your baby-fleming network running?"
echo ""
echo " If not press Ctrl-C, start your network and run this script again."
echo ""
echo "----------------------------------------------------------------------"
echo ""
echo ""
echo ""
read -p 'How much random data do you want to put to the network (kb) ? : ' TEST_SIZE
echo ""
read -p 'How many test runs do you want? : ' RUNS
echo ""
echo ""
echo "PUTing " $TEST_SIZE "kb of random data to the network " $RUNS "times"
echo "--------------------------------------------------------------------------"
#
# set up logging location
mkdir -p ./zzz_log
mkdir -p ./to-upload
## Base state
#log base state
echo "### START" > ./zzz_log/report
date >> ./zzz_log/report
lscpu | grep -P 'Model name|^CPU\(s\)' >> ./zzz_log/report
vmstat -s | grep -P 'total memory|total swap' >> ./zzz_log/report
echo "# initial vault size" >> ./zzz_log/report
du -sh ~/.safe/vault/baby-fleming-vaults/* | sed 's#^\([^\t]*\).*/\([^/]*\)#\2\t\1#' | sed 's/genesis/1/' | sort >> ./zzz_log/report
## Start
COUNTER=0
while [ $COUNTER -lt $RUNS ]; do
let COUNTER=COUNTER+1
dd if=/dev/urandom of=./to-upload/file.dat bs=1k count=$TEST_SIZE 2>/dev/null
echo "file: "$COUNTER
echo "############" >> ./zzz_log/report
echo "file: "$COUNTER >> ./zzz_log/report
echo "size: "$(ls -hs ./to-upload/file.dat | sed 's/^\([^ ]*\).*/\1/') >> ./zzz_log/report
echo "# upload" >> ./zzz_log/report
(time safe files put ./to-upload/file.dat ) &>> ./zzz_log/report
echo >> ./zzz_log/report
echo "# vault size" >> ./zzz_log/report
du -sh ~/.safe/vault/baby-fleming-vaults/* | sed 's#^\([^\t]*\).*/\([^/]*\)#\2\t\1#' | sed 's/genesis/1/' | sort >> ./zzz_log/report
echo "upload: "$COUNTER" complete"
done
date >> ./zzz_log/report
echo "### END" >> ./zzz_log/report
## Summary pivot
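# flatten each run's block of the report onto one tab-separated row (blocks are split on the ############ marker) for pasting into a spreadsheet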
echo -ne "\tfile:\t0\tsize: 0\t#\t\t\t\treal\t0\tuser\t0\tsys\t0\t\t" > ./zzz_log/summary_table_report
tail -n +7 ./zzz_log/report | tr '\n' '@' | sed 's/############/\n/g' | sed 's/@/\t/g' | sed 's/file: /file:\t/' >> ./zzz_log/summary_table_report
echo ""
echo "----------------------------------------------------------------"
echo ""
echo " The logs for your test run are located in ./zzz_log/"
echo ""
echo " Thank you for helping to test the SAFE network."
echo ""
echo "----------------------------------------------------------------"
exit
Small files provide no surprises…
3hrs45mins for 12930 files of 4K
typical result:
real	0m0.773s
user	0m0.098s
sys	0m0.004s
though I’ve yet to understand the difference between real, user and sys… I guess sys perhaps is the time it was actually processing that task; user is the sum of all jobs for the user; and real is real-world time?
### START
Sun 22 Mar 15:25:44 GMT 2020
CPU(s): 4
Model name: Intel(R) Core(TM) i3-7100U CPU @ 2.40GHz
16254420 K total memory
10239996 K total swap
### STOPPED
Sun 22 Mar 19:10:36 GMT 2020
/usr/bin/time -f "\t%E elapsed time" command
instead of the straight time command in your script.
As so often, reading the man pages is like trying to get a drink from a firehose…
the ‘/usr/bin/’ is important so you use the actual time command, not the bash shell builtin. Otherwise all that formatting stuff will just get ignored.
" Users of the bash shell need to use an explicit path in order to run the external time command and not the shell builtin variant. On
system where time is installed in /usr/bin, the first example would become
/usr/bin/time wc /etc/hosts"
E	Elapsed real (wall clock) time used by the process, in [hours:]minutes:seconds.
S	Total number of CPU-seconds used by the system on behalf of the process (in kernel mode), in seconds.
U	Total number of CPU-seconds that the process used directly (in user mode), in seconds.
which will be easier to condense down into something we can build a graph from.
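For reference, a minimal sketch (untested) of what the upload line in the loop might become, assuming GNU time lives at /usr/bin/time — it writes its report to stderr, so both streams need redirecting into the log:

# hedged sketch: GNU time with a format string in place of the bash builtin
/usr/bin/time -f "\t%E elapsed\t%U user\t%S sys" safe files put ./to-upload/file.dat >> ./zzz_log/report 2>&1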
I also want to look at R and get it to provide mean elapsed times and standard deviations for long test runs. I used to do this a looong time ago with Excel, and I’m sure it’s possible with Calc, but I might as well make the effort to work with the proper tool.
EDIT: Hopefully we should be able to get the mean elapsed time and SD for each run, and also report the number of failures, if any. I was getting quite a few earlier with upload sizes >100MB, but that was hardly rigorous testing; at that stage I was testing the script rather than looking for valid test results.
Obviously there’s the option to hack the script for the fields you do want, rather than work on the back of the output that is dumped atm. For example, adding lines for another file that is perhaps just the file number and %E would be a source for a mean calculation and graph.
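Something like this (untested) inside the existing loop would do that — ./zzz_log/times.csv is a made-up name, and the redirection assumes the safe CLI keeps its own chatter on stdout:

# hedged sketch: capture GNU time's stderr report, discard the put's stdout
ELAPSED=$(/usr/bin/time -f "%E" safe files put ./to-upload/file.dat 2>&1 >/dev/null)
echo "$COUNTER,$ELAPSED" >> ./zzz_log/times.csv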
Not sure if it helps, but VisLab is set up to create charts from CSV data. I can build it to do this from a specific local file or web URI. Just a thought. Example at http://vlab.happybeing.com/
### START
Mon 23 Mar 13:30:05 GMT 2020
CPU(s): 4
Model name: Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz
16378512 K total memory
14648316 K total swap
tail: ./zzz_log/report.csv: file truncated
TEST RUN, FILE SIZE, TIME ELAPSED
So far I can output the test run number and the file size reliably, but the actual elapsed time is causing me grief. I have an incredibly ugly sed hack that gives me what I want on the command line, but it annoyingly prints to the console, not the .csv, when I put it in the script.
Any help welcome…
the line that works on the CLI is
echo $COUNTER "," $(ls -sh ./to-upload/file.dat | sed 's/^\([^ ]*\).*/\1/'),$(/usr/bin/time -f "\t%E " safe files put ./to-upload/file.dat | sed -n '10000,$p')
Of course COUNTER has to be set and ./to-upload/file.dat needs to exist for this to work from the command line.
I can output only the elapsed time and skip the XOR-URL and safe file container info.
Is that a tail, perhaps?… but the preceding put surely won’t generate that much output for one upload.
time’s output is an odd one to capture, which is why the line in the script above is (time safe files put ./to-upload/file.dat) &>> ./zzz_log/report
If you’re having the same issue with it passing the detail through a pipe, then perhaps put it to a log file and work from that. If the output were large, putting it into a file would save having it held in memory until done.
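Something like this might do it (untested; the log file names are placeholders) — GNU time reports on stderr, which is why the pipe never sees it:

# hedged sketch: split the two streams into files, then pull the time back out
/usr/bin/time -f "%E" safe files put ./to-upload/file.dat > put-output.log 2> time-output.log
ELAPSED=$(tail -n 1 time-output.log)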
Surprised that what you suggested works, but contexts differ. Still, I wonder then if the fix for what you’ve suggested, as a script file, is a $ for the COUNTER variable and backslashes for the sed escapes.
If you did accumulate the output from many uploads with >> a-log-file, then tail -n 10000 a-log-file makes more sense for grabbing the end of that large file.
I persuaded sed to output only the last line from any length of input (fewer than 10000 lines), because what I wanted to capture was only the elapsed time.
Of course there are more elegant and correct ways to do it.
Just for clarity, I was looking for a one-liner that can then be wrapped in a loop.
I wanted the one-liner to output TEST_RUN_NO, DATA_FILESIZE, ELAPSEDTIME as comma-separated values for each run we do.
The motivation was to get some CSV data to throw at R, to get mean times and standard deviations over a moderately large number of runs, then to work up a script that others could use so we can look for patterns across a variety of hardware and OS versions.
The more I think about it now, though, all I really need is a CSV of the times for each test. The file size will be constant for each run, and I’m not certain now that there is any value in capturing the index of the test run. I just want the times for, say, 100 puts of n kb of random data. Perhaps @joshuef or the other devs can confirm that this would be useful, and if not, what would be most useful. Naturally each test run would have a header with hardware specs; probably we should capture the OS version and actual safe executable info as well.
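A sketch of that stripped-down run (untested, reusing TEST_SIZE and RUNS from the script above) — %e (lower case) gives plain seconds, which is easier to do arithmetic on than %E; GNU time’s -a -o appends the report straight to a file; and safe --version is a guess at the flag for the executable info:

# hedged sketch: header with machine/OS/safe details, then one elapsed-seconds value per line
uname -srm > ./zzz_log/times.log
safe --version >> ./zzz_log/times.log 2>&1
COUNTER=0
while [ $COUNTER -lt $RUNS ]; do
let COUNTER=COUNTER+1
dd if=/dev/urandom of=./to-upload/file.dat bs=1k count=$TEST_SIZE 2>/dev/null
/usr/bin/time -f "%e" -a -o ./zzz_log/times.log safe files put ./to-upload/file.dat >/dev/null
done
# mean and standard deviation of the numeric lines, no R required
grep -E '^[0-9.]+$' ./zzz_log/times.log | awk '{s+=$1; ss+=$1*$1; n++} END {m=s/n; printf "mean %.3fs sd %.3fs n=%d\n", m, sqrt(ss/n - m*m), n}'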
TBH I would have done better to think this through a little harder before diving in. This was just a simple batch file that got out of hand. Nobody ever employed me as a test engineer…
Hard to say what’s most useful to be honest. Seeing the limitations is one very useful thing.
Right now, e.g., I was basing benchmarks off the tests you were working on. So uploading x kb y times. Having this data is good for reference when things are going well, and for debugging if we’re seeing something off. So collating that is good, I think.
The spread of data in vaults is an interesting one. At least, I’m not sure why that would be happening unevenly. So if you can get data on file size vs vault spread, that’d be good to chew on.
Otherwise, I’d say just keep pushing things. Keep what you can in case you come across something interesting. Otherwise we have the start of some benchmarking regression checks if nothing else! (Which is in itself quite valuable.)