|
Post by Pjot on Feb 27, 2023 20:03:52 GMT 1
All, In BaCon, incremental string assignments of the type "a$ = a$ + b$" can be executed in just a few milliseconds, no matter if the program repeated this a million times. But the performance of the type "a$ = b$ + a$" remained a bit slow. However, with the latest improvements in commit cc583a95b7 this has changed. a$ = "hello"
FOR x = 1 TO 100000 a$ = a$ & "benchmark" NEXT PRINT "Length: ", LEN(a$)
b$ = "hello"
FOR x = 1 TO 100000 b$ = "benchmark" & b$ NEXT PRINT "Length: ", LEN(b$)
This simple benchmark program now completes in only 0.58 seconds on my machine. For fun, I decided to take a look at other languages and their performance on these concatenations. So I made similar programs for the other languages and ran them all, to see how well BaCon did. Interestingly, almost all languages perform (reasonably) well in case of "a$ = a$ + b$" but show a dramatic slow performance in case of the "a$ = b$ + a$" type. Particularly GNU Awk and the FreePascal compiler were interesting to me, as these are known for their extremely good string performance. I had to skip BASH, because it never seemed to complete the test script. Below the results on my laptop, the number indicating the amount of seconds (lower = better): BaCon: 0.58 Perl: 0.60 FreePASCAL: 1.21 FreeBASIC: 6.79 Python: 7.35 AWK: 7.36 newLisp: 7.69 TCL: 10.70 Kornshell: 16.73 ZSH: 96.66
Amazingly, together with Perl, BaCon now ranks 1st place in the list The benchmark programs are attached to this posting, in case anybody wants to repeat this on his system. Feel free to share your results. Regards, Peter Attachments:benchmark.tar.gz (1.68 KB)
|
|
|
Post by bigbass on Feb 28, 2023 16:33:31 GMT 1
Hello Peter
any and all improvements for speed is always very welcomed and using your demo from fossil it reduced 2 minutes off the RPI time before code changes 2 minutes 30 seconds after your code changes 30 seconds!
please note that the rpi suffers with any benchmarks but a notable improvement with your changes
*and thanks for the benchmarks for other languages very useful for testing ideas
Joe
note that manjaro needs time installed pacman -S time
UPDATED with code output [joe@joe-pc benchmark]$ bacon bench3 Converting 'bench3.bac'... done, 23 lines were processed in 0.010 seconds. Creating lexical analyzer using flex... done. Compiling 'bench3.bac'... cc -c bench3.bac.c cc -o bench3 bench3.bac.o -lm Done, program 'bench3' ready. [joe@joe-pc benchmark]$ time ./bench3 Length: 900005 Length: 900005
real 0m36.784s user 0m36.357s sys 0m0.174s
[joe@joe-pc benchmark]$ bacon -v
BaCon version 4.6.1 on Linux aarch64 - (c) Peter van Eerten - MIT License.
|
|
|
Post by Pjot on Feb 28, 2023 20:49:56 GMT 1
Thanks Joe! I added a PHP program as well, and ran everything again (tar package ^^^ updated). Here my latest output. BaCon and Perl are very close, but the other languages are significantly slower. BaCon: 0.59 Perl: 0.60 FreePASCAL: 1.25 FreeBASIC: 7.09 AWK: 7.61 Python: 7.82 newLisp: 8.18 TCL: 11.03 PHP: 11.65 Kornshell: 18.12 ZSH: 99.57
Clearly, benchmarking is not an exact science of course. And this benchmark is not literally about "a$ = a$ + b$" vs "a$ = b$ + a$". It is about how various string arguments are placed in an expression, and the test just looks at the most basic expression types. It's just interesting to see the difference in performances. I find it very curious that almost all other languages have issues with the second type "a$ = b$ + a$"
Regards Peter
|
|
|
Post by bigbass on Mar 1, 2023 2:46:32 GMT 1
Hello Peter
thanks for the update!
I thought to add one more to your list had to rework the logic a little for the language hope I got the idea right that you were using for the other demos if not please feel free to modify it
bench3.js
//sudo pacman -S nodejs
// You can also use +=, where a += b is a shorthand for a = a + b. // added javascript to the benchmark testing by bigbass // javascript wont allow b = "benchmark" + b because b is defined immutable // so I worked around it with z=b which was the best I could do
let a = "Hello";
for (let index=0;index<100000;index++) { a += "benchmark"; }
let b = "Hello"; let z = b; for (let index=0;index<100000;index++) { b += "benchmark" + z; } console.log("length a",a.length); console.log("length b",b.length);
* install nodejs with your package manager
time node ./bench3.js
length a 900005 length b 1400005
real 0m1.587s user 0m1.170s sys 0m0.195s
not bad for the rpi3
Joe
|
|
|
Post by bigbass on Mar 1, 2023 18:05:07 GMT 1
Hello Peter
I added to your run_benchmark.bash with the new bench3.js listed above
echo "javascript..." /usr/bin/time --output ${LOGFILE} -a -f "javascript: %e" node ./bench3.js
RESULTS are in seconds using a raspberry pi3 with the latest bacon as of today lower numbers means less time to run the string concat test of 100,000 loops
========================= javascript: 3.41 Perl: 27.21 BaCon: 27.26 PHP: 69.55 Python: 124.72 AWK: 125.02 TCL: 178.93 Kornshell: 283.16 ZSH: 1604.99
|
|
|
Post by Pjot on Mar 1, 2023 18:17:52 GMT 1
Hi Joe, Your script did not output the correct result for the length, so I changed it a little bit: let a = "Hello"; for (let index = 0; index<10000; index++) { a = a + "benchmark"; } console.log("length a:", a.length);
let b = "Hello"; for (let index = 0; index<10000; index++) { b = "benchmark" + b; } console.log("length b:", b.length);
My results are: Javascript: 0.06 BaCon: 0.58 Perl: 0.62 FreePASCAL: 1.20 FreeBASIC: 7.01 AWK: 7.45 Python: 7.50 newLisp: 7.70 TCL: 10.83 PHP: 11.31 Kornshell: 17.84 ZSH: 95.84
That's impressive! Javascript is about 10x faster. Together with Perl we're pushed back to the 2nd place now... BR, Peter
|
|
|
Post by bigbass on Mar 1, 2023 18:43:43 GMT 1
Hello Peter
thanks for modifying the js code it follows your format better to make it easy to compare the syntax using a = a + "benchmark"; and b = "benchmark" + b; and faster! thanks!
special note close all running apps and browser before running the tests it is based on free memory
for a quick test with your updated js some people will wonder why would you ever use js outside of a browser well it was made for parsing strings
time node ./bench3.js length a: 90005 length b: 90005
real 0m0.949s user 0m0.826s sys 0m0.113s
|
|
|
Post by Pjot on Mar 1, 2023 19:20:12 GMT 1
Hi Joe, I did another test with 'time' to see the memory usage. Result for the "bench3" binary compiled by BaCon, skipping the uninteresting lines: $ /usr/bin/time --verbose ./bench3 Command being timed: "./bench3" User time (seconds): 0.58 Percent of CPU this job got: 99% Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.59 Maximum resident set size (kbytes): 3884 <------------------ Minor (reclaiming a frame) page faults: 553 .....
Now the same for NodeJS: $ /usr/bin/time --verbose node ./bench3.js Command being timed: "nodejs ./bench3.js" User time (seconds): 0.05 Percent of CPU this job got: 101% Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.06 Maximum resident set size (kbytes): 47004 <------------------- Minor (reclaiming a frame) page faults: 4675 .....
It seems that NodeJS allocates a lot more memory compared to BaCon. Also the output of Valgrind seems to confirm this: BaCon: total heap usage: 40 allocs, 35 frees, 14,156,321 bytes allocated NodeJS: total heap usage: 25,447 allocs, 25,420 frees, 39,699,106 bytes allocated
Also, starting Node in Valgrind takes almost 6 seconds, which is understandable when it needs to allocate so much memory before starting the work. On the other hand, starting the binary created by BaCon within Valgrind takes less than a second. Food for thought anyway Regards Peter
|
|
|
Post by bigbass on Mar 1, 2023 22:28:26 GMT 1
Hello Peter thanks for your detailed investigation the only way to move forward is to test ideas and never get comfortable things change and new things come out all the time I find it fun to see which tool wins when you want to build something its a lot easier to test first than write a whole lot of code and have to rewrite it again I say that because it happened to me more than once if you have the extra memory at hand javascript does well bacon does very well also it is a trade off size and speed I think that all improvements and tests are a positive step and by the way I just wrote only a few things in javascript but since you were testing many languages I added it to the mix I was wondering why your edit to the js made it so fast well after some very careful looking index<10000 needs to be index<10000 0ten thousand changed to one hundred thousand all the zeros can be blurry to read anyway (without using my glasses) thanks for the feedback and the new bacon release! Joe
|
|
|
Post by Pjot on Mar 2, 2023 7:11:20 GMT 1
I was wondering why your edit to the js made it so fast well after some very careful looking index<10000 needs to be index<10000 0ten thousand changed to one hundred thousand all the zeros can be blurry to read anyway (without using my glasses) Good catch! Apparently, my glasses were elsewhere too But after changing the loop to 100000, it hardly made any difference to the resulting performance... the people from NodeJS sure did a good job! Thanks, Peter
|
|
|
Post by bigbass on Mar 11, 2023 15:55:59 GMT 1
Hello Peter I wanted to see if we could convert the javascript benchmark code to pure c code and was looking for a simple way to do it to produce the fastest translation possible for pure c code only (more bacon friendly) which led me to github.com/andrei-markeev/ts2cwhy would I want to go that road? well to see if we could remove about 20 seconds on the RPI3! it was a gamble but it paid off a big jackpot ts2c bench3.js produced bench3.c the javascript code transpiled *I used geany to compile the c code time ./bench3 Killed real 0m6.790s user 0m1.169s sys 0m1.037s don't let more code make you think slower I am just looking for speed I had to make a minor modify for the index code for it to compile cleanly in c but that was very easy to do here is the transpiled and modified javascript code if you don't want to use npm bench3.c
#include <string.h> #include <stdlib.h> #include <assert.h> #include <stdio.h> #include <limits.h> typedef short int16_t; #define ARRAY(T) struct {\ int16_t size;\ int16_t capacity;\ T *data;\ } * #define ARRAY_CREATE(array, init_capacity, init_size) {\ array = malloc(sizeof(*array)); \ array->data = malloc((init_capacity) * sizeof(*array->data)); \ assert(array->data != NULL); \ array->capacity = init_capacity; \ array->size = init_size; \ } #define ARRAY_PUSH(array, item) {\ if (array->size == array->capacity) { \ array->capacity *= 2; \ array->data = realloc(array->data, array->capacity * sizeof(*array->data)); \ assert(array->data != NULL); \ } \ array->data[array->size++] = item; \ } #define STR_INT16_T_BUFLEN ((CHAR_BIT * sizeof(int16_t) - 1) / 3 + 2) int16_t str_len(const char * str) { int16_t len = 0; int16_t i = 0; while (*str) { i = 1; if ((*str & 0xE0) == 0xC0) i=2; else if ((*str & 0xF0) == 0xE0) i=3; else if ((*str & 0xF8) == 0xF0) i=4; str += i; len += i == 4 ? 2 : 1; } return len; } void str_int16_t_cat(char *str, int16_t num) { char numstr[STR_INT16_T_BUFLEN]; sprintf(numstr, "%d", num); strcat(str, numstr); } static ARRAY(void *) gc_main; int16_t gc_i;
static const char * a; char * tmp_result = NULL; static int16_t index1; static const char * b; char * tmp_result_2 = NULL; static int16_t index2; int main(void) { ARRAY_CREATE(gc_main, 2, 0);
a = "Hello"; index1 = 0; for (;index1 < 100000;index1++) { tmp_result = malloc(strlen(a) + strlen("benchmark") + 1); assert(tmp_result != NULL); tmp_result[0] = '\0'; strcat(tmp_result, a); strcat(tmp_result, "benchmark"); ARRAY_PUSH(gc_main, tmp_result); a = tmp_result; } printf("length a:"); printf(" %d\n", str_len(a)); b = "Hello"; index2 = 0; for (;index2 < 100000;index2++) { tmp_result_2 = malloc(strlen("benchmark") + strlen(b) + 1); assert(tmp_result_2 != NULL); tmp_result_2[0] = '\0'; strcat(tmp_result_2, "benchmark"); strcat(tmp_result_2, b); ARRAY_PUSH(gc_main, tmp_result_2); b = tmp_result_2; } printf("length b:"); printf(" %d\n", str_len(b)); for (gc_i = 0; gc_i < gc_main->size; gc_i++) free(gc_main->data[gc_i]); free(gc_main->data); free(gc_main);
return 0; }
so we can use javascript to produce c code and have it run faster than expected if you could find a way to port that back into the bacon conversion it would be blazing fast Joe oh to be 100% transparent here is the js code bench3.js (modified index code) to compile cleanly
let a = "Hello"; for (let index1 = 0; index1<100000; index1++) { a = a + "benchmark"; } console.log("length a:", a.length);
let b = "Hello"; for (let index2 = 0; index2<100000; index2++) { b = "benchmark" + b; } console.log("length b:", b.length);
|
|
|
Post by bigbass on Mar 11, 2023 19:37:17 GMT 1
hello Peter so we compare everything on the same machine #using bench3.bacjoe@raspberrypi:~/Downloads/benchmark $ time ./bench3 Length: 900005 Length: 900005 real 0m26.756s user 0m26.715s sys 0m0.030s joe@raspberrypi:~/Downloads/benchmark $ cd .. #using bench3.c transpiled converted from javascriptjoe@raspberrypi:~/Downloads $ time ./bench3 Killed real 0m5.749s user 0m1.521s sys 0m1.613s about 20 seconds improvement in speed would like to see what it does on an x86_64 with more RAM I don't know how bacon handles the code in the conversion process but there may be a way to use something there ? just a thought I have no idea how to implement it with bacon.bac Joe
|
|
|
Post by Pjot on Mar 11, 2023 20:33:12 GMT 1
Thanks Joe, As a matter of fact, I am looking into this issue for the last 2 weeks, and I have found a way to implement (incremental) string concatenation in a more efficient manner.
Looking at the generated C code you have posted, it seems NodeJS works with dynamic arrays. However, my algorithm is completely different and also way faster than NodeJS.
First tests look good, I hope to finalize the updated BaCon version shortly! BR Peter
|
|
|
Post by bigbass on Mar 12, 2023 15:54:14 GMT 1
Hello Peter
great to see you have been working on it for some time I wont distract you in the process but will be happy to test the results!
by the way one main reason I use c++ and javascript for strings is Dynamic Arrays are very easy to work with compared to a lot of manual memory management code you have to do in c even after all that was said . I still think it would be a good Idea to port dynamic arrays to bacon just for fun or at least have a simple working demo will make a simple demo (maybe) will give it a try ...
Joe
|
|
|
Post by Pjot on Mar 14, 2023 17:33:36 GMT 1
great to see you have been working on it for some time I wont distract you in the process but will be happy to test the results! A small update on the string concatenation: The problem which I am facing now is that the data model of strings needs to be changed fundamentally. When this needs to be done in a structural way throughout the BaCon code, it will basically come down to rewriting BaCon as a whole
I am looking into several alternatives.
BR Peter
|
|