|
Post by bigbass on Mar 28, 2023 23:09:34 GMT 1
Hello Peter,
The way I see it is growth: because of the testing, BaCon has improved a lot in the area of string concatenation, so the end result is that things get much better. I see that as a win-win: BaCon wins in speed and in development.
I would like to continue with some other commands. One that comes to mind is REPLACE$; that's a command I use a lot.
Maybe some type of test for it with different languages?
Things can only get better
Joe
|
|
|
Post by Pjot on Mar 30, 2023 6:07:35 GMT 1
Hi Joe,
In release 4.6.1 some performance improvements for REPLACE$ were already applied, so it should be better than before.
But I think the core string delimitation function can get faster. It currently checks each and every byte in a byte sequence to determine where the string should be split, and it does this over and over again for the full string, which costs a lot of performance.
The BaCon core delimiting function is used by SPLIT, TOKEN$, APPEND$ and many others, and these occur in many places and many programs. If the core engine can be made faster, it sure would be beneficial.
BR Peter
|
|
|
Post by bigbass on Mar 30, 2023 22:37:08 GMT 1
Hello Peter, I have a suggestion. You wrote some code a long time ago, I think it was called bindump (binary dump to hex), and the part that was very interesting was that you can open very large files because the file gets read into a buffer a little at a time. Found it: www.basic-converter.org/bindump.bac.html

The reason I mention it is that we could have an option to open a large (plain text) file that does not load the whole file into memory, but reads it in chunks at a time, the same way bindump was done. The RPI3 is not good at opening very large files.

Just a thought, because many crawled web pages are so large they are not possible to search, and this could be a great workaround for the problem:
commoncrawl.org/big-picture/
commoncrawl.org/the-data/get-started/

Reading that again: I mean something built into BaCon so that we have an option to load large files this way.
Joe
|
|
|
Post by bigbass on Mar 31, 2023 4:34:21 GMT 1
Hello Peter
I am very sure you have more than enough work on your hands with improving the internal core string delimitation function and if that is where speed can be gained
it will benefit everyone I think I can modify your bindump code without a need to add something to bacon like a large file option
if you do come up with some type of code we all can test as a stand alone function I would be happy to test it but it sounds more of a bacon internal function that you would need to know how it all fits together
thanks Joe
|
|
|
Post by bigbass on Apr 3, 2023 16:55:21 GMT 1
Hello Peter and Alex,
I simplified the pure C code (removed malloc); when starting the tests we didn't have a benchmark for C yet. I also improved the C++ run time: the logic is optimized and faster with a += "benchmark"; and (b += "benchmark") + b; and the C++ code is much faster than C when using this example.

Note: the Rust shipped with Debian is old and very slow; get the latest version with the installer:
curl https://sh.rustup.rs -sSf | sh
After using the new Rust it is much faster than C or C++.

Please note that one test is not definitive proof, but it is some type of data we can use, as long as the code demos are the same and run on the same machine with no added compiler options.

cport.c

#include <stdio.h>
#include <string.h>

#define BUFFER_SIZE 120000

int main() {
    char a[BUFFER_SIZE];
    strcpy(a, "Hello");
    for (int index1 = 0; index1 < 10000; index1++) {
        strcat(a, "benchmark");
    }
    printf("length a: %zu\n", strlen(a));

    char b[BUFFER_SIZE];
    strcpy(b, "Hello");
    for (int index2 = 0; index2 < 10000; index2++) {
        char temp[BUFFER_SIZE];
        strcpy(temp, "benchmark");
        strcat(temp, b);
        strcpy(b, temp);
    }
    printf("length b: %zu\n", strlen(b));

    return 0;
}
c++port.cxx

#include <iostream>
#include <cstring>
#include <cstdlib>
#include <string>

int main() {
    std::string a = "Hello";
    for (int index1 = 0; index1 < 10000; index1++)
        a += "benchmark";
    std::cout << "length a: " << a.length() << "\n";

    std::string b = "Hello";
    for (int index2 = 0; index2 < 10000; index2++)
        (b += "benchmark") + b;   // appends, then builds a temporary that is discarded
    std::cout << "length b: " << b.length() << "\n";

    return 0;
}
being too serious has bad secondary effects on health
|
|
|
Post by Pjot on Apr 4, 2023 5:55:19 GMT 1
Thanks Joe. As with many other languages, your examples also show the problem with the "a$ = b$ + a$" type of concatenation: it is very slow. It's a bit weird that even popular compilers show this kind of behavior and can be beaten by an interpreter like NodeJS. Nice Dilbert BTW. Best regards Peter
|
|
|
Post by bigbass on Apr 4, 2023 16:31:45 GMT 1
Hello Peter,
I stepped back and looked at this as pure math first, since the compiler does math very fast:

a = b + a
we flip the equal sign: b + a = a
we solve for b: b + a - a = a - a, so b = 0

On the surface it looks very easy, but it is not in any way: I know that you want to add strings, or better said, concatenate them, and this is where the big difference lies in processing strings with a compiler compared to an interpreter. Someone needs to write a lot of code to concatenate multiple strings!

Just for the sake of having more information on this: even though it is in C#, you process the idea the same way, just using a different language. They use the StringBuilder class for speed, and it does all the dirty work for you:
learn.microsoft.com/en-us/dotnet/csharp/how-to/concatenate-multiple-strings

I do think studying a problem is the only way to solve it, and the more info we have the better. All your work on BaCon keeps making it better and faster! thanks!

UPDATED: let's have a go at it in C!
nachtimwald.com/2017/02/26/efficient-c-string-builder/
www.joelonsoftware.com/2001/12/11/back-to-basics/
Joe
|
|
|
Post by bigbass on Apr 5, 2023 18:01:22 GMT 1
hello Peter,
Since you got the BaCon time down to the fastest, I'm sure you will keep at it with the core parser until you get the numbers you want. I will however work on the C++ code, which can be used for FLTK and Qt and any other C++-based code.

In the last example I did improve the C++ time:
time ./c++port
length a: 90005
length b: 90005
real 0m0.648s
user 0m0.643s
sys 0m0.000s

But I still wasn't happy with it, so I used std::basic_ostringstream. Yes, it is the fastest (for a stand-alone C++ example), and not a lot of code, by the way, to get it done. Later I will make it more BaCon friendly; this is just the first step.

iostream.cxx
time ./iostream
length a: 90005
length b: 90009
real 0m0.329s
user 0m0.327s
sys 0m0.002s

#include <iostream>
#include <sstream>

int main() {
    std::basic_ostringstream<char> a;
    a << "Hello";
    for (int index1 = 0; index1 < 10000; index1++) {
        a << "benchmark";
    }
    std::cout << "length a: " << a.str().length() << std::endl;

    std::basic_ostringstream<char> b;
    b << "Hello";
    for (int index2 = 0; index2 < 10000; index2++) {
        b.seekp(0, std::ios_base::beg);
        b << "benchmark" << b.str();
    }
    std::cout << "length b: " << b.str().length() << std::endl;

    return 0;
}
|
|
|
Post by Pjot on Apr 6, 2023 16:42:19 GMT 1
Hi Joe,
That's interesting, but indeed it is still pretty slow, also compared to other languages. We can generate C++ code with BaCon as well, for example for the benchmark program:

# bacon -c g++ -o -Wno-write-strings -o -Wno-pointer-arith bench3

The resulting binary also is very fast, in fact a lot faster than the default C++ stringstream implementation. So it is possible to create C++ code for fast concatenation, but why was this not done for the default C++ strings? Most likely because then it would be necessary to deviate from the standard.
BR Peter
|
|
|
Post by alexfish on Apr 6, 2023 19:19:20 GMT 1
Hi All,
Testing string vec * numeric:

#include <numeric>
#include <string>
#include <iostream>
#include <vector>

int main() {
    std::string str = "Hello";
    std::vector<std::string> vec(10000, str);
    std::string a = std::accumulate(vec.begin(), vec.end(), std::string(""));
    std::cout << a << std::endl;
}

Best time with print, on RPI4 1.8 GHz, 8 GB mem:
real 0m0.124s
user 0m0.085s
sys 0m0.020s
BR Alex
|
|
|
Post by bigbass on Apr 7, 2023 15:25:32 GMT 1
Hello Peter and Alex
A lot of testing to get it smaller, easier and faster (than all my previous examples) for a C++ stand-alone benchmark demo.
Using C++ also keeps the binary tiny, for example compared to Rust.
Joe
Note: on an RPI3, 32-bit OS:
time ./speed++
length a: 90005
length b: 90005
real 0m0.203s
user 0m0.202s
sys 0m0.001s
#include <iostream>
#include <string>

int main() {
    std::string a = "Hello";
    for (int index1 = 0; index1 < 10000; index1++) {
        a += "benchmark";
    }
    std::cout << "length a: " << a.length() << std::endl;

    std::string b = "Hello";
    for (int index2 = 0; index2 < 10000; index2++) {
        b = "benchmark" + b;
    }
    std::cout << "length b: " << b.length() << std::endl;

    return 0;
}
|
|
|
Post by bigbass on Apr 8, 2023 4:43:34 GMT 1
Hello guys,
I thought about trying another language; the syntax for many examples looks very easy if you look at rosettacode.org/wiki/Julia

So, Julia for another benchmark (why not). There are different ways to run Julia, but this is the way I prefer to run scripts. First time using Julia, but here is what I noticed compared to BaCon: for concat BaCon uses & instead of *, indexes start with 1 not zero, and the Julia colon range (1:10000) is TO in BaCon.
Special tip: I had to add a global prefix for the code to run cleanly.

#!/usr/bin/julia

a = "Hello"
for index1 in 1:10000
    global a = a * "benchmark"
end
println("length a:", length(a))

b = "Hello"
for index2 in 1:10000
    global b = "benchmark" * b
end
println("length b:", length(b))

time ./julia4.jl
length a:90005
length b:90005
real 0m2.328s
user 0m2.908s
sys 0m0.222s
|
|
|
Post by bigbass on Apr 15, 2023 16:06:19 GMT 1
Hello everyone,
Having some fun testing other language speeds and seeing what the syntax looks like for the benchmark demo, so why not add one more to the list. Nim is the most interesting of all that I tried (outside of BaCon of course), for many reasons (you can look them up yourself if interested). After porting a lot of code from C or C++, Nim seems easier on the eyes with its simple syntax. Just the demo here, the speed test, and a little help with compiling from Geany.

time ./nim4
length a:90005
length b:90005
real 0m1.537s
user 0m1.465s
sys 0m0.031s

Faster than Julia (Julia is slow on the RPI3, not enough RAM).

nim4.nim

var a = "Hello"
for index1 in 0..<10000:
    a = a & "benchmark"
echo "length a:", a.len

var b = "Hello"
for index2 in 0..<10000:
    b = "benchmark" & b
echo "length b:", b.len

More details if interested: nim-lang.org/docs/backends.html

Some quick tips for compiling in Geany: Geany has in the Build menu a sub-menu Set Build Commands; place one of these options on line 1 of the Command entry. The -r in the command runs the code after it compiles (optional); -d:release gives you a smaller binary (optional).

Compile a C binary: nim c -d:release -r %e
Compile a C++ binary: nim cpp -d:release -r %e
Compile to JavaScript to be used with node: nim js -d:nodejs -r %e
Compile an Objective-C binary: nim objc -d:release -r %e
|
|
|
Post by Pjot on Apr 15, 2023 19:12:41 GMT 1
Hi Joe, Interesting. I've heard of Nim but never used it, and from your example its syntax doesn't look too difficult. Fortunately, it cannot beat the speed of BaCon. Best regards Peter
|
|
|
Post by bigbass on Apr 16, 2023 15:10:41 GMT 1
Hello Peter,
Yes, I agree, you did an excellent job at boosting BaCon's speed for string concat and in several other areas recently. Later I will see whether Nim plays nicely with BaCon or not, and whether there are any pros or cons with the mix. Trying to keep up with some of the new stuff because I still haven't figured out the old stuff yet.
Joe
|
|