|
Post by Pjot on Mar 17, 2016 19:54:47 GMT 1
Thanks Alex, There was an issue with local strings in functions, for which I have uploaded a new (fixed) beta. As usual, please let me know if you find any issues. Also I have updated the string optimization story once more. So I guess this concludes my work on the string improvements. And I'm happy about it, because too much has been going on without me having time for it! Best regards, Peter
|
|
|
Post by vovchik on Mar 17, 2016 20:27:10 GMT 1
Dear Peter, Thanks very much. String processing is now EXTREMELY fast. I recommend all our devs and testers to try this beta out. You won't regret it. With kind regards, vovchik
|
|
|
Post by alexfish on Mar 18, 2016 1:15:17 GMT 1
Hi Peter
Can see what the fix meant. Looking good so far; also tested vovchik's SVG demos.
BR Alex
|
|
|
Post by Pjot on Jul 13, 2017 8:17:46 GMT 1
All, Thanks to forum member btiffin I was pointed to the Open Multi-Processing library (OpenMP). This library has been part of GCC since version 4.2 and allows parts of a program to be processed in parallel. It seemed a good idea to see how well the original program in this thread performs when using OpenMP. This is the original program in this thread:

FOR i = 1 TO 200000
    a$ = a$ & "@"
NEXT
PRINT LEN(a$)
PRINT "Serial processing time: ", TIMER, " msecs."
Output: Not bad at all, and as mentioned earlier, BaCon now ranks 2nd with its portable C implementation (after FreeBASIC, which uses non-portable assembly). Nevertheless, the performance can be boosted further by using OpenMP. To use OpenMP, we must tell the C compiler which parts of the code should be processed in parallel: we place the '#pragma' compiler directive in the BaCon code and indicate which statements are grouped together. The BaCon PRAGMA keyword was improved so that it passes regular C compiler directives through, and in the latest beta the DO/DONE construct now allows grouping statements into one body. These were small improvements to make OpenMP work within BaCon, but the gain is spectacular. Please update to the latest BaCon 3.6 beta to verify the results below. OpenMP offers several ways to express parallelism to the C compiler, but for the demonstration program above we are going to split the task of appending a single character 200,000 times into four threads. That means we put four concatenations of 50,000 characters each into separate sections:

PRAGMA OPTIONS -fopenmp
PRAGMA LDFLAGS -lgomp
PRAGMA omp parallel sections private(i)  :' Start parallel processing for the next body of code. Variable "i" is local to each thread.
DO
    PRAGMA omp section  :' Section which concatenates 50,000 times one character
    FOR i = 1 TO 50000
        PRAGMA omp critical  :' Strings in BaCon are not thread safe, therefore mark concatenation as critical
        b1$ = b1$ & "@"  :' The actual concatenation
    NEXT
    PRAGMA omp section  :' Another section which concatenates 50,000 times one character
    FOR i = 1 TO 50000
        PRAGMA omp critical
        b2$ = b2$ & "@"
    NEXT
    PRAGMA omp section  :' Another section which concatenates 50,000 times one character
    FOR i = 1 TO 50000
        PRAGMA omp critical
        b3$ = b3$ & "@"
    NEXT
    PRAGMA omp section  :' Another section which concatenates 50,000 times one character
    FOR i = 1 TO 50000
        PRAGMA omp critical
        b4$ = b4$ & "@"
    NEXT
DONE
PRINT LEN(b1$ & b2$ & b3$ & b4$) :' Check; show total length
PRINT "Parallel processing time: ", TIMER, " msecs."
The program was modeled after this example. Without going into too much detail about OpenMP, it is important to realize that string processing in BaCon is not thread safe: the string functions use generic temporary buffers to store intermediate results, and these buffers are shared by the whole program, even when the program is split into a team of threads. The key here is to specify PRAGMA omp critical just before the string operation, to avoid memory clashes and race conditions. Also note that we must specify the compiler flag '-fopenmp' and link with the library '-lgomp'. Let's compile and run: That's right, the parallel run takes just 25% of the original time! This is expected, as the concatenation task was split across 4 threads. Note that most modern C compilers implement the OpenMP specification, which is a generic and open API and therefore portable. More info about OpenMP is here and here. Regards Peter EDIT: simplified the OpenMP code by merging omp parallel and sections.
|
|
|
Post by vovchik on Jul 13, 2017 9:43:33 GMT 1
Dear Peter, Remarkable. On my old desktop, the omp version completed the test in 19% of the time of the original (your test). Something to remember and use. With thanks and kind regards, vovchik
|
|
|
Post by Pjot on Jul 14, 2017 21:49:45 GMT 1
Thanks vovchik! I was able to tidy up the code a bit (see above) and even shorten it. When we use "sections", each section must be specified separately. So if we want to use 4 threads, each described in a section, then every thread needs to be written out explicitly, which in the above program leads to repetition of the same code. Also, what if our system allows more than 4 threads, say 16? Would it not be nice to let the program determine the number of parallel threads by itself? After some OpenMP studying, I found a way to do this:

PRAGMA OPTIONS -fopenmp  :' Compiler flag to enable the PRAGMA omp directives in our code
PRAGMA LDFLAGS -lgomp  :' Link with GNU OpenMP (gomp)
PRAGMA INCLUDE <omp.h>  :' Header file to allow OpenMP library functions in our code
PRINT "Using OpenMP version: ", _OPENMP :' Show current version of OpenMP
Num_T = omp_get_max_threads()  :' Get the maximum number of threads on this system
DECLARE b$ ARRAY Num_T :' Declare an array dynamically
PRAGMA omp parallel for private(i,tid)  :' Set up a parallel area for a FOR loop. Variables "i" and "tid" are private.
FOR i = 1 TO 200000  :' The original number to append. OpenMP will divide the total load for us.
    tid = omp_get_thread_num()  :' Get our current thread ID.
    PRAGMA omp critical  :' No memory clash when performing string operations.
    b$[tid] = b$[tid] & "@"  :' The actual concatenation.
NEXT
JOIN b$ BY "" TO result$ SIZE Num_T :' Joining the array elements into 1 variable.
PRINT LEN(result$) :' Check; show total length
PRINT "Time: ", TIMER, " msecs."
The idea is to determine the maximum number of threads on the current system. This maximum is used to create an array, of which each element is used for string concatenation in one individual thread. This way we can dynamically determine the best number of threads for the current system while avoiding redundant code. The nice thing is that OpenMP splits the FOR loop for us! This parallel programming stuff, now usable from BaCon code, is truly very nifty. Regards Peter
|
|