Notes regarding binary trees
Apr 4, 2021 8:11:53 GMT 1
Post by Pjot on Apr 4, 2021 8:11:53 GMT 1
All,
As you may have noticed from the latest commits in fossil, BaCon now has the ability to use binary trees natively. The new statements rely on the common POSIX.2001 libc API for binary trees, documented here.
An overview of the BaCon functions can be found in the updated documentation.
To see the difference in performance between regular arrays and binary trees, I made a benchmark program. The regular array has 10.000.000 elements each of which has a value set. The binary tree also has the same amount of elements with the exact same values.
Setting up the arrays delivered the following performance results:
As is clear from these numbers, creating the values in the binary tree is slower than creating the same values in the serial array.
However, the real performance gain comes from looking up values. The next results show the performance difference in looking up values from a serial array versus a binary tree:
Looking up values from a serial arrays takes more than 495 seconds (8 minutes) while the binary tree needs less than half a second for looking up the same values. The difference in performance is huge. The binary tree is 1200 times faster.
As you may have noticed from the latest commits in fossil, BaCon now has the ability to use binary trees natively. The new statements rely on the common POSIX.2001 libc API for binary trees, documented here.
An overview of the BaCon functions can be found in the updated documentation.
To see the difference in performance between regular arrays and binary trees, I made a benchmark program. The regular array has 10.000.000 elements each of which has a value set. The binary tree also has the same amount of elements with the exact same values.
Setting up the arrays delivered the following performance results:
Populated a serial array in 17 msecs.
Populated a binary tree in 4790 msecs.
As is clear from these numbers, creating the values in the binary tree is slower than creating the same values in the serial array.
However, the real performance gain comes from looking up values. The next results show the performance difference in looking up values from a serial array versus a binary tree:
Lookups serial array: 495206 msecs in 1000000 hits.
Lookups binary tree: 408 msecs in 1000000 hits.
Looking up values from a serial arrays takes more than 495 seconds (8 minutes) while the binary tree needs less than half a second for looking up the same values. The difference in performance is huge. The binary tree is 1200 times faster.
Below the benchmark program. Hope the binary trees will prove to be useful to anybody (they are to me ).
Regards
Peter
CONST Dim = 10000000
CONST Total = 1000000
DECLARE array_1[Dim] TYPE int
DECLARE array_2 TREE int
Time = TIMER
FOR x = 0 TO Dim-1
array_1[x] = x
NEXT
PRINT "Populated a serial array in ", TIMER-Time, " msecs."
Time = TIMER
FOR x = 0 TO Dim-1
TREE array_2 NODE x
NEXT
PRINT "Populated a binary tree in ", TIMER-Time, " msecs."
Time = TIMER
result = 0
FOR element = 1 TO Total
FOR x = 0 TO Dim-1
IF array_1[x] = element THEN
INCR result
BREAK
ENDIF
NEXT
NEXT
PRINT "Lookups serial array: ", TIMER-Time, " msecs in ", result, " hits."
Time = TIMER
result = 0
FOR element = 1 TO Total
IF FIND(array_2, element) THEN INCR result
NEXT
PRINT "Lookups binary tree: ", TIMER-Time, " msecs in ", result, " hits."