BaCon performance considerations
Post by Pjot on Dec 15, 2019 20:12:40 GMT 1
All
Lately I have been busy improving the performance of BaCon. After some tweaks and optimizations, I decided to look at memory allocations.
BaCon always uses the malloc/calloc/realloc/free family of functions, which obtain memory from the so-called "heap". Not surprisingly, the default implementations in libc are designed for general-purpose use rather than tuned for maximum speed, so for an allocation-heavy program they can be comparatively slow.
Implementing your own allocator is an art in itself; the internet is full of academic papers describing the "fastest" memory allocators.
Fortunately, in the world of Linux we can use the memory allocator of our choice for free. Below is a list of the allocators I tried, each enabled by adding a single link option, ordered from slowest to fastest.
1. The HOARD memory allocator
In my Linux Mageia system, I had to install libhoard from the repo as follows:
# dnf install lib64hoard0
After this, I could build BaCon simply by passing '-lhoard'. Because libhoard is then linked ahead of libc, its malloc/calloc/realloc/free functions take the place of the default ones:
# bacon -a -o -O2 -lhoard bacon
Subsequent runs of the newly compiled bacon binary do show a performance improvement, but not a very significant one (approx. 3%). Besides, a banner now pops up displaying the text "Using the Hoard memory allocator", which is kind of annoying.
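A quick way to confirm that a rebuilt binary really picked up the replacement allocator is to look at its shared-library dependencies with ldd. The sketch below is not part of the original post: detect_allocator is a hypothetical helper, and the library name patterns are assumed from the '-l' options used in this post.

```shell
# detect_allocator: read `ldd` output on stdin and print the first
# replacement allocator found, or "libc" when none is linked in.
# The patterns are assumptions derived from -lhoard, -ljemalloc,
# -ltcmalloc and -lmimalloc.
detect_allocator() {
    awk '
        /libhoard/    { print "hoard";    found = 1; exit }
        /libjemalloc/ { print "jemalloc"; found = 1; exit }
        /libtcmalloc/ { print "tcmalloc"; found = 1; exit }
        /libmimalloc/ { print "mimalloc"; found = 1; exit }
        END           { if (!found) print "libc" }
    '
}

# Usage (the path ./bacon is an assumption):
#   ldd ./bacon | detect_allocator
```

If the result is "libc" after building with one of the '-l' options, the library was most likely not found or not linked in.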
2. The jemalloc memory allocator
First I installed jemalloc from the repo:
# dnf install lib64jemalloc2
In Mint or Ubuntu the package name is based on 'jemalloc' as well (look for 'libjemalloc-dev' or similar).
To build BaCon using jemalloc, we can do the following:
# bacon -a -o -O2 -ljemalloc bacon
When I now use the newly compiled binary to compile BaCon, the speed increase is slightly better, around 6-8%. And no annoying banners.
3. The tcmalloc allocator from Google Perf-Tools
In my Mageia system I had to install tcmalloc as follows:
# dnf install lib64tcmalloc4
In Mint or Ubuntu the package has to be installed with "sudo apt-get install google-perftools" or something similar.
To build BaCon using tcmalloc:
# bacon -a -o -O2 -ltcmalloc bacon
Performance improves by approximately 12-14%, which is pretty good.
4. The mimalloc memory allocator from Microsoft
Yes, that's right - Microsoft. To be fair, mimalloc is an impressive allocator with really good performance. It was not available in my Mageia repo though (nor in the Mint or Ubuntu repos), so I had to build the library from source. This, however, went perfectly fine.
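The post does not record the exact build steps, so the following is only a sketch of the usual CMake flow described in the mimalloc README. The helper name build_mimalloc is hypothetical, the install location depends on your system (typically somewhere under /usr/local), and an extra library path may be needed if the linker cannot find the library afterwards.

```shell
# build_mimalloc: hypothetical helper that fetches, builds and installs
# mimalloc. Nothing runs until the function is called explicitly.
# The body runs in a subshell, so `cd` and `set -e` do not leak out.
build_mimalloc() (
    set -e                      # stop at the first failing step
    git clone https://github.com/microsoft/mimalloc
    cd mimalloc
    mkdir -p out/release
    cd out/release
    cmake ../..                 # configure the build
    make                        # build the library
    sudo make install           # install system-wide; needs root
    sudo ldconfig               # refresh the dynamic linker cache
)
```

Call build_mimalloc from an empty working directory; git, cmake, make and a C compiler are assumed to be installed.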
To build BaCon using mimalloc:
# bacon -a -o -O2 -lmimalloc bacon
Averaged over multiple runs of compiling the bacon binary, the performance gain is 15% or slightly more. Really impressive!
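The percentages in this post are averages over several runs. The original timing method is not shown, so the sketch below is only one possible way to reproduce such a comparison. The helper names avg_seconds and percent_gain are hypothetical, and GNU date is assumed for sub-second timestamps.

```shell
# avg_seconds RUNS CMD [ARGS...]
# Run CMD RUNS times and print the mean wall-clock time in seconds.
# Assumes GNU date (for the %N nanoseconds field), as found on Linux.
avg_seconds() (
    runs=$1
    shift
    total=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s.%N)
        "$@" >/dev/null 2>&1
        end=$(date +%s.%N)
        # Accumulate the elapsed time of this run.
        total=$(awk -v t="$total" -v s="$start" -v e="$end" \
            'BEGIN { printf "%.6f", t + (e - s) }')
        i=$((i + 1))
    done
    awk -v t="$total" -v n="$runs" 'BEGIN { printf "%.3f\n", t / n }'
)

# percent_gain BASELINE CANDIDATE
# Print how much less time CANDIDATE takes, as a percentage of BASELINE.
percent_gain() {
    awk -v b="$1" -v c="$2" 'BEGIN { printf "%.1f\n", (b - c) / b * 100 }'
}

# Usage with two hypothetical builds of the bacon binary:
#   base=$(avg_seconds 5 ./bacon-libc ARGS...)
#   fast=$(avg_seconds 5 ./bacon-mimalloc ARGS...)
#   percent_gain "$base" "$fast"
```

Averaging over several runs smooths out disk cache and scheduler noise, which matters when the differences are only a few percent.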
Conclusion
It is possible to gain a significant performance improvement in BaCon for free, using readily available libraries from our Linux repositories. It is worthwhile checking this out when every clock cycle counts, for example in high-performance programs.
Truth be told, the mimalloc library from Microsoft outperformed the others in this simple test. Its only disadvantage is that it is not available in the default Linux repositories.
However, if manual compilation of mimalloc fails, then falling back to tcmalloc is a good alternative, as its performance comes very close to that of mimalloc.
BR
Peter