By default the flag is enabled whenever libunwind is found on the
system, with the exception of static build on OSX (for which we can't
install the throw hook #932 due to lack of support for --wrap in OSX
ld64 linker).
This is an attempt to fix build with STATIC=ON on OSX (#932):
[ 95%] Linking CXX executable ../../bin/bitmonerod Undefined symbols for
architecture x86_64: "___real___cxa_throw", referenced from:
___wrap___cxa_throw in libcommon.a(stack_trace.cpp.o) ld: symbol(s) not found
for architecture x86_64
This fixes build of tests with STATIC=ON, which failed with:
/tmp/cc8lNtqY.ltrans12.ltrans.o: In function
`boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::thread_resource_error>
>::rethrow() const [clone .lto_priv.41]':
cc8lNtqY.ltrans12.o:(.text+0x4e): undefined reference to `__wrap___cxa_throw'
The hook is implemented in libcommon, which is not linked into some of the test
binaries. An alternative solution is to link all tests against libcommon,
but that seems worse because it introduces a false dependency (also,
I tried that and for some of the test binaries the linker still failed to
pick up the symol from libcommon, strangely.)
The tests currently issue a warning that
"warning: -fassociative-math disabled; other options take precedence"
The associative math optimization is turned on indirectly by -Ofast.
Apparently, the optimization is forced to be disabled, while compiling
test harnesses generated by Google Test framework.
Unfortunately, there is no -Wno-error=* flag to disable this warning
(see gcc --help=warnings).
An alternative to this patch is to disable the optimization explicitly
with -fno-associative-math, but that seems worse.
Another alternative is to not pass -Ofast for tests build, but we
want the tests to be built with exact same optimization flags as
the code being tested, otherwise the value of the tests is diminished.
Another alternative is to remove -Werror from the entire build, but
it's good to include that flag to preclude people leaving warnings.
A note regarding implementation of not passing -Werror for tests:
I considered filtering out -Werror from CMAKE_{C,CXX}_FLAGS but
that seems to be worse because it's surprizing behavior, to those
reading the code that adds -Werror. It is better to add it for
when it is used and not added otherwise. I also considered relying
on order, adding -Werror after inluding 'tests' subdir, but before
including the other subdirs, but that also seems cryptic to the
reader. So, I settled with the current solution, of explicitly
setting CMAKE_{C,CXX}_FLAGS to different values before including the
respective subdir.
Testing done: compared compiler invocation for non-tests source files
using `make VERBOSE=1` with and without this commit: the only difference
is the position of -Werror. So, this commit doesn't change the binary.
f07f120 cmake: don't try to link with atomic on Apple (redfish)
19349d7 cmake: ARM: clang: make warning non-fatal: inline asm (redfish)
f3e09f3 cmake: link with -latomic for clang (redfish)
f4b35ae cmake: include -ldl via cmake built-in var (redfish)
fa85cd8 common: stack trace: make clang happy with func ptrs (redfish)
4dce26b cmake: do not pass -stdlib=c++ to clang >=3.7 (redfish)
otherwise clang build fails with
../cryptonote_core/libcryptonote_core.a(miner.cpp.o): In function
`std::__atomic_base<unsigned long long>::load(std::memory_order) const':
/usr/bin/../lib/gcc/i686-pc-linux-gnu/6.1.1/../../../../include/c++/6.1.1/bits/atomic_base.h:396:
undefined reference to `__atomic_load_8'
This has no effect on the gcc build.
The one strange thing is that test code like
std::atomic<int> x;
int main() { return x; }
compiles and links without errors with clang, without -latomic. This
alone would suggest that this patch is unnecessary, but that is not the
case. It's not clear exactly why, though. The bitmonero code is
including the same header, but it must be doing something more complex
than in this test code snippet that causes the failure at link time
pasted above. In any case, passing -latomic fixes the problem and
seems safe.
.
This does two things:
1. fixes clang build, which otherwise errors with undefined symbol
'dlsym'.
2. simplifies the cmake script, delegating to cmake to figure
out platform-specific flags for linking against the dl library.
Tested on Linux (Arch) with clang 3.7 and 3.8 i686 and ARM:
if -stdlib=c++ is passed to clang, then the build errors
out with <string>,<iostrea>,etc. headers not found. Simply
not passing the arg fixes the problem.
**NOTE**: not tested on OSX.
Shorten the list of warnings that are reported, but
which are forced to NOT generate an error, via -Wno-error.
Unwhitelist these: strict-aliasing, sign-compare, type-limits
For example, ignoring strict-aliasing warning caused
lots of wasted time diagnosing Issue #847.
We need ARCH, because it needs to be set for ARM7, ARM6 to be
initialized.
Strangely, on different machines (both ARMv7, Arch), ${ARCH}
var is either empty or 'native'. Handle both cases.
The former was a faulty "fix" for gmtime_r not existing on Windows. The latter is needed only for dynamic builds, and is not included with msys2, which ends up fine because Windows is only built static at this time.
DL is empty and unused elsewhere.
The intention at one point may have been to use CMAKE_DL_LIBS, but that
would more likely apply in some situations involving static linking.
This allows the OpenSSL function checks to compile in unbound's CMake
configuration.
Otherwise, the functions SHA256() and EVP_sha512() won't be called from
libunbound as possible algorithms.
They had not been compiling because static OpenSSL libraries were being
used, along with lack of -ldl. The static library preference is
unnecessary for the checks, so use default suffixes ordering for
CMAKE_FIND_LIBRARY_SUFFIXES when building unbound.
Related files:
configure_checks.cmake
external/unbound/validator/val_secalgo.c
secalgo_ds_digest(), setup_key_digest()
Sample use:
BERKELEY_DB=0 make debug
This makes development with BlockchainLMDB easier when virtual methods
have changed and don't match BlockchainBDB.
CMake supports this through THREADS_PREFER_PTHREAD_FLAG.
Remove inclusion of pthread library in EXTRA_LIBRARIES, as the
individual CMakeLists.txt files which need pthread already require it
with CMAKE_THREAD_LIBS_INIT.
Bockchain:
1. Optim: Multi-thread long-hash computation when encountering groups of blocks.
2. Optim: Cache verified txs and return result from cache instead of re-checking whenever possible.
3. Optim: Preload output-keys when encoutering groups of blocks. Sort by amount and global-index before bulk querying database and multi-thread when possible.
4. Optim: Disable double spend check on block verification, double spend is already detected when trying to add blocks.
5. Optim: Multi-thread signature computation whenever possible.
6. Patch: Disable locking (recursive mutex) on called functions from check_tx_inputs which causes slowdowns (only seems to happen on ubuntu/VMs??? Reason: TBD)
7. Optim: Removed looped full-tx hash computation when retrieving transactions from pool (???).
8. Optim: Cache difficulty/timestamps (735 blocks) for next-difficulty calculations so that only 2 db reads per new block is needed when a new block arrives (instead of 1470 reads).
Berkeley-DB:
1. Fix: 32-bit data errors causing wrong output global indices and failure to send blocks to peers (etc).
2. Fix: Unable to pop blocks on reorganize due to transaction errors.
3. Patch: Large number of transaction aborts when running multi-threaded bulk queries.
4. Patch: Insufficient locks error when running full sync.
5. Patch: Incorrect db stats when returning from an immediate exit from "pop block" operation.
6. Optim: Add bulk queries to get output global indices.
7. Optim: Modified output_keys table to store public_key+unlock_time+height for single transaction lookup (vs 3)
8. Optim: Used output_keys table retrieve public_keys instead of going through output_amounts->output_txs+output_indices->txs->output:public_key
9. Optim: Added thread-safe buffers used when multi-threading bulk queries.
10. Optim: Added support for nosync/write_nosync options for improved performance (*see --db-sync-mode option for details)
11. Mod: Added checkpoint thread and auto-remove-logs option.
12. *Now usable on 32-bit systems like RPI2.
LMDB:
1. Optim: Added custom comparison for 256-bit key tables (minor speed-up, TBD: get actual effect)
2. Optim: Modified output_keys table to store public_key+unlock_time+height for single transaction lookup (vs 3)
3. Optim: Used output_keys table retrieve public_keys instead of going through output_amounts->output_txs+output_indices->txs->output:public_key
4. Optim: Added support for sync/writemap options for improved performance (*see --db-sync-mode option for details)
5. Mod: Auto resize to +1GB instead of multiplier x1.5
ETC:
1. Minor optimizations for slow-hash for ARM (RPI2). Incomplete.
2. Fix: 32-bit saturation bug when computing next difficulty on large blocks.
[PENDING ISSUES]
1. Berkely db has a very slow "pop-block" operation. This is very noticeable on the RPI2 as it sometimes takes > 10 MINUTES to pop a block during reorganization.
This does not happen very often however, most reorgs seem to take a few seconds but it possibly depends on the number of outputs present. TBD.
2. Berkeley db, possible bug "unable to allocate memory". TBD.
[NEW OPTIONS] (*Currently all enabled for testing purposes)
1. --fast-block-sync arg=[0:1] (default: 1)
a. 0 = Compute long hash per block (may take a while depending on CPU)
b. 1 = Skip long-hash and verify blocks based on embedded known good block hashes (faster, minimal CPU dependence)
2. --db-sync-mode arg=[[safe|fast|fastest]:[sync|async]:[nblocks_per_sync]] (default: fastest:async:1000)
a. safe = fdatasync/fsync (or equivalent) per stored block. Very slow, but safest option to protect against power-out/crash conditions.
b. fast/fastest = Enables asynchronous fdatasync/fsync (or equivalent). Useful for battery operated devices or STABLE systems with UPS and/or systems with battery backed write cache/solid state cache.
Fast - Write meta-data but defer data flush.
Fastest - Defer meta-data and data flush.
Sync - Flush data after nblocks_per_sync and wait.
Async - Flush data after nblocks_per_sync but do not wait for the operation to finish.
3. --prep-blocks-threads arg=[n] (default: 4 or system max threads, whichever is lower)
Max number of threads to use when computing long-hash in groups.
4. --show-time-stats arg=[0:1] (default: 1)
Show benchmark related time stats.
5. --db-auto-remove-logs arg=[0:1] (default: 1)
For berkeley-db only. Auto remove logs if enabled.
**Note: lmdb and berkeley-db have changes to the tables and are not compatible with official git head version.
At the moment, you need a full resync to use this optimized version.
[PERFORMANCE COMPARISON]
**Some figures are approximations only.
Using a baseline machine of an i7-2600K+SSD+(with full pow computation):
1. The optimized lmdb/blockhain core can process blocks up to 585K for ~1.25 hours + download time, so it usually takes 2.5 hours to sync the full chain.
2. The current head with memory can process blocks up to 585K for ~4.2 hours + download time, so it usually takes 5.5 hours to sync the full chain.
3. The current head with lmdb can process blocks up to 585K for ~32 hours + download time and usually takes 36 hours to sync the full chain.
Averate procesing times (with full pow computation):
lmdb-optimized:
1. tx_ave = 2.5 ms / tx
2. block_ave = 5.87 ms / block
memory-official-repo:
1. tx_ave = 8.85 ms / tx
2. block_ave = 19.68 ms / block
lmdb-official-repo (0f4a036437)
1. tx_ave = 47.8 ms / tx
2. block_ave = 64.2 ms / block
**Note: The following data denotes processing times only (does not include p2p download time)
lmdb-optimized processing times (with full pow computation):
1. Desktop, Quad-core / 8-threads 2600k (8Mb) - 1.25 hours processing time (--db-sync-mode=fastest:async:1000).
2. Laptop, Dual-core / 4-threads U4200 (3Mb) - 4.90 hours processing time (--db-sync-mode=fastest:async:1000).
3. Embedded, Quad-core / 4-threads Z3735F (2x1Mb) - 12.0 hours processing time (--db-sync-mode=fastest:async:1000).
lmdb-optimized processing times (with per-block-checkpoint)
1. Desktop, Quad-core / 8-threads 2600k (8Mb) - 10 minutes processing time (--db-sync-mode=fastest:async:1000).
berkeley-db optimized processing times (with full pow computation)
1. Desktop, Quad-core / 8-threads 2600k (8Mb) - 1.8 hours processing time (--db-sync-mode=fastest:async:1000).
2. RPI2. Improved from estimated 3 months(???) into 2.5 days (*Need 2AMP supply + Clock:1Ghz + [usb+ssd] to achieve this speed) (--db-sync-mode=fastest:async:1000).
berkeley-db optimized processing times (with per-block-checkpoint)
1. RPI2. 12-15 hours (*Need 2AMP supply + Clock:1Ghz + [usb+ssd] to achieve this speed) (--db-sync-mode=fastest:async:1000).