
#Cmake options how to#
This is about compiling the GLFW library itself. For information on how to build applications that use GLFW, see Building applications.

We can replace the SSE4_2 dispatch level with POPCNT in the CMake file.
#Cmake options code#
Also, we pass the list of enabled optimizations: SSE4_2 AVX AVX2. We reuse popCountTable multiple times from different compilation units, so we need to make it "external".

Mixing SSE and AVX code at runtime usually has a significant performance impact. To work around this problem, we should use the vzeroupper instruction.

- Check for enabled dispatching levels: "-DCPU_DISPATCH=SSE4_2 AVX AVX2".
- Run without restrictions and save results to "avx2.xml".
- "avx.xml": mask AVX2 optimizations via environment variable: OPENCV_CPU_DISABLE=AVX2.
- "sse42.xml": mask optimizations via environment variable: OPENCV_CPU_DISABLE=AVX2,AVX.
- "sse3.xml": mask optimizations via environment variable: OPENCV_CPU_DISABLE=AVX,AVX2,SSE4.2.

We can see that there are no improvements from the dispatched AVX optimization, so we can remove it from the CMake file: "optimize size of binaries, drop AVX dispatching".

The AVX-related issue is a compiler problem with the processing of unrolled loops. For reference, there is a similar report for the code without the patch. Removing these loops before generating the first report resolves this strange behavior (slowdown of SSE4.2 code).

We can support NORM_HAMMING2 too, but this will require additional optimizations (probably for AVX2 code only).
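To make the effect of the masking runs easier to follow, here is a minimal sketch of how a comma-separated disable list selects the code path that ends up in each report. It is not OpenCV's implementation: `parseDisabledFeatures` and `pickLevel` are hypothetical helpers, and the sketch assumes the processor supports every listed level.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not an OpenCV API): splits a value such as
// "AVX2,AVX" into the set {"AVX", "AVX2"}. Empty items are ignored.
std::set<std::string> parseDisabledFeatures(const std::string& value) {
    std::set<std::string> disabled;
    std::stringstream stream(value);
    std::string item;
    while (std::getline(stream, item, ',')) {
        if (!item.empty()) disabled.insert(item);
    }
    return disabled;
}

// Hypothetical helper: 'dispatched' is ordered from the lowest to the
// highest level. Returns the highest level that is not disabled, or the
// baseline when every dispatched level is masked.
// Assumption: the CPU supports all listed levels.
std::string pickLevel(const std::vector<std::string>& dispatched,
                      const std::set<std::string>& disabled,
                      const std::string& baseline) {
    for (auto it = dispatched.rbegin(); it != dispatched.rend(); ++it) {
        if (disabled.count(*it) == 0) return *it;
    }
    return baseline;
}
```

In a real run the list would come from the environment, for example via `std::getenv("OPENCV_CPU_DISABLE")`. With the dispatched levels SSE4.2, AVX, AVX2 and baseline SSE3, the four masks above select AVX2, AVX, SSE4.2 and SSE3 respectively, which matches the report file names.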
[...] hpp file, because these platform-dependent checks are done at compile time (controlled via defines).

In the next step we should "register dispatched code, fix build". We should register our dispatched code from [...]
Let's extract the implementations of the functions of interest into a separate header file. This header file will be processed multiple times, so we will generate binary code with different optimization options from a single source file. Refer to PR commit "move implementations into .hpp file w/o changes".

After that we should "create dispatch.cpp file". It is a simple helper file without algorithm logic, but it contains the entry points for the optimized functions and the dispatch rules.
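The idea of a helper file that holds only entry points and dispatch rules can be sketched as follows. This is an illustration under stated assumptions, not OpenCV's actual dispatch.cpp: the `sketch` namespace and all function names are hypothetical, and the runtime check relies on the GCC/Clang builtins `__builtin_cpu_supports` and the `target` attribute, so it falls back to the baseline on other compilers and architectures.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumption: GCC or Clang on x86; elsewhere only the baseline is built.
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#  define SKETCH_X86_GNU 1
#else
#  define SKETCH_X86_GNU 0
#endif

namespace sketch {  // hypothetical namespace, not part of OpenCV

// Baseline code path: portable bit counting that runs on any processor.
int normHamming_baseline(const std::uint8_t* a, const std::uint8_t* b,
                         std::size_t n) {
    int result = 0;
    for (std::size_t i = 0; i < n; ++i) {
        unsigned v = static_cast<unsigned>(a[i] ^ b[i]);
        for (; v != 0; v &= v - 1) ++result;  // clear the lowest set bit
    }
    return result;
}

#if SKETCH_X86_GNU
// Dispatched code path: compiled with POPCNT enabled for this function only.
__attribute__((target("popcnt")))
int normHamming_popcnt(const std::uint8_t* a, const std::uint8_t* b,
                       std::size_t n) {
    int result = 0;
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {  // process 8 bytes per iteration
        std::uint64_t x, y;
        std::memcpy(&x, a + i, 8);
        std::memcpy(&y, b + i, 8);
        result += __builtin_popcountll(x ^ y);
    }
    for (; i < n; ++i)            // remaining tail bytes
        result += __builtin_popcount(static_cast<unsigned>(a[i] ^ b[i]));
    return result;
}
#endif

using NormHammingFn = int (*)(const std::uint8_t*, const std::uint8_t*,
                              std::size_t);

// Dispatch rule: pick the best code path the running processor supports.
NormHammingFn selectNormHamming() {
#if SKETCH_X86_GNU
    __builtin_cpu_init();
    if (__builtin_cpu_supports("popcnt")) return normHamming_popcnt;
#endif
    return normHamming_baseline;
}

// Entry point: contains no algorithm logic, only the dispatch call.
int normHamming(const std::uint8_t* a, const std::uint8_t* b, std::size_t n) {
    static const NormHammingFn fn = selectNormHamming();
    return fn(a, b, n);
}

}  // namespace sketch
```

The selection happens once and is cached in a static function pointer, so the per-call cost is a single indirect call. The algorithm bodies stay free of runtime feature checks, which mirrors the split between implementation header and dispatch file described above.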
Minimal is the required set of processor features. The executable will not run if some of these options are not available on the target processor.

Dispatched optimizations are additional code paths compiled into the executable. They will be executed on supported processors only.

By default, OpenCV on x86_64 uses SSE3 as the basic instruction set and enables dispatched optimizations for the SSE4.2, AVX and AVX2 instruction sets. This configuration provides the best effort on a wide range of users' platforms.

OpenCV uses these CMake variables to control supported optimization features:

- CPU_BASELINE - minimal set of required optimizations (if they are supported by the C++ compiler).

This "How to" example is based on the optimization of the Hamming norm algorithm (core module, file stat.cpp).

Ensure that you have performance tests for the selected functionality. Please don't waste your time and the reviewer's time doing this without good performance tests.

Compile the OpenCV performance tests with different CPU baseline features and with dispatching disabled (depends on your platform). On the x86-64 platform I select: SSE3 (minimal), SSE4.1, SSE4.2, AVX, AVX2 (max level on my platform), DETECT (with the -march=native compiler option). It is better to build these versions of the OpenCV configuration in different folders.

Then run the performance tests and build a report. In the second part of this report (with the -progress option) we can see:

- We would gain a performance improvement on dispatched:
  - SSE4.2 (popcount instruction, 1.7-1.9x speedup).
  - AVX for the norm function only (1.3x speedup over SSE4.2).
  - AVX2 (~3x total speedup, ~1.5x speedup after SSE4.2/AVX).
- Dispatching for SSE4.1 mode is useless.
- Dispatching for NORM_HAMMING2 will not increase speed, so we avoid it.
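For readers unfamiliar with the running example, the Hamming norm between two byte buffers is the number of differing bits. A minimal table-based sketch is shown below. It is an illustration only, not the code from stat.cpp: `popCountTableSketch` and `normHammingTable` are hypothetical names, and the 256-entry table plays the role of a bit-count lookup similar in spirit to popCountTable.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns a 256-entry table where entry i holds the
// number of set bits in the byte value i. Built once on first use.
const std::array<std::uint8_t, 256>& popCountTableSketch() {
    static const std::array<std::uint8_t, 256> table = [] {
        std::array<std::uint8_t, 256> t{};
        for (int i = 0; i < 256; ++i) {
            int bits = 0;
            for (int v = i; v != 0; v >>= 1) bits += v & 1;
            t[static_cast<std::size_t>(i)] = static_cast<std::uint8_t>(bits);
        }
        return t;
    }();
    return table;
}

// Hamming distance: XOR marks the differing bits of each byte pair and the
// table lookup counts them.
int normHammingTable(const std::uint8_t* a, const std::uint8_t* b,
                     std::size_t n) {
    const std::array<std::uint8_t, 256>& table = popCountTableSketch();
    int result = 0;
    for (std::size_t i = 0; i < n; ++i) {
        result += table[static_cast<std::size_t>(a[i] ^ b[i])];
    }
    return result;
}
```

A hardware popcount instruction replaces the table lookup with a single operation on wider words, which is where the SSE4.2 speedup mentioned in the report comes from.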
Some OpenCV functions contain multiple code paths specialized for different processor features / instruction sets. Selection of the executed code path is based on auto-detection of the available processor features.

Note: Build options described here don't control the behavior of CPU-based optimizations from Intel® Integrated Performance Primitives (Intel® IPP).

Build options allow you to specify the minimal and the dispatched sets of optimization features. These options are available since OpenCV 3.3 (released in Aug 2017).
There are many processor architectures: x86 / x86-64 family, ARMv7, aarch64, etc. Each architecture may support different additional instruction sets (SSE/AVX for x86, NEON for ARMv7). OpenCV's goal is to provide effective processor support, including separate optimized code paths for the newest instruction sets.