Linux OpenCL benchmark results

6/19/2023

VkFFT - Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal Fast Fourier Transform library

VkFFT is an efficient GPU-accelerated multidimensional Fast Fourier Transform library for Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal projects. VkFFT aims to provide the community with an open-source alternative to Nvidia's cuFFT library while achieving better performance. VkFFT is written in C and supports Vulkan, CUDA, HIP, OpenCL, Level Zero and Metal as backends.

The white paper of VkFFT is out and can be cited by anyone who uses VkFFT.

Currently supported features:

- Size limits depend on the amount of shared memory on the device.
- Sequences using radix 3, 5, 7, 11 and 13 have comparable performance to that of powers of 2.
- Rader's FFT algorithm for primes from 17 up to max shared memory length (~10000). Inlined and done without additional memory transfers.
- Bluestein's FFT algorithm for all other sequences. Full coverage of C2C range, single upload (2^12, 2^12, 2^12) for R2C/C2R/R2R. Optimized to have as few memory transfers as possible by using zero padding and merged convolution support of VkFFT.
- Single, double and half precision support. Double precision uses CPU-generated LUT tables. Half precision still does all computations in single precision and only uses half precision to store data.
- All transformations are performed in-place with no performance loss. Out-of-place transforms are supported by selecting different input/output buffers.
- Note: Data can be reshuffled after the Four Step FFT algorithm with an additional buffer (for big sequences). This doesn't matter for convolutions - they return to the input ordering (saves memory).
- Complex to complex (C2C), real to complex (R2C), complex to real (C2R) transformations and real to real (R2R) Discrete Cosine Transformations of types I, II, III and IV. R2R, R2C and C2R are optimized to run up to 2x faster than C2C and take half the memory.
- 1x1, 2x2, 3x3 convolutions with symmetric or nonsymmetric kernel (no register overutilization).
- Native zero padding to model open systems (up to 2x faster than simply padding the input array with zeros). Can specify the range of sequences filled with zeros and the direction where zero padding is applied (read or write stage).
- WHDCN layout - data is stored in the following order (sorted by increasing stride): the width, the height, the depth, the coordinate (the number of feature maps), the batch number.
- Multiple feature/batch convolutions - one input, multiple kernels.
- Multiple input/output/temporary buffer split.
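Two of the points above, the WHDCN storage order and the choice between radix, Rader and Bluestein paths, can be made concrete with a small sketch in plain C. Everything here is hypothetical and for illustration only: `whdcn_dims`, `whdcn_offset`, `fft_path` and `classify_length` are not part of the VkFFT API, and the classification rule is an assumption based on the feature list, not the library's actual dispatch logic. The `max_rader_prime` parameter stands in for the shared-memory limit (roughly 10000 according to the list).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: extents of a WHDCN buffer. */
typedef struct {
    size_t W, H, D, C, N; /* width, height, depth, feature maps, batches */
} whdcn_dims;

/* Linear element offset in a WHDCN-ordered buffer: width has the smallest
   stride, then height, depth, coordinate (feature map), and batch. */
static size_t whdcn_offset(const whdcn_dims *s, size_t w, size_t h, size_t d,
                           size_t c, size_t n)
{
    return w + s->W * (h + s->H * (d + s->D * (c + s->C * n)));
}

typedef enum { FFT_RADIX, FFT_RADER, FFT_BLUESTEIN } fft_path;

/* Illustrative classification of a sequence length (an assumption, not the
   library's real logic):
   - only factors 2, 3, 5, 7, 11, 13        -> radix path
   - remaining prime factors all <= limit   -> Rader path
   - anything else                          -> Bluestein path
   Uses trial division, so it is only meant for modest lengths. */
static fft_path classify_length(uint64_t n, uint64_t max_rader_prime)
{
    static const uint64_t radices[] = {2, 3, 5, 7, 11, 13};
    if (n < 2) return FFT_RADIX; /* trivial lengths need no special path */

    for (size_t i = 0; i < sizeof radices / sizeof radices[0]; ++i)
        while (n % radices[i] == 0) n /= radices[i];
    if (n == 1) return FFT_RADIX;

    /* What is left has only prime factors >= 17; find the largest one. */
    uint64_t largest = 1;
    for (uint64_t p = 17; p * p <= n; p += 2)
        while (n % p == 0) { largest = p; n /= p; }
    if (n > 1) largest = n; /* leftover is a prime larger than any found so far */

    return largest <= max_rader_prime ? FFT_RADER : FFT_BLUESTEIN;
}
```

For example, with extents W=4, H=3, D=2, C=5, N=6, stepping one element along the height moves 4 elements in memory and stepping one batch moves 120. Under the assumed rule, a length of 4096 or 30030 stays on the radix path, 17 or 34 would use Rader, and a prime such as 10007 with a limit of 10000 would fall back to Bluestein.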
