vkfft 1.3.4+ds2-1 source package in Ubuntu


vkfft (1.3.4+ds2-1) unstable; urgency=medium

  * New upstream version 1.3.4+ds2

 -- Picca Frédéric-Emmanuel <email address hidden>  Fri, 16 Feb 2024 14:38:24 +0100

Upload details

Uploaded by:
Debian PaN Maintainers
Uploaded to:
Original maintainer:
Debian PaN Maintainers
Medium Urgency

See full publishing history Publishing

Series Pocket Published Component Section
Noble release universe misc


Noble: [FULLYBUILT] amd64


File Size SHA-256 Checksum
vkfft_1.3.4+ds2-1.dsc 2.2 KiB 94a36757e0ca07706ac7fc96437bfb20051728b66e12fb4b089427fd5fb539d4
vkfft_1.3.4+ds2.orig.tar.xz 5.5 MiB 6417a0c59e83942ba33f758ff02bf54208adf93b7719195f73981a643b6cbbd0
vkfft_1.3.4+ds2-1.debian.tar.xz 5.3 KiB d34ca1106e052f2898a58c0d0f339ab173ea2d8990a5e62873aab2198e5744d7

Available diffs

No changes file available.

Binary packages built by this source

libvkfft-dev: Vulkan/CUDA/HIP/OpenCL Fast Fourier Transform library

 VkFFT is an efficient GPU-accelerated multidimensional Fast Fourier
 Transform library for Vulkan/CUDA/HIP/OpenCL projects. VkFFT aims to
 provide the community with an open-source alternative to Nvidia's
 cuFFT library while achieving better performance. VkFFT is written in
 C language and supports Vulkan, CUDA, HIP and OpenCL as backends.
  + 1D/2D/3D systems
  + Forward and inverse directions of FFT
  + Support for big FFT dimension sizes. Current limits: C2C or even
 C2R/R2C - (2^32, 2^32, 2^32). Odd C2R/R2C - (2^12, 2^32, 2^32). R2R -
 (2^12, 2^12, 2^12). Depends on the amount of shared memory on the
 device. (will be increased later).
  + Radix-2/3/4/5/7/8/11/13 FFT. Sequences using radix 3, 5, 7, 11 and
 13 have comparable performance to that of powers of 2.
  + Bluestein's FFT algorithm for all other sequences. Full coverage
 of C2C range, single upload (2^12, 2^12, 2^12) for
 R2C/C2R/R2R. Optimized to have as few memory transfers as possible by
 using zero padding and merged convolution support of VkFFT
  + Single, double and half precision support. Double precision uses
 CPU-generated LUT tables. Half precision still does all computations
 in single and only uses half precision to store data.
  + All transformations are performed in-place with no performance
 loss. Out-of-place transforms are supported by selecting different
 input/output buffers.
  + No additional transposition uploads. Note: Data can be reshuffled
 after the Four Step FFT algorithm with an additional buffer (for big
 sequences). Doesn't matter for convolutions - they return to the
 input ordering (saves memory).
  + Complex to complex (C2C), real to complex (R2C), complex to real
 (C2R) transformations and real to real (R2R) Discrete Cosine
 Transformations of types I, II, III and IV. R2R, R2C and C2R are
 optimized to run up to 2x times faster than C2C and take 2x less
  + 1x1, 2x2, 3x3 convolutions with symmetric or nonsymmetric kernel
 (no register overutilization)
  + Native zero padding to model open systems (up to 2x faster than
 simply padding input array with zeros). Can specify the range of
 sequences filled with zeros and the direction where zero padding is
 applied (read or write stage)
  + WHDCN layout - data is stored in the following order (sorted by
 increase in strides): the width, the height, the depth, the
 coordinate (the number of feature maps), the batch number
  + Multiple feature/batch convolutions - one input, multiple kernels
  + Multiple input/output/temporary buffer split. Allows using data
 split between different memory allocations and mitigates 4GB single
 allocation limit.
  + Works on Nvidia, AMD and Intel GPUs. And Raspberry Pi 4 GPU.
  + Works on Windows, Linux and macOS
  + VkFFT supports Vulkan, CUDA, HIP and OpenCL as backend to cover
 wide range of APIs
  + Header-only library with Vulkan interface, which allows appending
 VkFFT directly to user's command buffer. Kernels are compiled at