IEEE TRANSACTIONS ON COMPUTERS, cilt.71, sa.8, ss.1810-1823, 2021 (SCI-Expanded)
This paper proposes a discrete Gaussian sampling hardware design that can flexibly support different sampling parameters, that is more efficient (in area-delay product) compared to the majority of earlier proposals, and that has constant execution time. The proposed design implements a Cumulative Distribution Table (CDT) approach, reduces the table size with Gaussian convolutions, and adopts an innovative fusion tree search algorithm to achieve a compact and fast sampling technique-to our best knowledge, this is the first hardware implementation of fusion tree search algorithm. The proposed hardware can support all the discrete Gaussian distributions used in post-quantum digital signatures and key encapsulation algorithms (FALCON, qTESLA, and FrodoKEM), the homomorphic encryption library of SEAL, and other algorithms such BLISS digital signature and LP public-key encryption. Our proposed hardware can be configured at design-time to optimize a single configuration or at run-time to support multiple Gaussian distribution parameters. Our design, furthermore, has constant-time behavior by design, eliminating timing side-channel attacks-this is achieved by reading all table contents at the same time to also reduce the latency. The results on a Xilinx Virtex-7 FPGA show that our solution can outperform all prior proposals in area-delay product by 1.67-235.88x, only falling short to those designed for the LP encryption scheme.