Efficient, Flexible, and Constant-Time Gaussian Sampling Hardware for Lattice Cryptography


Karabulut E., Alkim E., Aysu A.

IEEE TRANSACTIONS ON COMPUTERS, cilt.71, sa.8, ss.1810-1823, 2021 (SCI-Expanded) identifier

  • Yayın Türü: Makale / Tam Makale
  • Cilt numarası: 71 Sayı: 8
  • Basım Tarihi: 2021
  • Doi Numarası: 10.1109/tc.2021.3107729
  • Dergi Adı: IEEE TRANSACTIONS ON COMPUTERS
  • Derginin Tarandığı İndeksler: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Aerospace Database, Applied Science & Technology Source, Business Source Elite, Business Source Premier, Communication Abstracts, Compendex, Computer & Applied Sciences, INSPEC, Metadex, zbMATH, Civil Engineering Abstracts
  • Sayfa Sayıları: ss.1810-1823
  • Anahtar Kelimeler: Hardware, Cryptography, Gaussian distribution, Standards, Timing, Optimization, Encryption, Discrete gaussian sampling, lattice cryptography, FPGA
  • Dokuz Eylül Üniversitesi Adresli: Evet

Özet

This paper proposes a discrete Gaussian sampling hardware design that can flexibly support different sampling parameters, that is more efficient (in area-delay product) compared to the majority of earlier proposals, and that has constant execution time. The proposed design implements a Cumulative Distribution Table (CDT) approach, reduces the table size with Gaussian convolutions, and adopts an innovative fusion tree search algorithm to achieve a compact and fast sampling technique-to our best knowledge, this is the first hardware implementation of fusion tree search algorithm. The proposed hardware can support all the discrete Gaussian distributions used in post-quantum digital signatures and key encapsulation algorithms (FALCON, qTESLA, and FrodoKEM), the homomorphic encryption library of SEAL, and other algorithms such BLISS digital signature and LP public-key encryption. Our proposed hardware can be configured at design-time to optimize a single configuration or at run-time to support multiple Gaussian distribution parameters. Our design, furthermore, has constant-time behavior by design, eliminating timing side-channel attacks-this is achieved by reading all table contents at the same time to also reduce the latency. The results on a Xilinx Virtex-7 FPGA show that our solution can outperform all prior proposals in area-delay product by 1.67-235.88x, only falling short to those designed for the LP encryption scheme.