Arithmetization in the polynomial ring
Generating arithmetic circuits that exploit batching and automorphisms
In a previous blog post, we presented methods for performing depth aware arithmetization in prime fields \(\mathbb{F}_p\). In other words, we showed how to realize high-level circuits in \(\mathbb{F}_p\), which only defines additions and multiplications. What made these methods depth-aware, is that we generated circuits that achieve a trade off between their multiplicative depth (the largest number of multiplications on any path through the circuit) and the total number of multiplications. These methods allowed us to generate more efficient circuits for e.g. comparisons (\(a < b\)) than previous works. However, these methods leave several things to be desired: they do not consider batching and automorphisms. In this blog post, we will show how to consider automorphisms by extending beyond prime fields, and how to further optimize circuits in the context of homomorphic encryption by exploiting batching in the polynomial ring.
Arithmetization in \(\mathbb{F}_{p^d}\): Exploiting automorphisms
In the finite fields \(\mathbb{F}p\) that we previously considered, all the useful operations are described by additions and multiplications. In those prime fields, the only automorphism is the identity map. However, when we consider non-prime fields \(\mathbb{F}{p^d}\), there are several useful automorphisms of the form \(x \mapsto x^{p^k}\), where \(0 < k < d\). When \(k=1\), we call this the Frobenius automorphism. As such, where arithmetic circuits over \(\mathbb{F}_p\) were only comprised of additions and multiplications, we now consider additions, multiplications, and automorphisms.
While these automorphisms may not seem all that useful by themselves, they can be used to efficiently compute the field norm and trace. These operations, in turn, can be used to (more) efficiently evaluate polynomials, as shown by Hiroki Okada, Rachel Player, and Simon Pohmann in this article, or to speed up matrix multiplications, as shown by Jai Hyun Park in this other article. In this blog post, we will focus on univariate polynomial evaluation, which allows us to compute comparison operations even more efficiently than before!
So how does this work? The trick is that there is a relation between the minimal polynomial of an element \(\alpha \in \mathbb{F}_{p^d}\) and its field norm: \(\operatorname{Norm}(\alpha - x) = \operatorname{MiPo}(\alpha)(x),\) where \(\operatorname{MiPo}(\alpha)\) is the monic (univariate) polynomial of the lowest degree that maps \(\alpha\) to zero. Because no lower degree polynomials exist that also map \(\alpha\) to zero, the minimal polynomial is irreducible. This relation allows us to evaluate any irreducible polynomial over \(\mathbb{F}_p\) by finding a suitable \(\alpha\) and computing the field norm of \(\alpha - x\). Note that the polynomial’s degree must be lower than \(D\)!
So how do we find such a suitable \(\alpha \in \mathbb{F}{p^d}\)? To evaluate an irreducible monic polynomial with coefficients in \( \mathbb{F}_p \), we want to find its root in the polynomial ring \(\mathbb{F}{p^d}[X]\). Our current approach is to factor the polynomial in this polynomial ring. For larger polynomials, this can be a costly process, but it only has to be computed once (\(\alpha \) does not depend on the inputs of the polynomial evaluation).
The reason that this approach leads to more efficient evaluations, is that whereas previous methods required \( O(\sqrt{D}) \) multiplications to compute a degree-\(D\) polynomial, the norm can be computed in only \( 2 \log_2(D) \) multiplications and automorphisms. As such, it leads to significantly more efficient circuits when evaluating large polynomials.
How can we evaluate reducible polynomials? Just like Hiroki Okada, Rachel Player, and Simon Pohmann, we add an easily computable term to the polynomial that makes it irreducible, and we remove this term after the polynomial evaluation. In our experiments, it has always been sufficient to add a simple constant to the polynomial.
Arithmetization in \( \mathbb{Z}_p/(X^N + 1) \): Exploiting batching
Note: we only consider \(N\) that are powers of two!
Batching (or the SIMD encoding) is a ubiquitous tool in fully homomorphic encryption schemes such as BFV and BGV because it allows one to pack multiple elements into one ciphertext and perform the same additions and multiplications on all of them. The oraqle compiler so far has not considered batching at all, so the only way to exploit batching was to evaluate the circuit many times in parallel. However, batching can also be used when we only want to compute one evaluation of an arithmetic circuit.
Batching relies on the fact that the quotient polynomial \(X^N + 1\) factors into multiple smaller equal-degree irreducible polynomials modulo \(p\). These sub-polynomials allow us to encode elements into separate slots, which are isomorphic to the finite fields we arithmetized over in the previous section. Previous work has shown how to achieve a grid layout for these slots, allowing rotating dimensions of the grid using automorphisms.
In the compiler, we introduce new nodes in the style of functional programming. For now, we are implementing maps, reduces, and splicing operations. When the sequences they work on are small enough, we can map them onto the grid layout mentioned above. The compiler tries many possible assignments to achieve a trade off between multiplications and automorphism operations (for rotations).
Polynomial evaluation in $\mathbb{Z}_p/(X^N + 1)$
In our article on depth-aware arithmetization, we proposed a simple divide-and-conquer technique for evaluating large-degree polynomials that often produces lower-depth solutions than previous work such as the baby-step giant-step method and the Paterson-Stockmeyer method. We can also use this technique to evaluate polynomials of larger degrees by splitting it up into multiple smaller univariate polynomial evaluations. This, in turn, allows us to use the efficient automorphism-based approach described above. We take the following steps:
- We split the large-degree polynomial up into evaluations of degree-\(D\) polynomials
- We use the Map node to evaluate all degree-\(D\) polynomials in parallel using automorphisms
- We use the Reduce node to combine the results, multiplying the appropriate terms with powers of \(x\)
By using the functional programming nodes defined above, the compiler can find the best way to pack these evaluations into multiple ciphertexts.
We are still implementing these techniques in the oraqle compiler compiler, and we expect to take a month or two to release the next version. Behind the scenes, there a lot of changes! For example, we now require supporting multiple plaintext algebras and the functional programming nodes add a completely new optimization step. We are also still experimenting with some heuristics for merging multiple functional programming nodes automatically.