Modular numbers

This document describes the use and implementation of the modulars within the numerix package.

1.General modulars and moduli

Moduli are implemented in modulus.hpp. An object of type modulus is declared as follows:

modulus<C,V> p;

V is an implementation variant. The default variant, named modulus_naive, is implemented in modulus_naive.hpp and supports a type C with an Euclidean division. More variants are described below. Modulars are implemented in modular.hpp. An object of type modular can be used as follows:

modular<modulus<C, V>, W> x, y, z;
modular<modulus<C, V>, W>::set_modulus (p);
x = y * z;

The variant W is used to define the storage of the modulus. Default storage is modular_local that allows the current modulus to be changed at any time. The default modulus is 0. For the integer type the default variant is set to modulus_integer_naive, so that modular integers can be used as follows:

#include<numerix/integer.hpp>
#include<numerix/modular.hpp>
#include<numerix/modulus_integer_naive.hpp>

typedef modulus<integer> M;
M p (9973);
modular<M>::set_modulus (p);
modular<M> a (1), b (3);
c = a * b;
cout << c << "\n";

If the modulus is known to be fixed at compilation time then the following declaration is preferable for the sake of efficiency:

fixed_value <int, 3> w;
modulus<integer> p (w);

If the modulus is not intended to be changed and known at compilation time then the variant modular_fixed can be used.

2.Moduli for hardware integers

From now on a div p and a mod p represent the quotient and the remainder in the long division of a by p respectively. Throughout this section C denotes a genuine integer C type. We write n for the bit-size of C, and p for a modulus that fits in C.

2.1.Naive implementation

The default variant, modulus_int_naive<m>, is defined in modular_int.hpp. Each modular product performs one product and one integer division. The parameter m is the maximum bit-size of the modulus allowed by the variant. This parameter can be set to n. Best performances are obtained with m = n - 1. Note that we necessarily have m⩽n - 1 if C is a signed integer type. By convention the modulus 0 actually means 2m.

2.2.Moduli with pre-computed inverse

In the variant modulus_int_preinverse<m> a suitable inverse of the modulus is pre-computed and stored as an element of type C, this yields faster computations when doing several operations with the same modulus. The behavior of the parameter m is as in the preceding naive variant. The best performances are attained for when m⩽n - 2.

Within the implementation of this variant the following auxiliary quantities are needed:

By convention we let r = n if p is zero. Note that q<2n, hence fits in C.

2.2.1.Modular product

Let a be an integer such that 0 ≤a<p2. The remainder a mod p can be obtained as follows:

  1. Decompose a into a0 and a1 such that a = a12s + a0, with a0<2s, and compute b = a1q div 2t. From a<p2 it follows that a1q < p2 q / 2sp2t. So a1q has size at most r + t, and we have b<p.
  2. Compute c = a - b p. From the definition of q we have:

    0≤

    2s + t
    p
    - q < 1 ⟹0≤ a - a 1 q p /2 t < a 1 p /2 t + a 0 .

    It follows that 0≤c<p + a1p/2t + a0.

  3. Let h = 22r /2s + t + 2s/2r - 1. From a0<2s we deduce that a1p/2t + a0h p. It thus suffices to substract at most h times p from a to obtain a mod p.

In general we can always take t = r and s = r - 1, so that h = 3. For when r + 1≤n, it is better to take t = r + 1 and s = r - 1 so that h≤2. Finally, for when r + 2≤n one can take t = r + 3 and s = r - 2 so that h≤1. These settings are summarized in the following table:

r + 2≤n r + 1≤n rn
s r - 2 r - 1 r - 1
t r + 3 r + 1 r
h 1 2 3

In these three cases remark that the integer a1q has size at most 2n.

It is clear that the case r + 2≤n leads to the fastest algorithm since step 3 reduces to one substraction/comparison that can be implemented as min(c,c - p) within type C alone. A special attention must be drawn to when r = 1, that corresponds to p = 2: we must suppose that s = r - 2 = - 1 is indeed computed within an unsigned int type, hence is sufficiently large to ensure a1 = 0 and c = a, hence the correctness of the algorithm.

2.2.2.Numerical inverse

Let us now turn to the computation of the numerical inverse q of p. The strategy presented here consists in performing one long division in C followed by one Newton iteration. Let w = 2r - 1/p be the numerical inverse of p, normalized so that 1/2 ≤w<1 holds. Let x0 denote the initial approximation, and x1 be its Newton iterate x1 = 2x0 - p x

2
0
/2r - 1. If 0≤w - x0<2 - s0 for some positive integer s0 then it is classical that 0≤w - x1<2 - s1 with s1 = 2s0 - 1.

Under the assumption that s0n/2, we can compute a suitable x0 as the floor part of 2s0w by the only use of operations within base type C. In other words we calculate q0 = 2s0 + r - 1 div p = 2s0x0 as follows:

  1. If rs0 then q0 is obtained by means of one division in C.
  2. Otherwise r>s0. We decompose p into p = p12r - s0 + p0, with p0<2r - s0. Note that 2s0 - 1p1≤2s0. Within C perform the long division 22s0 - 1 = a p1 + b. By multiplying both side of the latter equality by 2r - s0 we obtain that 2s0 + r - 1 = a p - a p0 + b2r - s0. From a and b, we easily deduce q0 since b2r - s0<p and a p0<22s0 - 12r - s0/2s0 - 1 = 2r.

The computation of q1 = 2s1 + r - 1 div p, that is the floor part of 2s1w, then continues as follows:

  1. From the definition of x1, one can write: x12s11 = q02s0 - p q
    2
    0
    /2r. We thus compute q02s0 and the ceil part of p q
    2
    0
    /2r in order to deduce the floor part c of x12s1. Note that c fits C.
  2. From the above inequalities it follows that 0≤2s1w - c<2. We thus obtain d = 2s1 + r - 1 - c p < 2p. If dp then return q1 = c + 1 else return q1 = c.

Finally, q is deduced in this way:

  1. Set s0 as (s + t - (r - 1)) div 2, and compute q1 = 2s1 + r - 1 div p as just described.
  2. Note that 1≤s + t - (r - 1) - s1≤2. If s1+1=s + t - (r - 1) then q = 2s1 + r div p otherwise q = 2s1 + r + 1 div p.

2.2.3.Modular product by a constant

Let b be a modulo p integer that is to be involved in several modular products. Then one can compute β = 2rb div p < 2r just once and obtain a speed-up within the products by means of the follows algorithm:

  1. Compute α = aβ div 2r. It follows that 0≤a b - α p < 2p.
  2. Compute c = a b - α p. If cp then c=c+1. Return c.

The computation of β can be done as follows from q:

  1. Compute γ = q b div 2t + s - r<2r. Note that 2rb/p - 2rb/2t + s - 1<γ≤2rb/p.
  2. Since 0≤2rb - γp<g p, with g = 2 if r + 1≤n and g = 3 otherwise, deduce 2rb div p from γ.

3.Further remarks

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU General Public License. If you don't have this file, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.