Crypto++  8.7
Free C++ class library of cryptographic schemes
integer.cpp
1 // integer.cpp - originally written and placed in the public domain by Wei Dai
2 // contains public domain code contributed by Alister Lee and Leonard Janke
3 
4 // Notes by JW: The Integer class needs to do two things. First, it needs
5 // to set function pointers on some platforms, like X86 and X64. The
6 // function pointers select a fast multiply and addition based on the cpu.
7 // Second, it wants to create Integer::Zero(), Integer::One() and
8 // Integer::Two().
9 // The function pointers are initialized in the InitializeInteger class by
10 // calling SetFunctionPointers(). The call to SetFunctionPointers() is
11 // guarded to run once using a double-checked pattern. We don't use C++
12 // std::call_once due to bad interactions between libstdc++, glibc and
13 // pthreads. The bad interactions were causing crashes for us on platforms
14 // like Sparc and PowerPC. Since we are only setting function pointers we
15 // don't have to worry about leaking memory. The worst case seems to be the
16 // pointers gets written twice until the init flag is set and visible to
17 // all threads.
18 // For Integer::Zero(), Integer::One() and Integer::Two(), we use one of three
19 // strategies. First, if initialization priorities are available then we use
20 // them. Initialization priorities are init_priority() on Linux and init_seg()
21 // on Windows. OS X and several other platforms lack them. Initialization
22 // priorities are platform specific but they are also the most trouble free
23 // with deterministic destruction.
24 // Second, if C++11 dynamic initialization is available, then we use it. After
25 // the std::call_once fiasco we moved to dynamic initialization to avoid
26 // unknown troubles platforms that are tested less frequently. In addition
27 // Microsoft platforms mostly do not provide dynamic initialization.
28 // The MSDN docs claim they do but they don't in practice because we need
29 // Visual Studio 2017 and Windows 10 or above.
30 // Third, we fall back to Wei's original code of a Singleton. Wei's original
31 // code was much simpler. It simply used the Singleton pattern, but it always
32 // produced memory findings on some platforms. The Singleton generates memory
33 // findings because it uses a Create On First Use pattern (a dumb Nifty
34 // Counter) and the compiler had to be smart enough to fold them to return
35 // the same object. Unix and Linux compilers do a good job of folding objects,
36 // but Microsoft compilers do a rather poor job for some versions of the
37 // compiler.
38 // Another problem with the Singleton is resource destruction requires running
39 // resource acquisition in reverse. For resources provided through the
40 // Singletons, there is no way to express the dependency order to safely
41 // destroy resources. (That's one of the problems C++11 dynamic
42 // initialization with concurrent execution is supposed to solve).
43 // The final problem with Singletons is resource/memory exhaustion in languages
44 // like Java and .Net. Java and .Net load and unload a native DLL hundreds or
45 // thousands of times during the life of a program. Each load produces a
46 // memory leak and they can add up quickly. If they library is being used in
47 // Java or .Net then Singleton must be avoided at all costs.
48 //
49 // The code below has a path cut-in for BMI2 using mulx and adcx instructions.
50 // There was a modest speedup of approximately 0.03 ms in public key Integer
51 // operations. We had to disable BMI2 for the moment because some OS X machines
52 // were advertising BMI/BMI2 support but caused SIGILL's at runtime. Also see
53 // https://github.com/weidai11/cryptopp/issues/850.
54 
55 #include "pch.h"
56 #include "config.h"
57 
58 #ifndef CRYPTOPP_IMPORTS
59 
60 #include "integer.h"
61 #include "secblock.h"
62 #include "modarith.h"
63 #include "nbtheory.h"
64 #include "smartptr.h"
65 #include "algparam.h"
66 #include "filters.h"
67 #include "stdcpp.h"
68 #include "asn.h"
69 #include "oids.h"
70 #include "words.h"
71 #include "pubkey.h" // for P1363_KDF2
72 #include "sha.h"
73 #include "cpu.h"
74 #include "misc.h"
75 
76 #include <iostream>
77 
78 #if (_MSC_VER >= 1400) && !defined(_M_ARM)
79  #include <intrin.h>
80 #endif
81 
82 #ifdef __DECCXX
83  #include <c_asm.h>
84 #endif
85 
86 // "Error: The operand ___LKDB cannot be assigned to",
87 // http://github.com/weidai11/cryptopp/issues/188
88 #if (__SUNPRO_CC >= 0x5130)
89 # define MAYBE_CONST
90 # define MAYBE_UNCONST_CAST(x) const_cast<word*>(x)
91 #else
92 # define MAYBE_CONST const
93 # define MAYBE_UNCONST_CAST(x) x
94 #endif
95 
96 // "Inline assembly operands don't work with .intel_syntax",
97 // http://llvm.org/bugs/show_bug.cgi?id=24232
98 #if CRYPTOPP_BOOL_X32 || defined(CRYPTOPP_DISABLE_MIXED_ASM)
99 # undef CRYPTOPP_X86_ASM_AVAILABLE
100 # undef CRYPTOPP_X32_ASM_AVAILABLE
101 # undef CRYPTOPP_X64_ASM_AVAILABLE
102 # undef CRYPTOPP_SSE2_ASM_AVAILABLE
103 # undef CRYPTOPP_SSSE3_ASM_AVAILABLE
104 #else
105 # define CRYPTOPP_INTEGER_SSE2 (CRYPTOPP_SSE2_ASM_AVAILABLE && (CRYPTOPP_BOOL_X86))
106 #endif
107 
108 // ***************** C++ Static Initialization ********************
109 
110 NAMESPACE_BEGIN(CryptoPP)
111 
112 // Function body near the middle of the file
113 static void SetFunctionPointers();
114 
115 // Use a double-checked pattern. We are not leaking anything so it
116 // does not matter if a pointer is written twice during a race.
117 // Avoid std::call_once due to too many problems on platforms like
118 // Solaris and Sparc. Also see
119 // http://gcc.gnu.org/bugzilla/show_bug.cgi?id=66146 and
120 // http://github.com/weidai11/cryptopp/issues/707.
121 InitializeInteger::InitializeInteger()
122 {
123  static bool s_flag;
124  MEMORY_BARRIER();
125  if (s_flag == false)
126  {
127  SetFunctionPointers();
128  s_flag = true;
129  MEMORY_BARRIER();
130  }
131 }
132 
133 template <long i>
134 struct NewInteger
135 {
136  Integer * operator()() const
137  {
138  return new Integer(i);
139  }
140 };
141 
142 // ***************** Library code ********************
143 
144 inline static int Compare(const word *A, const word *B, size_t N)
145 {
146  while (N--)
147  if (A[N] > B[N])
148  return 1;
149  else if (A[N] < B[N])
150  return -1;
151 
152  return 0;
153 }
154 
155 inline static int Increment(word *A, size_t N, word B=1)
156 {
157  CRYPTOPP_ASSERT(N);
158  word t = A[0];
159  A[0] = t+B;
160  if (A[0] >= t)
161  return 0;
162  for (unsigned i=1; i<N; i++)
163  if (++A[i])
164  return 0;
165  return 1;
166 }
167 
168 inline static int Decrement(word *A, size_t N, word B=1)
169 {
170  CRYPTOPP_ASSERT(N);
171  word t = A[0];
172  A[0] = t-B;
173  if (A[0] <= t)
174  return 0;
175  for (unsigned i=1; i<N; i++)
176  if (A[i]--)
177  return 0;
178  return 1;
179 }
180 
181 static void TwosComplement(word *A, size_t N)
182 {
183  Decrement(A, N);
184  for (unsigned i=0; i<N; i++)
185  A[i] = ~A[i];
186 }
187 
188 static word AtomicInverseModPower2(word A)
189 {
190  CRYPTOPP_ASSERT(A%2==1);
191 
192  word R=A%8;
193 
194  for (unsigned i=3; i<WORD_BITS; i*=2)
195  R = R*(2-R*A);
196 
197  CRYPTOPP_ASSERT(R*A==1);
198  return R;
199 }
200 
201 // ********************************************************
202 
203 #if !defined(CRYPTOPP_NATIVE_DWORD_AVAILABLE) || ((defined(__aarch64__) || defined(__x86_64__)) && defined(CRYPTOPP_WORD128_AVAILABLE))
204  #define TWO_64_BIT_WORDS 1
205  #define Declare2Words(x) word x##0, x##1;
206  #define AssignWord(a, b) a##0 = b; a##1 = 0;
207  #define Add2WordsBy1(a, b, c) a##0 = b##0 + c; a##1 = b##1 + (a##0 < c);
208  #define LowWord(a) a##0
209  #define HighWord(a) a##1
210  #ifdef _MSC_VER
211  #define MultiplyWordsLoHi(p0, p1, a, b) p0 = _umul128(a, b, &p1);
212  #ifndef __INTEL_COMPILER
213  #define Double3Words(c, d) d##1 = __shiftleft128(d##0, d##1, 1); d##0 = __shiftleft128(c, d##0, 1); c *= 2;
214  #endif
215  #elif defined(__aarch32__) || defined(__aarch64__)
216  #define MultiplyWordsLoHi(p0, p1, a, b) p0 = a*b; asm ("umulh %0,%1,%2" : "=r"(p1) : "r"(a), "r"(b));
217  #elif defined(__DECCXX)
218  #define MultiplyWordsLoHi(p0, p1, a, b) p0 = a*b; p1 = asm("umulh %a0, %a1, %v0", a, b);
219  #elif defined(__x86_64__)
220  #if defined(__SUNPRO_CC) && __SUNPRO_CC < 0x5100
221  // Sun Studio's gcc-style inline assembly is heavily bugged as of version 5.9 Patch 124864-09 2008/12/16, but this one works
222  #define MultiplyWordsLoHi(p0, p1, a, b) asm ("mulq %3" : "=a"(p0), "=d"(p1) : "a"(a), "r"(b) : "cc");
223  #elif defined(__BMI2__) && 0
224  #define MultiplyWordsLoHi(p0, p1, a, b) asm ("mulxq %3, %0, %1" : "=r"(p0), "=r"(p1) : "d"(a), "r"(b));
225  #define MulAcc(c, d, a, b) asm ("mulxq %6, %3, %4; addq %3, %0; adcxq %4, %1; adcxq %7, %2;" : "+&r"(c), "+&r"(d##0), "+&r"(d##1), "=&r"(p0), "=&r"(p1) : "d"(a), "r"(b), "r"(W64LIT(0)) : "cc");
226  #define Double3Words(c, d) asm ("addq %0, %0; adcxq %1, %1; adcxq %2, %2;" : "+r"(c), "+r"(d##0), "+r"(d##1) : : "cc");
227  #define Acc2WordsBy1(a, b) asm ("addq %2, %0; adcxq %3, %1;" : "+&r"(a##0), "+r"(a##1) : "r"(b), "r"(W64LIT(0)) : "cc");
228  #define Acc2WordsBy2(a, b) asm ("addq %2, %0; adcxq %3, %1;" : "+r"(a##0), "+r"(a##1) : "r"(b##0), "r"(b##1) : "cc");
229  #define Acc3WordsBy2(c, d, e) asm ("addq %5, %0; adcxq %6, %1; adcxq %7, %2;" : "+r"(c), "=&r"(e##0), "=&r"(e##1) : "1"(d##0), "2"(d##1), "r"(e##0), "r"(e##1), "r"(W64LIT(0)) : "cc");
230  #else
231  #define MultiplyWordsLoHi(p0, p1, a, b) asm ("mulq %3" : "=a"(p0), "=d"(p1) : "a"(a), "g"(b) : "cc");
232  #define MulAcc(c, d, a, b) asm ("mulq %6; addq %3, %0; adcq %4, %1; adcq $0, %2;" : "+r"(c), "+r"(d##0), "+r"(d##1), "=a"(p0), "=d"(p1) : "a"(a), "g"(b) : "cc");
233  #define Double3Words(c, d) asm ("addq %0, %0; adcq %1, %1; adcq %2, %2;" : "+r"(c), "+r"(d##0), "+r"(d##1) : : "cc");
234  #define Acc2WordsBy1(a, b) asm ("addq %2, %0; adcq $0, %1;" : "+r"(a##0), "+r"(a##1) : "r"(b) : "cc");
235  #define Acc2WordsBy2(a, b) asm ("addq %2, %0; adcq %3, %1;" : "+r"(a##0), "+r"(a##1) : "r"(b##0), "r"(b##1) : "cc");
236  #define Acc3WordsBy2(c, d, e) asm ("addq %5, %0; adcq %6, %1; adcq $0, %2;" : "+r"(c), "=r"(e##0), "=r"(e##1) : "1"(d##0), "2"(d##1), "r"(e##0), "r"(e##1) : "cc");
237  #endif
238  #endif
239  #define MultiplyWords(p, a, b) MultiplyWordsLoHi(p##0, p##1, a, b)
240  #ifndef Double3Words
241  #define Double3Words(c, d) d##1 = 2*d##1 + (d##0>>(WORD_BITS-1)); d##0 = 2*d##0 + (c>>(WORD_BITS-1)); c *= 2;
242  #endif
243  #ifndef Acc2WordsBy2
244  #define Acc2WordsBy2(a, b) a##0 += b##0; a##1 += a##0 < b##0; a##1 += b##1;
245  #endif
246  #define AddWithCarry(u, a, b) {word t = a+b; u##0 = t + u##1; u##1 = (t<a) + (u##0<t);}
247  #define SubtractWithBorrow(u, a, b) {word t = a-b; u##0 = t - u##1; u##1 = (t>a) + (u##0>t);}
248  #define GetCarry(u) u##1
249  #define GetBorrow(u) u##1
250 #else
251  #define Declare2Words(x) dword x;
252  #if _MSC_VER >= 1400 && !defined(__INTEL_COMPILER) && (defined(_M_IX86) || defined(_M_X64) || defined(_M_IA64))
253  #define MultiplyWords(p, a, b) p = __emulu(a, b);
254  #else
255  #define MultiplyWords(p, a, b) p = (dword)a*b;
256  #endif
257  #define AssignWord(a, b) a = b;
258  #define Add2WordsBy1(a, b, c) a = b + c;
259  #define Acc2WordsBy2(a, b) a += b;
260  #define LowWord(a) word(a)
261  #define HighWord(a) word(a>>WORD_BITS)
262  #define Double3Words(c, d) d = 2*d + (c>>(WORD_BITS-1)); c *= 2;
263  #define AddWithCarry(u, a, b) u = dword(a) + b + GetCarry(u);
264  #define SubtractWithBorrow(u, a, b) u = dword(a) - b - GetBorrow(u);
265  #define GetCarry(u) HighWord(u)
266  #define GetBorrow(u) word(u>>(WORD_BITS*2-1))
267 #endif
268 #ifndef MulAcc
269  #define MulAcc(c, d, a, b) MultiplyWords(p, a, b); Acc2WordsBy1(p, c); c = LowWord(p); Acc2WordsBy1(d, HighWord(p));
270 #endif
271 #ifndef Acc2WordsBy1
272  #define Acc2WordsBy1(a, b) Add2WordsBy1(a, a, b)
273 #endif
274 #ifndef Acc3WordsBy2
275  #define Acc3WordsBy2(c, d, e) Acc2WordsBy1(e, c); c = LowWord(e); Add2WordsBy1(e, d, HighWord(e));
276 #endif
277 
278 class DWord
279 {
280 public:
281 #if defined(CRYPTOPP_NATIVE_DWORD_AVAILABLE)
282  DWord() {std::memset(&m_whole, 0x00, sizeof(m_whole));}
283 #else
284  DWord() {std::memset(&m_halfs, 0x00, sizeof(m_halfs));}
285 #endif
286 
287 #ifdef CRYPTOPP_NATIVE_DWORD_AVAILABLE
288  explicit DWord(word low) : m_whole(low) { }
289 #else
290  explicit DWord(word low)
291  {
292  m_halfs.high = 0;
293  m_halfs.low = low;
294  }
295 #endif
296 
297 #if defined(CRYPTOPP_NATIVE_DWORD_AVAILABLE)
298  DWord(word low, word high) : m_whole()
299 #else
300  DWord(word low, word high) : m_halfs()
301 #endif
302  {
303 #if defined(CRYPTOPP_NATIVE_DWORD_AVAILABLE)
304 # if (CRYPTOPP_LITTLE_ENDIAN)
305  const word t[2] = {low,high};
306  std::memcpy(&m_whole, t, sizeof(m_whole));
307 # else
308  const word t[2] = {high,low};
309  std::memcpy(&m_whole, t, sizeof(m_whole));
310 # endif
311 #else
312  m_halfs.low = low;
313  m_halfs.high = high;
314 #endif
315  }
316 
317  static DWord Multiply(word a, word b)
318  {
319  DWord r;
320  #ifdef CRYPTOPP_NATIVE_DWORD_AVAILABLE
321  r.m_whole = (dword)a * b;
322  #elif defined(MultiplyWordsLoHi)
323  MultiplyWordsLoHi(r.m_halfs.low, r.m_halfs.high, a, b);
324  #else
325  CRYPTOPP_ASSERT(0);
326  #endif
327  return r;
328  }
329 
330  static DWord MultiplyAndAdd(word a, word b, word c)
331  {
332  DWord r = Multiply(a, b);
333  return r += c;
334  }
335 
336  DWord & operator+=(word a)
337  {
338  #ifdef CRYPTOPP_NATIVE_DWORD_AVAILABLE
339  m_whole = m_whole + a;
340  #else
341  m_halfs.low += a;
342  m_halfs.high += (m_halfs.low < a);
343  #endif
344  return *this;
345  }
346 
347  DWord operator+(word a)
348  {
349  DWord r;
350  #ifdef CRYPTOPP_NATIVE_DWORD_AVAILABLE
351  r.m_whole = m_whole + a;
352  #else
353  r.m_halfs.low = m_halfs.low + a;
354  r.m_halfs.high = m_halfs.high + (r.m_halfs.low < a);
355  #endif
356  return r;
357  }
358 
359  DWord operator-(DWord a)
360  {
361  DWord r;
362  #ifdef CRYPTOPP_NATIVE_DWORD_AVAILABLE
363  r.m_whole = m_whole - a.m_whole;
364  #else
365  r.m_halfs.low = m_halfs.low - a.m_halfs.low;
366  r.m_halfs.high = m_halfs.high - a.m_halfs.high - (r.m_halfs.low > m_halfs.low);
367  #endif
368  return r;
369  }
370 
371  DWord operator-(word a)
372  {
373  DWord r;
374  #ifdef CRYPTOPP_NATIVE_DWORD_AVAILABLE
375  r.m_whole = m_whole - a;
376  #else
377  r.m_halfs.low = m_halfs.low - a;
378  r.m_halfs.high = m_halfs.high - (r.m_halfs.low > m_halfs.low);
379  #endif
380  return r;
381  }
382 
383  // returns quotient, which must fit in a word
384  word operator/(word divisor);
385 
386  word operator%(word a);
387 
388  bool operator!() const
389  {
390  #ifdef CRYPTOPP_NATIVE_DWORD_AVAILABLE
391  return !m_whole;
392  #else
393  return !m_halfs.high && !m_halfs.low;
394  #endif
395  }
396 
397  // TODO: When NATIVE_DWORD is in effect, we access high and low, which are inactive
398  // union members, and that's UB. Also see http://stackoverflow.com/q/11373203.
399  word GetLowHalf() const {return m_halfs.low;}
400  word GetHighHalf() const {return m_halfs.high;}
401  word GetHighHalfAsBorrow() const {return 0-m_halfs.high;}
402 
403 private:
404  // Issue 274, "Types cannot be declared in anonymous union",
405  // http://github.com/weidai11/cryptopp/issues/274
406  // Thanks to Martin Bonner at http://stackoverflow.com/a/39507183
407  struct half_words
408  {
409  #if (CRYPTOPP_LITTLE_ENDIAN)
410  word low;
411  word high;
412  #else
413  word high;
414  word low;
415  #endif
416  };
417  union
418  {
419  #ifdef CRYPTOPP_NATIVE_DWORD_AVAILABLE
420  dword m_whole;
421  #endif
422  half_words m_halfs;
423  };
424 };
425 
426 class Word
427 {
428 public:
429  Word() : m_whole(0) {}
430  Word(word value) : m_whole(value) {}
431  Word(hword low, hword high) : m_whole(low | (word(high) << (WORD_BITS/2))) {}
432 
433  static Word Multiply(hword a, hword b)
434  {
435  Word r;
436  r.m_whole = (word)a * b;
437  return r;
438  }
439 
440  Word operator-(Word a)
441  {
442  Word r;
443  r.m_whole = m_whole - a.m_whole;
444  return r;
445  }
446 
447  Word operator-(hword a)
448  {
449  Word r;
450  r.m_whole = m_whole - a;
451  return r;
452  }
453 
454  // returns quotient, which must fit in a word
455  hword operator/(hword divisor)
456  {
457  return hword(m_whole / divisor);
458  }
459 
460  bool operator!() const
461  {
462  return !m_whole;
463  }
464 
465  word GetWhole() const {return m_whole;}
466  hword GetLowHalf() const {return hword(m_whole);}
467  hword GetHighHalf() const {return hword(m_whole>>(WORD_BITS/2));}
468  hword GetHighHalfAsBorrow() const {return 0-hword(m_whole>>(WORD_BITS/2));}
469 
470 private:
471  word m_whole;
472 };
473 
474 // do a 3 word by 2 word divide, returns quotient and leaves remainder in A
475 template <class S, class D>
476 S DivideThreeWordsByTwo(S *A, S B0, S B1, D *dummy=NULLPTR)
477 {
478  CRYPTOPP_UNUSED(dummy);
479 
480  // Assert {A[2],A[1]} < {B1,B0}, so quotient can fit in a S
481  CRYPTOPP_ASSERT(A[2] < B1 || (A[2]==B1 && A[1] < B0));
482 
483  // Profiling guided the flow below.
484 
485  // estimate the quotient: do a 2 S by 1 S divide.
486  S Q; bool pre = (S(B1+1) == 0);
487  if (B1 > 0 && !pre)
488  Q = D(A[1], A[2]) / S(B1+1);
489  else if (pre)
490  Q = A[2];
491  else
492  Q = D(A[0], A[1]) / B0;
493 
494  // now subtract Q*B from A
495  D p = D::Multiply(B0, Q);
496  D u = (D) A[0] - p.GetLowHalf();
497  A[0] = u.GetLowHalf();
498  u = (D) A[1] - p.GetHighHalf() - u.GetHighHalfAsBorrow() - D::Multiply(B1, Q);
499  A[1] = u.GetLowHalf();
500  A[2] += u.GetHighHalf();
501 
502  // Q <= actual quotient, so fix it
503  while (A[2] || A[1] > B1 || (A[1]==B1 && A[0]>=B0))
504  {
505  u = (D) A[0] - B0;
506  A[0] = u.GetLowHalf();
507  u = (D) A[1] - B1 - u.GetHighHalfAsBorrow();
508  A[1] = u.GetLowHalf();
509  A[2] += u.GetHighHalf();
510  Q++;
511  CRYPTOPP_ASSERT(Q); // shouldn't overflow
512  }
513 
514  return Q;
515 }
516 
517 // do a 4 word by 2 word divide, returns 2 word quotient in Q0 and Q1
518 template <class S, class D>
519 inline D DivideFourWordsByTwo(S *T, const D &Al, const D &Ah, const D &B)
520 {
521  // Profiling guided the flow below.
522 
523  if (!!B)
524  {
525  S Q[2];
526  T[0] = Al.GetLowHalf();
527  T[1] = Al.GetHighHalf();
528  T[2] = Ah.GetLowHalf();
529  T[3] = Ah.GetHighHalf();
530  Q[1] = DivideThreeWordsByTwo<S, D>(T+1, B.GetLowHalf(), B.GetHighHalf());
531  Q[0] = DivideThreeWordsByTwo<S, D>(T, B.GetLowHalf(), B.GetHighHalf());
532  return D(Q[0], Q[1]);
533  }
534  else // if divisor is 0, we assume divisor==2**(2*WORD_BITS)
535  {
536  return D(Ah.GetLowHalf(), Ah.GetHighHalf());
537  }
538 }
539 
540 // returns quotient, which must fit in a word
541 inline word DWord::operator/(word a)
542 {
543  #ifdef CRYPTOPP_NATIVE_DWORD_AVAILABLE
544  return word(m_whole / a);
545  #else
546  hword r[4];
547  return DivideFourWordsByTwo<hword, Word>(r, m_halfs.low, m_halfs.high, a).GetWhole();
548  #endif
549 }
550 
551 inline word DWord::operator%(word a)
552 {
553  #ifdef CRYPTOPP_NATIVE_DWORD_AVAILABLE
554  return word(m_whole % a);
555  #else
556  if (a < (word(1) << (WORD_BITS/2)))
557  {
558  hword h = hword(a);
559  word r = m_halfs.high % h;
560  r = ((m_halfs.low >> (WORD_BITS/2)) + (r << (WORD_BITS/2))) % h;
561  return hword((hword(m_halfs.low) + (r << (WORD_BITS/2))) % h);
562  }
563  else
564  {
565  hword r[4];
566  DivideFourWordsByTwo<hword, Word>(r, m_halfs.low, m_halfs.high, a);
567  return Word(r[0], r[1]).GetWhole();
568  }
569  #endif
570 }
571 
572 // ********************************************************
573 
574 // Use some tricks to share assembly code between MSVC, GCC, Clang and Sun CC.
575 #if defined(__GNUC__)
576  #define AddPrologue \
577  int result; \
578  __asm__ __volatile__ \
579  ( \
580  INTEL_NOPREFIX
581  #define AddEpilogue \
582  ATT_PREFIX \
583  : "=a" (result)\
584  : "d" (C), "a" (A), "D" (B), "c" (N) \
585  : "%esi", "memory", "cc" \
586  );\
587  return result;
588  #define MulPrologue \
589  __asm__ __volatile__ \
590  ( \
591  INTEL_NOPREFIX \
592  AS1( push ebx) \
593  AS2( mov ebx, edx)
594  #define MulEpilogue \
595  AS1( pop ebx) \
596  ATT_PREFIX \
597  : \
598  : "d" (s_maskLow16), "c" (C), "a" (A), "D" (B) \
599  : "%esi", "memory", "cc" \
600  );
601  #define SquPrologue MulPrologue
602  #define SquEpilogue \
603  AS1( pop ebx) \
604  ATT_PREFIX \
605  : \
606  : "d" (s_maskLow16), "c" (C), "a" (A) \
607  : "%esi", "%edi", "memory", "cc" \
608  );
609  #define TopPrologue MulPrologue
610  #define TopEpilogue \
611  AS1( pop ebx) \
612  ATT_PREFIX \
613  : \
614  : "d" (s_maskLow16), "c" (C), "a" (A), "D" (B), "S" (L) \
615  : "memory", "cc" \
616  );
617 #else
618  #define AddPrologue \
619  __asm push edi \
620  __asm push esi \
621  __asm mov eax, [esp+12] \
622  __asm mov edi, [esp+16]
623  #define AddEpilogue \
624  __asm pop esi \
625  __asm pop edi \
626  __asm ret 8
627  #define SaveEBX
628  #define RestoreEBX
629  #define SquPrologue \
630  AS2( mov eax, A) \
631  AS2( mov ecx, C) \
632  SaveEBX \
633  AS2( lea ebx, s_maskLow16)
634  #define MulPrologue \
635  AS2( mov eax, A) \
636  AS2( mov edi, B) \
637  AS2( mov ecx, C) \
638  SaveEBX \
639  AS2( lea ebx, s_maskLow16)
640  #define TopPrologue \
641  AS2( mov eax, A) \
642  AS2( mov edi, B) \
643  AS2( mov ecx, C) \
644  AS2( mov esi, L) \
645  SaveEBX \
646  AS2( lea ebx, s_maskLow16)
647  #define SquEpilogue RestoreEBX
648  #define MulEpilogue RestoreEBX
649  #define TopEpilogue RestoreEBX
650 #endif
651 
652 #ifdef CRYPTOPP_X64_MASM_AVAILABLE
653 extern "C" {
654 int Baseline_Add(size_t N, word *C, const word *A, const word *B);
655 int Baseline_Sub(size_t N, word *C, const word *A, const word *B);
656 }
657 #elif defined(CRYPTOPP_X64_ASM_AVAILABLE) && defined(__GNUC__) && defined(CRYPTOPP_WORD128_AVAILABLE)
658 int Baseline_Add(size_t N, word *C, const word *A, const word *B)
659 {
660  word result;
661  __asm__ __volatile__
662  (
663  INTEL_NOPREFIX
664  AS1( neg %1)
665  ASJ( jz, 1, f)
666  AS2( mov %0,[%3+8*%1])
667  AS2( add %0,[%4+8*%1])
668  AS2( mov [%2+8*%1],%0)
669  ASL(0)
670  AS2( mov %0,[%3+8*%1+8])
671  AS2( adc %0,[%4+8*%1+8])
672  AS2( mov [%2+8*%1+8],%0)
673  AS2( lea %1,[%1+2])
674  ASJ( jrcxz, 1, f)
675  AS2( mov %0,[%3+8*%1])
676  AS2( adc %0,[%4+8*%1])
677  AS2( mov [%2+8*%1],%0)
678  ASJ( jmp, 0, b)
679  ASL(1)
680  AS2( mov %0, 0)
681  AS2( adc %0, %0)
682  ATT_NOPREFIX
683  : "=&r" (result), "+c" (N)
684  : "r" (C+N), "r" (A+N), "r" (B+N)
685  : "memory", "cc"
686  );
687  return (int)result;
688 }
689 
690 int Baseline_Sub(size_t N, word *C, const word *A, const word *B)
691 {
692  word result;
693  __asm__ __volatile__
694  (
695  INTEL_NOPREFIX
696  AS1( neg %1)
697  ASJ( jz, 1, f)
698  AS2( mov %0,[%3+8*%1])
699  AS2( sub %0,[%4+8*%1])
700  AS2( mov [%2+8*%1],%0)
701  ASL(0)
702  AS2( mov %0,[%3+8*%1+8])
703  AS2( sbb %0,[%4+8*%1+8])
704  AS2( mov [%2+8*%1+8],%0)
705  AS2( lea %1,[%1+2])
706  ASJ( jrcxz, 1, f)
707  AS2( mov %0,[%3+8*%1])
708  AS2( sbb %0,[%4+8*%1])
709  AS2( mov [%2+8*%1],%0)
710  ASJ( jmp, 0, b)
711  ASL(1)
712  AS2( mov %0, 0)
713  AS2( adc %0, %0)
714  ATT_NOPREFIX
715  : "=&r" (result), "+c" (N)
716  : "r" (C+N), "r" (A+N), "r" (B+N)
717  : "memory", "cc"
718  );
719  return (int)result;
720 }
721 #elif defined(CRYPTOPP_X86_ASM_AVAILABLE) && CRYPTOPP_BOOL_X86
722 CRYPTOPP_NAKED int CRYPTOPP_FASTCALL Baseline_Add(size_t N, word *C, const word *A, const word *B)
723 {
724  AddPrologue
725 
726  // now: eax = A, edi = B, edx = C, ecx = N
727  AS2( lea eax, [eax+4*ecx])
728  AS2( lea edi, [edi+4*ecx])
729  AS2( lea edx, [edx+4*ecx])
730 
731  AS1( neg ecx) // ecx is negative index
732  AS2( test ecx, 2) // this clears carry flag
733  ASJ( jz, 0, f)
734  AS2( sub ecx, 2)
735  ASJ( jmp, 1, f)
736 
737  ASL(0)
738  ASJ( jecxz, 2, f) // loop until ecx overflows and becomes zero
739  AS2( mov esi,[eax+4*ecx])
740  AS2( adc esi,[edi+4*ecx])
741  AS2( mov [edx+4*ecx],esi)
742  AS2( mov esi,[eax+4*ecx+4])
743  AS2( adc esi,[edi+4*ecx+4])
744  AS2( mov [edx+4*ecx+4],esi)
745  ASL(1)
746  AS2( mov esi,[eax+4*ecx+8])
747  AS2( adc esi,[edi+4*ecx+8])
748  AS2( mov [edx+4*ecx+8],esi)
749  AS2( mov esi,[eax+4*ecx+12])
750  AS2( adc esi,[edi+4*ecx+12])
751  AS2( mov [edx+4*ecx+12],esi)
752 
753  AS2( lea ecx,[ecx+4]) // advance index, avoid inc which causes slowdown on Intel Core 2
754  ASJ( jmp, 0, b)
755 
756  ASL(2)
757  AS2( mov eax, 0)
758  AS1( setc al) // store carry into eax (return result register)
759 
760  AddEpilogue
761 
762  // http://github.com/weidai11/cryptopp/issues/340
763  CRYPTOPP_UNUSED(A); CRYPTOPP_UNUSED(B);
764  CRYPTOPP_UNUSED(C); CRYPTOPP_UNUSED(N);
765 }
766 
767 CRYPTOPP_NAKED int CRYPTOPP_FASTCALL Baseline_Sub(size_t N, word *C, const word *A, const word *B)
768 {
769  AddPrologue
770 
771  // now: eax = A, edi = B, edx = C, ecx = N
772  AS2( lea eax, [eax+4*ecx])
773  AS2( lea edi, [edi+4*ecx])
774  AS2( lea edx, [edx+4*ecx])
775 
776  AS1( neg ecx) // ecx is negative index
777  AS2( test ecx, 2) // this clears carry flag
778  ASJ( jz, 0, f)
779  AS2( sub ecx, 2)
780  ASJ( jmp, 1, f)
781 
782  ASL(0)
783  ASJ( jecxz, 2, f) // loop until ecx overflows and becomes zero
784  AS2( mov esi,[eax+4*ecx])
785  AS2( sbb esi,[edi+4*ecx])
786  AS2( mov [edx+4*ecx],esi)
787  AS2( mov esi,[eax+4*ecx+4])
788  AS2( sbb esi,[edi+4*ecx+4])
789  AS2( mov [edx+4*ecx+4],esi)
790  ASL(1)
791  AS2( mov esi,[eax+4*ecx+8])
792  AS2( sbb esi,[edi+4*ecx+8])
793  AS2( mov [edx+4*ecx+8],esi)
794  AS2( mov esi,[eax+4*ecx+12])
795  AS2( sbb esi,[edi+4*ecx+12])
796  AS2( mov [edx+4*ecx+12],esi)
797 
798  AS2( lea ecx,[ecx+4]) // advance index, avoid inc which causes slowdown on Intel Core 2
799  ASJ( jmp, 0, b)
800 
801  ASL(2)
802  AS2( mov eax, 0)
803  AS1( setc al) // store carry into eax (return result register)
804 
805  AddEpilogue
806 
807  // http://github.com/weidai11/cryptopp/issues/340
808  CRYPTOPP_UNUSED(A); CRYPTOPP_UNUSED(B);
809  CRYPTOPP_UNUSED(C); CRYPTOPP_UNUSED(N);
810 }
811 
812 #if CRYPTOPP_INTEGER_SSE2
813 CRYPTOPP_NAKED int CRYPTOPP_FASTCALL SSE2_Add(size_t N, word *C, const word *A, const word *B)
814 {
815  AddPrologue
816 
817  // now: eax = A, edi = B, edx = C, ecx = N
818  AS2( lea eax, [eax+4*ecx])
819  AS2( lea edi, [edi+4*ecx])
820  AS2( lea edx, [edx+4*ecx])
821 
822  AS1( neg ecx) // ecx is negative index
823  AS2( pxor mm2, mm2)
824  ASJ( jz, 2, f)
825  AS2( test ecx, 2) // this clears carry flag
826  ASJ( jz, 0, f)
827  AS2( sub ecx, 2)
828  ASJ( jmp, 1, f)
829 
830  ASL(0)
831  AS2( movd mm0, DWORD PTR [eax+4*ecx])
832  AS2( movd mm1, DWORD PTR [edi+4*ecx])
833  AS2( paddq mm0, mm1)
834  AS2( paddq mm2, mm0)
835  AS2( movd DWORD PTR [edx+4*ecx], mm2)
836  AS2( psrlq mm2, 32)
837 
838  AS2( movd mm0, DWORD PTR [eax+4*ecx+4])
839  AS2( movd mm1, DWORD PTR [edi+4*ecx+4])
840  AS2( paddq mm0, mm1)
841  AS2( paddq mm2, mm0)
842  AS2( movd DWORD PTR [edx+4*ecx+4], mm2)
843  AS2( psrlq mm2, 32)
844 
845  ASL(1)
846  AS2( movd mm0, DWORD PTR [eax+4*ecx+8])
847  AS2( movd mm1, DWORD PTR [edi+4*ecx+8])
848  AS2( paddq mm0, mm1)
849  AS2( paddq mm2, mm0)
850  AS2( movd DWORD PTR [edx+4*ecx+8], mm2)
851  AS2( psrlq mm2, 32)
852 
853  AS2( movd mm0, DWORD PTR [eax+4*ecx+12])
854  AS2( movd mm1, DWORD PTR [edi+4*ecx+12])
855  AS2( paddq mm0, mm1)
856  AS2( paddq mm2, mm0)
857  AS2( movd DWORD PTR [edx+4*ecx+12], mm2)
858  AS2( psrlq mm2, 32)
859 
860  AS2( add ecx, 4)
861  ASJ( jnz, 0, b)
862 
863  ASL(2)
864  AS2( movd eax, mm2)
865  AS1( emms)
866 
867  AddEpilogue
868 
869  // http://github.com/weidai11/cryptopp/issues/340
870  CRYPTOPP_UNUSED(A); CRYPTOPP_UNUSED(B);
871  CRYPTOPP_UNUSED(C); CRYPTOPP_UNUSED(N);
872 }
873 CRYPTOPP_NAKED int CRYPTOPP_FASTCALL SSE2_Sub(size_t N, word *C, const word *A, const word *B)
874 {
875  AddPrologue
876 
877  // now: eax = A, edi = B, edx = C, ecx = N
878  AS2( lea eax, [eax+4*ecx])
879  AS2( lea edi, [edi+4*ecx])
880  AS2( lea edx, [edx+4*ecx])
881 
882  AS1( neg ecx) // ecx is negative index
883  AS2( pxor mm2, mm2)
884  ASJ( jz, 2, f)
885  AS2( test ecx, 2) // this clears carry flag
886  ASJ( jz, 0, f)
887  AS2( sub ecx, 2)
888  ASJ( jmp, 1, f)
889 
890  ASL(0)
891  AS2( movd mm0, DWORD PTR [eax+4*ecx])
892  AS2( movd mm1, DWORD PTR [edi+4*ecx])
893  AS2( psubq mm0, mm1)
894  AS2( psubq mm0, mm2)
895  AS2( movd DWORD PTR [edx+4*ecx], mm0)
896  AS2( psrlq mm0, 63)
897 
898  AS2( movd mm2, DWORD PTR [eax+4*ecx+4])
899  AS2( movd mm1, DWORD PTR [edi+4*ecx+4])
900  AS2( psubq mm2, mm1)
901  AS2( psubq mm2, mm0)
902  AS2( movd DWORD PTR [edx+4*ecx+4], mm2)
903  AS2( psrlq mm2, 63)
904 
905  ASL(1)
906  AS2( movd mm0, DWORD PTR [eax+4*ecx+8])
907  AS2( movd mm1, DWORD PTR [edi+4*ecx+8])
908  AS2( psubq mm0, mm1)
909  AS2( psubq mm0, mm2)
910  AS2( movd DWORD PTR [edx+4*ecx+8], mm0)
911  AS2( psrlq mm0, 63)
912 
913  AS2( movd mm2, DWORD PTR [eax+4*ecx+12])
914  AS2( movd mm1, DWORD PTR [edi+4*ecx+12])
915  AS2( psubq mm2, mm1)
916  AS2( psubq mm2, mm0)
917  AS2( movd DWORD PTR [edx+4*ecx+12], mm2)
918  AS2( psrlq mm2, 63)
919 
920  AS2( add ecx, 4)
921  ASJ( jnz, 0, b)
922 
923  ASL(2)
924  AS2( movd eax, mm2)
925  AS1( emms)
926 
927  AddEpilogue
928 
929  // http://github.com/weidai11/cryptopp/issues/340
930  CRYPTOPP_UNUSED(A); CRYPTOPP_UNUSED(B);
931  CRYPTOPP_UNUSED(C); CRYPTOPP_UNUSED(N);
932 }
933 #endif // CRYPTOPP_INTEGER_SSE2
934 #else // CRYPTOPP_SSE2_ASM_AVAILABLE
935 int CRYPTOPP_FASTCALL Baseline_Add(size_t N, word *C, const word *A, const word *B)
936 {
937  CRYPTOPP_ASSERT (N%2 == 0);
938 
939  Declare2Words(u);
940  AssignWord(u, 0);
941  for (size_t i=0; i<N; i+=2)
942  {
943  AddWithCarry(u, A[i], B[i]);
944  C[i] = LowWord(u);
945  AddWithCarry(u, A[i+1], B[i+1]);
946  C[i+1] = LowWord(u);
947  }
948  return int(GetCarry(u));
949 }
950 
951 int CRYPTOPP_FASTCALL Baseline_Sub(size_t N, word *C, const word *A, const word *B)
952 {
953  CRYPTOPP_ASSERT (N%2 == 0);
954 
955  Declare2Words(u);
956  AssignWord(u, 0);
957  for (size_t i=0; i<N; i+=2)
958  {
959  SubtractWithBorrow(u, A[i], B[i]);
960  C[i] = LowWord(u);
961  SubtractWithBorrow(u, A[i+1], B[i+1]);
962  C[i+1] = LowWord(u);
963  }
964  return int(GetBorrow(u));
965 }
966 #endif
967 
968 static word LinearMultiply(word *C, const word *AA, word B, size_t N)
969 {
970  // http://github.com/weidai11/cryptopp/issues/188
972 
973  word carry=0;
974  for(unsigned i=0; i<N; i++)
975  {
976  Declare2Words(p);
977  MultiplyWords(p, A[i], B);
978  Acc2WordsBy1(p, carry);
979  C[i] = LowWord(p);
980  carry = HighWord(p);
981  }
982  return carry;
983 }
984 
985 #ifndef CRYPTOPP_DOXYGEN_PROCESSING
986 
987 #define Mul_2 \
988  Mul_Begin(2) \
989  Mul_SaveAcc(0, 0, 1) Mul_Acc(1, 0) \
990  Mul_End(1, 1)
991 
992 #define Mul_4 \
993  Mul_Begin(4) \
994  Mul_SaveAcc(0, 0, 1) Mul_Acc(1, 0) \
995  Mul_SaveAcc(1, 0, 2) Mul_Acc(1, 1) Mul_Acc(2, 0) \
996  Mul_SaveAcc(2, 0, 3) Mul_Acc(1, 2) Mul_Acc(2, 1) Mul_Acc(3, 0) \
997  Mul_SaveAcc(3, 1, 3) Mul_Acc(2, 2) Mul_Acc(3, 1) \
998  Mul_SaveAcc(4, 2, 3) Mul_Acc(3, 2) \
999  Mul_End(5, 3)
1000 
1001 #define Mul_8 \
1002  Mul_Begin(8) \
1003  Mul_SaveAcc(0, 0, 1) Mul_Acc(1, 0) \
1004  Mul_SaveAcc(1, 0, 2) Mul_Acc(1, 1) Mul_Acc(2, 0) \
1005  Mul_SaveAcc(2, 0, 3) Mul_Acc(1, 2) Mul_Acc(2, 1) Mul_Acc(3, 0) \
1006  Mul_SaveAcc(3, 0, 4) Mul_Acc(1, 3) Mul_Acc(2, 2) Mul_Acc(3, 1) Mul_Acc(4, 0) \
1007  Mul_SaveAcc(4, 0, 5) Mul_Acc(1, 4) Mul_Acc(2, 3) Mul_Acc(3, 2) Mul_Acc(4, 1) Mul_Acc(5, 0) \
1008  Mul_SaveAcc(5, 0, 6) Mul_Acc(1, 5) Mul_Acc(2, 4) Mul_Acc(3, 3) Mul_Acc(4, 2) Mul_Acc(5, 1) Mul_Acc(6, 0) \
1009  Mul_SaveAcc(6, 0, 7) Mul_Acc(1, 6) Mul_Acc(2, 5) Mul_Acc(3, 4) Mul_Acc(4, 3) Mul_Acc(5, 2) Mul_Acc(6, 1) Mul_Acc(7, 0) \
1010  Mul_SaveAcc(7, 1, 7) Mul_Acc(2, 6) Mul_Acc(3, 5) Mul_Acc(4, 4) Mul_Acc(5, 3) Mul_Acc(6, 2) Mul_Acc(7, 1) \
1011  Mul_SaveAcc(8, 2, 7) Mul_Acc(3, 6) Mul_Acc(4, 5) Mul_Acc(5, 4) Mul_Acc(6, 3) Mul_Acc(7, 2) \
1012  Mul_SaveAcc(9, 3, 7) Mul_Acc(4, 6) Mul_Acc(5, 5) Mul_Acc(6, 4) Mul_Acc(7, 3) \
1013  Mul_SaveAcc(10, 4, 7) Mul_Acc(5, 6) Mul_Acc(6, 5) Mul_Acc(7, 4) \
1014  Mul_SaveAcc(11, 5, 7) Mul_Acc(6, 6) Mul_Acc(7, 5) \
1015  Mul_SaveAcc(12, 6, 7) Mul_Acc(7, 6) \
1016  Mul_End(13, 7)
1017 
1018 #define Mul_16 \
1019  Mul_Begin(16) \
1020  Mul_SaveAcc(0, 0, 1) Mul_Acc(1, 0) \
1021  Mul_SaveAcc(1, 0, 2) Mul_Acc(1, 1) Mul_Acc(2, 0) \
1022  Mul_SaveAcc(2, 0, 3) Mul_Acc(1, 2) Mul_Acc(2, 1) Mul_Acc(3, 0) \
1023  Mul_SaveAcc(3, 0, 4) Mul_Acc(1, 3) Mul_Acc(2, 2) Mul_Acc(3, 1) Mul_Acc(4, 0) \
1024  Mul_SaveAcc(4, 0, 5) Mul_Acc(1, 4) Mul_Acc(2, 3) Mul_Acc(3, 2) Mul_Acc(4, 1) Mul_Acc(5, 0) \
1025  Mul_SaveAcc(5, 0, 6) Mul_Acc(1, 5) Mul_Acc(2, 4) Mul_Acc(3, 3) Mul_Acc(4, 2) Mul_Acc(5, 1) Mul_Acc(6, 0) \
1026  Mul_SaveAcc(6, 0, 7) Mul_Acc(1, 6) Mul_Acc(2, 5) Mul_Acc(3, 4) Mul_Acc(4, 3) Mul_Acc(5, 2) Mul_Acc(6, 1) Mul_Acc(7, 0) \
1027  Mul_SaveAcc(7, 0, 8) Mul_Acc(1, 7) Mul_Acc(2, 6) Mul_Acc(3, 5) Mul_Acc(4, 4) Mul_Acc(5, 3) Mul_Acc(6, 2) Mul_Acc(7, 1) Mul_Acc(8, 0) \
1028  Mul_SaveAcc(8, 0, 9) Mul_Acc(1, 8) Mul_Acc(2, 7) Mul_Acc(3, 6) Mul_Acc(4, 5) Mul_Acc(5, 4) Mul_Acc(6, 3) Mul_Acc(7, 2) Mul_Acc(8, 1) Mul_Acc(9, 0) \
1029  Mul_SaveAcc(9, 0, 10) Mul_Acc(1, 9) Mul_Acc(2, 8) Mul_Acc(3, 7) Mul_Acc(4, 6) Mul_Acc(5, 5) Mul_Acc(6, 4) Mul_Acc(7, 3) Mul_Acc(8, 2) Mul_Acc(9, 1) Mul_Acc(10, 0) \
1030  Mul_SaveAcc(10, 0, 11) Mul_Acc(1, 10) Mul_Acc(2, 9) Mul_Acc(3, 8) Mul_Acc(4, 7) Mul_Acc(5, 6) Mul_Acc(6, 5) Mul_Acc(7, 4) Mul_Acc(8, 3) Mul_Acc(9, 2) Mul_Acc(10, 1) Mul_Acc(11, 0) \
1031  Mul_SaveAcc(11, 0, 12) Mul_Acc(1, 11) Mul_Acc(2, 10) Mul_Acc(3, 9) Mul_Acc(4, 8) Mul_Acc(5, 7) Mul_Acc(6, 6) Mul_Acc(7, 5) Mul_Acc(8, 4) Mul_Acc(9, 3) Mul_Acc(10, 2) Mul_Acc(11, 1) Mul_Acc(12, 0) \
1032  Mul_SaveAcc(12, 0, 13) Mul_Acc(1, 12) Mul_Acc(2, 11) Mul_Acc(3, 10) Mul_Acc(4, 9) Mul_Acc(5, 8) Mul_Acc(6, 7) Mul_Acc(7, 6) Mul_Acc(8, 5) Mul_Acc(9, 4) Mul_Acc(10, 3) Mul_Acc(11, 2) Mul_Acc(12, 1) Mul_Acc(13, 0) \
1033  Mul_SaveAcc(13, 0, 14) Mul_Acc(1, 13) Mul_Acc(2, 12) Mul_Acc(3, 11) Mul_Acc(4, 10) Mul_Acc(5, 9) Mul_Acc(6, 8) Mul_Acc(7, 7) Mul_Acc(8, 6) Mul_Acc(9, 5) Mul_Acc(10, 4) Mul_Acc(11, 3) Mul_Acc(12, 2) Mul_Acc(13, 1) Mul_Acc(14, 0) \
1034  Mul_SaveAcc(14, 0, 15) Mul_Acc(1, 14) Mul_Acc(2, 13) Mul_Acc(3, 12) Mul_Acc(4, 11) Mul_Acc(5, 10) Mul_Acc(6, 9) Mul_Acc(7, 8) Mul_Acc(8, 7) Mul_Acc(9, 6) Mul_Acc(10, 5) Mul_Acc(11, 4) Mul_Acc(12, 3) Mul_Acc(13, 2) Mul_Acc(14, 1) Mul_Acc(15, 0) \
1035  Mul_SaveAcc(15, 1, 15) Mul_Acc(2, 14) Mul_Acc(3, 13) Mul_Acc(4, 12) Mul_Acc(5, 11) Mul_Acc(6, 10) Mul_Acc(7, 9) Mul_Acc(8, 8) Mul_Acc(9, 7) Mul_Acc(10, 6) Mul_Acc(11, 5) Mul_Acc(12, 4) Mul_Acc(13, 3) Mul_Acc(14, 2) Mul_Acc(15, 1) \
1036  Mul_SaveAcc(16, 2, 15) Mul_Acc(3, 14) Mul_Acc(4, 13) Mul_Acc(5, 12) Mul_Acc(6, 11) Mul_Acc(7, 10) Mul_Acc(8, 9) Mul_Acc(9, 8) Mul_Acc(10, 7) Mul_Acc(11, 6) Mul_Acc(12, 5) Mul_Acc(13, 4) Mul_Acc(14, 3) Mul_Acc(15, 2) \
1037  Mul_SaveAcc(17, 3, 15) Mul_Acc(4, 14) Mul_Acc(5, 13) Mul_Acc(6, 12) Mul_Acc(7, 11) Mul_Acc(8, 10) Mul_Acc(9, 9) Mul_Acc(10, 8) Mul_Acc(11, 7) Mul_Acc(12, 6) Mul_Acc(13, 5) Mul_Acc(14, 4) Mul_Acc(15, 3) \
1038  Mul_SaveAcc(18, 4, 15) Mul_Acc(5, 14) Mul_Acc(6, 13) Mul_Acc(7, 12) Mul_Acc(8, 11) Mul_Acc(9, 10) Mul_Acc(10, 9) Mul_Acc(11, 8) Mul_Acc(12, 7) Mul_Acc(13, 6) Mul_Acc(14, 5) Mul_Acc(15, 4) \
1039  Mul_SaveAcc(19, 5, 15) Mul_Acc(6, 14) Mul_Acc(7, 13) Mul_Acc(8, 12) Mul_Acc(9, 11) Mul_Acc(10, 10) Mul_Acc(11, 9) Mul_Acc(12, 8) Mul_Acc(13, 7) Mul_Acc(14, 6) Mul_Acc(15, 5) \
1040  Mul_SaveAcc(20, 6, 15) Mul_Acc(7, 14) Mul_Acc(8, 13) Mul_Acc(9, 12) Mul_Acc(10, 11) Mul_Acc(11, 10) Mul_Acc(12, 9) Mul_Acc(13, 8) Mul_Acc(14, 7) Mul_Acc(15, 6) \
1041  Mul_SaveAcc(21, 7, 15) Mul_Acc(8, 14) Mul_Acc(9, 13) Mul_Acc(10, 12) Mul_Acc(11, 11) Mul_Acc(12, 10) Mul_Acc(13, 9) Mul_Acc(14, 8) Mul_Acc(15, 7) \
1042  Mul_SaveAcc(22, 8, 15) Mul_Acc(9, 14) Mul_Acc(10, 13) Mul_Acc(11, 12) Mul_Acc(12, 11) Mul_Acc(13, 10) Mul_Acc(14, 9) Mul_Acc(15, 8) \
1043  Mul_SaveAcc(23, 9, 15) Mul_Acc(10, 14) Mul_Acc(11, 13) Mul_Acc(12, 12) Mul_Acc(13, 11) Mul_Acc(14, 10) Mul_Acc(15, 9) \
1044  Mul_SaveAcc(24, 10, 15) Mul_Acc(11, 14) Mul_Acc(12, 13) Mul_Acc(13, 12) Mul_Acc(14, 11) Mul_Acc(15, 10) \
1045  Mul_SaveAcc(25, 11, 15) Mul_Acc(12, 14) Mul_Acc(13, 13) Mul_Acc(14, 12) Mul_Acc(15, 11) \
1046  Mul_SaveAcc(26, 12, 15) Mul_Acc(13, 14) Mul_Acc(14, 13) Mul_Acc(15, 12) \
1047  Mul_SaveAcc(27, 13, 15) Mul_Acc(14, 14) Mul_Acc(15, 13) \
1048  Mul_SaveAcc(28, 14, 15) Mul_Acc(15, 14) \
1049  Mul_End(29, 15)
1050 
1051 #define Squ_2 \
1052  Squ_Begin(2) \
1053  Squ_End(2)
1054 
1055 #define Squ_4 \
1056  Squ_Begin(4) \
1057  Squ_SaveAcc(1, 0, 2) Squ_Diag(1) \
1058  Squ_SaveAcc(2, 0, 3) Squ_Acc(1, 2) Squ_NonDiag \
1059  Squ_SaveAcc(3, 1, 3) Squ_Diag(2) \
1060  Squ_SaveAcc(4, 2, 3) Squ_NonDiag \
1061  Squ_End(4)
1062 
1063 #define Squ_8 \
1064  Squ_Begin(8) \
1065  Squ_SaveAcc(1, 0, 2) Squ_Diag(1) \
1066  Squ_SaveAcc(2, 0, 3) Squ_Acc(1, 2) Squ_NonDiag \
1067  Squ_SaveAcc(3, 0, 4) Squ_Acc(1, 3) Squ_Diag(2) \
1068  Squ_SaveAcc(4, 0, 5) Squ_Acc(1, 4) Squ_Acc(2, 3) Squ_NonDiag \
1069  Squ_SaveAcc(5, 0, 6) Squ_Acc(1, 5) Squ_Acc(2, 4) Squ_Diag(3) \
1070  Squ_SaveAcc(6, 0, 7) Squ_Acc(1, 6) Squ_Acc(2, 5) Squ_Acc(3, 4) Squ_NonDiag \
1071  Squ_SaveAcc(7, 1, 7) Squ_Acc(2, 6) Squ_Acc(3, 5) Squ_Diag(4) \
1072  Squ_SaveAcc(8, 2, 7) Squ_Acc(3, 6) Squ_Acc(4, 5) Squ_NonDiag \
1073  Squ_SaveAcc(9, 3, 7) Squ_Acc(4, 6) Squ_Diag(5) \
1074  Squ_SaveAcc(10, 4, 7) Squ_Acc(5, 6) Squ_NonDiag \
1075  Squ_SaveAcc(11, 5, 7) Squ_Diag(6) \
1076  Squ_SaveAcc(12, 6, 7) Squ_NonDiag \
1077  Squ_End(8)
1078 
1079 #define Squ_16 \
1080  Squ_Begin(16) \
1081  Squ_SaveAcc(1, 0, 2) Squ_Diag(1) \
1082  Squ_SaveAcc(2, 0, 3) Squ_Acc(1, 2) Squ_NonDiag \
1083  Squ_SaveAcc(3, 0, 4) Squ_Acc(1, 3) Squ_Diag(2) \
1084  Squ_SaveAcc(4, 0, 5) Squ_Acc(1, 4) Squ_Acc(2, 3) Squ_NonDiag \
1085  Squ_SaveAcc(5, 0, 6) Squ_Acc(1, 5) Squ_Acc(2, 4) Squ_Diag(3) \
1086  Squ_SaveAcc(6, 0, 7) Squ_Acc(1, 6) Squ_Acc(2, 5) Squ_Acc(3, 4) Squ_NonDiag \
1087  Squ_SaveAcc(7, 0, 8) Squ_Acc(1, 7) Squ_Acc(2, 6) Squ_Acc(3, 5) Squ_Diag(4) \
1088  Squ_SaveAcc(8, 0, 9) Squ_Acc(1, 8) Squ_Acc(2, 7) Squ_Acc(3, 6) Squ_Acc(4, 5) Squ_NonDiag \
1089  Squ_SaveAcc(9, 0, 10) Squ_Acc(1, 9) Squ_Acc(2, 8) Squ_Acc(3, 7) Squ_Acc(4, 6) Squ_Diag(5) \
1090  Squ_SaveAcc(10, 0, 11) Squ_Acc(1, 10) Squ_Acc(2, 9) Squ_Acc(3, 8) Squ_Acc(4, 7) Squ_Acc(5, 6) Squ_NonDiag \
1091  Squ_SaveAcc(11, 0, 12) Squ_Acc(1, 11) Squ_Acc(2, 10) Squ_Acc(3, 9) Squ_Acc(4, 8) Squ_Acc(5, 7) Squ_Diag(6) \
1092  Squ_SaveAcc(12, 0, 13) Squ_Acc(1, 12) Squ_Acc(2, 11) Squ_Acc(3, 10) Squ_Acc(4, 9) Squ_Acc(5, 8) Squ_Acc(6, 7) Squ_NonDiag \
1093  Squ_SaveAcc(13, 0, 14) Squ_Acc(1, 13) Squ_Acc(2, 12) Squ_Acc(3, 11) Squ_Acc(4, 10) Squ_Acc(5, 9) Squ_Acc(6, 8) Squ_Diag(7) \
1094  Squ_SaveAcc(14, 0, 15) Squ_Acc(1, 14) Squ_Acc(2, 13) Squ_Acc(3, 12) Squ_Acc(4, 11) Squ_Acc(5, 10) Squ_Acc(6, 9) Squ_Acc(7, 8) Squ_NonDiag \
1095  Squ_SaveAcc(15, 1, 15) Squ_Acc(2, 14) Squ_Acc(3, 13) Squ_Acc(4, 12) Squ_Acc(5, 11) Squ_Acc(6, 10) Squ_Acc(7, 9) Squ_Diag(8) \
1096  Squ_SaveAcc(16, 2, 15) Squ_Acc(3, 14) Squ_Acc(4, 13) Squ_Acc(5, 12) Squ_Acc(6, 11) Squ_Acc(7, 10) Squ_Acc(8, 9) Squ_NonDiag \
1097  Squ_SaveAcc(17, 3, 15) Squ_Acc(4, 14) Squ_Acc(5, 13) Squ_Acc(6, 12) Squ_Acc(7, 11) Squ_Acc(8, 10) Squ_Diag(9) \
1098  Squ_SaveAcc(18, 4, 15) Squ_Acc(5, 14) Squ_Acc(6, 13) Squ_Acc(7, 12) Squ_Acc(8, 11) Squ_Acc(9, 10) Squ_NonDiag \
1099  Squ_SaveAcc(19, 5, 15) Squ_Acc(6, 14) Squ_Acc(7, 13) Squ_Acc(8, 12) Squ_Acc(9, 11) Squ_Diag(10) \
1100  Squ_SaveAcc(20, 6, 15) Squ_Acc(7, 14) Squ_Acc(8, 13) Squ_Acc(9, 12) Squ_Acc(10, 11) Squ_NonDiag \
1101  Squ_SaveAcc(21, 7, 15) Squ_Acc(8, 14) Squ_Acc(9, 13) Squ_Acc(10, 12) Squ_Diag(11) \
1102  Squ_SaveAcc(22, 8, 15) Squ_Acc(9, 14) Squ_Acc(10, 13) Squ_Acc(11, 12) Squ_NonDiag \
1103  Squ_SaveAcc(23, 9, 15) Squ_Acc(10, 14) Squ_Acc(11, 13) Squ_Diag(12) \
1104  Squ_SaveAcc(24, 10, 15) Squ_Acc(11, 14) Squ_Acc(12, 13) Squ_NonDiag \
1105  Squ_SaveAcc(25, 11, 15) Squ_Acc(12, 14) Squ_Diag(13) \
1106  Squ_SaveAcc(26, 12, 15) Squ_Acc(13, 14) Squ_NonDiag \
1107  Squ_SaveAcc(27, 13, 15) Squ_Diag(14) \
1108  Squ_SaveAcc(28, 14, 15) Squ_NonDiag \
1109  Squ_End(16)
1110 
1111 #define Bot_2 \
1112  Mul_Begin(2) \
1113  Bot_SaveAcc(0, 0, 1) Bot_Acc(1, 0) \
1114  Bot_End(2)
1115 
1116 #define Bot_4 \
1117  Mul_Begin(4) \
1118  Mul_SaveAcc(0, 0, 1) Mul_Acc(1, 0) \
1119  Mul_SaveAcc(1, 2, 0) Mul_Acc(1, 1) Mul_Acc(0, 2) \
1120  Bot_SaveAcc(2, 0, 3) Bot_Acc(1, 2) Bot_Acc(2, 1) Bot_Acc(3, 0) \
1121  Bot_End(4)
1122 
1123 #define Bot_8 \
1124  Mul_Begin(8) \
1125  Mul_SaveAcc(0, 0, 1) Mul_Acc(1, 0) \
1126  Mul_SaveAcc(1, 0, 2) Mul_Acc(1, 1) Mul_Acc(2, 0) \
1127  Mul_SaveAcc(2, 0, 3) Mul_Acc(1, 2) Mul_Acc(2, 1) Mul_Acc(3, 0) \
1128  Mul_SaveAcc(3, 0, 4) Mul_Acc(1, 3) Mul_Acc(2, 2) Mul_Acc(3, 1) Mul_Acc(4, 0) \
1129  Mul_SaveAcc(4, 0, 5) Mul_Acc(1, 4) Mul_Acc(2, 3) Mul_Acc(3, 2) Mul_Acc(4, 1) Mul_Acc(5, 0) \
1130  Mul_SaveAcc(5, 0, 6) Mul_Acc(1, 5) Mul_Acc(2, 4) Mul_Acc(3, 3) Mul_Acc(4, 2) Mul_Acc(5, 1) Mul_Acc(6, 0) \
1131  Bot_SaveAcc(6, 0, 7) Bot_Acc(1, 6) Bot_Acc(2, 5) Bot_Acc(3, 4) Bot_Acc(4, 3) Bot_Acc(5, 2) Bot_Acc(6, 1) Bot_Acc(7, 0) \
1132  Bot_End(8)
1133 
1134 #define Bot_16 \
1135  Mul_Begin(16) \
1136  Mul_SaveAcc(0, 0, 1) Mul_Acc(1, 0) \
1137  Mul_SaveAcc(1, 0, 2) Mul_Acc(1, 1) Mul_Acc(2, 0) \
1138  Mul_SaveAcc(2, 0, 3) Mul_Acc(1, 2) Mul_Acc(2, 1) Mul_Acc(3, 0) \
1139  Mul_SaveAcc(3, 0, 4) Mul_Acc(1, 3) Mul_Acc(2, 2) Mul_Acc(3, 1) Mul_Acc(4, 0) \
1140  Mul_SaveAcc(4, 0, 5) Mul_Acc(1, 4) Mul_Acc(2, 3) Mul_Acc(3, 2) Mul_Acc(4, 1) Mul_Acc(5, 0) \
1141  Mul_SaveAcc(5, 0, 6) Mul_Acc(1, 5) Mul_Acc(2, 4) Mul_Acc(3, 3) Mul_Acc(4, 2) Mul_Acc(5, 1) Mul_Acc(6, 0) \
1142  Mul_SaveAcc(6, 0, 7) Mul_Acc(1, 6) Mul_Acc(2, 5) Mul_Acc(3, 4) Mul_Acc(4, 3) Mul_Acc(5, 2) Mul_Acc(6, 1) Mul_Acc(7, 0) \
1143  Mul_SaveAcc(7, 0, 8) Mul_Acc(1, 7) Mul_Acc(2, 6) Mul_Acc(3, 5) Mul_Acc(4, 4) Mul_Acc(5, 3) Mul_Acc(6, 2) Mul_Acc(7, 1) Mul_Acc(8, 0) \
1144  Mul_SaveAcc(8, 0, 9) Mul_Acc(1, 8) Mul_Acc(2, 7) Mul_Acc(3, 6) Mul_Acc(4, 5) Mul_Acc(5, 4) Mul_Acc(6, 3) Mul_Acc(7, 2) Mul_Acc(8, 1) Mul_Acc(9, 0) \
1145  Mul_SaveAcc(9, 0, 10) Mul_Acc(1, 9) Mul_Acc(2, 8) Mul_Acc(3, 7) Mul_Acc(4, 6) Mul_Acc(5, 5) Mul_Acc(6, 4) Mul_Acc(7, 3) Mul_Acc(8, 2) Mul_Acc(9, 1) Mul_Acc(10, 0) \
1146  Mul_SaveAcc(10, 0, 11) Mul_Acc(1, 10) Mul_Acc(2, 9) Mul_Acc(3, 8) Mul_Acc(4, 7) Mul_Acc(5, 6) Mul_Acc(6, 5) Mul_Acc(7, 4) Mul_Acc(8, 3) Mul_Acc(9, 2) Mul_Acc(10, 1) Mul_Acc(11, 0) \
1147  Mul_SaveAcc(11, 0, 12) Mul_Acc(1, 11) Mul_Acc(2, 10) Mul_Acc(3, 9) Mul_Acc(4, 8) Mul_Acc(5, 7) Mul_Acc(6, 6) Mul_Acc(7, 5) Mul_Acc(8, 4) Mul_Acc(9, 3) Mul_Acc(10, 2) Mul_Acc(11, 1) Mul_Acc(12, 0) \
1148  Mul_SaveAcc(12, 0, 13) Mul_Acc(1, 12) Mul_Acc(2, 11) Mul_Acc(3, 10) Mul_Acc(4, 9) Mul_Acc(5, 8) Mul_Acc(6, 7) Mul_Acc(7, 6) Mul_Acc(8, 5) Mul_Acc(9, 4) Mul_Acc(10, 3) Mul_Acc(11, 2) Mul_Acc(12, 1) Mul_Acc(13, 0) \
1149  Mul_SaveAcc(13, 0, 14) Mul_Acc(1, 13) Mul_Acc(2, 12) Mul_Acc(3, 11) Mul_Acc(4, 10) Mul_Acc(5, 9) Mul_Acc(6, 8) Mul_Acc(7, 7) Mul_Acc(8, 6) Mul_Acc(9, 5) Mul_Acc(10, 4) Mul_Acc(11, 3) Mul_Acc(12, 2) Mul_Acc(13, 1) Mul_Acc(14, 0) \
1150  Bot_SaveAcc(14, 0, 15) Bot_Acc(1, 14) Bot_Acc(2, 13) Bot_Acc(3, 12) Bot_Acc(4, 11) Bot_Acc(5, 10) Bot_Acc(6, 9) Bot_Acc(7, 8) Bot_Acc(8, 7) Bot_Acc(9, 6) Bot_Acc(10, 5) Bot_Acc(11, 4) Bot_Acc(12, 3) Bot_Acc(13, 2) Bot_Acc(14, 1) Bot_Acc(15, 0) \
1151  Bot_End(16)
1152 
1153 #endif
1154 
1155 #if 0
1156 #define Mul_Begin(n) \
1157  Declare2Words(p) \
1158  Declare2Words(c) \
1159  Declare2Words(d) \
1160  MultiplyWords(p, A[0], B[0]) \
1161  AssignWord(c, LowWord(p)) \
1162  AssignWord(d, HighWord(p))
1163 
1164 #define Mul_Acc(i, j) \
1165  MultiplyWords(p, A[i], B[j]) \
1166  Acc2WordsBy1(c, LowWord(p)) \
1167  Acc2WordsBy1(d, HighWord(p))
1168 
1169 #define Mul_SaveAcc(k, i, j) \
1170  R[k] = LowWord(c); \
1171  Add2WordsBy1(c, d, HighWord(c)) \
1172  MultiplyWords(p, A[i], B[j]) \
1173  AssignWord(d, HighWord(p)) \
1174  Acc2WordsBy1(c, LowWord(p))
1175 
1176 #define Mul_End(n) \
1177  R[2*n-3] = LowWord(c); \
1178  Acc2WordsBy1(d, HighWord(c)) \
1179  MultiplyWords(p, A[n-1], B[n-1])\
1180  Acc2WordsBy2(d, p) \
1181  R[2*n-2] = LowWord(d); \
1182  R[2*n-1] = HighWord(d);
1183 
1184 #define Bot_SaveAcc(k, i, j) \
1185  R[k] = LowWord(c); \
1186  word e = LowWord(d) + HighWord(c); \
1187  e += A[i] * B[j];
1188 
1189 #define Bot_Acc(i, j) \
1190  e += A[i] * B[j];
1191 
1192 #define Bot_End(n) \
1193  R[n-1] = e;
1194 #else
1195 #define Mul_Begin(n) \
1196  Declare2Words(p) \
1197  word c; \
1198  Declare2Words(d) \
1199  MultiplyWords(p, A[0], B[0]) \
1200  c = LowWord(p); \
1201  AssignWord(d, HighWord(p))
1202 
1203 #define Mul_Acc(i, j) \
1204  MulAcc(c, d, A[i], B[j])
1205 
1206 #define Mul_SaveAcc(k, i, j) \
1207  R[k] = c; \
1208  c = LowWord(d); \
1209  AssignWord(d, HighWord(d)) \
1210  MulAcc(c, d, A[i], B[j])
1211 
1212 #define Mul_End(k, i) \
1213  R[k] = c; \
1214  MultiplyWords(p, A[i], B[i]) \
1215  Acc2WordsBy2(p, d) \
1216  R[k+1] = LowWord(p); \
1217  R[k+2] = HighWord(p);
1218 
1219 #define Bot_SaveAcc(k, i, j) \
1220  R[k] = c; \
1221  c = LowWord(d); \
1222  c += A[i] * B[j];
1223 
1224 #define Bot_Acc(i, j) \
1225  c += A[i] * B[j];
1226 
1227 #define Bot_End(n) \
1228  R[n-1] = c;
1229 #endif
1230 
1231 #define Squ_Begin(n) \
1232  Declare2Words(p) \
1233  word c; \
1234  Declare2Words(d) \
1235  Declare2Words(e) \
1236  MultiplyWords(p, A[0], A[0]) \
1237  R[0] = LowWord(p); \
1238  AssignWord(e, HighWord(p)) \
1239  MultiplyWords(p, A[0], A[1]) \
1240  c = LowWord(p); \
1241  AssignWord(d, HighWord(p)) \
1242  Squ_NonDiag \
1243 
1244 #define Squ_NonDiag \
1245  Double3Words(c, d)
1246 
1247 #define Squ_SaveAcc(k, i, j) \
1248  Acc3WordsBy2(c, d, e) \
1249  R[k] = c; \
1250  MultiplyWords(p, A[i], A[j]) \
1251  c = LowWord(p); \
1252  AssignWord(d, HighWord(p)) \
1253 
1254 #define Squ_Acc(i, j) \
1255  MulAcc(c, d, A[i], A[j])
1256 
1257 #define Squ_Diag(i) \
1258  Squ_NonDiag \
1259  MulAcc(c, d, A[i], A[i])
1260 
1261 #define Squ_End(n) \
1262  Acc3WordsBy2(c, d, e) \
1263  R[2*n-3] = c; \
1264  MultiplyWords(p, A[n-1], A[n-1])\
1265  Acc2WordsBy2(p, e) \
1266  R[2*n-2] = LowWord(p); \
1267  R[2*n-1] = HighWord(p);
1268 
1269 
1270 void Baseline_Multiply2(word *R, const word *AA, const word *BB)
1271 {
1272  // http://github.com/weidai11/cryptopp/issues/188
1275 
1276  Mul_2
1277 }
1278 
1279 void Baseline_Multiply4(word *R, const word *AA, const word *BB)
1280 {
1281  // http://github.com/weidai11/cryptopp/issues/188
1284 
1285  Mul_4
1286 }
1287 
1288 void Baseline_Multiply8(word *R, const word *AA, const word *BB)
1289 {
1290  // http://github.com/weidai11/cryptopp/issues/188
1293 
1294  Mul_8
1295 }
1296 
1297 void Baseline_Square2(word *R, const word *AA)
1298 {
1299  // http://github.com/weidai11/cryptopp/issues/188
1301 
1302  Squ_2
1303 }
1304 
1305 void Baseline_Square4(word *R, const word *AA)
1306 {
1307  // http://github.com/weidai11/cryptopp/issues/188
1309 
1310  Squ_4
1311 }
1312 
1313 void Baseline_Square8(word *R, const word *AA)
1314 {
1315  // http://github.com/weidai11/cryptopp/issues/188
1317 
1318  Squ_8
1319 }
1320 
1321 void Baseline_MultiplyBottom2(word *R, const word *AA, const word *BB)
1322 {
1323  // http://github.com/weidai11/cryptopp/issues/188
1326 
1327  Bot_2
1328 
1329 // http://github.com/weidai11/cryptopp/issues/340
1330 #if defined(TWO_64_BIT_WORDS)
1331  CRYPTOPP_UNUSED(d0); CRYPTOPP_UNUSED(d1);
1332 #endif
1333 }
1334 
1335 void Baseline_MultiplyBottom4(word *R, const word *AA, const word *BB)
1336 {
1337  // http://github.com/weidai11/cryptopp/issues/188
1340 
1341  Bot_4
1342 }
1343 
1344 void Baseline_MultiplyBottom8(word *R, const word *AA, const word *BB)
1345 {
1346  // http://github.com/weidai11/cryptopp/issues/188
1349 
1350  Bot_8
1351 }
1352 
1353 #define Top_Begin(n) \
1354  Declare2Words(p) \
1355  word c; \
1356  Declare2Words(d) \
1357  MultiplyWords(p, A[0], B[n-2]);\
1358  AssignWord(d, HighWord(p));
1359 
1360 #define Top_Acc(i, j) \
1361  MultiplyWords(p, A[i], B[j]);\
1362  Acc2WordsBy1(d, HighWord(p));
1363 
1364 #define Top_SaveAcc0(i, j) \
1365  c = LowWord(d); \
1366  AssignWord(d, HighWord(d)) \
1367  MulAcc(c, d, A[i], B[j])
1368 
1369 #define Top_SaveAcc1(i, j) \
1370  c = L<c; \
1371  Acc2WordsBy1(d, c); \
1372  c = LowWord(d); \
1373  AssignWord(d, HighWord(d)) \
1374  MulAcc(c, d, A[i], B[j])
1375 
1376 void Baseline_MultiplyTop2(word *R, const word *A, const word *B, word L)
1377 {
1378  CRYPTOPP_UNUSED(L);
1379  word T[4];
1380  Baseline_Multiply2(T, A, B);
1381  R[0] = T[2];
1382  R[1] = T[3];
1383 }
1384 
1385 void Baseline_MultiplyTop4(word *R, const word *AA, const word *BB, word L)
1386 {
1387  // http://github.com/weidai11/cryptopp/issues/188
1390 
1391  Top_Begin(4)
1392  Top_Acc(1, 1) Top_Acc(2, 0) \
1393  Top_SaveAcc0(0, 3) Mul_Acc(1, 2) Mul_Acc(2, 1) Mul_Acc(3, 0) \
1394  Top_SaveAcc1(1, 3) Mul_Acc(2, 2) Mul_Acc(3, 1) \
1395  Mul_SaveAcc(0, 2, 3) Mul_Acc(3, 2) \
1396  Mul_End(1, 3)
1397 }
1398 
1399 void Baseline_MultiplyTop8(word *R, const word *AA, const word *BB, word L)
1400 {
1401  // http://github.com/weidai11/cryptopp/issues/188
1404 
1405  Top_Begin(8)
1406  Top_Acc(1, 5) Top_Acc(2, 4) Top_Acc(3, 3) Top_Acc(4, 2) Top_Acc(5, 1) Top_Acc(6, 0) \
1407  Top_SaveAcc0(0, 7) Mul_Acc(1, 6) Mul_Acc(2, 5) Mul_Acc(3, 4) Mul_Acc(4, 3) Mul_Acc(5, 2) Mul_Acc(6, 1) Mul_Acc(7, 0) \
1408  Top_SaveAcc1(1, 7) Mul_Acc(2, 6) Mul_Acc(3, 5) Mul_Acc(4, 4) Mul_Acc(5, 3) Mul_Acc(6, 2) Mul_Acc(7, 1) \
1409  Mul_SaveAcc(0, 2, 7) Mul_Acc(3, 6) Mul_Acc(4, 5) Mul_Acc(5, 4) Mul_Acc(6, 3) Mul_Acc(7, 2) \
1410  Mul_SaveAcc(1, 3, 7) Mul_Acc(4, 6) Mul_Acc(5, 5) Mul_Acc(6, 4) Mul_Acc(7, 3) \
1411  Mul_SaveAcc(2, 4, 7) Mul_Acc(5, 6) Mul_Acc(6, 5) Mul_Acc(7, 4) \
1412  Mul_SaveAcc(3, 5, 7) Mul_Acc(6, 6) Mul_Acc(7, 5) \
1413  Mul_SaveAcc(4, 6, 7) Mul_Acc(7, 6) \
1414  Mul_End(5, 7)
1415 }
1416 
1417 #if !CRYPTOPP_INTEGER_SSE2 // save memory by not compiling these functions when SSE2 is available
1418 void Baseline_Multiply16(word *R, const word *AA, const word *BB)
1419 {
1420  // http://github.com/weidai11/cryptopp/issues/188
1423 
1424  Mul_16
1425 }
1426 
1427 void Baseline_Square16(word *R, const word *AA)
1428 {
1429  // http://github.com/weidai11/cryptopp/issues/188
1431 
1432  Squ_16
1433 }
1434 
1435 void Baseline_MultiplyBottom16(word *R, const word *AA, const word *BB)
1436 {
1437  // http://github.com/weidai11/cryptopp/issues/188
1440 
1441  Bot_16
1442 }
1443 
1444 void Baseline_MultiplyTop16(word *R, const word *AA, const word *BB, word L)
1445 {
1446  // http://github.com/weidai11/cryptopp/issues/188
1449 
1450  Top_Begin(16)
1451  Top_Acc(1, 13) Top_Acc(2, 12) Top_Acc(3, 11) Top_Acc(4, 10) Top_Acc(5, 9) Top_Acc(6, 8) Top_Acc(7, 7) Top_Acc(8, 6) Top_Acc(9, 5) Top_Acc(10, 4) Top_Acc(11, 3) Top_Acc(12, 2) Top_Acc(13, 1) Top_Acc(14, 0) \
1452  Top_SaveAcc0(0, 15) Mul_Acc(1, 14) Mul_Acc(2, 13) Mul_Acc(3, 12) Mul_Acc(4, 11) Mul_Acc(5, 10) Mul_Acc(6, 9) Mul_Acc(7, 8) Mul_Acc(8, 7) Mul_Acc(9, 6) Mul_Acc(10, 5) Mul_Acc(11, 4) Mul_Acc(12, 3) Mul_Acc(13, 2) Mul_Acc(14, 1) Mul_Acc(15, 0) \
1453  Top_SaveAcc1(1, 15) Mul_Acc(2, 14) Mul_Acc(3, 13) Mul_Acc(4, 12) Mul_Acc(5, 11) Mul_Acc(6, 10) Mul_Acc(7, 9) Mul_Acc(8, 8) Mul_Acc(9, 7) Mul_Acc(10, 6) Mul_Acc(11, 5) Mul_Acc(12, 4) Mul_Acc(13, 3) Mul_Acc(14, 2) Mul_Acc(15, 1) \
1454  Mul_SaveAcc(0, 2, 15) Mul_Acc(3, 14) Mul_Acc(4, 13) Mul_Acc(5, 12) Mul_Acc(6, 11) Mul_Acc(7, 10) Mul_Acc(8, 9) Mul_Acc(9, 8) Mul_Acc(10, 7) Mul_Acc(11, 6) Mul_Acc(12, 5) Mul_Acc(13, 4) Mul_Acc(14, 3) Mul_Acc(15, 2) \
1455  Mul_SaveAcc(1, 3, 15) Mul_Acc(4, 14) Mul_Acc(5, 13) Mul_Acc(6, 12) Mul_Acc(7, 11) Mul_Acc(8, 10) Mul_Acc(9, 9) Mul_Acc(10, 8) Mul_Acc(11, 7) Mul_Acc(12, 6) Mul_Acc(13, 5) Mul_Acc(14, 4) Mul_Acc(15, 3) \
1456  Mul_SaveAcc(2, 4, 15) Mul_Acc(5, 14) Mul_Acc(6, 13) Mul_Acc(7, 12) Mul_Acc(8, 11) Mul_Acc(9, 10) Mul_Acc(10, 9) Mul_Acc(11, 8) Mul_Acc(12, 7) Mul_Acc(13, 6) Mul_Acc(14, 5) Mul_Acc(15, 4) \
1457  Mul_SaveAcc(3, 5, 15) Mul_Acc(6, 14) Mul_Acc(7, 13) Mul_Acc(8, 12) Mul_Acc(9, 11) Mul_Acc(10, 10) Mul_Acc(11, 9) Mul_Acc(12, 8) Mul_Acc(13, 7) Mul_Acc(14, 6) Mul_Acc(15, 5) \
1458  Mul_SaveAcc(4, 6, 15) Mul_Acc(7, 14) Mul_Acc(8, 13) Mul_Acc(9, 12) Mul_Acc(10, 11) Mul_Acc(11, 10) Mul_Acc(12, 9) Mul_Acc(13, 8) Mul_Acc(14, 7) Mul_Acc(15, 6) \
1459  Mul_SaveAcc(5, 7, 15) Mul_Acc(8, 14) Mul_Acc(9, 13) Mul_Acc(10, 12) Mul_Acc(11, 11) Mul_Acc(12, 10) Mul_Acc(13, 9) Mul_Acc(14, 8) Mul_Acc(15, 7) \
1460  Mul_SaveAcc(6, 8, 15) Mul_Acc(9, 14) Mul_Acc(10, 13) Mul_Acc(11, 12) Mul_Acc(12, 11) Mul_Acc(13, 10) Mul_Acc(14, 9) Mul_Acc(15, 8) \
1461  Mul_SaveAcc(7, 9, 15) Mul_Acc(10, 14) Mul_Acc(11, 13) Mul_Acc(12, 12) Mul_Acc(13, 11) Mul_Acc(14, 10) Mul_Acc(15, 9) \
1462  Mul_SaveAcc(8, 10, 15) Mul_Acc(11, 14) Mul_Acc(12, 13) Mul_Acc(13, 12) Mul_Acc(14, 11) Mul_Acc(15, 10) \
1463  Mul_SaveAcc(9, 11, 15) Mul_Acc(12, 14) Mul_Acc(13, 13) Mul_Acc(14, 12) Mul_Acc(15, 11) \
1464  Mul_SaveAcc(10, 12, 15) Mul_Acc(13, 14) Mul_Acc(14, 13) Mul_Acc(15, 12) \
1465  Mul_SaveAcc(11, 13, 15) Mul_Acc(14, 14) Mul_Acc(15, 13) \
1466  Mul_SaveAcc(12, 14, 15) Mul_Acc(15, 14) \
1467  Mul_End(13, 15)
1468 }
1469 #endif
1470 
1471 // ********************************************************
1472 
1473 #if CRYPTOPP_INTEGER_SSE2
1474 
1475 CRYPTOPP_ALIGN_DATA(16)
1477 const word32 s_maskLow16[4] = {
1478  0xffff,0xffff,0xffff,0xffff
1479 };
1480 
1481 #undef Mul_Begin
1482 #undef Mul_Acc
1483 #undef Top_Begin
1484 #undef Top_Acc
1485 #undef Squ_Acc
1486 #undef Squ_NonDiag
1487 #undef Squ_Diag
1488 #undef Squ_SaveAcc
1489 #undef Squ_Begin
1490 #undef Mul_SaveAcc
1491 #undef Bot_Acc
1492 #undef Bot_SaveAcc
1493 #undef Bot_End
1494 #undef Squ_End
1495 #undef Mul_End
1496 
1497 #define SSE2_FinalSave(k) \
1498  AS2( psllq xmm5, 16) \
1499  AS2( paddq xmm4, xmm5) \
1500  AS2( movq QWORD PTR [ecx+8*(k)], xmm4)
1501 
1502 #define SSE2_SaveShift(k) \
1503  AS2( movq xmm0, xmm6) \
1504  AS2( punpckhqdq xmm6, xmm0) \
1505  AS2( movq xmm1, xmm7) \
1506  AS2( punpckhqdq xmm7, xmm1) \
1507  AS2( paddd xmm6, xmm0) \
1508  AS2( pslldq xmm6, 4) \
1509  AS2( paddd xmm7, xmm1) \
1510  AS2( paddd xmm4, xmm6) \
1511  AS2( pslldq xmm7, 4) \
1512  AS2( movq xmm6, xmm4) \
1513  AS2( paddd xmm5, xmm7) \
1514  AS2( movq xmm7, xmm5) \
1515  AS2( movd DWORD PTR [ecx+8*(k)], xmm4) \
1516  AS2( psrlq xmm6, 16) \
1517  AS2( paddq xmm6, xmm7) \
1518  AS2( punpckhqdq xmm4, xmm0) \
1519  AS2( punpckhqdq xmm5, xmm0) \
1520  AS2( movq QWORD PTR [ecx+8*(k)+2], xmm6) \
1521  AS2( psrlq xmm6, 3*16) \
1522  AS2( paddd xmm4, xmm6) \
1523 
1524 #define Squ_SSE2_SaveShift(k) \
1525  AS2( movq xmm0, xmm6) \
1526  AS2( punpckhqdq xmm6, xmm0) \
1527  AS2( movq xmm1, xmm7) \
1528  AS2( punpckhqdq xmm7, xmm1) \
1529  AS2( paddd xmm6, xmm0) \
1530  AS2( pslldq xmm6, 4) \
1531  AS2( paddd xmm7, xmm1) \
1532  AS2( paddd xmm4, xmm6) \
1533  AS2( pslldq xmm7, 4) \
1534  AS2( movhlps xmm6, xmm4) \
1535  AS2( movd DWORD PTR [ecx+8*(k)], xmm4) \
1536  AS2( paddd xmm5, xmm7) \
1537  AS2( movhps QWORD PTR [esp+12], xmm5)\
1538  AS2( psrlq xmm4, 16) \
1539  AS2( paddq xmm4, xmm5) \
1540  AS2( movq QWORD PTR [ecx+8*(k)+2], xmm4) \
1541  AS2( psrlq xmm4, 3*16) \
1542  AS2( paddd xmm4, xmm6) \
1543  AS2( movq QWORD PTR [esp+4], xmm4)\
1544 
1545 #define SSE2_FirstMultiply(i) \
1546  AS2( movdqa xmm7, [esi+(i)*16])\
1547  AS2( movdqa xmm5, [edi-(i)*16])\
1548  AS2( pmuludq xmm5, xmm7) \
1549  AS2( movdqa xmm4, [ebx])\
1550  AS2( movdqa xmm6, xmm4) \
1551  AS2( pand xmm4, xmm5) \
1552  AS2( psrld xmm5, 16) \
1553  AS2( pmuludq xmm7, [edx-(i)*16])\
1554  AS2( pand xmm6, xmm7) \
1555  AS2( psrld xmm7, 16)
1556 
1557 #define Squ_Begin(n) \
1558  SquPrologue \
1559  AS2( mov esi, esp)\
1560  AS2( and esp, 0xfffffff0)\
1561  AS2( lea edi, [esp-32*n])\
1562  AS2( sub esp, 32*n+16)\
1563  AS1( push esi)\
1564  AS2( mov esi, edi) \
1565  AS2( xor edx, edx) \
1566  ASL(1) \
1567  ASS( pshufd xmm0, [eax+edx], 3,1,2,0) \
1568  ASS( pshufd xmm1, [eax+edx], 2,0,3,1) \
1569  AS2( movdqa [edi+2*edx], xmm0) \
1570  AS2( psrlq xmm0, 32) \
1571  AS2( movdqa [edi+2*edx+16], xmm0) \
1572  AS2( movdqa [edi+16*n+2*edx], xmm1) \
1573  AS2( psrlq xmm1, 32) \
1574  AS2( movdqa [edi+16*n+2*edx+16], xmm1) \
1575  AS2( add edx, 16) \
1576  AS2( cmp edx, 8*(n)) \
1577  ASJ( jne, 1, b) \
1578  AS2( lea edx, [edi+16*n])\
1579  SSE2_FirstMultiply(0) \
1580 
1581 #define Squ_Acc(i) \
1582  ASL(LSqu##i) \
1583  AS2( movdqa xmm1, [esi+(i)*16]) \
1584  AS2( movdqa xmm0, [edi-(i)*16]) \
1585  AS2( movdqa xmm2, [ebx]) \
1586  AS2( pmuludq xmm0, xmm1) \
1587  AS2( pmuludq xmm1, [edx-(i)*16]) \
1588  AS2( movdqa xmm3, xmm2) \
1589  AS2( pand xmm2, xmm0) \
1590  AS2( psrld xmm0, 16) \
1591  AS2( paddd xmm4, xmm2) \
1592  AS2( paddd xmm5, xmm0) \
1593  AS2( pand xmm3, xmm1) \
1594  AS2( psrld xmm1, 16) \
1595  AS2( paddd xmm6, xmm3) \
1596  AS2( paddd xmm7, xmm1) \
1597 
1598 #define Squ_Acc1(i)
1599 #define Squ_Acc2(i) ASC(call, LSqu##i)
1600 #define Squ_Acc3(i) Squ_Acc2(i)
1601 #define Squ_Acc4(i) Squ_Acc2(i)
1602 #define Squ_Acc5(i) Squ_Acc2(i)
1603 #define Squ_Acc6(i) Squ_Acc2(i)
1604 #define Squ_Acc7(i) Squ_Acc2(i)
1605 #define Squ_Acc8(i) Squ_Acc2(i)
1606 
1607 #define SSE2_End(E, n) \
1608  SSE2_SaveShift(2*(n)-3) \
1609  AS2( movdqa xmm7, [esi+16]) \
1610  AS2( movdqa xmm0, [edi]) \
1611  AS2( pmuludq xmm0, xmm7) \
1612  AS2( movdqa xmm2, [ebx]) \
1613  AS2( pmuludq xmm7, [edx]) \
1614  AS2( movdqa xmm6, xmm2) \
1615  AS2( pand xmm2, xmm0) \
1616  AS2( psrld xmm0, 16) \
1617  AS2( paddd xmm4, xmm2) \
1618  AS2( paddd xmm5, xmm0) \
1619  AS2( pand xmm6, xmm7) \
1620  AS2( psrld xmm7, 16) \
1621  SSE2_SaveShift(2*(n)-2) \
1622  SSE2_FinalSave(2*(n)-1) \
1623  AS1( pop esp)\
1624  E
1625 
1626 #define Squ_End(n) SSE2_End(SquEpilogue, n)
1627 #define Mul_End(n) SSE2_End(MulEpilogue, n)
1628 #define Top_End(n) SSE2_End(TopEpilogue, n)
1629 
1630 #define Squ_Column1(k, i) \
1631  Squ_SSE2_SaveShift(k) \
1632  AS2( add esi, 16) \
1633  SSE2_FirstMultiply(1)\
1634  Squ_Acc##i(i) \
1635  AS2( paddd xmm4, xmm4) \
1636  AS2( paddd xmm5, xmm5) \
1637  AS2( movdqa xmm3, [esi]) \
1638  AS2( movq xmm1, QWORD PTR [esi+8]) \
1639  AS2( pmuludq xmm1, xmm3) \
1640  AS2( pmuludq xmm3, xmm3) \
1641  AS2( movdqa xmm0, [ebx])\
1642  AS2( movdqa xmm2, xmm0) \
1643  AS2( pand xmm0, xmm1) \
1644  AS2( psrld xmm1, 16) \
1645  AS2( paddd xmm6, xmm0) \
1646  AS2( paddd xmm7, xmm1) \
1647  AS2( pand xmm2, xmm3) \
1648  AS2( psrld xmm3, 16) \
1649  AS2( paddd xmm6, xmm6) \
1650  AS2( paddd xmm7, xmm7) \
1651  AS2( paddd xmm4, xmm2) \
1652  AS2( paddd xmm5, xmm3) \
1653  AS2( movq xmm0, QWORD PTR [esp+4])\
1654  AS2( movq xmm1, QWORD PTR [esp+12])\
1655  AS2( paddd xmm4, xmm0)\
1656  AS2( paddd xmm5, xmm1)\
1657 
1658 #define Squ_Column0(k, i) \
1659  Squ_SSE2_SaveShift(k) \
1660  AS2( add edi, 16) \
1661  AS2( add edx, 16) \
1662  SSE2_FirstMultiply(1)\
1663  Squ_Acc##i(i) \
1664  AS2( paddd xmm6, xmm6) \
1665  AS2( paddd xmm7, xmm7) \
1666  AS2( paddd xmm4, xmm4) \
1667  AS2( paddd xmm5, xmm5) \
1668  AS2( movq xmm0, QWORD PTR [esp+4])\
1669  AS2( movq xmm1, QWORD PTR [esp+12])\
1670  AS2( paddd xmm4, xmm0)\
1671  AS2( paddd xmm5, xmm1)\
1672 
1673 #define SSE2_MulAdd45 \
1674  AS2( movdqa xmm7, [esi]) \
1675  AS2( movdqa xmm0, [edi]) \
1676  AS2( pmuludq xmm0, xmm7) \
1677  AS2( movdqa xmm2, [ebx]) \
1678  AS2( pmuludq xmm7, [edx]) \
1679  AS2( movdqa xmm6, xmm2) \
1680  AS2( pand xmm2, xmm0) \
1681  AS2( psrld xmm0, 16) \
1682  AS2( paddd xmm4, xmm2) \
1683  AS2( paddd xmm5, xmm0) \
1684  AS2( pand xmm6, xmm7) \
1685  AS2( psrld xmm7, 16)
1686 
1687 #define Mul_Begin(n) \
1688  MulPrologue \
1689  AS2( mov esi, esp)\
1690  AS2( and esp, 0xfffffff0)\
1691  AS2( sub esp, 48*n+16)\
1692  AS1( push esi)\
1693  AS2( xor edx, edx) \
1694  ASL(1) \
1695  ASS( pshufd xmm0, [eax+edx], 3,1,2,0) \
1696  ASS( pshufd xmm1, [eax+edx], 2,0,3,1) \
1697  ASS( pshufd xmm2, [edi+edx], 3,1,2,0) \
1698  AS2( movdqa [esp+20+2*edx], xmm0) \
1699  AS2( psrlq xmm0, 32) \
1700  AS2( movdqa [esp+20+2*edx+16], xmm0) \
1701  AS2( movdqa [esp+20+16*n+2*edx], xmm1) \
1702  AS2( psrlq xmm1, 32) \
1703  AS2( movdqa [esp+20+16*n+2*edx+16], xmm1) \
1704  AS2( movdqa [esp+20+32*n+2*edx], xmm2) \
1705  AS2( psrlq xmm2, 32) \
1706  AS2( movdqa [esp+20+32*n+2*edx+16], xmm2) \
1707  AS2( add edx, 16) \
1708  AS2( cmp edx, 8*(n)) \
1709  ASJ( jne, 1, b) \
1710  AS2( lea edi, [esp+20])\
1711  AS2( lea edx, [esp+20+16*n])\
1712  AS2( lea esi, [esp+20+32*n])\
1713  SSE2_FirstMultiply(0) \
1714 
1715 #define Mul_Acc(i) \
1716  ASL(LMul##i) \
1717  AS2( movdqa xmm1, [esi+i/2*(1-(i-2*(i/2))*2)*16]) \
1718  AS2( movdqa xmm0, [edi-i/2*(1-(i-2*(i/2))*2)*16]) \
1719  AS2( movdqa xmm2, [ebx]) \
1720  AS2( pmuludq xmm0, xmm1) \
1721  AS2( pmuludq xmm1, [edx-i/2*(1-(i-2*(i/2))*2)*16]) \
1722  AS2( movdqa xmm3, xmm2) \
1723  AS2( pand xmm2, xmm0) \
1724  AS2( psrld xmm0, 16) \
1725  AS2( paddd xmm4, xmm2) \
1726  AS2( paddd xmm5, xmm0) \
1727  AS2( pand xmm3, xmm1) \
1728  AS2( psrld xmm1, 16) \
1729  AS2( paddd xmm6, xmm3) \
1730  AS2( paddd xmm7, xmm1) \
1731 
1732 #define Mul_Acc1(i)
1733 #define Mul_Acc2(i) ASC(call, LMul##i)
1734 #define Mul_Acc3(i) Mul_Acc2(i)
1735 #define Mul_Acc4(i) Mul_Acc2(i)
1736 #define Mul_Acc5(i) Mul_Acc2(i)
1737 #define Mul_Acc6(i) Mul_Acc2(i)
1738 #define Mul_Acc7(i) Mul_Acc2(i)
1739 #define Mul_Acc8(i) Mul_Acc2(i)
1740 #define Mul_Acc9(i) Mul_Acc2(i)
1741 #define Mul_Acc10(i) Mul_Acc2(i)
1742 #define Mul_Acc11(i) Mul_Acc2(i)
1743 #define Mul_Acc12(i) Mul_Acc2(i)
1744 #define Mul_Acc13(i) Mul_Acc2(i)
1745 #define Mul_Acc14(i) Mul_Acc2(i)
1746 #define Mul_Acc15(i) Mul_Acc2(i)
1747 #define Mul_Acc16(i) Mul_Acc2(i)
1748 
1749 #define Mul_Column1(k, i) \
1750  SSE2_SaveShift(k) \
1751  AS2( add esi, 16) \
1752  SSE2_MulAdd45\
1753  Mul_Acc##i(i) \
1754 
1755 #define Mul_Column0(k, i) \
1756  SSE2_SaveShift(k) \
1757  AS2( add edi, 16) \
1758  AS2( add edx, 16) \
1759  SSE2_MulAdd45\
1760  Mul_Acc##i(i) \
1761 
1762 #define Bot_Acc(i) \
1763  AS2( movdqa xmm1, [esi+i/2*(1-(i-2*(i/2))*2)*16]) \
1764  AS2( movdqa xmm0, [edi-i/2*(1-(i-2*(i/2))*2)*16]) \
1765  AS2( pmuludq xmm0, xmm1) \
1766  AS2( pmuludq xmm1, [edx-i/2*(1-(i-2*(i/2))*2)*16]) \
1767  AS2( paddq xmm4, xmm0) \
1768  AS2( paddd xmm6, xmm1)
1769 
1770 #define Bot_SaveAcc(k) \
1771  SSE2_SaveShift(k) \
1772  AS2( add edi, 16) \
1773  AS2( add edx, 16) \
1774  AS2( movdqa xmm6, [esi]) \
1775  AS2( movdqa xmm0, [edi]) \
1776  AS2( pmuludq xmm0, xmm6) \
1777  AS2( paddq xmm4, xmm0) \
1778  AS2( psllq xmm5, 16) \
1779  AS2( paddq xmm4, xmm5) \
1780  AS2( pmuludq xmm6, [edx])
1781 
1782 #define Bot_End(n) \
1783  AS2( movhlps xmm7, xmm6) \
1784  AS2( paddd xmm6, xmm7) \
1785  AS2( psllq xmm6, 32) \
1786  AS2( paddd xmm4, xmm6) \
1787  AS2( movq QWORD PTR [ecx+8*((n)-1)], xmm4) \
1788  AS1( pop esp)\
1789  MulEpilogue
1790 
1791 #define Top_Begin(n) \
1792  TopPrologue \
1793  AS2( mov edx, esp)\
1794  AS2( and esp, 0xfffffff0)\
1795  AS2( sub esp, 48*n+16)\
1796  AS1( push edx)\
1797  AS2( xor edx, edx) \
1798  ASL(1) \
1799  ASS( pshufd xmm0, [eax+edx], 3,1,2,0) \
1800  ASS( pshufd xmm1, [eax+edx], 2,0,3,1) \
1801  ASS( pshufd xmm2, [edi+edx], 3,1,2,0) \
1802  AS2( movdqa [esp+20+2*edx], xmm0) \
1803  AS2( psrlq xmm0, 32) \
1804  AS2( movdqa [esp+20+2*edx+16], xmm0) \
1805  AS2( movdqa [esp+20+16*n+2*edx], xmm1) \
1806  AS2( psrlq xmm1, 32) \
1807  AS2( movdqa [esp+20+16*n+2*edx+16], xmm1) \
1808  AS2( movdqa [esp+20+32*n+2*edx], xmm2) \
1809  AS2( psrlq xmm2, 32) \
1810  AS2( movdqa [esp+20+32*n+2*edx+16], xmm2) \
1811  AS2( add edx, 16) \
1812  AS2( cmp edx, 8*(n)) \
1813  ASJ( jne, 1, b) \
1814  AS2( mov eax, esi) \
1815  AS2( lea edi, [esp+20+00*n+16*(n/2-1)])\
1816  AS2( lea edx, [esp+20+16*n+16*(n/2-1)])\
1817  AS2( lea esi, [esp+20+32*n+16*(n/2-1)])\
1818  AS2( pxor xmm4, xmm4)\
1819  AS2( pxor xmm5, xmm5)
1820 
1821 #define Top_Acc(i) \
1822  AS2( movq xmm0, QWORD PTR [esi+i/2*(1-(i-2*(i/2))*2)*16+8]) \
1823  AS2( pmuludq xmm0, [edx-i/2*(1-(i-2*(i/2))*2)*16]) \
1824  AS2( psrlq xmm0, 48) \
1825  AS2( paddd xmm5, xmm0)\
1826 
1827 #define Top_Column0(i) \
1828  AS2( psllq xmm5, 32) \
1829  AS2( add edi, 16) \
1830  AS2( add edx, 16) \
1831  SSE2_MulAdd45\
1832  Mul_Acc##i(i) \
1833 
1834 #define Top_Column1(i) \
1835  SSE2_SaveShift(0) \
1836  AS2( add esi, 16) \
1837  SSE2_MulAdd45\
1838  Mul_Acc##i(i) \
1839  AS2( shr eax, 16) \
1840  AS2( movd xmm0, eax)\
1841  AS2( movd xmm1, [ecx+4])\
1842  AS2( psrld xmm1, 16)\
1843  AS2( pcmpgtd xmm1, xmm0)\
1844  AS2( psrld xmm1, 31)\
1845  AS2( paddd xmm4, xmm1)\
1846 
1847 void SSE2_Square4(word *C, const word *A)
1848 {
1849  Squ_Begin(2)
1850  Squ_Column0(0, 1)
1851  Squ_End(2)
1852 }
1853 
1854 void SSE2_Square8(word *C, const word *A)
1855 {
1856  Squ_Begin(4)
1857 #ifndef __GNUC__
1858  ASJ( jmp, 0, f)
1859  Squ_Acc(2)
1860  AS1( ret) ASL(0)
1861 #endif
1862  Squ_Column0(0, 1)
1863  Squ_Column1(1, 1)
1864  Squ_Column0(2, 2)
1865  Squ_Column1(3, 1)
1866  Squ_Column0(4, 1)
1867  Squ_End(4)
1868 }
1869 
1870 void SSE2_Square16(word *C, const word *A)
1871 {
1872  Squ_Begin(8)
1873 #ifndef __GNUC__
1874  ASJ( jmp, 0, f)
1875  Squ_Acc(4) Squ_Acc(3) Squ_Acc(2)
1876  AS1( ret) ASL(0)
1877 #endif
1878  Squ_Column0(0, 1)
1879  Squ_Column1(1, 1)
1880  Squ_Column0(2, 2)
1881  Squ_Column1(3, 2)
1882  Squ_Column0(4, 3)
1883  Squ_Column1(5, 3)
1884  Squ_Column0(6, 4)
1885  Squ_Column1(7, 3)
1886  Squ_Column0(8, 3)
1887  Squ_Column1(9, 2)
1888  Squ_Column0(10, 2)
1889  Squ_Column1(11, 1)
1890  Squ_Column0(12, 1)
1891  Squ_End(8)
1892 }
1893 
1894 void SSE2_Square32(word *C, const word *A)
1895 {
1896  Squ_Begin(16)
1897  ASJ( jmp, 0, f)
1898  Squ_Acc(8) Squ_Acc(7) Squ_Acc(6) Squ_Acc(5) Squ_Acc(4) Squ_Acc(3) Squ_Acc(2)
1899  AS1( ret) ASL(0)
1900  Squ_Column0(0, 1)
1901  Squ_Column1(1, 1)
1902  Squ_Column0(2, 2)
1903  Squ_Column1(3, 2)
1904  Squ_Column0(4, 3)
1905  Squ_Column1(5, 3)
1906  Squ_Column0(6, 4)
1907  Squ_Column1(7, 4)
1908  Squ_Column0(8, 5)
1909  Squ_Column1(9, 5)
1910  Squ_Column0(10, 6)
1911  Squ_Column1(11, 6)
1912  Squ_Column0(12, 7)
1913  Squ_Column1(13, 7)
1914  Squ_Column0(14, 8)
1915  Squ_Column1(15, 7)
1916  Squ_Column0(16, 7)
1917  Squ_Column1(17, 6)
1918  Squ_Column0(18, 6)
1919  Squ_Column1(19, 5)
1920  Squ_Column0(20, 5)
1921  Squ_Column1(21, 4)
1922  Squ_Column0(22, 4)
1923  Squ_Column1(23, 3)
1924  Squ_Column0(24, 3)
1925  Squ_Column1(25, 2)
1926  Squ_Column0(26, 2)
1927  Squ_Column1(27, 1)
1928  Squ_Column0(28, 1)
1929  Squ_End(16)
1930 }
1931 
1932 void SSE2_Multiply4(word *C, const word *A, const word *B)
1933 {
1934  Mul_Begin(2)
1935 #ifndef __GNUC__
1936  ASJ( jmp, 0, f)
1937  Mul_Acc(2)
1938  AS1( ret) ASL(0)
1939 #endif
1940  Mul_Column0(0, 2)
1941  Mul_End(2)
1942 }
1943 
1944 void SSE2_Multiply8(word *C, const word *A, const word *B)
1945 {
1946  Mul_Begin(4)
1947 #ifndef __GNUC__
1948  ASJ( jmp, 0, f)
1949  Mul_Acc(4) Mul_Acc(3) Mul_Acc(2)
1950  AS1( ret) ASL(0)
1951 #endif
1952  Mul_Column0(0, 2)
1953  Mul_Column1(1, 3)
1954  Mul_Column0(2, 4)
1955  Mul_Column1(3, 3)
1956  Mul_Column0(4, 2)
1957  Mul_End(4)
1958 }
1959 
1960 void SSE2_Multiply16(word *C, const word *A, const word *B)
1961 {
1962  Mul_Begin(8)
1963 #ifndef __GNUC__
1964  ASJ( jmp, 0, f)
1965  Mul_Acc(8) Mul_Acc(7) Mul_Acc(6) Mul_Acc(5) Mul_Acc(4) Mul_Acc(3) Mul_Acc(2)
1966  AS1( ret) ASL(0)
1967 #endif
1968  Mul_Column0(0, 2)
1969  Mul_Column1(1, 3)
1970  Mul_Column0(2, 4)
1971  Mul_Column1(3, 5)
1972  Mul_Column0(4, 6)
1973  Mul_Column1(5, 7)
1974  Mul_Column0(6, 8)
1975  Mul_Column1(7, 7)
1976  Mul_Column0(8, 6)
1977  Mul_Column1(9, 5)
1978  Mul_Column0(10, 4)
1979  Mul_Column1(11, 3)
1980  Mul_Column0(12, 2)
1981  Mul_End(8)
1982 }
1983 
1984 void SSE2_Multiply32(word *C, const word *A, const word *B)
1985 {
1986  Mul_Begin(16)
1987  ASJ( jmp, 0, f)
1988  Mul_Acc(16) Mul_Acc(15) Mul_Acc(14) Mul_Acc(13) Mul_Acc(12) Mul_Acc(11) Mul_Acc(10) Mul_Acc(9) Mul_Acc(8) Mul_Acc(7) Mul_Acc(6) Mul_Acc(5) Mul_Acc(4) Mul_Acc(3) Mul_Acc(2)
1989  AS1( ret) ASL(0)
1990  Mul_Column0(0, 2)
1991  Mul_Column1(1, 3)
1992  Mul_Column0(2, 4)
1993  Mul_Column1(3, 5)
1994  Mul_Column0(4, 6)
1995  Mul_Column1(5, 7)
1996  Mul_Column0(6, 8)
1997  Mul_Column1(7, 9)
1998  Mul_Column0(8, 10)
1999  Mul_Column1(9, 11)
2000  Mul_Column0(10, 12)
2001  Mul_Column1(11, 13)
2002  Mul_Column0(12, 14)
2003  Mul_Column1(13, 15)
2004  Mul_Column0(14, 16)
2005  Mul_Column1(15, 15)
2006  Mul_Column0(16, 14)
2007  Mul_Column1(17, 13)
2008  Mul_Column0(18, 12)
2009  Mul_Column1(19, 11)
2010  Mul_Column0(20, 10)
2011  Mul_Column1(21, 9)
2012  Mul_Column0(22, 8)
2013  Mul_Column1(23, 7)
2014  Mul_Column0(24, 6)
2015  Mul_Column1(25, 5)
2016  Mul_Column0(26, 4)
2017  Mul_Column1(27, 3)
2018  Mul_Column0(28, 2)
2019  Mul_End(16)
2020 }
2021 
2022 void SSE2_MultiplyBottom4(word *C, const word *A, const word *B)
2023 {
2024  Mul_Begin(2)
2025  Bot_SaveAcc(0) Bot_Acc(2)
2026  Bot_End(2)
2027 }
2028 
2029 void SSE2_MultiplyBottom8(word *C, const word *A, const word *B)
2030 {
2031  Mul_Begin(4)
2032 #ifndef __GNUC__
2033  ASJ( jmp, 0, f)
2034  Mul_Acc(3) Mul_Acc(2)
2035  AS1( ret) ASL(0)
2036 #endif
2037  Mul_Column0(0, 2)
2038  Mul_Column1(1, 3)
2039  Bot_SaveAcc(2) Bot_Acc(4) Bot_Acc(3) Bot_Acc(2)
2040  Bot_End(4)
2041 }
2042 
2043 void SSE2_MultiplyBottom16(word *C, const word *A, const word *B)
2044 {
2045  Mul_Begin(8)
2046 #ifndef __GNUC__
2047  ASJ( jmp, 0, f)
2048  Mul_Acc(7) Mul_Acc(6) Mul_Acc(5) Mul_Acc(4) Mul_Acc(3) Mul_Acc(2)
2049  AS1( ret) ASL(0)
2050 #endif
2051  Mul_Column0(0, 2)
2052  Mul_Column1(1, 3)
2053  Mul_Column0(2, 4)
2054  Mul_Column1(3, 5)
2055  Mul_Column0(4, 6)
2056  Mul_Column1(5, 7)
2057  Bot_SaveAcc(6) Bot_Acc(8) Bot_Acc(7) Bot_Acc(6) Bot_Acc(5) Bot_Acc(4) Bot_Acc(3) Bot_Acc(2)
2058  Bot_End(8)
2059 }
2060 
2061 void SSE2_MultiplyBottom32(word *C, const word *A, const word *B)
2062 {
2063  Mul_Begin(16)
2064 #ifndef __GNUC__
2065  ASJ( jmp, 0, f)
2066  Mul_Acc(15) Mul_Acc(14) Mul_Acc(13) Mul_Acc(12) Mul_Acc(11) Mul_Acc(10) Mul_Acc(9) Mul_Acc(8) Mul_Acc(7) Mul_Acc(6) Mul_Acc(5) Mul_Acc(4) Mul_Acc(3) Mul_Acc(2)
2067  AS1( ret) ASL(0)
2068 #endif
2069  Mul_Column0(0, 2)
2070  Mul_Column1(1, 3)
2071  Mul_Column0(2, 4)
2072  Mul_Column1(3, 5)
2073  Mul_Column0(4, 6)
2074  Mul_Column1(5, 7)
2075  Mul_Column0(6, 8)
2076  Mul_Column1(7, 9)
2077  Mul_Column0(8, 10)
2078  Mul_Column1(9, 11)
2079  Mul_Column0(10, 12)
2080  Mul_Column1(11, 13)
2081  Mul_Column0(12, 14)
2082  Mul_Column1(13, 15)
2083  Bot_SaveAcc(14) Bot_Acc(16) Bot_Acc(15) Bot_Acc(14) Bot_Acc(13) Bot_Acc(12) Bot_Acc(11) Bot_Acc(10) Bot_Acc(9) Bot_Acc(8) Bot_Acc(7) Bot_Acc(6) Bot_Acc(5) Bot_Acc(4) Bot_Acc(3) Bot_Acc(2)
2084  Bot_End(16)
2085 }
2086 
2087 void SSE2_MultiplyTop8(word *C, const word *A, const word *B, word L)
2088 {
2089  Top_Begin(4)
2090  Top_Acc(3) Top_Acc(2) Top_Acc(1)
2091 #ifndef __GNUC__
2092  ASJ( jmp, 0, f)
2093  Mul_Acc(4) Mul_Acc(3) Mul_Acc(2)
2094  AS1( ret) ASL(0)
2095 #endif
2096  Top_Column0(4)
2097  Top_Column1(3)
2098  Mul_Column0(0, 2)
2099  Top_End(2)
2100 }
2101 
2102 void SSE2_MultiplyTop16(word *C, const word *A, const word *B, word L)
2103 {
2104  Top_Begin(8)
2105  Top_Acc(7) Top_Acc(6) Top_Acc(5) Top_Acc(4) Top_Acc(3) Top_Acc(2) Top_Acc(1)
2106 #ifndef __GNUC__
2107  ASJ( jmp, 0, f)
2108  Mul_Acc(8) Mul_Acc(7) Mul_Acc(6) Mul_Acc(5) Mul_Acc(4) Mul_Acc(3) Mul_Acc(2)
2109  AS1( ret) ASL(0)
2110 #endif
2111  Top_Column0(8)
2112  Top_Column1(7)
2113  Mul_Column0(0, 6)
2114  Mul_Column1(1, 5)
2115  Mul_Column0(2, 4)
2116  Mul_Column1(3, 3)
2117  Mul_Column0(4, 2)
2118  Top_End(4)
2119 }
2120 
2121 void SSE2_MultiplyTop32(word *C, const word *A, const word *B, word L)
2122 {
2123  Top_Begin(16)
2124  Top_Acc(15) Top_Acc(14) Top_Acc(13) Top_Acc(12) Top_Acc(11) Top_Acc(10) Top_Acc(9) Top_Acc(8) Top_Acc(7) Top_Acc(6) Top_Acc(5) Top_Acc(4) Top_Acc(3) Top_Acc(2) Top_Acc(1)
2125 #ifndef __GNUC__
2126  ASJ( jmp, 0, f)
2127  Mul_Acc(16) Mul_Acc(15) Mul_Acc(14) Mul_Acc(13) Mul_Acc(12) Mul_Acc(11) Mul_Acc(10) Mul_Acc(9) Mul_Acc(8) Mul_Acc(7) Mul_Acc(6) Mul_Acc(5) Mul_Acc(4) Mul_Acc(3) Mul_Acc(2)
2128  AS1( ret) ASL(0)
2129 #endif
2130  Top_Column0(16)
2131  Top_Column1(15)
2132  Mul_Column0(0, 14)
2133  Mul_Column1(1, 13)
2134  Mul_Column0(2, 12)
2135  Mul_Column1(3, 11)
2136  Mul_Column0(4, 10)
2137  Mul_Column1(5, 9)
2138  Mul_Column0(6, 8)
2139  Mul_Column1(7, 7)
2140  Mul_Column0(8, 6)
2141  Mul_Column1(9, 5)
2142  Mul_Column0(10, 4)
2143  Mul_Column1(11, 3)
2144  Mul_Column0(12, 2)
2145  Top_End(8)
2146 }
2147 
2148 #endif // #if CRYPTOPP_INTEGER_SSE2
2149 
2150 // ********************************************************
2151 
2152 typedef int (CRYPTOPP_FASTCALL * PAdd)(size_t N, word *C, const word *A, const word *B);
2153 typedef void (* PMul)(word *C, const word *A, const word *B);
2154 typedef void (* PSqu)(word *C, const word *A);
2155 typedef void (* PMulTop)(word *C, const word *A, const word *B, word L);
2156 
2157 #if CRYPTOPP_INTEGER_SSE2
2158 static PAdd s_pAdd = &Baseline_Add, s_pSub = &Baseline_Sub;
2159 static size_t s_recursionLimit = 8;
2160 #else
2161 static const size_t s_recursionLimit = 16;
2162 #endif // CRYPTOPP_INTEGER_SSE2
2163 
2164 static PMul s_pMul[9], s_pBot[9];
2165 static PSqu s_pSqu[9];
2166 static PMulTop s_pTop[9];
2167 
2168 void SetFunctionPointers()
2169 {
2170  s_pMul[0] = &Baseline_Multiply2;
2171  s_pBot[0] = &Baseline_MultiplyBottom2;
2172  s_pSqu[0] = &Baseline_Square2;
2173  s_pTop[0] = &Baseline_MultiplyTop2;
2174  s_pTop[1] = &Baseline_MultiplyTop4;
2175 
2176 #if CRYPTOPP_INTEGER_SSE2
2177  if (HasSSE2())
2178  {
2179  if (IsP4())
2180  {
2181  s_pAdd = &SSE2_Add;
2182  s_pSub = &SSE2_Sub;
2183  }
2184 
2185  s_recursionLimit = 32;
2186 
2187  s_pMul[1] = &SSE2_Multiply4;
2188  s_pMul[2] = &SSE2_Multiply8;
2189  s_pMul[4] = &SSE2_Multiply16;
2190  s_pMul[8] = &SSE2_Multiply32;
2191 
2192  s_pBot[1] = &SSE2_MultiplyBottom4;
2193  s_pBot[2] = &SSE2_MultiplyBottom8;
2194  s_pBot[4] = &SSE2_MultiplyBottom16;
2195  s_pBot[8] = &SSE2_MultiplyBottom32;
2196 
2197  s_pSqu[1] = &SSE2_Square4;
2198  s_pSqu[2] = &SSE2_Square8;
2199  s_pSqu[4] = &SSE2_Square16;
2200  s_pSqu[8] = &SSE2_Square32;
2201 
2202  s_pTop[2] = &SSE2_MultiplyTop8;
2203  s_pTop[4] = &SSE2_MultiplyTop16;
2204  s_pTop[8] = &SSE2_MultiplyTop32;
2205  }
2206  else
2207 #endif // CRYPTOPP_INTEGER_SSE2
2208  {
2209  s_pMul[1] = &Baseline_Multiply4;
2210  s_pMul[2] = &Baseline_Multiply8;
2211 
2212  s_pBot[1] = &Baseline_MultiplyBottom4;
2213  s_pBot[2] = &Baseline_MultiplyBottom8;
2214 
2215  s_pSqu[1] = &Baseline_Square4;
2216  s_pSqu[2] = &Baseline_Square8;
2217 
2218  s_pTop[2] = &Baseline_MultiplyTop8;
2219 
2220 #if !CRYPTOPP_INTEGER_SSE2
2221  s_pMul[4] = &Baseline_Multiply16;
2222  s_pBot[4] = &Baseline_MultiplyBottom16;
2223  s_pSqu[4] = &Baseline_Square16;
2224  s_pTop[4] = &Baseline_MultiplyTop16;
2225 #endif // !CRYPTOPP_INTEGER_SSE2
2226  }
2227 }
2228 
2229 inline int Add(word *C, const word *A, const word *B, size_t N)
2230 {
2231 #if CRYPTOPP_INTEGER_SSE2
2232  return s_pAdd(N, C, A, B);
2233 #else
2234  return Baseline_Add(N, C, A, B);
2235 #endif // CRYPTOPP_INTEGER_SSE2
2236 }
2237 
2238 inline int Subtract(word *C, const word *A, const word *B, size_t N)
2239 {
2240 #if CRYPTOPP_INTEGER_SSE2
2241  return s_pSub(N, C, A, B);
2242 #else
2243  return Baseline_Sub(N, C, A, B);
2244 #endif // CRYPTOPP_INTEGER_SSE2
2245 }
2246 
2247 // ********************************************************
2248 
2249 
2250 #define A0 A
2251 #define A1 (A+N2)
2252 #define B0 B
2253 #define B1 (B+N2)
2254 
2255 #define T0 T
2256 #define T1 (T+N2)
2257 #define T2 (T+N)
2258 #define T3 (T+N+N2)
2259 
2260 #define R0 R
2261 #define R1 (R+N2)
2262 #define R2 (R+N)
2263 #define R3 (R+N+N2)
2264 
2265 // R[2*N] - result = A*B
2266 // T[2*N] - temporary work space
2267 // A[N] --- multiplier
2268 // B[N] --- multiplicant
2269 
2270 void RecursiveMultiply(word *R, word *T, const word *A, const word *B, size_t N)
2271 {
2272  CRYPTOPP_ASSERT(N>=2 && N%2==0);
2273 
2274  if (N <= s_recursionLimit)
2275  s_pMul[N/4](R, A, B);
2276  else
2277  {
2278  const size_t N2 = N/2;
2279 
2280  size_t AN2 = Compare(A0, A1, N2) > 0 ? 0 : N2;
2281  Subtract(R0, A + AN2, A + (N2 ^ AN2), N2);
2282 
2283  size_t BN2 = Compare(B0, B1, N2) > 0 ? 0 : N2;
2284  Subtract(R1, B + BN2, B + (N2 ^ BN2), N2);
2285 
2286  RecursiveMultiply(R2, T2, A1, B1, N2);
2287  RecursiveMultiply(T0, T2, R0, R1, N2);
2288  RecursiveMultiply(R0, T2, A0, B0, N2);
2289 
2290  // now T[01] holds (A1-A0)*(B0-B1), R[01] holds A0*B0, R[23] holds A1*B1
2291 
2292  int c2 = Add(R2, R2, R1, N2);
2293  int c3 = c2;
2294  c2 += Add(R1, R2, R0, N2);
2295  c3 += Add(R2, R2, R3, N2);
2296 
2297  if (AN2 == BN2)
2298  c3 -= Subtract(R1, R1, T0, N);
2299  else
2300  c3 += Add(R1, R1, T0, N);
2301 
2302  c3 += Increment(R2, N2, c2);
2303  CRYPTOPP_ASSERT (c3 >= 0 && c3 <= 2);
2304  Increment(R3, N2, c3);
2305  }
2306 }
2307 
2308 // R[2*N] - result = A*A
2309 // T[2*N] - temporary work space
2310 // A[N] --- number to be squared
2311 
2312 void RecursiveSquare(word *R, word *T, const word *A, size_t N)
2313 {
2314  CRYPTOPP_ASSERT(N && N%2==0);
2315 
2316  if (N <= s_recursionLimit)
2317  s_pSqu[N/4](R, A);
2318  else
2319  {
2320  const size_t N2 = N/2;
2321 
2322  RecursiveSquare(R0, T2, A0, N2);
2323  RecursiveSquare(R2, T2, A1, N2);
2324  RecursiveMultiply(T0, T2, A0, A1, N2);
2325 
2326  int carry = Add(R1, R1, T0, N);
2327  carry += Add(R1, R1, T0, N);
2328  Increment(R3, N2, carry);
2329  }
2330 }
2331 
2332 // R[N] - bottom half of A*B
2333 // T[3*N/2] - temporary work space
2334 // A[N] - multiplier
2335 // B[N] - multiplicant
2336 
2337 void RecursiveMultiplyBottom(word *R, word *T, const word *A, const word *B, size_t N)
2338 {
2339  CRYPTOPP_ASSERT(N>=2 && N%2==0);
2340 
2341  if (N <= s_recursionLimit)
2342  s_pBot[N/4](R, A, B);
2343  else
2344  {
2345  const size_t N2 = N/2;
2346 
2347  RecursiveMultiply(R, T, A0, B0, N2);
2348  RecursiveMultiplyBottom(T0, T1, A1, B0, N2);
2349  Add(R1, R1, T0, N2);
2350  RecursiveMultiplyBottom(T0, T1, A0, B1, N2);
2351  Add(R1, R1, T0, N2);
2352  }
2353 }
2354 
2355 // R[N] --- upper half of A*B
2356 // T[2*N] - temporary work space
2357 // L[N] --- lower half of A*B
2358 // A[N] --- multiplier
2359 // B[N] --- multiplicant
2360 
2361 void MultiplyTop(word *R, word *T, const word *L, const word *A, const word *B, size_t N)
2362 {
2363  CRYPTOPP_ASSERT(N>=2 && N%2==0);
2364 
2365  if (N <= s_recursionLimit)
2366  s_pTop[N/4](R, A, B, L[N-1]);
2367  else
2368  {
2369  const size_t N2 = N/2;
2370 
2371  size_t AN2 = Compare(A0, A1, N2) > 0 ? 0 : N2;
2372  Subtract(R0, A + AN2, A + (N2 ^ AN2), N2);
2373 
2374  size_t BN2 = Compare(B0, B1, N2) > 0 ? 0 : N2;
2375  Subtract(R1, B + BN2, B + (N2 ^ BN2), N2);
2376 
2377  RecursiveMultiply(T0, T2, R0, R1, N2);
2378  RecursiveMultiply(R0, T2, A1, B1, N2);
2379 
2380  // now T[01] holds (A1-A0)*(B0-B1) = A1*B0+A0*B1-A1*B1-A0*B0, R[01] holds A1*B1
2381 
2382  int t, c3;
2383  int c2 = Subtract(T2, L+N2, L, N2);
2384 
2385  if (AN2 == BN2)
2386  {
2387  c2 -= Add(T2, T2, T0, N2);
2388  t = (Compare(T2, R0, N2) == -1);
2389  c3 = t - Subtract(T2, T2, T1, N2);
2390  }
2391  else
2392  {
2393  c2 += Subtract(T2, T2, T0, N2);
2394  t = (Compare(T2, R0, N2) == -1);
2395  c3 = t + Add(T2, T2, T1, N2);
2396  }
2397 
2398  c2 += t;
2399  if (c2 >= 0)
2400  c3 += Increment(T2, N2, c2);
2401  else
2402  c3 -= Decrement(T2, N2, -c2);
2403  c3 += Add(R0, T2, R1, N2);
2404 
2405  CRYPTOPP_ASSERT (c3 >= 0 && c3 <= 2);
2406  Increment(R1, N2, c3);
2407  }
2408 }
2409 
2410 inline void Multiply(word *R, word *T, const word *A, const word *B, size_t N)
2411 {
2412  RecursiveMultiply(R, T, A, B, N);
2413 }
2414 
2415 inline void Square(word *R, word *T, const word *A, size_t N)
2416 {
2417  RecursiveSquare(R, T, A, N);
2418 }
2419 
2420 inline void MultiplyBottom(word *R, word *T, const word *A, const word *B, size_t N)
2421 {
2422  RecursiveMultiplyBottom(R, T, A, B, N);
2423 }
2424 
2425 // R[NA+NB] - result = A*B
2426 // T[NA+NB] - temporary work space
2427 // A[NA] ---- multiplier
2428 // B[NB] ---- multiplicant
2429 
2430 void AsymmetricMultiply(word *R, word *T, const word *A, size_t NA, const word *B, size_t NB)
2431 {
2432  if (NA == NB)
2433  {
2434  // Profiling guided the flow below.
2435  if (A != B)
2436  Multiply(R, T, A, B, NA);
2437  else
2438  Square(R, T, A, NA);
2439 
2440  return;
2441  }
2442 
2443  if (NA > NB)
2444  {
2445  std::swap(A, B);
2446  std::swap(NA, NB);
2447  }
2448 
2449  CRYPTOPP_ASSERT(NB % NA == 0);
2450 
2451  if (NA==2 && !A[1])
2452  {
2453  // Profiling guided the flow below.
2454  switch (A[0])
2455  {
2456  default:
2457  R[NB] = LinearMultiply(R, B, A[0], NB);
2458  R[NB+1] = 0;
2459  return;
2460  case 0:
2461  SetWords(R, 0, NB+2);
2462  return;
2463  case 1:
2464  CopyWords(R, B, NB);
2465  R[NB] = R[NB+1] = 0;
2466  return;
2467  }
2468  }
2469 
2470  size_t i;
2471  if ((NB/NA)%2 == 0)
2472  {
2473  Multiply(R, T, A, B, NA);
2474  CopyWords(T+2*NA, R+NA, NA);
2475 
2476  for (i=2*NA; i<NB; i+=2*NA)
2477  Multiply(T+NA+i, T, A, B+i, NA);
2478  for (i=NA; i<NB; i+=2*NA)
2479  Multiply(R+i, T, A, B+i, NA);
2480  }
2481  else
2482  {
2483  for (i=0; i<NB; i+=2*NA)
2484  Multiply(R+i, T, A, B+i, NA);
2485  for (i=NA; i<NB; i+=2*NA)
2486  Multiply(T+NA+i, T, A, B+i, NA);
2487  }
2488 
2489  if (Add(R+NA, R+NA, T+2*NA, NB-NA))
2490  Increment(R+NB, NA);
2491 }
2492 
2493 // R[N] ----- result = A inverse mod 2**(WORD_BITS*N)
2494 // T[3*N/2] - temporary work space
2495 // A[N] ----- an odd number as input
2496 
2497 void RecursiveInverseModPower2(word *R, word *T, const word *A, size_t N)
2498 {
2499  // Profiling guided the flow below.
2500  if (N!=2)
2501  {
2502  const size_t N2 = N/2;
2503  RecursiveInverseModPower2(R0, T0, A0, N2);
2504  T0[0] = 1;
2505  SetWords(T0+1, 0, N2-1);
2506  MultiplyTop(R1, T1, T0, R0, A0, N2);
2507  MultiplyBottom(T0, T1, R0, A1, N2);
2508  Add(T0, R1, T0, N2);
2509  TwosComplement(T0, N2);
2510  MultiplyBottom(R1, T1, R0, T0, N2);
2511  }
2512  else
2513  {
2514  T[0] = AtomicInverseModPower2(A[0]);
2515  T[1] = 0;
2516  s_pBot[0](T+2, T, A);
2517  TwosComplement(T+2, 2);
2518  Increment(T+2, 2, 2);
2519  s_pBot[0](R, T, T+2);
2520  }
2521 }
2522 
2523 // R[N] --- result = X/(2**(WORD_BITS*N)) mod M
2524 // T[3*N] - temporary work space
2525 // X[2*N] - number to be reduced
2526 // M[N] --- modulus
2527 // U[N] --- multiplicative inverse of M mod 2**(WORD_BITS*N)
2528 
2529 void MontgomeryReduce(word *R, word *T, word *X, const word *M, const word *U, size_t N)
2530 {
2531 #if 1
2532  MultiplyBottom(R, T, X, U, N);
2533  MultiplyTop(T, T+N, X, R, M, N);
2534  word borrow = Subtract(T, X+N, T, N);
2535  // defend against timing attack by doing this Add even when not needed
2536  word carry = Add(T+N, T, M, N);
2537  CRYPTOPP_ASSERT(carry | !borrow);
2538  CRYPTOPP_UNUSED(carry), CRYPTOPP_UNUSED(borrow);
2539  CopyWords(R, T + ((0-borrow) & N), N);
2540 #elif 0
2541  const word u = 0-U[0];
2542  Declare2Words(p)
2543  for (size_t i=0; i<N; i++)
2544  {
2545  const word t = u * X[i];
2546  word c = 0;
2547  for (size_t j=0; j<N; j+=2)
2548  {
2549  MultiplyWords(p, t, M[j]);
2550  Acc2WordsBy1(p, X[i+j]);
2551  Acc2WordsBy1(p, c);
2552  X[i+j] = LowWord(p);
2553  c = HighWord(p);
2554  MultiplyWords(p, t, M[j+1]);
2555  Acc2WordsBy1(p, X[i+j+1]);
2556  Acc2WordsBy1(p, c);
2557  X[i+j+1] = LowWord(p);
2558  c = HighWord(p);
2559  }
2560 
2561  if (Increment(X+N+i, N-i, c))
2562  while (!Subtract(X+N, X+N, M, N)) {}
2563  }
2564 
2565  std::memcpy(R, X+N, N*WORD_SIZE);
2566 #else
2567  __m64 u = _mm_cvtsi32_si64(0-U[0]), p;
2568  for (size_t i=0; i<N; i++)
2569  {
2570  __m64 t = _mm_cvtsi32_si64(X[i]);
2571  t = _mm_mul_su32(t, u);
2572  __m64 c = _mm_setzero_si64();
2573  for (size_t j=0; j<N; j+=2)
2574  {
2575  p = _mm_mul_su32(t, _mm_cvtsi32_si64(M[j]));
2576  p = _mm_add_si64(p, _mm_cvtsi32_si64(X[i+j]));
2577  c = _mm_add_si64(c, p);
2578  X[i+j] = _mm_cvtsi64_si32(c);
2579  c = _mm_srli_si64(c, 32);
2580  p = _mm_mul_su32(t, _mm_cvtsi32_si64(M[j+1]));
2581  p = _mm_add_si64(p, _mm_cvtsi32_si64(X[i+j+1]));
2582  c = _mm_add_si64(c, p);
2583  X[i+j+1] = _mm_cvtsi64_si32(c);
2584  c = _mm_srli_si64(c, 32);
2585  }
2586 
2587  if (Increment(X+N+i, N-i, _mm_cvtsi64_si32(c)))
2588  while (!Subtract(X+N, X+N, M, N)) {}
2589  }
2590 
2591  std::memcpy(R, X+N, N*WORD_SIZE);
2592  _mm_empty();
2593 #endif
2594 }
2595 
2596 // R[N] --- result = X/(2**(WORD_BITS*N/2)) mod M
2597 // T[2*N] - temporary work space
2598 // X[2*N] - number to be reduced
2599 // M[N] --- modulus
2600 // U[N/2] - multiplicative inverse of M mod 2**(WORD_BITS*N/2)
2601 // V[N] --- 2**(WORD_BITS*3*N/2) mod M
2602 
2603 void HalfMontgomeryReduce(word *R, word *T, const word *X, const word *M, const word *U, const word *V, size_t N)
2604 {
2605  CRYPTOPP_ASSERT(N%2==0 && N>=4);
2606 
2607 #define M0 M
2608 #define M1 (M+N2)
2609 #define V0 V
2610 #define V1 (V+N2)
2611 
2612 #define X0 X
2613 #define X1 (X+N2)
2614 #define X2 (X+N)
2615 #define X3 (X+N+N2)
2616 
2617  const size_t N2 = N/2;
2618  Multiply(T0, T2, V0, X3, N2);
2619  int c2 = Add(T0, T0, X0, N);
2620  MultiplyBottom(T3, T2, T0, U, N2);
2621  MultiplyTop(T2, R, T0, T3, M0, N2);
2622  c2 -= Subtract(T2, T1, T2, N2);
2623  Multiply(T0, R, T3, M1, N2);
2624  c2 -= Subtract(T0, T2, T0, N2);
2625  int c3 = -(int)Subtract(T1, X2, T1, N2);
2626  Multiply(R0, T2, V1, X3, N2);
2627  c3 += Add(R, R, T, N);
2628 
2629  if (c2>0)
2630  c3 += Increment(R1, N2);
2631  else if (c2<0)
2632  c3 -= Decrement(R1, N2, -c2);
2633 
2634  CRYPTOPP_ASSERT(c3>=-1 && c3<=1);
2635  if (c3>0)
2636  Subtract(R, R, M, N);
2637  else if (c3<0)
2638  Add(R, R, M, N);
2639 
2640 #undef M0
2641 #undef M1
2642 #undef V0
2643 #undef V1
2644 
2645 #undef X0
2646 #undef X1
2647 #undef X2
2648 #undef X3
2649 }
2650 
2651 #undef A0
2652 #undef A1
2653 #undef B0
2654 #undef B1
2655 
2656 #undef T0
2657 #undef T1
2658 #undef T2
2659 #undef T3
2660 
2661 #undef R0
2662 #undef R1
2663 #undef R2
2664 #undef R3
2665 
2666 /*
2667 // do a 3 word by 2 word divide, returns quotient and leaves remainder in A
2668 static word SubatomicDivide(word *A, word B0, word B1)
2669 {
2670  // Assert {A[2],A[1]} < {B1,B0}, so quotient can fit in a word
2671  CRYPTOPP_ASSERT(A[2] < B1 || (A[2]==B1 && A[1] < B0));
2672 
2673  // estimate the quotient: do a 2 word by 1 word divide
2674  word Q;
2675  if (B1+1 == 0)
2676  Q = A[2];
2677  else
2678  Q = DWord(A[1], A[2]).DividedBy(B1+1);
2679 
2680  // now subtract Q*B from A
2681  DWord p = DWord::Multiply(B0, Q);
2682  DWord u = (DWord) A[0] - p.GetLowHalf();
2683  A[0] = u.GetLowHalf();
2684  u = (DWord) A[1] - p.GetHighHalf() - u.GetHighHalfAsBorrow() - DWord::Multiply(B1, Q);
2685  A[1] = u.GetLowHalf();
2686  A[2] += u.GetHighHalf();
2687 
2688  // Q <= actual quotient, so fix it
2689  while (A[2] || A[1] > B1 || (A[1]==B1 && A[0]>=B0))
2690  {
2691  u = (DWord) A[0] - B0;
2692  A[0] = u.GetLowHalf();
2693  u = (DWord) A[1] - B1 - u.GetHighHalfAsBorrow();
2694  A[1] = u.GetLowHalf();
2695  A[2] += u.GetHighHalf();
2696  Q++;
2697  CRYPTOPP_ASSERT(Q); // shouldn't overflow
2698  }
2699 
2700  return Q;
2701 }
2702 
2703 // do a 4 word by 2 word divide, returns 2 word quotient in Q0 and Q1
2704 static inline void AtomicDivide(word *Q, const word *A, const word *B)
2705 {
2706  if (!B[0] && !B[1]) // if divisor is 0, we assume divisor==2**(2*WORD_BITS)
2707  {
2708  Q[0] = A[2];
2709  Q[1] = A[3];
2710  }
2711  else
2712  {
2713  word T[4];
2714  T[0] = A[0]; T[1] = A[1]; T[2] = A[2]; T[3] = A[3];
2715  Q[1] = SubatomicDivide(T+1, B[0], B[1]);
2716  Q[0] = SubatomicDivide(T, B[0], B[1]);
2717 
2718 #if defined(CRYPTOPP_DEBUG)
2719  // multiply quotient and divisor and add remainder, make sure it equals dividend
2720  CRYPTOPP_ASSERT(!T[2] && !T[3] && (T[1] < B[1] || (T[1]==B[1] && T[0]<B[0])));
2721  word P[4];
2722  LowLevel::Multiply2(P, Q, B);
2723  Add(P, P, T, 4);
2724  CRYPTOPP_ASSERT(std::memcmp(P, A, 4*WORD_SIZE)==0);
2725 #endif
2726  }
2727 }
2728 */
2729 
2730 static inline void AtomicDivide(word *Q, const word *A, const word *B)
2731 {
2732  word T[4];
2733  DWord q = DivideFourWordsByTwo<word, DWord>(T, DWord(A[0], A[1]), DWord(A[2], A[3]), DWord(B[0], B[1]));
2734  Q[0] = q.GetLowHalf();
2735  Q[1] = q.GetHighHalf();
2736 
2737 #if defined(CRYPTOPP_DEBUG)
2738  if (B[0] || B[1])
2739  {
2740  // multiply quotient and divisor and add remainder, make sure it equals dividend
2741  CRYPTOPP_ASSERT(!T[2] && !T[3] && (T[1] < B[1] || (T[1]==B[1] && T[0]<B[0])));
2742  word P[4];
2743  s_pMul[0](P, Q, B);
2744  Add(P, P, T, 4);
2745  CRYPTOPP_ASSERT(std::memcmp(P, A, 4*WORD_SIZE)==0);
2746  }
2747 #endif
2748 }
2749 
2750 // for use by Divide(), corrects the underestimated quotient {Q1,Q0}
2751 static void CorrectQuotientEstimate(word *R, word *T, word *Q, const word *B, size_t N)
2752 {
2753  CRYPTOPP_ASSERT(N && N%2==0);
2754 
2755  AsymmetricMultiply(T, T+N+2, Q, 2, B, N);
2756 
2757  word borrow = Subtract(R, R, T, N+2);
2758  CRYPTOPP_ASSERT(!borrow && !R[N+1]);
2759  CRYPTOPP_UNUSED(borrow);
2760 
2761  while (R[N] || Compare(R, B, N) >= 0)
2762  {
2763  R[N] -= Subtract(R, R, B, N);
2764  Q[1] += (++Q[0]==0);
2765  CRYPTOPP_ASSERT(Q[0] || Q[1]); // no overflow
2766  }
2767 }
2768 
2769 // R[NB] -------- remainder = A%B
2770 // Q[NA-NB+2] --- quotient = A/B
2771 // T[NA+3*(NB+2)] - temp work space
2772 // A[NA] -------- dividend
2773 // B[NB] -------- divisor
2774 
2775 void Divide(word *R, word *Q, word *T, const word *A, size_t NA, const word *B, size_t NB)
2776 {
2777  CRYPTOPP_ASSERT(NA && NB && NA%2==0 && NB%2==0);
2778  CRYPTOPP_ASSERT(B[NB-1] || B[NB-2]);
2779  CRYPTOPP_ASSERT(NB <= NA);
2780 
2781  // set up temporary work space
2782  word *const TA=T;
2783  word *const TB=T+NA+2;
2784  word *const TP=T+NA+2+NB;
2785 
2786  // copy B into TB and normalize it so that TB has highest bit set to 1
2787  unsigned shiftWords = (B[NB-1]==0);
2788  TB[0] = TB[NB-1] = 0;
2789  CopyWords(TB+shiftWords, B, NB-shiftWords);
2790  unsigned shiftBits = WORD_BITS - BitPrecision(TB[NB-1]);
2791  CRYPTOPP_ASSERT(shiftBits < WORD_BITS);
2792  ShiftWordsLeftByBits(TB, NB, shiftBits);
2793 
2794  // copy A into TA and normalize it
2795  TA[0] = TA[NA] = TA[NA+1] = 0;
2796  CopyWords(TA+shiftWords, A, NA);
2797  ShiftWordsLeftByBits(TA, NA+2, shiftBits);
2798 
2799  if (TA[NA+1]==0 && TA[NA] <= 1)
2800  {
2801  Q[NA-NB+1] = Q[NA-NB] = 0;
2802  while (TA[NA] || Compare(TA+NA-NB, TB, NB) >= 0)
2803  {
2804  TA[NA] -= Subtract(TA+NA-NB, TA+NA-NB, TB, NB);
2805  ++Q[NA-NB];
2806  }
2807  }
2808  else
2809  {
2810  NA+=2;
2811  CRYPTOPP_ASSERT(Compare(TA+NA-NB, TB, NB) < 0);
2812  }
2813 
2814  word BT[2];
2815  BT[0] = TB[NB-2] + 1;
2816  BT[1] = TB[NB-1] + (BT[0]==0);
2817 
2818  // start reducing TA mod TB, 2 words at a time
2819  for (size_t i=NA-2; i>=NB; i-=2)
2820  {
2821  AtomicDivide(Q+i-NB, TA+i-2, BT);
2822  CorrectQuotientEstimate(TA+i-NB, TP, Q+i-NB, TB, NB);
2823  }
2824 
2825  // copy TA into R, and denormalize it
2826  CopyWords(R, TA+shiftWords, NB);
2827  ShiftWordsRightByBits(R, NB, shiftBits);
2828 }
2829 
2830 static inline size_t EvenWordCount(const word *X, size_t N)
2831 {
2832  while (N && X[N-2]==0 && X[N-1]==0)
2833  N-=2;
2834  return N;
2835 }
2836 
2837 // return k
2838 // R[N] --- result = A^(-1) * 2^k mod M
2839 // T[4*N] - temporary work space
2840 // A[NA] -- number to take inverse of
2841 // M[N] --- modulus
2842 
2843 unsigned int AlmostInverse(word *R, word *T, const word *A, size_t NA, const word *M, size_t N)
2844 {
2845  CRYPTOPP_ASSERT(NA<=N && N && N%2==0);
2846 
2847  word *b = T;
2848  word *c = T+N;
2849  word *f = T+2*N;
2850  word *g = T+3*N;
2851  size_t bcLen=2, fgLen=EvenWordCount(M, N);
2852  unsigned int k=0;
2853  bool s=false;
2854 
2855  SetWords(T, 0, 3*N);
2856  b[0]=1;
2857  CopyWords(f, A, NA);
2858  CopyWords(g, M, N);
2859 
2860  while (1)
2861  {
2862  word t=f[0];
2863  while (!t)
2864  {
2865  if (EvenWordCount(f, fgLen)==0)
2866  {
2867  SetWords(R, 0, N);
2868  return 0;
2869  }
2870 
2871  ShiftWordsRightByWords(f, fgLen, 1);
2872  bcLen += 2 * (c[bcLen-1] != 0);
2873  CRYPTOPP_ASSERT(bcLen <= N);
2874  ShiftWordsLeftByWords(c, bcLen, 1);
2875  k+=WORD_BITS;
2876  t=f[0];
2877  }
2878 
2879  // t must be non-0; otherwise undefined.
2880  unsigned int i = TrailingZeros(t);
2881  t >>= i;
2882  k += i;
2883 
2884  if (t==1 && f[1]==0 && EvenWordCount(f+2, fgLen-2)==0)
2885  {
2886  if (s)
2887  Subtract(R, M, b, N);
2888  else
2889  CopyWords(R, b, N);
2890  return k;
2891  }
2892 
2893  ShiftWordsRightByBits(f, fgLen, i);
2894  t = ShiftWordsLeftByBits(c, bcLen, i);
2895  c[bcLen] += t;
2896  bcLen += 2 * (t!=0);
2897  CRYPTOPP_ASSERT(bcLen <= N);
2898 
2899  bool swap = Compare(f, g, fgLen)==-1;
2902  s ^= swap;
2903 
2904  fgLen -= 2 * !(f[fgLen-2] | f[fgLen-1]);
2905 
2906  Subtract(f, f, g, fgLen);
2907  t = Add(b, b, c, bcLen);
2908  b[bcLen] += t;
2909  bcLen += 2*t;
2910  CRYPTOPP_ASSERT(bcLen <= N);
2911  }
2912 }
2913 
2914 // R[N] - result = A/(2^k) mod M
2915 // A[N] - input
2916 // M[N] - modulus
2917 
2918 void DivideByPower2Mod(word *R, const word *A, size_t k, const word *M, size_t N)
2919 {
2920  CopyWords(R, A, N);
2921 
2922  while (k--)
2923  {
2924  if (R[0]%2==0)
2925  ShiftWordsRightByBits(R, N, 1);
2926  else
2927  {
2928  word carry = Add(R, R, M, N);
2929  ShiftWordsRightByBits(R, N, 1);
2930  R[N-1] += carry<<(WORD_BITS-1);
2931  }
2932  }
2933 }
2934 
2935 // R[N] - result = A*(2^k) mod M
2936 // A[N] - input
2937 // M[N] - modulus
2938 
2939 void MultiplyByPower2Mod(word *R, const word *A, size_t k, const word *M, size_t N)
2940 {
2941  CopyWords(R, A, N);
2942 
2943  while (k--)
2944  if (ShiftWordsLeftByBits(R, N, 1) || Compare(R, M, N)>=0)
2945  Subtract(R, R, M, N);
2946 }
2947 
2948 // ******************************************************************
2949 
2950 static const unsigned int RoundupSizeTable[] = {2, 2, 2, 4, 4, 8, 8, 8, 8};
2951 
2952 static inline size_t RoundupSize(size_t n)
2953 {
2954  if (n<=8)
2955  return RoundupSizeTable[n];
2956  else if (n<=16)
2957  return 16;
2958  else if (n<=32)
2959  return 32;
2960  else if (n<=64)
2961  return 64;
2962  else
2963  return size_t(1) << BitPrecision(n-1);
2964 }
2965 
2967  : reg(2), sign(POSITIVE)
2968 {
2969  reg[0] = reg[1] = 0;
2970 }
2971 
2972 Integer::Integer(const Integer& t)
2973  : reg(RoundupSize(t.WordCount())), sign(t.sign)
2974 {
2975  CopyWords(reg, t.reg, reg.size());
2976 }
2977 
2978 Integer::Integer(Sign s, lword value)
2979  : reg(2), sign(s)
2980 {
2981  reg[0] = word(value);
2982  reg[1] = word(SafeRightShift<WORD_BITS>(value));
2983 }
2984 
2985 Integer::Integer(signed long value)
2986  : reg(2)
2987 {
2988  if (value >= 0)
2989  sign = POSITIVE;
2990  else
2991  {
2992  sign = NEGATIVE;
2993  value = -value;
2994  }
2995  reg[0] = word(value);
2996  reg[1] = word(SafeRightShift<WORD_BITS>((unsigned long)value));
2997 }
2998 
2999 Integer::Integer(Sign s, word high, word low)
3000  : reg(2), sign(s)
3001 {
3002  reg[0] = low;
3003  reg[1] = high;
3004 }
3005 
3006 bool Integer::IsConvertableToLong() const
3007 {
3008  if (ByteCount() > sizeof(long))
3009  return false;
3010 
3011  unsigned long value = (unsigned long)reg[0];
3012  value += SafeLeftShift<WORD_BITS, unsigned long>((unsigned long)reg[1]);
3013 
3014  if (sign==POSITIVE)
3015  return (signed long)value >= 0;
3016  else
3017  return -(signed long)value < 0;
3018 }
3019 
3020 signed long Integer::ConvertToLong() const
3021 {
3023 
3024  unsigned long value = (unsigned long)reg[0];
3025  value += SafeLeftShift<WORD_BITS, unsigned long>((unsigned long)reg[1]);
3026  return sign==POSITIVE ? value : -(signed long)value;
3027 }
3028 
3029 Integer::Integer(BufferedTransformation &encodedInteger, size_t byteCount, Signedness s, ByteOrder o)
3030 {
3032 
3033  if (o != LITTLE_ENDIAN_ORDER)
3034  {
3035  Decode(encodedInteger, byteCount, s);
3036  }
3037  else
3038  {
3039  SecByteBlock block(byteCount);
3040  encodedInteger.Get(block, block.size());
3041  std::reverse(block.begin(), block.begin()+block.size());
3042 
3043  Decode(block.begin(), block.size(), s);
3044  }
3045 }
3046 
3047 Integer::Integer(const byte *encodedInteger, size_t byteCount, Signedness s, ByteOrder o)
3048 {
3049  CRYPTOPP_ASSERT(encodedInteger && byteCount); // NULL buffer
3051 
3052  if (o != LITTLE_ENDIAN_ORDER)
3053  {
3054  Decode(encodedInteger, byteCount, s);
3055  }
3056  else
3057  {
3058  SecByteBlock block(byteCount);
3059 #if (_MSC_VER >= 1500)
3060  std::reverse_copy(encodedInteger, encodedInteger+byteCount,
3061  stdext::make_checked_array_iterator(block.begin(), block.size()));
3062 #else
3063  std::reverse_copy(encodedInteger, encodedInteger+byteCount, block.begin());
3064 #endif
3065  Decode(block.begin(), block.size(), s);
3066  return;
3067  }
3068 }
3069 
3071 {
3072  // Make explicit call to avoid virtual-dispatch findings in ctor
3073  Integer::BERDecode(bt);
3074 }
3075 
3076 Integer::Integer(RandomNumberGenerator &rng, size_t bitcount)
3077 {
3078  Randomize(rng, bitcount);
3079 }
3080 
3081 Integer::Integer(RandomNumberGenerator &rng, const Integer &min, const Integer &max, RandomNumberType rnType, const Integer &equiv, const Integer &mod)
3082 {
3083  if (!Randomize(rng, min, max, rnType, equiv, mod))
3085 }
3086 
3087 Integer Integer::Power2(size_t e)
3088 {
3089  Integer r((word)0, BitsToWords(e+1));
3090  r.SetBit(e);
3091  return r;
3092 }
3093 
3094 bool Integer::operator!() const
3095 {
3096  return IsNegative() ? false : (reg[0]==0 && WordCount()==0);
3097 }
3098 
3100 {
3101  if (this != &t)
3102  {
3103  if (reg.size() != t.reg.size() || t.reg[t.reg.size()/2] == 0)
3104  reg.New(RoundupSize(t.WordCount()));
3105  CopyWords(reg, t.reg, reg.size());
3106  sign = t.sign;
3107  }
3108  return *this;
3109 }
3110 
3111 bool Integer::GetBit(size_t n) const
3112 {
3113  // Profiling guided the flow below.
3114  if (n/WORD_BITS < reg.size())
3115  return bool((reg[n/WORD_BITS] >> (n % WORD_BITS)) & 1);
3116  else
3117  return 0;
3118 }
3119 
3120 void Integer::SetBit(size_t n, bool value)
3121 {
3122  if (value)
3123  {
3124  reg.CleanGrow(RoundupSize(BitsToWords(n+1)));
3125  reg[n/WORD_BITS] |= (word(1) << (n%WORD_BITS));
3126  }
3127  else
3128  {
3129  if (n/WORD_BITS < reg.size())
3130  reg[n/WORD_BITS] &= ~(word(1) << (n%WORD_BITS));
3131  }
3132 }
3133 
3134 byte Integer::GetByte(size_t n) const
3135 {
3136  // Profiling guided the flow below.
3137  if (n/WORD_SIZE < reg.size())
3138  return byte(reg[n/WORD_SIZE] >> ((n%WORD_SIZE)*8));
3139  else
3140  return 0;
3141 }
3142 
3143 void Integer::SetByte(size_t n, byte value)
3144 {
3145  reg.CleanGrow(RoundupSize(BytesToWords(n+1)));
3146  reg[n/WORD_SIZE] &= ~(word(0xff) << 8*(n%WORD_SIZE));
3147  reg[n/WORD_SIZE] |= (word(value) << 8*(n%WORD_SIZE));
3148 }
3149 
3150 lword Integer::GetBits(size_t i, size_t n) const
3151 {
3152  lword v = 0;
3153  CRYPTOPP_ASSERT(n <= sizeof(v)*8);
3154  for (unsigned int j=0; j<n; j++)
3155  v |= lword(GetBit(i+j)) << j;
3156  return v;
3157 }
3158 
3160 {
3161  Integer result(*this);
3162  result.Negate();
3163  return result;
3164 }
3165 
3167 {
3168  Integer result(*this);
3169  result.sign = POSITIVE;
3170  return result;
3171 }
3172 
3173 void Integer::swap(Integer &a)
3174 {
3175  reg.swap(a.reg);
3176  std::swap(sign, a.sign);
3177 }
3178 
3179 Integer::Integer(word value, size_t length)
3180  : reg(RoundupSize(length)), sign(POSITIVE)
3181 {
3182  reg[0] = value;
3183  SetWords(reg+1, 0, reg.size()-1);
3184 }
3185 
3186 template <class T>
3187 static Integer StringToInteger(const T *str, ByteOrder order)
3188 {
3189  CRYPTOPP_ASSERT( order == BIG_ENDIAN_ORDER || order == LITTLE_ENDIAN_ORDER );
3190 
3191  int radix, sign = 1;
3192  // GCC workaround
3193  // std::char_traits<wchar_t>::length() not defined in GCC 3.2 and STLport 4.5.3
3194  unsigned int length;
3195  for (length = 0; str[length] != 0; length++) {}
3196 
3197  Integer v;
3198 
3199  if (length == 0)
3200  return Integer::Zero();
3201 
3202  // 'str' is of length 1 or more
3203  switch (str[length-1])
3204  {
3205  case 'h':
3206  case 'H':
3207  radix=16;
3208  break;
3209  case 'o':
3210  case 'O':
3211  radix=8;
3212  break;
3213  case 'b':
3214  case 'B':
3215  radix=2;
3216  break;
3217  default:
3218  radix=10;
3219  }
3220 
3221  // 'str' is of length 1 or more
3222  if (str[0] == '-')
3223  {
3224  sign = -1;
3225  str += 1, length -= 1;
3226  }
3227 
3228  // Recognize common prefixes for hexadecimal, octal and decimal.
3229  // Microsoft's MASM also recognizes 0t for octal, but not here.
3230  if (length > 2 && str[0] == '0')
3231  {
3232  if (str[1] == 'x' || str[1] == 'X')
3233  {
3234  radix = 16;
3235  str += 2, length -= 2;
3236  }
3237  else if (str[1] == 'n' || str[1] == 'N')
3238  {
3239  radix = 10;
3240  str += 2, length -= 2;
3241  }
3242  else if (str[1] == 'o' || str[1] == 'O')
3243  {
3244  radix = 8;
3245  str += 2, length -= 2;
3246  }
3247  }
3248 
3249  if (order == BIG_ENDIAN_ORDER)
3250  {
3251  for (unsigned int i=0; i<length; i++)
3252  {
3253  int digit, ch = static_cast<int>(str[i]);
3254 
3255  // Profiling guided the flow below.
3256  if (ch >= '0' && ch <= '9')
3257  digit = ch - '0';
3258  else if (ch >= 'a' && ch <= 'f')
3259  digit = ch - 'a' + 10;
3260  else if (ch >= 'A' && ch <= 'F')
3261  digit = ch - 'A' + 10;
3262  else
3263  digit = radix;
3264 
3265  if (digit < radix)
3266  {
3267  v *= radix;
3268  v += digit;
3269  }
3270  }
3271  }
3272  else if (radix == 16 && order == LITTLE_ENDIAN_ORDER)
3273  {
3274  // Nibble high, low and count
3275  unsigned int nh = 0, nl = 0, nc = 0;
3276  Integer position(Integer::One());
3277 
3278  for (unsigned int i=0; i<length; i++)
3279  {
3280  int digit, ch = static_cast<int>(str[i]);
3281 
3282  if (ch >= '0' && ch <= '9')
3283  digit = ch - '0';
3284  else if (ch >= 'a' && ch <= 'f')
3285  digit = ch - 'a' + 10;
3286  else if (ch >= 'A' && ch <= 'F')
3287  digit = ch - 'A' + 10;
3288  else
3289  digit = radix;
3290 
3291  if (digit < radix)
3292  {
3293  if (nc++ == 0)
3294  nh = digit;
3295  else
3296  nl = digit;
3297 
3298  if (nc == 2)
3299  {
3300  v += position * (nh << 4 | nl);
3301  nc = 0, position <<= 8;
3302  }
3303  }
3304  }
3305 
3306  if (nc == 1)
3307  v += nh * position;
3308  }
3309  else // LITTLE_ENDIAN_ORDER && radix != 16
3310  {
3311  for (int i=static_cast<int>(length)-1; i>=0; i--)
3312  {
3313  int digit, ch = static_cast<int>(str[i]);
3314 
3315  if (ch >= '0' && ch <= '9')
3316  digit = ch - '0';
3317  else if (ch >= 'a' && ch <= 'f')
3318  digit = ch - 'a' + 10;
3319  else if (ch >= 'A' && ch <= 'F')
3320  digit = ch - 'A' + 10;
3321  else
3322  digit = radix;
3323 
3324  if (digit < radix)
3325  {
3326  v *= radix;
3327  v += digit;
3328  }
3329  }
3330  }
3331 
3332  if (sign == -1)
3333  v.Negate();
3334 
3335  return v;
3336 }
3337 
3338 Integer::Integer(const char *str, ByteOrder order)
3339  : reg(2), sign(POSITIVE)
3340 {
3341  *this = StringToInteger(str,order);
3342 }
3343 
3344 Integer::Integer(const wchar_t *str, ByteOrder order)
3345  : reg(2), sign(POSITIVE)
3346 {
3347  *this = StringToInteger(str,order);
3348 }
3349 
3350 unsigned int Integer::WordCount() const
3351 {
3352  return (unsigned int)CountWords(reg, reg.size());
3353 }
3354 
3355 unsigned int Integer::ByteCount() const
3356 {
3357  unsigned wordCount = WordCount();
3358  if (wordCount)
3359  return (wordCount-1)*WORD_SIZE + BytePrecision(reg[wordCount-1]);
3360  else
3361  return 0;
3362 }
3363 
3364 unsigned int Integer::BitCount() const
3365 {
3366  unsigned wordCount = WordCount();
3367  if (wordCount)
3368  return (wordCount-1)*WORD_BITS + BitPrecision(reg[wordCount-1]);
3369  else
3370  return 0;
3371 }
3372 
3373 void Integer::Decode(const byte *input, size_t inputLen, Signedness s)
3374 {
3375  CRYPTOPP_ASSERT(input && inputLen); // NULL buffer
3376  StringStore store(input, inputLen);
3377  Decode(store, inputLen, s);
3378 }
3379 
3380 void Integer::Decode(BufferedTransformation &bt, size_t inputLen, Signedness s)
3381 {
3382  CRYPTOPP_ASSERT(bt.MaxRetrievable() >= inputLen);
3383  if (bt.MaxRetrievable() < inputLen)
3384  throw InvalidArgument("Integer: input length is too small");
3385 
3386  byte b;
3387  bt.Peek(b);
3388  sign = ((s==SIGNED) && (b & 0x80)) ? NEGATIVE : POSITIVE;
3389 
3390  while (inputLen>0 && (sign==POSITIVE ? b==0 : b==0xff))
3391  {
3392  bt.Skip(1);
3393  inputLen--;
3394  bt.Peek(b);
3395  }
3396 
3397  reg.CleanNew(RoundupSize(BytesToWords(inputLen)));
3398  for (size_t i=inputLen; i > 0; i--)
3399  {
3400  (void)bt.Get(b);
3401  reg[(i-1)/WORD_SIZE] |= word(b) << ((i-1)%WORD_SIZE)*8;
3402  }
3403 
3404  if (sign == NEGATIVE)
3405  {
3406  for (size_t i=inputLen; i<reg.size()*WORD_SIZE; i++)
3407  reg[i/WORD_SIZE] |= word(0xff) << (i%WORD_SIZE)*8;
3408  TwosComplement(reg, reg.size());
3409  }
3410 }
3411 
3412 size_t Integer::MinEncodedSize(Signedness signedness) const
3413 {
3414  // Profiling guided the flow below.
3415  unsigned int outputLen = STDMAX(1U, ByteCount());
3416  const bool pre = (signedness == UNSIGNED);
3417  if (!pre && NotNegative() && (GetByte(outputLen-1) & 0x80))
3418  outputLen++;
3419  if (pre)
3420  return outputLen;
3421  if (IsNegative() && *this < -Power2(outputLen*8-1))
3422  outputLen++;
3423  return outputLen;
3424 }
3425 
3426 // PKCS12_PBKDF and other classes use undersized buffers
3427 void Integer::Encode(byte *output, size_t outputLen, Signedness signedness) const
3428 {
3429  CRYPTOPP_ASSERT(output && outputLen); // NULL buffer
3430  ArraySink sink(output, outputLen);
3431  Encode(sink, outputLen, signedness);
3432 }
3433 
3434 void Integer::Encode(BufferedTransformation &bt, size_t outputLen, Signedness signedness) const
3435 {
3436  if (signedness == UNSIGNED || NotNegative())
3437  {
3438  for (size_t i=outputLen; i > 0; i--)
3439  bt.Put(GetByte(i-1));
3440  }
3441  else
3442  {
3443  // take two's complement of *this
3444  Integer temp = Integer::Power2(8*STDMAX((size_t)ByteCount(), outputLen)) + *this;
3445  temp.Encode(bt, outputLen, UNSIGNED);
3446  }
3447 }
3448 
3450 {
3451  DERGeneralEncoder enc(bt, INTEGER);
3453  enc.MessageEnd();
3454 }
3455 
3456 void Integer::BERDecode(const byte *input, size_t len)
3457 {
3458  CRYPTOPP_ASSERT(input && len); // NULL buffer
3459  StringStore store(input, len);
3460  BERDecode(store);
3461 }
3462 
3464 {
3465  BERGeneralDecoder dec(bt, INTEGER);
3466  if (!dec.IsDefiniteLength() || dec.MaxRetrievable() < dec.RemainingLength())
3467  BERDecodeError();
3468  Decode(dec, (size_t)dec.RemainingLength(), SIGNED);
3469  dec.MessageEnd();
3470 }
3471 
3472 void Integer::DEREncodeAsOctetString(BufferedTransformation &bt, size_t length) const
3473 {
3475  Encode(enc, length);
3476  enc.MessageEnd();
3477 }
3478 
3480 {
3482  if (!dec.IsDefiniteLength() || dec.RemainingLength() != length)
3483  BERDecodeError();
3484  Decode(dec, length);
3485  dec.MessageEnd();
3486 }
3487 
3488 size_t Integer::OpenPGPEncode(byte *output, size_t bufferSize) const
3489 {
3490  CRYPTOPP_ASSERT(output && bufferSize); // NULL buffer
3491  CRYPTOPP_ASSERT(bufferSize >= 2+ByteCount()); // Undersized buffer
3492  ArraySink sink(output, bufferSize);
3493  return OpenPGPEncode(sink);
3494 }
3495 
3497 {
3498  word16 bitCount = word16(BitCount());
3499  bt.PutWord16(bitCount);
3500  size_t byteCount = BitsToBytes(bitCount);
3501  Encode(bt, byteCount);
3502  return 2 + byteCount;
3503 }
3504 
3505 void Integer::OpenPGPDecode(const byte *input, size_t len)
3506 {
3507  CRYPTOPP_ASSERT(input && len); // NULL buffer
3508  StringStore store(input, len);
3509  OpenPGPDecode(store);
3510 }
3511 
3513 {
3514  word16 bitCount;
3515  if (bt.GetWord16(bitCount) != 2 || bt.MaxRetrievable() < BitsToBytes(bitCount))
3516  throw OpenPGPDecodeErr();
3517  Decode(bt, BitsToBytes(bitCount));
3518 }
3519 
3520 void Integer::Randomize(RandomNumberGenerator &rng, size_t nbits)
3521 {
3522  const size_t nbytes = nbits/8 + 1;
3523  SecByteBlock buf(nbytes);
3524  rng.GenerateBlock(buf, nbytes);
3525  if (nbytes)
3526  buf[0] = (byte)Crop(buf[0], nbits % 8);
3527  Decode(buf, nbytes, UNSIGNED);
3528 }
3529 
3530 void Integer::Randomize(RandomNumberGenerator &rng, const Integer &min, const Integer &max)
3531 {
3532  if (min > max)
3533  throw InvalidArgument("Integer: Min must be no greater than Max");
3534 
3535  Integer range = max - min;
3536  const unsigned int nbits = range.BitCount();
3537 
3538  do
3539  {
3540  Randomize(rng, nbits);
3541  }
3542  while (*this > range);
3543 
3544  *this += min;
3545 }
3546 
3547 bool Integer::Randomize(RandomNumberGenerator &rng, const Integer &min, const Integer &max, RandomNumberType rnType, const Integer &equiv, const Integer &mod)
3548 {
3549  return GenerateRandomNoThrow(rng, MakeParameters("Min", min)("Max", max)
3550  ("RandomNumberType", rnType)("EquivalentTo", equiv)("Mod", mod));
3551 }
3552 
3553 class KDF2_RNG : public RandomNumberGenerator
3554 {
3555 public:
3556  KDF2_RNG(const byte *seed, size_t seedSize)
3557  : m_counter(0), m_counterAndSeed(ClampSize(seedSize) + 4)
3558  {
3559  std::memcpy(m_counterAndSeed + 4, seed, ClampSize(seedSize));
3560  }
3561 
3562  void GenerateBlock(byte *output, size_t size)
3563  {
3564  CRYPTOPP_ASSERT(output && size); // NULL buffer
3565  PutWord(false, BIG_ENDIAN_ORDER, m_counterAndSeed, m_counter);
3566  ++m_counter;
3567  P1363_KDF2<SHA1>::DeriveKey(output, size, m_counterAndSeed, m_counterAndSeed.size(), NULLPTR, 0);
3568  }
3569 
3570  // UBsan finding, -Wstringop-overflow
3571  inline size_t ClampSize(size_t req) const
3572  {
3573  // Clamp at 16 MB
3574  if (req > 16U*1024*1024)
3575  return 16U*1024*1024;
3576  return req;
3577  }
3578 
3579 private:
3580  word32 m_counter;
3581  SecByteBlock m_counterAndSeed;
3582 };
3583 
3585 {
3586  Integer min = params.GetValueWithDefault("Min", Integer::Zero());
3587  Integer max;
3588  if (!params.GetValue("Max", max))
3589  {
3590  int bitLength;
3591  if (params.GetIntValue("BitLength", bitLength))
3592  max = Integer::Power2(bitLength);
3593  else
3594  throw InvalidArgument("Integer: missing Max argument");
3595  }
3596  if (min > max)
3597  throw InvalidArgument("Integer: Min must be no greater than Max");
3598 
3599  Integer equiv = params.GetValueWithDefault("EquivalentTo", Integer::Zero());
3600  Integer mod = params.GetValueWithDefault("Mod", Integer::One());
3601 
3602  if (equiv.IsNegative() || equiv >= mod)
3603  throw InvalidArgument("Integer: invalid EquivalentTo and/or Mod argument");
3604 
3605  Integer::RandomNumberType rnType = params.GetValueWithDefault("RandomNumberType", Integer::ANY);
3606 
3607  member_ptr<KDF2_RNG> kdf2Rng;
3609  if (params.GetValue(Name::Seed(), seed))
3610  {
3611  ByteQueue bq;
3612  DERSequenceEncoder seq(bq);
3613  min.DEREncode(seq);
3614  max.DEREncode(seq);
3615  equiv.DEREncode(seq);
3616  mod.DEREncode(seq);
3617  DEREncodeUnsigned(seq, rnType);
3618  DEREncodeOctetString(seq, seed.begin(), seed.size());
3619  seq.MessageEnd();
3620 
3621  SecByteBlock finalSeed((size_t)bq.MaxRetrievable());
3622  bq.Get(finalSeed, finalSeed.size());
3623  kdf2Rng.reset(new KDF2_RNG(finalSeed.begin(), finalSeed.size()));
3624  }
3625  RandomNumberGenerator &rng = kdf2Rng.get() ? (RandomNumberGenerator &)*kdf2Rng : i_rng;
3626 
3627  switch (rnType)
3628  {
3629  case ANY:
3630  if (mod == One())
3631  Randomize(rng, min, max);
3632  else
3633  {
3634  Integer min1 = min + (equiv-min)%mod;
3635  if (max < min1)
3636  return false;
3637  Randomize(rng, Zero(), (max - min1) / mod);
3638  *this *= mod;
3639  *this += min1;
3640  }
3641  return true;
3642 
3643  case PRIME:
3644  {
3645  const PrimeSelector *pSelector = params.GetValueWithDefault(Name::PointerToPrimeSelector(), (const PrimeSelector *)NULLPTR);
3646 
3647  int i;
3648  i = 0;
3649  while (1)
3650  {
3651  if (++i==16)
3652  {
3653  // check if there are any suitable primes in [min, max]
3654  Integer first = min;
3655  if (FirstPrime(first, max, equiv, mod, pSelector))
3656  {
3657  // if there is only one suitable prime, we're done
3658  *this = first;
3659  if (!FirstPrime(first, max, equiv, mod, pSelector))
3660  return true;
3661  }
3662  else
3663  return false;
3664  }
3665 
3666  Randomize(rng, min, max);
3667  if (FirstPrime(*this, STDMIN(*this+mod*PrimeSearchInterval(max), max), equiv, mod, pSelector))
3668  return true;
3669  }
3670  }
3671 
3672  default:
3673  throw InvalidArgument("Integer: invalid RandomNumberType argument");
3674  }
3675 }
3676 
3677 std::istream& operator>>(std::istream& in, Integer &a)
3678 {
3679  char c;
3680  unsigned int length = 0;
3681  SecBlock<char> str(length + 16);
3682 
3683  std::ws(in);
3684 
3685  do
3686  {
3687  in.read(&c, 1);
3688  str[length++] = c;
3689  if (length >= str.size())
3690  str.Grow(length + 16);
3691  }
3692  while (in && (c=='-' || c=='x' || (c>='0' && c<='9') || (c>='a' && c<='f') || (c>='A' && c<='F') || c=='h' || c=='H' || c=='o' || c=='O' || c==',' || c=='.'));
3693 
3694  if (in.gcount())
3695  in.putback(c);
3696  str[length-1] = '\0';
3697  a = Integer(str);
3698 
3699  return in;
3700 }
3701 
3702 // Ensure base 10 is default
3703 inline int FlagToBase(long f) {
3704  return f == std::ios::hex ? 16 : (f == std::ios::oct ? 8 : 10);
3705 }
3706 
3707 inline char FlagToSuffix(long f) {
3708  return f == std::ios::hex ? 'h' : (f == std::ios::oct ? 'o' : '.');
3709 }
3710 
3711 // Ensure base 10 is default
3712 std::ostream& operator<<(std::ostream& out, const Integer &a)
3713 {
3714  // Get relevant conversion specifications from ostream.
3715  const long f = out.flags() & std::ios::basefield;
3716  const int base = FlagToBase(f);
3717  const char suffix = FlagToSuffix(f);
3718 
3719  Integer temp1=a, temp2;
3720  if (a.IsNegative())
3721  {
3722  out << '-';
3723  temp1.Negate();
3724  }
3725 
3726  if (!a)
3727  out << '0';
3728 
3729  static const char upper[]="0123456789ABCDEF";
3730  static const char lower[]="0123456789abcdef";
3731 
3732  const char* vec = (out.flags() & std::ios::uppercase) ? upper : lower;
3733  unsigned int i=0;
3734  SecBlock<char> s(a.BitCount() / (SaturatingSubtract1(BitPrecision(base),1U)) + 1);
3735 
3736  while (!!temp1)
3737  {
3738  word digit;
3739  Integer::Divide(digit, temp2, temp1, base);
3740  s[i++]=vec[digit];
3741  temp1.swap(temp2);
3742  }
3743 
3744  while (i--)
3745  {
3746  out << s[i];
3747  }
3748 
3749 #ifdef CRYPTOPP_USE_STD_SHOWBASE
3750  if (out.flags() & std::ios_base::showbase)
3751  out << suffix;
3752 
3753  return out;
3754 #else
3755  return out << suffix;
3756 #endif
3757 }
3758 
3760 {
3761  if (NotNegative())
3762  {
3763  if (Increment(reg, reg.size()))
3764  {
3765  reg.CleanGrow(2*reg.size());
3766  reg[reg.size()/2]=1;
3767  }
3768  }
3769  else
3770  {
3771  word borrow = Decrement(reg, reg.size());
3772  CRYPTOPP_ASSERT(!borrow); CRYPTOPP_UNUSED(borrow);
3773 
3774  if (WordCount()==0)
3775  *this = Zero();
3776  }
3777  return *this;
3778 }
3779 
3781 {
3782  if (IsNegative())
3783  {
3784  if (Increment(reg, reg.size()))
3785  {
3786  reg.CleanGrow(2*reg.size());
3787  reg[reg.size()/2]=1;
3788  }
3789  }
3790  else
3791  {
3792  if (Decrement(reg, reg.size()))
3793  *this = -One();
3794  }
3795  return *this;
3796 }
3797 
3798 // This is a bit operation. We set sign to POSITIVE, so there's no need to
3799 // worry about negative zero. Also see http://stackoverflow.com/q/11644362.
3800 Integer Integer::And(const Integer& t) const
3801 {
3802  // Grow due to https://github.com/weidai11/cryptopp/issues/1072
3803  // The temporary Integer 'result' may have fewer blocks than
3804  // 'this' or 't', if leading 0-blocks are trimmed in copy ctor.
3805 
3806  if (this == &t)
3807  {
3808  return AbsoluteValue();
3809  }
3810  else if (reg.size() >= t.reg.size())
3811  {
3812  IntegerSecBlock temp(t.reg.size());
3813  // AndWords(temp, reg, t.reg, t.reg.size());
3814  for (size_t i=0; i<t.reg.size(); ++i)
3815  temp[i] = reg[i] & t.reg[i];
3816 
3817  Integer result;
3818  std::swap(result.reg, temp);
3819 
3820  return result;
3821  }
3822  else // reg.size() < t.reg.size()
3823  {
3824  IntegerSecBlock temp(reg.size());
3825  // AndWords(temp, reg, t.reg, reg.size());
3826  for (size_t i=0; i<reg.size(); ++i)
3827  temp[i] = reg[i] & t.reg[i];
3828 
3829  Integer result;
3830  std::swap(result.reg, temp);
3831 
3832  return result;
3833  }
3834 }
3835 
3836 // This is a bit operation. We set sign to POSITIVE, so there's no need to
3837 // worry about negative zero. Also see http://stackoverflow.com/q/11644362.
3838 Integer Integer::Or(const Integer& t) const
3839 {
3840  // Grow due to https://github.com/weidai11/cryptopp/issues/1072
3841  // The temporary Integer 'result' may have fewer blocks than
3842  // 'this' or 't', if leading 0-blocks are trimmed in copy ctor.
3843 
3844  if (this == &t)
3845  {
3846  return AbsoluteValue();
3847  }
3848  else if (reg.size() >= t.reg.size())
3849  {
3850  IntegerSecBlock temp(reg, reg.size());
3851  // OrWords(temp, t.reg, t.reg.size());
3852  for (size_t i=0; i<t.reg.size(); ++i)
3853  temp[i] |= t.reg[i];
3854 
3855  Integer result;
3856  std::swap(result.reg, temp);
3857 
3858  return result;
3859  }
3860  else // reg.size() < t.reg.size()
3861  {
3862  IntegerSecBlock temp(t.reg, t.reg.size());
3863  // OrWords(temp, reg, reg.size());
3864  for (size_t i=0; i<reg.size(); ++i)
3865  temp[i] |= reg[i];
3866 
3867  Integer result;
3868  std::swap(result.reg, temp);
3869 
3870  return result;
3871  }
3872 }
3873 
3874 // This is a bit operation. We set sign to POSITIVE, so there's no need to
3875 // worry about negative zero. Also see http://stackoverflow.com/q/11644362.
3876 Integer Integer::Xor(const Integer& t) const
3877 {
3878  // Grow due to https://github.com/weidai11/cryptopp/issues/1072
3879  // The temporary Integer 'result' may have fewer blocks than
3880  // 'this' or 't', if leading 0-blocks are trimmed in copy ctor.
3881 
3882  if (this == &t)
3883  {
3884  return Integer::Zero();
3885  }
3886  else if (reg.size() >= t.reg.size())
3887  {
3888  IntegerSecBlock temp(reg, reg.size());
3889  // OrWords(temp, t.reg, t.reg.size());
3890  for (size_t i=0; i<t.reg.size(); ++i)
3891  temp[i] ^= t.reg[i];
3892 
3893  Integer result;
3894  std::swap(result.reg, temp);
3895 
3896  return result;
3897  }
3898  else // reg.size() < t.reg.size()
3899  {
3900  IntegerSecBlock temp(t.reg, t.reg.size());
3901  // OrWords(temp, reg, reg.size());
3902  for (size_t i=0; i<reg.size(); ++i)
3903  temp[i] ^= reg[i];
3904 
3905  Integer result;
3906  std::swap(result.reg, temp);
3907 
3908  return result;
3909  }
3910 }
3911 
3912 void PositiveAdd(Integer &sum, const Integer &a, const Integer& b)
3913 {
3914  // Profiling guided the flow below.
3915  int carry; const bool pre = (a.reg.size() == b.reg.size());
3916  if (!pre && a.reg.size() > b.reg.size())
3917  {
3918  carry = Add(sum.reg, a.reg, b.reg, b.reg.size());
3919  CopyWords(sum.reg+b.reg.size(), a.reg+b.reg.size(), a.reg.size()-b.reg.size());
3920  carry = Increment(sum.reg+b.reg.size(), a.reg.size()-b.reg.size(), carry);
3921  }
3922  else if (pre)
3923  {
3924  carry = Add(sum.reg, a.reg, b.reg, a.reg.size());
3925  }
3926  else
3927  {
3928  carry = Add(sum.reg, a.reg, b.reg, a.reg.size());
3929  CopyWords(sum.reg+a.reg.size(), b.reg+a.reg.size(), b.reg.size()-a.reg.size());
3930  carry = Increment(sum.reg+a.reg.size(), b.reg.size()-a.reg.size(), carry);
3931  }
3932 
3933  if (carry)
3934  {
3935  sum.reg.CleanGrow(2*sum.reg.size());
3936  sum.reg[sum.reg.size()/2] = 1;
3937  }
3938  sum.sign = Integer::POSITIVE;
3939 }
3940 
3941 void PositiveSubtract(Integer &diff, const Integer &a, const Integer& b)
3942 {
3943  unsigned aSize = a.WordCount();
3944  aSize += aSize%2;
3945  unsigned bSize = b.WordCount();
3946  bSize += bSize%2;
3947 
3948  // Profiling guided the flow below.
3949  if (aSize > bSize)
3950  {
3951  word borrow = Subtract(diff.reg, a.reg, b.reg, bSize);
3952  CopyWords(diff.reg+bSize, a.reg+bSize, aSize-bSize);
3953  borrow = Decrement(diff.reg+bSize, aSize-bSize, borrow);
3954  CRYPTOPP_ASSERT(!borrow); CRYPTOPP_UNUSED(borrow);
3955  diff.sign = Integer::POSITIVE;
3956  }
3957  else if (aSize == bSize)
3958  {
3959  if (Compare(a.reg, b.reg, aSize) >= 0)
3960  {
3961  Subtract(diff.reg, a.reg, b.reg, aSize);
3962  diff.sign = Integer::POSITIVE;
3963  }
3964  else
3965  {
3966  Subtract(diff.reg, b.reg, a.reg, aSize);
3967  diff.sign = Integer::NEGATIVE;
3968  }
3969  }
3970  else
3971  {
3972  word borrow = Subtract(diff.reg, b.reg, a.reg, aSize);
3973  CopyWords(diff.reg+aSize, b.reg+aSize, bSize-aSize);
3974  borrow = Decrement(diff.reg+aSize, bSize-aSize, borrow);
3975  CRYPTOPP_ASSERT(!borrow); CRYPTOPP_UNUSED(borrow);
3976  diff.sign = Integer::NEGATIVE;
3977  }
3978 }
3979 
3980 // MSVC .NET 2003 workaround
3981 template <class T> inline const T& STDMAX2(const T& a, const T& b)
3982 {
3983  return a < b ? b : a;
3984 }
3985 
3986 Integer Integer::Plus(const Integer& b) const
3987 {
3988  Integer sum((word)0, STDMAX2(reg.size(), b.reg.size()));
3989  if (NotNegative())
3990  {
3991  if (b.NotNegative())
3992  PositiveAdd(sum, *this, b);
3993  else
3994  PositiveSubtract(sum, *this, b);
3995  }
3996  else
3997  {
3998  if (b.NotNegative())
3999  PositiveSubtract(sum, b, *this);
4000  else
4001  {
4002  PositiveAdd(sum, *this, b);
4003  sum.sign = Integer::NEGATIVE;
4004  }
4005  }
4006  return sum;
4007 }
4008 
4010 {
4011  reg.CleanGrow(t.reg.size());
4012  if (NotNegative())
4013  {
4014  if (t.NotNegative())
4015  PositiveAdd(*this, *this, t);
4016  else
4017  PositiveSubtract(*this, *this, t);
4018  }
4019  else
4020  {
4021  if (t.NotNegative())
4022  PositiveSubtract(*this, t, *this);
4023  else
4024  {
4025  PositiveAdd(*this, *this, t);
4026  sign = Integer::NEGATIVE;
4027  }
4028  }
4029  return *this;
4030 }
4031 
4032 Integer Integer::Minus(const Integer& b) const
4033 {
4034  Integer diff((word)0, STDMAX2(reg.size(), b.reg.size()));
4035  if (NotNegative())
4036  {
4037  if (b.NotNegative())
4038  PositiveSubtract(diff, *this, b);
4039  else
4040  PositiveAdd(diff, *this, b);
4041  }
4042  else
4043  {
4044  if (b.NotNegative())
4045  {
4046  PositiveAdd(diff, *this, b);
4047  diff.sign = Integer::NEGATIVE;
4048  }
4049  else
4050  PositiveSubtract(diff, b, *this);
4051  }
4052  return diff;
4053 }
4054 
4056 {
4057  reg.CleanGrow(t.reg.size());
4058  if (NotNegative())
4059  {
4060  if (t.NotNegative())
4061  PositiveSubtract(*this, *this, t);
4062  else
4063  PositiveAdd(*this, *this, t);
4064  }
4065  else
4066  {
4067  if (t.NotNegative())
4068  {
4069  PositiveAdd(*this, *this, t);
4070  sign = Integer::NEGATIVE;
4071  }
4072  else
4073  PositiveSubtract(*this, t, *this);
4074  }
4075  return *this;
4076 }
4077 
4078 Integer& Integer::operator<<=(size_t n)
4079 {
4080  const size_t wordCount = WordCount();
4081  const size_t shiftWords = n / WORD_BITS;
4082  const unsigned int shiftBits = (unsigned int)(n % WORD_BITS);
4083 
4084  reg.CleanGrow(RoundupSize(wordCount+BitsToWords(n)));
4085  ShiftWordsLeftByWords(reg, wordCount + shiftWords, shiftWords);
4086  ShiftWordsLeftByBits(reg+shiftWords, wordCount+BitsToWords(shiftBits), shiftBits);
4087  return *this;
4088 }
4089 
4090 Integer& Integer::operator>>=(size_t n)
4091 {
4092  const size_t wordCount = WordCount();
4093  const size_t shiftWords = n / WORD_BITS;
4094  const unsigned int shiftBits = (unsigned int)(n % WORD_BITS);
4095 
4096  ShiftWordsRightByWords(reg, wordCount, shiftWords);
4097  if (wordCount > shiftWords)
4098  ShiftWordsRightByBits(reg, wordCount-shiftWords, shiftBits);
4099  if (IsNegative() && WordCount()==0) // avoid -0
4100  *this = Zero();
4101  return *this;
4102 }
4103 
4105 {
4106  if (this != &t)
4107  {
4108  const size_t size = STDMIN(reg.size(), t.reg.size());
4109  reg.resize(size);
4110  AndWords(reg, t.reg, size);
4111  }
4112  sign = POSITIVE;
4113  return *this;
4114 }
4115 
4117 {
4118  if (this != &t)
4119  {
4120  if (reg.size() >= t.reg.size())
4121  {
4122  OrWords(reg, t.reg, t.reg.size());
4123  }
4124  else // reg.size() < t.reg.size()
4125  {
4126  const size_t head = reg.size();
4127  const size_t tail = t.reg.size() - reg.size();
4128  reg.resize(head+tail);
4129  OrWords(reg, t.reg, head);
4130  CopyWords(reg+head,t.reg+head,tail);
4131  }
4132  }
4133  sign = POSITIVE;
4134  return *this;
4135 }
4136 
4138 {
4139  if (this == &t)
4140  {
4141  *this = Zero();
4142  }
4143  else
4144  {
4145  if (reg.size() >= t.reg.size())
4146  {
4147  XorWords(reg, t.reg, t.reg.size());
4148  }
4149  else // reg.size() < t.reg.size()
4150  {
4151  const size_t head = reg.size();
4152  const size_t tail = t.reg.size() - reg.size();
4153  reg.resize(head+tail);
4154  XorWords(reg, t.reg, head);
4155  CopyWords(reg+head,t.reg+head,tail);
4156  }
4157  }
4158  sign = POSITIVE;
4159  return *this;
4160 }
4161 
4162 void PositiveMultiply(Integer &product, const Integer &a, const Integer &b)
4163 {
4164  size_t aSize = RoundupSize(a.WordCount());
4165  size_t bSize = RoundupSize(b.WordCount());
4166 
4167  product.reg.CleanNew(RoundupSize(aSize+bSize));
4168  product.sign = Integer::POSITIVE;
4169 
4170  IntegerSecBlock workspace(aSize + bSize);
4171  AsymmetricMultiply(product.reg, workspace, a.reg, aSize, b.reg, bSize);
4172 }
4173 
4174 void Multiply(Integer &product, const Integer &a, const Integer &b)
4175 {
4176  PositiveMultiply(product, a, b);
4177 
4178  if (a.NotNegative() != b.NotNegative())
4179  product.Negate();
4180 }
4181 
4182 Integer Integer::Times(const Integer &b) const
4183 {
4184  Integer product;
4185  Multiply(product, *this, b);
4186  return product;
4187 }
4188 
4189 /*
4190 void PositiveDivide(Integer &remainder, Integer &quotient,
4191  const Integer &dividend, const Integer &divisor)
4192 {
4193  remainder.reg.CleanNew(divisor.reg.size());
4194  remainder.sign = Integer::POSITIVE;
4195  quotient.reg.New(0);
4196  quotient.sign = Integer::POSITIVE;
4197  unsigned i=dividend.BitCount();
4198  while (i--)
4199  {
4200  word overflow = ShiftWordsLeftByBits(remainder.reg, remainder.reg.size(), 1);
4201  remainder.reg[0] |= dividend[i];
4202  if (overflow || remainder >= divisor)
4203  {
4204  Subtract(remainder.reg, remainder.reg, divisor.reg, remainder.reg.size());
4205  quotient.SetBit(i);
4206  }
4207  }
4208 }
4209 */
4210 
4211 void PositiveDivide(Integer &remainder, Integer &quotient,
4212  const Integer &a, const Integer &b)
4213 {
4214  unsigned aSize = a.WordCount();
4215  unsigned bSize = b.WordCount();
4216 
4217  if (!bSize)
4218  throw Integer::DivideByZero();
4219 
4220  if (aSize < bSize)
4221  {
4222  remainder = a;
4223  remainder.sign = Integer::POSITIVE;
4224  quotient = Integer::Zero();
4225  return;
4226  }
4227 
4228  aSize += aSize%2; // round up to next even number
4229  bSize += bSize%2;
4230 
4231  remainder.reg.CleanNew(RoundupSize(bSize));
4232  remainder.sign = Integer::POSITIVE;
4233  quotient.reg.CleanNew(RoundupSize(aSize-bSize+2));
4234  quotient.sign = Integer::POSITIVE;
4235 
4236  IntegerSecBlock T(aSize+3*(bSize+2));
4237  Divide(remainder.reg, quotient.reg, T, a.reg, aSize, b.reg, bSize);
4238 }
4239 
4240 void Integer::Divide(Integer &remainder, Integer &quotient, const Integer &dividend, const Integer &divisor)
4241 {
4242  PositiveDivide(remainder, quotient, dividend, divisor);
4243 
4244  if (dividend.IsNegative())
4245  {
4246  quotient.Negate();
4247  if (remainder.NotZero())
4248  {
4249  --quotient;
4250  remainder = divisor.AbsoluteValue() - remainder;
4251  }
4252  }
4253 
4254  if (divisor.IsNegative())
4255  quotient.Negate();
4256 }
4257 
4258 void Integer::DivideByPowerOf2(Integer &r, Integer &q, const Integer &a, unsigned int n)
4259 {
4260  q = a;
4261  q >>= n;
4262 
4263  const size_t wordCount = BitsToWords(n);
4264  if (wordCount <= a.WordCount())
4265  {
4266  r.reg.resize(RoundupSize(wordCount));
4267  CopyWords(r.reg, a.reg, wordCount);
4268  SetWords(r.reg+wordCount, 0, r.reg.size()-wordCount);
4269  if (n % WORD_BITS != 0)
4270  r.reg[wordCount-1] %= (word(1) << (n % WORD_BITS));
4271  }
4272  else
4273  {
4274  r.reg.resize(RoundupSize(a.WordCount()));
4275  CopyWords(r.reg, a.reg, r.reg.size());
4276  }
4277  r.sign = POSITIVE;
4278 
4279  if (a.IsNegative() && r.NotZero())
4280  {
4281  --q;
4282  r = Power2(n) - r;
4283  }
4284 }
4285 
4286 Integer Integer::DividedBy(const Integer &b) const
4287 {
4288  Integer remainder, quotient;
4289  Integer::Divide(remainder, quotient, *this, b);
4290  return quotient;
4291 }
4292 
4293 Integer Integer::Modulo(const Integer &b) const
4294 {
4295  Integer remainder, quotient;
4296  Integer::Divide(remainder, quotient, *this, b);
4297  return remainder;
4298 }
4299 
4300 void Integer::Divide(word &remainder, Integer &quotient, const Integer &dividend, word divisor)
4301 {
4302  if (!divisor)
4303  throw Integer::DivideByZero();
4304 
4305  // IsPowerOf2 uses BMI on x86 if available. There is a small
4306  // but measurable improvement during decryption and signing.
4307  if (IsPowerOf2(divisor))
4308  {
4309  quotient = dividend >> (BitPrecision(divisor)-1);
4310  remainder = dividend.reg[0] & (divisor-1);
4311  return;
4312  }
4313 
4314  unsigned int i = dividend.WordCount();
4315  quotient.reg.CleanNew(RoundupSize(i));
4316  remainder = 0;
4317  while (i--)
4318  {
4319  quotient.reg[i] = DWord(dividend.reg[i], remainder) / divisor;
4320  remainder = DWord(dividend.reg[i], remainder) % divisor;
4321  }
4322 
4323  if (dividend.NotNegative())
4324  quotient.sign = POSITIVE;
4325  else
4326  {
4327  quotient.sign = NEGATIVE;
4328  if (remainder)
4329  {
4330  --quotient;
4331  remainder = divisor - remainder;
4332  }
4333  }
4334 }
4335 
4337 {
4338  word remainder;
4339  Integer quotient;
4340  Integer::Divide(remainder, quotient, *this, b);
4341  return quotient;
4342 }
4343 
4344 word Integer::Modulo(word divisor) const
4345 {
4346  if (!divisor)
4347  throw Integer::DivideByZero();
4348 
4349  word remainder;
4350 
4351  // Profiling guided the flow below.
4352  if ((divisor & (divisor-1)) != 0) // divisor is not a power of 2
4353  {
4354  // Profiling guided the flow below.
4355  unsigned int i = WordCount();
4356  if (divisor > 5)
4357  {
4358  remainder = 0;
4359  while (i--)
4360  remainder = DWord(reg[i], remainder) % divisor;
4361  }
4362  else
4363  {
4364  DWord sum(0, 0);
4365  while (i--)
4366  sum += reg[i];
4367  remainder = sum % divisor;
4368  }
4369  }
4370  else // divisor is a power of 2
4371  {
4372  remainder = reg[0] & (divisor-1);
4373  }
4374 
4375  if (IsNegative() && remainder)
4376  remainder = divisor - remainder;
4377 
4378  return remainder;
4379 }
4380 
4381 void Integer::Negate()
4382 {
4383  if (!!(*this)) // don't flip sign if *this==0
4384  sign = Sign(1-sign);
4385 }
4386 
4387 int Integer::PositiveCompare(const Integer& t) const
4388 {
4389  // Profiling guided the flow below.
4390  const unsigned size = WordCount(), tSize = t.WordCount();
4391  if (size != tSize)
4392  return size > tSize ? 1 : -1;
4393  else
4394  return CryptoPP::Compare(reg, t.reg, size);
4395 }
4396 
4397 int Integer::Compare(const Integer& t) const
4398 {
4399  if (NotNegative())
4400  {
4401  if (t.NotNegative())
4402  return PositiveCompare(t);
4403  else
4404  return 1;
4405  }
4406  else
4407  {
4408  if (t.NotNegative())
4409  return -1;
4410  else
4411  return -PositiveCompare(t);
4412  }
4413 }
4414 
4416 {
4417  if (!IsPositive())
4418  return Zero();
4419 
4420  // overestimate square root
4421  Integer x, y = Power2((BitCount()+1)/2);
4422  CRYPTOPP_ASSERT(y*y >= *this);
4423 
4424  do
4425  {
4426  x = y;
4427  y = (x + *this/x) >> 1;
4428  } while (y<x);
4429 
4430  return x;
4431 }
4432 
4433 bool Integer::IsSquare() const
4434 {
4435  Integer r = SquareRoot();
4436  return *this == r.Squared();
4437 }
4438 
4439 bool Integer::IsUnit() const
4440 {
4441  return (WordCount() == 1) && (reg[0] == 1);
4442 }
4443 
4445 {
4446  return IsUnit() ? *this : Zero();
4447 }
4448 
4449 Integer a_times_b_mod_c(const Integer &x, const Integer& y, const Integer& m)
4450 {
4451  CRYPTOPP_ASSERT(m.NotZero());
4452  if (m.IsZero())
4453  throw Integer::DivideByZero();
4454 
4455  return x*y%m;
4456 }
4457 
4458 Integer a_exp_b_mod_c(const Integer &x, const Integer& e, const Integer& m)
4459 {
4460  CRYPTOPP_ASSERT(m.NotZero());
4461  if (m.IsZero())
4462  throw Integer::DivideByZero();
4463 
4464  ModularArithmetic mr(m);
4465  return mr.Exponentiate(x, e);
4466 }
4467 
4468 Integer Integer::Gcd(const Integer &a, const Integer &b)
4469 {
4470  return EuclideanDomainOf<Integer>().Gcd(a, b);
4471 }
4472 
4473 Integer Integer::InverseMod(const Integer &m) const
4474 {
4476  CRYPTOPP_ASSERT(m.NotZero());
4477 
4478  if (IsNegative())
4479  return Modulo(m).InverseModNext(m);
4480 
4481  // http://github.com/weidai11/cryptopp/issues/602
4482  if (*this >= m)
4483  return Modulo(m).InverseModNext(m);
4484 
4485  return InverseModNext(m);
4486 }
4487 
4488 Integer Integer::InverseModNext(const Integer &m) const
4489 {
4491  CRYPTOPP_ASSERT(m.NotZero());
4492 
4493  if (m.IsEven())
4494  {
4495  if (!m || IsEven())
4496  return Zero(); // no inverse
4497  if (*this == One())
4498  return One();
4499 
4500  Integer u = m.Modulo(*this).InverseModNext(*this);
4501  return !u ? Zero() : (m*(*this-u)+1)/(*this);
4502  }
4503 
4504  // AlmostInverse requires a 4x workspace
4505  IntegerSecBlock T(m.reg.size() * 4);
4506  Integer r((word)0, m.reg.size());
4507  unsigned k = AlmostInverse(r.reg, T, reg, reg.size(), m.reg, m.reg.size());
4508  DivideByPower2Mod(r.reg, r.reg, k, m.reg, m.reg.size());
4509  return r;
4510 }
4511 
4512 word Integer::InverseMod(word mod) const
4513 {
4514  CRYPTOPP_ASSERT(mod != 0);
4515 
4516  word g0 = mod, g1 = *this % mod;
4517  word v0 = 0, v1 = 1;
4518  word y;
4519 
4520  while (g1)
4521  {
4522  if (g1 == 1)
4523  return v1;
4524  y = g0 / g1;
4525  g0 = g0 % g1;
4526  v0 += y * v1;
4527 
4528  if (!g0)
4529  break;
4530  if (g0 == 1)
4531  return mod-v0;
4532  y = g1 / g0;
4533  g1 = g1 % g0;
4534  v1 += y * v0;
4535  }
4536  return 0;
4537 }
4538 
4539 // ********************************************************
4540 
4542 {
4543  BERSequenceDecoder seq(bt);
4544  OID oid(seq);
4545  if (oid != ASN1::prime_field())
4546  BERDecodeError();
4547  m_modulus.BERDecode(seq);
4548  seq.MessageEnd();
4549  m_result.reg.resize(m_modulus.reg.size());
4550 }
4551 
4553 {
4554  DERSequenceEncoder seq(bt);
4555  ASN1::prime_field().DEREncode(seq);
4556  m_modulus.DEREncode(seq);
4557  seq.MessageEnd();
4558 }
4559 
4560 void ModularArithmetic::DEREncodeElement(BufferedTransformation &out, const Element &a) const
4561 {
4562  a.DEREncodeAsOctetString(out, MaxElementByteLength());
4563 }
4564 
4566 {
4567  a.BERDecodeAsOctetString(in, MaxElementByteLength());
4568 }
4569 
4570 const Integer& ModularArithmetic::Half(const Integer &a) const
4571 {
4572  if (a.reg.size()==m_modulus.reg.size())
4573  {
4574  CryptoPP::DivideByPower2Mod(m_result.reg.begin(), a.reg, 1, m_modulus.reg, a.reg.size());
4575  return m_result;
4576  }
4577  else
4578  return m_result1 = (a.IsEven() ? (a >> 1) : ((a+m_modulus) >> 1));
4579 }
4580 
4581 const Integer& ModularArithmetic::Add(const Integer &a, const Integer &b) const
4582 {
4583  if (a.reg.size()==m_modulus.reg.size() && b.reg.size()==m_modulus.reg.size())
4584  {
4585  if (CryptoPP::Add(m_result.reg.begin(), a.reg, b.reg, a.reg.size())
4586  || Compare(m_result.reg, m_modulus.reg, a.reg.size()) >= 0)
4587  {
4588  CryptoPP::Subtract(m_result.reg.begin(), m_result.reg, m_modulus.reg, a.reg.size());
4589  }
4590  return m_result;
4591  }
4592  else
4593  {
4594  m_result1 = a+b;
4595  if (m_result1 >= m_modulus)
4596  m_result1 -= m_modulus;
4597  return m_result1;
4598  }
4599 }
4600 
4602 {
4603  if (a.reg.size()==m_modulus.reg.size() && b.reg.size()==m_modulus.reg.size())
4604  {
4605  if (CryptoPP::Add(a.reg, a.reg, b.reg, a.reg.size())
4606  || Compare(a.reg, m_modulus.reg, a.reg.size()) >= 0)
4607  {
4608  CryptoPP::Subtract(a.reg, a.reg, m_modulus.reg, a.reg.size());
4609  }
4610  }
4611  else
4612  {
4613  a+=b;
4614  if (a>=m_modulus)
4615  a-=m_modulus;
4616  }
4617 
4618  return a;
4619 }
4620 
4621 const Integer& ModularArithmetic::Subtract(const Integer &a, const Integer &b) const
4622 {
4623  if (a.reg.size()==m_modulus.reg.size() && b.reg.size()==m_modulus.reg.size())
4624  {
4625  if (CryptoPP::Subtract(m_result.reg.begin(), a.reg, b.reg, a.reg.size()))
4626  CryptoPP::Add(m_result.reg.begin(), m_result.reg, m_modulus.reg, a.reg.size());
4627  return m_result;
4628  }
4629  else
4630  {
4631  m_result1 = a-b;
4632  if (m_result1.IsNegative())
4633  m_result1 += m_modulus;
4634  return m_result1;
4635  }
4636 }
4637 
4638 Integer& ModularArithmetic::Reduce(Integer &a, const Integer &b) const
4639 {
4640  if (a.reg.size()==m_modulus.reg.size() && b.reg.size()==m_modulus.reg.size())
4641  {
4642  if (CryptoPP::Subtract(a.reg, a.reg, b.reg, a.reg.size()))
4643  CryptoPP::Add(a.reg, a.reg, m_modulus.reg, a.reg.size());
4644  }
4645  else
4646  {
4647  a-=b;
4648  if (a.IsNegative())
4649  a+=m_modulus;
4650  }
4651 
4652  return a;
4653 }
4654 
4655 const Integer& ModularArithmetic::Inverse(const Integer &a) const
4656 {
4657  if (!a)
4658  return a;
4659 
4660  CopyWords(m_result.reg.begin(), m_modulus.reg, m_modulus.reg.size());
4661  if (CryptoPP::Subtract(m_result.reg.begin(), m_result.reg, a.reg, a.reg.size()))
4662  Decrement(m_result.reg.begin()+a.reg.size(), m_modulus.reg.size()-a.reg.size());
4663 
4664  return m_result;
4665 }
4666 
4667 Integer ModularArithmetic::CascadeExponentiate(const Integer &x, const Integer &e1, const Integer &y, const Integer &e2) const
4668 {
4669  if (m_modulus.IsOdd())
4670  {
4671  MontgomeryRepresentation dr(m_modulus);
4672  return dr.ConvertOut(dr.CascadeExponentiate(dr.ConvertIn(x), e1, dr.ConvertIn(y), e2));
4673  }
4674  else
4675  return AbstractRing<Integer>::CascadeExponentiate(x, e1, y, e2);
4676 }
4677 
4678 void ModularArithmetic::SimultaneousExponentiate(Integer *results, const Integer &base, const Integer *exponents, unsigned int exponentsCount) const
4679 {
4680  if (m_modulus.IsOdd())
4681  {
4682  MontgomeryRepresentation dr(m_modulus);
4683  dr.SimultaneousExponentiate(results, dr.ConvertIn(base), exponents, exponentsCount);
4684  for (unsigned int i=0; i<exponentsCount; i++)
4685  results[i] = dr.ConvertOut(results[i]);
4686  }
4687  else
4688  AbstractRing<Integer>::SimultaneousExponentiate(results, base, exponents, exponentsCount);
4689 }
4690 
4691 MontgomeryRepresentation::MontgomeryRepresentation(const Integer &m) // modulus must be odd
4692  : ModularArithmetic(m),
4693  m_u((word)0, m_modulus.reg.size()),
4694  m_workspace(5*m_modulus.reg.size())
4695 {
4696  if (!m_modulus.IsOdd())
4697  throw InvalidArgument("MontgomeryRepresentation: Montgomery representation requires an odd modulus");
4698 
4699  RecursiveInverseModPower2(m_u.reg, m_workspace, m_modulus.reg, m_modulus.reg.size());
4700 }
4701 
4702 const Integer& MontgomeryRepresentation::Multiply(const Integer &a, const Integer &b) const
4703 {
4704  word *const T = m_workspace.begin();
4705  word *const R = m_result.reg.begin();
4706  const size_t N = m_modulus.reg.size();
4707  CRYPTOPP_ASSERT(a.reg.size()<=N && b.reg.size()<=N);
4708 
4709  AsymmetricMultiply(T, T+2*N, a.reg, a.reg.size(), b.reg, b.reg.size());
4710  SetWords(T+a.reg.size()+b.reg.size(), 0, 2*N-a.reg.size()-b.reg.size());
4711  MontgomeryReduce(R, T+2*N, T, m_modulus.reg, m_u.reg, N);
4712  return m_result;
4713 }
4714 
4715 const Integer& MontgomeryRepresentation::Square(const Integer &a) const
4716 {
4717  word *const T = m_workspace.begin();
4718  word *const R = m_result.reg.begin();
4719  const size_t N = m_modulus.reg.size();
4720  CRYPTOPP_ASSERT(a.reg.size()<=N);
4721 
4722  CryptoPP::Square(T, T+2*N, a.reg, a.reg.size());
4723  SetWords(T+2*a.reg.size(), 0, 2*N-2*a.reg.size());
4724  MontgomeryReduce(R, T+2*N, T, m_modulus.reg, m_u.reg, N);
4725  return m_result;
4726 }
4727 
4729 {
4730  word *const T = m_workspace.begin();
4731  word *const R = m_result.reg.begin();
4732  const size_t N = m_modulus.reg.size();
4733  CRYPTOPP_ASSERT(a.reg.size()<=N);
4734 
4735  CopyWords(T, a.reg, a.reg.size());
4736  SetWords(T+a.reg.size(), 0, 2*N-a.reg.size());
4737  MontgomeryReduce(R, T+2*N, T, m_modulus.reg, m_u.reg, N);
4738  return m_result;
4739 }
4740 
4742 {
4743 // return (EuclideanMultiplicativeInverse(a, modulus)<<(2*WORD_BITS*modulus.reg.size()))%modulus;
4744  word *const T = m_workspace.begin();
4745  word *const R = m_result.reg.begin();
4746  const size_t N = m_modulus.reg.size();
4747  CRYPTOPP_ASSERT(a.reg.size()<=N);
4748 
4749  CopyWords(T, a.reg, a.reg.size());
4750  SetWords(T+a.reg.size(), 0, 2*N-a.reg.size());
4751  MontgomeryReduce(R, T+2*N, T, m_modulus.reg, m_u.reg, N);
4752  unsigned k = AlmostInverse(R, T, R, N, m_modulus.reg, N);
4753 
4754 // cout << "k=" << k << " N*32=" << 32*N << endl;
4755 
4756  if (k>N*WORD_BITS)
4757  DivideByPower2Mod(R, R, k-N*WORD_BITS, m_modulus.reg, N);
4758  else
4759  MultiplyByPower2Mod(R, R, N*WORD_BITS-k, m_modulus.reg, N);
4760 
4761  return m_result;
4762 }
4763 
4764 // Specialization declared in misc.h to allow us to print integers
4765 // with additional control options, like arbitrary bases and uppercase.
4766 template <> CRYPTOPP_DLL
4767 std::string IntToString<Integer>(Integer value, unsigned int base)
4768 {
4769  // Hack... set the high bit for uppercase. Set the next bit fo a suffix.
4770  static const unsigned int BIT_32 = (1U << 31);
4771  const bool UPPER = !!(base & BIT_32);
4772  static const unsigned int BIT_31 = (1U << 30);
4773  const bool BASE = !!(base & BIT_31);
4774 
4775  const char CH = UPPER ? 'A' : 'a';
4776  base &= ~(BIT_32|BIT_31);
4777  CRYPTOPP_ASSERT(base >= 2 && base <= 32);
4778 
4779  if (value == 0)
4780  return "0";
4781 
4782  bool negative = false, zero = false;
4783  if (value.IsNegative())
4784  {
4785  negative = true;
4786  value.Negate();
4787  }
4788 
4789  if (!value)
4790  zero = true;
4791 
4792  SecBlock<char> s(value.BitCount() / (SaturatingSubtract1(BitPrecision(base),1U)) + 1);
4793  Integer temp;
4794 
4795  unsigned int i=0;
4796  while (!!value)
4797  {
4798  word digit;
4799  Integer::Divide(digit, temp, value, word(base));
4800  s[i++]=char((digit < 10 ? '0' : (CH - 10)) + digit);
4801  value.swap(temp);
4802  }
4803 
4804  std::string result;
4805  result.reserve(i+2);
4806 
4807  if (negative)
4808  result += '-';
4809 
4810  if (zero)
4811  result += '0';
4812 
4813  while (i--)
4814  result += s[i];
4815 
4816  if (BASE)
4817  {
4818  if (base == 10)
4819  result += '.';
4820  else if (base == 16)
4821  result += 'h';
4822  else if (base == 8)
4823  result += 'o';
4824  else if (base == 2)
4825  result += 'b';
4826  }
4827 
4828  return result;
4829 }
4830 
4831 // Specialization declared in misc.h to avoid Coverity findings.
4832 template <> CRYPTOPP_DLL
4833 std::string IntToString<word64>(word64 value, unsigned int base)
4834 {
4835  // Hack... set the high bit for uppercase.
4836  static const unsigned int HIGH_BIT = (1U << 31);
4837  const char CH = !!(base & HIGH_BIT) ? 'A' : 'a';
4838  base &= ~HIGH_BIT;
4839 
4840  CRYPTOPP_ASSERT(base >= 2);
4841  if (value == 0)
4842  return "0";
4843 
4844  std::string result;
4845  while (value > 0)
4846  {
4847  word64 digit = value % base;
4848  result = char((digit < 10 ? '0' : (CH - 10)) + digit) + result;
4849  value /= base;
4850  }
4851  return result;
4852 }
4853 
4854 #ifndef CRYPTOPP_NO_ASSIGN_TO_INTEGER
4855 // Allow the linker to discard Integer code if not needed.
4856 // Also see http://github.com/weidai11/cryptopp/issues/389.
4857 bool AssignIntToInteger(const std::type_info &valueType, void *pInteger, const void *pInt)
4858 {
4859  if (valueType != typeid(Integer))
4860  return false;
4861  *reinterpret_cast<Integer *>(pInteger) = *reinterpret_cast<const int *>(pInt);
4862  return true;
4863 }
4864 #endif // CRYPTOPP_NO_ASSIGN_TO_INTEGER
4865 
4866 // *************************** C++ Static Initialization ***************************
4867 
4868 class InitInteger
4869 {
4870 public:
4871  InitInteger()
4872  {
4873  SetFunctionPointers();
4874  }
4875 };
4876 
4877 // This is not really needed because each Integer can dynamically initialize
4878 // itself, but we take a peephole optimization and initialize the class once
4879 // if init priorities are available. Dynamic initialization will be used if
4880 // init priorities are not available.
4881 
4882 #if defined(HAVE_GCC_INIT_PRIORITY)
4883  const InitInteger s_init __attribute__ ((init_priority (CRYPTOPP_INIT_PRIORITY + 10))) = InitInteger();
4884  const Integer g_zero __attribute__ ((init_priority (CRYPTOPP_INIT_PRIORITY + 11))) = Integer(0L);
4885  const Integer g_one __attribute__ ((init_priority (CRYPTOPP_INIT_PRIORITY + 12))) = Integer(1L);
4886  const Integer g_two __attribute__ ((init_priority (CRYPTOPP_INIT_PRIORITY + 13))) = Integer(2L);
4887 #elif defined(HAVE_MSC_INIT_PRIORITY)
4888  #pragma warning(disable: 4075)
4889  #pragma init_seg(".CRT$XCU")
4890  const InitInteger s_init;
4891  const Integer g_zero(0L);
4892  const Integer g_one(1L);
4893  const Integer g_two(2L);
4894  #pragma warning(default: 4075)
4895 #elif HAVE_XLC_INIT_PRIORITY
4896  // XLC needs constant, not a define
4897  #pragma priority(280)
4898  const InitInteger s_init;
4899  const Integer g_zero(0L);
4900  const Integer g_one(1L);
4901  const Integer g_two(2L);
4902 #else
4903  const InitInteger s_init;
4904 #endif
4905 
4906 // ***************** Library code ********************
4907 
4908 const Integer &Integer::Zero()
4909 {
4910 #if defined(HAVE_GCC_INIT_PRIORITY) || defined(HAVE_MSC_INIT_PRIORITY) || defined(HAVE_XLC_INIT_PRIORITY)
4911  return g_zero;
4912 #elif defined(CRYPTOPP_CXX11_STATIC_INIT)
4913  static const Integer s_zero(0L);
4914  return s_zero;
4915 #else // Potential memory leak. Avoid if possible.
4916  return Singleton<Integer, NewInteger<0L> >().Ref();
4917 #endif
4918 }
4919 
4920 const Integer &Integer::One()
4921 {
4922 #if defined(HAVE_GCC_INIT_PRIORITY) || defined(HAVE_MSC_INIT_PRIORITY) || defined(HAVE_XLC_INIT_PRIORITY)
4923  return g_one;
4924 #elif defined(CRYPTOPP_CXX11_STATIC_INIT)
4925  static const Integer s_one(1L);
4926  return s_one;
4927 #else // Potential memory leak. Avoid if possible.
4928  return Singleton<Integer, NewInteger<1L> >().Ref();
4929 #endif
4930 }
4931 
4932 const Integer &Integer::Two()
4933 {
4934 #if defined(HAVE_GCC_INIT_PRIORITY) || defined(HAVE_MSC_INIT_PRIORITY) || defined(HAVE_XLC_INIT_PRIORITY)
4935  return g_two;
4936 #elif defined(CRYPTOPP_CXX11_STATIC_INIT)
4937  static const Integer s_two(2L);
4938  return s_two;
4939 #else // Potential memory leak. Avoid if possible.
4940  return Singleton<Integer, NewInteger<2L> >().Ref();
4941 #endif
4942 }
4943 
4944 NAMESPACE_END
4945 
4946 #endif // CRYPTOPP_IMPORTS
#define MAYBE_UNCONST_CAST(T, x)
SunCC workaround.
Definition: adv_simd.h:595
#define MAYBE_CONST
SunCC workaround.
Definition: adv_simd.h:590
Classes for working with NameValuePairs.
AlgorithmParameters MakeParameters(const char *name, const T &value, bool throwIfNotUsed=true)
Create an object that implements NameValuePairs.
Definition: algparam.h:508
Classes and functions for working with ANS.1 objects.
OID operator+(const OID &lhs, unsigned long rhs)
Append a value to an OID.
CRYPTOPP_DLL size_t DEREncodeOctetString(BufferedTransformation &bt, const byte *str, size_t strLen)
DER encode octet string.
size_t DEREncodeUnsigned(BufferedTransformation &out, T w, byte asnTag=INTEGER)
DER Encode unsigned value.
Definition: asn.h:820
std::ostream & operator<<(std::ostream &out, const OID &oid)
Print a OID value.
Definition: asn.h:939
@ OCTET_STRING
ASN.1 Octet string.
Definition: asn.h:38
@ INTEGER
ASN.1 Integer.
Definition: asn.h:34
void BERDecodeError()
Raises a BERDecodeErr.
Definition: asn.h:104
virtual const Element & Gcd(const Element &a, const Element &b) const
Calculates the greatest common denominator in the ring.
Definition: algebra.cpp:56
virtual void SimultaneousExponentiate(Element *results, const Element &base, const Integer *exponents, unsigned int exponentsCount) const
Exponentiates a base to multiple exponents in the Ring.
Definition: algebra.cpp:334
virtual Element CascadeExponentiate(const Element &x, const Integer &e1, const Element &y, const Integer &e2) const
TODO.
Definition: algebra.cpp:323
Copy input to a memory buffer.
Definition: filters.h:1200
BER General Decoder.
Definition: asn.h:380
BER Sequence Decoder.
Definition: asn.h:525
Interface for buffered transformations.
Definition: cryptlib.h:1657
virtual size_t Get(byte &outByte)
Retrieve a 8-bit byte.
size_t GetWord16(word16 &value, ByteOrder order=BIG_ENDIAN_ORDER)
Retrieve a 16-bit word.
virtual size_t Peek(byte &outByte) const
Peek a 8-bit byte.
size_t PutWord16(word16 value, ByteOrder order=BIG_ENDIAN_ORDER, bool blocking=true)
Input a 16-bit word for processing.
size_t Put(byte inByte, bool blocking=true)
Input a byte for processing.
Definition: cryptlib.h:1678
virtual lword Skip(lword skipMax=LWORD_MAX)
Discard skipMax bytes from the output buffer.
virtual lword MaxRetrievable() const
Provides the number of bytes ready for retrieval.
Data structure used to store byte strings.
Definition: queue.h:23
size_t Get(byte &outByte)
Retrieve a 8-bit byte.
lword MaxRetrievable() const
Provides the number of bytes ready for retrieval.
Definition: queue.h:40
Used to pass byte array input as part of a NameValuePairs object.
Definition: algparam.h:25
size_t size() const
Length of the memory block.
Definition: algparam.h:88
const byte * begin() const
Pointer to the first byte in the memory block.
Definition: algparam.h:84
DER General Encoder.
Definition: asn.h:491
DER Sequence Encoder.
Definition: asn.h:557
Euclidean domain.
Definition: algebra.h:316
Exception thrown when division by 0 is encountered.
Definition: integer.h:56
Exception thrown when a random number cannot be found that satisfies the condition.
Definition: integer.h:64
Multiple precision integer with arithmetic operations.
Definition: integer.h:50
static void Divide(Integer &r, Integer &q, const Integer &a, const Integer &d)
Extended Division.
void DEREncode(BufferedTransformation &bt) const
Encode in DER format.
bool GetBit(size_t i) const
Provides the i-th bit of the Integer.
void SetByte(size_t n, byte value)
Set the n-th byte to value.
static void DivideByPowerOf2(Integer &r, Integer &q, const Integer &a, unsigned int n)
Extended Division.
Integer & operator&=(const Integer &t)
Bitwise AND Assignment.
bool IsPositive() const
Determines if the Integer is positive.
Definition: integer.h:347
Integer Minus(const Integer &b) const
Subtraction.
signed long ConvertToLong() const
Convert the Integer to Long.
Integer operator-() const
Subtraction.
void SetBit(size_t n, bool value=1)
Set the n-th bit to value.
Integer And(const Integer &t) const
Bitwise AND.
bool IsSquare() const
Determine whether this integer is a perfect square.
Integer Plus(const Integer &b) const
Addition.
Integer DividedBy(const Integer &b) const
Division.
void DEREncodeAsOctetString(BufferedTransformation &bt, size_t length) const
Encode absolute value as big-endian octet string.
void OpenPGPDecode(const byte *input, size_t inputLen)
Decode from OpenPGP format.
Integer & operator^=(const Integer &t)
Bitwise XOR Assignment.
bool NotZero() const
Determines if the Integer is non-0.
Definition: integer.h:338
Integer Times(const Integer &b) const
Multiplication.
Integer & operator=(const Integer &t)
Assignment.
void BERDecodeAsOctetString(BufferedTransformation &bt, size_t length)
Decode nonnegative value from big-endian octet string.
byte GetByte(size_t i) const
Provides the i-th byte of the Integer.
void Randomize(RandomNumberGenerator &rng, size_t bitCount)
Set this Integer to random integer.
bool IsConvertableToLong() const
Determines if the Integer is convertable to Long.
Integer Or(const Integer &t) const
Bitwise OR.
lword GetBits(size_t i, size_t n) const
Provides the low order bits of the Integer.
Integer & operator+=(const Integer &t)
Addition Assignment.
static Integer Power2(size_t e)
Exponentiates to a power of 2.
Integer & operator|=(const Integer &t)
Bitwise OR Assignment.
Integer Squared() const
Multiply this integer by itself.
Definition: integer.h:633
Integer()
Creates the zero integer.
void BERDecode(const byte *input, size_t inputLen)
Decode from BER format.
size_t MinEncodedSize(Signedness sign=UNSIGNED) const
Minimum number of bytes to encode this integer.
static const Integer & One()
Integer representing 1.
unsigned int BitCount() const
Determines the number of bits required to represent the Integer.
void Negate()
Reverse the Sign of the Integer.
bool NotNegative() const
Determines if the Integer is non-negative.
Definition: integer.h:344
Integer & operator++()
Pre-increment.
unsigned int WordCount() const
Determines the number of words required to represent the Integer.
Integer & operator--()
Pre-decrement.
RandomNumberType
Properties of a random integer.
Definition: integer.h:91
@ ANY
a number with no special properties
Definition: integer.h:93
@ PRIME
a number which is probabilistically prime
Definition: integer.h:95
bool operator!() const
Negation.
static const Integer & Zero()
Integer representing 0.
Integer & operator-=(const Integer &t)
Subtraction Assignment.
Integer AbsoluteValue() const
Retrieve the absolute value of this integer.
@ SIGNED
a signed value
Definition: integer.h:87
@ UNSIGNED
an unsigned value
Definition: integer.h:85
Integer Xor(const Integer &t) const
Bitwise XOR.
int Compare(const Integer &a) const
Perform signed comparison.
Integer Modulo(const Integer &b) const
Remainder.
size_t OpenPGPEncode(byte *output, size_t bufferSize) const
Encode absolute value in OpenPGP format.
void swap(Integer &a)
Swaps this Integer with another Integer.
bool IsZero() const
Determines if the Integer is 0.
Definition: integer.h:335
Integer MultiplicativeInverse() const
Calculate multiplicative inverse.
bool GenerateRandomNoThrow(RandomNumberGenerator &rng, const NameValuePairs &params=g_nullNameValuePairs)
Generate a random number.
Integer & operator<<=(size_t n)
Left-shift Assignment.
bool IsNegative() const
Determines if the Integer is negative.
Definition: integer.h:341
void Decode(const byte *input, size_t inputLen, Signedness sign=UNSIGNED)
Decode from big-endian byte array.
Sign
Used internally to represent the integer.
Definition: integer.h:73
@ NEGATIVE
the value is negative
Definition: integer.h:77
@ POSITIVE
the value is positive or 0
Definition: integer.h:75
static const Integer & Two()
Integer representing 2.
unsigned int ByteCount() const
Determines the number of bytes required to represent the Integer.
bool IsUnit() const
Determine if 1 or -1.
bool IsOdd() const
Determines if the Integer is odd parity.
Definition: integer.h:356
Integer & operator>>=(size_t n)
Right-shift Assignment.
static Integer Gcd(const Integer &a, const Integer &n)
Calculate greatest common divisor.
void Encode(byte *output, size_t outputLen, Signedness sign=UNSIGNED) const
Encode in big-endian format.
Integer InverseMod(const Integer &n) const
Calculate multiplicative inverse.
Integer SquareRoot() const
Extract square root.
bool IsEven() const
Determines if the Integer is even parity.
Definition: integer.h:353
An invalid argument was detected.
Definition: cryptlib.h:208
Ring of congruence classes modulo n.
Definition: modarith.h:44
ModularArithmetic(const Integer &modulus=Integer::One())
Construct a ModularArithmetic.
Definition: modarith.h:54
const Integer & Add(const Integer &a, const Integer &b) const
Adds elements in the ring.
const Integer & Inverse(const Integer &a) const
Inverts the element in the ring.
Integer & Reduce(Integer &a, const Integer &b) const
TODO.
void BERDecodeElement(BufferedTransformation &in, Element &a) const
Decodes element in DER format.
unsigned int MaxElementByteLength() const
Provides the maximum byte size of an element in the ring.
Definition: modarith.h:248
Integer & Accumulate(Integer &a, const Integer &b) const
TODO.
const Integer & Half(const Integer &a) const
Divides an element by 2.
void DEREncodeElement(BufferedTransformation &out, const Element &a) const
Encodes element in DER format.
void SimultaneousExponentiate(Element *results, const Element &base, const Integer *exponents, unsigned int exponentsCount) const
Exponentiates a base to multiple exponents in the ring.
Integer CascadeExponentiate(const Integer &x, const Integer &e1, const Integer &y, const Integer &e2) const
TODO.
const Integer & Subtract(const Integer &a, const Integer &b) const
Subtracts elements in the ring.
void DEREncode(BufferedTransformation &bt) const
Encodes in DER format.
Performs modular arithmetic in Montgomery representation for increased speed.
Definition: modarith.h:296
Integer ConvertOut(const Integer &a) const
Reduces an element in the congruence class.
const Integer & MultiplicativeInverse(const Integer &a) const
Calculate the multiplicative inverse of an element in the ring.
const Integer & Multiply(const Integer &a, const Integer &b) const
Multiplies elements in the ring.
MontgomeryRepresentation(const Integer &modulus)
Construct a MontgomeryRepresentation.
Interface for retrieving values given their names.
Definition: cryptlib.h:327
T GetValueWithDefault(const char *name, T defaultValue) const
Get a named value.
Definition: cryptlib.h:397
bool GetValue(const char *name, T &value) const
Get a named value.
Definition: cryptlib.h:384
CRYPTOPP_DLL bool GetIntValue(const char *name, int &value) const
Get a named value with type int.
Definition: cryptlib.h:420
Object Identifier.
Definition: asn.h:265
static void DeriveKey(byte *output, size_t outputLength, const byte *input, size_t inputLength, const byte *derivationParams, size_t derivationParamsLength)
P1363 key derivation function.
Definition: pubkey.h:760
Application callback to signal suitability of a cabdidate prime.
Definition: nbtheory.h:114
Interface for random number generators.
Definition: cryptlib.h:1440
virtual void GenerateBlock(byte *output, size_t size)
Generate random array of bytes.
Secure memory block with allocator and cleanup.
Definition: secblock.h:731
iterator begin()
Provides an iterator pointing to the first element in the memory block.
Definition: secblock.h:836
void CleanNew(size_type newSize)
Change size without preserving contents.
Definition: secblock.h:1143
void swap(SecBlock< T, A > &b)
Swap contents with another SecBlock.
Definition: secblock.h:1209
void CleanGrow(size_type newSize)
Change size and preserve contents.
Definition: secblock.h:1179
void New(size_type newSize)
Change size without preserving contents.
Definition: secblock.h:1126
size_type size() const
Provides the count of elements in the SecBlock.
Definition: secblock.h:867
void resize(size_type newSize)
Change size and preserve contents.
Definition: secblock.h:1198
SecBlock<byte> typedef.
Definition: secblock.h:1226
Restricts the instantiation of a class to one static object without locks.
Definition: misc.h:309
Square block cipher.
Definition: square.h:25
String-based implementation of Store interface.
Definition: filters.h:1259
Pointer that overloads operator ->
Definition: smartptr.h:38
Library configuration file.
#define CRYPTOPP_TABLE
Override for internal linkage.
Definition: config_dll.h:111
unsigned char byte
8-bit unsigned datatype
Definition: config_int.h:66
word64 word
Full word used for multiprecision integer arithmetic.
Definition: config_int.h:192
const unsigned int WORD_BITS
Size of a platform word in bits.
Definition: config_int.h:260
unsigned int word32
32-bit unsigned datatype
Definition: config_int.h:72
unsigned short word16
16-bit unsigned datatype
Definition: config_int.h:69
word128 dword
Double word used for multiprecision integer arithmetic.
Definition: config_int.h:203
unsigned long long word64
64-bit unsigned datatype
Definition: config_int.h:101
word32 hword
Half word used for multiprecision integer arithmetic.
Definition: config_int.h:184
const unsigned int WORD_SIZE
Size of a platform word in bytes.
Definition: config_int.h:255
word64 lword
Large word type.
Definition: config_int.h:168
Functions for CPU features and intrinsics.
ByteOrder
Provides the byte ordering.
Definition: cryptlib.h:148
@ LITTLE_ENDIAN_ORDER
byte order is little-endian
Definition: cryptlib.h:150
@ BIG_ENDIAN_ORDER
byte order is big-endian
Definition: cryptlib.h:152
Implementation of BufferedTransformation's attachment interface.
Multiple precision integer with arithmetic operations.
inline ::Integer operator-(const ::Integer &a, const ::Integer &b)
Subtraction.
Definition: integer.h:771
Utility functions for the Crypto++ library.
CRYPTOPP_DLL std::string IntToString< Integer >(Integer value, unsigned int base)
Converts an Integer to a string.
unsigned int BitPrecision(const T &value)
Returns the number of bits required for a value.
Definition: misc.h:1044
unsigned int BytePrecision(const T &value)
Returns the number of 8-bit bytes or octets required for a value.
Definition: misc.h:1021
size_t BitsToWords(size_t bitCount)
Returns the number of words required for the specified number of bits.
Definition: misc.h:1160
unsigned int TrailingZeros(word32 v)
Determines the number of trailing 0-bits in a value.
Definition: misc.h:1069
void ConditionalSwapPointers(bool c, T &a, T &b)
Performs a branch-less swap of pointers a and b if condition c is true.
Definition: misc.h:1560
T Crop(T value, size_t bits)
Truncates the value to the specified number of bits.
Definition: misc.h:1128
bool IsPowerOf2(const T &value)
Tests whether a value is a power of 2.
Definition: misc.h:1212
size_t BitsToBytes(size_t bitCount)
Returns the number of 8-bit bytes or octets required for the specified number of bits.
Definition: misc.h:1140
#define MEMORY_BARRIER
A memory barrier.
Definition: misc.h:272
size_t BytesToWords(size_t byteCount)
Returns the number of words required for the specified number of bytes.
Definition: misc.h:1150
CRYPTOPP_DLL std::string IntToString< word64 >(word64 value, unsigned int base)
Converts an unsigned value to a string.
const T & STDMIN(const T &a, const T &b)
Replacement function for std::min.
Definition: misc.h:657
void PutWord(bool assumeAligned, ByteOrder order, byte *block, T value, const byte *xorBlock=NULL)
Access a block of memory.
Definition: misc.h:2941
const T & STDMAX(const T &a, const T &b)
Replacement function for std::max.
Definition: misc.h:668
T1 SaturatingSubtract1(const T1 &a, const T2 &b)
Performs a saturating subtract clamped at 1.
Definition: misc.h:1312
Class file for performing modular arithmetic.
Crypto++ library namespace.
const char * PointerToPrimeSelector()
const PrimeSelector *
Definition: argnames.h:42
const char * Seed()
ConstByteArrayParameter.
Definition: argnames.h:19
Classes and functions for number theoretic operations.
CRYPTOPP_DLL bool FirstPrime(Integer &p, const Integer &max, const Integer &equiv, const Integer &mod, const PrimeSelector *pSelector)
Finds a random prime of special form.
ASN.1 object identifiers for algorithms and schemes.
Precompiled header file.
This file contains helper classes/functions for implementing public key algorithms.
Classes and functions for secure memory allocations.
void swap(::SecBlock< T, A > &a, ::SecBlock< T, A > &b)
Swap two SecBlocks.
Definition: secblock.h:1289
Classes for SHA-1 and SHA-2 family of message digests.
Classes for automatic resource management.
Common C++ header files.
#define CRYPTOPP_ASSERT(exp)
Debugging and diagnostic assertion.
Definition: trap.h:68
Support functions for word operations.
void ShiftWordsRightByWords(word *r, size_t n, size_t shiftWords)
Right shift word array.
Definition: words.h:212
void XorWords(word *r, const word *a, const word *b, size_t n)
XOR word arrays.
Definition: words.h:66
void SetWords(word *r, word a, size_t n)
Set the value of words.
Definition: words.h:35
void OrWords(word *r, const word *a, const word *b, size_t n)
OR word arrays.
Definition: words.h:120
word ShiftWordsLeftByBits(word *r, size_t n, unsigned int shiftBits)
Left shift word array.
Definition: words.h:149
void ShiftWordsLeftByWords(word *r, size_t n, size_t shiftWords)
Left shift word array.
Definition: words.h:194
size_t CountWords(const word *x, size_t n)
Count the number of words.
Definition: words.h:21
word ShiftWordsRightByBits(word *r, size_t n, unsigned int shiftBits)
Right shift word array.
Definition: words.h:172
void CopyWords(word *r, const word *a, size_t n)
Copy word array.
Definition: words.h:48
void AndWords(word *r, const word *a, const word *b, size_t n)
AND word arrays.
Definition: words.h:93