RDRAND and RDSEED

From Crypto++ Wiki
Documentation
#include <cryptopp/rdrand.h>

RDRAND is a class that provides access to the rdrand instruction found on Intel and AMD processors. RDSEED is a similar class that provides access to the rdseed instruction. The classes access a hardware random number generator provided on-die with some IA-32 CPUs. Both RDRAND and RDSEED are declared in the header file rdrand.h.

AMD and Intel each provide both RDRAND and RDSEED. Intel introduced RDRAND with its Ivy Bridge processors, while RDSEED made its debut with Broadwell. AMD added RDRAND in Bulldozer v4, and RDSEED in Ryzen. Intel's RDRAND circuit provides random numbers that satisfy NIST SP800-90A, while RDSEED provides random numbers that satisfy NIST SP800-90B and SP800-90C. It's not clear if AMD processors generate values according to a particular standard.

The library provides unconditional support regardless of compiler or intrinsics availability. GCC ASM and MASM/MASM64 assembly language routines are provided to ensure the library can call the instruction if it's available. In addition, intrinsic support is available if desired (but the compiler must support it). Microsoft uses Intel's intrinsics, so a barrier was present for non-Intel CPU users. MASM/MASM64-compatible ASM was required to overcome that spurious limitation.

The project takes no position on the suitability of RDRAND or RDSEED as a generator. If you are concerned about the Bullrun program and possible tampering, then you can avoid the generators in their entirety. Or, you can use the generators as a source and extract their entropy using a class like HKDF. Or, you can combine the output of RDRAND or RDSEED with another generator using xorbuf from misc.h. Or, you can use the generators directly. It all depends on your risk aversion and comfort zone, and the project or library cannot make that choice for you.

RDRAND and RDSEED were added to the library in Crypto++ 5.6.3, and they were back-ported to the Visual Studio 2005 solution files.

Classes

rdrand.h provides classes for both RDRAND and RDSEED. The classes are nearly identical. Intel's RDRAND is designed to never underflow, but it's not clear what AMD's behavior is.

As of Crypto++ 6.0, each generator provides one constructor that takes no arguments. The constructor will throw an RDRAND_Err or RDSEED_Err if the hardware is not available. Be sure to use HasRDRAND or HasRDSEED before creating one. Also see Sample Program below for an example.

Crypto++ 5.6.5 and earlier provided constructors that accepted a number of retries. The retry parameter was removed from Crypto++ 6.0 for three reasons. First, the bookkeeping overhead made the generator run 30% to 40% slower. Second, an unconditional retry was a better strategy since the user wants the random bytes without guessing at the number of retries needed. Third, RDRAND did not underflow and it was hard to guess how many retries were needed for RDSEED.

After Crypto++ added RNG benchmarks, we also learned the latencies of RDRAND and RDSEED were more important than the number of retries. Even if the library retried a "generate" immediately, the one cycle jump to the "generate" instruction was dwarfed by, say, a 10 cycle latency for RDRAND or a 30 cycle latency for RDSEED instruction.
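
The unconditional retry strategy can be pictured as a loop that repeats the "generate" step until it reports success. The sketch below is illustrative only and is not the library's code. GenerateWord and FlakyStep are hypothetical names, and FlakyStep is a stand-in for the rdrand or rdseed instruction, which signals success through the carry flag.

```cpp
#include <cstdint>

// Hypothetical stand-in for one rdrand/rdseed attempt. The real instruction
// reports success through the carry flag; this model fails a fixed number
// of times and then yields a fixed value so its behavior is predictable.
struct FlakyStep
{
    int failuresLeft;

    constexpr bool operator()(std::uint64_t& out)
    {
        if (failuresLeft > 0) { --failuresLeft; return false; }
        out = 0x1234;
        return true;
    }
};

// Unconditional retry: keep attempting until a step succeeds. There is no
// retry counter to maintain, so the loop carries no bookkeeping.
template <class Step>
constexpr std::uint64_t GenerateWord(Step step)
{
    std::uint64_t word = 0;
    while (!step(word)) { /* try again immediately */ }
    return word;
}
```

With this shape the caller never chooses a retry count. The cost of a failed attempt is one extra pass through the loop, which is small next to the latency of the instruction itself.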

Constructor

RDRAND ()

RDSEED ()

Each generator provides one constructor that takes no arguments. The constructor will throw an RDRAND_Err or RDSEED_Err if the hardware is not available.

Seeding

The RDRAND and RDSEED generators do not accept a seed. If you call CanIncorporateEntropy, then it will return false. If you call IncorporateEntropy, then the generator will ignore the request.

Generating Bytes

Use GenerateBlock to retrieve a block of bytes. If the instruction is not available, then the call will result in a SIGILL. The generator does not throw an exception.

You can test for the presence of CPU support by calling HasRDRAND() or HasRDSEED(). See the example below at Sample Program.
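
For readers curious what a feature test of this kind involves, the sketch below checks the CPUID feature bits that the Intel and AMD manuals assign to the two instructions. It is not the library's implementation. RdrandBitSet and RdseedBitSet are hypothetical names, and it is an assumption that HasRDRAND() and HasRDSEED() perform an equivalent check. The register values are passed in as parameters so the sketch stays portable.

```cpp
#include <cstdint>

// Feature bits as documented in the Intel and AMD manuals:
//   RDRAND: CPUID leaf 1, ECX bit 30
//   RDSEED: CPUID leaf 7 (subleaf 0), EBX bit 18
// Obtaining the register values needs a compiler-specific CPUID helper,
// which is not shown here.
constexpr bool RdrandBitSet(std::uint32_t leaf1_ecx)
{
    return ((leaf1_ecx >> 30) & 1u) != 0;
}

constexpr bool RdseedBitSet(std::uint32_t leaf7_ebx)
{
    return ((leaf7_ebx >> 18) & 1u) != 0;
}
```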

Discarding Bytes

The RDRAND and RDSEED generators will discard bytes if requested. The implementation rounds up the number of bytes to machine words and then discards the equivalent number of machine words.

If you are experimenting with RDRAND and RDSEED and you want to discard actual bytes (and not machine words), then you will need to modify the Crypto++ sources.
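
The rounding described above can be sketched as a ceiling division. WordsToDiscard is a hypothetical helper written for this illustration and is not part of Crypto++.

```cpp
#include <cstddef>

// Number of machine words consumed when asked to discard `bytes` bytes,
// rounding up to a whole word. `wordSize` would be 4 on a 32-bit target
// and 8 on a 64-bit target.
constexpr std::size_t WordsToDiscard(std::size_t bytes, std::size_t wordSize)
{
    return (bytes + wordSize - 1) / wordSize;
}
```

Under this model a request to discard 5 bytes on a 64-bit target consumes one 8-byte word, so 3 more bytes are discarded than were requested.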

Exceptions

The RDRAND class will throw an RDRAND_Err exception, while the RDSEED class will throw an RDSEED_Err. They only throw an exception when a suitable implementation cannot be located at compile time.

At runtime the availability must be checked with either HasRDRAND() or HasRDSEED(). The generator will throw an exception during construction if the generators are not available.

If you call GenerateBlock and a generator is not available, then a SIGILL will result.

Vendor Support

Intel provided the RDRAND circuit in late 2012, while AMD provided equivalent support in June 2015. According to the AMD Programmers Manual, AMD provides both RDRAND and RDSEED circuits.

The compilers that support the instructions are:

  • Clang added RDRAND in July 2012, Clang 3.2
  • GCC added RDRAND in December 2010, GCC 4.6
  • Intel added RDRAND in September 2011, ICC 12.1
  • Microsoft added RDRAND in August 2012, VS2012
  • Microsoft added RDSEED in November 2013, VS2013
  • Sun added RDRAND in November 2014, Sun Studio 12.4
  • Sun added RDSEED in June 2016, Sun Studio 12.5

If you know of a compiler that supports the instructions but is missing from the list, then please discuss it on the mailing list.

ASM vs Intrinsics

The source files allow either an ASM implementation or Intrinsics. The ASM is more flexible because it does not require the compiler to support the rdrand or rdseed instructions. However, the Intrinsics are enabled by default because most toolchains from the last 5 years or so support them.

The library's assembly code is usually a little faster than the intrinsics. That's because the assembly code generates four machine-word blocks, and then reduces to single machine-words for tail bytes. The 4-word blocks save about a dozen compares and jumps, and they provide about an 8% to 10% increase in performance. For example, an RDRAND generator which nominally runs at 198 MiB/s would increase to about 215 MiB/s.
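
The blocking strategy can be pictured in C++ as follows. This is a sketch of the loop structure only, not a transcription of the library's assembly. FillBlocked is a hypothetical name, and nextWord is a hypothetical callable standing in for one successful rdrand. A 64-bit machine word is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Fill `out` with `size` bytes drawn from `nextWord`, a callable that
// returns one 64-bit word per call. Returns the number of words drawn.
template <class WordSource>
std::size_t FillBlocked(std::uint8_t* out, std::size_t size, WordSource nextWord)
{
    assert(out != nullptr || size == 0);

    const std::size_t W = sizeof(std::uint64_t);
    std::size_t drawn = 0;

    // Main loop: four words per iteration, so one size check per 32 bytes.
    while (size >= 4 * W)
    {
        const std::uint64_t block[4] = { nextWord(), nextWord(), nextWord(), nextWord() };
        std::memcpy(out, block, sizeof(block));
        out += sizeof(block); size -= sizeof(block); drawn += 4;
    }

    // Remaining whole words, one at a time.
    while (size >= W)
    {
        const std::uint64_t word = nextWord();
        std::memcpy(out, &word, W);
        out += W; size -= W; drawn += 1;
    }

    // Tail bytes: draw one more word and copy only the bytes needed.
    if (size > 0)
    {
        const std::uint64_t word = nextWord();
        std::memcpy(out, &word, size);
        drawn += 1;
    }
    return drawn;
}
```

A 37-byte request under this sketch takes one 4-word block for the first 32 bytes and one further word for the 5 tail bytes.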

You can enable ASM on Unix and Linux compatibles, like OS X and Solaris, with the following:

rm -f rdrand*.o
./rdrand-nasm.sh
HAS_NASM=1 make -j 4

rdrand-nasm.sh builds the object files with NASM. The three object file artifacts are rdrand-x86.o, rdrand-x32.o and rdrand-x64.o. HAS_NASM=1 ensures rdrand.o is built against NASM's object files.

CPU Opcodes

Earlier it was stated ... the ASM source files have the opcodes hard coded into the .CODE section. Here are the relevant opcodes from the MASM/MASM64 sources:

Call_RDRAND_EAX:
    DB 0Fh, 0C7h, 0F0h

Call_RDRAND_RAX:
    DB 048h, 0Fh, 0C7h, 0F0h

Call_RDSEED_EAX:
    DB 0Fh, 0C7h, 0F8h

Call_RDSEED_RAX:
    DB 048h, 0Fh, 0C7h, 0F8h
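
The byte sequences above differ only in an optional leading 48h and in the final byte. The sketch below splits that final (ModRM) byte into its fields to show where the difference lies. The helper names are hypothetical and are not part of Crypto++.

```cpp
#include <cstdint>

// Both instructions share the two-byte opcode 0F C7. The ModRM byte that
// follows selects the operation through its reg field (bits 5..3):
// /6 is rdrand and /7 is rdseed. The mod field (bits 7..6) is 3 for a
// register operand, and the rm field (bits 2..0) names the register,
// with 0 meaning EAX/RAX. An optional 48h prefix (REX.W) selects the
// 64-bit register form.
constexpr unsigned ModRM_Mod(std::uint8_t modrm) { return (modrm >> 6) & 3u; }
constexpr unsigned ModRM_Reg(std::uint8_t modrm) { return (modrm >> 3) & 7u; }
constexpr unsigned ModRM_Rm (std::uint8_t modrm) { return modrm & 7u; }
```

Decoding 0F0h gives mod 3, reg 6, rm 0, which is rdrand on EAX (or RAX with the 48h prefix). Decoding 0F8h gives mod 3, reg 7, rm 0, which is rdseed on the same register.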

You can cross check the opcodes using an assembler like YASM. Simply view the listing file created with the following program:

; rdseed.asm:   a RDSEED program for NASM and YASM
;
; assemble:	nasm -f {win32|win64|elf} -l rdseed.lst rdseed.asm -o rdseed.o

        SECTION .text        ; code section
        global main          ; make label available to linker 
main:                        ; standard entry point
	
        rdseed  eax             ; rdseed rax 

        mov	ebx,0		; exit code, 0=normal
        mov	eax,1		; exit command to kernel
        int	0x80		; interrupt 80 hex, call kernel

Performance

The throughput of the RDRAND and RDSEED generators varies wildly depending on processor family, CPU sub-architecture and processor manufacturer. Additionally, RDSEED appears to run from 1/2 to 1/5 the rate of RDRAND on Intel hardware. Below is a comparison of data gathered using the Crypto++ benchmark program. Results were cross-validated with Jack Lloyd's Botan.

The benchmark program is basic, and it only uses a single thread running on a single core. Performance can be easily improved by spinning up additional pthreads to perform work on available cores.

Processor       RDRAND   RDRAND        RDSEED   RDSEED        Comment
                MiB/s    Cycles/Byte   MiB/s    Cycles/Byte
Athlon 845 X4   1        4119          -        -             AMD Bulldozer v4 @ 3.5 GHz
Ryzen 7 1700X   11       282           11       283           AMD Ryzen 7 @ 3.4 GHz
Celeron J3455   6        251           3        419           Low end Celeron @ 1.5 GHz
Atom Z3735      9        145           -        -             Low end Atom @ 1.3 GHz
Core i5-3200    212      11            -        -             Ivy Bridge (3rd gen) @ 2.6 GHz
Xeon E5-2666    87       32            -        -             Haswell (4th gen) @ 2.9 GHz
Core i7-4980    78       34            -        -             Haswell (4th gen) @ 2.8 GHz
Core i5-5300    67       39            15       150           Broadwell (5th gen) @ 2.3 GHz
Core i5-6400    66       48            25       121           Skylake (6th gen) @ 2.7 GHz
Core XX-7xxx    ?        ?             ?        ?             Kabylake (7th gen) @ x.x GHz
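
The MiB/s and Cycles/Byte figures are two views of the same measurement, related through the clock rate. The sketch below shows the conversion. MiBPerSecond is a hypothetical helper, and using the nominal clock rate from the Comment column is an assumption, since the clock actually in effect during a benchmark run may differ.

```cpp
// Throughput in MiB/s implied by a clock rate (Hz) and a cost in
// cycles per byte. One MiB is 1024 * 1024 bytes.
constexpr double MiBPerSecond(double clockHz, double cyclesPerByte)
{
    return clockHz / cyclesPerByte / (1024.0 * 1024.0);
}
```

For the Ryzen 7 1700X row, 3.4 GHz at 282 cycles per byte works out to roughly 11.5 MiB/s, in line with the 11 MiB/s listed. The table's figures are rounded, so other rows agree only approximately.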

Sample Program

The first example program guards the use of an RDRAND generator. It also uses member_ptr from smartptr.h to avoid warnings (auto_ptr) and missing classes (unique_ptr) among C++03 and C++11.

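// Cast one arm to the base class so both arms of ?: share a pointer type
member_ptr<RandomNumberGenerator> prng(HasRDRAND() ?
    static_cast<RandomNumberGenerator*>(new RDRAND) : new AutoSeededRandomPool);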
SecByteBlock key(AES::DEFAULT_KEYLENGTH), iv(AES::BLOCKSIZE);

prng->GenerateBlock(key, key.size());
prng->GenerateBlock(iv, iv.size());

The second example shows how you could XOR an RDRAND generator with another generator.

class CombinedRNG : public RandomNumberGenerator
{
public:
    CombinedRNG(RandomNumberGenerator& rng1, RandomNumberGenerator& rng2)
        : m_rng1(rng1), m_rng2(rng2) {}

    bool CanIncorporateEntropy () const
    {
        return m_rng1.CanIncorporateEntropy() ||
            m_rng2.CanIncorporateEntropy();
    }

    void IncorporateEntropy (const byte *input, size_t length)
    {
        if (m_rng1.CanIncorporateEntropy())
            m_rng1.IncorporateEntropy(input, length);
        if (m_rng2.CanIncorporateEntropy())
            m_rng2.IncorporateEntropy(input, length);
    }

    void GenerateBlock (byte *output, size_t size)
    {
        RandomNumberSource(m_rng1, size, true, new ArraySink(output, size));
        RandomNumberSource(m_rng2, size, true, new ArrayXorSink(output, size));
    }

private:
    RandomNumberGenerator &m_rng1, &m_rng2;
};

int main (int argc, char* argv[])
{
    RDRAND prng1;
    AutoSeededRandomPool prng2;
    CombinedRNG prng3(prng1, prng2);

    RandomNumberSource src(prng3, 32, true, new HexEncoder(new FileSink(std::cout)));
    std::cout << std::endl;

    return 0;
}

The final example shows how you could extract entropy from an RDRAND generator, and use it with a key derivation function.

int main (int argc, char **argv)
{
    SecByteBlock key(AES::DEFAULT_KEYLENGTH);

    RDRAND rdrand;
    rdrand.GenerateBlock(key, key.size());

    std::cout << "Pre-extraction:" << std::endl;
    StringSource(key, key.size(), true, new HexEncoder(new FileSink(std::cout)));
    std::cout << std::endl;

    HKDF<SHA256> kdf;
    kdf.DeriveKey(key, key.size(), key, key.size());

    std::cout << "Post-extraction:" << std::endl;
    StringSource(key, key.size(), true, new HexEncoder(new FileSink(std::cout)));
    std::cout << std::endl;

    return 0;
}

The final example produces output similar to below.

$ ./test.exe
Pre-extraction:
651CEA46CE5E469AFCF79BE2F67DEB0C
Post-extraction:
AAAEA9FF9D0A83A1E7573391474B98AB

Downloads

RDRAND.zip - Class files and ASM files for RDRAND and RDSEED. The files can be used with earlier versions of Crypto++, like 5.6.2.