Adding a Hash

Adding new algorithms to Crypto++ can be a challenge, especially for new users. This article discusses how to add a new hash algorithm to the library. There's nothing revolutionary here. Rather, the article walks you through the steps you would likely take on your own while explaining why the library does some things. It may also help with understanding the use of the Curiously Recurring Template Pattern (CRTP) in the library.

The example below adds IdentityHash, which is a hash that copies the first N bytes of input to an internal buffer and then provides it as the hashed data. N is a template parameter, and it represents the digest size of the hash. The new hash derives from HashTransformation as a base class.

While not readily apparent, the IdentityHash may be useful for "raw" signing a hash under a private key. For example, the technique was used at Sign precomputed hash with ECDSA or DSA on Stack Overflow to sign an existing hash. From the security engineering perspective, you should avoid doing this because it separates the message to be signed from the digest. Put another way, you may not know what you are signing.

Modern hashes, like BLAKE2, are designed to operate both with and without a key. If you have a hash that can operate both ways, then use a MessageAuthenticationCode instead of a HashTransformation as a base class.

HashTransformation

HashTransformation is the base class to use for hash classes. The interface is defined in cryptlib.h and most methods have a default implementation. There are three or four items that you need to add to get a working hash.

The easiest way to determine what you need is to try to compile a class which derives from HashTransformation. The methods you need to provide are pure virtuals without a body, and they will cause a compile error when the class is instantiated. In the case of IdentityHash:

$ cat test.cxx
#include "cryptlib.h"
using namespace CryptoPP;

class IdentityHash : public HashTransformation
{
};

int main(int argc, char* argv[])
{
    IdentityHash hash;
    return 0;
}

The compile results in:

test.cxx:4:7: note:   because the following virtual functions are pure within ‘IdentityHash’:
 class IdentityHash : public HashTransformation
       ^~~~~~~~~~~~
In file included from test.cxx:1:0:
cryptlib.h:949:15: note:        virtual void CryptoPP::HashTransformation::Update(const byte*, size_t)
  virtual void Update(const byte *input, size_t length) =0;
               ^~~~~~
cryptlib.h:976:23: note:        virtual unsigned int CryptoPP::HashTransformation::DigestSize() const
  virtual unsigned int DigestSize() const =0;
                       ^~~~~~~~~~
cryptlib.h:1045:15: note:       virtual void CryptoPP::HashTransformation::TruncatedFinal(CryptoPP::byte*, size_t)
  virtual void TruncatedFinal(byte *digest, size_t digestSize) =0;
               ^~~~~~~~~~~~~~
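
The same check can be made without reading compiler diagnostics. The sketch below uses only the standard library; MiniHashBase, EmptyHash and CompleteHash are hypothetical stand-ins written for this illustration and are not Crypto++ classes. It assumes a C++11 compiler for std::is_abstract and static_assert.

```cpp
#include <cstddef>
#include <type_traits>

// MiniHashBase is a hypothetical stand-in for HashTransformation. It only
// declares the three pure virtual functions reported in the compile output.
struct MiniHashBase
{
    virtual ~MiniHashBase() {}
    virtual void Update(const unsigned char *input, std::size_t length) = 0;
    virtual unsigned int DigestSize() const = 0;
    virtual void TruncatedFinal(unsigned char *digest, std::size_t digestSize) = 0;
};

// Overrides nothing, so it stays abstract and cannot be instantiated.
struct EmptyHash : public MiniHashBase
{
};

// Overrides all three pure virtuals, so it becomes a concrete class.
struct CompleteHash : public MiniHashBase
{
    virtual void Update(const unsigned char *, std::size_t) {}
    virtual unsigned int DigestSize() const { return 0; }
    virtual void TruncatedFinal(unsigned char *, std::size_t) {}
};

static_assert(std::is_abstract<EmptyHash>::value, "EmptyHash still has pure virtuals");
static_assert(!std::is_abstract<CompleteHash>::value, "CompleteHash can be instantiated");
```

Once every pure virtual has a body, the class stops being abstract and an object can be declared.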

Required Functions

From the earlier compile results there are three functions you must implement: DigestSize, Update, and TruncatedFinal. DigestSize is a runtime function that returns the size of the digest. Update is the business logic of a hash and it usually implements the algorithm. The function buffers the input if the input is too small, and it can be called multiple times.

TruncatedFinal is the method that finalizes the hash. It may pad the last block of buffered input and then it completes processing the input. It also resets the hash for the next input. TruncatedFinal can output 0 to DigestSize() bytes. That means you can get the full digest, or you can ask for a partial digest. Other functions that finalize the hash, like Final, are routed into TruncatedFinal.
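
The routing of Final into TruncatedFinal can be modeled in a few lines. The sketch below is library-free and only illustrates the call flow; MiniHash and FixedHash are hypothetical classes, and std::invalid_argument stands in for the exception type the library would throw.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>

// MiniHash is a hypothetical stand-in for HashTransformation, written only
// to show the call routing. It is not the library's code.
class MiniHash
{
public:
    virtual ~MiniHash() {}
    virtual unsigned int DigestSize() const = 0;
    virtual void TruncatedFinal(unsigned char *digest, std::size_t digestSize) = 0;

    // Full-size finalization simply forwards to TruncatedFinal.
    void Final(unsigned char *digest)
    {
        TruncatedFinal(digest, DigestSize());
    }

protected:
    // Rejects requests larger than the full digest; 0..DigestSize() is valid.
    void ThrowIfInvalidTruncatedSize(std::size_t size) const
    {
        if (size > DigestSize())
            throw std::invalid_argument("truncated size exceeds digest size");
    }
};

// FixedHash is a toy hash with a constant 4-byte digest.
class FixedHash : public MiniHash
{
public:
    virtual unsigned int DigestSize() const { return 4; }

    virtual void TruncatedFinal(unsigned char *digest, std::size_t digestSize)
    {
        static const unsigned char value[4] = {0xDE, 0xAD, 0xBE, 0xEF};
        ThrowIfInvalidTruncatedSize(digestSize);
        if (digest && digestSize)
            std::memcpy(digest, value, digestSize);
    }
};
```

A derived class therefore only has to get TruncatedFinal right; the full-digest path comes along for free.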

DigestSize

Because we need to provide a value for DigestSize, IdentityHash needs a template parameter that gets returned when DigestSize is called. So the first change to make is:

template <unsigned int HASH_SIZE = 32>
class IdentityHash : public HashTransformation
{
public:
    virtual unsigned int DigestSize() const
    {
        return HASH_SIZE;
    }
};

In real life, you will probably use a CRYPTOPP_CONSTANT because the digest size is fixed, and you don't need the template parameter. We revisit CRYPTOPP_CONSTANT below.

Update

The next item which needs tending is Update. Update is usually where your algorithm is implemented. It's also where buffering usually occurs. IdentityHash buffers the first N bytes of input to use as the digest when TruncatedFinal is called. Our hash just copies bytes into an accumulator.

The change would look similar to the code below. m_digest is the accumulator, and m_idx tracks where to write and what's been written. The extra gyrations try to ensure unexpected parameters and wrap are handled gracefully.

template <unsigned int HASH_SIZE = 32>
class IdentityHash : public HashTransformation
{
public:
    virtual unsigned int DigestSize() const
    {
        return HASH_SIZE;
    }

    virtual void Update(const byte *input, size_t length)
    {
        // DIGESTSIZE is not defined yet at this stage; use the template parameter
        size_t sz = STDMIN(STDMIN<size_t>(HASH_SIZE, length),
                                          HASH_SIZE - m_idx);
        if (sz)
            ::memcpy(&m_digest[m_idx], input, sz);
        m_idx += sz;
    }

private:
    SecByteBlock m_digest;
    size_t m_idx;
};
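
The length calculation in Update is easier to reason about in isolation. The sketch below is a hypothetical helper, not a Crypto++ function; it folds the two STDMIN calls into a single std::min and adds an explicit guard against the subtraction wrapping around.

```cpp
#include <algorithm>
#include <cstddef>

// BytesToCopy is a hypothetical helper for this illustration. It returns how
// many bytes an Update call may append to a buffer of `capacity` bytes that
// already holds `used` bytes, when the caller offers `length` bytes.
inline std::size_t BytesToCopy(std::size_t capacity, std::size_t used, std::size_t length)
{
    // Remaining room; the guard keeps the unsigned subtraction from wrapping
    // if the index were ever larger than the capacity.
    const std::size_t room = (used < capacity) ? capacity - used : 0;
    return std::min(length, room);
}
```

With a 32-byte buffer, a first call offering 10 bytes copies 10, a second call offering 40 bytes copies the remaining 22, and every later call copies nothing.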

TruncatedFinal

TruncatedFinal finalizes the hash and copies the result to the caller. It may pad the last block of buffered input and then completes processing the input. It also validates the requested digest size. In the case of IdentityHash it also adds some business logic to ensure HASH_SIZE bytes are input.

The change would look similar to the code below. ThrowIfInvalidTruncatedSize is built into the library. It uses DigestSize to validate the requested size and throws an exception if the size is invalid.

Copying the hash to the buffer supplied by the user is guarded for a NULL pointer. Using a NULL pointer with memcpy is undefined behavior in C and C++. Some users may call TruncatedFinal with a NULL pointer to reset the hash. In fact, the default implementation of Restart in cryptlib.h does so.

template <unsigned int HASH_SIZE = 32>
class IdentityHash : public HashTransformation
{
public:
    virtual unsigned int DigestSize() const
    {
        return HASH_SIZE;
    }

    virtual void Update(const byte *input, size_t length)
    {
        size_t sz = STDMIN(STDMIN<size_t>(HASH_SIZE, length),
                                          HASH_SIZE - m_idx);
        if (sz)
            ::memcpy(&m_digest[m_idx], input, sz);
        m_idx += sz;
    }

    virtual void TruncatedFinal(byte *digest, size_t digestSize)
    {
        // Validate input
        if (m_idx != HASH_SIZE)
            throw Exception(Exception::OTHER_ERROR, "Input size must be " + IntToString(HASH_SIZE));

        // Validate output
        ThrowIfInvalidTruncatedSize(digestSize);

        // Copy the input to output
        if (digest)
            ::memcpy(digest, m_digest, digestSize);

        // Reset for next hash
        m_idx = 0;
    }

private:
    SecByteBlock m_digest;
    size_t m_idx;
};
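
To experiment with the buffering, validation and reset behavior without building the library, a standalone model can help. The sketch below is an assumption-laden toy: ToyIdentityHash is a hypothetical class, std::array replaces SecByteBlock, and standard exceptions replace the library's Exception type.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstring>
#include <stdexcept>

// ToyIdentityHash is a hypothetical, library-free model of the class above.
template <unsigned int HASH_SIZE = 32>
class ToyIdentityHash
{
public:
    ToyIdentityHash() : m_digest(), m_idx(0) {}

    void Update(const unsigned char *input, std::size_t length)
    {
        // Copy no more than what is offered and no more than what still fits.
        const std::size_t sz = std::min<std::size_t>(length, HASH_SIZE - m_idx);
        if (sz)
            std::memcpy(&m_digest[m_idx], input, sz);
        m_idx += sz;
    }

    void TruncatedFinal(unsigned char *digest, std::size_t digestSize)
    {
        if (m_idx != HASH_SIZE)
            throw std::runtime_error("input too short");
        if (digestSize > HASH_SIZE)
            throw std::invalid_argument("digest size too large");
        if (digest && digestSize)       // never hand memcpy a null pointer
            std::memcpy(digest, m_digest.data(), digestSize);
        m_idx = 0;                      // ready for the next message
    }

private:
    std::array<unsigned char, HASH_SIZE> m_digest;
    std::size_t m_idx;
};
```

Note that a failed TruncatedFinal leaves the buffered bytes in place, so the caller can supply the missing input and finalize again.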

Additional Members

DigestSize, Update, and TruncatedFinal provide the meat and potatoes of a Crypto++ hash. We call them the "meat and potatoes" because they are the business logic and implementation of the hash algorithm.

There are a few loose ends to tie up before the hash can be used in the library. They would be discovered when you use IdentityHash in a real program.

Construction

As you study the code you probably noticed we don't depend on constructors very much. That's by design and the reasons are not discussed here. However, the object still needs some initialization since the constructor is where Init occurs in the Crypto++ implementation of the Init/Update/Final model.

For initialization IdentityHash needs the accumulator properly sized and m_idx set to an initial value. The initialization would look as shown below.

One thing to keep in mind when designing your class is that the testing framework does not know how to select an overloaded constructor to get your object into a certain state. The constructor should be simple and leave the object in a state where it is ready to start processing data.

More complex algorithms, like block ciphers operated in a mode of operation, have particular methods the testing framework calls for tasks like setting keys and initialization vectors. If more information needs to be passed to an object, then NameValuePairs are used to pass the additional information. However, HashTransformation does not use them.

template <unsigned int HASH_SIZE = 32>
class IdentityHash : public HashTransformation
{
public:
    IdentityHash() : m_digest(HASH_SIZE), m_idx(0) {}

    virtual unsigned int DigestSize() const
    {
        return HASH_SIZE;
    }

    virtual void Update(const byte *input, size_t length)
    {
        size_t sz = STDMIN(STDMIN<size_t>(HASH_SIZE, length),
                                          HASH_SIZE - m_idx);
        if (sz)
            ::memcpy(&m_digest[m_idx], input, sz);
        m_idx += sz;
    }

    virtual void TruncatedFinal(byte *digest, size_t digestSize)
    {     
        if (m_idx != HASH_SIZE)
            throw Exception(Exception::OTHER_ERROR, "Input size must be " + IntToString(HASH_SIZE));

        ThrowIfInvalidTruncatedSize(digestSize);

        if (digest)
            ::memcpy(digest, m_digest, digestSize);

        m_idx = 0;
    }

private:
    SecByteBlock m_digest;
    size_t m_idx;
};

Restart

Initializing and restarting some hash functions is non-trivial. Some hashes provide a Restart function that's called in the constructor and TruncatedFinal. It's up to you whether to add a Restart function.

Restart is declared in cryptlib.h, and the default implementation performs:

virtual void Restart()
    {TruncatedFinal(NULLPTR, 0);}

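For IdentityHash the default is a poor fit: TruncatedFinal throws unless exactly HASH_SIZE bytes have been buffered, so restarting a partially filled hash would raise an exception. Our implementation of IdentityHash could provide an override which performs: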

virtual void Restart()
    {m_idx = 0;}
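
The difference between the two Restart variants shows up when a hash is abandoned midway through a message. The sketch below keeps only the index bookkeeping; IndexOnlyHash and CheapRestartHash are hypothetical classes for illustration, and std::runtime_error stands in for the library's exception.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical model that keeps only the index bookkeeping of IdentityHash.
class IndexOnlyHash
{
public:
    explicit IndexOnlyHash(std::size_t size) : m_size(size), m_idx(0) {}
    virtual ~IndexOnlyHash() {}

    // Pretend to buffer `length` bytes, stopping when the buffer is full.
    void Absorb(std::size_t length)
    {
        const std::size_t room = m_size - m_idx;
        m_idx += (length < room) ? length : room;
    }

    void TruncatedFinal(unsigned char *, std::size_t)
    {
        if (m_idx != m_size)
            throw std::runtime_error("input too short");
        m_idx = 0;
    }

    // Mirrors the default shown above: restart by finalizing into a null buffer.
    virtual void Restart() { TruncatedFinal(nullptr, 0); }

    std::size_t Buffered() const { return m_idx; }

protected:
    std::size_t m_size;
    std::size_t m_idx;
};

// Overrides Restart so that a partially filled buffer can be discarded.
class CheapRestartHash : public IndexOnlyHash
{
public:
    explicit CheapRestartHash(std::size_t size) : IndexOnlyHash(size) {}
    virtual void Restart() { m_idx = 0; }
};
```

With two of four bytes buffered, the inherited Restart throws and leaves the state untouched, while the override simply clears the index.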

DIGESTSIZE

All Crypto++ hashes provide a constant called DIGESTSIZE. It's a compile-time constant, and it's often returned by DigestSize at runtime. With the standard constant DIGESTSIZE in place, we can switch to it instead of the non-standard HASH_SIZE.

template <unsigned int HASH_SIZE = 32>
class IdentityHash : public HashTransformation
{
public:
    CRYPTOPP_CONSTANT(DIGESTSIZE = HASH_SIZE)

    IdentityHash() : m_digest(DIGESTSIZE), m_idx(0) {}

    virtual unsigned int DigestSize() const
    {
        return DIGESTSIZE;
    }
    
    ...
};
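
A compile-time constant is useful in places where a runtime call is not allowed. The sketch below is library-free; SizedHash and DigestBuffer are hypothetical names, and a plain enum plays the role that CRYPTOPP_CONSTANT plays in the listing above.

```cpp
#include <cstddef>

// SizedHash is a hypothetical class for this illustration.
template <unsigned int HASH_SIZE = 32>
struct SizedHash
{
    enum { DIGESTSIZE = HASH_SIZE };
    unsigned int DigestSize() const { return DIGESTSIZE; }
};

// A compile-time constant can size a fixed array and feed static_assert.
// Neither works with the runtime DigestSize() call.
template <class HASH>
struct DigestBuffer
{
    unsigned char bytes[HASH::DIGESTSIZE];
};

static_assert(SizedHash<32>::DIGESTSIZE == 32, "default digest is 32 bytes");
static_assert(sizeof(DigestBuffer<SizedHash<64> >) == 64, "buffer matches digest size");
```

Generic code can then allocate a digest buffer for any hash type without creating a hash object first.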

StaticAlgorithmName

StaticAlgorithmName returns the name of the hash. It's used extensively throughout the library, and it mostly surfaces in the benchmark programs. The Curiously Recurring Template Pattern provides polymorphic behavior for the static function.

template <unsigned int HASH_SIZE = 32>
class IdentityHash : public HashTransformation
{
public:
    CRYPTOPP_CONSTANT(DIGESTSIZE = HASH_SIZE)

    static const char * StaticAlgorithmName()
    {
        return "IdentityHash";
    }
 
    ...
};
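
The way CRTP turns a static function into polymorphic behavior can be shown with a small, library-free sketch. AlgorithmBase, NamedAlgorithm, ToyHashA and ToyHashB are hypothetical names; the pattern is only loosely modeled on what the library does and is not its actual code.

```cpp
#include <string>

// Hypothetical base interface with a virtual name accessor.
struct AlgorithmBase
{
    virtual ~AlgorithmBase() {}
    virtual std::string AlgorithmName() const = 0;
};

// Hypothetical CRTP helper: the derived class passes itself as DERIVED.
template <class DERIVED>
struct NamedAlgorithm : public AlgorithmBase
{
    // The base template calls a static member of the derived class, so the
    // virtual override is written once instead of once per algorithm.
    virtual std::string AlgorithmName() const
    {
        return DERIVED::StaticAlgorithmName();
    }
};

struct ToyHashA : public NamedAlgorithm<ToyHashA>
{
    static const char *StaticAlgorithmName() { return "ToyHashA"; }
};

struct ToyHashB : public NamedAlgorithm<ToyHashB>
{
    static const char *StaticAlgorithmName() { return "ToyHashB"; }
};
```

The name is available both without an object, through the static function, and through a base-class reference, through the virtual function.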

AlgorithmName

AlgorithmName can be used to fine tune the algorithm name. In the case of IdentityHash the digest size can be added. If StaticAlgorithmName provides the complete name, then AlgorithmName is not needed.

template <unsigned int HASH_SIZE = 32>
class IdentityHash : public HashTransformation
{
public:
    CRYPTOPP_CONSTANT(DIGESTSIZE = HASH_SIZE)

    static const char * StaticAlgorithmName()
    {
        return "IdentityHash";
    }

    std::string AlgorithmName() const
    {
        return std::string(StaticAlgorithmName()) + "-" + IntToString(DIGESTSIZE*8);
    }
 
    ...
};
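
The name composition can be tried out without the library. In the sketch below NamedToyHash is a hypothetical class, and std::to_string (C++11) stands in for the library's IntToString.

```cpp
#include <string>

// NamedToyHash is a hypothetical, library-free model of the naming logic.
template <unsigned int HASH_SIZE = 32>
struct NamedToyHash
{
    enum { DIGESTSIZE = HASH_SIZE };

    // Same for every instantiation of the template.
    static const char *StaticAlgorithmName() { return "IdentityHash"; }

    // Refines the static name with the digest size in bits.
    std::string AlgorithmName() const
    {
        return std::string(StaticAlgorithmName()) + "-" +
               std::to_string(static_cast<unsigned int>(DIGESTSIZE) * 8);
    }
};
```

A 32-byte instantiation reports IdentityHash-256 and a 64-byte instantiation reports IdentityHash-512, while the static name stays IdentityHash for both.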

More Members

IdentityHash was fairly simple and it only needed a few member functions to become operational. Real algorithms often need to use more facilities from the library. Some of them are listed below (and some of them are not even class members).

CRYPTOPP_NO_VTABLE

You will often see CRYPTOPP_NO_VTABLE used in class declarations. It is a preprocessor macro used to help flatten objects by removing intermediate object vtables.

CRYPTOPP_NO_VTABLE is used with Microsoft compilers on Windows. On Windows the macro expands to __declspec(novtable); while on other platforms it is empty. CRYPTOPP_NO_VTABLE should only be applied to pure interface classes, meaning classes that will never be instantiated on their own.
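
A macro with this behavior can be sketched as follows. TOY_NO_VTABLE, ToyInterface and ToyConcrete are hypothetical names used only for illustration; the sketch follows the description above and is not copied from the library's headers.

```cpp
// TOY_NO_VTABLE is a hypothetical macro. It expands to __declspec(novtable)
// under Microsoft compilers and to nothing elsewhere.
#if defined(_MSC_VER)
# define TOY_NO_VTABLE __declspec(novtable)
#else
# define TOY_NO_VTABLE
#endif

// A pure interface class: never instantiated on its own, so it is a
// candidate for the attribute.
struct TOY_NO_VTABLE ToyInterface
{
    virtual ~ToyInterface() {}
    virtual unsigned int DigestSize() const = 0;
};

// The concrete class does not use the macro because it needs its vtable.
struct ToyConcrete : public ToyInterface
{
    virtual unsigned int DigestSize() const { return 20; }
};
```

Virtual dispatch through the interface still works because the most-derived class installs the vtable pointer during construction.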

OptimalDataAlignment

OptimalDataAlignment allows you to specify how you would like data aligned. Some algorithms can operate efficiently on bytes, while others need data aligned for 32-bit or 64-bit words, and still others need 16-byte alignment for SSE2.

cryptlib.h declares OptimalDataAlignment for BlockTransformation, StreamTransformation and HashTransformation, and cryptlib.cpp supplies the default implementation. From cryptlib.cpp:

unsigned int HashTransformation::OptimalDataAlignment() const
{
    return GetAlignmentOf<word32>();
}
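
What an alignment requirement means for a caller can be illustrated with the standard library alone. In the sketch below IsAlignedTo and WordAlignment are hypothetical helpers; alignof (C++11) plays a role similar in spirit to the default shown above.

```cpp
#include <cstddef>
#include <cstdint>

// IsAlignedTo is a hypothetical helper. It reports whether the address is a
// multiple of `alignment`, which must be non-zero.
inline bool IsAlignedTo(const void *ptr, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Reports the alignment a 32-bit word wants on this platform.
inline unsigned int WordAlignment()
{
    return static_cast<unsigned int>(alignof(std::uint32_t));
}
```

A caller that wants the fast path can check its buffer against the reported alignment before handing data to the algorithm.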

Using IdentityHash

With all the pieces in place the sample program would look as follows.

$ cat test.cxx
#include "cryptlib.h"
#include "secblock.h"
#include "misc.h"      // STDMIN, IntToString

#include <cstring>     // memcpy
#include <iostream>
#include <string>

using namespace CryptoPP;

template <unsigned int HASH_SIZE = 32>
class IdentityHash : public HashTransformation
{
public:
    CRYPTOPP_CONSTANT(DIGESTSIZE = HASH_SIZE)

    static const char * StaticAlgorithmName()
    {
        return "IdentityHash";
    }

    IdentityHash() : m_digest(DIGESTSIZE), m_idx(0) {}

    virtual unsigned int DigestSize() const
    {
        return DIGESTSIZE;
    }

    virtual void Update(const byte *input, size_t length)
    {
        size_t sz = STDMIN(STDMIN<size_t>(DIGESTSIZE, length),
                                          DIGESTSIZE - m_idx);
        if (sz)
            ::memcpy(&m_digest[m_idx], input, sz);
        m_idx += sz;
    }

    virtual void TruncatedFinal(byte *digest, size_t digestSize)
    {
        if (m_idx != DIGESTSIZE)
            throw Exception(Exception::OTHER_ERROR, "Input size must be " + IntToString(DIGESTSIZE));

        ThrowIfInvalidTruncatedSize(digestSize);

        if (digest)
            ::memcpy(digest, m_digest, digestSize);

        m_idx = 0;
    }

    std::string AlgorithmName() const
    {
        return std::string(StaticAlgorithmName()) + "-" + IntToString(DIGESTSIZE*8);
    }

private:
    SecByteBlock m_digest;
    size_t m_idx;
};

int main(int argc, char* argv[])
{
    std::string message(32, 'A');

    IdentityHash<32> hash;
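    hash.Update(reinterpret_cast<const byte*>(message.data()), message.size());

    // Write through &digest[0]; data() returns a pointer to const before C++17
    std::string digest(32, '\0');
    hash.TruncatedFinal(reinterpret_cast<byte*>(&digest[0]), digest.size());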

    std::cout << "Message: " << message << std::endl;
    std::cout << " Digest: " << digest << std::endl;

    return 0;
}

Running the program produces the expected output:

$ ./test.exe
Message: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
 Digest: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

A more complete example using a random private key to sign the precomputed hash is shown below.

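// Assumes the IdentityHash class and the using directive shown above.
#include "eccrypto.h"   // ECDSA, ECP
#include "osrng.h"      // AutoSeededRandomPool
#include "oids.h"       // ASN1::secp256r1
#include "filters.h"    // StringSource, SignerFilter, StringSink
#include "hex.h"        // HexEncoder

int main(int argc, char* argv[])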
{
    AutoSeededRandomPool prng;

    ECDSA<ECP, IdentityHash<32> >::PrivateKey privateKey;
    privateKey.Initialize(prng, ASN1::secp256r1());

    std::string message(32, 'A'), signature;
    ECDSA<ECP, IdentityHash<32> >::Signer signer(privateKey);

    StringSource ss(message, true,
                        new SignerFilter(prng, signer,
                            new HexEncoder(new StringSink(signature))
                        ) // SignerFilter
                    ); // StringSource

    std::cout << "Signature: " << signature << std::endl;

    return 0;
}

Algorithm Testing

Once you have an algorithm cut in, you usually want to test it. The following three sections detail how to perform testing and evaluation within the Crypto++ testing framework.

Algorithm Registration

The Crypto++ test framework includes an object registry for testing and benchmarks. IdentityHash is a good example of how to register algorithms because StaticAlgorithmName always returns IdentityHash, and not IdentityHash-256, IdentityHash-512, etc.

To register algorithm variations by name, open regtest2.cpp and add the following to register 32-byte and 64-byte variants of IdentityHash. They are used below in Algorithm Validation and Algorithm Benchmarking.

RegisterDefaultFactoryFor<HashTransformation, IdentityHash<32> >("IdentityHash-256");
RegisterDefaultFactoryFor<HashTransformation, IdentityHash<64> >("IdentityHash-512");
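
The idea behind registering variants under distinct names can be sketched with a small name-to-factory map. Everything below is hypothetical and library-free: ToyHashBase, ToyHash and ToyRegistry only model the concept and do not reflect how the library's registry is implemented.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical base class and a size-parameterized toy hash.
struct ToyHashBase
{
    virtual ~ToyHashBase() {}
    virtual unsigned int DigestSize() const = 0;
};

template <unsigned int N>
struct ToyHash : public ToyHashBase
{
    virtual unsigned int DigestSize() const { return N; }
};

// Hypothetical registry: maps a name to a function that creates an object.
class ToyRegistry
{
public:
    typedef std::function<std::unique_ptr<ToyHashBase>()> Factory;

    // Each template instantiation needs its own name, because all of them
    // share one static algorithm name.
    template <class T>
    void Register(const std::string &name)
    {
        m_factories[name] = [] { return std::unique_ptr<ToyHashBase>(new T); };
    }

    // Returns an empty pointer when the name was never registered.
    std::unique_ptr<ToyHashBase> Create(const std::string &name) const
    {
        std::map<std::string, Factory>::const_iterator it = m_factories.find(name);
        return it == m_factories.end() ? std::unique_ptr<ToyHashBase>() : it->second();
    }

private:
    std::map<std::string, Factory> m_factories;
};
```

Test and benchmark code can then request an object by the same string that appears in a test vector file, without knowing the concrete type.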

Algorithm Validation

A real hash should have test vectors, and the vectors should be exercised by the cryptest.exe program. Adding the functionality requires five steps. First, open validate.h and add a declaration for ValidateIdentityHash.

Second, open test.cpp and add a call to ValidateIdentityHash in Validate at the bottom of the source file:

bool Validate(int alg, bool thorough, const char *seedInput)
{
    ...
    switch(alg)
    {
        case 0: result = Test::ValidateAll(thorough); break;
        ...
        case 500: result = Test::ValidateIdentityHash(); break;
    }
}

Third, open validat1.cpp and add ValidateIdentityHash to the function ValidateAll:

bool ValidateAll(bool thorough)
{
    bool pass=TestSettings();
    ...

    pass=ValidateIdentityHash() && pass;
    ...
}

Fourth, open validat2.cpp and add the implementation. In the case of IdentityHash we can use known answers:

bool ValidateIdentityHash()
{
    std::cout << "\nIdentityHash validation suite running...\n\n";
    return RunTestDataFile(CRYPTOPP_DATA_DIR "TestVectors/identhash.txt");
}

Fifth, add the following to TestVectors/identhash.txt. Be mindful of whitespace because the Crypto++ parser is sensitive to the location of newlines when parsing. A blank line indicates the start of a new algorithm, and it's easy to add one in the wrong place.

AlgorithmType: MessageDigest
Source: Calculated offline with Crypto++ library
Name: IdentityHash-256
Comment: 32-byte hash
Message: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Digest: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Test: Verify
Name: IdentityHash-256
Comment: 32-byte hash, 64-byte input
Message: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Digest: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Test: Verify
Name: IdentityHash-512
Comment: 64-byte hash
Message: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Digest: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Test: Verify
Name: IdentityHash-512
Comment: 64-byte hash, 96-byte input
Message: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Digest: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 
Test: Verify

Algorithm Benchmarking

To add IdentityHash to the benchmarking gear, open bench2.cpp and add the following alongside the other hashes.

BenchMarkByNameKeyLess<HashTransformation>("IdentityHash-256");
BenchMarkByNameKeyLess<HashTransformation>("IdentityHash-512");