Std::byte

From Crypto++ Wiki
Jump to navigation Jump to search

C++17 added std::byte in 2016 by way of P0298R0, A byte type definition. The Crypto++ library also provides a byte in the global namespace. Additionally, Windows provides a byte in their Windows Kit. Compile problems can arise under certain conditions when the language's byte or Windows' byte collides with the library's byte. A secondary problem for the library is, the language's byte is not semantically equivalent to the library's byte.

The incompatibilities introduced by C++17 were design choices, like ensuring a std::byte was neither a character type or integer type, and only bit operations are allowed on it. Generally speaking, they were good design choices that intersected badly with some libraries and programs. Not all libraries were affected. For example, Jack Lloyd's Botan uses uint8_t rather than a byte, so there was no clash of the types.

Romain Geissler performed the early testing and made a couple of pull requests to reconcile the issues with the library and user programs when using C++17 and std::byte were in effect and std symbols were placed in the global namespace; see PR 438, Use ::byte instead of byte. This page will discuss how and why the Crypto++ library was modified to minimize user discomfort when using both C++17 and the Crypto++ library. Uri Blumenthal, Romain Geissler, Marcel Raad and Jeffrey Walton worked on finding the change that caused the least amount of pain.

We opened Issue 442, Test C++17 byte change with dry runs from various projects to document the changes and estimate the impact on dependent projects. In Issue 442, we called for testers based on GitHub projects with activity in 2016 and 2017. We also performed the fix on several open source projects so we could witness the pain points first hand.

If you need to setup a rig to test the changes below, then Fedora 26 ships with GCC 7.1. You can also find GCC 7 in Debian 10/Buster (Debian 9 with Testing repo enabled). For Debian, you must apt-get install gcc-7 g++-7. Its not clear which version of the Microsoft compilers support C++17 and std::byte.

The change detailed below occurred at Crypto++ 6.0 because it broke ABI compatibility. The check-in occurred at 00f9818b5d8e.

The Problem

The C++17 language provides a std::byte. Crypto++ also provides a byte and its in the global namespace. The situation gives rise to at least two problems. The first problem is a collision of symbols under some circumstances. The second is, C++17 std::byte is not semantically equivalent to the library's byte type.

Symbol collision

The first problem arises when std::byte and Crypto++ byte clash. It can happen, for example, if the std namespace is dumped into the global namespace (the using namespace std; below). It can also happen when Windows SDK and Kits are used in a program. It was first reported by Romain Geissler; see PR 437, Fix C++17 build and PR 438, Use ::byte instead of byte.

using namespace std;
int main(int argc, char* argv[])
{
    // Ambiguous: is it std::byte or Crypto++ byte?
    byte block[16];
    ...
}

The Crypto++ byte has been around since the early 1990s. Its presence predates C++ namespaces by about 5 years. When C++ added namespaces around 1998, all Crypto++ symbols were added to the CryptoPP namespace. However, byte remained in the global namespace.

The decision to leave byte in the global namespace was documented in code comments, and probably had something to do with Microsoft platforms, early C++ compilers and name lookups (from config.h):

// put in global namespace to avoid ambiguity with other byte typedefs
typedef unsigned char byte;

Semantic equivalence

A secondary problem is, the C++ std::byte is not semantically equivalent to the Crypto++ byte. For example, the following code will not compile using std::byte. Similar code would compile using the Crypto++ byte, including integral operations like addition.

$ cat -n byte.cxx
     1  #include <cstddef>
     2
     3  int main(int argc, char* argv[])
     4  {
     5    std::byte b1{0x00};
     6
     7    std::byte b2 = {0x00};
     8
     9    std::byte b3{0x00}; b3 |= 0xAA;
    10
    11    std::byte b4{0x00}; b4 += 0xAA;
    12
    13    return 0;
    14  }

Attempting to compile it results in:

$ g++ -std=c++17 byte.cxx -o test.exe
byte.cxx: In function ‘int main(int, char**)’:
byte.cxx:7:23: error: cannot convert ‘int’ to ‘std::byte’ in initialization
   std::byte b2 = {0x00};
                       ^
byte.cxx:9:26: error: no match for ‘operator|=’ (operand types are ‘std::byte’ and ‘int’)
   std::byte b3{0x00}; b3 |= 0xAA;
                       ~~~^~~~~~~
In file included from byte.cxx:1:0:
/usr/include/c++/7/cstddef:136:3: note: candidate: constexpr std::byte& std::operator|=(std::byte&, std::byte)
   operator|=(byte& __l, byte __r) noexcept
   ^~~~~~~~
/usr/include/c++/7/cstddef:136:3: note:   no known conversion for argument 2 from ‘int’ to ‘std::byte’
byte.cxx:11:26: error: no match for ‘operator+=’ (operand types are ‘std::byte’ and ‘int’)
   std::byte b4{0x00}; b4 += 0xAA;
                       ~~~^~~~~~~

The Decision

The compromise to accommodate C++17 std::byte was move the Crypto++ byte from the global namespace into the CryptoPP namespace. The change occurred at check-in 00f9818b5d8e and it will be available in Crypto++ 6.0. The library's change addresses the root cause, but it could cause issues in some user programs.

Fortunately, the issues in user programs is easily addressable in many instances. A user program can simply add typedef unsigned char byte; to a common header, like stdafx.h, and the previous balance is restored. See Fixing Programs below for more alternatives.

Root Cause

Placing symbols in other namespaces was the root cause of the problem. More precisely, the library adding byte type to the global namespace. From the C++ engineering perspective, the library's byte always belonged at CryptoPP::byte.

Prior to the clash, we were mostly OK with supplying a byte in the global namespace for convenience. It played well with Microsoft's byte which was also placed in the global namespace. However the C++17 changes showed how fragile adding symbols in a namespace other than our own could be.

Alternatives

While studying the issue and contemplating changes, the library had several alternatives to choose from. They included the following.

  1. Do nothing. Let users figure it out.
  2. Leave byte, and scope function parameters with ::byte. This is Geissler's patch.
  3. Remove byte. Avoid collisions in library, let users figure out the rest.
  4. Add CryptoPP::byte. Avoid collisions in library.
  5. Add function overloads. Accept std::byte in addition to CryptoPP::byte.
  6. Use std::byte. Cast to std::byte for copy and assignment. Cast to integral for non-bitops.
  7. Leave byte, add CryptoPP::byte. Avoid collisions in library, let users figure out the rest.

Geissler's patch scoped all uses of ::byte in the library. It was a good, intermediate solution to the issue under C++17. However, the patch was effectively treating the symptom and not the underlying problem. The underlying problem was the library adding symbols in a namespace other than its own. Ultimately we chose to address the underlying problem.

Fixing Programs

Some user programs which link to the Crypto++ library will need to be fixed. There are several ways a user program can be fixed and some of the alternatives are listed below. Allowing users to make a choice is important because the library does not want to make policy decisions that bind user programs. We apologize for forcing other projects into action over this.

The work-arounds listed below are in random order, and do not indicate an order of preference. We expect Supply byte definition to be an expedient fix, but it may not be a good long term fix. It may not be a good long term fix because the program is doing what got the library into trouble.

Use SecByteBlock

Some uses of the former unscoped byte are a little fast and loose because sensitive information was not zeroized, and a SecByteBlock should have been used. If the data is sensitive, then consider switching to a SecByteBlock. SecByteBlock has both a begin() and size(), so it can be used almost everywhere you are using a byte array.

int main(int argc, char* argv[])
{
    // formerly byte block[16];
    CryptoPP::SecByteBlock block(16);
    CryptoPP::OS_GenerateRandomBlock(false, block.begin(), block.size());
    ...
}

Use Crypto++ byte

If you don't handle sensitive information and you want to use the library's byte, then CryptoPP::byte is available as shown below.

int main(int argc, char* argv[])
{
    // formerly byte block[16];
    CryptoPP::byte block[16];
    CryptoPP::OS_GenerateRandomBlock(false, block, sizeof(block));
    ...
}

In fact, Crypto++ used a similar strategy for test.cpp because Microsoft Windows Kits provide a byte in the global namespace that creates the ambiguous symbol. Also see Issue 447, Windows and "error C2872: 'byte' : ambiguous symbol" and Commit 00e133745663.

Use <stdint.h> uint8_t

You can simply use uint8_t in your programs. Some libraries, like Botan, side stepped the problem by using uint8_t.

$ cat test.cxx
#include "cryptlib.h"
#include "osrng.h"
#include "hex.h"

#include <iostream>
#include <cstdint>

int main(int argc, char* argv[])
{
    using namespace CryptoPP;
    AutoSeededRandomPool prng;

    uint8_t key[32];
    prng.GenerateBlock(key, sizeof(key));

    std::cout << "Key: ";
    StringSource(key, 32, true, new HexEncoder(new FileSink(std::cout)));
    std::cout << std::endl;

    return 0;
}

Supply byte definition

A user program supplying its own definition for a byte is probably the easiest solution to fill the void created by removing byte from the global namespace. It gets you back to where things were before the library made its change, and allows you to deal with declarations like using namespace std how you see fit.

typedef unsigned char byte;

int main(int argc, char* argv[])
{
    byte block[16];
    CryptoPP::OS_GenerateRandomBlock(false, block, sizeof(block));
    ...
}

The library used typedef unsigned char byte; for Kalyna's anonymous namespace at check-in fe637956388f8dd6. The change insulated Kalyna from byte related changes in outer scopes. It also means Kalyna's code has access to at least two definitions of a byte. The first at ::byte or CryptoPP::byte, and the second at CryptoPP::<kalyna anonymous>::byte.

When we were investigating the impact of the change, we also examined popular projects like Pycryptopp. Pycryptopp needed the byte added to three source file just like Kalyna (and Pycryptopp did not need it for all project files). Also see Pull Request 43 on the Pycryptopp GitHub.

You can get fancier, if desired. In the code below, an unscoped byte alias is created with using byte = CryptoPP::byte if Crypto++ is 6.0 or later and C++11 or later are available. Otherwise, the user program simply provides its own compatible definition of a byte in the global namespace.

#include <cryptopp/config.h>
#if (CRYPTOPP_VERSION >= 600) && (__cplusplus >= 201103L)
    using byte = CryptoPP::byte;
#else
    typedef unsigned char byte;
#endif

int main(int argc, char* argv[])
{
    byte block[16];
    CryptoPP::OS_GenerateRandomBlock(false, block, sizeof(block));
    ...
}

Scope use of byte

Romain Geissler's patch scoped all uses of byte in the library. It was a good, intermediate solution to the issue under C++17. Geissler's patch also had the benefit of enduring using namespace std;. However, the patch does not treat the underlying problem of adding symbols in a namespace other than the library's namespace.

You can find the patch at Pull Request 438, Use ::byte instead of byte. Below is an example of the change in signature for a library function. A similar change occurred in every library source file which used byte.

Former, unscoped byte:

namespace CryptoPP
{
    OS_GenerateRandomBlock(bool blocking, byte* output, size_t size);
};

Patched, scoped byte:

namespace CryptoPP
{
    OS_GenerateRandomBlock(bool blocking, ::byte* output, size_t size);
};

using namespace X

By far, using namespace std is one of the most criticized statements in a program during review. As early as 2000 Herb Sutter advised against it at Migrating to Namespaces. In the context of Crypto++, using namespace CryptoPP falls into the same category. User programs should not do it. With that said, we understand it shows up in some user programs.

The Crypto++ project slowly changed the library to follow Sutter's advice. The change for the library was minimal because most uses of using namespace std; were limited to several source files (*.cpp files) in its test framework (test.cpp, datatest.cpp, validate.cpp, benchmark.cpp, etc). The last library check-in for the goal occurred at 80c4033baf09.

Sutter's advice and the byte collision mean user programs should change:

using namespace std;
int main(int argc, char* argv[])
{
    cout << "Hello World" << endl;
    ...
}

To:

int main(int argc, char* argv[])
{
    std::cout << "Hello World" << std::endl;
    ...
}

Or:

int main(int argc, char* argv[])
{
    using std::cout;
    using std::endl;

    cout << "Hello World" << endl;
    ...
}

Investigation

Before providing the check-in at 00f9818b5d8e, several alternatives were tested to see how well they integrated and how much discomfort they could cause a typical user program. Testing occurred on Fedora 26 with GCC 7.1 and the -std=c++17 compiler option. The baseline user program is shown below and its written according to Herb Sutter's recommendations in Migrating to Namespaces, and includes:

  • avoid using directives entirely, especially in header files
  • never write namespace using declarations in header files
  • user code should prefix all standard library names with std::

Internally, the Crypto++ library does not violate Sutter's rules in header files. The Crypto++ test program and source files, like test.cpp, do use using namespace CryptoPP, however.

$ cat test.cxx
#include "cryptlib.h"
#include "secblock.h"
#include "osrng.h"
#include "files.h"
#include "hex.h"

#include <iostream>
#include <cstddef>

void GenerateBlock(byte* block, size_t size)
{
  CryptoPP::OS_GenerateRandomBlock(false, block, size);
}

int main(int argc, char* argv[])
{
  CryptoPP::HexEncoder encoder(new CryptoPP::FileSink(std::cout));
  CryptoPP::SecByteBlock block1(16);
  std::byte block2[16];

  GenerateBlock(block1, block1.size());
  GenerateBlock(reinterpret_cast<byte*>(block2), sizeof(block2));

  std::cout << "Block 1: ";
  encoder.Put(block1, block1.size());
  encoder.MessageEnd();
  std::cout << std::endl;

  std::cout << "Block 2: ";
  encoder.Put(reinterpret_cast<const byte*>(block2), sizeof(block2));
  encoder.MessageEnd();
  std::cout << std::endl;

  return 0;
}

Variations of the program were tested with using statements. They were:

  • Crypto++ byte in global namespace
  • using std::byte
  • using CryptoPP::byte
  • using std::byte and using CryptoPP::byte
  • using namespace std

using byte = std::byte

The first attempt to uptake std::byte was a using byte = std::byte in the library. The attempt failed miserably because the library treats byte as an integral type and performs both bit and math operations on them. The library's use of byte is incompatible with the standard.

Below is an example of the failure introduced by using byte = std::byte from misc.h.

In file included from cryptlib.cpp:19:0:
misc.h: In function ‘void CryptoPP::IncrementCounterByOne(byte*, unsigned int)’:
misc.h:1118:12: error: no match for ‘operator++’ (operand type is ‘byte {aka std::byte}’)
   carry = !++inout[i];
            ^~~~~~~~~~
misc.h: In function ‘void CryptoPP::IncrementCounterByOne(byte*, const byte*, unsigned int)’:
misc.h:1134:33: error: no match for ‘operator+’ (operand types are ‘const byte {aka const std::byte}’ and ‘int’)
   carry = ((output[i] = input[i]+1) == 0);
                         ~~~~~~~~^~
misc.h: In function ‘byte CryptoPP::BitReverse(byte)’:
misc.h:1847:22: error: no match for ‘operator&’ (operand types are ‘byte {aka std::byte}’ and ‘int’)
  value = byte((value & 0xAA) >> 1) | byte((value & 0x55) << 1);
                ~~~~~~^~~~~~
...

CryptoPP::byte

Moving byte into the CryptoPP namespace produced initial errors due to use of anonymous namespaces. For example:

g++ -DNDEBUG -g2 -O3 -std=c++17 -Wall -Wextra -fPIC -march=native -pipe -c kalynatab.cpp
kalyna.cpp: In function ‘void {anonymous}::MakeOddKey(const word64*, CryptoPP::word64*)’:
kalyna.cpp:76:11: error: ‘byte’ does not name a type
     const byte* even = reinterpret_cast<const byte*>(evenkey);
           ^~~~
kalyna.cpp:77:5: error: ‘byte’ was not declared in this scope
     byte* odd = reinterpret_cast<byte*>(oddkey);
     ^~~~
...

Once the source files were cleaned up at Cpmmit fe637956388f8dd6 the library files compiled as expected. User programs experienced the expected 'byte' was not declared in this scope messages due to the namespace change.

using CryptoPP::byte

using CryptoPP::byte; is a natural continuation of moving to CryptoPP::byte for user programs. Unfortunately using byte = CryptoPP::byte; produced the error below due to using namespace std; and using namespace CryptoPP;. In fact, using byte = CryptoPP::byte; made the problem worse because it introduced a third symbol into the name lookup by way of an alias.

test.cxx: In function ‘int main(int, char**)’:
test.cxx:27:3: error: reference to ‘byte’ is ambiguous
   byte block3[16];
   ^~~~
test.cxx:12:28: note: candidates are: using byte = CryptoPP::byte
 using byte = CryptoPP::byte;
                            ^
In file included from stdcpp.h:48:0,
                 from cryptlib.h:97,
                 from test.cxx:1:
/usr/include/c++/7/cstddef:64:14: note:                 enum class std::byte
   enum class byte : unsigned char {};
              ^~~~

A twist on Windows with a Windows Kit installed is:

1>  test.cpp
1>test.cpp(159): error C2872: 'byte' : ambiguous symbol
1>          could be 'c:\program files (x86)\windows kits\8.0\include\shared\rpcndr.h(164) : unsigned char byte'
1>          or       'c:\users\cryptopp\config.h(203) : CryptoPP::byte'

According to StoryTeller on Stack Overflow at byte and ambiguous symbol due to using declarations?:

A using declaration (as opposed to a type alias) will not resolve the issue [of ambiguous symbols]. Both identifiers will be visible to unqualified name lookup. Neither is defined in the enclosing declarative region.

Function Overloads

A potential path to avoid user casting when passing a std::byte to a function that expected a byte or CryptoPP::byte was to provide overloads in key classes. For example, in class BufferedTransformation:

class BufferedTransformation : public Algorithm, public Waitable
{
public:
    virtual ~BufferedTransformation() {}
    ...

    size_t Put(const byte *inString, size_t length, bool blocking=true)
        {return Put2(inString, length, 0, blocking);}

#if CRYPTOPP_CXX17
    size_t Put(const std::byte *inString, size_t length, bool blocking=true)
        {return Put2(reinterpret_cast<const byte*>(inString), length, 0, blocking);}
#endif

    ...
};

This made it somewhat easier for user code to use both std::byte and CryptoPP::byte by moving casts into the library, but it was an incomplete remediation as some overloads were missing. Additionally, there was no way to overload some functions, like those where the byte* is a return value. For example, virtual byte * CreatePutSpace(size_t &size).

Finally, the conditional compilation in the header leaves something to be desired. We don't want users to witness hacks that may or may not be present. We would prefer they be hidden away in a source file and forgotten about once compilation of the library completed.

using namespace std

The code that caused the most problems during testing was user code with using namespace std. While Sutter strongly discourages it, we expect it to show up in a fair number of places in user code. This is also the impetus for Romain-Geissler's Pull Request 438.

When using namespace std was present in user code, one of the better decisions for the library was to move byte into the CryptoPP namespace. The test code was changed as follows.

$ cat test.cxx
#include "cryptlib.h"
#include "secblock.h"
#include "osrng.h"
#include "files.h"
#include "hex.h"

#include <iostream>
#include <cstddef>

using namespace std;

void GenerateBlock(CryptoPP::byte* block, size_t size)
{
  CryptoPP::OS_GenerateRandomBlock(false, block, size);
}

int main(int argc, char* argv[])
{
  CryptoPP::HexEncoder encoder(new CryptoPP::FileSink(std::cout));
  CryptoPP::SecByteBlock block1(16);
  std::byte block2[16];

  GenerateBlock(block1, block1.size());
  GenerateBlock(reinterpret_cast<CryptoPP::byte*>(block2), sizeof(block2));

  std::cout << "Block 1: ";
  encoder.Put(block1, block1.size());
  encoder.MessageEnd();
  std::cout << std::endl;

  std::cout << "Block 2: ";
  encoder.Put(reinterpret_cast<const CryptoPP::byte*>(block2), sizeof(block2));
  encoder.MessageEnd();
  std::cout << std::endl;

  return 0;
}

The code in main could avoid the extra casts by overloading the exemplary GenerateBlock:

// Original
void GenerateBlock(CryptoPP::byte* block, size_t size)
{
  CryptoPP::OS_GenerateRandomBlock(false, block, size);
}

// Overload
void GenerateBlock(std::byte* block, size_t size)
{
  CryptoPP::OS_GenerateRandomBlock(false, reinterpret_cast<CryptoPP::byte*>(block), size);
}