# Character Set Considerations

This page discusses character set design considerations for users of Crypto++. The simplest solution is to leave both _UNICODE and UNICODE undefined, so that everything uses narrow characters.

## Data is Neutral

Crypto++ is generally a character set neutral library. That is, when the library operates on data, even data housed in a string, it interprets the data as a byte[]. A better (but less portable) abstraction is a rope. Consider the following fragment, which assumes the file stores binary data:

string sink;
FileSource( filename, true, new StringSink( sink ) );

There is no regard to wide or narrow - data is data. Next, suppose it is desired to hash the data:

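MD5 hash;
byte digest[ MD5::DIGESTSIZE ];
// Hash objects take input through Update() and produce the digest with Final()
hash.Update( (const byte*)sink.data(), sink.size() );
hash.Final( digest );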
...

The hash operates on a stream of bytes - the stream could be binary data, narrow characters (which the hash regards as byte[]), or wide characters (which the hash regards as byte[]). The programmer only needs to specify the number of bytes (size) to hash. The hash is indifferent.
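
To make the byte count concrete, here is a minimal sketch that uses only the Standard C++ Library. `RawBytes` is a hypothetical helper written for this page, not part of Crypto++. It shows that the number of bytes behind a string is the character count multiplied by the character width, which is the size one would hand to a hash.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Crypto++): copy out the raw bytes that
// back a string, whatever its character type.
template <class CharT>
std::vector<unsigned char> RawBytes(const std::basic_string<CharT>& text)
{
    const unsigned char* first =
        reinterpret_cast<const unsigned char*>(text.data());

    // The byte count is the character count times the character width.
    const std::size_t byteCount = text.size() * sizeof(CharT);

    return std::vector<unsigned char>(first, first + byteCount);
}
```

For a std::wstring the count is wide.size() * sizeof(wchar_t). Because sizeof(wchar_t) is 2 on Windows and usually 4 on Linux, hashing the raw bytes of a wide string is unlikely to give the same digest on both platforms.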

Finally, suppose the previous example hex encoded the digest before storing it in a narrow string. The program itself could be Unicode, SBCS, or MBCS. The next sections discuss this issue.
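
As a rough illustration of why an encoded digest ends up as a narrow string, the sketch below hex encodes a byte buffer with plain C++. `HexEncodeNarrow` is a hypothetical stand-in written for this page and is not the Crypto++ encoder. The choice of uppercase digits is arbitrary.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a hex encoder: two narrow characters per byte.
std::string HexEncodeNarrow(const unsigned char* data, std::size_t size)
{
    static const char digits[] = "0123456789ABCDEF";

    std::string out;
    out.reserve(size * 2);

    for (std::size_t i = 0; i < size; ++i) {
        out += digits[data[i] >> 4];     // high nibble
        out += digits[data[i] & 0x0F];   // low nibble
    }
    return out;
}
```

The result contains only ASCII characters, so a wide application can widen it without losing information.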

## Crypto++ is Narrow

There are times when one must pass a string to Crypto++, for example with Named Parameters and filenames. In these cases, one of two situations arises.

### Narrow to Wide

Narrow to wide conversion can be further decomposed into two cases:

• using the Standard C++ Library
• using the Win32 API

#### Using the Standard C++ Library

Users of Visual Studio 6.0 and earlier are at a disadvantage. Bjarne Stroustrup devotes Appendix D (Locales) of The C++ Programming Language to similar issues, complete with sample code. However, that code does not compile with VS 6.0. The following works with VS 6.0.

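#include <locale>
#include <string>

// Courtesy of Tom Widmer (VC++ MVP)
std::wstring StringWiden( const std::string& narrow ) {

    std::wstring wide;
    wide.resize( narrow.length() );

    // Nothing to convert; also avoids taking &wide[0] of an empty string
    if( narrow.empty() ) { return wide; }

    typedef std::ctype<wchar_t> CT;

    // _USE is specific to the VC++ 6.0 library. Standard-conforming
    //   compilers spell this std::use_facet<CT>( std::locale() )
    CT const& ct = std::_USE(std::locale(), CT);

    // Non Portable
    //   Iterators should not be used as pointers (works in VC++ 6.0)
    //   ct.widen( narrow.begin(), narrow.end(), wide.begin() );

    // Portable only from C++17 on, where data() returns a non-const pointer
    //   ct.widen( narrow.data(), narrow.data() + narrow.size(), wide.data() );

    // Portable
    ct.widen( &narrow[0], &narrow[0] + narrow.size(), &wide[0] );

    return wide;
}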
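
For compilers newer than VC++ 6.0, the same idea can be sketched with the standard `std::use_facet` template. `StringWidenPortable` is a hypothetical name chosen here to keep it apart from the function above. The sketch assumes the global locale is the one wanted for the conversion.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Hypothetical portable variant: widen each char with the ctype facet
// of the current global locale.
std::wstring StringWidenPortable(const std::string& narrow)
{
    typedef std::ctype<wchar_t> CT;

    // One wide character per narrow character, no extra terminator slot.
    std::wstring wide(narrow.size(), L'\0');
    if (narrow.empty())
        return wide;

    const CT& ct = std::use_facet<CT>(std::locale());
    ct.widen(narrow.data(), narrow.data() + narrow.size(), &wide[0]);

    // The conversion is one-to-one, so the lengths always match.
    assert(wide.size() == narrow.size());
    return wide;
}
```

Because the facet widens one char at a time, this suits single-byte text such as hex encoded digests. It does not decode multibyte encodings such as UTF-8.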

#### Using the Win32 API

See MSDN for examples of using MultiByteToWideChar.

### Wide to Narrow

Wide to narrow conversion can be further decomposed into two cases:

• using the Standard C++ Library
• using the Win32 API

#### Using the Standard C++ Library

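#include <locale>
#include <string>

// Courtesy of Tom Widmer (VC++ MVP)
std::string StringNarrow( const std::wstring& wide ) {

    typedef std::ctype<wchar_t> CT;

    std::string narrow;
    narrow.resize( wide.length() );

    // Nothing to convert; also avoids taking &narrow[0] of an empty string
    if( wide.empty() ) { return narrow; }

    // _USE is specific to the VC++ 6.0 library. Standard-conforming
    //   compilers spell this std::use_facet<CT>( std::locale() )
    CT const& ct = std::_USE(std::locale(), CT);

    // Non-Portable
    // ct.narrow( wide.begin(), wide.end(), '_', narrow.begin() );

    // Portable. '_' replaces any wide character with no narrow equivalent
    ct.narrow( &wide[0], &wide[0] + wide.length(), '_', &narrow[0] );

    return narrow;
}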
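
A version for standard-conforming compilers might look like the sketch below. `StringNarrowPortable` and its `fallback` parameter are hypothetical names introduced for this page. As above, the global locale is assumed to be the right one for the conversion.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Hypothetical portable variant: narrow each wchar_t with the ctype facet
// of the current global locale. Characters that have no narrow equivalent
// in that locale are replaced with `fallback`.
std::string StringNarrowPortable(const std::wstring& wide, char fallback = '_')
{
    typedef std::ctype<wchar_t> CT;

    // One narrow character per wide character, no extra terminator slot.
    std::string narrow(wide.size(), '\0');
    if (wide.empty())
        return narrow;

    const CT& ct = std::use_facet<CT>(std::locale());
    ct.narrow(wide.data(), wide.data() + wide.size(), fallback, &narrow[0]);

    // The conversion is one-to-one, so the lengths always match.
    assert(narrow.size() == wide.size());
    return narrow;
}
```

The conversion is lossy for text outside the narrow character set, so a narrowed filename may not name the same file as the wide original.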

#### Using the Win32 API

See MSDN for examples of using WideCharToMultiByte.

## Application is Wide

Due to the predominance of Windows NT and family, the author exclusively uses the Unicode character set. With that in mind, the following is a typical design overview. Notice that anything data related is omitted - a byte[] is a byte[].

The Windows API ⇔ Application boundary is fairly generic. The application uses L"" literals rather than the _T("") macro. This means conversions occur frequently if UNICODE and _UNICODE are not defined.

Generally, the Crypto++ ⇔ Application conversion is StringWiden(...) for items such as digests. An exception is the occasional need to narrow a filename with StringNarrow(...).

## Caveats

The Win32 API switches between the narrow and wide character sets based on UNICODE. The C run-time library's generic-text mappings (tchar.h) switch based on _UNICODE. A mismatch will rear its head when one writes output using cout: in Visual C++ 6.0, one may receive memory addresses rather than strings on the console. Either #define both, or #define neither, and use wcout or cout accordingly. Similar behavior used to occur in database code.
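
One way to catch a mismatch early is a preprocessor guard in a common header. This is a sketch only. The names `tstring` and `tcout` are conventions assumed here for illustration and are not provided by Crypto++ or the Win32 headers.

```cpp
#include <iostream>
#include <string>

// Refuse to build when only one of the two macros is defined.
#if defined(UNICODE) != defined(_UNICODE)
#error "Define both UNICODE and _UNICODE, or neither."
#endif

// Hypothetical aliases that follow the build setting, so a string is
// always written to the stream of the matching character type.
#if defined(UNICODE)
typedef std::wstring tstring;
static std::wostream& tcout = std::wcout;
#else
typedef std::string tstring;
static std::ostream& tcout = std::cout;
#endif
```

An application that always uses L"" literals, as described above, would write to std::wcout directly and keep only the guard.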

When calling wide.resize( narrow.length() ) (or the narrow equivalent), do not use length() + 1. The resulting string would hold an extra null character as its last element, which breaks some substring code and most string matching code.
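
The effect is easy to reproduce with the Standard C++ Library alone. In the sketch below, `WidenWithExtra` is a hypothetical helper that widens ASCII text by plain assignment and deliberately over-allocates by `extra` characters.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: widen ASCII text, reserving `extra` additional slots
// to mimic a resize( length() + 1 ) mistake.
std::wstring WidenWithExtra(const std::string& narrow, std::size_t extra)
{
    std::wstring wide;
    wide.resize(narrow.length() + extra);   // extra slots hold L'\0'

    for (std::size_t i = 0; i < narrow.length(); ++i)
        wide[i] = static_cast<wchar_t>(static_cast<unsigned char>(narrow[i]));

    return wide;
}
```

WidenWithExtra("abc", 0) compares equal to L"abc". WidenWithExtra("abc", 1) has a size of 4 and does not compare equal, because the embedded null counts as part of the string.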