The concept of data authentication appeared in the 1970s in the banking industry, where the problem was studied in detail by the ANSI X9 committee. Banks did not want to transmit data that an attacker could alter in transit without detection. In this situation the attacker does not decrypt the message; he or she only flips a bit so that the encrypted message "Post $100" becomes "Post $800".
Many developers make the mistake of only encrypting data. The apparent reason for omitting authentication is that most sample code in technical sources shows only the encryption (and perhaps decryption) function, void of any context. For those who do include authenticity assurances, the details and interactions can be tricky to implement correctly.
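The "Post $100" to "Post $800" change can be sketched with a toy XOR stream cipher. This is only an illustration: `xor_stream` and `flip_digit` are hypothetical helpers, the keystream stands in for the output of a real cipher, and nothing here is Crypto++ API. The attacker never learns the keystream; knowing the position of the digit is enough. Turning '1' into '8' flips the two bits in which the characters differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy stream cipher: XOR each byte with a keystream byte.
// NOT a real cipher; the keystream stands in for the output of one.
// The same call encrypts and decrypts.
std::string xor_stream(const std::string& data, const std::string& keystream) {
    assert(keystream.size() >= data.size());
    std::string out(data);
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = static_cast<char>(out[i] ^ keystream[i]);
    return out;
}

// The attacker only guesses that byte `pos` of the plain text is '1'.
// XORing the cipher text byte with ('1' ^ '8') makes it decrypt to '8'.
std::string flip_digit(std::string ciphertext, std::size_t pos) {
    assert(pos < ciphertext.size());
    ciphertext[pos] = static_cast<char>(ciphertext[pos] ^ ('1' ^ '8'));
    return ciphertext;
}
```

Decrypting the output of `flip_digit` with the original keystream succeeds without any error, which is the point: encryption alone gives the receiver no way to notice the change.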
Data Integrity and Authenticity
In 2001, Hugo Krawczyk published The Order of Encryption and Authentication for Protecting Communications. In the paper, Krawczyk examined three commonly used methods of combining encryption and authentication. Each method was used in a well-known protocol. Note that the list below does not include simple encryption.
- Authenticate then Encrypt (AtE) - SSL
- Encrypt then Authenticate (EtA) - IPSec
- Encrypt and Authenticate (E&A) - SSH
The results of the paper showed that Encrypt then Authenticate (IPSec) was secure, as was Authenticate then Encrypt (SSL) under certain constructions, for example when used with a stream cipher. Update: in 2014, Krawczyk revisited his results and found that SSL with a block cipher in CBC mode was insecure due to a misunderstanding in the way the plaintext was encoded and padded. The paper also showed that Encrypt and Authenticate (SSH) was insecure.
The two provably safe Authenticate then Encrypt constructions are:
- a block cipher operated in CBC mode
- a stream cipher that XORs the data with a pseudorandom pad
Note well: even though SSL uses a block cipher in CBC mode, it is not secure because of the way it applies padding to a message. A word to the wise: see POODLE and friends.
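In the table below, ‖ denotes concatenation.
| Method | Protocol | Operation | Transmitted |
|---|---|---|---|
| HAC, 9.6† | - | h = Hash(m), C = Enc(m ‖ h) | C |
| AtE†† | SSL | a = Auth(m), C = Enc(m ‖ a) | C |
| EtA | IPSec | C = Enc(m), a = Auth(C) | C ‖ a |
| E&A | SSH | C = Enc(m), a = Auth(m) | C ‖ a |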
†Handbook of Applied Cryptography, Section 9.6
††In 2014, Krawczyk revisited TLS CBC mode encryption and determined it was not secure due to the way the padding and MAC were applied. See Re: [TLS] Last Call: <draft-ietf-tls-encrypt-then-mac-02.txt> (Encrypt-then-MAC for TLS and DTLS) to Proposed Standard
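The three orderings in the table differ only in what is fed to Auth and what is sent on the wire. The sketch below makes that concrete with toy stand-ins: `toy_enc` (a repeating-key XOR) and `toy_auth` (a keyed FNV-1a hash printed as 16 hex characters) are hypothetical helpers chosen so the example needs only the standard library. They are not secure and are not Crypto++ API; a real implementation would use a real cipher and a real MAC with independent keys.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Toy Enc: repeating-key XOR. NOT secure; it only models the data flow.
std::string toy_enc(const std::string& key, const std::string& m) {
    assert(!key.empty());
    std::string c(m);
    for (std::size_t i = 0; i < c.size(); ++i)
        c[i] = static_cast<char>(c[i] ^ key[i % key.size()]);
    return c;
}

// Toy Auth: FNV-1a over key followed by message, as a 16-character hex tag.
// NOT a real MAC; it only models "a value that depends on key and input".
std::string toy_auth(const std::string& key, const std::string& m) {
    std::uint64_t h = 14695981039346656037ULL;   // FNV-1a offset basis
    for (unsigned char b : key + m) {
        h ^= b;
        h *= 1099511628211ULL;                   // FNV-1a prime
    }
    static const char hex[] = "0123456789abcdef";
    std::string tag(16, '0');
    for (int i = 15; i >= 0; --i) { tag[i] = hex[h & 0xF]; h >>= 4; }
    return tag;
}

// AtE (SSL): tag the message, then encrypt message and tag together. Send C.
std::string authenticate_then_encrypt(const std::string& ek, const std::string& ak,
                                      const std::string& m) {
    return toy_enc(ek, m + toy_auth(ak, m));
}

// EtA (IPSec): encrypt, then tag the cipher text. Send C followed by the tag.
std::string encrypt_then_authenticate(const std::string& ek, const std::string& ak,
                                      const std::string& m) {
    const std::string c = toy_enc(ek, m);
    return c + toy_auth(ak, c);
}

// E&A (SSH): encrypt and tag the plain text independently. Send C followed by the tag.
std::string encrypt_and_authenticate(const std::string& ek, const std::string& ak,
                                     const std::string& m) {
    return toy_enc(ek, m) + toy_auth(ak, m);
}
```

Only in the Encrypt then Authenticate form can the receiver check the tag against the bytes exactly as received, before any decryption takes place.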
In 1996, David Wagner and Bruce Schneier published Analysis of the SSL 3.0 Protocol. In the paper, Wagner and Schneier introduced the Horton Principle, which is the notion of semantic authentication. Semantic authentication simply means to authenticate what was meant, and not what was said.
For example, suppose there is plain text which is to be protected. The plain text is padded to a multiple of the cipher's block size and then encrypted. The padding operation raises the question: what should be authenticated, the plain text or the plain text plus padding? According to Wagner and Schneier, both the plain text and padding would be authenticated (what was meant), and not just the plain text (what was said).
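As a minimal sketch of the padding question, assume PKCS #7 padding and a 16-byte block (the text names neither; both are assumptions made for illustration). `pkcs7_pad` and `bytes_to_authenticate` are hypothetical helper names, not Crypto++ API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// PKCS #7 padding (an assumption; the text does not name a padding scheme):
// append n bytes, each with value n, so the length becomes a multiple of `block`.
std::string pkcs7_pad(const std::string& m, std::size_t block) {
    assert(block > 0 && block < 256);
    const std::size_t n = block - (m.size() % block);   // always in 1..block
    return m + std::string(n, static_cast<char>(n));
}

// Under the reading given above, the authenticator is computed over the
// padded string rather than over `m` alone, so a change to a padding byte
// is covered by the tag as well.
std::string bytes_to_authenticate(const std::string& m, std::size_t block) {
    return pkcs7_pad(m, block);
}
```

For the 9-byte message "Post $100" and a 16-byte block, seven padding bytes are appended, and all 16 bytes are handed to the authentication function.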
NIST, through SP 800-38C and SP 800-38D, specifies two block cipher modes of operation (CCM and GCM) which offer both confidentiality and authenticity. In addition to CCM and GCM, Crypto++ offers EAX, which was a NIST candidate during the selection process. Algorithms providing confidentiality and authenticity can be divided into two categories: authenticated encryption (AE) and authenticated encryption with additional data (AEAD). The two NIST modes, CCM and GCM, and the proposed mode, EAX, are AEAD algorithms. Each encrypts and authenticates plain text data (in addition to authenticated-only data), which produces cipher text with an authentication code. If an attacker were to flip a bit, the decryption and verification routine would detect the modification using the authentication code.
The three modes can also authenticate separate data, known as additional authenticated data or AAD. The additional authenticated data is not encrypted - it is only authenticated. The AAD can be persisted in clear text, or communicated unencrypted (for example, an IP Address and Port in a network data packet). Because the AAD is authenticated, the verification process will detect the modification if an attacker flips a bit in it.
Revisiting the original example, the improved version is as follows. The sample program performs authenticated encryption (not authentication over additional authenticated data). As before, it is presumed that buffers will not be an issue. Note, however, that exception handling has been omitted for clarity.
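The way AAD travels in the clear yet is still covered by the authentication code can be modeled with a toy construction. The sketch below is not GCM, CCM, or EAX and is not Crypto++ API: `Sealed`, `toy_xor`, `toy_tag`, `seal`, and `unseal` are hypothetical names, the cipher is a repeating-key XOR, and the tag is a keyed FNV-1a hash. It is there only to show the data flow, under those loudly stated assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// What goes on the wire: AAD in the clear, cipher text, and one tag over both.
struct Sealed {
    std::string aad;
    std::string ciphertext;
    std::uint64_t tag;
};

// Toy cipher: repeating-key XOR. NOT secure. Encrypts and decrypts.
std::string toy_xor(const std::string& key, const std::string& data) {
    assert(!key.empty());
    std::string out(data);
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = static_cast<char>(out[i] ^ key[i % key.size()]);
    return out;
}

// Toy tag: FNV-1a over key, AAD length, AAD, cipher text. NOT a real MAC.
// The length prefix keeps the AAD / cipher text boundary unambiguous.
std::uint64_t toy_tag(const std::string& key, const std::string& aad,
                      const std::string& ct) {
    std::uint64_t h = 14695981039346656037ULL;
    auto mix = [&h](unsigned char b) { h ^= b; h *= 1099511628211ULL; };
    for (unsigned char b : key) mix(b);
    const std::uint64_t len = aad.size();
    for (int i = 0; i < 8; ++i) mix(static_cast<unsigned char>(len >> (8 * i)));
    for (unsigned char b : aad) mix(b);
    for (unsigned char b : ct) mix(b);
    return h;
}

// Encrypt the plain text only; authenticate both the AAD and the cipher text.
Sealed seal(const std::string& key, const std::string& aad, const std::string& pt) {
    Sealed s{aad, toy_xor(key, pt), 0};
    s.tag = toy_tag(key, s.aad, s.ciphertext);
    return s;
}

// Verify first; release plain text only if the tag matches.
bool unseal(const std::string& key, const Sealed& s, std::string& plaintext) {
    if (toy_tag(key, s.aad, s.ciphertext) != s.tag)
        return false;                       // AAD or cipher text was modified
    plaintext = toy_xor(key, s.ciphertext);
    return true;
}
```

A caller would pass something like an address and port as `aad` and the payload as `pt`. Flipping a bit in either `aad` or `ciphertext` makes `unseal` return false without writing any plain text.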
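    string plaintext, ciphertext;
    ...

    // Key the cipher. GCM needs an IV that is never reused with the same key.
    GCM< AES >::Encryption enc;
    enc.SetKeyWithIV( key, sizeof(key), iv, sizeof(iv) );

    // The filter encrypts and appends the authenticator to ciphertext.
    AuthenticatedEncryptionFilter aef( enc,
        new StringSink( ciphertext )
    ); // AuthenticatedEncryptionFilter

    // Put() takes a byte pointer, so the string data is cast.
    aef.Put( reinterpret_cast<const byte*>( plaintext.data() ), plaintext.size() );
    aef.MessageEnd();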
After executing the sample code above, ciphertext is the concatenation of the encrypted data and the authenticator. The strength of the protection rests on AES and GCM, provided the key stays secret and an IV is never reused with the same key. To decrypt and verify the data, the following would be performed.
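    string ciphertext, plaintext;
    ...

    // Same key and IV as were used for encryption.
    GCM< AES >::Decryption dec;
    dec.SetKeyWithIV( key, sizeof(key), iv, sizeof(iv) );

    // With its default flags, the filter throws
    // HashVerificationFilter::HashVerificationFailed if verification fails.
    AuthenticatedDecryptionFilter adf( dec,
        new StringSink( plaintext )
    ); // AuthenticatedDecryptionFilter

    // Put() takes a byte pointer, so the string data is cast.
    adf.Put( reinterpret_cast<const byte*>( ciphertext.data() ), ciphertext.size() );
    adf.MessageEnd();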
The two samples demonstrate the essentials of ensuring both data confidentiality and data integrity. Unlike encryption alone, the implementation detects any modification of the cipher text. Full details of using Crypto++ objects such as EAX, CCM, GCM, AuthenticatedEncryptionFilter, AuthenticatedDecryptionFilter, and StringSink can be found throughout the wiki.