End-to-End security for sensitive data using CMS (aka PKCS7)

When processing highly sensitive data, Cryptographic Message Syntax (CMS, formerly PKCS#7) can be applied as an additional layer on top of transport security. CMS lets you control how data is protected at every stage of its life and along its processing chain through an end-to-end security approach. CMS supports various security services, and in this article, we’re going to focus on the encryption capabilities. I’ll show you how CMS encrypts data and manages keys using symmetric and asymmetric cryptography.

Introduction #

Transport Layer Security (TLS) is the de facto standard to secure data transfers between clients and servers. In many system architectures, the server terminating the TLS session is not the final receiving application. Oftentimes that server is some kind of reverse proxy or load balancer (e.g. Azure Front Door or Nginx) which forwards data to the actual intended application server. Other times, data is published on a message broker (e.g. NATS or Azure Event Hub) which itself might store data for a period of time to later forward it to the data consumers.

In these scenarios, your data is not end-to-end encrypted. Instead, system components that might not be under your control suddenly play a crucial part in data exposure and significantly enlarge your attack surface. When you are processing highly sensitive data (e.g. health or financial information) that can easily cause massive damage if exposed to unauthorized parties, protecting it across its entire lifecycle is crucial. You can either

take ownership of the entire architecture and apply security measures at every step of the data lifecycle or
encrypt data end-to-end which can reduce the attack surface by having a defined set of components that can actually decrypt the data for processing.

In light of an ever-increasing number of cloud service providers (CSPs), the first option might not be feasible if the services offered do not provide the required security capabilities. Even if your CSP is trustworthy enough, you still need to take responsibility for your data exposure. Following the end-to-end encryption approach can help you to

take control of your data exposure,
reduce the number of needless plaintext data exposures and
lift the burden of configuring, implementing and testing security measures at every step of the data lifecycle.

Instead of rolling your own encapsulation mechanism to encrypt your data, I will walk you through the various encryption capabilities of Cryptographic Message Syntax (CMS). But before we get into the details, let’s first cover the basics of CMS and its usage in the industry.

Primer on Cryptographic Message Syntax #

Cryptographic Message Syntax (CMS) has its roots in the PKCS#7 standard (RFC 2315), and the name PKCS#7 sticks around although most implementations follow CMS as specified in RFC 5652. CMS follows an encapsulation approach that can be used together with any data and adds security services like:

Signing the data by one or more signers
Encrypting data to one or more recipients using several key management mechanisms

CMS does not replace secure transport protocols like TLS but can be used to add another layer of security by securing the data that is transferred via TLS. If no secure transport is available, CMS can ensure security in store-and-forward use cases (e.g. firmware or log file signing) or in non-interactive protocols.

CMS data structures are defined using ASN.1 and CMS messages are exchanged using the ASN.1 encoding rules. RFC 7468, section 9 mandates BER encoding but some parts of a CMS message also explicitly require DER encoding. If you want to learn more about ASN.1 and its encoding rules, check out my article Convert X509v3 certificate from ASN1 DER to PEM.

You might have already used CMS without being aware of it since CMS is used widely in different applications and across many industries. Let me give you some examples:

S/MIME (Secure/Multipurpose Internet Mail Extensions) uses CMS for secure email communication (see RFC 8551).
CMS is the basis for the Time Stamping Protocol (TSP) as defined in RFC 3161.
CMS is the basis for CAdES (CMS Advanced Electronic Signatures, see also RFC 5126) which in turn is one of the allowed mechanisms for providing electronic identification and trust services for electronic transactions under the eIDAS regulation (see also European eSignature).
CAdES is also used for signing PDFs (see PDF and Digital Signatures - PDF Association).
Apple uses CMS to secure its Code Signing Provisioning Profiles.

With those examples given, it should not come as a surprise that CMS also has widespread support in software libraries. Bouncy Castle provides CMS support for Java and C#. OpenSSL/LibreSSL provides a CLI and API for CMS. Rust support is available through the cms crate. Windows gets support via the Win32 CryptoAPI. Support on Apple platforms is available through Cryptographic Message Syntax Services.

To some extent, even secure elements like HSMs (Hardware Security Modules) support CMS. Accessing those services via a PKCS#11 API is possible if supported (see PKCS #11, section 2.44 CMS).

CMS Encryption Capabilities #

Although CMS can be used for several security services, I will focus on the encryption capabilities in this article. Before we dive deeper, we need to understand how CMS distinguishes between the different security services. The top-most structure in a CMS message is the ContentInfo structure (see RFC 5652 section 3):

ContentInfo ::= SEQUENCE {
	contentType ContentType,
	content [0] EXPLICIT ANY DEFINED BY contentType OPTIONAL
}

The contentType field is an OID that defines the type of the content field. This is the decisive field to determine which security service has been applied to the content. For signing, the contentType value is signedData but since we focus on encryption, we will look at the content types encryptedData, envelopedData and authEnvelopedData. The types differ in the way the data is encrypted and how content encryption keys are managed.

We will explore each content type in more detail in the following.
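
To make the role of contentType more tangible, here is a minimal Python sketch that reads the outer ContentInfo of a DER-encoded CMS message and maps its OID to a content type name. The helper names (read_tlv, decode_oid, cms_content_type) are made up for this illustration, only single-byte tags and definite lengths are handled (so BER messages with indefinite lengths are rejected), and a real application should rely on a proper ASN.1 or CMS library instead.

```python
# Content type OIDs as assigned in RFC 5652 and RFC 5083.
CONTENT_TYPES = {
    "1.2.840.113549.1.7.1": "data",
    "1.2.840.113549.1.7.2": "signedData",
    "1.2.840.113549.1.7.3": "envelopedData",
    "1.2.840.113549.1.7.6": "encryptedData",
    "1.2.840.113549.1.9.16.1.23": "authEnvelopedData",
}


def read_tlv(buf: bytes, pos: int):
    """Return (tag, value_start, value_end) of the TLV starting at pos."""
    tag = buf[pos]
    length = buf[pos + 1]
    pos += 2
    if length & 0x80:  # long form: low 7 bits give the number of length bytes
        count = length & 0x7F
        if count == 0:
            raise ValueError("indefinite lengths are not supported in this sketch")
        length = int.from_bytes(buf[pos:pos + count], "big")
        pos += count
    return tag, pos, pos + length


def decode_oid(body: bytes) -> str:
    """Decode the value bytes of an OBJECT IDENTIFIER into dotted notation."""
    arcs, value = [], 0
    for byte in body:
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:  # high bit clear marks the last byte of an arc
            arcs.append(value)
            value = 0
    first = arcs[0]  # the first sub-identifier packs the first two arcs
    head = [first // 40, first % 40] if first < 80 else [2, first - 80]
    return ".".join(str(arc) for arc in head + arcs[1:])


def cms_content_type(der: bytes) -> str:
    """Return the content type name (or raw OID) of a ContentInfo."""
    tag, start, _ = read_tlv(der, 0)
    if tag != 0x30:
        raise ValueError("ContentInfo must be a SEQUENCE")
    tag, oid_start, oid_end = read_tlv(der, start)
    if tag != 0x06:
        raise ValueError("contentType must be an OBJECT IDENTIFIER")
    oid = decode_oid(der[oid_start:oid_end])
    return CONTENT_TYPES.get(oid, oid)


# Hand-built ContentInfo: envelopedData OID plus an empty placeholder content.
sample = bytes.fromhex("300f06092a864886f70d010703a0023000")
print(cms_content_type(sample))
```

An application would branch on the returned name before attempting to parse the content field as EncryptedData, EnvelopedData or AuthEnvelopedData.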

Content Type `encryptedData` #

RFC 5652 section 8 defines encryptedData. Management of any encryption keys is not part of the content type and it is up to the application to deal with distribution of encryption keys. With that, the content is not a complex data structure but a simple EncryptedData structure as follows:

EncryptedData ::= SEQUENCE {
    version CMSVersion,
    encryptedContentInfo EncryptedContentInfo,
    unprotectedAttrs [1] IMPLICIT UnprotectedAttributes OPTIONAL 
}

The most interesting field is EncryptedContentInfo which carries the actual ciphertext but more importantly, the information about the encryption algorithm used. This information is crucial for decrypting the data. The structure is as follows:

EncryptedContentInfo ::= SEQUENCE {
    contentType ContentType,
    contentEncryptionAlgorithm ContentEncryptionAlgorithmIdentifier,
    encryptedContent [0] IMPLICIT EncryptedContent OPTIONAL 
}

Another contentType is present in this structure and allows an application to determine how to process the data once it has decrypted it. The simplest case is data, which indicates that the decrypted content is just some arbitrary data (e.g. your encrypted PNG image). But it might also be another complex CMS data structure for another security service (e.g. signedData).

RFC 5652 itself does not define any algorithm identifiers. This is done by RFC 3565, which defines AES-CBC as a valid algorithm. Your implementation may also support other algorithms, though. The OID arc 2.16.840.1.101.3.4.1 lists several children for other AES modes (e.g. AES-ECB, although you should absolutely avoid this one).
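
As a rough illustration of how the children of that AES arc are organized, the following Python sketch maps an OID to a readable algorithm name. The function name describe_aes_oid is hypothetical and the table only covers a subset of the registered modes, so treat it as an aid for reading ASN.1 dumps rather than a complete registry.

```python
AES_ARC = "2.16.840.1.101.3.4.1"

# Last arc = key-size offset + mode number (subset of the registered modes).
MODES = {1: "ECB", 2: "CBC", 5: "WRAP", 6: "GCM", 7: "CCM"}
KEY_SIZES = {0: 128, 20: 192, 40: 256}


def describe_aes_oid(oid: str):
    """Return e.g. 'AES-256-CBC' for a known child of the AES arc, else None."""
    prefix, _, last = oid.rpartition(".")
    if prefix != AES_ARC or not last.isdigit():
        return None
    child = int(last)
    offset, mode = (child // 20) * 20, child % 20
    if offset not in KEY_SIZES or mode not in MODES:
        return None
    return f"AES-{KEY_SIZES[offset]}-{MODES[mode]}"


print(describe_aes_oid("2.16.840.1.101.3.4.1.42"))
print(describe_aes_oid("2.16.840.1.101.3.4.1.6"))
```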

Content Type `envelopedData` #

While encryptedData has no notion of key management, RFC 5652 section 6 introduces envelopedData which features several mechanisms for dealing with encryption keys. Before digging deeper, we need to define some terms that are required to understand the key management mechanisms:

Originator: The party creating a CMS message for one or more recipients.
Recipient: The party receiving the originator’s CMS message.
Content Encryption key (CEK): A CEK is used for encrypting a payload. Exactly one CEK exists per CMS message, regardless of the number of recipients.
Key Encryption Key (KEK): A KEK is used to encrypt the CEK for one or more recipients of a CMS message.

The following figure summarizes the basic idea of envelopedData: The originator generates exactly one CEK that is used to encrypt the payload once. To then securely distribute the CEK to one or more recipients, the CEK is encrypted once per recipient using a recipient-specific KEK. The KEK is derived from something (more on this soon) that is shared between the originator and the recipient. With that, every recipient receives an enveloped message with

a message that has been encrypted once using a CEK,
one or more individually encrypted CEKs, one per recipient.

With that information, every recipient can again derive the KEK, decrypt the CEK and then decrypt the ciphertext.

CMS Enveloped Data uses a CEK and a KEK to provide confidentiality

The important question is: What is the something that is shared between the originator and the recipient? This depends on the actual key management mechanism in use. RFC 5652 section 6.2 defines four key management mechanisms in total that can be categorized in mechanisms based on

shared symmetric secrets and
asymmetric key pairs.

The foundation for these key management mechanisms is the following data structure:

EnvelopedData ::= SEQUENCE {
    version CMSVersion,
    originatorInfo [0] IMPLICIT OriginatorInfo OPTIONAL,
    recipientInfos RecipientInfos,
    encryptedContentInfo EncryptedContentInfo,
    unprotectedAttrs [1] IMPLICIT UnprotectedAttributes OPTIONAL
}

This data structure is largely identical to EncryptedData but adds RecipientInfos and OriginatorInfo. Depending on the key management mechanism, those fields are populated with the something that is shared between originator and recipient, which is then used to derive the KEK that encrypts the CEK.

Key Management Mechanisms based on shared secrets #

Two key management mechanisms are purely based on shared secrets and are straightforward to understand:

Symmetric Key-Encryption Key: Via an out-of-band mechanism, the originator and the recipient have knowledge of a shared symmetric key. This key is used as the KEK that encrypts the CEK (e.g. during manufacturing, the symmetric key is securely stored on a device and a cloud solution also has access to the exact same key).
Password-Based Encryption: The KEK is derived from a password which is then used to encrypt the CEK. This is typically used in scenarios where the recipient is a human and the password is entered by the recipient.

Neither mechanism populates the OriginatorInfo since the originator has nothing to share with the recipient. The RecipientInfos, however, is used to transport either an identifier for the symmetric KEK or information about the configuration of PBKDF2.
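
The password-based case can be sketched with Python's standard library. This is only an illustration of the KEK derivation step: the function name derive_kek, the choice of HMAC-SHA256, the 16-byte salt and the iteration count are assumptions made for this example, and wrapping the CEK with the resulting KEK is not shown.

```python
import hashlib
import secrets


def derive_kek(password: str, salt: bytes, iterations: int, length: int = 32) -> bytes:
    """Derive a KEK from a password using PBKDF2 with HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=length
    )


# Originator: pick fresh PBKDF2 parameters. Salt and iteration count are not
# secret; they are the kind of information that travels in RecipientInfos.
salt = secrets.token_bytes(16)
iterations = 100_000  # assumed value, tune it to your own requirements
kek = derive_kek("correct horse battery staple", salt, iterations)

# Recipient: re-derive the same KEK from the entered password and the
# parameters found in the message.
recipient_kek = derive_kek("correct horse battery staple", salt, iterations)
print(kek == recipient_kek)
```

Only the password has to be shared out-of-band; everything else needed to recompute the KEK is part of the message.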

To try out both mechanisms using OpenSSL, see the examples in the follow-up article here.

Key Management Mechanisms based on public key cryptography #

The other two mechanisms are based on public key cryptography and require asymmetric key pairs. They assume that an originator has access to the public keys of the recipients (e.g. using certificates that are distributed via an out-of-band mechanism). With the public keys given, key transport and key agreement schemes are used for key management. Both are described in the following.

Key Transport

NIST Glossary defines Key Transport as a mechanism for key-establishment between two parties. The originator generates a symmetric key and

[…] the key is encrypted using the public key of the receiver and subsequently decrypted using receiver’s private key.

In the context of CMS, key transport mechanisms are only described for the RSA cryptosystem. One of the first key transport mechanisms defined is RSAES-PKCS1-v1_5 (see RFC 3370). Using this mechanism is not recommended, and NIST disallows it after 2023 (see NIST SP 800-131A Rev. 2) due to a known vulnerability (Bleichenbacher’s padding oracle attack). Instead, use either

RSAES-OAEP as defined for CMS in RFC 3560 or
RSA-KEM as defined for CMS in RFC 5990.

The field OriginatorInfo is again unused since there is no information to share from the originator to the recipient. Only the RecipientInfos is populated per recipient with the algorithm information (e.g. RSA-OAEP) and the CEK encrypted under the recipient’s public key.

For an example using OpenSSL, see the follow-up article here.

Key Agreement

Like key transport, key agreement mechanisms are also key-establishment procedures but they require information contributed by the originator as well as the recipient (see also NIST Glossary). Widely used mechanisms are based on the ideas of Diffie-Hellman (DH) where a shared secret can be derived by two parties using their asymmetric key pairs. For this the public keys need to be exchanged but never the private keys which must remain secret.

In CMS, key agreement is mostly used with ECC (Elliptic Curve Cryptography), although RFC 3370 also specifies a classic ephemeral-static Diffie-Hellman variant. Many of the standardized mechanisms make use of Elliptic Curve Diffie-Hellman (ECDH). The basis is laid in RFC 5753, and support for the more specialized curves X25519 and X448 has been added through RFC 8418. A notable difference between both RFCs is that RFC 8418 does not carry over ECMQV, which is an authenticated EC key agreement, or Co-Factor ECDH for the X25519 and X448 curves. The common denominator is therefore Standard ECDH as described in SEC1v2. We will focus on this mechanism in the following.

Both RFCs describe that Standard ECDH is to be used in an Ephemeral-Static fashion, meaning that the originator generates an ephemeral key pair for every message but uses the recipient’s static public key for ECDH (also known as ECDH-ES). In NIST SP800-56A r3 terms this scheme is called C(1e, 1s, ECC DH) where 1e stands for one ephemeral key pair and 1s for one static key pair. This scheme ensures that only a single message can be decrypted if an ephemeral private key is compromised (i.e. key freshness but no forward secrecy, since a compromised static private key of the recipient still exposes all messages sent to that recipient). The downside is that the recipient has no way to authenticate the originator from the CMS message itself. Possible approaches to deal with this are an additional signature or transporting the CMS message over an authenticated channel (see NIST SP800-56A r3 sections 7.4 and 7.6 for an in-depth discussion of properties and assurances).
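
To illustrate the ephemeral-static idea, here is a small Python sketch based on X25519. It transcribes the Montgomery ladder from RFC 7748 into pure Python so that it runs without third-party packages; it is not constant-time and must not be used for real keys, where a vetted library is the right choice. The function names originator_agree and recipient_agree are hypothetical and only model the two roles.

```python
import secrets

P = 2**255 - 19
A24 = 121665
BASE_POINT = (9).to_bytes(32, "little")


def x25519(scalar: bytes, u: bytes) -> bytes:
    """X25519 function as described in RFC 7748 (illustration only)."""
    k = bytearray(scalar)
    k[0] &= 248   # clamp the scalar
    k[31] &= 127
    k[31] |= 64
    k = int.from_bytes(k, "little")
    x1 = int.from_bytes(u, "little") & ((1 << 255) - 1)
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in range(254, -1, -1):
        bit = (k >> t) & 1
        swap ^= bit
        if swap:  # a real implementation uses a constant-time swap here
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        da, cb = d * a % P, c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


def originator_agree(recipient_public: bytes):
    """Create a fresh ephemeral key pair and derive the shared secret."""
    ephemeral_private = secrets.token_bytes(32)
    ephemeral_public = x25519(ephemeral_private, BASE_POINT)
    return ephemeral_public, x25519(ephemeral_private, recipient_public)


def recipient_agree(static_private: bytes, ephemeral_public: bytes) -> bytes:
    """Derive the same shared secret from the static private key."""
    return x25519(static_private, ephemeral_public)


# Static key pair of the recipient, e.g. certified and distributed out-of-band.
static_private = secrets.token_bytes(32)
static_public = x25519(static_private, BASE_POINT)

ephemeral_public, secret_at_originator = originator_agree(static_public)
print(recipient_agree(static_private, ephemeral_public) == secret_at_originator)
```

Only the ephemeral public key has to travel with the message. The originator can discard the ephemeral private key right after deriving the shared secret.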

When using Standard ECDH with the goal to encrypt another key (like we do with our CEK), “Wrapped Key Transport Scheme” shall be implemented (see SEC1v2 section 5.2). To this end, the scheme setup then comprises several primitives:

ECDH to derive a shared secret K
A key derivation function (KDF) that derives our KEK from K and some DER encoded data
A key wrap function to encrypt our CEK using the generated KEK

SEC1v2 section 3.6 allows several KDFs but only ANSI X9.63 for (new) general-purpose applications. Concerning key wrap, SEC1v2 section 3.9 allows for NIST AES key wrap algorithms when using AES block ciphers. When looking at RFC 5753 and RFC 8418, this means the following algorithms are allowed for Standard ECDH:

KDF: ANSI X9.63 KDF with SHA1, SHA224, SHA256, SHA384 or SHA512
Key Wrap: AES-128, AES-192, AES-256 key wrap
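
The KDF step is simple enough to sketch with Python's hashlib. The function below follows the counter-based construction of the ANSI X9.63 KDF as described in SEC1v2 (hash of shared secret, 32-bit big-endian counter starting at 1, and shared info). The name x963_kdf and its parameters are chosen for this illustration, and the shared info is passed in as opaque bytes, whereas in CMS it would be the DER-encoded structure defined in RFC 5753.

```python
import hashlib


def x963_kdf(shared_secret: bytes, shared_info: bytes, length: int,
             hash_name: str = "sha256") -> bytes:
    """Derive `length` bytes of keying material (ANSI X9.63 style KDF)."""
    output = b""
    counter = 1
    while len(output) < length:
        digest = hashlib.new(hash_name)
        # Each block hashes: shared secret || 4-byte counter || shared info
        digest.update(shared_secret + counter.to_bytes(4, "big") + shared_info)
        output += digest.digest()
        counter += 1
    return output[:length]


# Example: derive a 256-bit KEK from a placeholder ECDH shared secret K.
k = bytes(32)                       # stands in for the real ECDH output
info = b"placeholder shared info"   # DER-encoded data in a real CMS message
kek = x963_kdf(k, info, 32)
print(len(kek))
```

The resulting KEK would then be fed into the AES key wrap function to encrypt the CEK.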

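This time, the originator has something to share: the ephemeral public key that has been used for the ECDH key agreement. Note that it is not carried in the OriginatorInfo field (which only holds certificates and CRLs) but in the originator field of the recipient’s KeyAgreeRecipientInfo entry inside RecipientInfos (see RFC 5753 section 3.1.1). The same entry also carries information helping the recipient to locate the static public key that has been used by the originator (e.g. in case of certificates, the issuer and serial number are used).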

With that, we have a good understanding of what it means to use the key agreement mechanism in CMS. The algorithm composition gets quite complex, which is why algorithm and key management profiling is crucial to reduce complexity and to ensure interoperability.

For an example using OpenSSL, see the follow-up article here.

Content Type `authEnvelopedData` #

Authenticated Encryption with Associated Data (AEAD) is a cryptographic primitive that provides both confidentiality and integrity. Native CMS support with a dedicated data structure for AEADs has been added in RFC 5083 with the content type authEnvelopedData. All key management mechanisms described in envelopedData can be used without changes.

Below is the data structure that again builds on the EnvelopedData. Additions are of course a mac field for the Message Authentication Code (MAC) which is crucial for an AEAD scheme. In addition, authAttrs provides the ability to add data that is authenticated but not encrypted. This is useful for metadata that should be protected from tampering but does not need to be kept confidential.

AuthEnvelopedData ::= SEQUENCE {
    version CMSVersion,
    originatorInfo [0] IMPLICIT OriginatorInfo OPTIONAL,
    recipientInfos RecipientInfos,
    authEncryptedContentInfo EncryptedContentInfo,
    authAttrs [1] IMPLICIT AuthAttributes OPTIONAL,
    mac MessageAuthenticationCode,
    unauthAttrs [2] IMPLICIT UnauthAttributes OPTIONAL 
}

What is missing are AEAD schemes. For this, RFC 5084 defines AES-CCM and AES-GCM as AEAD schemes for CMS. A valuable addition is also AES-GCM-SIV which is a nonce-misuse resistant variant of AES-GCM that is described in RFC 8452. Besides the AES world, there is also ChaCha20-Poly1305 that is defined for CMS in RFC 8103.

Summary #

CMS’ encryption capabilities are vast and powerful catering to a wide range of use cases. Let’s briefly summarize the key points and use cases:

encryptedData is a simple format with meta information about the encryption algorithm. Your application needs to deal with key management and, depending on the AES mode used, you need a convention for where IVs and MACs are stored. Either concatenate them with the ciphertext (encryptedContent) or store them in the unprotectedAttrs.
If you need advanced key management, envelopedData and authEnvelopedData are the way to go. For both, the same key management mechanisms are available but authEnvelopedData adds a standardized field for storing MACs.
The key management mechanisms support you in handling previously shared symmetric keys and deriving symmetric keys from user passwords.
If a recipient is addressed using an RSA key pair, key transport mechanisms like RSA-OAEP are used.
Key agreement mechanisms, on the other hand, are used for recipients with ECC key pairs. One approach is to use Standard ECDH in an Ephemeral-Static fashion which is also known as C(1e, 1s, ECC DH).

That’s all for today. I hope this article gives you a good idea of how CMS can help you add end-to-end security to your sensitive data. If you have any questions or feedback, feel free to reach out to me on my socials below.
