End-to-End security for sensitive data using CMS (aka PKCS7)
Introduction
Transport Layer Security (TLS) is the de facto standard for securing data transfers between clients and servers. In many system architectures, however, the server terminating the TLS session is not the final receiving application. Often, that server is some kind of reverse proxy or load balancer (e.g. Azure Front Door or an Nginx instance) that forwards data to the actual application server. In other cases, data is published to a message broker (e.g. NATS or Azure Event Hubs), which might store it for a period of time before forwarding it to the data consumers.
In these scenarios, your data is not encrypted end-to-end. Instead, various system components, some of which might not be under your control, suddenly play a crucial part in data exposure and significantly enlarge your attack surface. When you process highly sensitive data (e.g. health or financial information) that can cause massive damage if exposed to unauthorized parties, protecting it across its entire lifecycle is crucial. You can either
- take ownership of the entire architecture and apply security measures at every step of the data lifecycle, or
- encrypt data end-to-end, which reduces the attack surface to a defined set of components that can actually decrypt the data for processing.
Given the ever-increasing number of cloud service providers (CSPs), the first option might not be feasible if the offered services lack the required security capabilities. Even if your CSP is trustworthy enough, you still need to take responsibility for your data exposure. Following the end-to-end encryption approach can help you to
- take control of your data exposure,
- reduce the number of needless plaintext data exposures and
- lift the burden of configuring, implementing and testing security measures at every step of the data lifecycle.
Instead of rolling your own encapsulation mechanism to encrypt your data, I will walk you through the various encryption capabilities of Cryptographic Message Syntax (CMS). But before we get into the details, let’s first cover the basics of CMS and its usage in the industry.
Primer on Cryptographic Message Syntax
Cryptographic Message Syntax (CMS) has its roots in the PKCS#7 standard (RFC 2315), and the name PKCS#7 has stuck around, although most implementations implement CMS as specified in RFC 5652. CMS follows an encapsulation approach that can be used with any data and adds security services like:
- Signing the data by one or more signers
- Encrypting data to one or more recipients using several key management mechanisms
CMS does not replace secure transport protocols like TLS but can add another layer of security by protecting the data that is transferred via TLS. Where no secure transport is available, CMS can ensure security in store-and-forward use cases (e.g. firmware or log file signing) or in non-interactive protocols.
CMS data structures are defined using ASN.1 and CMS messages are exchanged using the ASN.1 encoding rules. RFC 7468, section 9 mandates BER encoding but some parts of a CMS message also explicitly require DER encoding. If you want to learn more about ASN.1 and its encoding rules, check out my article Convert X509v3 certificate from ASN1 DER to PEM.
You might have already used CMS without being aware of it since CMS is used widely in different applications and across many industries. Let me give you some examples:
- S/MIME (Secure/Multipurpose Internet Mail Extensions) uses CMS for secure email communication (see RFC 8551).
- CMS is the basis for the Time Stamping Protocol (TSP) as defined in RFC 3161.
- CMS is the basis for CAdES (CMS Advanced Electronic Signatures, see also RFC 5126), which in turn is one of the allowed mechanisms for providing electronic identification and trust services for electronic transactions under the eIDAS regulation (see also European eSignature).
- CAdES is also used for signing PDFs (see PDF and Digital Signatures - PDF Association).
- Apple uses CMS to secure its Code Signing Provisioning Profiles.
With these examples in mind, it should come as no surprise that CMS also has widespread support in software libraries. Bouncy Castle provides CMS support for Java and C#. OpenSSL/LibreSSL provide a CLI and an API for CMS. Rust support is available through the cms crate. Windows offers support via the Win32 CryptoAPI. Support on Apple platforms is available through Cryptographic Message Syntax Services.
To some extent, even secure elements like HSMs (Hardware Security Modules) support CMS. Where supported, those services can be accessed via a PKCS#11 API (see PKCS #11, section 2.44 CMS).
CMS Encryption Capabilities
Although CMS can be used for several security services, I will focus on the encryption capabilities in this article.
Before we dive deeper, we need to understand how CMS distinguishes between the different security services. The top-most structure in a CMS message is the ContentInfo structure (see RFC 5652 section 3):
ContentInfo ::= SEQUENCE {
  contentType ContentType,
  content [0] EXPLICIT ANY DEFINED BY contentType OPTIONAL
}
The contentType field is an OID that defines the type of the content field. This is the decisive field to determine which security service has been applied to the content. For signing, the contentType value is signedData, but since we focus on encryption, we will look at the content types encryptedData, envelopedData and authEnvelopedData. The types differ in the way the data is encrypted and how content encryption keys are managed.
We will explore each content type in more detail in the following.
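To illustrate how an application can dispatch on the contentType, here is a minimal Python sketch using only the standard library. It reads the outer ContentInfo of a DER-encoded message and maps the contentType OID to a name. The OIDs are taken from RFC 5652 and RFC 5083, while the function names are hypothetical helpers of my own. The DER reader is deliberately naive (single-byte tags, definite lengths only), so please do not use it on untrusted input.

```python
# OIDs of the CMS content types (RFC 5652, RFC 5083)
CONTENT_TYPES = {
    "1.2.840.113549.1.7.1": "data",
    "1.2.840.113549.1.7.2": "signedData",
    "1.2.840.113549.1.7.3": "envelopedData",
    "1.2.840.113549.1.7.6": "encryptedData",
    "1.2.840.113549.1.9.16.1.23": "authEnvelopedData",
}


def read_tlv(buf: bytes, pos: int = 0) -> tuple[int, bytes, int]:
    """Return (tag, value, next_pos) for the DER TLV starting at pos."""
    tag, length = buf[pos], buf[pos + 1]
    pos += 2
    if length & 0x80:  # long form: low 7 bits give the number of length octets
        n = length & 0x7F
        length = int.from_bytes(buf[pos:pos + n], "big")
        pos += n
    return tag, buf[pos:pos + length], pos + length


def decode_oid(body: bytes) -> str:
    """Decode the content octets of a DER OBJECT IDENTIFIER into dotted form."""
    arcs, value = [], 0
    for byte in body:
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:  # high bit clear: last octet of this arc
            arcs.append(value)
            value = 0
    first = min(arcs[0] // 40, 2)  # the first octet packs the first two arcs
    return ".".join(str(a) for a in [first, arcs[0] - 40 * first] + arcs[1:])


def content_type_of(message: bytes) -> str:
    """Return the name (or dotted OID) of a ContentInfo's contentType."""
    tag, content_info, _ = read_tlv(message)
    if tag != 0x30:
        raise ValueError("ContentInfo must be a SEQUENCE")
    tag, oid_body, _ = read_tlv(content_info)
    if tag != 0x06:
        raise ValueError("contentType must be an OBJECT IDENTIFIER")
    oid = decode_oid(oid_body)
    return CONTENT_TYPES.get(oid, oid)


print(content_type_of(bytes.fromhex("300b06092a864886f70d010703")))  # envelopedData
```

The input in the last line is a ContentInfo that carries only the envelopedData OID (the content field is optional in ASN.1), which is enough to exercise the dispatch.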
Content Type encryptedData
RFC 5652 section 8 defines encryptedData. Management of encryption keys is not part of this content type, and it is up to the application to handle the distribution of encryption keys. Accordingly, the content is not a complex data structure but a simple EncryptedData structure as follows:
EncryptedData ::= SEQUENCE {
  version CMSVersion,
  encryptedContentInfo EncryptedContentInfo,
  unprotectedAttrs [1] IMPLICIT UnprotectedAttributes OPTIONAL
}
The most interesting field is encryptedContentInfo (of type EncryptedContentInfo), which carries the actual ciphertext but, more importantly, also the information about the encryption algorithm used. This information is crucial for decrypting the data. The structure is as follows:
EncryptedContentInfo ::= SEQUENCE {
  contentType ContentType,
  contentEncryptionAlgorithm ContentEncryptionAlgorithmIdentifier,
  encryptedContent [0] IMPLICIT EncryptedContent OPTIONAL
}
Another contentType field is present in this structure and allows an application to determine how to process the data once it has decrypted it. The simplest case is data, which indicates that the decrypted data is just some arbitrary data (e.g. your encrypted PNG image). But it might also be another CMS data structure for another security service (e.g. signedData).
RFC 5652 does not define any algorithm identifiers itself. This is done by companion RFCs such as RFC 3565, which defines AES-CBC as a valid algorithm. Your implementation may also support other algorithms, though. The OID 2.16.840.1.101.3.4.1 has several children for other AES modes (e.g. AES-ECB, although you should absolutely avoid this one).
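To see where the algorithm information ends up, the following Python sketch DER-encodes an EncryptedContentInfo for AES-256-CBC (id-aes256-CBC, OID 2.16.840.1.101.3.4.1.42 per RFC 3565), whose algorithm parameters are the 16-byte IV, and wraps it into an EncryptedData (version 0, no unprotectedAttrs) inside a ContentInfo. Python's standard library has no AES, so the IV and ciphertext are placeholder arguments here; in a real application the ciphertext would come from a vetted crypto library. All function names are hypothetical.

```python
ID_DATA = "1.2.840.113549.1.7.1"
ID_ENCRYPTED_DATA = "1.2.840.113549.1.7.6"
ID_AES256_CBC = "2.16.840.1.101.3.4.1.42"


def der(tag: int, value: bytes) -> bytes:
    """Encode a DER TLV (single-byte tag, short or long form length)."""
    n = len(value)
    if n < 0x80:
        length = bytes([n])
    else:
        raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
        length = bytes([0x80 | len(raw)]) + raw
    return bytes([tag]) + length + value


def der_oid(dotted: str) -> bytes:
    """Encode a dotted OID string as a DER OBJECT IDENTIFIER."""
    arcs = [int(a) for a in dotted.split(".")]
    body = bytearray()
    for arc in [40 * arcs[0] + arcs[1]] + arcs[2:]:
        chunk = [arc & 0x7F]  # base-128 digits, least significant first
        arc >>= 7
        while arc:
            chunk.append(0x80 | (arc & 0x7F))  # continuation bit on all but the last octet
            arc >>= 7
        body += bytes(reversed(chunk))
    return der(0x06, bytes(body))


def encrypted_content_info(iv: bytes, ciphertext: bytes) -> bytes:
    # AlgorithmIdentifier { id-aes256-CBC, parameters: IV as OCTET STRING }
    algorithm = der(0x30, der_oid(ID_AES256_CBC) + der(0x04, iv))
    # encryptedContent [0] IMPLICIT OCTET STRING -> context-specific primitive tag 0x80
    return der(0x30, der_oid(ID_DATA) + algorithm + der(0x80, ciphertext))


def encrypted_data_content_info(iv: bytes, ciphertext: bytes) -> bytes:
    version = der(0x02, b"\x00")  # CMSVersion 0: no unprotectedAttrs present
    encrypted_data = der(0x30, version + encrypted_content_info(iv, ciphertext))
    # ContentInfo { id-encryptedData, content [0] EXPLICIT EncryptedData }
    return der(0x30, der_oid(ID_ENCRYPTED_DATA) + der(0xA0, encrypted_data))


message = encrypted_data_content_info(iv=bytes(16), ciphertext=bytes(32))  # placeholders
```

Note how the IV travels inside the algorithm identifier, so for AES-CBC no extra convention is needed to transport it.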
Content Type envelopedData
While encryptedData has no notion of key management, RFC 5652 section 6 introduces envelopedData, which features several mechanisms for dealing with encryption keys. Before digging deeper, we need to define some terms that are required to understand the key management mechanisms:
- Originator: The party creating a CMS message for one or more recipients.
- Recipient: The party receiving the originator’s CMS message.
- Content Encryption Key (CEK): A CEK is used for encrypting a payload. Exactly one CEK exists per CMS message, regardless of the number of recipients.
- Key Encryption Key (KEK): A KEK is used to encrypt the CEK for one or more recipients of a CMS message.
The following figure summarizes the basic idea of envelopedData: The originator generates exactly one CEK that is used to encrypt the payload once. To securely distribute the CEK to one or more recipients, the CEK is then encrypted per recipient using a recipient-specific KEK. The KEK is derived from something (more on this soon) that is shared between the originator and the recipient. As a result, every recipient receives an enveloped message with
- a payload that has been encrypted once using the CEK and
- one or more recipient-individually encrypted CEKs.
With that information, every recipient can derive its KEK, decrypt its copy of the CEK and then decrypt the ciphertext.
The important question is: What is the something that is shared between the originator and the recipient? This depends on the actual key management mechanism in use. RFC 5652 section 6.2 defines four key management mechanisms in total, which can be categorized into mechanisms based on shared secrets and mechanisms based on public key cryptography.
The foundation for these key management mechanisms is the following data structure:
EnvelopedData ::= SEQUENCE {
  version CMSVersion,
  originatorInfo [0] IMPLICIT OriginatorInfo OPTIONAL,
  recipientInfos RecipientInfos,
  encryptedContentInfo EncryptedContentInfo,
  unprotectedAttrs [1] IMPLICIT UnprotectedAttributes OPTIONAL
}
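This data structure is mostly equal to EncryptedData but with the addition of RecipientInfos and OriginatorInfo. Depending on the key management mechanism, each recipient's entry in RecipientInfos is populated with the something that is shared between originator and recipient (or a reference to it), which is then used to derive the KEK that encrypts the CEK, along with the encrypted CEK itself. The optional OriginatorInfo can additionally carry the originator's certificates and CRLs if the key management mechanism requires them.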
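Because RecipientInfo is an ASN.1 CHOICE (RFC 5652 section 6.2, with implicit tagging), a recipient can tell from the tag alone which key management mechanism was used for an entry: ktri is an untagged SEQUENCE, while kari, kekri, pwri and ori use the context-specific tags [1] to [4]. The small Python sketch below lists the mechanisms found in a DER-encoded RecipientInfos SET. The function names and the synthetic input are hypothetical, and the reader only handles single-byte tags and definite lengths.

```python
# RecipientInfo CHOICE alternatives and their DER tags (RFC 5652, IMPLICIT TAGS)
RECIPIENT_INFO_KINDS = {
    0x30: "ktri (key transport)",
    0xA1: "kari (key agreement)",
    0xA2: "kekri (symmetric key-encryption key)",
    0xA3: "pwri (password)",
    0xA4: "ori (other)",
}


def read_tlv(buf: bytes, pos: int) -> tuple[int, bytes, int]:
    """Return (tag, value, next_pos) for the DER TLV starting at pos."""
    tag, length = buf[pos], buf[pos + 1]
    pos += 2
    if length & 0x80:  # long form length
        n = length & 0x7F
        length = int.from_bytes(buf[pos:pos + n], "big")
        pos += n
    return tag, buf[pos:pos + length], pos + length


def recipient_kinds(recipient_infos: bytes) -> list[str]:
    """List the key management mechanism of every RecipientInfo in the SET."""
    tag, body, _ = read_tlv(recipient_infos, 0)
    if tag != 0x31:
        raise ValueError("RecipientInfos must be a SET OF RecipientInfo")
    kinds, pos = [], 0
    while pos < len(body):
        tag, _, pos = read_tlv(body, pos)
        kinds.append(RECIPIENT_INFO_KINDS.get(tag, f"unknown tag 0x{tag:02x}"))
    return kinds


# Synthetic SET with three empty entries, only to exercise the tag dispatch.
print(recipient_kinds(bytes.fromhex("31063000a100a300")))
# ['ktri (key transport)', 'kari (key agreement)', 'pwri (password)']
```

A single message can therefore mix mechanisms, e.g. one recipient addressed via an RSA certificate and another via a password, while all of them share the same encrypted payload.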
Key Management Mechanisms based on shared secrets
Two key management mechanisms are purely based on shared secrets and are straightforward to understand:
- Symmetric Key-Encryption Key: Via an out-of-band mechanism, the originator and the recipient have knowledge of a shared symmetric key. This key is used as the KEK that encrypts the CEK (e.g. during manufacturing, the symmetric key is securely stored on a device and a cloud solution also has access to the exact same key).
- Password-Based Encryption: The KEK is derived from a password which is then used to encrypt the CEK. This is typically used in scenarios where the recipient is a human and the password is entered by the recipient.
Neither mechanism populates the OriginatorInfo, since the originator has nothing to share with the recipient. The RecipientInfos, however, is used to transport either an identifier for the symmetric KEK or information about the configuration of PBKDF2, together with the encrypted CEK.
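In CMS, the password-based mechanism is the PasswordRecipientInfo from RFC 3211, whose keyDerivationAlgorithm field typically holds the PBKDF2 parameters such as salt and iteration count. The Python sketch below shows only the KEK derivation step with the standard library's hashlib.pbkdf2_hmac. Wrapping the CEK with the derived KEK is not shown, and the function name, PRF choice and iteration count are my own assumptions rather than values mandated by the RFC.

```python
import hashlib
import os


def derive_password_kek(password: str, salt: bytes, iterations: int,
                        kek_len: int = 32, prf: str = "sha256") -> bytes:
    """Derive a KEK from a password with PBKDF2 using HMAC-<prf>."""
    return hashlib.pbkdf2_hmac(prf, password.encode("utf-8"), salt, iterations, dklen=kek_len)


# Originator: fresh random salt per message. Salt and iteration count must reach
# the recipient (in CMS they travel in the PBKDF2 parameters of the RecipientInfo).
salt = os.urandom(16)
iterations = 600_000  # assumption; choose according to current guidance
kek = derive_password_kek("correct horse battery staple", salt, iterations)

# Recipient: re-derives the same KEK from the password and the transported parameters.
assert derive_password_kek("correct horse battery staple", salt, iterations) == kek
```

Since only salt and iteration count are transported, the security of this mechanism rests entirely on the strength of the password and the cost of the KDF.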
Key Management Mechanisms based on public key cryptography
The other two mechanisms are based on public key cryptography and require asymmetric key pairs. They assume that an originator has access to the public keys of the recipients (e.g. using certificates that are distributed via an out-of-band mechanism). With the public keys given, key transport and key agreement schemes are used for key management. Both are described in the following.
Key Transport
NIST Glossary defines Key Transport as a mechanism for key-establishment between two parties. The originator generates a symmetric key and
[…] the key is encrypted using the public key of the receiver and subsequently decrypted using receiver’s private key.
In the context of CMS, key transport mechanisms are only described for the RSA cryptosystem. One of the first key transport mechanisms defined is RSAES-PKCS1-v1_5 (see RFC 3370). Using this mechanism is not recommended, and NIST has even disallowed it after 2023 (see NIST SP 800-131A Rev. 2) because it is vulnerable to padding oracle attacks (Bleichenbacher's attack). Instead, use a more robust scheme such as RSAES-OAEP (see RFC 3560).
The OriginatorInfo field is again unused, since there is no information to share from the originator to the recipient. Only the RecipientInfos is populated per recipient with the algorithm information (e.g. RSA-OAEP), an identifier of the recipient's certificate and the CEK encrypted under the recipient's public key.
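RSA-OAEP for CMS is specified in RFC 3560. To give an idea of what happens to the CEK before the RSA operation, the Python sketch below implements only the EME-OAEP encoding and decoding steps of RFC 8017 (sections 7.1.1 and 7.1.2) with SHA-256 and MGF1-SHA-256. The hash choice is an assumption (RFC 3560's default parameters use SHA-1), the RSA exponentiation and key handling are omitted, the helper names are hypothetical, and the decoding is not constant-time, so this is strictly an illustration.

```python
import hashlib
import os

H_LEN = 32  # SHA-256 output size in bytes


def mgf1(seed: bytes, length: int) -> bytes:
    """MGF1 with SHA-256 (RFC 8017, B.2.1)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def oaep_encode(message: bytes, k: int, label: bytes = b"") -> bytes:
    """EME-OAEP encoding for a k-byte RSA modulus; the result is then RSA-encrypted."""
    if len(message) > k - 2 * H_LEN - 2:
        raise ValueError("message too long")
    l_hash = hashlib.sha256(label).digest()
    ps = bytes(k - len(message) - 2 * H_LEN - 2)  # zero padding
    db = l_hash + ps + b"\x01" + message
    seed = os.urandom(H_LEN)  # fresh randomness makes every encoding different
    masked_db = xor(db, mgf1(seed, k - H_LEN - 1))
    masked_seed = xor(seed, mgf1(masked_db, H_LEN))
    return b"\x00" + masked_seed + masked_db


def oaep_decode(em: bytes, label: bytes = b"") -> bytes:
    """EME-OAEP decoding, applied after RSA decryption. Not constant-time."""
    k = len(em)
    masked_seed, masked_db = em[1:1 + H_LEN], em[1 + H_LEN:]
    seed = xor(masked_seed, mgf1(masked_db, H_LEN))
    db = xor(masked_db, mgf1(seed, k - H_LEN - 1))
    rest = db[H_LEN:].lstrip(b"\x00")  # strip PS, expect the 0x01 separator next
    if em[0] != 0 or db[:H_LEN] != hashlib.sha256(label).digest() or not rest.startswith(b"\x01"):
        raise ValueError("decryption error")
    return rest[1:]


cek = os.urandom(32)            # AES-256 content encryption key
em = oaep_encode(cek, k=256)    # k = 256 bytes for a 2048-bit RSA modulus
assert oaep_decode(em) == cek
```

The randomized seed is what distinguishes OAEP from the deterministic structure of PKCS#1 v1.5 padding, and the integrity checks during decoding are where implementations must take care not to leak timing information.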
Key Agreement
Like key transport, key agreement mechanisms are also key-establishment procedures, but they require information contributed by the originator as well as the recipient (see also NIST Glossary). Widely used mechanisms are based on the ideas of Diffie-Hellman (DH), where a shared secret can be derived by two parties using their asymmetric key pairs. For this, the public keys need to be exchanged, but never the private keys, which must remain secret.
In CMS, the key agreement algorithms we look at are based on ECC (Elliptic Curve Cryptography); classic finite-field Diffie-Hellman (RFC 3370) is specified as well but not covered here. Many of the standardized mechanisms make use of Elliptic Curve Diffie-Hellman (ECDH). The basis is laid in RFC 5753, and support for the more specialized curves X25519 and X448 has been added through RFC 8418. A notable difference between the two RFCs is that ECMQV, which is an authenticated EC key agreement, and cofactor ECDH have not been added as algorithms for the X25519 and X448 curves. The common remainder is Standard ECDH as described in SEC1v2, and we will focus on this mechanism in the following.
Both RFCs specify that Standard ECDH is to be used in an Ephemeral-Static fashion, meaning that the originator generates an ephemeral key pair for every message but uses the recipient's static public key for ECDH (also known as ECDH-ES). In NIST SP800-56A r3 terms, this scheme is called C(1e, 1s, ECC DH), where 1e stands for one ephemeral key pair and 1s for one static key pair. This scheme ensures that a compromised ephemeral private key exposes only a single message (key freshness), but it offers no forward secrecy: anyone who obtains the recipient's static private key can decrypt all messages addressed to it. Another downside is that the recipient has no way to authenticate the originator from the CMS message itself. Possible approaches to deal with this are an additional signature or transporting the CMS message over an authenticated channel (see NIST SP800-56A r3 sections 7.4 and 7.6 for an in-depth discussion of properties and assurances).
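The Ephemeral-Static flow can be sketched in Python with a textbook X25519 implementation following RFC 7748 (one of the curves RFC 8418 brings to CMS). The originator combines a fresh ephemeral private key with the recipient's static public key, and the recipient combines its static private key with the transported ephemeral public key; both arrive at the same shared secret K. This pure-Python ladder is not constant-time and the helper names are hypothetical, so please use a vetted library in practice.

```python
import os

P = 2**255 - 19  # field prime of Curve25519
A24 = 121665     # (486662 - 2) / 4, ladder constant from RFC 7748
BASE_POINT = (9).to_bytes(32, "little")


def x25519(k: bytes, u: bytes) -> bytes:
    """X25519 per RFC 7748 (Montgomery ladder). NOT constant-time; illustration only."""
    kb = bytearray(k)
    kb[0] &= 248          # clamp the scalar as required by RFC 7748
    kb[31] &= 127
    kb[31] |= 64
    scalar = int.from_bytes(kb, "little")
    x1 = int.from_bytes(u, "little") & ((1 << 255) - 1)  # mask the top bit of u
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(255)):
        bit = (scalar >> t) & 1
        swap ^= bit
        if swap:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        da, cb = d * a % P, c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


def generate_key_pair() -> tuple[bytes, bytes]:
    private_key = os.urandom(32)
    return private_key, x25519(private_key, BASE_POINT)


def ecdh(private_key: bytes, peer_public_key: bytes) -> bytes:
    shared = x25519(private_key, peer_public_key)
    if shared == bytes(32):  # all-zero output indicates a low-order peer public key
        raise ValueError("invalid peer public key")
    return shared


# Recipient: static key pair, public key distributed out of band (e.g. certificate).
recipient_private, recipient_public = generate_key_pair()

# Originator: fresh ephemeral key pair for this single message.
ephemeral_private, ephemeral_public = generate_key_pair()
k_originator = ecdh(ephemeral_private, recipient_public)

# Recipient: uses the ephemeral public key transported in the CMS message.
k_recipient = ecdh(recipient_private, ephemeral_public)
assert k_originator == k_recipient
```

The sketch also makes the missing originator authentication visible: nothing in the exchange ties ephemeral_public to a particular sender.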
When using Standard ECDH with the goal to encrypt another key (like we do with our CEK), “Wrapped Key Transport Scheme” shall be implemented (see SEC1v2 section 5.2). To this end, the scheme setup then comprises several primitives:
- ECDH to derive a shared secret K
- A key derivation function (KDF) that derives our KEK from K and some DER-encoded data (in CMS, the ECC-CMS-SharedInfo structure defined in RFC 5753)
- A key wrap function to encrypt our CEK using the generated KEK
SEC1v2 section 3.6 allows several KDFs but only ANSI X9.63 for (new) general-purpose applications. Concerning key wrap, SEC1v2 section 3.9 allows for NIST AES key wrap algorithms when using AES block ciphers. When looking at RFC 5753 and RFC 8418, this means the following algorithms are allowed for Standard ECDH:
- KDF: ANSI X9.63 KDF with SHA-1, SHA-224, SHA-256, SHA-384 or SHA-512
- Key Wrap: AES-128, AES-192, AES-256 key wrap
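The KDF and its DER-encoded input can be sketched as follows. RFC 5753 feeds the DER encoding of an ECC-CMS-SharedInfo structure into the ANSI X9.63 KDF as SharedInfo. The Python sketch below assumes AES-256 key wrap (id-aes256-wrap, OID 2.16.840.1.101.3.4.1.45), SHA-256 and no user keying material (ukm), and hard-codes the corresponding DER bytes. The function names are hypothetical, and the final AES key wrap of the CEK (RFC 3394) is omitted since Python's standard library has no AES.

```python
import hashlib


def x963_kdf(z: bytes, shared_info: bytes, length: int, hash_name: str = "sha256") -> bytes:
    """ANSI X9.63 KDF: Hash(Z || counter || SharedInfo) for counter = 1, 2, ..."""
    output = b""
    counter = 1
    while len(output) < length:
        output += hashlib.new(hash_name, z + counter.to_bytes(4, "big") + shared_info).digest()
        counter += 1
    return output[:length]


def ecc_cms_shared_info(kek_bits: int = 256) -> bytes:
    """DER ECC-CMS-SharedInfo for id-aes256-wrap without entityUInfo (ukm)."""
    # keyInfo: AlgorithmIdentifier { id-aes256-wrap }, parameters absent
    key_info = bytes.fromhex("300b060960864801650304012d")
    # suppPubInfo [2] EXPLICIT OCTET STRING: KEK length in bits as 32-bit big-endian
    supp_pub_info = bytes([0xA2, 0x06, 0x04, 0x04]) + kek_bits.to_bytes(4, "big")
    body = key_info + supp_pub_info
    return bytes([0x30, len(body)]) + body


# z stands for the ECDH shared secret K; a fixed dummy value is used here.
z = bytes(range(32))
kek = x963_kdf(z, ecc_cms_shared_info(256), 32)  # 256-bit KEK for AES-256 key wrap
```

Binding the key wrap algorithm and KEK length into the KDF input ensures that the same shared secret never yields the same KEK for different algorithm choices.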
This time, the ephemeral public key that has been used for the ECDH key agreement must be transported to the recipient. In CMS, it is carried in the originator field (as originatorKey) of the KeyAgreeRecipientInfo within the RecipientInfos. The RecipientInfos is further populated with information helping the recipient to locate the static public key that has been used by the originator (e.g. in case of certificates, the issuer and serial number are used) and with the wrapped CEK.
With that, we have a good understanding of what it means to use the key agreement mechanism in CMS. The composition of algorithms gets quite complex, which is why algorithm and key management profiling is crucial to reduce complexity and to ensure interoperability.
Content Type authEnvelopedData
Authenticated Encryption with Associated Data (AEAD) is a cryptographic primitive that provides both confidentiality and integrity. Native CMS support with a dedicated data structure for AEADs has been added in RFC 5083 with the content type authEnvelopedData. All key management mechanisms described for envelopedData can be used without changes.
Below is the data structure, which again builds on EnvelopedData. The additions are a mac field for the Message Authentication Code (MAC), which is crucial for an AEAD scheme, and authAttrs, which provides the ability to add data that is authenticated but not encrypted. This is useful for metadata that should be protected from tampering but does not need to be kept confidential.
AuthEnvelopedData ::= SEQUENCE {
  version CMSVersion,
  originatorInfo [0] IMPLICIT OriginatorInfo OPTIONAL,
  recipientInfos RecipientInfos,
  authEncryptedContentInfo EncryptedContentInfo,
  authAttrs [1] IMPLICIT AuthAttributes OPTIONAL,
  mac MessageAuthenticationCode,
  unauthAttrs [2] IMPLICIT UnauthAttributes OPTIONAL
}
What is still missing are the actual AEAD algorithms. For this, RFC 5084 defines AES-CCM and AES-GCM for CMS. AES-GCM-SIV, a nonce-misuse-resistant variant of AES-GCM described in RFC 8452, is also a valuable addition. Beyond AES, ChaCha20-Poly1305 is defined for CMS in RFC 8103.
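For AES-GCM, RFC 5084 places the nonce and the tag length into the GCMParameters of the content encryption algorithm identifier, while the tag itself goes into the mac field of AuthEnvelopedData. The Python sketch below DER-encodes such an algorithm identifier for AES-256-GCM (id-aes256-GCM, OID 2.16.840.1.101.3.4.1.46). The function names are hypothetical, and the actual GCM encryption is out of scope because Python's standard library has no AES.

```python
# DER OBJECT IDENTIFIER for id-aes256-GCM (2.16.840.1.101.3.4.1.46)
ID_AES256_GCM = bytes.fromhex("060960864801650304012e")


def der(tag: int, value: bytes) -> bytes:
    """DER TLV with short-form length; every value here is shorter than 128 bytes."""
    if len(value) >= 0x80:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([tag, len(value)]) + value


def aes256_gcm_algorithm_identifier(nonce: bytes, icv_len: int = 16) -> bytes:
    """AlgorithmIdentifier { id-aes256-GCM, GCMParameters } per RFC 5084."""
    if icv_len not in (12, 13, 14, 15, 16):
        raise ValueError("aes-ICVlen must be between 12 and 16 octets")
    params = der(0x04, nonce)  # aes-nonce, 12 octets recommended
    if icv_len != 12:  # DER omits a field equal to its DEFAULT (12)
        params += der(0x02, bytes([icv_len]))  # aes-ICVlen as INTEGER
    return der(0x30, ID_AES256_GCM + der(0x30, params))


nonce = bytes(12)  # placeholder; must be unique per key in practice
algorithm_identifier = aes256_gcm_algorithm_identifier(nonce, icv_len=16)
```

Since a repeated nonce under the same key breaks AES-GCM, this is exactly the field where a nonce-misuse-resistant mode like AES-GCM-SIV would relax the requirements.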
Summary
CMS’ encryption capabilities are vast and powerful catering to a wide range of use cases. Let’s briefly summarize the key points and use cases:
encryptedData is a simple format with meta information about the encryption algorithm. Your application needs to deal with key management and, depending on the AES mode used, you need a convention for your application defining where IVs and MACs are stored: either concatenate them with the ciphertext (encryptedContent) or store them in the unprotectedAttrs. If you need advanced key management, envelopedData and authEnvelopedData are the way to go. For both, the same key management mechanisms are available, but authEnvelopedData adds a standardized field for storing MACs. The key management mechanisms support you in handling previously shared symmetric keys and deriving symmetric keys from user passwords. If a recipient is addressed using an RSA key pair, key transport mechanisms like RSA-OAEP are used. Key agreement mechanisms, on the other hand, are used for recipients with ECC key pairs. One approach is to use Standard ECDH in an Ephemeral-Static fashion, which is also known as C(1e, 1s, ECC DH).
That’s all for today. I hope this article gives you a good idea of how CMS can help you add end-to-end security to your sensitive data. If you have any questions or feedback, feel free to reach out to me on my socials below.