Concepts
- Authentication
- Encryption
- States of the data
- Encryption type
- Key exchange
- Message digests
- Digital signatures
- Certificates
Kerckhoffs's principle: "The effectiveness of the system must not depend on its design remaining secret."
Authentication
Authentication is the process of confirming that a claim made by an entity is genuine. There are two basic types:
- author authentication: the author is who they say they are. It can be achieved with a digital signature.
- data authentication: the data has not been modified. It can be achieved with a message digest algorithm.
Authentication does not imply encryption: authenticated data may still be sent in the clear.
Encryption
Cryptography (from the Greek "kryptos", hidden or secret, and "graphein", writing; literally "hidden writing") is the study of ways to convert information from its original form into a code that is incomprehensible to those who do not know the technique.
In cryptography terminology, we find the following elements:
- Plaintext is the original information that must be protected.
- Encryption is the process of converting plaintext into unreadable text, called ciphertext or cryptogram.
- Keys are the basis of cryptography: strings of bits, often with specific mathematical properties, that control encryption and decryption.
States of the data
In a secure system, data must be plaintext while it is being used (in use), but it must be kept secret while it is stored or transmitted. These are the states:
- At rest: when it is stored digitally on a device.
- In transit: when it is moving between devices or network points.
- In use: when an application is processing it, unprotected.
When we protect data at rest, we prepare our system for the eventuality that it can be read after an attack on the machine it is on. When we protect data in transit, we do so because we know that anyone can listen to network traffic.
If our encrypted data is exposed and the attacker also gains access to the associated secret keys (either now or later), they will be able to read it in plaintext.
Sometimes we can avoid storing the secrets at all. For example, we can ask for them every time (e.g. a master key), or store only a digest to compare against (e.g. passwords).
Encryption type
There are two large groups of encryption algorithms, depending on whether the keys are unique or in pairs:
- Symmetric: algorithms that use a single secret key to encrypt information and the same key to decrypt it.
- Asymmetric: algorithms that use two keys, one public and one private; what is encrypted with one can only be decrypted with the other. They are mainly used for two purposes:
  - Public key encryption: a message encrypted with the public key can only be decrypted with the private key.
  - Digital signature: a message signed with the private key can be verified by anyone with the public key.
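The public key encryption case can be illustrated with textbook RSA. The following is only a toy sketch: the primes, the exponent and the function names are illustrative assumptions, the key is far too small for real use, and no padding is applied.

```python
# Toy textbook RSA, for illustration only (tiny key, no padding).
# Assumption: the two primes below are known Mersenne primes chosen for brevity.
p, q = 2**61 - 1, 2**89 - 1
n = p * q                            # modulus, shared by both keys
e = 65537                            # public exponent: (n, e) is the public key
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (needs Python 3.8+)


def encrypt(m: int) -> int:
    """Anyone holding the public key (n, e) can encrypt."""
    return pow(m, e, n)


def decrypt(c: int) -> int:
    """Only the holder of the private exponent d can decrypt."""
    return pow(c, d, n)


message = int.from_bytes(b"hi there", "big")   # the number must be smaller than n
ciphertext = encrypt(message)
recovered = decrypt(ciphertext).to_bytes(8, "big")
print(recovered)   # b'hi there'
```

Real systems use a vetted library, much larger keys and a padding scheme; the sketch only shows that the two keys undo each other.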
Symmetric encryption has drawbacks. First, you need a key for each source/destination pair. Second, you need a secure way to share those keys. Asymmetric encryption has the advantage that you can share the public part of the key freely, since nothing can be decrypted without the private part, but the algorithms are slower and limit the size of the message that can be encrypted. For example, RSA with PKCS#1 v1.5 padding accepts messages of at most `floor(n/8) - 11` bytes, where n is the key size in bits.
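To make the RSA size limit concrete, here is a small sketch that evaluates the formula for a few common key sizes. It assumes that n is the key size in bits and that the 11 bytes are the overhead of PKCS#1 v1.5 encryption padding; the function name is illustrative.

```python
def max_rsa_message_bytes(key_bits: int) -> int:
    """Largest plaintext, in bytes, under the floor(n/8) - 11 rule."""
    return key_bits // 8 - 11


for bits in (1024, 2048, 4096):
    print(bits, "bit key ->", max_rsa_message_bytes(bits), "bytes")
# 1024 bit key -> 117 bytes
# 2048 bit key -> 245 bytes
# 4096 bit key -> 501 bytes
```

Even a 4096-bit key holds only about half a kilobyte, which is why asymmetric encryption is normally used to protect a short symmetric key rather than the data itself.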
Encryption can also be block or stream:
- Block: Encryption of fixed size blocks of data. They can be asymmetrical or symmetrical.
- Stream: encryption of a continuous stream of bits or bytes. They are symmetrical.
Unless otherwise stated, we are only talking about block ciphers.
Symmetric encryption: Both parties share a private key
Asymmetric encryption: Only the owner of the keys (the receiver) can decrypt the data
Key exchange
As we have already seen, symmetric encryption has the problem of sharing the key between the two parties, and asymmetric encryption cannot encrypt very large messages.
The solution is to use both types of encryption in combination.
- Asymmetric cryptography allows us to share a secret key securely.
- Symmetric cryptography allows us to encrypt longer messages, and to do it faster.
The simplest key exchange can be done with RSA public key encryption: one party encrypts the shared secret with the other's public key. The problem is that this exchange is not "forward secret": if someone saves the encrypted conversation and obtains the private key in the future, they can decrypt it.
However, the most common algorithm for key exchange is Diffie-Hellman (DH), present in the handshake of most communications that use symmetric encryption.
For two parties A and B, the process is as follows:
- A and B each generate a public/private key pair, Apub/Apri and Bpub/Bpri.
- The two parties exchange the public keys Apub and Bpub.
- In private, each party combines the public key it received with its own private key (Apub+Bpri, Bpub+Apri). The essential feature of DH is that both combinations generate the same secret!
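The three steps above can be sketched with modular exponentiation. This is a toy illustration, not a production implementation: the prime `P`, the generator `G` and the helper `keypair` are assumptions chosen for brevity, whereas real systems use standardized groups or elliptic curves.

```python
import secrets

# Toy public parameters, agreed in the open.
# Assumption: 2**127 - 1 (a Mersenne prime) and generator 3 are illustrative only.
P = 2**127 - 1
G = 3


def keypair():
    """Return (private, public), where public = G^private mod P."""
    private = secrets.randbelow(P - 3) + 2   # random value in [2, P - 2]
    public = pow(G, private, P)
    return private, public


# Step 1: each party generates its own key pair.
a_pri, a_pub = keypair()
b_pri, b_pub = keypair()

# Step 2: only a_pub and b_pub travel over the network.

# Step 3: each side combines the other's public key with its own private key.
secret_a = pow(b_pub, a_pri, P)   # A computes Bpub + Apri
secret_b = pow(a_pub, b_pri, P)   # B computes Apub + Bpri

print(secret_a == secret_b)   # True
```

Both sides reach the same value because (G^b)^a and (G^a)^b are equal modulo P, while an eavesdropper only ever sees the two public values.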
Message digests
A "message digest" or hash is a fixed-length sequence of bytes produced when a set of data is passed through a hash function. A key is not always required for this function to operate. Some well-known ones: MD5 (no longer considered secure), SHA-256. Its properties are:
1. It is deterministic (same result for the same input).
2. It is fast.
3. The inverse function is not feasible.
4. A small change in input causes a large change in output.
5. It is not feasible to find two different inputs with the same hash (collisions exist, but a secure function makes them impractical to find).
A digest allows us to protect the integrity of a message.
Let's see how SHA-256 works on Linux with a short "hello world!" text. If you try it, you will see that the result is instant (2). Since the digest is 256 bits, it is 32 bytes long, shown as 64 hexadecimal characters. See how changing one character completely changes the hash (4).
$ echo 'hello world!' | openssl sha256
(stdin)= ecf701f727d9e2d77c4aa49ac6fbbcc997278aca010bddeeb961c10cf54d435a
$ echo 'hello,world!' | openssl sha256
(stdin)= 4c4b3456b6fb52e6422fc2d1b4b35da2afbb4f44d737bb5fc98be6db7962073f
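The same digests can be computed programmatically. The following is a minimal sketch with Python's standard `hashlib` module; the helper name `sha256_hex` is illustrative. Note that `echo` appends a newline, so the newline is included here to hash exactly the same input as the shell commands.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# The trailing "\n" mirrors the newline that echo adds.
first = sha256_hex("hello world!\n")
second = sha256_hex("hello,world!\n")

print(first)
print(second)
print(len(bytes.fromhex(first)), "bytes,", len(first), "hex characters")

# Count the hex positions where the two digests differ.
differing = sum(a != b for a, b in zip(first, second))
print(differing, "of", len(first), "hex characters differ")
```

Because the algorithm and the input are the same, the two printed digests are expected to match what openssl reports for those inputs.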
If you search for the digest of "hello world!" or "123456", you will find it on the net. Two conclusions: for a given algorithm and input, we always get the same output (1). There is no inverse function (3), but there are lookup tables of precomputed digests for common texts, which are used to recover credentials.
If we want to protect both integrity and authenticity, we can use a MAC (message authentication code). Basically, it is a digest computed with a secret key (for example, HMAC), which the two parties must share in order to verify the communication.
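As an illustration, a MAC can be computed with HMAC-SHA-256 from Python's standard library. This is a minimal sketch; the function names and the sample messages are illustrative, and in practice the key would have to be shared between the parties through a secure channel.

```python
import hashlib
import hmac
import secrets

# Secret key known to both sender and receiver (generated here for the demo).
shared_key = secrets.token_bytes(32)


def make_tag(message: bytes) -> bytes:
    """Sender side: compute the authentication tag for a message."""
    return hmac.new(shared_key, message, hashlib.sha256).digest()


def verify_tag(message: bytes, tag: bytes) -> bool:
    """Receiver side: recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(message), tag)


tag = make_tag(b"transfer 10 EUR to Bob")
print(verify_tag(b"transfer 10 EUR to Bob", tag))    # True
print(verify_tag(b"transfer 99 EUR to Bob", tag))    # False
```

Unlike a plain digest, an attacker who alters the message cannot recompute a valid tag without the shared key.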
We also have key derivation functions (KDFs), hash-based functions that derive one or more secrets from another secret, such as a password. KDFs also allow key stretching: turning a short, weak secret into a longer key that is costlier to brute-force.
Two common uses of KDFs are:
- To save a hash of a password, and to be able to check if it is entered correctly.
- To generate symmetric algorithm keys from a password.
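Both uses can be sketched with PBKDF2, one KDF that ships with Python's standard library. The iteration count, salt size and function names below are illustrative assumptions rather than recommendations from the original text.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000   # assumption: illustrative work factor, tune for your hardware


def derive_key(password: str, salt: bytes, length: int = 32) -> bytes:
    """Derive `length` bytes from a password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=length
    )


# Use 1: store a salted hash of the password instead of the password itself.
def hash_password(password: str):
    salt = os.urandom(16)                  # a fresh random salt per password
    return salt, derive_key(password, salt)


def check_password(password: str, salt: bytes, stored: bytes) -> bool:
    return hmac.compare_digest(derive_key(password, salt), stored)


salt, stored = hash_password("correct horse")
print(check_password("correct horse", salt, stored))   # True
print(check_password("wrong guess", salt, stored))     # False

# Use 2: derive a 256-bit key for a symmetric cipher from the same password,
# using a different salt so the key is unrelated to the stored hash.
key_salt = os.urandom(16)
symmetric_key = derive_key("correct horse", key_salt, length=32)
print(len(symmetric_key) * 8, "bit key")   # 256 bit key
```

The salt defeats precomputed lookup tables, and the iteration count makes each password guess deliberately slow.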
Digital signatures
The digital signature is a cryptographic mechanism to authenticate digital information. It relies on public key cryptography, which is why this type of signature is also called a public key digital signature. Signatures are used to guarantee three properties: authenticity, integrity and non-repudiation.
This is the process to obtain a digital signature:
- A message digest is calculated for the input data.
- The digest is encrypted with the signer's private key. The result is the signature.
This is the process to verify a digital signature:
- A message digest is calculated for the input data.
- The digest inside the digital signature is decrypted with the public key.
- The two digests are compared. If they are the same, the signature is correct.
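The signing and verification steps can be sketched with textbook RSA over a SHA-256 digest. This is a toy illustration under stated assumptions: the primes and helper names are illustrative, the key is far too small, and the digest is simply reduced modulo n instead of being padded as real signature schemes do.

```python
import hashlib

# Toy RSA key, for illustration only.
# Assumption: two known Mersenne primes, chosen for brevity.
p, q = 2**61 - 1, 2**89 - 1
n = p * q
e = 65537                            # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (needs Python 3.8+)


def digest_as_int(data: bytes) -> int:
    """Message digest of the input, reduced so it fits the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data: bytes) -> int:
    """Encrypt the digest with the private key."""
    return pow(digest_as_int(data), d, n)


def verify(data: bytes, signature: int) -> bool:
    """Decrypt the signature with the public key and compare the digests."""
    return pow(signature, e, n) == digest_as_int(data)


signature = sign(b"pay 10 EUR to Bob")
print(verify(b"pay 10 EUR to Bob", signature))   # True
print(verify(b"pay 99 EUR to Bob", signature))   # False
```

Only the private key holder can produce a signature that verifies, which is what gives the receiver authenticity, integrity and non-repudiation.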
Digital Signature: The owner of the keys (the issuer) sends proof of the original data
Certificates
If a digital document is signed using a private key, the recipient must have the public key to verify the signature. The problem is that a key does not indicate who it belongs to. Certificates solve this problem: a well-known entity (Certificate Authority: CA) verifies ownership of the public key sent to you.
A certificate contains:
- The name of the entity for which the certificate was issued (the subject).
- The public key of this entity.
- The digital signature that verifies the information in the certificate, made with the issuer's private key.
A certificate could contain this information:
"Certificate authority 2276172 certifies that John Doe's public key is 217126371812."
Problem: the certificate issuer has its own public key, which we also need to trust. Certificates can therefore be chained, each one signed by the issuer above it. The last certificate in the chain (the root) is self-signed. So we need to accept certain CAs as trusted, and Java (like browsers) usually ships with a list of trusted CAs.
The TLS/SSL server certificate is the most common type. The client validates it with the certification path validation algorithm, checking among other things that:
- The subject (CN) of the certificate matches the domain it connects to.
- The certificate is signed by a trusted CA.
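As an illustration of these two checks on the client side, Python's standard `ssl` module enables both by default when a context is created with `ssl.create_default_context()`. The sketch below is a minimal example; the function names are illustrative, and `fetch_peer_subject` needs network access, so it is defined but not called.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Client context that trusts the system CAs and verifies the host name."""
    return ssl.create_default_context()


def fetch_peer_subject(host: str, port: int = 443):
    """Connect to host and return the subject of its validated certificate.

    The handshake raises ssl.SSLCertVerificationError if the certificate is
    not signed by a trusted CA or does not match the host name.
    """
    context = make_client_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["subject"]


context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED)   # True: a trusted CA signature is required
print(context.check_hostname)                     # True: the certificate must match the domain
```

Calling `fetch_peer_subject` with a real host name would perform the full handshake and return the certificate subject only if both checks pass.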
How is secure communication established between your browser and an HTTPS server?
- When the browser connects to the server it downloads its certificate, which contains its public key.
- The browser has the public keys of the trusted CAs, and checks if the certificate has been signed by a trusted CA.
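- The browser checks that the domain that appears in the certificate is the same as the domain of the server it is communicating with.
- In the classic RSA key exchange, the browser generates a symmetric key, encrypts it with the server's public key and sends it. Modern TLS versions use an ephemeral Diffie-Hellman exchange instead, which is forward secret.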
- Asymmetric encryption ends and symmetric encryption with the shared key begins.