[Concepts] HTTPS Connections: Overview

PART 1: Overview & intro to the TLS handshake!

Okay so today I'm going over HTTPS and trying to re-remember things I've forgotten. I thought it might be fun to try to explain it at a high-ish level.

What's HTTPS?

HTTPS is simply "HTTP over TLS", i.e. "HTTP/TLS". It's the same ol' HTTP underneath, it's just wrapped in TLS which is like a protective layer on top that enables encrypted communication.

SSL is the predecessor of TLS (TLS is the newer protocol), but many people use the two names interchangeably.

HTTP is a protocol, which describes how two computers can transfer data between each other (generally in a request-response format).

You can wrap HTTP communication in TLS. Eavesdroppers can't understand the data carrying the HTTP messages because TLS encrypts it, but the two parties communicating (the client and the server) have the means to decrypt it and understand each other.

Why HTTPS?

The main advantages are:

  • Verifying identities: Verify that clients are talking to the server they believe they're talking to
  • Securing communication: Once you've verified your communication partner's identity, you can also ensure that no one except the two of you can understand your conversation

What's the difference between HTTP and HTTPS connections?

Well, let's look at HTTP communication. We'll pretend we're a sentient computer in someone's home network (an IoT door lock perhaps) and we're watching the foolish mortals browse the web. Someone goes on kitten.bianc.at using Chrome:

This is plain HTTP. Here, the communication between the browser and the server is as follows:

  1. Browser initiates connection with server
  2. Browser sends HTTP request saying "please, GET kitten.bianc.at/ for me"
  3. Server reads the request, figures out what data to send to the client, then sends that data (the page), which also includes a status of 200 OK
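To make steps 2 and 3 a bit more concrete, here's a rough Python sketch of what the browser's request looks like on the wire, plus a tiny parser for the status line the server answers with. The function names (build_get_request, parse_status_line) are made up for this post, and the sample response is hand-written for illustration, not captured from the real server.

```python
from typing import Tuple


def build_get_request(host: str, path: str = "/") -> bytes:
    """Return the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes) -> Tuple[int, str]:
    """Pull the status code and reason phrase out of a raw HTTP response."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    _version, code, reason = status_line.split(" ", 2)
    return int(code), reason


# Step 2: what the browser sends. It's all human-readable text.
request = build_get_request("kitten.bianc.at")
print(request.decode("ascii"))

# Step 3: a hand-written stand-in for the server's answer (an assumption,
# not real captured traffic).
sample_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<h1>Welcome to nginx!</h1>"
)
print(parse_status_line(sample_response))
```

Nothing in there is encrypted, which is exactly why anyone watching the network can read it.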

Here's what we see with our super convenient internet traffic monitoring tool[1] (with the less relevant portions poorly whited out). We can see immediately that everything is sent in plain text. The convo below highlights steps 2 and 3:

This web developer doesn't have TLS enabled on their site! As a watcher, we can read the exchanged messages; they're not encrypted or anything. Also, the browser doesn't do any kind of verification that it's talking to who it thinks it's talking to (what if it's actually gooddog.com pretending to be kitten.bianc.at?)

With how monitored the internet is nowadays (e.g. by work, ISP, your parents, the government?) people don't feel super comfortable with that, and there are fewer and fewer use cases for non-encrypted communication. In this example it's just sending a plain HTML "Welcome to Nginx" page, but what if that had more important info, like someone's tax return or emo poetry (or a password even)?

Thankfully, the webadmin of kitten.bianc.at realizes their folly, and uses letsencrypt to handily-dandily add TLS to their site. HTTPS is now enabled.

Let's watch the same request, but now made over HTTPS:

See the "Secure" sign? 🔒 Chrome says everything is good and secure.

Here's what the packet monitoring sees (it's okay don't read it now it'll make sense later):

Suddenly there's tons more talking before we even get any HTML from the server (I'm guessing the HTML being sent is the one highlighted in a gold box but since I can't read it I can't be 100% sure). However, unlike before, we can't read the messages, since they're now encrypted. Looking at the contents just shows nonsensical data.

Here, the communication between the browser and the server is different, in that it starts with the TLS Handshake. The handshake is a set of steps that client and server take to establish whether and how to set up secure communication, and it has three main steps:

  1. Hello
  2. Certificate Verification
  3. Key exchange

Here's a terribly un-cropped version of the same image above with a (rough) outline of the steps occurring:

1. Hello step (starting a connection):

The hello step goes like this:

  1. The browser sends a ClientHello message, which includes information about what kind of cipher suites it supports (i.e. ways it knows how to encrypt stuff), TLS version, etc. Just introducing itself.
  2. The server sends its ServerHello and similar information, and also a decision on which cipher suite to use.

A peek into the ClientHello message and its translation.
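If you'd rather poke at the hello step from code than from a packet capture, here's a minimal sketch using Python's standard ssl module. It lists the cipher suites a default client context has enabled (roughly what would be offered in a ClientHello), and defines a helper that completes a handshake and reports what the server picked. The helper names are made up for this post, and the network call is left commented out since it needs a live connection.

```python
import socket
import ssl


def offered_cipher_suites() -> list:
    """Names of the cipher suites a default client context has enabled."""
    ctx = ssl.create_default_context()
    return [cipher["name"] for cipher in ctx.get_ciphers()]


def negotiated_parameters(host: str, port: int = 443):
    """Do a full handshake and return (TLS version, chosen cipher suite)."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        # wrap_socket performs the handshake: ClientHello, ServerHello, etc.
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version(), tls.cipher()[0]


suites = offered_cipher_suites()
print(len(suites), "cipher suites enabled, e.g.", suites[:3])

# Needs network access, so it's left commented out here:
# print(negotiated_parameters("kitten.bianc.at"))
```

The exact list depends on your Python and OpenSSL versions, so don't expect it to match the screenshot.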

2. Certificate Verification (verifying identity)

This will be its own whole post later (maybe), where we can go into more detail!

Anyway, the main idea is that now that the browser and the server have started talking, the server then sends a certificate, aka the "SSL Certificate". This certificate contains important information for validating identity, and also a public key.

The client checks the certificate to see if it's trustworthy by checking the Certificate Authorities it trusts, or checking the legitimacy of "intermediate certificates", i.e. a Chain of Trust. If everything is all good, we move on to the third step.
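As a small illustration of the "client checks the certificate" part, here's a hedged sketch with Python's ssl module. A default client context already insists on a valid certificate chain and a matching hostname, which is the same kind of checking a browser does (though browsers use their own code and trust stores, so treat this as an analogy). describe_certificate is a hypothetical helper name, and it's not called here because it needs network access.

```python
import socket
import ssl

ctx = ssl.create_default_context()

# The default client context refuses servers it can't verify.
print("certificate required:", ctx.verify_mode == ssl.CERT_REQUIRED)
print("hostname checked:", ctx.check_hostname)


def describe_certificate(host: str, port: int = 443) -> dict:
    """Connect, let the ssl module verify the chain, and return cert details."""
    with socket.create_connection((host, port), timeout=10) as sock:
        # Raises ssl.SSLCertVerificationError if the chain or hostname is bad.
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
            return {
                "subject": cert.get("subject"),
                "issuer": cert.get("issuer"),
                "expires": cert.get("notAfter"),
            }


# Needs network access, so it's left commented out here:
# print(describe_certificate("kitten.bianc.at"))
```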

3. Key Exchange (enabling secure communication)

Here's a very succinct description of this step (from the RFC):

 The general goal of the key exchange process is to create a
 pre_master_secret known to the communicating parties and not to
 attackers. The pre_master_secret will be used to generate the
 master_secret (see Section 8.1). The master_secret is required to
 generate the Finished messages, encryption keys, and MAC keys (see
 Sections 7.4.9 and 6.3). By sending a correct Finished message,
 parties thus prove that they know the correct pre_master_secret.

Okay, so all that means is:

  1. During the key exchange step, the parties need to create the pre-master secret (also called the pre-master key). This can be done in a couple of ways. A classic one (RSA key exchange) is for the client to generate a random sequence of a set length and encrypt it under the server's public key, which the client received earlier in the SSL Certificate.
  2. The client sends this encrypted key to the server[2], and the server decrypts it with its private key. So if that encrypted key is intercepted during transmission, it can't be read unless the person intercepting (maybe some internet overseer) also has the server's private key. This is an example of asymmetric encryption.
  3. The pre-master key, which both parties now know, is converted (using a math formula) into the master secret (or master key). The keys derived from it are what the browser and the server use to encrypt their outgoing messages and decrypt their incoming ones for the rest of the session. This is an example of symmetric encryption.
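For the curious, the "math formula" in step 3 can be sketched in a few lines of Python. This follows the TLS 1.2 definition (RFC 5246, Sections 5 and 8.1), where the master secret is computed from the pre-master secret plus the random values the client and server exchanged in their hello messages. A few assumptions to flag: the post above doesn't mention those random values, the SHA-256 based PRF is the TLS 1.2 default (some cipher suites swap in another hash), the function names are mine, and the pre-master secret here is just 48 random bytes (in real RSA key exchange the first two bytes carry the client's TLS version).

```python
import hashlib
import hmac
import secrets


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """P_hash expansion from RFC 5246 Section 5, using HMAC-SHA256."""
    output = b""
    a = seed  # A(0) = seed
    while len(output) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()  # A(i)
        output += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return output[:length]


def derive_master_secret(pre_master: bytes,
                         client_random: bytes,
                         server_random: bytes) -> bytes:
    """master_secret = PRF(pre_master, "master secret", randoms)[0..47]."""
    seed = b"master secret" + client_random + server_random
    return p_sha256(pre_master, seed, 48)


# Stand-ins for values that would come from the real handshake.
pre_master = secrets.token_bytes(48)
client_random = secrets.token_bytes(32)
server_random = secrets.token_bytes(32)

# Both sides run the same computation on the same inputs...
client_master = derive_master_secret(pre_master, client_random, server_random)
server_master = derive_master_secret(pre_master, client_random, server_random)

# ...so they end up with the same 48-byte master secret.
print(len(client_master), client_master == server_master)
```

An eavesdropper sees both random values go by in the clear, but without the pre-master secret they can't run this computation.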

Once this step is completed, we're pretty much done establishing the secure connection and the rest of the messages are unreadable to outside parties (since they're encrypted). Woo! The keys are valid for the entire session. We're a good IoT lock with a clear conscience.

Later we'll get more into the different technical bits and pieces.


Mistakes? Grammar/spelling? Comments? You can always @ me on Twitter!

Footnotes and etc:

  1. If you've never used Wireshark before, you're in for such a treat. Download it right now and be fascinated.

  2. An example of a different kind of key exchange, one that doesn't send a key over the network, is the Diffie-Hellman Key Exchange.
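Here's a toy sketch of the Diffie-Hellman idea from footnote 2, just to show that both sides can land on the same secret while only ever sending public values. The numbers are my own picks for illustration (a 127-bit prime and base 3) and are far too small to be secure; real TLS uses standardized groups or elliptic curves. The function names are made up for this post.

```python
import secrets

P = 2**127 - 1  # a known Mersenne prime; toy-sized, NOT secure
G = 3           # base chosen only for illustration


def make_keypair():
    """Pick a random private exponent and compute the public value."""
    private = secrets.randbelow(P - 2) + 1  # somewhere in 1..P-2
    public = pow(G, private, P)
    return private, public


def shared_secret(own_private: int, peer_public: int) -> int:
    """Combine our private exponent with the other side's public value."""
    return pow(peer_public, own_private, P)


client_private, client_public = make_keypair()
server_private, server_public = make_keypair()

# Only client_public and server_public ever cross the network.
client_view = shared_secret(client_private, server_public)
server_view = shared_secret(server_private, client_public)

print(client_view == server_view)  # True
```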

Future possible similar posts maybe:

  • What is HTTP?
  • Understanding packets
  • DTLS
  • Decrypting TLS through Wireshark

Edits:

  • Sunday, September 10, 2017: Typo fixes