Webs of Trust and Public Keys. Oh My!
Introduction
Unless you haven’t been paying attention, you might have heard of things like https or SSL, and if you use a web browser at all will undoubtedly have seen https at the front of pretty much all web addresses these days. It wasn’t always the case. We have the Electronic Frontier Foundation to thank for the fact that it is so widespread (take a look at that link. It has an https in front of it!) Let’s explore this, and while we’re at it, some attacks and potential defences.
Let’s start with what http stands for: basically it’s the hypertext transfer protocol, which thanks to Sir Tim Berners-Lee became the way in which the World Wide Web shared information from servers to our nice little web browsers on the many devices that we use today. The various different ways in which it has changed over the years since it first appeared in 1989 and made the Internet the wonderful place it now is are not important to this book. Just think of it as a way to share information on the web. Now, you will have hopefully noticed that I mentioned http, not https.
Smashing.
The “s” is important – it basically stands for “secure” and it’s important. What it does is enable secure communication between those pesky web sites and our beautiful browsers. The things sent using https are encrypted between the site and browser which means that it is hard (I won’t say impossible) to eavesdrop on what is being shared, or indeed to change it. We’ll get back to the possible attacks on it in a moment, but before we do, consider that it enables secure communication so that things like your Google Mail, or DuckDuckGo searches, or online banking and so on to actually happen.
Encryption
It’s a big deal.
How does it work? Well, that SSL I mentioned just now is basically it (although these days it’s called TLS). SSL stands for secure socket layer (and TLS stands for transport layer security – basically, meet the new boss, same as the old boss). SSL/TLS actually enable a bunch more than just secure http – for example secure oice over IP telephony – but what is important for us here is basically how the whole thing works.
It’s all to do with cryptography, of course. I talked a bit about cryptography in the Trustless Systems chapter, so I won’t go too much into it here, but there is one thing I didn’t mention, and that was using keys and certificates. Okay, two things (cue Monty Python sketch).
So, on the assumption you know how hashes work, and if you don’t, head to the Trustless Systems chapter to find out, let’s dive into public and private keys and such first. Technically, you’d call the field public key cryptography and it’s actually a pretty neat thing because it allows even small people like us to get some privacy.
There’s a couple of ways of doing this, so let’s jump into the first, old way, and you will be able to see the issues. It goes like this. Alice wants to send Bob a message. She uses a shared key to encrypt the message. Did I hear you say, “Wait, stop there! What is this magic of which you speak?”
Okay, so encryption in four paragraphs or less. Actually this is nothing new, people have been using encryption for many many years, probably as long as there have been people. You might remember your own childhood when you used a bit of trickery to, for example, shift characters up the alphabet so that E meant A, F meant B and so on, and then used this little cypher (okay, in North America you’d probably say cipher, but whatever) to encrypt your message. Let’s see now. If Alice and Bob were using this, their shared key would be the number of letter shifts (in this case, five, because the fifth letter is the first), Alice would use this little shift algorithm to encrypt her message to E HKRA UKQ, send it to Bob, who shifted the letters back to decrypt it. I’ll leave you to figure out what Alice’s message to Bob was.
As it happens, this kind of thing was used by the Ancient Greeks with what are called Scytales. It’s pretty neat. Easily broken? Sure. But that’s basically what shared secret (symmetric) cryptography looks like. Oh, there are lots of fancy bells and whistles that involve rather complicated mathematics and proofs, but they aren’t really too important for the purposes of this chapter (or this book), which is about trust, not security, and anyway I’m not cryptographer so I’d probably get it wrong.
So there are lots of examples of this symmetric key cryptography around, from those Scytales to the Enigma machine the Germans used in World War Two and beyond. It has the advantage of being quite quick but the disadvantage of needing to keep the keys secret. There are some answers to that shared secret thing that involve their own bit of cryptography though (look up Diffie-Hellman key exchange or RSA).
Now, aysymmetric encryption is the basis of things like PGP (Pretty Good Privacy), digital signatures, and things like that. It’s also the backbone of that https I was talking about at the start of this little chapter. It’s also called public key cryptography. That word “public” is important, as you will see. Bear with me, it’s actually not too complicated, but it did take me ages to be able to figure it out when I first saw it – I am clearly quite slow. Actually, symmetry and all of its associated words are one of my kryptonite spelling words – I always get them wrong. Asymmetric encryption basically relies on two different keys which are cryptographically generated (which basically means “clever mathematics”).
Let’s imagine Alice and Bob again. Alice uses an encryption algorithm. There are plenty around – RSA is one of them, and you’ve heard of that. ECC (elliptic curve cryptography) is another, and it’s often used in cellular communication. Alice gives the algorithm a random number (the bigger the number, the better) and the algorithm gives her two keys. The first is a private key, which Alice has to keep secret. The second is a public key which she can share. Meanwhile, Bob does the same thing.
In order to send a message to Bob that is private, Alice will need his public key. Skipping over the details of that for a moment, let’s imagine for instance that Bob includes it in his email signature (it’s just a jumbled mess of text, so that’s possible). Alice grabs Bob’s public key and uses her encryption tool to encrypt the message she wants to send with Bob’s public key. The result will be a big midge-modge of text because it’s been encrypted. She can now send this to Bob.
Bob receives a message from Alice which is, well, a midge-modge of characters and thinks to himself “Ahh, Alice sent me an encrypted message! Cool! I wonder what it says!”. He heads to his encryption tool[1] and pastes the text into it. He also[2] puts in his private key and if he is sensible the password he has used to secure it[3]. He clicks on the decrypt button and gets the message Alice sent.
Okay, so that’s pretty cool, because now people like you and me can do really secure messaging.
Curious about doing this for yourself? Well there are various tools around that allow you to, for instance, encrypt your email or other messages. Some are online (like https://smartninja-pgp.appspot.com/#) and some are tools to download and use on your computer[4].
Right, back to the asymmetric bit. It’s called asymmetric because there are two keys for each person – a public one that is shared and allows people to encrypt messages for you, and a private one that you keep secret and use to decrypt messages to you. You can also use it for things like signatures, so that you know the message came from the person who says sent it. In this case, let’s pick on Alice and Bob again. Bob needs to be sure that the message he gets was actually sent by Alice – after all, it could have some pretty personal stuff in there and he doesn’t want to get all mixed up. He asks Alice to sign her messages from now on. Alice is a little confused so looks up how it all works.
Here goes!
First, Alice has to create the message she wants to send. “That’s not so hard,” she thinks. The next step is something you’ve seen before, it’s a hash! So, Alice has to put her message through a hash function (like the SHA-256 one we saw in the Trustless Systems chapter). It’s a one-way hash, which basically means, as you already know, that the hash can’t be used to get back to the data that was used. “But that means that Bob won’t be able to see the message!” says Alice. This is indeed true. We’ll get there.
Anyway, the hashed message, plus information about the algorithm used to hash it (like SHA-256 for instance), is encrypted using Alice’s private key. This bit is important. The only person who should have a copy of Alice’s private key is Alice, right? Right. So if it is signed (encrypted) with this private key, Bob will know that Alice must have sent it. What Alice just created is the digital signature of the message. At this stage, if Alice sent the message as regular text along with this signature, what could Bob do?
Right—he can use Alice’s public key to decrypt the signature. How this works isn’t too important, but as long as Bob has Alice’s public key, it’ll work. This of course means two things: first, everyone with Alice’s public key could do this and second, so can Bob.
When Bob decrypts the signature, what does he get? He gets the hash of the message plus details of the hashing algorithm used. It’s a one-way hash! What good is that?! Well, at this stage, what Bob could do is to run the message (which Alice sent as plain text) through the same hashing algorithm. He would get a hash, right? The thing is, if the hash didn’t match the one Alice sent, he would know that the message has been altered. Cool eh? On the other hand, if the hashes do match, he would know that Alice’s message is the one she sent.
But there is of course the problem that the message Alice sent was sent in plain text. This kind of destroys the purpose of having privacy in the first place! It does give us non-repudiation though. If you get a message and the digital signature of the message, it doesn’t necessarily matter if the message is in plain text. If you can decrypt the signature with the sender’s public key and run the plain text message through there same hash function, you will know that the message you got was the message that was sent, and you can prove it. This itself is quite valuable. But it doesn’t make Alice’s message private, which is what she wants to do!
What should Alice do?
We already know the answer to this. She can use the regular encryption stuff she already knows! We can skip one of the steps she just did actually because she doesn’t really need it.
Here we go. Again.
She takes her message and puts it through the hash function. Then, she takes the hash, the message she wants to send to Bob, and the hashing algorithm used, and then encrypts the lot with her own private key. She then encrypts that with Bob’s public key. She can now send this all to Bob. Phew.
One thing: did you see the step that was missed out? The regular digital signature step was to hash the message and then encrypt this hash (and what the algorithm used was) using Alice’s private key. In the steps take above, she doesn’t create a digital signature of the message like that. Instead, she takes the message, the hash of the message and the algorithm used and hashes all that together with her private key. The end result is pretty much the same: Bob knows (and can prove) the message came from Alice and also that it wasn’t changed en route.
How? When Bob gets the message, he can decrypt using his private key. This will get him the message digitally signed by Alice (basically, more useless text). He can take this and decrypt it using Alice’s public key. If this works he knows that Alice sent the message. Bob then has the original of the message, the hashed version of the message, and the algorithm Alice used. Smashing – he can run the original of the message through the same algorithm and get himself a hash of it. If the two hashes match, he knows that Alice not only sent the message but also that it wasn’t changed. Bob and Alice now go and take a headache pill because this is all quite confusing.
If Bob really wanted to, he could get Alice to authenticate herself like this: Create a random message, encrypt using Alice’s public key and send to Alice. Alice is the only one who can decrypt this because she has her private key. She does this and sends the random message back to Bob who, if it matches, knows that it was decrypted using Alice’s key and so, well, this must be Alice!
Back to https and TLS (SSL). Oh yes, you’d forgotten we were there hadn’t you! It kind of works the same way as what I just talked about, but there are a couple of other steps to think about. It’s all to do with certificates! In practice, the public key of a person is a bit unwieldy and anyway, whilst Bob might trust Alice to be who she says she is, that digital signature likely won’t hold up in court. We need a way for Alice to prove beyond a doubt that the public key she is passing out actually does belong to her. If we had this, whenever she signed a document everyone would know that she had signed it even if they didn’t know her (and indeed even if they didn’t really trust her).
This is managed through the use of certificates. A certificate is basically a a credential that is given to you by what is known as a certificate authority, or CA. Certificates follow a standard format called X.509. Basically they contain that the public key of the CA plus proof you are the owner of that key. Basically this is in the form of the public key of the person who is being authenticated, hashed and encrypted by the CA’s private key, which is pretty much the digital signature of the CA who issued the certificate. There are actually lots of CAs around that can issue certificates. How does that work? Each CA is part of a hierarchy of CAs all the way back to what is known as a root. Each step through this hierarchy is basically a trust step. That’s why the CA is known as a trusted third party (TTP).
The root CA issues certificates to CAs underneath it who it trusts, they issue CAs below them with credentials, and so on, until you get to the one that you can get a certificate from if you can prove you are who you say you are to the CA in question. In other words, the CA basically says, “Yep, this is Steve” and you can be sure that the public key in that certificate belongs to me! Did you spot that trust word?
Lots of root certificates are stored in browsers (or for browsers) on every machine around the world, so that when the browser goes to a website it should be able to be sure that the website is who it says it is, because if you remember, the signature on the certificate is made because the private key of the CA was used to encrypt the hash of the public key of the website. So, if you have the public key of the CA[5] you can decrypt the signature in the certificate, which gives you the hash, and then take the public key the server actually sent you and hash this. If the hashes are the same, you basically know that the CA signed the public key the server just sent you, right? And you know nobody could have copied it because all you got was the hash of the public key from the certificate authority, not the public key itself (so the server couldn’t know what the key was, unless it owned it).
I think that kind of answers the question about proving things, but if you really think about it, it’s potentially open to all kinds of problems. Most importantly, the CA signs the certificate and you trust the CA. Can you see where this is going? If you haven’t read the bit about system trust in the Trustless Systems chapter it might be a good time to do that now. We’ll come back to this too.
Now that we know how the whole symmetric and asymmetric thing works, and we have a decent enough grasp of certificates, we can jump right into https[6]. It works, as I have already said, by allowing websites to prove that they are who they say they are, as well as allowing your browser and the website to be able to encrypt the information that is sent between them.
Why is this useful? Imagine online banking without it. Or writing a Google email that was sensitive for whatever reason[7]. It’s a big deal. It works in a couple of stages, and it’s all to do with that TLS, or SSL protocol.
Here goes! There are actually a few ways this is done, but let’s pick one, it’s the RSA key exchange algorithm and it’s the most popular one. Also, bear in mind this is quite simplified. This isn’t a security book, it’s about trust. If you want security classes, there are many around.
First, you ask your browser to go to someplace[8]. You could click on a link, or a bookmark, or even type in the address, it’s all the same to me (and the browser). Let’s call the browser a client from now on. It’ll make more sense (to me anyway).
Second, the client basically says, “Hello!” to the server. It’s polite to say hello, but in this case the hello contains information like, “This is the version of TLS I support,” “These are the cipher suites I support”[9], “Here is a random number.” The random number is important, as you will see. We’ll call it the client random number. The server gets all this stuff and so the third step is for it to reply (it would seem churlish not to). The reply contains a, “Hello back, small computer!” (Well, not really but close) as well as, “Here is my certificate,” the certificate that was given to it by a CA, “This is the cipher suite we will use,” and another random number, which we will call the server random number.
With this information, the client is now able to authenticate the server. It does this by checking the certificate with the certificate authority. In practice it has all those root certificates stored on your computer someplace. The certificate the server sent the client is signed with the CA’s private key. To check it, the client grabs the public key of the CA from its store of root certificates and checks the signature against the one that the server sent in the certificate.
How?
By decrypting the signature in the certificate sent by the server using the CA’s public key, and checking the hash of the server’s public key against the hash of the public key that is stored in the digital signature that was signed by the CA. All being well, the hashes will agree.
What does this give us? It gives us the fact that the public key the server sent us is the same as the public key that the CA signed. Nothing else. The server is still not authenticated at this point. Whoa, now there’s a trusted third party.
Right, now the client can create a random number and encrypt it using the public key of the server (which the client knows the server has because it sent it to us, and anyway its hash is in the digital signature of the certificate). The client sends this encrypted final random number to the server. The server, if it knows the private key, can decrypt the random number. If it was trying to pretend it was something it wasn’t, it wouldn’t have the private key that was associated with the public key it just sent out (which the client knows belongs to a specific server because, well, the CA signed it and it all matched). If this is the case, the client would know that not only is the server the owner of a certificate signed by the CA, it is also the owner of the private key associated with the public key that the CA signed in that certificate.
Phew.
The final random number is not sent back to the client at this time. What would be the point? In fact, the final random number that was decrypted, the client random number the client sent in the first step, and the server random number the server sent back in the second, are used together to create a session key. Because everyone has the same numbers, the session keys should be the same on both client and server. If they’re not, we’ll know someone is trying to listen in (a Man in the Middle).
The session keys are then used to encrypt a, “Finished!” message from the client and one from the server. Since they are both the same key, we’ve got symmetric encryption happening now (which is faster) and the session can continue with all traffic between the clients and the server encrypted by the session key. Ta-da!
It’s a pretty interesting thing, but it is not without its problems. One of these has to do with that trusted third party idea. All it takes for a fake CA to spring up and give fake, yet valid, certificates, is to grab the private key of one of the real CAs. This is in fact exactly what happened to DigiNotar several years ago, with the result that fake Gmail certificates were created and used to allow spying (basically) on the emails being sent using Gmail in Iran (for some users). That bit about being in an authoritarian country and so on that I mentioned earlier? There you go.
Sure, there is a way to revoke certificates but in this particular case the fragility of the system was exposed, because it takes time (and money) to revoke all the certificates in all the browsers and operating systems all over the world (to be safe). Quite frankly, it’s a nightmare. Not to mention, there are various other vulnerabilities that can (and indeed have) spring up. We needn’t go into them here.
The thing is, https gives a sense of trust in the websites that you are visiting. At the very least it is supposed to give you the assurance that the website you are talking to is the one it says it is, and that nobody can listen in the middle, amongst a few other things. It is clear (I hope) that it does a fair job at all of this. After all, it’s so much better than nothing at all. But it relies on a hierarchy of trusted third parties. And where we talk about trust in this setting, what we really mean is that you have no choice.
Somewhere along the line someone (the people who make browsers, for instance) has decided which certificates are good and which won’t be used. Plus, you’re assuming that the certificates you have are still good. Certificates have a shelf life. If a certificate expires your web browser should tell you when you go to visit a site that uses it. Actually it does, but then it gives you the option of carrying on regardless, sometimes easily and for other browsers with quite a lot of difficulty. Fair enough.
Anyway, there’s one thing that rears its head that should be remembered, and that is that this hierarchy is pretty fragile. In response to this, and the rather more distributed nature of the Internet and associated networks that exist, something called the web of trust (WoT) has grown. It’s like this: anyone can issue a certificate to anyone else that they know to say, “Sure, this is Steve”.
“What madness is this?” you might be asking. After all, if Charlie issues me a certificate, what on earth does that tell you? Not much. But what if you knew Charlie and trusted him? And what if not just Charlie, but Dave and Ernest and Samantha all vouched for me? The more people who vouch for me, the better, right? The thing about WoT certificates is that they can be vouched for (verified) by a number of people. Each one can basically sign the certificate to say, “Yep, this is Steve.” These signatures can come from people you know and trust, which makes it better, and from people who are known and trusted across the planet (and trusted by other strongly trusted people, and so on). Sharing the trust can often come from things like key signing parties, where people get together physically (so they know they exist, right?) to share codes that will allow them to verify the signatures[10].
The decision to trust a certificate (and thus the person giving it to you) is your own, and could be based on your own requirements. For example, you may say that you won’t trust a certificate unless it is at least partially trusted by more than 6 others, or at least fully trusted by one fully trusted other (or more).
Actually, if you bothered to use that PGP stuff I put up earlier in this chapter, you already used WoT stuff. PGP and the web of trust are very closely linked. In fact, they were both created by Phil Zimmerman, a computer scientist. In his words (from the PGP 2.0 manual):
–Phil Zimmerman, PGP 2.0 Manual
Okay, so this all sounds very nice and utopian (Thomas More would indeed be proud) but it does indeed have its advantages: you get to choose who you will trust, for one thing, and how much, so it’s not a million miles away from how trust really works, right? Sure, it has its problems and some of them are actually social, which is pretty neat. After that, it basically allows certain things to happen, according to how much, or if, you trust the other person (like sharing encrypted information knowing that Alice is Alice and Bob is Bob).
Just don’t lose your keychain, or your own private key. Why? Because if you lost your private key and someone else got it, they could basically sign certificates as if they were you (which brings us back to that DigiNotar example). Whilst this might not be quite as horribly dangerous as the PKI (centralized) one when keys are lost, it can still be very problematic. More, if someone sends you a message that is encrypted using your public key and you don’t have the private one, well, you can’t decrypt it. But someone who is not you could if they stole it, and they could pretend to be you too.
Could you use it in real life to help you trust someone? Well, if by trust you mean ensure that this person is who they say they are and make sure the messages I’m sending and receiving are private and authenticated then sure. If you mean other stuff like lend them money or take investment advice or whatever, perhaps you need to look a little further? And if you are using it for that, bear in mind that this is not what it was intended for.
Security and trust are difficult bedfellows. In the Trustless Systems chapter I talk a bit about zero trust, which in fact is really just trust as it should be. Most of the time when security talks about things like trustworthy or trusted what it really means is you don’t get a choice. Use it. Or, to be more fair, this is secure and you can use it without worrying. This is in fact the opposite of what trust really means. It’s also incorrect because, as we have already learned, there is always a handsome prince.
- W, in this instance it’s for decrypting! ↵
- because he might have more than one ↵
- Ut never hurts to have a password – if the key is lost then at least you have a little more insurance that even if someone finds it or steals it or whatever the password will mean they can’t use it to decrypt stuff ↵
- . A quick search for PGP tools for your favourite computer will bring you some, but be careful to make sure you are getting things from a reputable provider. ↵
- You do, it’s already on your machine. ↵
- Finally, Steve! ↵
- Perhaps because you are a dissident in an authoritarian country, or a journalist protecting a source. ↵
- If the address starts with https it’s a clue that what follows will happen. ↵
- Like, you know, which encryption algorithms I can use. ↵
- If you are wondering how that stood up in COVID lockdowns, so am I. ↵