112 lines
3.7 KiB
Plaintext
Raw Normal View History

2025-01-12 00:52:51 +08:00
---
title: "Fun with bignum: how RSA encryption works"
date: "`r Sys.Date()`"
vignette: >
%\VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{Fun with bignum: how RSA encryption works}
\usepackage[utf8]{inputenc}
output:
html_document
---
```{r setup, include=FALSE}
library(openssl)
knitr::opts_chunk$set(echo = TRUE)
```
Primitive types such as `int` or `double` store numbers in exactly 4 or 8 bytes, with finite precision. This suffices for most applications, but cryptography requires arithmetic on very large numbers, without loss of precision. Therefore OpenSSL uses a __bignum__ data type which holds arbitrary sized integers and implements all basic arithmetic and comparison operators such as `+`, `-`, `*`, `^`, `%%`, `%/%`, `==`, `!=`, `<`, `<=`, `>` and `>=`.
One special case, the [__modular exponent__](https://en.wikipedia.org/wiki/Modular_exponentiation) `a^b %% m` can be calculated using `bignum_mod_exp`, even when `b` is too large for calculating `a^b`.
```{r}
# create a bignum
y <- bignum("123456789123456789")
z <- bignum("D41D8CD98F00B204E9800998ECF8427E", hex = TRUE)
# size grows
print(y * z)
# Basic arithmetic
div <- z %/% y
mod <- z %% y
z2 <- div * y + mod
stopifnot(z2 == z)
stopifnot(div < z)
```
RSA involves a public key and a private key. The public key should be known by everyone and is used for encrypting messages. Messages encrypted with the public key can only be decrypted in a reasonable amount of time using the private key. In RSA, this asymmetry is based on the practical difficulty of factoring the product of two large prime numbers.
### RSA key generation
An RSA key-pair is generated as follows (adapted from [wikipedia](https://en.wikipedia.org/wiki/RSA_(cryptosystem))):
- Choose two distinct prime numbers $p$ and $q$. Keep these secret.
- Compute the product $n = p*q$. This $n$ value is public and used as the modulus.
- Compute $\phi(n) = (p 1)(q 1)$.
- Choose an integer $e$ smaller than $\phi(n)$ such that $e$ and $\phi(n)$ are coprime. OpenSSL always uses $65537$.
- Compute a value for $d$ such that $(d * e)\pmod{\phi(n)} = 1$.
OpenSSL has a key generator that does these things for us.
```{r}
(key <- rsa_keygen(512))
(pubkey <- key$pubkey)
```
Usually we would use `rsa_encrypt` and `rsa_decrypt` to perform the encryption:
```{r}
msg <- charToRaw("hello world")
ciphertext <- rsa_encrypt(msg, pubkey)
rawToChar(rsa_decrypt(ciphertext, key))
```
Let's look at how this works under the hood.
### How RSA encryption works
The `data` field of the private key extracts the underlying bignum integers:
```{r}
key$data
```
You can verify that the equations above hold for this key. The public key is simply a subset of the key which only contains $n$ and $e$:
```{r}
pubkey$data
```
In order to encrypt a message into ciphertext we have to treat the message data as an integer. The message cannot be larger than the key size. For example convert the text `hello world` into an integer:
```{r}
m <- bignum(charToRaw("hello world"))
print(m)
```
To encrypt this message $m$ into ciphertext $c$ we calculate $c = m^e\pmod n$. Using the public key from above:
```{r}
e <- pubkey$data$e
n <- pubkey$data$n
c <- (m ^ e) %% n
print(c)
```
This number represents our encrypted message! It is usually exchanged using base64 notation for human readability:
```{r}
base64_encode(c)
```
The ciphertext can be decrypted using $d$ from the corresponding private key via $m = c^d \pmod{n}$. Note that `c^d` is too large to calculate directly so we need to use `bignum_mod_exp` instead.
```{r}
d <- key$data$d
out <- bignum_mod_exp(c, d, n)
rawToChar(out)
```
The only difference with the actual `rsa_encrypt` and `rsa_decrypt` functions is that these add some additional padding to the data.