---
title: "Cryptographic Hashing in R"
date: "`r Sys.Date()`"
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{Cryptographic Hashing in R}
  \usepackage[utf8]{inputenc}
output:
  html_document
---
```{r, echo = FALSE, message = FALSE}
knitr::opts_chunk$set(comment = "")
library(openssl)
```
The functions `sha1`, `sha256`, `sha512`, `md4`, `md5` and `ripemd160` bind to the respective [digest functions](https://docs.openssl.org/1.1.1/man1/openssl-dgst/) in OpenSSL's libcrypto. Both raw (binary) and character (string) inputs are supported, and the output type matches the input type: a string yields a hex-encoded string, and a raw vector yields a raw vector.
```{r}
md5("foo")
md5(charToRaw("foo"))
```
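As a quick check of the type-matching behaviour described above, the following sketch inspects both results. The variable names `h_chr` and `h_raw` are illustrative only, and the comparison relies on `as.character()` turning a raw hash into its hex string, as the URL example later in this vignette also does.

```{r}
library(openssl)

h_chr <- md5("foo")             # character input
h_raw <- md5(charToRaw("foo"))  # raw input

is.character(h_chr)  # string in, hex string out
is.raw(h_raw)        # raw in, raw out
length(h_raw)        # an MD5 digest is 16 bytes

# Both forms encode the same digest
as.character(h_raw) == h_chr
```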
The functions are vectorized over character vectors: a vector of n strings returns n hashes.
```{r}
# Vectorized for strings
md5(c("foo", "bar", "baz"))
```
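To make the element-wise behaviour concrete, this sketch checks that each element of the result is the hash of the corresponding input string. The names `x` and `hashes` are illustrative.

```{r}
library(openssl)

x <- c("foo", "bar", "baz")
hashes <- md5(x)

length(hashes) == length(x)  # one hash per input string
hashes[2] == md5("bar")      # position i holds the hash of x[i]

# Equivalent explicit loop over the elements
all(hashes == vapply(x, md5, character(1), USE.NAMES = FALSE))
```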
Besides character and raw vectors, we can pass a connection object (e.g. a file, socket or URL). In this case the function stream-hashes the binary contents of the connection.
```{r}
# Stream-hash a file
myfile <- system.file("CITATION")
md5(file(myfile))
```
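The streamed result should equal the hash of the same bytes read into memory. The sketch below checks this on a small temporary file (the file and its contents are made up for illustration) and cross-checks the result against `tools::md5sum()` from base R.

```{r}
library(openssl)

# Write three known bytes to a temporary file
tmp <- tempfile()
writeBin(charToRaw("foo"), tmp)

stream_hash <- md5(file(tmp))                            # hashed via a connection
memory_hash <- md5(readBin(tmp, "raw", file.size(tmp)))  # hashed from a raw vector

as.character(stream_hash) == as.character(memory_hash)
as.character(stream_hash) == unname(tools::md5sum(tmp))

unlink(tmp)
```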
The same works for URLs. The hash of the [`R-installer.exe`](https://cran.r-project.org/bin/windows/base/old/4.0.0/R-4.0.0-win.exe) below should match the one in [`md5sum.txt`](https://cran.r-project.org/bin/windows/base/old/4.0.0/md5sum.txt):
```{r eval=FALSE}
# Stream-hash from a network connection
as.character(md5(url("https://cran.r-project.org/bin/windows/base/old/4.0.0/R-4.0.0-win.exe")))
# Compare
readLines('https://cran.r-project.org/bin/windows/base/old/4.0.0/md5sum.txt')
```
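The comparison can also be done programmatically rather than by eye. The sketch below uses a hypothetical helper, `extract_md5()`, and assumes the checksum file follows the usual `md5sum` layout of one `<hash> *<file>` entry per line. The sample lines are illustrative and are not the real contents of `md5sum.txt`.

```{r}
library(openssl)

# Hypothetical helper: return the 32-character hex digest from the first
# checksum line that mentions `filename`, or NA if no line matches.
extract_md5 <- function(lines, filename) {
  hit <- grep(filename, lines, fixed = TRUE, value = TRUE)
  if (length(hit) == 0) return(NA_character_)
  tolower(regmatches(hit[1], regexpr("[0-9a-fA-F]{32}", hit[1])))
}

# Illustrative input in md5sum style (made-up file names)
sample_lines <- c(
  "acbd18db4cc2f85cedef654fccc4a4d8 *foo.bin",
  "37b51d194a7513e45b56f6524f2d51f2 *bar.bin"
)

# TRUE when the listed checksum matches the computed hash
extract_md5(sample_lines, "foo.bin") == md5("foo")
```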
## Compare to digest
Similar functionality is also available in the **digest** package, but with a slightly different interface:
```{r}
# Compare to digest
library(digest)
digest("foo", "md5", serialize = FALSE)
# Other way around
digest(cars, skip = 0)
md5(serialize(cars, NULL))
```
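For string input, the two packages can be checked against each other directly. The sketch below assumes **digest** is installed and uses `vapply()` to loop over a character vector, since `digest()` hashes one object per call. The variable names are illustrative.

```{r}
library(openssl)
library(digest)

# Same MD5 digest for a single string
digest("foo", algo = "md5", serialize = FALSE) == md5("foo")

# Looping digest() over a vector reproduces the vectorized md5()
x <- c("foo", "bar", "baz")
looped <- vapply(x, digest, character(1), algo = "md5", serialize = FALSE,
                 USE.NAMES = FALSE)
all(looped == md5(x))
```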