R2 presigning using Web Crypto

I was using R2, Cloudflare's S3-compatible object storage that advertises zero egress fees.

A natural use case is to let third parties upload content into your bucket for you to manage, index and order. Providing a suitably controlled interface for that is always a bit of a challenge.

Cloudflare provides an excellent way to run serverless functions, called Workers. They are impressive on their own, since each one runs as a V8 isolate, many to a single process, similar to how Chromium tabs work.

Tangent

In Chromium each process can have multiple V8 isolates running at the same time; however, for security reasons a process is locked to documents from a single site.

[Diagram: two renderer processes, each hosting several «isolate» v8 instances; same-site tabs share a process]

Ok, not entirely true, but it's a good way to get the point across. Mobile devices, being a bit more resource constrained, are only somewhat isolated, with several heuristics deciding what gets its own process, such as sites that are most likely to hold user-specific information.

So, I just said that tabs are grouped as same-site. That's because origins are a bit different. An origin is defined to include scheme, hostname and port, while a site only means scheme and eTLD+1.

# Origin:

https://www.example.co.uk:443
------ ------------------ ---
Scheme + Hostname [FQDN] + Port

# Site:

https://www.example.co.uk:443
------      -------------
Scheme   +    eTLD+1 

What are eTLDs? They are suffixes under a TLD that effectively act as a TLD by themselves. One example of an eTLD is co.uk, where co is a subdomain of the .uk TLD. In fact, domains directly under .uk couldn't be registered until 2014, no wonder co.uk is so popular.
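
To make the distinction concrete, here's a small sketch (mine, not from any docs) using the standard URL API: the origin falls straight out of the URL parser, while the site needs the Public Suffix List, since no built-in API knows which suffixes are eTLDs.

const url = new URL("https://www.example.co.uk:443/dog.png");

// Origin: scheme + hostname + port (the default port 443 is elided)
console.log(url.origin); // "https://www.example.co.uk"

// Site: scheme + eTLD+1. Nothing built-in computes this; the eTLD ("co.uk")
// has to be looked up in the Public Suffix List, e.g. with the psl package:
// psl.get(url.hostname) === "example.co.uk"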

So, why not use same-origin isolation instead? Apparently a Web API called document.domain allows a page to effectively modify its origin at runtime, with the side effect of making the origin an unreliable metric for clustering tabs. Why is it there? By allowing a few subdomains to say they are part of the same superdomain, they can easily communicate by relaxing the same-origin policy. Of course, there are weird holes with this approach, so Chromium is trying to get rid of it.
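
A minimal sketch of how that relaxation works (my example, with made-up hostnames): two documents on sibling subdomains both opt into the shared superdomain, and can then touch each other's DOM.

// On https://app.example.com, which embeds an iframe from https://cdn.example.com:
document.domain = "example.com";

// Inside the https://cdn.example.com iframe:
document.domain = "example.com";

// Both documents now compare as same-origin for DOM access, so the parent can
// reach into the frame, e.g. frames[0].document.body, without being blocked
// by the same-origin policy.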

WebKit just follows a simple one-process-per-tab model instead of dealing with these wacky definitions: two webpages are never consolidated into the same rendering process, even under high memory pressure and even if they share an eTLD+1 in their URLs. Instead, WebKit spawns a new rendering process for each tab until the system runs out of memory.

Why do they do this? To place the burden of preventing timing attacks on the OS.

Cloudflare loves to advertise Workers for everything, so they suggest routing all requests through a Worker and letting it firewall access to your bucket. However, Workers suffer from a maximum request body size of a few hundred MB, so that wasn't the right fit; I wanted to allow uploads of larger content, a few GB.

browser -> worker:  requests pre-signed URL for PUT
worker  -> browser: sends pre-signed URL for PUT
browser -> bucket:  PUT with pre-signed URL

What if there were some way to let the third party operate on R2 directly, without a middle person? Voila, there is: pre-signed URLs, URLs that provide temporary access to a resource.

Where do you get the URL? You create one with some cryptographic magic. I can use the Worker to create the pre-signed URL, so that if someone wants to PUT some object into my bucket they can call the Worker and get a URL to upload to the bucket directly.
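
In Worker terms, the shape I had in mind is roughly this (a sketch, not the final thing; it leans on the generatePresignedPutUrl function implemented later in this post, and any real version would authenticate the caller first):

export default {
    async fetch(request) {
        // Only hand out upload URLs; reject everything else
        if (request.method !== "GET")
            return new Response("method not allowed", { status: 405 });

        // e.g. GET /dog.png -> a pre-signed URL that allows PUT of dog.png
        const fileName = new URL(request.url).pathname.slice(1);
        const uploadUrl = await generatePresignedPutUrl(fileName);
        return new Response(uploadUrl);
    }
}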

I'd better get to implementing the logic to generate this magical pre-signed URL. Cloudflare fortunately provides some documentation, courtesy of Kian.

import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';
const S3 = new S3Client({ region: 'auto', endpoint: 'https://<accountid>.r2.cloudflarestorage.com', credentials: { accessKeyId: '<accesskey>', secretAccessKey: '<secretaccesskey>' } });
console.log(await getSignedUrl(S3, new PutObjectCommand({Bucket: 'my-bucket-name', Key: 'dog.png'}), { expiresIn: 3600 }))

Import @aws-sdk and then call getSignedUrl. Seems mysterious! I bet it uses some Node.js library internally.

if (this.signerOptions.runtime !== "node")
     throw new Error("This request requires signing with SigV4Asymmetric algorithm. It's only available in Node.js");
return this.getSigv4aSigner().sign(requestToSign, options);

Yup, it does, not surprising. Let's reimplement it using the Web Crypto API, a standardised, interoperable interface, instead. Even Node.js has implemented Web Crypto, along with other standard web platform APIs. Apparently, there's a runtime interoperability group called WinterCG, with Cloudflare, Deno, Vercel and, you guessed it, Node.js as members, jointly trying to fix the messed-up runtimes.

After going through countless abstraction layers, and weirdly vague yet highly specific AWS documentation, I somewhat implemented it. It's not perfect or efficient by any means. After implementing it I realised that neither S3 nor R2 verifies the signed hash of the object, so I'm using UNSIGNED-PAYLOAD here for clarity while they hopefully resolve that regression. The expiry is hardcoded to 900 seconds.

async function generatePresignedPutUrl(fileName) {

    // SigV4 wants the timestamp as YYYYMMDD'T'HHMMSS'Z' and the date as YYYYMMDD
    const date = new Date();
    const dateIso = date.toISOString().replace(/:/g, "").replace(/-/g, "").replace(/\./g, "").slice(0, -4) + "Z";
    const dateShort = dateIso.split("T")[0];

    const url = "<bucketname.accountid.r2.cloudflarestorage.com>";
    const accessKey = "<accesskey>";
    const secretAccessKey = "<secretaccesskey>";
    const region = "auto";
    const service = "s3";

    // Only the host header is signed; the payload hash is left unsigned (see above)
    const canonicalHeaders = "host:" + url;
    const queryParams = "X-Amz-Algorithm=AWS4-HMAC-SHA256" + "&X-Amz-Content-Sha256=" + "UNSIGNED-PAYLOAD"
        + "&X-Amz-Credential=" + accessKey + "%2F" + dateShort + "%2F" + region + "%2F" + service + "%2Faws4_request"
        + "&X-Amz-Date=" + dateIso + "&X-Amz-Expires=900&X-Amz-SignedHeaders=host&x-id=PutObject";

    // Canonical request: method, path, query string, headers, signed header names, payload hash
    const canonicalRequest = "PUT" + "\n" +
        "/" + fileName + "\n" +
        queryParams + "\n" +
        canonicalHeaders + "\n\n" +
        "host" + "\n" +
        "UNSIGNED-PAYLOAD";

    const canonicalRequestDigest = await crypto.subtle.digest("SHA-256", toByteArray(canonicalRequest));
    const canonicalRequestDigestHex = toHexString(canonicalRequestDigest);

    // String to sign: algorithm, timestamp, credential scope, hash of the canonical request
    const stringToSign = "AWS4-HMAC-SHA256" + "\n" +
        dateIso + "\n" +
        dateShort + "/" + region + "/" + service + "/aws4_request" + "\n" +
        canonicalRequestDigestHex;

    // Derive the signing key by chaining HMACs, starting from "AWS4" + secret key
    let signingKey = `AWS4${secretAccessKey}`;
    for (const signable of [dateShort, region, service, "aws4_request"]) {
        signingKey = await HMAC(signingKey, signable);
    }

    const signature = await HMAC(signingKey, stringToSign);
    return `https://${url}/` + fileName + "?" + queryParams + "&X-Amz-Signature=" + toHexString(signature);
}
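
The function above leans on three small helpers I didn't show: toByteArray, toHexString and HMAC. Here's a minimal sketch of what they could look like on top of Web Crypto (names kept to match the code above):

// UTF-8 encode a string into bytes for hashing/signing
function toByteArray(str) {
    return new TextEncoder().encode(str);
}

// Render an ArrayBuffer as a lowercase hex string
function toHexString(buffer) {
    return [...new Uint8Array(buffer)]
        .map((byte) => byte.toString(16).padStart(2, "0"))
        .join("");
}

// HMAC-SHA256; the key is either a string (first round of key derivation)
// or the raw bytes returned by a previous HMAC round
async function HMAC(key, message) {
    const rawKey = typeof key === "string" ? toByteArray(key) : key;
    const cryptoKey = await crypto.subtle.importKey(
        "raw", rawKey, { name: "HMAC", hash: "SHA-256" }, false, ["sign"]);
    return crypto.subtle.sign("HMAC", cryptoKey, toByteArray(message));
}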

So this uses the SigV4 technique for signing requests, with a signing key derived from the secret key and the (hashed) request as the string to sign. This prevents the key from being exposed, the request from being reused after it expires, and the signed parts of the request from being tampered with in transit. I also learned that curl supports SigV4!?
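
To close the loop, here's roughly how a client would use the Worker from the diagram earlier (a sketch; the Worker hostname and the /dog.png route are made up for illustration):

// Some large file picked by the user, e.g. from an <input type="file">
const file = document.querySelector("input[type=file]").files[0];

// 1. Ask the Worker for a pre-signed URL for the object we want to upload
const uploadUrl = await (await fetch("https://uploads.example.workers.dev/dog.png")).text();

// 2. PUT the file straight to the bucket; the multi-GB body never touches the Worker
await fetch(uploadUrl, { method: "PUT", body: file });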

Anyway, that's all for now. Hv fun!!