This is what causes XXE vulnerabilities

XML external entity (XXE) attacks are one of the OWASP Top Ten security risks. XXE attacks are caused by XML parsers that have entity processing enabled. Here’s an example of a simple Ruby program that has entity processing enabled in Nokogiri, its XML parser:
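
A minimal sketch of such a program might look like this (the payload and the exact parse call are illustrative assumptions; the important part is passing Nokogiri::XML::ParseOptions::NOENT):

require 'nokogiri'

payload = <<~XML
  <?xml version="1.0"?>
  <!DOCTYPE root [
  <!ENTITY users SYSTEM "file:///etc/passwd">
  ]>
  <root>
      <child>&users;</child>
  </root>
XML

# NOENT asks libxml2 to substitute entities, including external ones
doc = Nokogiri::XML::Document.parse(payload, nil, nil, Nokogiri::XML::ParseOptions::NOENT)
puts doc.to_xml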

This allows our XML parser to read the contents of our local filesystem; the key point is that this happens because the NOENT flag is enabled. When we run the program, we see the contents of /etc/passwd (limited to the first 10 lines for brevity):

$ ruby xxe.rb | head -n 10
<?xml version="1.0"?>
<!DOCTYPE root [
<!ENTITY users SYSTEM "file:///etc/passwd">
]>
<root>
    <child>##
# User Database
# 
# Note that this file is consulted directly only when the system is running
# in single-user mode.  At other times this information is provided by

If we ran the program with NOENT disabled, we’d see the following:

$ ruby xxe.rb | head -n 10
<?xml version="1.0"?>
<!DOCTYPE root [
<!ENTITY users SYSTEM "file:///etc/passwd">
]>
<root>
    <child>&users;</child>
</root>

In this case, we see that there’s still a reference to the users entity, and we haven’t read the contents of our local filesystem.

This raises a question: what does NOENT actually mean?

At first glance, the naming is a bit counterintuitive. NOENT looks like it means something like “no entities,” but we are processing our users external entity when that flag is enabled.

Luckily, we don’t have to search far in Nokogiri’s source code to see how NOENT is used. Nokogiri is partially implemented in Java, and we can find this code snippet in its XmlDomParserContext.java source file:
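
Paraphrasing rather than quoting the source (the exact method and field names may differ slightly), the NOENT handling amounts to:

if (options.noEnt) {
    setFeature(FEATURE_NOT_EXPAND_ENTITY, false);
} else {
    setFeature(FEATURE_NOT_EXPAND_ENTITY, true);
}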

In the same file, we find FEATURE_NOT_EXPAND_ENTITY defined like so:
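
Again paraphrasing (access modifiers omitted), it’s a constant holding a Xerces feature URI:

static final String FEATURE_NOT_EXPAND_ENTITY =
    "http://apache.org/xml/features/dom/create-entity-ref-nodes";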

To summarize what we’ve discovered so far: when NOENT is enabled, the FEATURE_NOT_EXPAND_ENTITY feature is turned off, and this is when we see our entity expanded with contents from the local filesystem. When NOENT is disabled, the FEATURE_NOT_EXPAND_ENTITY feature is turned on, and we don’t read contents from the local filesystem.

That’s a lot of consecutive negatives! Let’s reword it for clarity: when our flag is enabled, the feature that expands entities is turned on. Put this way, the behaviour is a bit clearer – we see the contents of the local filesystem because our entity-expanding feature is enabled.

Still, this doesn’t answer our original question – why the name NOENT? To answer that, we can look at Apache documentation related to the FEATURE_NOT_EXPAND_ENTITY definition shown previously. Under the definition of the http://apache.org/xml/features/dom/create-entity-ref-nodes feature, we expect the following behaviour when FEATURE_NOT_EXPAND_ENTITY is set to true:

Create EntityReference nodes in the DOM tree. The EntityReference nodes and their child nodes will be read-only.

And when it’s set to false:

Do not create EntityReference nodes in the DOM tree. No EntityReference nodes will be created, only the nodes corresponding to their fully expanded sustitution [sic] text will be created. 

In other words, when NOENT is enabled, it means that we don’t expect to see an EntityReference node in our parsed content, and our parser should replace an entity with its definition (in the case of our example, replace the users reference with the contents of /etc/passwd). If NOENT is disabled, it means we do expect to see our entity in our parsed content, and so we still see a reference to users in the output of our parser.

In conclusion: the NOENT flag does mean “no entities”, as in, “no references to entities should exist in our parsed XML.” This is why our parser replaces the users reference with the contents of /etc/passwd. This naming convention leaves plenty of room for confusion, which is why fixing the names of parser flags is actually on the Nokogiri roadmap!


Bypass GitHub’s search API rate limit by 27% (with just five lines of code!)

GitHub’s search API is a powerful tool, but its search functionality is heavily rate limited compared to the rest of its API. GitHub’s general rate limit on its API is 5000 requests per hour (roughly 83 requests per minute), while the rate limit for search requests is documented at only 30 requests per minute. This can be restrictive in some use cases, but with just five lines of code, we can increase this limit to over 40 requests per minute!

(At this point, some readers may be concerned that “over 40” divided by “30” is not, in fact, an increase of 27%. Read on to find out the source of this discrepancy!)

To begin, let’s clarify those aforementioned rate limits – these are limits on requests associated with an access token connected to our GitHub account, also known as authenticated requests. We can also query the GitHub API using unauthenticated requests (i.e. without an access token), but at a much lower rate limit – GitHub only allows 10 unauthenticated search requests per minute.

However, GitHub tracks these authenticated and unauthenticated rate limits separately! This is by design, which I confirmed with GitHub via HackerOne prior to posting. To increase our effective rate limit, we can write our application code to combine authenticated and unauthenticated API requests: our application makes an authenticated request, and if that request fails due to rate limiting, we retry it without authentication. This effectively increases our rate limit by 10 requests per minute.

Let’s illustrate with two separate code snippets – the first using only authenticated requests, and the second using both authenticated and unauthenticated requests. In both of these snippets, we try to make 50 requests in parallel to the GitHub search API via Octokit’s search_repositories method.

In this first snippet, we expect to see 30 requests succeed (returning a Sawyer::Resource) and 20 fail (returning an Octokit error), given the documented rate limit.
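
Here’s a minimal sketch of what authenticated_only.rb might look like; the search query, the GITHUB_TOKEN environment variable, and the thread-based parallelism are illustrative assumptions:

require 'octokit'

client = Octokit::Client.new(access_token: ENV['GITHUB_TOKEN'])

# Fire off 50 search requests in parallel, collecting either the
# Sawyer::Resource result or the Octokit error for each one.
threads = 50.times.map do
  Thread.new do
    begin
      client.search_repositories('language:ruby')
    rescue Octokit::Error => e
      e
    end
  end
end

results = threads.map(&:value)
succeeded, failed = results.partition { |r| r.is_a?(Sawyer::Resource) }
puts "#{succeeded.length} requests succeeded, #{failed.length} requests failed"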

Run it, and we see this output:

$ ruby authenticated_only.rb
36 requests succeeded, 14 requests failed

Oddly enough, GitHub does not appear to strictly adhere to its documented rate limit of 30 requests per minute, but our premise still holds – we can’t make all 50 requests due to GitHub’s rate limiting.

Now, let’s run the second snippet, which is five lines of code longer than the first. Here, if a request using our authenticated client fails, we retry the same request using an unauthenticated client.
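
Sketching it the same way, the only addition is an unauthenticated fallback client:

require 'octokit'

client = Octokit::Client.new(access_token: ENV['GITHUB_TOKEN'])
unauthenticated_client = Octokit::Client.new

threads = 50.times.map do
  Thread.new do
    begin
      client.search_repositories('language:ruby')
    rescue Octokit::Error
      # The authenticated request was rate limited; retry without credentials,
      # which draws from a separately tracked rate limit.
      begin
        unauthenticated_client.search_repositories('language:ruby')
      rescue Octokit::Error => e
        e
      end
    end
  end
end

results = threads.map(&:value)
succeeded, failed = results.partition { |r| r.is_a?(Sawyer::Resource) }
puts "#{succeeded.length} requests succeeded, #{failed.length} requests failed"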

We see the following output:

$ ruby authenticated_and_unauthenticated.rb
46 requests succeeded, 4 requests failed

As predicted, we’ve successfully increased our rate limit from 36 to 46 requests per minute, a 27% increase from what we could achieve previously.

I really did expect to put the number 33% in this blog post’s title, not 27% – it’s unclear to me why my authenticated client can make 36 successful requests when the search API limit is documented at 30. I observed some variation in the output of this script too, ranging from 40 to 46 successful requests.

Going back to our performance gains – is this method effective for every application using the GitHub search API? No, probably not – 10 additional requests per minute is inconsequential in a large production application at scale. In that case, there are other techniques available to avoid hitting the GitHub search API rate limit. Some examples include caching your search results from the GitHub API, or rotating GitHub credentials to multiply your effective rate limit.

However, what if you’re using GitHub’s search API at a small scale? For example, you may be using the search API in a script that runs in your local development environment, or in some sort of internal tooling. In such a scenario, you may just be occasionally hitting the authenticated request limit, but haven’t reached a point where you need a more scalable solution. In that case, these five lines of code may give you a good “bang for your buck” in solving rate limiting issues.


XXEs in Golang are surprisingly hard

Go is one of my favourite programming languages, and I’ve enjoyed working with it while levelling up my skills in application security. Out of curiosity, I wanted to find out – how easily can we cause an XML External Entity (XXE) attack in a Golang application? As it turns out, doing so is surprisingly hard.

I won’t dive too deeply into how XXE attacks work, as there’s plenty of material available on that. As a quick recap, though:

  • As a language feature, XML allows us to define a key-value mapping called an entity, which our parser uses to do string substitution on instances of that entity within a document.
  • An external entity works similarly, but the parser will load content from an external URI (this includes filesystem contents and results of HTTP calls) when doing its string substitution.
  • A poorly configured XML parser will read and use an external entity definition from an untrusted input, allowing an attacker to gain access to sensitive data on the filesystem (such as /etc/passwd, or any other file the parser has permission to read).

With that, let’s begin!

Finding a working payload

Let’s start by finding a payload that we know works in another programming language’s XML parser. Take the simple Ruby script below (note that we need to explicitly enable external entities with Nokogiri::XML::ParseOptions::NOENT):
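
A minimal sketch of such a script (the payload is illustrative; the key detail is the NOENT parse option):

require 'nokogiri'

payload = <<~XML
  <?xml version="1.0"?>
  <!DOCTYPE root [
  <!ENTITY users SYSTEM "file:///etc/passwd">
  ]>
  <root>
      <child>&users;</child>
  </root>
XML

doc = Nokogiri::XML::Document.parse(payload, nil, nil, Nokogiri::XML::ParseOptions::NOENT)
puts doc.to_xml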

Run it and show the first ten lines of output:

$ ruby xxe.rb | head -n 10
<?xml version="1.0"?>
<!DOCTYPE root [
<!ENTITY users SYSTEM "file:///etc/passwd">
]>
<root>
    <child>##
# User Database
# 
# Note that this file is consulted directly only when the system is running
# in single-user mode.  At other times this information is provided by

Success! Within the <child> element, we see the header of my /etc/passwd file, so this is a working XXE payload. Let’s try using it in Go.

First attempt

Now that we’ve verified our payload in a Ruby script, let’s try doing the same thing in Go. We’ll use encoding/xml, Go’s built-in XML parsing library.
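
Here’s a sketch of firstattempt.go; the struct shape and payload are illustrative, and it assumes Strict is set to false (which is what produces the pass-through behaviour we’re about to see):

package main

import (
	"encoding/xml"
	"fmt"
	"os"
	"strings"
)

// Illustrative payload: the &users; reference sits on line 5.
const payload = `<!DOCTYPE root [
<!ENTITY users SYSTEM "file:///etc/passwd">
]>
<root>
    <child>&users;</child>
</root>`

type Root struct {
	Child string `xml:"child"`
}

func main() {
	d := xml.NewDecoder(strings.NewReader(payload))
	d.Strict = false // non-strict mode leaves unknown entities alone as literal text

	var root Root
	if err := d.Decode(&root); err != nil {
		fmt.Printf("Error: %v\n", err)
		os.Exit(1)
	}
	fmt.Printf("XML parsed into struct: %v\n", root)
}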

Run it, and we get… 

$ go run firstattempt.go
XML parsed into struct: {&users;}

Okay, so Decoder simply treated &users; as a string literal. That’s not what we want, but what if we tried setting Decoder.Strict = true?
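
In our sketch, that’s a one-line change:

d.Strict = true // strict mode rejects entity references the decoder doesn't recognize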

$ go run firstattempt.go
Error: XML syntax error on line 5: invalid character entity &users;
exit status 1

This is slightly better – we tried to parse &users; as an entity, but Decoder doesn’t recognize it as a valid entity. Why is that?

Writing to Decoder.Entity

If we peek into the docs of encoding/xml, we see that Decoder has a field called .Entity. It’s a map[string]string that lets us define custom entities:
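
The field is declared (with documentation roughly along these lines) as:

// Entity can be used to map non-standard entity names to string replacements.
// The parser behaves as if the standard mappings (lt, gt, amp, apos, quot)
// are present in the map, regardless of the actual map content.
Entity map[string]string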

Let’s try setting .Entity. As an aside, this alone would make it difficult to carry out an XXE attack – a developer would have needed to explicitly set the .Entity map while coding their application – but, as we’ll see, there’s a lot more standing in an attacker’s way. We’ll modify our source code to write to .Entity:
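
Continuing from our sketch, we add an entry for users. The mapped value is just a string; mirroring the external entity declaration gives us:

d := xml.NewDecoder(strings.NewReader(payload))
d.Strict = true
d.Entity = map[string]string{
	"users": "SYSTEM 'file:///etc/passwd'",
}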

Run our modified program:

$ go run entitymap.go 
XML parsed into struct: {SYSTEM 'file:///etc/passwd'}

Okay, so Decoder did the substitution, but the file path is being treated as a string literal. Our parser isn’t fetching an external resource as we expect it to.

Reading encoding/xml’s source

The encoding/xml library is pretty small, so let’s dive into it! Searching for entity and entities, we quickly find that encoding/xml doesn’t do much with our entity map. In fact, this is the only reference to it:
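
Paraphrasing the decoder’s entity handling (not a verbatim excerpt): when it encounters &name;, it simply looks the name up in the map and uses the mapped string as replacement text.

// paraphrased: a plain map lookup, with the result substituted into the document text
if d.Entity != nil {
	text, haveText = d.Entity[name]
}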

After this, we don’t see any calls to os.Open(), http.Get(), or anything else that would allow us to fetch an external resource. A simple string substitution is all that this library does with our .Entity map.

Confirmation via dtrace

Our source code tells us that we’re not opening /etc/passwd, but let’s double check this with dtrace! Well, I’ll be using dtruss, a similar tool for macOS. By viewing the system calls that our program makes, we should be able to tell if /etc/passwd is being read by our parser.

$ go build entitymap.go && sudo dtruss ./entitymap  2>&1 | grep passwd
XML parsed into struct: {SYSTEM 'file:///etc/passwd'}
write(0x1, "XML parsed into struct: {SYSTEM 'file:///etc/passwd'}\n\0", 0x36)            = 54 0

$ go build entitymap.go && sudo dtruss ./entitymap  2>&1 | grep open  
open("/dev/dtracehelper\0", 0x2, 0xFFFFFFFFEFBFF040)             = 3 0
open("/dev/urandom\0", 0x0, 0x0)                 = 3 0

Grepping for passwd, we just see that we write out our string literal from our entity map, and don’t actually open the file. Grepping again for open confirms this.

Conclusion

We can’t carry out an XXE attack on Golang applications using encoding/xml, since that library doesn’t handle external entities according to the XML language specification! The docs for encoding/xml describe it as “a simple XML 1.0 parser,” which sort of implies this, but I couldn’t find any docs that explicitly call out the lack of external entity processing.


It’s unclear to me whether the designers of Golang made this decision from an application security standpoint, or if they simply decided that it wouldn’t be worth the developer time to implement this XML language feature. While I find this design decision surprising, I do agree with it – the OWASP Top 10 recommends turning off external entity processing by default, and most XML documents don’t deal with external entities.


Of course, some Golang apps can still be vulnerable to XXE. You can easily find Go bindings for libxml2, which is a full-featured XML parser with support for external entities. This is why XXEs in Golang are merely surprisingly hard, rather than impossible 🙂 But most developers will reach for the built-in encoding/xml library by default, which makes the entire ecosystem more secure.
