Poking at Chinese Firmware
I recently spent some time reversing a prehistoric smart device as a research exercise. While poking at the firmware binaries I stumbled on a bug in how the device parses its XML config file.
The vendor hasn’t issued a statement yet (shocker), so I will do my best to recreate a similar vulnerable function without accidentally disclosing any real detail that could be used to identify the device.
Understanding XML: Structure Without Safety
XML (eXtensible Markup Language) is a markup language that uses a hierarchical structure of elements enclosed in angle brackets.
Think of XML as a structuring system where information gets organized into labeled elements that can hold both data and other elements.
<device>
XML elements can also contain attributes, which provide additional metadata about the element. An attribute appears inside the opening tag and consists of a name-value pair:
<device type="thermostat" model="v2.1">Smart Home Device</device>
Nowadays XML is often compared with, and to some extent has been replaced by, JSON and YAML, which are considered lighter and simpler alternatives. It remains a standard in government, healthcare, telecom, IoT, and finance systems where strict schemas and validation are critical.
However, XML’s verbose nature and complex parsing requirements create more opportunities for security vulnerabilities. XML parsers must handle opening and closing tags, attributes, namespaces, character encoding, and various formatting edge cases.
Learning from Critical Vulnerabilities
Two critical XML parsing vulnerabilities demonstrate just how dangerous these flaws can be in production systems.
CVE-2016-1834 affected libxml2 versions before 2.9.4, a widely used XML parsing library. The vulnerability is a heap-based buffer overflow in the xmlStrncat function that allowed remote attackers to execute arbitrary code or cause a denial of service through memory corruption. The vulnerability was especially significant on Apple platforms, though unpatched Linux systems using libxml2 were also at risk.
CVE-2019-5063 affected OpenCV 4.1.0. It also involves a heap-based buffer overflow in the XML parser, triggered when very long or unrecognized character entities in an XML file are copied into a fixed-size buffer without proper bounds checking.
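The pattern behind this class of bug can be sketched in a few lines of C. This is not OpenCV's code: the function name read_entity_name and the 8-byte ENTITY_BUF_SIZE are assumptions made for illustration. The sketch shows the bounded form of the loop; the vulnerable form is the same loop with the length check removed, so a long entity name keeps writing past the end of the buffer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define ENTITY_BUF_SIZE 8  /* hypothetical fixed-size scratch buffer */

/*
 * Collects the name of a character entity such as "&amp;" into out.
 * src points just past the '&'. Returns the number of bytes stored,
 * or -1 if the name does not fit in the buffer.
 */
int read_entity_name(const char *src, char out[ENTITY_BUF_SIZE])
{
    size_t n = 0;

    assert(src != NULL && out != NULL);

    while (isalnum((unsigned char)src[n])) {
        /* Without this check the loop would overflow out. */
        if (n + 1 >= ENTITY_BUF_SIZE) {
            return -1;  /* name too long: reject instead of overflowing */
        }
        out[n] = src[n];
        n++;
    }
    out[n] = '\0';
    return (int)n;
}
```

Calling it on "amp;" stores "amp" and returns 3, while a long run of alphanumeric characters is rejected with -1 before anything is written out of bounds.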
Config Parsing in IoT Devices
To see how XML buffer overflows can happen in IoT devices, let’s look at a simple configuration parser for device credentials. This example reflects the same type of flaw found in CVE-2016-1834 and CVE-2019-5063, but in the context of parsing default admin credentials from an XML file.
Many IoT devices keep default credentials and network settings in (ideally encrypted) XML files that are loaded during startup. A typical parser reads each value out of the file and copies it into a fixed-size field of a config struct.
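A minimal sketch of that kind of parser follows. Everything specific in it is an assumption: the struct layout, the 32-byte field sizes, the function names, and the <username>/<password> tag names are hypothetical and are not taken from the real device. The flaw is deliberate: the copy length comes from the XML input and is never compared against the size of the destination field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: field names and sizes are assumptions. */
struct device_config {
    char username[32];
    char password[32];
};

/*
 * Copies the text between open_tag and close_tag into dst.
 * VULNERABLE on purpose: nothing limits the copy to the size of dst.
 */
static int extract_value(const char *xml, const char *open_tag,
                         const char *close_tag, char *dst)
{
    const char *start = strstr(xml, open_tag);
    const char *end;

    if (start == NULL) {
        return -1;
    }
    start += strlen(open_tag);
    end = strstr(start, close_tag);
    if (end == NULL) {
        return -1;
    }

    memcpy(dst, start, (size_t)(end - start));  /* no bounds check */
    dst[end - start] = '\0';
    return 0;
}

/* Fills cfg from the XML text; returns 0 on success, -1 on a missing tag. */
int parse_config(const char *xml, struct device_config *cfg)
{
    assert(xml != NULL && cfg != NULL);

    if (extract_value(xml, "<username>", "</username>", cfg->username) != 0) {
        return -1;
    }
    return extract_value(xml, "<password>", "</password>", cfg->password);
}
```

With short, well-formed values this behaves as expected. A password longer than 31 characters writes past the end of the field and into whatever sits next to it in memory.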
The full vulnerable parser can be found here.
With our parser compiled, we can inspect exactly what happens when an attacker supplies malicious input.
pwndbg xml_demo
Once inside, we can step over instructions until we hit the part where our trustworthy password is copied into memory.
─────────────────────────────────[ SOURCE (CODE) ]─────────────────────────────────
At this point we’re right at the instruction that causes the overflow, so we step over it and then inspect the memory at the struct.
pwndbg> n
Setting this memory view side by side with config.xml, we can directly see the hex representation of the ASCII characters we overflowed the memory with: 32 bytes of A (0x41), 36 bytes of B (0x42), 32 bytes of C (0x43), and 4 bytes of D (0x44).
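Those four runs add up to 32 + 36 + 32 + 4 = 104 bytes. A small helper can regenerate the same marker pattern for testing; the function name build_payload and the PAYLOAD_LEN macro are hypothetical and only the run lengths come from the dump above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Run lengths from the memory dump: 32 'A', 36 'B', 32 'C', 4 'D'. */
#define PAYLOAD_LEN (32 + 36 + 32 + 4)  /* 104 bytes */

/*
 * Fills out with the marker pattern and NUL-terminates it.
 * out_size must leave room for the terminator.
 * Returns the payload length, excluding the terminator.
 */
size_t build_payload(char *out, size_t out_size)
{
    static const struct { char ch; size_t count; } runs[] = {
        { 'A', 32 }, { 'B', 36 }, { 'C', 32 }, { 'D', 4 },
    };
    size_t pos = 0;

    assert(out != NULL && out_size > PAYLOAD_LEN);

    for (size_t i = 0; i < sizeof runs / sizeof runs[0]; i++) {
        memset(out + pos, runs[i].ch, runs[i].count);
        pos += runs[i].count;
    }
    out[pos] = '\0';
    return pos;
}
```

Using a different letter for each run makes it easy to see in the debugger which part of the input landed on which part of memory.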
This example was a significant oversimplification of how IoT/smart home devices utilize XML for parsing credentials, but it gets the point across: blindly trusting XML and dumping it into fixed-size buffers is risky.
In real-world applications, it can crash devices, overwrite important memory, or even open the door to more serious exploits just as we’ve seen with the two CVEs covered and many more that have been discovered over the years.
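One way to avoid this class of bug is to check the length of every value before copying it. The following is a minimal sketch, assuming a hypothetical helper named copy_value_bounded; it is an illustration of the idea, not a fix taken from any vendor or library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copies len bytes from src into dst and NUL-terminates the result.
 * Returns 0 on success, or -1 if the value plus terminator would not
 * fit in dst_size bytes. Nothing is written on failure.
 */
int copy_value_bounded(char *dst, size_t dst_size, const char *src, size_t len)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    if (len >= dst_size) {
        return -1;  /* reject oversized input instead of truncating silently */
    }
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}
```

Rejecting the value outright, rather than truncating it, is a design choice: a silently shortened credential can be just as surprising as a crash, and a hard failure lets the caller fall back to a known-good config.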