Nuno Santos

Call for Master's Students 2020/21


Area 4: WEB SECURITY

Have you heard of WebAssembly? Well, this is the next big thing happening on the Web! Until recently, web programming was dominated by JavaScript, especially on the client side (i.e., code that runs in the browser). On the server side, PHP and Java were the preferred languages for quite some time, but they have lost ground to JavaScript, especially since the appearance of NodeJS (now part of web stacks such as MEAN). But things have changed: there's a new kid on the block.

WebAssembly: A new kid on the block

But what is WebAssembly? WebAssembly (aka Wasm) is a new binary format for web applications. Programs are compiled into a binary representation that is very close to assembly: not quite the same, but very close. This means that we can write web applications that run very fast, nearly at native speed, and that perform sophisticated 3D rendering and CPU-intensive operations such as gaming, data analysis, and cryptography: right in the browser! That is why eBay is using it for barcode scanners, Google is using it to speed up Google Earth, etc. Some say it's the end of mobile apps as we know them: no need to install them anymore; all we need to do is visit a web page that contains the respective Wasm code and run it.

Another fantastic thing is that you don't write applications directly in WebAssembly. Instead, you write code in high-level languages like C/C++ (yes, you heard me right), Go, or Rust, and the compiler generates the Wasm binary. (Curious about it? Try the hello world.) Through a JavaScript API, web applications can combine modules written in JavaScript with WebAssembly code that handles the performance-sensitive tasks. This has an amazing benefit: all those fantastic open source projects written in C/C++ can be cross-compiled to WebAssembly. Just check out this list of projects! This means that, in the blink of an eye, thousands of desktop applications can be ported to run in the browser.

Vulnerabilities! What was old is new again. Oh my...

And here is exactly where things get interesting from the security point of view! Remember all those typical vulnerabilities that you learned about in SIRS or SS, e.g., buffer overflows, format string errors, use-after-free, etc.? Most of the code snippets you've seen in class were written in C/C++, and not by accident. The reason is that C/C++ are memory-unsafe languages, which is why people have progressively adopted safer languages, e.g., Java, for writing robust software. The problem? If you compile from an unsafe language, the vulnerabilities in the original code may carry over into the generated Wasm binary. That's right. As a result, you start seeing interesting phenomena, e.g., cross-site scripting attacks that exploit a buffer overflow inside the Wasm module of a web page! Isn't it ironic? :) And so, the big picture looks something like this:
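To make the buffer-overflow-to-XSS scenario a bit more concrete, here is a toy model in Python. It is not real WebAssembly: a single `bytearray` plays the role of a module's linear memory, and the buffer names, sizes, and layout are hypothetical. What it tries to illustrate is that Wasm bounds-checks memory accesses against the whole linear memory, not against individual buffers, so an unchecked `strcpy`-style copy can silently overwrite a neighbouring buffer, here one whose contents are later treated as HTML.

```python
# Toy model, NOT real WebAssembly: one bytearray stands in for a module's
# linear memory. Two hypothetical buffers sit side by side with no guard.
MEMORY_SIZE = 64
NAME_BUF, NAME_CAP = 0, 16    # hypothetical buffer for user-supplied input
HTML_BUF, HTML_CAP = 16, 32   # hypothetical buffer later passed to JS as HTML


def unchecked_copy(memory: bytearray, dest: int, data: bytes) -> None:
    """Behaves like strcpy: ignores the capacity of the destination buffer.

    The only check mirrors Wasm's own bounds check, which is against the
    whole linear memory rather than against individual buffers.
    """
    if dest + len(data) > len(memory):
        raise IndexError("out-of-bounds memory access (a Wasm trap)")
    memory[dest:dest + len(data)] = data


def read_c_string(memory: bytearray, start: int, cap: int) -> str:
    """Reads a NUL-terminated string of at most cap bytes."""
    raw = bytes(memory[start:start + cap])
    return raw.split(b"\0", 1)[0].decode("utf-8", errors="replace")


def render_greeting(user_input: bytes) -> str:
    memory = bytearray(MEMORY_SIZE)
    unchecked_copy(memory, HTML_BUF, b"<p>Hello!</p>\0")
    # The bug: input longer than NAME_CAP - 1 bytes spills into HTML_BUF.
    unchecked_copy(memory, NAME_BUF, user_input + b"\0")
    # In a real page, this string would now be handed to innerHTML or similar.
    return read_c_string(memory, HTML_BUF, HTML_CAP)


print(render_greeting(b"alice"))                                   # <p>Hello!</p>
print(render_greeting(b"A" * 16 + b"<img src=x onerror=alert(1)>"))  # <img src=x onerror=alert(1)>
```

With a benign name the greeting is untouched; with 16 bytes of padding followed by markup, the "HTML" buffer ends up holding attacker-controlled markup. Only a copy that runs past the end of the whole linear memory is stopped (it traps, modelled here as an `IndexError`).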

Before, web developers had to worry about security vulnerabilities in JavaScript; now, they also need to worry about WebAssembly coding flaws. In other words, the attack surface has expanded. Besides, since the performance and portability benefits are so good, people have also started running Wasm on servers (think NodeJS) and even using it to write smart contracts (e.g., Ethereum, albeit in a slightly different flavor). This means that the attack surface has expanded on the server side too.

And what's new is old again: the same old kid on the block, the bad guy

Yes, that's right: we gain performance and portability, but security hasn't improved that much. On the contrary. Do you think it's just a matter of buggy C/C++ code that will eventually disappear? Nope! New, yet-to-be-discovered vulnerabilities will surface for sure. Since Wasm code interacts with the JavaScript runtime, there are plenty of opportunities for things to go bad, really bad. And hackers are constantly on the lookout for exactly that.

And this is only part of the story: the part about flawed but benign applications. But what about malware? Yes, those of you with a sharp hacker instinct might have guessed it by now: Wasm is close to native code, so... it is perfect for... obfuscating malware! And for doing great (not-so-legal) compute-intensive things like, yes, cryptojacking! Naturally, there's much more malware designers can do with one more tool (WebAssembly) in their arsenal, e.g., writing web keyloggers that slip past antivirus software, intrusion detection systems, vulnerability scanners, etc. without a single dent.

A whole new exciting area to work on: WebAssembly security

So, this is where we are right now: starting a new line of research on WebAssembly security.

Hardly anyone in the research community has looked into this yet, so there's a lot of potential for us (and you) to make a real difference and lead the way. Ultimately, our goal is to develop new techniques and tools to detect and fix security vulnerabilities in Wasm code. And, because it's a lot of fun, we will also try to generate exploits automatically. You press a button, and the tool not only tells you that there's a vulnerability in the code, but also generates a synthetic input demonstrating that an attacker can actually exploit it, which means the vulnerability is critical. As a result, the software vendor had better fix it, otherwise... you know the story.
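To give a feel for what "press a button, get an exploit" means, here is a deliberately tiny sketch of symbolic execution in Python. Everything in it is an assumption for illustration: the mini-language (`if` and `copy` statements over symbolic input bytes), the hypothetical `PARSER` program, and the solver, which simply brute-forces byte values where a real engine would call an SMT solver. The executor explores both sides of each branch, accumulates path constraints, and at every `copy` asks whether some input makes the copied length exceed the buffer capacity; if so, that satisfying input is the synthetic exploit.

```python
from itertools import product
from typing import List, Optional, Tuple

# Expressions over symbolic input bytes (hypothetical mini-language):
#   ("in", i)           the i-th input byte
#   ("const", c)        a constant
#   ("eq"|"gt", a, b)   comparisons;   ("not", e)   negation
Expr = Tuple


def evaluate(expr: Expr, inputs: Tuple[int, ...]):
    kind = expr[0]
    if kind == "in":
        return inputs[expr[1]]
    if kind == "const":
        return expr[1]
    if kind == "not":
        return not evaluate(expr[1], inputs)
    lhs, rhs = evaluate(expr[1], inputs), evaluate(expr[2], inputs)
    if kind == "eq":
        return lhs == rhs
    if kind == "gt":
        return lhs > rhs
    raise ValueError(f"unknown expression: {kind}")


def solve(constraints: List[Expr], n_inputs: int) -> Optional[bytes]:
    """Stand-in for an SMT solver: tries every assignment of input bytes."""
    for candidate in product(range(256), repeat=n_inputs):
        if all(evaluate(c, candidate) for c in constraints):
            return bytes(candidate)
    return None


def explore(stmts: list, path: List[Expr], n_inputs: int, found: List[bytes]) -> None:
    """Symbolically executes stmts under the path constraints collected so far."""
    if not stmts:
        return
    stmt, rest = stmts[0], stmts[1:]
    if stmt[0] == "if":
        _, cond, then_branch, else_branch = stmt
        for branch, constraint in ((then_branch, cond), (else_branch, ("not", cond))):
            branch_path = path + [constraint]
            if solve(branch_path, n_inputs) is not None:  # skip infeasible paths
                explore(branch + rest, branch_path, n_inputs, found)
    elif stmt[0] == "copy":
        _, length, capacity = stmt
        overflow = ("gt", length, ("const", capacity))
        witness = solve(path + [overflow], n_inputs)
        if witness is not None:
            found.append(witness)  # a concrete input that reaches the overflow
        explore(rest, path + [("not", overflow)], n_inputs, found)
    else:
        raise ValueError(f"unknown statement: {stmt[0]}")


def generate_exploits(program: list, n_inputs: int) -> List[bytes]:
    found: List[bytes] = []
    explore(program, [], n_inputs, found)
    return found


# Hypothetical parser: if byte 0 is the magic value 0x7F, it copies
# (byte 1) bytes into a 16-byte buffer without checking the length.
PARSER = [
    ("if", ("eq", ("in", 0), ("const", 0x7F)),
        [("copy", ("in", 1), 16)],
        []),
]

print(generate_exploits(PARSER, 2))  # [b'\x7f\x11']
```

The only reported input is the magic byte 0x7F followed by a length of 17, one byte more than the 16-byte buffer holds. Brute force only works for a couple of input bytes, which is precisely why real engines hand path constraints to an SMT solver instead.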

Since Wasm code doesn't run by itself but is launched through JavaScript, we will need to detect vulnerabilities in JavaScript as well as vulnerabilities that arise from the interaction between Wasm code and JavaScript code. If we manage to do this for the client side first, we can then apply the same techniques to the server side, and vice versa. Two birds with one stone.

Last but not least, we cannot detect or fix a problem if we don't understand it. What problem? For instance, the extent of the vulnerabilities in web applications that use WebAssembly. We don't even know the exact typology of these vulnerabilities in real-world Wasm code. This is hard to anticipate because security flaws typically depend heavily on the programming language in which the program was originally written, and Wasm binaries can be compiled from many different languages. Mind you: nobody fully understands any of this yet. Therefore, WebAssembly security is a really exciting area: everything is yet to be done.

How do we plan to get there?

Well, this is not something to be done in a single master's thesis :) In fact, there are many master's and PhD theses in the making here. We have to go one step at a time. How?

Our plan of attack is twofold. First, to study and characterize the prevalence and typology of vulnerabilities, we plan to build a web crawler that collects a large corpus of Wasm code samples; this will give us the raw material for our analysis and help us learn from what's out there in the wild. Second, to detect new vulnerabilities, we need to develop a set of new techniques and tools for analyzing WebAssembly. We will focus on two of them: pattern-based static analysis and symbolic execution. If you attended SS, you know what I'm talking about; if you didn't, feel free to ask me about it.
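As a hedged first step toward both halves of that plan, the Python sketch below shows two small building blocks. The first is a filter the crawler could use to recognize Wasm binaries among downloaded resources (every module starts with the magic bytes `\0asm` followed by version 1). The second is a naive pattern-based check that parses the module's import section and flags imported functions whose names match a list of patterns. `SUSPICIOUS_PATTERNS` is a hypothetical list, inspired by names like Emscripten's `emscripten_run_script`, which evaluates a string as JavaScript; actual import names depend on the toolchain and its version, and this parser only handles the basic import kinds of the binary format.

```python
from typing import List, Tuple

WASM_MAGIC = b"\x00asm"
WASM_VERSION = b"\x01\x00\x00\x00"
IMPORT_SECTION_ID = 2
KIND_NAMES = {0: "func", 1: "table", 2: "memory", 3: "global"}
# Hypothetical, illustrative patterns: imported functions whose names suggest
# the module can hand strings to JavaScript for evaluation.
SUSPICIOUS_PATTERNS = ("run_script", "asm_const", "eval")


def is_wasm_module(blob: bytes) -> bool:
    """Crawler-side filter: Wasm binaries start with b'\\0asm' and version 1."""
    return blob[:4] == WASM_MAGIC and blob[4:8] == WASM_VERSION


def read_uleb128(data: bytes, pos: int) -> Tuple[int, int]:
    """Decodes an unsigned LEB128 integer, returning (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def read_name(data: bytes, pos: int) -> Tuple[str, int]:
    """Reads a length-prefixed UTF-8 name."""
    length, pos = read_uleb128(data, pos)
    return data[pos:pos + length].decode("utf-8"), pos + length


def skip_limits(data: bytes, pos: int) -> int:
    """Skips a limits record: a flags byte, a minimum, and an optional maximum."""
    flags = data[pos]
    _, pos = read_uleb128(data, pos + 1)   # minimum
    if flags & 0x01:                       # a maximum is present
        _, pos = read_uleb128(data, pos)
    return pos


def list_imports(module: bytes) -> List[Tuple[str, str, str]]:
    """Returns (module, field, kind) for every entry of the import section."""
    if not is_wasm_module(module):
        raise ValueError("not a WebAssembly binary")
    pos, imports = 8, []
    while pos < len(module):
        section_id = module[pos]
        size, pos = read_uleb128(module, pos + 1)
        end = pos + size
        if section_id == IMPORT_SECTION_ID:
            count, pos = read_uleb128(module, pos)
            for _ in range(count):
                mod_name, pos = read_name(module, pos)
                field, pos = read_name(module, pos)
                kind = module[pos]
                pos += 1
                if kind == 0:      # function: type index
                    _, pos = read_uleb128(module, pos)
                elif kind == 1:    # table: reference type, then limits
                    pos = skip_limits(module, pos + 1)
                elif kind == 2:    # memory: limits
                    pos = skip_limits(module, pos)
                elif kind == 3:    # global: value type and mutability bytes
                    pos += 2
                else:
                    raise ValueError(f"unsupported import kind {kind}")
                imports.append((mod_name, field, KIND_NAMES[kind]))
        pos = end  # jump to the next section
    return imports


def flag_suspicious_imports(module: bytes) -> List[Tuple[str, str]]:
    """Naive pattern match over the names of imported functions."""
    return [(mod, field) for mod, field, kind in list_imports(module)
            if kind == "func" and any(p in field for p in SUSPICIOUS_PATTERNS)]
```

A real pattern-based analyzer would look at much more than import names, e.g., at whether user-controlled data can flow into such calls; this sketch only shows where that raw information lives in the binary.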

This is, of course, a monumental task that will keep us busy (and sweating) for the next couple of years. That's why we are teaming up with some great researchers. Along with my students, I have the pleasure of embarking on this endeavour with two fantastic faculty members: José Fragoso Santos and Pedro Adão.

Currently, we have three students paving the way: one PhD student (Tiago Brito) and two master's students (Carolina Costa and Pedro Lopes). Tiago is focused on standalone JavaScript, and Carolina and Pedro on standalone WebAssembly. Carolina and Pedro are doing exploratory work on the two key areas I mentioned above: symbolic execution (Carolina) and pattern-based static analysis (Pedro). They're doing a great job and their results are really encouraging! And so we want to continue their work.

Master's thesis topics

Which brings us to the last subject of this document. There are three thesis topics that we will propose for 2020/21 in this context:

WS1: This thesis focuses on building a symbolic execution engine.
WS2: This topic is about using symbolic execution for exploit generation.
WS3: This proposal will deal with symbolic runtime calls for exploit generation.

Since they share the common goal of designing tools for WebAssembly vulnerability detection, these theses will be carried out in close collaboration with each other and will be jointly supervised by José Fragoso Santos, Pedro Adão, and me. Our PhD students, namely Tiago Brito, Nuno Sabino, and Pedro Lopes, will also work closely with us. We describe each topic in more detail on the main page. If you are interested, go have a look.
