[an error occurred while processing this directive]

Skip to Content

Computer Science Colloquia

Monday, April 16, 2012
Pieter Hooimeijer
Advisor: Wes Weimer
Attending Faculty: Dave Evans, chair; Michael Hill, minor representative; Sriram Rajamani, and Alf Weaver

10:00 AM, Rice Hall, Rm. 242

Ph.D. Dissertation Presentation
Automatically Describing Program Structure and Behavior


String-related defects are among the most prevalent and costly in modern software development. For example, in terms of frequency, cross-site scripting vulnerabilities have long surpassed traditional exploits like buffer overruns. The state of this problem is particularly disconcerting because it does not just affect legacy code: developing web applications today even when adhering to best practices and using modern library support remains error-prone.

A number of program analysis approaches aim to prevent or mitigate string-related defects; examples include static bug detectors and automated testcase generators. Traditionally, this work has relied on ad hoc built-in algorithms to reason about string-manipulating code. This arrangement is suboptimal for two reasons: first, it forces researchers to re-invent the wheel for each new analysis; and second, it does not encourage the independent improvement of domain-specific algorithms for handling strings.

In this dissertation, we present research on specialized decision algorithms for string constraints. Our high-level approach is to provide a constraint solving interface; a client analysis can use that interface to reason about strings in the same way it might use a Boolean satisfiability (SAT) solver to reason about binary state. To this end, we identify a set of string constraints that (a) captures common programming language constructs, and (b) permits efficient solving algorithms. We show that this constraint language is sufficiently expressive to model input constraints for known SQL injection vulnerabilities in a corpus of real-world PHP code. We provide a core solving algorithm together with a machine-checkable proof of its soundness and completeness relative to the semantics of the constraint language.

Next, we focus on performance. We evaluate a variety of datastructures and algorithms in a controlled setting to inform our choice of each. Our final approach is based on two insights: (1) string constraints can be cast as an explicit search problem, and (2) to solve these constraints, we can instantiate the search space lazily through incremental refinement. These insights lead to substantial performance gains relative to competing approaches; our experimental results show our prototype to be several of magnitude faster across several published benchmarks.

Program analysis research increasingly makes use of external constraint solving tools to perform tasks like bug detection, testcase generation, code optimization, and code synthesis. Existing constraint solvers exist for domains like bitvector logic and integer arithmetic. Our research represents a first step in providing similar support for strings.