As the Internet has grown in popularity, detecting and testing for security vulnerabilities have become crucial for commercial software, especially for web service applications. Vulnerability scanners, both commercial and open-source (e.g., SAINT, eEye, Nessus), were developed to achieve this goal. However, the absence of a well-defined assessment benchmark makes efficient evaluation of these scanners nearly impossible, and with ongoing research on new vulnerability scanners, the demand for such a benchmark is urgent. We are developing VULCANA, a set of open-source web service applications with systematically injected vulnerabilities. The idea is that different vulnerability scanners can be run against the benchmark, and the percentage of detected vulnerabilities, together with the resource consumption, provides a reasonable basis for evaluation.
In Spring 2009, we developed a prototype framework called Baseline, which is described in our tech report. The idea of Baseline is to coach users in picking the right web vulnerability scanner by letting them set up a baseline that a qualified scanner must meet. We can then test each candidate scanner against this baseline, revealing its effectiveness and efficiency in detecting the vulnerabilities the user cares about most.
Brief Introduction to Baseline
Most existing benchmarks have scanners scan a manually crafted website containing a number of known vulnerabilities, and rate each scanner by its percentage of successful detections. Such benchmarks can only judge which scanner better detects the fixed set of vulnerabilities chosen by the benchmark's static selection criteria. They neglect two critical questions: does the benchmark properly reflect the user's security requirements, and does it reflect the user's actual deployment environment? In helping users choose the right scanner, answering these questions is as crucial as evaluating the effectiveness and efficiency of the scanners themselves.

We propose an approach called Baseline that addresses these problems. First, a ranking system dynamically generates the most suitable selection of weaknesses based on the user's needs; this selection serves as the baseline that a qualified scanner should reach and detect. Second, a paired testing framework generates test suites from the selected weaknesses. The framework maps each weakness to a finite state machine (FSM) whose multiple end states represent different types or mutations of exploitations of that weakness, and whose transitions are determined by scanner behavior; it then combines the FSMs of the selected weaknesses into a mimicked vulnerable website. When a scanner scans this "vulnerable" website, the transitions between states are recorded, so we can evaluate the scanner by which end states were visited (effectiveness), and in how much time and over how many transitions (efficiency).
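To make the FSM idea concrete, here is a minimal sketch of how a weakness might be modeled. All names, the trigger representation, and the scoring method are illustrative assumptions, not the actual Baseline implementation:

```python
# Hypothetical sketch of Baseline's weakness-as-FSM idea (all names and
# structure are illustrative assumptions, not the real implementation).

class WeaknessFSM:
    """Models one weakness as a finite state machine.

    End states represent different exploitation mutations of the weakness;
    a transition fires when a scanner's request matches its trigger.
    """

    def __init__(self, name, transitions, end_states):
        self.name = name
        self.transitions = transitions      # {(state, trigger): next_state}
        self.end_states = set(end_states)   # reaching one = detection
        self.state = "start"
        self.visited_ends = set()
        self.transition_count = 0           # used for the efficiency metric

    def on_request(self, trigger):
        """Advance the machine if the scanner's input matches a trigger."""
        nxt = self.transitions.get((self.state, trigger))
        if nxt is not None:
            self.state = nxt
            self.transition_count += 1
            if nxt in self.end_states:
                self.visited_ends.add(nxt)

    def effectiveness(self):
        """Fraction of exploitation mutations the scanner reached."""
        return len(self.visited_ends) / len(self.end_states)


# Example: a toy reflected-XSS weakness with two exploitation mutations.
fsm = WeaknessFSM(
    "reflected-xss",
    transitions={
        ("start", "probe_form"): "form_found",
        ("form_found", "inject_script_tag"): "xss_basic",
        ("form_found", "inject_event_handler"): "xss_attr",
    },
    end_states=["xss_basic", "xss_attr"],
)
fsm.on_request("probe_form")
fsm.on_request("inject_script_tag")
print(fsm.effectiveness())  # 0.5: one of two mutations reached
```

In this sketch, combining several such machines into one mimicked website would amount to routing each scanner request to the FSM behind the page being probed and aggregating the recorded end states, times, and transition counts.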
Currently, we are looking at methods of measuring assorted aspects of web vulnerability scanners: specifically, the ability to bypass client-side validation, crawling coverage, and the capability to scan auto-generated pages.
Open research questions include:
- Currently, the Baseline framework uses regular expressions to determine the transitions between states. Can we extend Baseline with more sophisticated validation methods?
- Client-side validation seems to be neglected by most (if not all) existing scanners. Are there any drawbacks to omitting it?
- There is no existing web vulnerability repository; can we create one?
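As a concrete illustration of the regex-based transition check mentioned in the first question above, a minimal sketch (the pattern and function name are assumptions for illustration):

```python
import re

# Illustrative sketch of a regex-driven transition check: a reflected-XSS
# transition fires only if the scanner's payload contains a <script> tag.
SCRIPT_TAG = re.compile(r"<script\b[^>]*>.*?</script>",
                        re.IGNORECASE | re.DOTALL)

def transition_fires(payload: str) -> bool:
    """Return True if the payload matches the transition's trigger pattern."""
    return SCRIPT_TAG.search(payload) is not None

print(transition_fires("name=<script>alert(1)</script>"))  # True
print(transition_fires("name=Alice"))                      # False
```

A pattern like this misses equivalent exploits that avoid the `<script>` tag (for example, event-handler injection in an attribute), which is one motivation for exploring more sophisticated validation methods than regular expressions.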
Prof. Gail Kaiser, kaiser [at] cs.columbia.edu
Huning Dai, hdd2210 [at] columbia.edu
Shreemanth Hosahalli, sh2959 [at] columbia.edu