Eclipse Steady

It’s commonly accepted that using open source components with known vulnerabilities is a widespread problem that poses severe security risks to end users. In the last few years, an increasing number of developers and researchers have addressed the problem from different angles.

We were prompted to work on the security of open source supply chains after reading a whitepaper entitled “The Unfortunate Reality of Insecure Libraries” [18], which was first authored by Jeff Williams and Arshan Dabirsiaghi in 2012.

Fast-forward to 2019: My colleagues and I developed a code-centric approach to address this problem [11]. We released the solution as open source code in 2018, and the tool is now available at the Eclipse Foundation as Eclipse Steady.

Open Source Vulnerabilities Are a Significant Issue

Open source components with known vulnerabilities have been the root cause of many data breaches [19], including the infamous Equifax data breach in 2017. Accordingly, the last two editions of the Open Web Application Security Project (OWASP) top 10 most critical security risks for Web applications, published in 2013 and 2017, highlight the widespread prevalence of this security issue [1, 2].

Since 2013, detecting this issue has become easier thanks to great open source solutions such as OWASP Dependency Check [3] and Retire.js [4], as well as numerous commercial solutions, many of which can be used to check open source projects for free. All of these solutions compete in the software composition analysis (SCA) market [5], which has grown significantly over the past few years due to developers’ concerns about license compliance and security.

There Are Still Problems to Resolve

While it sounds like the problem has been resolved and we can move on, that’s not the case for two main reasons.

The first reason is technical in nature and relates to the reachability of vulnerable code in a given application context. Many tools rely on different kinds of metadata, such as Maven artifact identifiers, to detect that an application depends on a vulnerable component version. However, these tools cannot determine whether the vulnerable code can actually be executed in the context of the application being analyzed, which is necessary to assess whether an attacker can exploit the vulnerability. Components whose vulnerable code can never be executed do not require an update, and skipping such updates can save software developers and users considerable testing effort.

The second reason is our belief that the entire problem must be addressed by open source solutions such as OWASP Dependency Check, Retire.js, or Eclipse Steady. Such solutions enable widespread tool adoption by open source and commercial software developers, and remain independent of vendor-specific interests and infrastructures. As with vaccines, only very broad tool adoption can ensure that open source ecosystems are healthy and trusted.

Eclipse Steady Is a Code-Centric Approach to Open Source Vulnerabilities

Work on Eclipse Steady started at SAP Security Research in 2014 with the development of a code-centric approach to open source vulnerabilities, which boils down to identifying the individual open source methods that are vulnerable [10, 11]. This fine-grained approach allows for the application and combination of all kinds of static and dynamic program analyses, something that is out of reach for coarse-grained approaches that only map component versions to vulnerability identifiers.

What started as a research prototype has matured into an industry-grade tool that is used to scan all Java applications developed at SAP. More than 1,500 distinct projects have been scanned since 2017, and there are more than 150,000 individual scans each month. The tool has been open source since 2018 under the name “vulnerability assessment tool” [6] and is now in the process of being moved to the Eclipse Foundation as the Eclipse Steady project [7].

Note: At this time, the source code is still located in the SAP GitHub repository. However, it will soon be available in the Eclipse GitHub repository.

The Code-Centric Approach Is Crucial

The technical approach relies on identifying source code with a given vulnerability through automated analysis of so-called “fix commits.”

For example, the vulnerability CVE-2018-1000632 in Dom4j was fixed by commit e598e in its source code repository [8, 9]. As part of the commit, the method org.dom4j.tree.QNameCache.get(String,String) was modified and the method org.dom4j.QName.validateName(String) was added.

The signatures of vulnerable source code constructs, such as methods, as well as the abstract syntax trees of their vulnerable and fixed versions, are stored in a PostgreSQL database and can be consulted using a dedicated Web frontend.
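To make the stored information more concrete, the following is a minimal sketch of how the result of a fix-commit analysis could be modeled in Java. The class and field names are hypothetical and do not reflect the actual database schema of Eclipse Steady. The sample data is the Dom4j example from above, and abstract syntax trees are left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative model of a fix-commit analysis result (all names are hypothetical). */
class FixCommitModel {

    /** How the fix commit touched a source code construct. */
    enum ChangeType { ADDED, MODIFIED, DELETED }

    /** One changed construct, identified by its fully qualified signature. */
    static final class ConstructChange {
        final String signature;
        final ChangeType type;

        ConstructChange(String signature, ChangeType type) {
            this.signature = signature;
            this.type = type;
        }
    }

    /** A vulnerability together with the constructs changed by its fix commit. */
    static final class VulnerabilityRecord {
        final String vulnerabilityId;
        final String commitId;
        final List<ConstructChange> changes;

        VulnerabilityRecord(String vulnerabilityId, String commitId, List<ConstructChange> changes) {
            this.vulnerabilityId = vulnerabilityId;
            this.commitId = commitId;
            this.changes = changes;
        }

        /** Returns all changes of the given type, in their original order. */
        List<ConstructChange> changesOfType(ChangeType type) {
            List<ConstructChange> result = new ArrayList<>();
            for (ConstructChange change : changes) {
                if (change.type == type) {
                    result.add(change);
                }
            }
            return result;
        }
    }

    /** The Dom4j example from the text: one modified and one added method. */
    static VulnerabilityRecord dom4jExample() {
        return new VulnerabilityRecord(
            "CVE-2018-1000632",
            "e598e",
            Arrays.asList(
                new ConstructChange("org.dom4j.tree.QNameCache.get(String,String)", ChangeType.MODIFIED),
                new ConstructChange("org.dom4j.QName.validateName(String)", ChangeType.ADDED)));
    }
}
```

In this sketch, a call such as FixCommitModel.dom4jExample().changesOfType(FixCommitModel.ChangeType.MODIFIED) returns the single entry for QNameCache.get(String,String).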

This information is then used to detect vulnerable code in application dependencies and for static and dynamic reachability analyses.

More specifically, detecting open source vulnerabilities consists of finding the signatures of vulnerable methods in application dependencies, such as Java archives (JARs), and then checking whether the respective method body is equal or closer to the vulnerable method body or to the fixed method body (both obtained from the fix commit). This approach makes detection very precise and robust against the re-bundling of Java classes, which is a very common practice in the Java ecosystem.
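The comparison step can be illustrated with a small sketch. Note the assumptions: Eclipse Steady compares abstract syntax trees, whereas this example swaps in a much simpler token-set Jaccard similarity as a stand-in, and the class name, method names, and sample method bodies are all hypothetical. The point is only to show the decision: a method body found in a dependency is classified by whichever reference body it is closer to.

```java
import java.util.HashSet;
import java.util.Set;

/** Classifies a method body as vulnerable or fixed by similarity (illustrative only). */
class MethodBodyClassifier {

    enum Verdict { VULNERABLE, FIXED, UNDECIDED }

    /** Splits a method body into its set of word tokens (identifiers, keywords, literals). */
    static Set<String> tokens(String body) {
        Set<String> result = new HashSet<>();
        for (String token : body.split("\\W+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    /** Jaccard similarity: size of the intersection divided by size of the union. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    /** Returns the verdict for a body found in a dependency, given both reference bodies. */
    static Verdict classify(String found, String vulnerable, String fixed) {
        Set<String> foundTokens = tokens(found);
        double toVulnerable = jaccard(foundTokens, tokens(vulnerable));
        double toFixed = jaccard(foundTokens, tokens(fixed));
        if (toVulnerable > toFixed) {
            return Verdict.VULNERABLE;
        }
        if (toFixed > toVulnerable) {
            return Verdict.FIXED;
        }
        // Equal distance to both references: no automated answer is possible.
        return Verdict.UNDECIDED;
    }
}
```

Because the decision depends on the method body rather than on the name of the archive that contains it, a class copied into a differently named JAR is classified in the same way. The UNDECIDED outcome reflects cases in which neither reference is closer.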

If a given vulnerable method is found in an application dependency, Eclipse Steady can perform static and dynamic analyses, alone or in combination. Figure 1 provides an overview of findings for a sample application. Here, the red exclamation marks indicate that vulnerable code is present, while the red footprints indicate the vulnerable code is potentially reachable (according to static analysis) or has been executed (according to dynamic analysis).

Figure 1: Overview of analysis results

The static analysis uses the open source tools Wala [12] or Soot [13] to build a call graph that starts at the application methods, and then checks whether vulnerable methods appear in that graph. If any do, the analysis concludes that there is a potential execution path from an application method to a vulnerable method.

For example, Figure 2 shows that the vulnerable constructor Namespace(String,String) is reachable from the application methods processRequest, doPost, and main (highlighted in green).

Figure 2: Execution path from application method to vulnerable method
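The reachability check on a call graph can be sketched as a breadth-first search. This is a simplified illustration, not the Wala or Soot API: the call graph is assumed to be given as a plain adjacency map, and the edges in exampleGraph() are a hypothetical reconstruction of the path described for Figure 2.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Breadth-first reachability search over a call graph (illustrative only). */
class ReachabilitySketch {

    /**
     * Returns a shortest call path from any entry point to any vulnerable method,
     * or an empty list if no vulnerable method is reachable.
     */
    static List<String> findPath(Map<String, List<String>> callGraph,
                                 Collection<String> entryPoints,
                                 Set<String> vulnerableMethods) {
        // Maps each visited method to its caller; entry points map to null.
        Map<String, String> caller = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        for (String entry : entryPoints) {
            if (!caller.containsKey(entry)) {
                caller.put(entry, null);
                queue.add(entry);
            }
        }
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (vulnerableMethods.contains(current)) {
                // Walk back through the callers to rebuild the path.
                LinkedList<String> path = new LinkedList<>();
                for (String m = current; m != null; m = caller.get(m)) {
                    path.addFirst(m);
                }
                return path;
            }
            for (String callee : callGraph.getOrDefault(current, Collections.<String>emptyList())) {
                if (!caller.containsKey(callee)) {
                    caller.put(callee, current);
                    queue.add(callee);
                }
            }
        }
        return Collections.emptyList();
    }

    /** Hypothetical call graph loosely modeled on the path in Figure 2. */
    static Map<String, List<String>> exampleGraph() {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("main", Arrays.asList("doPost"));
        graph.put("doPost", Arrays.asList("processRequest", "log"));
        graph.put("processRequest", Arrays.asList("Namespace(String,String)"));
        return graph;
    }
}
```

With main as the only entry point and Namespace(String,String) as the vulnerable method, findPath returns the path main, doPost, processRequest, Namespace(String,String). An empty result only means that the call graph contains no such path. It does not prove that the code cannot be executed, for example through reflection.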

The dynamic analysis uses a dedicated Java agent to instrument all methods so the execution of vulnerable methods can be detected. This analysis can be performed during the execution of JUnit and integration tests, and with any standalone Java Virtual Machine (JVM).
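For readers unfamiliar with Java agents, the following sketch shows how an agent hooks into class loading using the standard java.lang.instrument API. It is not the Eclipse Steady agent: the class names are hypothetical, and the transformer only records which classes pass through it. A real tracing agent would rewrite the bytecode at the marked position so that every method reports its own execution.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Skeleton of a Java agent that observes class loading (illustrative only). */
class TraceAgentSketch {

    /** Names of all classes that passed through the transformer. */
    static final Set<String> SEEN_CLASSES = ConcurrentHashMap.newKeySet();

    static final class RecordingTransformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            if (className != null) {
                // The JVM passes internal names such as org/dom4j/QName.
                SEEN_CLASSES.add(className.replace('/', '.'));
            }
            // A real tracing agent would return rewritten bytecode here, with a
            // reporting call inserted at the start of every method.
            return null; // null tells the JVM to keep the class unchanged
        }
    }

    /** Called by the JVM before the application's main method when -javaagent is used. */
    public static void premain(String agentArgs, Instrumentation instrumentation) {
        instrumentation.addTransformer(new RecordingTransformer());
    }
}
```

For the JVM to pick up such an agent, it must be packaged in a JAR whose manifest names the agent class in the Premain-Class attribute, and the JVM must be started with the -javaagent option pointing to that JAR.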

Static and dynamic analyses can also be combined to overcome the weaknesses of each approach: the inability of static analysis to follow reflective calls, and the limited test coverage of dynamic analysis. When they are combined, all methods executed from the application and its dependencies during tests are used as entry points for call graph construction. Experiments have confirmed that combining the two techniques results in a 7.9 percent increase in evidence that vulnerable code is potentially executable [11].
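The benefit of the combination can be shown with a small sketch. It assumes a hypothetical call graph in which the application reaches a handler only through reflection, so the static call graph has no edge for that call, while the tests happen to execute the handler but not the vulnerable method itself. All class and method names are invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Uses methods traced during tests as extra call graph entry points (illustrative only). */
class CombinedAnalysisSketch {

    /** Returns all methods reachable from the given entry points, including the entry points. */
    static Set<String> reachableFrom(Map<String, List<String>> callGraph,
                                     Collection<String> entryPoints) {
        Set<String> visited = new HashSet<>(entryPoints);
        Deque<String> work = new ArrayDeque<>(entryPoints);
        while (!work.isEmpty()) {
            String method = work.pop();
            for (String callee : callGraph.getOrDefault(method, Collections.<String>emptyList())) {
                if (visited.add(callee)) {
                    work.push(callee);
                }
            }
        }
        return visited;
    }

    /** Entry points for the combined analysis: application methods plus traced methods. */
    static Set<String> combinedEntryPoints(Collection<String> applicationMethods,
                                           Collection<String> tracedMethods) {
        Set<String> entryPoints = new HashSet<>(applicationMethods);
        entryPoints.addAll(tracedMethods);
        return entryPoints;
    }

    /** Hypothetical graph: App.main invokes PluginHandler.handle reflectively, so no edge exists. */
    static Map<String, List<String>> exampleGraph() {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("App.main", Arrays.asList("Logger.log"));
        graph.put("PluginHandler.handle", Arrays.asList("Parser.parse"));
        graph.put("Parser.parse", Arrays.asList("Namespace(String,String)"));
        return graph;
    }
}
```

Starting from App.main alone, the vulnerable constructor Namespace(String,String) is not in the reachable set. Once the traced method PluginHandler.handle is added as an entry point, it is. Neither analysis finds the vulnerable method on its own in this example, which is the kind of additional evidence the combination is meant to provide.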

The results from static and dynamic analyses are also used to compute update metrics that help developers choose the best alternative when updating a vulnerable version to a non-vulnerable version.

For example, the metrics consider whether the component API used by the application changes and, as a result, whether an upgrade would result in compilation errors. If there are no direct API calls from application methods to open source methods (so-called touchpoints), which is typically the case with transitive dependencies, the metrics consider the stability of methods between the version in use and the respective non-vulnerable alternative.

In Figure 3, 276 of 288 methods have an identical method signature and body in version 3.17 of Maven artifact org.apache.poi:poi-ooxml.

Figure 3: Touchpoints and update metrics
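The two kinds of update metrics can be sketched as follows. The representation is an assumption made for illustration: each component version is modeled as a map from method signature to a digest of the method body, and the class and method names are hypothetical rather than taken from Eclipse Steady.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Simple update metrics between two versions of a component (illustrative only). */
class UpdateMetricsSketch {

    /** Counts methods of the version in use whose signature and body are identical in the candidate. */
    static int stableMethods(Map<String, String> inUse, Map<String, String> candidate) {
        int stable = 0;
        for (Map.Entry<String, String> method : inUse.entrySet()) {
            // A missing signature yields null, which never equals the body digest.
            if (method.getValue().equals(candidate.get(method.getKey()))) {
                stable++;
            }
        }
        return stable;
    }

    /** Share of stable methods, between 0.0 and 1.0. */
    static double stability(Map<String, String> inUse, Map<String, String> candidate) {
        if (inUse.isEmpty()) {
            return 1.0;
        }
        return (double) stableMethods(inUse, candidate) / inUse.size();
    }

    /** Touchpoints called by the application that no longer exist in the candidate version. */
    static Set<String> missingTouchpoints(Set<String> touchpoints, Map<String, String> candidate) {
        Set<String> missing = new TreeSet<>();
        for (String touchpoint : touchpoints) {
            if (!candidate.containsKey(touchpoint)) {
                missing.add(touchpoint);
            }
        }
        return missing;
    }
}
```

Applied to the numbers in Figure 3, 276 stable methods out of 288 correspond to a stability of roughly 0.96. A non-empty result from missingTouchpoints would indicate that the upgrade is likely to break compilation of the application.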

The Pros and Cons of a Code-Centric Approach

The immediate advantage of a code-centric approach is precision in detecting vulnerable code, no matter which archive or artifact contains it. In addition, it’s possible to apply a variety of software analysis techniques to determine, for example, the reachability of vulnerable code. Future extensions of Eclipse Steady could go even further, perhaps moving toward slicing (reducing) dependencies to the share of code that a given application uses.

However, we won’t hide the fact that a code-centric approach also comes with a cost for maintainers and users.

First, fix commits are not readily available for all vulnerabilities. They are sometimes referenced by Common Vulnerabilities and Exposures (CVE) entries, as is the case with CVE-2018-1000632 [9]. In many other cases, they must be collected by manually searching through issue trackers and commit histories, which can be tedious and inefficient. So far, we have collected about 1,300 fix commits, and have made them available in a dedicated repository [14].

To foster development of code-centric tools for vulnerability management independent of Eclipse Steady, we strongly recommend that open source projects mention fix commit(s) in public security advisories and communicate fix commits to the National Vulnerability Database (NVD) or to MITRE Corporation for CVE entries. In other words: Community, tell the world about your fix commits!

Second, the current implementation of Eclipse Steady requires fix commits to be analyzed once by each development organization that wants to use Steady. So far, due to license concerns, we have refrained from sharing the results of these analyses, which are basically the project source code, in the repository [14].

Third, the focus on code requires Eclipse Steady to dig deep into the specifics of different programming languages. So far, we have only developed the full breadth of analyses for Java. Python support is limited to detection of vulnerable code, and static and dynamic analyses are not yet supported.

In comparison to Eclipse Steady, open source vulnerability scanners that rely on metadata can be extended more easily toward different languages. OWASP Dependency Check, for example, fully supports Java and .NET, offers experimental support for Ruby, Node.js, and Python, and offers limited support for C/C++ build systems [3]. Being language-agnostic is particularly useful in development projects that mix different programming languages because it avoids the need for different tools.

Getting Started With Eclipse Steady

Today, using Eclipse Steady requires running several Docker containers to persist vulnerability information and analysis results. This is facilitated through the provision of Docker images on Docker Hub [15] as well as Docker Compose files and Helm charts.

Next, users must analyze fix commits to populate the local PostgreSQL database with detailed information about vulnerable methods (signatures and abstract syntax trees). This is typically done for fix commits of open source projects, such as the ones shared through the SAP repository. However, it can also be done for fix commits of proprietary software projects maintained in private source code repositories.

Once these steps are complete, Java applications can be scanned using, for example, the plugins for Maven and Gradle. Necessary configuration parameters include the URL of the backend service as well as the token of a so-called workspace, which serves as a container for scan results.

Assuming the template Maven profile [17] has been included in the application’s pom.xml file, detection of vulnerable code can be triggered using the following Maven command:

mvn -Dvulas compile vulas:app

The static analysis starting from application code can be triggered as follows:

mvn -Dvulas compile vulas:a2c

More information about the various plugin goals can be found in the comprehensive user manual available through the SAP GitHub [16].

Looking Ahead

If you’re asking yourself whether Eclipse Steady is ready for production, the clear answer is yes: SAP has been successfully running the code for almost three years. However, the effort required to continuously operate Steady in a private cloud, provide user support, and maintain the vulnerability database keeps two engineers busy full time. Even if we assume it will become easier to identify fix commits in the future, the time and effort required exceed the capacity of individual software developers and small development organizations.

It is understood that these resource requirements inhibit tool adoption, so future developments must try to lower the barrier.

First, we aim to improve management and synchronization of fix commits. It should be as easy as possible for open source project maintainers to contribute new fix commits to the repository [14]. For users of Eclipse Steady, the local synchronization and analysis must happen in a completely automated fashion.

Second, we aim to develop a version that makes the presence of an always-on central component optional. In other words, users will be able to scan their applications without the need to operate Docker containers. At the same time, bigger software organizations will be able to run Steady with a central backend. This approach gives these organizations several interesting features, including trend analyses and the potential to find all applications affected by a given vulnerability.

Third, once the signature of a vulnerable method is found in a Java archive, Steady compares its method body (Java bytecode) with the method bodies obtained from the source code repository of the respective open source project (Java source code). Today, Eclipse Steady requires running a periodic batch job that uses different strategies to perform this comparison [11]. The current implementation, however, does not always find an answer, in which case manual intervention is required. To overcome this problem, we aim to develop a better bytecode-to-source-code comparison, possibly using intermediate code representations such as Soot Jimple [13].