cipherdyne.org

Michael Rash, Security Researcher



fwknop    [Summary View]

« Previous | Next »

Integrating fwknop with the 'American Fuzzy Lop' Fuzzer

Over the past few months, the American Fuzzy Lop (AFL) fuzzer written by Michal Zalewski has become a tour de force in the security field. Many interesting bugs have been found, including a late breaking bug in the venerable cpio utility that Michal announced to the full-disclosure list. It is clear that AFL is succeeding perhaps where other fuzzers have either failed or not been applied, and this is an important development for anyone trying to maintain a secure code base. That is, the ease of use coupled with powerful fuzzing strategies offered by AFL mean that open source projects should integrate it directly into their testing systems. This has been done for the fwknop project with some basic scripting and one patch to the fwknop code base. The patch was necessary because according to AFL documentation, projects that leverage things like encryption and cryptographic signatures are not well suited to brute force fuzzing coverage, and fwknop definitely fits into this category. Crypto being a fuzzing barrier is not unique to AFL - other fuzzers have the same problem. So, similarly to the libpng-nocrc.patch included in the AFL sources, encryption, digests, and base64 encoding are bypassed when fwknop is compiled with --enable-afl-fuzzing on the configure command line. One doesn't need to apply the patch manually - it is built directly into fwknop as of the 2.6.4 release. This is in support of a major goal for the fwknop project which is comprehensive testing.

If you maintain an open source project that involves crypto and does any sort of encoding/decoding/parsing gated by crypto operations, then you should patch your project so that it natively supports fuzzing with AFL through an optional compile time step. As an example, here is a portion of the patch to fwknop that disables base64 encoding by just copying data manually:
diff --git a/lib/base64.c b/lib/base64.c
index b8423c2..5b51121 100644
--- a/lib/base64.c
+++ b/lib/base64.c
@@ -36,6 +36,7 @@
 int
 b64_decode(const char *in, unsigned char *out)
 {
-    int i, v;
+    int i;
     unsigned char *dst = out;
-
+#if ! AFL_FUZZING
+    int v;
+#endif
+
+#if AFL_FUZZING
+    /* short circuit base64 decoding in AFL fuzzing mode - just copy
+     * data as-is.
+    */
+    for (i = 0; in[i]; i++)
+        *dst++ = in[i];
+#else
     v = 0;
     for (i = 0; in[i] && in[i] != '='; i++) {
         unsigned int index= in[i]-43;
@@ -68,6 +80,7 @@ b64_decode(const char *in, unsigned char *out)
         if (i & 3)
             *dst++ = v >> (6 - 2 * (i & 3));
     }
+#endif

     *dst = '\0';
Fortunately, so far all fuzzing runs with AFL have turned up zero issues (crashes or hangs) with fwknop, but the testing continues.

Within fwknop, a series of wrapper scripts are used to fuzz the following four areas with AFL. These areas represent the major places within fwknop where data is consumed either via a configuration file or from over the network in the form of an SPA packet:

  1. SPA packet encoding/decoding: spa-pkts.sh
  2. Server access.conf parsing: server-access.sh
  3. Server fwknopd.conf parsing: server-conf.sh
  4. Client fwknoprc file parsing: client-rc.sh
Each wrapper script makes sure fwknop is compiled with AFL support, executes afl-fuzz with the corresponding test cases necessary for good coverage, archives previous fuzzing output data, and supports resuming of previously stopped fuzzing jobs. Here is an example where the SPA packet encoding/decoding routines are fuzzed with AFL after fwknop is compiled with AFL support:
$ cd fwknop.git/test/afl/
$ ./compile/afl-compile.sh
$ ./fuzzing-wrappers/spa-pkts.sh
+ ./fuzzing-wrappers/helpers/fwknopd-stdin-test.sh
+ SPA_PKT=1716411011200157:root:1397329899:2.0.1:1:127.0.0.2,tcp/22:AAAAA
+ LD_LIBRARY_PATH=../../lib/.libs ../../server/.libs/fwknopd -c ../conf/default_fwknopd.conf -a ../conf/default_access.conf -A -f -t
Warning: REQUIRE_SOURCE_ADDRESS not enabled for access stanza source: 'ANY'+ echo -n 1716411011200157:root:1397329899:2.0.1:1:127.0.0.2,tcp/22:AAAAA

SPA Field Values:
=================
   Random Value: 1716411011200157
       Username: root
      Timestamp: 1397329899
    FKO Version: 2.0.1
   Message Type: 1 (Access msg)
 Message String: 127.0.0.2,tcp/22
     Nat Access: <NULL>
    Server Auth: <NULL>
 Client Timeout: 0
    Digest Type: 3 (SHA256)
      HMAC Type: 0 (None)
Encryption Type: 1 (Rijndael)
Encryption Mode: 2 (CBC)
   Encoded Data: 1716411011200157:root:1397329899:2.0.1:1:127.0.0.2,tcp/22
SPA Data Digest: AAAAA
           HMAC: <NULL>
 Final SPA Data: 200157:root:1397329899:2.0.1:1:127.0.0.2,tcp/22:AAAAA

SPA packet decode: Success
+ exit 0
+ LD_LIBRARY_PATH=../../lib/.libs afl-fuzz -T fwknopd,SPA,encode/decode,00ffe19 -t 1000 -i test-cases/spa-pkts -o fuzzing-output/spa-pkts.out ../../server/.libs/fwknopd -c ../conf/default_fwknopd.conf -a ../conf/default_access.conf -A -f -t
afl-fuzz 0.66b (Nov 23 2014 22:29:48) by <lcamtuf@google.com>
[+] You have 1 CPU cores and 2 runnable tasks (utilization: 200%).
[*] Checking core_pattern...
[*] Setting up output directories...
[+] Output directory exists but deemed OK to reuse.
[*] Deleting old session data...
[+] Output dir cleanup successful.
[*] Scanning 'test-cases/spa-pkts'...
[*] Creating hard links for all input files...
[*] Validating target binary...
[*] Attempting dry run with 'id:000000,orig:spa.start'...
[*] Spinning up the fork server...
[+] All right - fork server is up.
And now AFL is up and running (note the git --abbrev-commit tag integrated into the text banner to make clear which code is being fuzzed): AFL SPA packets fuzzing If the fuzzing run is stopped by hitting Ctrl-C, it can always be resumed as follows:
$ ./fuzzing-wrappers/spa-pkts.sh resume
Although major areas in fwknop where data is consumed are effectively fuzzed with AFL, there is always room for improvement. With the wrapper scripts in place, it is easy to add new support for fuzzing other functionality in fwknop. For example, the digest cache file (used by fwknopd to track SHA-256 digests of previously seen SPA packets) is a good candidate.

UPDATE (11/30/2014): The suggestion above about fuzzing the digest cache file proved to be fruitful, and AFL discovered a bounds checking bug that has been fixed in this commit. The next release of fwknop (2.6.5) will contain this fix and will be made soon.

Taking things to the next level, another powerful technique that would make an interesting side project would be to build a syntax-aware version of AFL that handles SPA packets and/or configuration files natively. The AFL documentation hints at this type of modification, and states that this could be done by altering the fuzz_one() routine in afl-fuzz.c. There is already a fuzzer written in python for the fwknop project that is syntax-aware of the SPA data format (see: spa_fuzzing.py), but the mutators within AFL are undoubtedly much better than in spa_fuzzing.py. Hence, modifying AFL directly would be an effective strategy.

Please feel free to open an issue against fwknop in github if you have a suggestion for enhanced integration with AFL. Most importantly, if you find a bug please let me know. Happy fuzzing!

Software Release: fwknop-2.6.4

fwknop-2.6.4 software release The 2.6.4 release of fwknop is available for download. New functionality has been developed for 2.6.4, including a new UDP listener mode to remove libpcap as a dependency for fwknopd, support for firewalld on recent versions of Fedora, RHEL, and Centos (contributed by Gerry Reno), and support for Michal Zalewski's 'American Fuzzy Lop' fuzzer. Further, on systems where execvpe() is available, all system() and popen() calls have been replaced so that the shell is not invoked and no environment is used. As usual, fwknop has a Coverity Scan score of zero, and the code coverage report achieved by the 2.6.4 test suite is available here.

Here is the complete ChangeLog for fwknop-2.6.4:

  • [server] Added a UDP server mode so that SPA packets can be acquired via UDP directly without having to use libpcap. This is an optional feature since it opens a UDP port (and therefore requires the local firewall be opened for communications to this port), but fwknopd is careful to never send anything back to a client that sends data to this port. So, from the perspective of an attacker or scanner, fwknopd remains invisible. This feature is enabled in fwknopd either with a new command line argument --udp-server or in the fwknopd.conf file with the ENABLE_UDP_SERVER variable. When deployed in this mode, it is advisable to recompile fwknop beforehand with './configure --enable-udp-server' so that fwknopd does not link against libpcap.
  • [server] Replaced all popen() and system() calls with execvpe() with no usage of the environment. This is a defensive measure to not make use of the shell for firewall command execution, and is supported on systems where execvpe() is available.
  • (Gerry Reno) Added support for firewalld to the fwknopd daemon on RHEL 7 and CentOS 7. This is implemented using the current firewalld '--direct --passthrough' capability which accepts raw iptables commands. More information on firewalld can be found here: https://fedoraproject.org/wiki/FirewallD
  • [server] Added support for the 'American Fuzzy Lop' (AFL) fuzzer from Michal Zalewski. This requires that fwknop is compiled with the '--enable-afl-fuzzing' argument to the configure script as this allows encryption/digest short circuiting in a manner necessary for AFL to function properly. The benefit of this strategy is that AFL can fuzz the SPA packet decoding routines implemented by libfko. See the test/afl/ directory for some automation around AFL fuzzing.
  • (Bill Stubbs) submitted a patch to fix a bug where fwknopd could not handle Ethernet frames that include the Frame Check Sequence (FCS) header. This header is four bytes long, and is placed at the end of each Ethernet frame. Normally the FCS header is not visible to libpcap, but some card/driver combinations result in it being included. Bill noticed this on the following platform: BeagleBone Black rev C running 3.8.13-bone50 #1 SMP Tue May 13 13:24:52 UTC 2014 armv7l GNU/Linux
  • [client] Bug fix to ensure that a User-Agent string can be specified when the fwknop client uses wget via SSL to resolve the external IP address. This closes issue #134 on github reported by Barry Allard. The fwknop client now uses the wget '-U' option to specify the User-Agent string with a default of "Fwknop/<version>". In addition, a new command line argument "--use-wget-user-agent" to allow the default wget User-Agent string to apply instead.
  • [python module] When an HMAC key is passed to spa_data_final() then default to HMAC SHA256 if no HMAC mode was specified.

Code Coverage: Challenges For Open Source Projects

In this blog post I'll pose a few challenges to open source projects. These challenges, among other goals, are designed to push automated test suites to maximize code coverage. For motivation, I'll present code coverage stats from the OpenSSL and OpenSSH test suites, and contrast this with what I'm trying to achieve in the fwknop project.

Stated bluntly, if comprehensive code coverage is not achieved by your test suite, you're doing it wrong.

With major bugs like Heartbleed and "goto fail", there is clearly a renewed need for better automated testing. As Mike Bland demonstrated, the problem isn't so much about technology since unit tests can be built for both Heartbleed and "goto fail". Rather, the problem is that the whole software engineering life cycle needs to embrace better testing.

1) Publish your code coverage stats

Just as open source projects publish source code, test suite code coverage results should be published too. The mere act of publishing code coverage - just as open source itself - is a way to engage developers and users alike to improve such results. Clearly displayed should be all branches, lines, and functions that are exercised by the test suite, and these results should be published for every software release. Without such stats, how can users have confidence that the project test suite is comprehensive? Bugs still crop up even with good code coverage, but many many bugs will be squashed during the development of a test suite that strives to exercise every line of source code.

For a project written in C and compiled with gcc, the lcov tool provides a nice front-end for gcov coverage data. lcov was used to generate the following code coverage reports on OpenSSL-1.0.1h, OpenSSH-6.6p1, and fwknop-2.6.3 using the respective test suite in each project:

Hit Total Coverage
OpenSSL-1.0.1h Lines: 44604 96783 46.1 %
Full code coverage report Functions: 3518 6574 53.5 %
Branches: 22653 67181 33.7 %

OpenSSH-6.6p1 Lines: 18241 33466 54.5 %
Full code coverage report Functions: 1217 1752 69.5 %
Branches: 9362 22674 41.3 %

fwknop-2.6.3 Lines: 6971 7748 90.0 %
Full code coverage report Functions: 376 377 99.7 %
Branches: 4176 5239 79.7 %


The numbers speak for themselves. To be fair, fwknop has a much smaller code base than OpenSSL or OpenSSH - less than 1/10th the size of OpenSSL and about 1/5th the size of OpenSSH in terms of lines reported by gcov. So, presumaby it's a lot easier to reach higher levels of code coverage in fwknop than the other two projects. Nevertheless, it is shocking that the line coverage in the OpenSSL test suite is below 50%, and not much better in OpenSSH. What are the odds that the other half of the code is bug free? What are the odds that changes in newer versions won't break assumptions made in the untested code? What are the odds that one of the next security vulnerabilities announced in either project stems from this code? Of course, test suites are not a panacea, but there are almost certainly bugs lying in wait within the untested code. It is easier to have confidence in code that has at least some test suite coverage than code that has zero coverage - at least as far as the test suite is concerned (I'm not saying that other tools are not being used). Both OpenSSL and OpenSSH use tools beyond their respective test suites to try and maintain code quality - OpenSSL uses Coverity for example - but these tools are not integrated with test suite results and do not contribute to the code coverage stats above. What I'm suggesting is that the test suites themselves should get much closer to 100% coverage, and this may require the integration of infrastructure like a custom fuzzer or fault injection library (more on this below).

On another note, given that an explicit design goal of the fwknop test suite is to maximize code coverage, and given the results above, it is clear there is significant work left to do.

2) Make it easy to automatically produce code coverage results

Neither OpenSSL nor OpenSSH make it easy to automatically generate code coverage stats like those shown above. One can always Google for the appropriate CFLAGS and LDFLAGS settings, recompile, and run lcov, but you shouldn't have to. This should be automatic and built into the test suite as an option. If your project is using autoconf, then there should be a top level --enable-code-coverage switch (or similar) to the configure script, and the test suite should take the next steps to produce the code coverage reports. Without this, there is unnecessary complexity and manual work, and this affects users and developers alike. My guess is this lack of automation is a factor for why code coverage for OpenSSL and OpenSSH is not better. Of course, it takes a lot of effort to develop a test suite with comprehensive code coverage support, but automation is low hanging fruit.

If you want to generate the code coverage reports above, here are two trivial scripts - one for OpenSSH and another for OpenSSL. This one works for OpenSSH:
#!/bin/sh
#
# Basic script to compile OpenSSH with code coverage support via gcc
# gcov and build HTML reports with lcov.
#

LCOV_DIR=lcov-results
LCOV_FILE=coverage.info
LCOV_FILTERED=coverage_final.info
PREFIX=~/install/openssh  ### tmp path

mkdir $LCOV_DIR

make clean
./configure --with-cflags="-fprofile-arcs -ftest-coverage" --with-ldflags="-fprofile-arcs -lgcov" --prefix=$PREFIX
make
make tests

### build coverage info and filter /usr/include/ files
lcov --capture --directory . --output-file $LCOV_FILE
lcov -r $LCOV_FILE /usr/include/\* --output-file $LCOV_DIR/$LCOV_FILTERED

### create the HTML report
genhtml $LCOV_DIR/$LCOV_FILTERED --output-directory $LCOV_DIR

exit

3) Integrate a fuzzer into your test suite

If your test suite does not achieve 99%/95% function/line coverage, architectural changes should be made to reach these goals and beyond. This will likely require that test suite drive a fuzzer against your project and measure how well it exercises the code base.

Looking at code coverage results for older versions of the fwknop project was an eye opener. Although the test suite had hundreds of tests, there were large sections of code that were not exercised. It was for this reason the 2.6.3 release concentrated on more comprehensive automated test coverage. However, achieving better coverage was not a simple matter of executing fwknop components with different configuration files or command line arguments - it required the development of a dedicated SPA packet fuzzer along with a special macro -DENABLE_FUZZING built into the source code to allow the fuzzer to reach portions of code that would have otherwise been more difficult to trigger due to encryption and authentication requirements. This is a similar to the strategy proposed in Michal Zalewski's fuzzer American Fuzzy Lop here (see the "Known Limitations" section).

The main point is that fwknop was changed to support fuzzing driven by the test suite as a way to extend code coverage. It is the strong integration of fuzzing into the test suite that provides a powerful testing technique, and looking at code coverage results allows you to measure it.

Incidentally, an example double free() bug that the fwknop packet fuzzer triggered in conjunction with the test suite can be found here (fixed in 2.6.3).

4) Further extend your code coverage with fault injection

Any C project leveraging libc functions should implement error checking against function return values. The canonical example is checking to see whether malloc() returned NULL, and if so this is usually treated as an unrecoverable error like so:
char *buf = NULL;
int size = 100;

if ((buf = malloc(size)) == NULL) {
    clean_up();
    exit(EXIT_FAILURE);
}
Some projects elect to write a "safe_malloc()" wrapper for malloc() or other libc functions so that error handling can be done in one place, but it is not feasible to do this for every libc function. So, how to verify whether error conditions are properly handled at run time? For malloc(), NULL is typically returned under extremely high memory pressure, so it is hard to trigger this condition and still have a functioning system let alone a functioning test suite. In other words, in the example above, how can the test suite achieve code coverage for the clean_up() function? Other examples include filesystem or network function errors that are returned when disks fill up, or a network communication is blocked, etc.

What's needed is a mechanism for triggering libc faults artificially, without requiring the underlying conditions to actually exist that would normally cause such faults. This is where a fault injection library like libfiu comes in. Not only does it support fault injection at run time against libc functions without the need to link against libfiu (a dedicated binary "fiu-run" takes care of this), but it can also be used to trigger faults in arbitrary non-libc functions within a project to see how function callers handle errors. In fwknop, both strategies are used by the test suite, and this turned up a number of bugs like this one.

Full disclosure: libfiu does not yet support code coverage when executing a binary under fiu-run because there are problems interacting with libc functions necessary to write out the various source_file.c.gcno and source_file.c.gcda coverage files. This issue is being worked on for an upcoming release of libfiu. So, in the context of fwknop, libfiu is used to trigger faults directly in fwknop functions to see how calling functions handle errors, and this strategy is compatible with gcov coverage results. The fiu-run tool is also used, but more from the perspective of trying to crash one of the fwknop binaries since we can't (yet) see code coverage results under fiu-run. Here is an example fault introduced into the fko_get_username() function:
/* Return the current username for this fko context.
*/
int 
fko_get_username(fko_ctx_t ctx, char **username)
{   
#if HAVE_LIBFIU
    fiu_return_on("fko_get_username_init", FKO_ERROR_CTX_NOT_INITIALIZED);
#endif
With the fault set (there is a special command line argument --fault-injection-tag on the fwknopd server command line to enable the fault), the error handling code seen at the end of the example below is executed via the test suite. For proof of error handling execution, see the full coverage report (look at line 240).
/* Popluate a spa_data struct from an initialized (and populated) FKO context.
*/
static int
get_spa_data_fields(fko_ctx_t ctx, spa_data_t *spdat)
{   
    int res = FKO_SUCCESS;

    res = fko_get_username(ctx, &(spdat->username));
    if(res != FKO_SUCCESS)
            return(res);

Once again, it is the integration of fault injection with the test suite and corresponding code coverage reports that extends testing efforts in a powerful way. libfiu offers many nice features, including thread safey, the ability to enable a fault injection tag relative to other functions in the stack, and more.

5) Negative valgrind findings should force tests to fail

So far, a theme in this blog post has been better code coverage through integration. I've attempted to make the case for the integration of fuzzing and fault injection with project test suites, and code coverage stats should be produced for both styles of testing.

A third integration effort is to leverage valgrind. That is, the test suite should run tests underneath valgrind when possible (speed and memory usage may be constraints here depending on the project). If valgrind discovers a memory leak, double free(), or other problem, this finding should automatically cause corresponding tests to fail. In some cases valgrind suppressions will need to be created if a project depends on libraries or other code that is known to have issues under valgrind, but findings within project sources should cause tests to fail. For projects heavy on the crypto side, there are some instances where code is very deliberately built in a manner that triggers a valgrind error (see Tonnerre Lombard's write up on the old Debian OpenSSL vulnerability), but these are not common occurrences and suppressions can always be applied. The average valgrind finding in a large code base should cause test failure.

Although running tests under valgrind will not expand code coverage, valgrind is a powerful tool that should be tightly integrated with your test suite.

Summary

  • Publish your code coverage stats
  • Make it easy to automatically produce code coverage results
  • Integrate a fuzzer into your test suite
  • Further extend your code coverage with fault injection
  • Negative valgrind findings should automatically force tests to fail

Software Release: fwknop-2.6.3

fwknop-2.6.3 software release The 2.6.3 release of fwknop is available for download. The emphasis in this release is maximizing code coverage through a new python SPA packet fuzzer, and also on fault injection testing with the excellent fault injection library libfiu developed by Alberto Bertogli. Another important change in 2.6.3 is all IP resolution lookups in '-R' mode now happen over SSL to make it harder for an adversary to mount a MITM attack on the resolution lookup. As always, manually specifying the IP to allow through the remote firewall is safer than relying on any network communication - even when SSL would be involved.

Here is the complete ChangeLog for fwknop-2.6.3:

  • [client] External IP resolution via '-R' (or '--resolve-ip-http') is now done via SSL by default. The IP resolution URL is now 'https://www.cipherdyne.org/cgi-gin/myip', and a warning is generated in '-R' mode whenever a non-HTTPS URL is specified (it is safer just to use the default). The fwknop client leverages 'wget' for this operation since that is cleaner than having fwknop link against an SSL library.
  • Integrated the 'libfiu' fault injection library available from http://blitiri.com.ar/p/libfiu/ This feature is disabled by default, and requires the --enable-libfiu-support argument to the 'configure' script in order to enable it. With fwknop compiled against libfiu, fault injections are done at various locations within the fwknop sources and the test suite verifies that the faults are properly handled at run time via test/fko-wrapper/fko_fault_injection.c. In addition, the libfiu tool 'fiu-run' is used against the fwknop binaries to ensure they handle faults that libfiu introduces into libc functions. For example, fiu-run can force malloc() to fail even without huge memory pressure on the local system, and the test suite ensures the fwknop binaries properly handle this.
  • [test suite] Integrated a new python fuzzer for fwknop SPA packets (see test/spa_fuzzing.py). This greatly extends the ability of the test suite to validate libfko operations since SPA fuzzing packets are sent through libfko routines directly (independently of encryption and authentication) with a special 'configure' option --enable-fuzzing-interfaces. The python fuzzer generates over 300K SPA packets, and when used by the test suite consumes about 400MB of disk. For reference, to use both the libfiu fault injection feature mentioned above and the python fuzzer, use the --enable-complete option to the test suite.
  • [test suite] With the libfiu fault injection support and the new python fuzzer, automated testing of fwknop achieves 99.7% function coverage and 90.2% line coverage as determined by 'gcov'. The full report may be viewed here: http://www.cipherdyne.org/fwknop/lcov-results/
  • [server] Add a new GPG_FINGERPRINT_ID variable to the access.conf file so that full GnuPG fingerprints can be required for incoming SPA packets in addition to the abbreviated GnuPG signatures listed in GPG_REMOTE_ID. From the test suite, an example fingerprint is:
    GPG_FINGERPRINT_ID     00CC95F05BC146B6AC4038C9E36F443C6A3FAD56
    
  • [server] When validating access.conf stanzas make sure that one of GPG_REMOTE_ID or GPG_FINGERPRINT_ID is specified whenever GnuPG signatures are to be verified for incoming SPA packets. Signature verification is the default, and can only be disabled with GPG_DISABLE_SIG but this is NOT recommended.
  • [server] Bug fix for PF firewalls without ALTQ support on FreeBSD. With this fix it doesn't matter whether ALTQ support is available or not. Thanks to Barry Allard for discovering and reporting this issue. Closes issue #121 on github.
  • [server] Bug fix discovered with the libfiu fault injection tag "fko_get_username_init" combined with valgrind analysis. This bug is only triggered after a valid authenticated and decrypted SPA packet is sniffed by fwknopd:
    ==11181== Conditional jump or move depends on uninitialised value(s)
    ==11181==    at 0x113B6D: incoming_spa (incoming_spa.c:707)
    ==11181==    by 0x11559F: process_packet (process_packet.c:211)
    ==11181==    by 0x5270857: ??? (in /usr/lib/x86_64-linux-gnu/libpcap.so.1.4.0)
    ==11181==    by 0x114BCC: pcap_capture (pcap_capture.c:270)
    ==11181==    by 0x10F32C: main (fwknopd.c:195)
    ==11181==  Uninitialised value was created by a stack allocation
    ==11181==    at 0x113476: incoming_spa (incoming_spa.c:294)
    
  • [server] Bug fix to handle SPA packets over HTTP by making sure to honor the ENABLE_SPA_OVER_HTTP fwknopd.conf variable and to properly account for SPA packet lengths when delivered via HTTP.
  • [server] Add --test mode to instruct fwknopd to acquire and process SPA packets, but not manipulate firewall rules or execute commands that are provided by SPA clients. This option is mostly useful for the fuzzing tests in the test suite to ensure broad code coverage under adverse conditions.

Software Release: fwknop-2.6.0

fwknop-2.6.0 software release The 2.6.0 release of fwknop is available for download. This release incorporates a number of feature enhancements such as an AppArmor policy for fwknopd, HMAC authenticated encryption support for the Android client, new NAT criteria that are independently configurable for each access.conf stanza, and more rigorous valgrind verification powered by the CPAN Test::Valgrind module. A few bugs were fixed as well, and similarly to the 2.5 and 2.5.1 releases, the fwknop project has a Coverity defect count of zero. As proof of this, you can see the Coverity high-level defect stats for fwknop here (you'll need to sign up for an account): Coverity Scan Build Status I would encourage any open source project that is using Coverity to publish their scan results. At last count, it appears that over 1,100 projects are using Coverity, but OpenSSH is still not one of them.

Development on fwknop-2.6.1 will begin shortly, and here is the complete ChangeLog for fwknop-2.6.0:

  • (Radostan Riedel) Added an AppArmor policy for fwknopd that is known to work on Debian and Ubuntu systems. The policy file is available at extras/apparmor/usr.sbin/fwknopd.
  • [libfko] Nikolay Kolev reported a build issue with Mac OS X Mavericks where local fwknop copies of strlcat() and strlcpy() were conflicting with those that already ship with OS X 10.9. Closes #108 on github.
  • [libfko] (Franck Joncourt) Consolidated FKO context dumping function into lib/fko_util.c. In addition to adding a shared utility function for printing an FKO context, this change also makes the FKO context output slightly easier to parse by printing each FKO attribute on a single line (this change affected the printing of the final SPA packet data). The test suite has been updated to account for this change as well.
  • [libfko] Bug fix to not attempt SPA packet decryption with GnuPG without an fko object with encryption_mode set to FKO_ENC_MODE_ASYMMETRIC. This bug was caught with valgrind validation against the perl FKO extension together with the set of SPA fuzzing packets in test/fuzzing/fuzzing_spa_packets. Note that this bug cannot be triggered via fwknopd because additional checks are made within fwknopd itself to force FKO_ENC_MODE_ASYMMETRIC whenever an access.conf stanza contains GPG key information. This fix strengthens libfko itself to independently require that the usage of fko objects without GPG key information does not result in attempted GPG decryption operations. Hence this fix applies mostly to third party usage of libfko - i.e. stock installations of fwknopd are not affected. As always, it is recommended to use HMAC authenticated encryption whenever possible even for GPG modes since this also provides a work around even for libfko prior to this fix.
  • [Android] (Gerry Reno) Updated the Android client to be compatible with Android-4.4.
  • [Android] Added HMAC support (currently optional).
  • [server] Updated pcap_dispatch() default packet count from zero to 100. This change was made to ensure backwards compatibility with older versions of libpcap per the pcap_dispatch() man page, and also because some of a report from Les Aker of an unexpected crash on Arch Linux with libpcap-1.5.1 that is fixed by this change (closes #110).
  • [server] Bug fix for SPA NAT modes on iptables firewalls to ensure that custom fwknop chains are re-created if they get deleted out from under the running fwknopd instance.
  • [server] Added FORCE_SNAT to the access.conf file so that per-access stanza SNAT criteria can be specified for SPA access.
  • [test suite] added --gdb-test to allow a previously executed fwknop or fwknopd command to be sent through gdb with the same command line args as the test suite used. This is for convenience to rapidly allow gdb to be launched when investigating fwknop/fwknopd problems.
  • [client] (Franck Joncourt) Added --stanza-list argument to show the stanza names from ~/.fwknoprc.
  • [libfko] (Hank Leininger) Contributed a patch to greatly extend libfko error code descriptions at various places in order to give much better information on what certain error conditions mean. Closes #98.
  • [test suite] Added the ability to run perl FKO module built-in tests in the t/ directory underneath the CPAN Test::Valgrind module. This allows valgrind memory checks to be applied to libfko functions via the perl FKO module (and hence rapid prototyping can be combined with memory leak detection). A check is made to see whether the Test::Valgrind module has been installed, and --enable-valgrind is also required (or --enable-all) on the test-fwknop.pl command line.

Validating libfko Memory Usage with Test::Valgrind

Validating libfko Memory Usage with Test::Valgrind The fwknop project consistently uses valgrind to ensure that memory leaks, double free() conditions, and other problems do not creep into the code base. A high level of automation is built around valgrind usage with the fwknop test suite, and a recent addition extends this even further by using the excellent CPAN Test::Valgrind module. Even though the test suite has had the ability to run tests through valgrind, previous to this change these tests only applied to the fwknop C binaries when executed directly by the test suite. Further, some of the most rigorous testing is done through the usage of the perl FKO extension to fuzz libfko functions, so without the Test::Valgrind module these tests also could not take advantage of valgrind support. Now that the test suite supports Test::Valgrind (and a check is done to see if it is installed), all fuzzing tests can also be validated with valgrind. Technically, the fuzzing tests have been added as FKO built-in tests in the t/ directory, and the test suite runs them through Test::Valgrind like this:
# prove --exec 'perl -Iblib/lib -Iblib/arch -MTest::Valgrind' t/*.t
Here is a complete example - first, run the test suite like so:
# ./test-fwknop.pl --enable-all --include perl --test-limit 3

[+] Starting the fwknop test suite...

    args: --enable-all --include perl --test-limit 3

[+] Total test buckets to execute: 3

[perl FKO module] [compile/install] to: ./FKO...................pass (1)
[perl FKO module] [make test] run built-in tests................pass (2)
[perl FKO module] [prove t/*.t] Test::Valgrind..................pass (3)
[valgrind output] [flagged functions] ..........................pass (4)

    Run time: 1.27 minutes

[+] 0/0/0 OpenSSL tests passed/failed/executed
[+] 0/0/0 OpenSSL HMAC tests passed/failed/executed
[+] 4/0/4 test buckets passed/failed/executed
Note that all tests passed as shown above. This indicates that the test suite has not found any memory leaks through the fuzzing tests run via Test::Valgrind. But, let's validate this by artificially introducing a memory leak and see if the test suite can automatically catch it. For example, here is a patch that forces a memory leak in the validate_access_msg() libfko function. This function ensures that the shape of the access request conforms to something fwknop expects like "1.2.3.4,tcp/22". The memory leak happens because a new buffer is allocated from the heap but is never free()'d before returning from the function (obviously this patch is for illustration and testing purposes only):
$ git diff
diff --git a/lib/fko_message.c b/lib/fko_message.c
index fa6803b..c04e035 100644
--- a/lib/fko_message.c
+++ b/lib/fko_message.c
@@ -251,6 +251,13 @@ validate_access_msg(const char *msg)
     const char   *ndx;
     int     res         = FKO_SUCCESS;
     int     startlen    = strnlen(msg, MAX_SPA_MESSAGE_SIZE);
+    char *leak = NULL;
+
+    leak = malloc(100);
+    leak[0] = 'a';
+    leak[1] = 'a';
+    leak[2] = '\0';
+    printf("LEAK: %s\n", leak);

     if(startlen == MAX_SPA_MESSAGE_SIZE)
         return(FKO_ERROR_INVALID_DATA_MESSAGE_ACCESS_MISSING);
Now recompile fwknop and run the test suite again, after applying the patch (recompilation output is not shown):
# cd ../
# make
# test
# ./test-fwknop.pl --enable-all --include perl --test-limit 3
[+] Starting the fwknop test suite...

    args: --enable-all --include perl --test-limit 3

    Saved results from previous run to: output.last/

    Valgrind mode enabled, will import previous coverage from:
        output.last/valgrind-coverage/

[+] Total test buckets to execute: 3

[perl FKO module] [compile/install] to: ./FKO...................pass (1)
[perl FKO module] [make test] run built-in tests................pass (2)
[perl FKO module] [prove t/*.t] Test::Valgrind..................fail (3)
[valgrind output] [flagged functions] ..........................fail (4)

    Run time: 1.27 minutes

[+] 0/0/0 OpenSSL tests passed/failed/executed
[+] 0/0/0 OpenSSL HMAC tests passed/failed/executed
[+] 2/2/4 test buckets passed/failed/executed

This time two tests fail. The first is the test that runs the perl FKO module built-in tests under Test::Valgrind, and the second is the "flagged functions" test which compares test suite output looking for new functions that valgrind has flagged vs. the previous test suite execution. By looking at the output file of the "flagged functions" test it is easy to see the offending function where the new memory leak exists. This provides an easy, automated way of memory leak detection that is driven by perl FKO fuzzing tests.
# cat output/4.test
[+] fwknop client functions (with call line numbers):
       10 : validate_access_msg [fko_message.c:256]
        6 : fko_set_spa_message [fko_message.c:184]
        4 : fko_new_with_data [fko_funcs.c:263]
        4 : fko_decrypt_spa_data [fko_encryption.c:264]
        4 : fko_decode_spa_data [fko_decode.c:350]
Currently, there are no known memory leaks in the fwknop code, and automation built around the Test::Valgrind module will help keep it that way.

Port Knocking: Why You Should Give It Another Look

Port Knocking: Why you Should Give It Another Look (Update 10/20/2013: There is a Reddit comment thread going on this post here.)

It has been a decade since Port Knocking was first introduced to the security community in 2003, so it seemed fitting to recap how far the concept has evolved. Much effort has gone into solving architectural problems with PK systems, and today Single Packet Authorization (SPA) embodies the primary benefits of PK while fixing its limitations.

There are noted security researchers on both sides of the debate as to whether PK/SPA has any security value, but it is interesting that researchers who don't find value seem to concentrate on aspects of PK/SPA that have little to do with the chief benefit: cryptographically strong concealment. At least, this is the property offered by Single Packet Authorization but admittedly not necessarily with Port Knocking. Let's first go through some of the more common criticisms of PK/SPA, and show what the SPA answer is to each one. For those that haven't considered SPA in the past, perhaps it is time to give it a second look if for no other reason than to propose a method for breaking it.

1) Isn't PK/SPA just another password?

Suppose I hand you an arbitrary IP, say 2.2.2.2 that is running a default-drop firewall policy. As an attacker, you scan 2.2.2.2 and can't get any information back whatsoever. It doesn't respond to pings, Nmap cannot detect any TCP or UDP service under all scanning techniques, and any Metasploit module that relies on a TCP connect() call is ineffective. In the absence of a routing issue, it is safe to assume there is a firewall or ACL blocking all incoming scans. It is not feasible to tell whether there are any services listening (SSH or otherwise), and it also not feasible to tell whether there is a PK/SPA daemon either - the firewall is being used to do what it does best: block network traffic. The point is that there is no information coming back from the target. A PK/SPA daemon may be deployed, but the passive nature of PK/SPA makes it undiscoverable [1].

As a thought experiment, what would it take to make PK/SPA "just another password"? Well, if the PK/SPA daemon listened on a TCP socket and advertised itself via a server banner (like "fwknop-2.5.1, enter password for SSH access:") this would go a long way. Then Nmap would once again become an effective tool for finding the PK/SPA daemon, and an attacker could start to try different passwords. In other words, the daemon would no longer be passive, which is the whole point of PK/SPA to begin with.

But wait, you might say "but attackers can just try to brute force passive PK/SPA daemons anyway (even though they can't scan for them directly) and see if a port opens up", which brings us to:

2) Can PK/SPA be brute-forced?

Port Knocking implementations that use simple shared sequences are certainly vulnerable to brute forcing, and they are also vulnerable to replay attacks. These vulnerabilities (among other problems) were primary motivators for the development of SPA, and any modern SPA implementation is not vulnerable to either of these attacks. For example, fwknop uses AES in CBC mode authenticated with an HMAC SHA-256 in the encrypt-then-authenticate model, and both the encryption and HMAC keys (256 and 512 bits respectively for a total of 768 bits) are generated from random data in --key-gen mode. Further, fwknop can leverage GnuPG instead of AES, and 2048-bit GnuPG keys are fully supported. If it were practical to brute force fwknop encryption and authentication, then it would also be practical to brute force a lot of other cryptographic software too. Hence, fwknop is not vulnerable to such attacks in any practical sense [2].

Beyond this, from 1) above remember that the very existence of the SPA daemon is not discoverable by an attacker. For the average adversary, interacting with the SPA daemon must be done blindly "by chance". So, the target - even if it is running an SPA daemon which the attacker can't see - which implementation should the attacker try to brute force? If the target is running Moxie Marlinspike's knockknock (which uses AES in CTR mode authenticated with a truncated HMAC SHA-1), then the attacker needs to try and brute force the daemon with crafted TCP SYN packets via the following fields: TCP window size, TCP sequence, TCP acknowledgement, and the network layer IP ID. On the other hand, if the target is running fwknop, then the attacker would have to try and brute force the fwknopd daemon with UDP payloads to a port that the fwknopd pcap filter statement allows (although fwknopd can also be configured to only accept SPA payloads over ICMP instead). Should the attacker try to brute force the fwknop AES-CBC + HMAC SHA-256 mode? Or the GnuPG + HMAC SHA-256 mode? Further, the fwknopd daemon can place restrictions on the services that an authenticated client is authorized to request via the access.conf file. There are a lot of bits adding up, and the entire time the attacker doesn't even know whether an SPA daemon is actually running let alone which one.

Unfortunately for the attacker it gets worse. Even if the attacker could somehow brute force both the encryption and authentication steps in fwknop or other SPA software, to which service should the attacker try to make a connection? No service has to listen on the default port, so if a connection to SSH isn't answered should the attacker scan the target looking for a service? Maybe SPA is being used to conceal an IMAP daemon, a webserver, or OpenVPN instead of SSHD? Further, because the SPA daemon never acknowledges anything to begin with, the attacker can only infer that a brute force attempt was successful by seeing if a service is finally available after each attempt. So, in order for the attacker to be effective, the work flow for every brute force attempt should be: (1) brute force attempt, (2) scan for SSH (just because SPA is usually used to conceal SSH), (3) full scan if the SSH scan doesn't work. This starts to become extremely noisy to say the least. Even if full scans aren't also used, the volume of traffic just to attempt brute force operations by themselves is prohibitively huge.

Regardless of the attacker work flow, which service is concealed, or how heavily the attacker scans the target after every attempt, the brute force resistance offered by fwknop is fundamentally provided through the strength of cryptography. Getting past the authentication step alone would require breaking the 512-bit HMAC SHA-256 key (or forcing a hash collision against SHA-256), and fwknop even supports HMAC SHA-512 if one prefers that instead. Beyond this, the encryption key would also need to be brute-forced. For all intents and purposes it is not practical to brute force fwknop, and similar arguments apply to other SPA software.

3) Does PK/SPA add intolerable complexity?

Every security measure has some associated complexity. Firewalls add complexity. Encryption adds complexity. SSH itself adds complexity. If complexity were always the trump card against higher levels of security, then people would connect admin shells to sockets directly via netcat (or just run telnet) and not worry about encryption. People would not run firewalls because vanilla IP stacks without firewalling hooks would be simpler. Filesystem permissions overlays would be considered insecure because they add complexity. Obviously such viewpoints don't pass muster in the real world. The point is that using more complex code sometimes enables higher levels of security against widely understood threat models. For example, people use SSH and SSL because they want authentication and confidentiality over untrusted networks. Firewalls are used to reduce the attack surface that an adversary can easily communicate with. Application and/or filesystem layer policy controls are engineered to place restrictions on classes of users. The list continues.

This is not to say that complexity is not an important consideration - far from it. Rather, a real security benefit must be realized in order to justify increasing the complexity of a system. In the SPA community, we assert there is a security benefit afforded by passive, cryptographically strong service concealment. How would an attacker try to brute force user passwords via SSH when it is concealed by SPA? How would an attacker exploit even a zero-day vulnerability in a service protected by SPA? How would an attacker exploit a vulnerability in the SPA daemon itself when it is indistinguishable from a system that is running a default-drop firewall policy? (An attacker may interact with the SPA daemon, but it is more or less "by chance" [3].)

In the context of PK/SPA, the real issue is whether the complexity of the code that an attacker can interact with is more or less when PK/SPA is deployed. Anyone protecting networked services is probably already running a firewall, so the firewall usage by the PK/SPA software isn't adding to complexity that wasn't already there. Next, if PK/SPA is used to conceal multiple services (say, SSH, an IMAP daemon, and a webserver all at the same time), a would-be attacker cannot interact with any of those code bases without first getting past the PK/SPA daemon. It is a good bet that the complexity of the PK/SPA daemon is a lot less than the aggregate complexity of all three of those services if they were open to the world. Further, this may still be true even for a single daemon such as SSH as well. In essence, the effective complexity of code that an attacker can interact with may actually go down with PK/SPA deployed - that is, until the PK/SPA daemon can be circumvented (and hence a method for doing this becomes an important question for an attacker).

4) What happens if the PK/SPA daemon dies?

This is a solved problem. Process monitoring software has been around for decades. Many options exist for any OS on which a PK/SPA daemon is deployed. For example, on Ubuntu systems fwknopd is monitored by upstart. Having said this, fwknopd is extremely stable anyway so this feature is hardly ever needed in practice. Still, it is certainly important to ensure that PK/SPA usage does not cause a single point of failure, so using process monitoring software is a good idea.

Summary

There are other criticisms of SPA that are not included in this blog post, and certainly some of them are legitimate such as the fact that SPA requires a specialized client to access concealed services and the fact that "NAT piggybacking" is possible for users on the same network from which an SPA client is used when behind a NAT. However, these points don't generally rise to the level that they invalidate the SPA strategy. This blog post attempts to address those criticisms that could rise to this level were it not for the effort that has gone into solid SPA design by fwknop and other projects. More information on the design goals that guide fwknop can be found in the project tutorial.

In conclusion, SSL uses cryptography to provide authentication and confidentiality, Tor uses cryptography to provide anonymity, and SPA uses cryptography to conceal service existence. For those that assert there is no security value in the later strategy, it should consequently not be difficult to circumvent. To those in this camp, given the material in this post, please propose a method for breaking SPA.

[1] At least, a PK/SPA daemon is not discoverable by attackers who aren't already in a privileged position to sniff all traffic to and from the target. Clearly, most attackers - including password-guessing botnets - do not fall into this category.

[2] It is possible to weaken the security of fwknop SPA communications by not using --key-gen mode to generate random encryption and HMAC keys and thereby make them more susceptible to brute force attacks. However, this type of problems similarly affects other cryptographic software so it isn't unique to fwknop. And, even if a user doesn't use --key-gen mode, it is still not as easy to brute force fwknopd (which never confirms its existence to an attacker) as other software which an attacker can readily see is available to exploit.

[3] The security of fwknopd code itself is nonetheless quite important, and this is why the fwknop project uses static analysis provided by Coverity (and has a Coverity scan score of zero), the CLANG static analyzer, and also implements dynamic analysis with valgrind via a comprehensive test suite.

Software Release: fwknop-2.5 with HMAC Support

fwknop-2.5 with HMAC support After a long development cycle started over a year ago that has focused on how fwknop uses cryptography, the 2.5 release of fwknop is available for download. This release now includes support for HMAC authenticated encryption, with SHA-256 being the default digest algorithm though others such as SHA-512 are supported as well. The HMAC mode can be applied to SPA packets that have been encrypted with either Rijndael or GnuPG, and the order of operation is always encrypt-then-authenticate which is considered to be the most secure option among all possible orders. Not only does using the new HMAC mode provide a cryptographically strong authentication step for SPA communications, it also affords a significant security benefit because maliciously constructed SPA packets can be discarded before they are even sent through decryption routines. I.e. HMAC verification is a much more simplistic operation than decryption, and therefore generally less prone to programming bugs and potential security vulnerabilities.

There are many other enhancements in fwknop-2.5 as well such as usage of the Coverity static analyzer, a new ~/.fwknoprc stanza saving feature for fwknop client usage simplification, support for automatic Rijndael+HMAC key generation with the --key-gen option, many test suite improvements, an updated tutorial, and more. There is a robust roadmap for fwknop, and new releases will come faster now that a solid foundation is made upon HMAC authenticated encryption for SPA packets.

I wish to thank all who contributed to this effort - particularly Damien Stuart, Franck Joncourt, Blair Zajac, Michael T. Dean, and Ryman. Additional contributors are listed in the git history.

NOTE: If you are upgrading from a previous version of fwknop you will want to read the following information on backwards compatibility. In short, fwknop-2.5 is compatible with prior versions, but it requires one configuration tweak to either the client (add "-M legacy" to the command line) or server config (add "ENCRYPTION_MODE legacy" to each stanza in the access.conf file) if you wish to run a mixed environment of older clients and/or older servers. The reason for the incompatibility is that prior to 2.5, fwknop was not properly using PBKDF1 for Rijndael key derivation - this has been fixed.

Here is the complete ChangeLog for fwknop-2.5:

  • ***** IMPORTANT *****: If you are upgrading from an older version of fwknop, you will want to read the "Backwards Compatibility" section of the fwknop tutorial available here:

    http://www.cipherdyne.org/fwknop/docs/fwknop-tutorial.html#backwards-compatibility

    In summary, it is possible to have a mixed environment of fwknop-2.5 clients and/or servers with older client and/or servers, but this requires some configuration in order to work properly. On the server side, the directive "ENCRYPTION_MODE legacy" will need to be added to every access.conf stanza that uses Rijndael and that needs to support SPA packets from pre-2.5 clients. On the client side when generating Rijndael-encrypted SPA packets from a pre-2.5 server, the command line argument "-M legacy" will need to be given. GnuPG operations are not affected however and don't require the above steps whenever the new HMAC authenticated encryption feature (offered in fwknop-2.5) is not used.
  • Major release of new functionality - HMAC authenticated encryption support in the encrypt-then-authenticate model for SPA communications. Supported HMAC digests include MD5, SHA1, SHA256, SHA384, and SHA512. The default is HMAC-SHA256 when an HMAC is used. The HMAC mode is supported for both Rijndael and GPG encrypted SPA packet data, and provides a significant security benefit for the fwknopd server since the HMAC verification is more simplistic than decryption operations. This is particularly true for GPG. Beyond this, HMAC authenticated encryption in the encrypt-then-authenticate mode does not suffer from things like CBC-mode padding oracle attacks (see the Vaudenay attack and the more recent "Lucky 13" attack against SSL). HMAC verifications are performed with a constant time comparison function.
  • [libfko] Significant bug fix to honor the full encryption key length for user-supplied Rijndael keys > 16 bytes long. Previous to this fix, only the first 16 bytes of a key were actually used in the encryption/ decryption process even if the supplied key was longer. The result was a weakening of expected security for users that had keys > 16 bytes. Note that "passphrase" is perhaps technically a better word for "user-supplied key" in this context since the actual key is generated with the PBKDF1 key derivation algorithm. This issue was reported by Michael T. Dean. Closes issue #18 on github.
  • [libfko] Added the ability to maintain backwards compatibility with the now deprecated "zero padding" key derivation strategy in AES mode that was a hold over from the old perl fwknop implementation. This is NOT compliant with PBKDF1 and is only brought forward into fwknop-2.5 for backwards compatibility. Future versions of fwknop will remove this code altogether since PBKDF1 is now implemented.
  • [libfko+server] Ensure that all HMAC, digest, and other comparisons are done via a dedicated constant_runtime_cmp() function so that a potential attacker cannot gain any information about fail/success just by mounting a timing attack. This function always compares two buffers from beginning to end regardless of whether a difference is detected early on in the comparison, and this strategy mirrors changes in SSL libraries such as yaSSL to protect against potential timing attacks. This change fixes #85 on github which was reported by Ryman.
  • [test suite] Added --enable-openssl-checks to send all SPA packets encrypted via libfko through the OpenSSL library to ensure that the libfko usage of AES is always compatible with OpenSSL. This ensures that the fwknop usage of AES is properly implemented as verified by the OpenSSL library, which is a frequently audited high profile crypto engine. If a vulnerability is discovered in OpenSSL and a change is made, then the --enable-openssl-checks mode will allow the test suite to discover this in a automated fashion for fwknop.
  • The fwknop project is using Coverity for source code validation (in addition to other tools such as the CLANG static analyzer). Many bugs have been fixed in this release that were discovered by Coverity. These bugs spanned multiple classes of problems from memory leaks, improper use of sizeof(), potential double-free conditions, and more. Full details on these fixes are available in the git history. Any open source project that is written in a language supported by Coverity would benefit highly from participating. As of the 2.5 release, fwknop has a Coverity defect score of zero.
  • [test suite] Changed how the test suite interacts with the fwknop client and server by looking for indications that SPA packets are actually received. This is done by first waiting for 'main event loop' in fwknopd log output to ensure that fwknopd is ready to receive packets, sending the SPA packet(s), and then watching for for 'SPA Packet from IP' in fwknopd output. This is an improvement over the previous strategy that was only based on timeout values since it works identically regardless of whether fwknop is being run under valgrind or when the test suite is run on an embedded system with very limited resources. Another check is run for fwknopd receiving the SIGTERM signal to shutdown via 'fwknopd -K', and that failing, the test suite manually kills the process (though this should be rarely needed). This change was implemented based on discussions with George Herlin.
  • (Franck Joncourt) Added support for resolving hostnames in various NAT modes (fixes issue #43 in github).
  • (Franck Joncourt) Bug fix in the client for resolving hostnames in '-P icmp' mode (fixes issue #64).
  • (Franck Joncourt) Added support for saving fwknop client command line arguments via a new options --save-rc-stanza.
  • (Franck Joncourt) Added log module support for the client.
  • [client] Bug fix for --nat-rand-port mode to ensure that the port to be NAT'd is properly defined so that the fwknopd server will NAT connections to this port instead of applying the NAT operation to the port that is to be accessed via -A. This change also prints the randomly assigned port to stdout regardless of whether --verbose mode is used (since if not then the user will have no idea which port is actually going to be NAT'd on the fwknopd side).
  • (Vlad Glagolev) Submitted an OpenBSD port for fwknop-2.0.4, and this has been checked in under the extras/openbsd/fwknop-2.0.4 directory.
  • (Shawn Wilson) Added better SPA source IP logging for various fwknopd logging messages. This helps to make it more clear why certain SPA packets are rejected from some systems.
  • [client] Added --get-hmac-key to allow HMAC keys to be acquired from the specified file similarly to the --get-key option. This is a convenience only, and the fwknop rc file feature should be used instead since it is far more powerful.

Coverity Static Analysis and Open Source Software

Coverity Scan A few months ago Coverity announced that they would grant open source projects access to their "Scan" static analysis tool for free. This was certainly a bold move, and one that has the potential to improve the security landscape as more projects take advantage of it. The first time I had encountered Coverity was several years ago when Linux kernel ChangeLog entries were referencing Coverity's Scan tool as having found various classes of bugs in the kernel sources. This impressed me because the Linux kernel source code is not the easiest code to grok, and at the same time it is extremely widely deployed and audited so bugs get reported and fixed. Andrew Morton, second in command of the kernel, has pegged the time required to become proficient as a kernel developer at about one year of full time dedicated effort for an experienced systems programmer. Hence, any automated static analysis infrastructure that is good enough to find kernel bugs and have it credited in the kernel ChangeLog is a strong vote of confidence to my mind. Here is a recent instance of Coverity finding a NULL pointer dereference bug in Infiniband driver code for example.

Even though Coverity started being used by the Linux kernel development community circa 2005, Coverity was expensive and therefore not generally available to open source projects that typically do not have access to significant resources. Today, that has all changed with Coverity's open source initiative, and the community is better off. Simultaneously, Coverity also gets a benefit in terms of free press as more open source projects start using Coverity Scan. As of this writing, over 300 projects have signed up including fwknop, and Coverity maintains a list of these here. OpenVPN and OpenSSL are participating, but it doesn't look like OpenSSH is yet and I (for one) hope they do eventually.

Here is an example of a bug that Coverity found in the fwknop client:

index fea3679..2532669 100644 (file)
@@ -552,6 +552,7 @@ send_spa_packet_http(const char *spa_data, const int sd_len,
                 log_msg(LOG_VERBOSITY_ERROR,
                     "[-] proxy port value is invalid, must be in [%d-%d]",
                     1, MAX_PORT);
+                free(spa_data_copy);
                 return 0;
             }
         }


This is a classic memory leak bug in the fwknop client that is only triggered when an invalid port number is specified from the command line when sending SPA packet data through an HTTP proxy (this is not a widely used method for SPA packet transmission, but it is supported by the fwknop client along with other transmission modes). The spa_data_copy variable points to heap allocated memory that wasn't free()'d after the error condition was detected and before returning from the send_spa_packet_http() function call. Coverity was able to spot this bug through static analysis, and the diff output above shows the corresponding fix. In terms of other bugs found by Coverity, they have all been fixed in the current fwknop sources in preparation for the 2.5 release. A -pre release is available through the new Github releases feature, and will likely become 2.5 within a week: fwknop-2.5-pre3.tar.gz.

This same bug could also have been caught at run time with the excellent valgrind "memcheck" tool, but the invalid port number error would have had to be triggered in order for valgrind to detect the memory leak. While the fwknop test suite is fairly comprehensive and can automatically run every test underneath valgrind to help with the detection of memory leaks, not every error condition is triggered by the test suite and hence static analysis is quite important. At the same time, run time analysis is also important to pick up leaks and other problems that static analysis techniques may miss, so running valgrind will continue to be a critical test suite component.

Now, on the vulnerability research side, there are implications for vulnerability discovery. That is, source control changelogs that mention Coverity or other static analysis tools can help malicious actors find potentially exploitable code. Here is an example Github search that shows all issues where Coverity is mentioned. Is this a bad thing? Well, in a word I would say "no". The exploit community is constantly looking for ways to exploit software regardless of whether static analysis tools are used by project developers. The real problem is that bugs are frequently never discovered or fixed in the first place. Further, keeping bugs under wraps in a vein effort to hope they won't be found has never worked, and this is true regardless of whether a project is open source or not (binary patches released by Microsoft are reversed engineered to figure out how to exploit unpatched code for example). What we need is better, more security conscious software engineering - not imagined bug secrecy. Consistent, advertised usage of static analysis tools helps to drive this. In addition, static analysis tools should be seen as a piece of the bug hunting puzzle and be used in conjunction with other standard techniques such as dynamic analysis, unit tests, and good old fashioned code reviews.

Finally, in terms of Coverity's own competition such as Veracode, Klocwork, or Checkmarx, it seems to me that Coverity has made a shrewd move by offering code analysis to open source projects for free. They will continue to make important inroads into the OS community (which has become quite powerful over the past, say, 15 years and longer), and will most likely see a gain on the commercial side as a result.

The bottom line is that Coverity has an excellent static analysis product, and if you run or contribute to an open source project written in C/C++, you should be using Coverity Scan. It will likely find bugs that can certainly have security implications in your code. Even if Coverity doesn't turn up any bugs, additional verification is always a good thing - especially for security projects.

ShmooCon 2013 Talk on fwknop

ShmooCon 2013 Update 03/24/2013: All presentation videos have been uploaded to shmoocon.org, and here is the full video for the talk discussed below: video.

This past weekend at ShmooCon 2013 I gave a talk entitled "Generalized Single Packet Authorization for Cloud Computing Environments" (slides, video demo), and in this blog post I thought it appropriate to give an introduction to the talk in blog format. The main point of the talk was to show that it is completely feasible to integrate SPA as implemented by fwknop with Cloud Computing environments that conform to the Infrastructure as a Service (IaaS) model. In particular, it was shown that SPA can be used to protect and access both SSH and RDP services running on EC2 instances within a Virtual Private Cloud (VPC) in Amazon's AWS environment. Accessing SSHD via SPA by itself is nothing new and has been a primary use case supported by fwknop for a long time, but maintaining a default-drop packet filtering stance for RDP is a different story due to the fact that fwknop does not support a Windows firewall. However, through the use of both SNAT and DNAT capabilities within iptables, the fwknop daemon is able to convert an Ubuntu EC2 instance running within Amazon's Cloud into an "SPA gateway" that is able to then allow access to any service running on any other system within the VPC network - including RDP on a separate Windows host. Further, such access is granted through a single externally routable Amazon Elastic IP that is associated with the Ubuntu system. The Windows host doesn't even need to have a default route back out to the Internet - it always just sees incoming RDP connections from the Ubuntu host even though they really originate out on the Internet from whatever system is allowed via the SPA protocol. Here is a graphic to illustrate this scenario: fwknop + Amazon integration If you want a video demonstration of this all in action, this demo was given during the ShmooCon talk:


Now, for more explanation, the following systems are depicted in the graphic above:

  • The SPA client system that runs the fwknop client to send SPA packets and from which SSH and RDP connections originate.
  • The Amazon border firewall.
  • An Ubuntu host on the VPC network where the fwknop daemon runs.
  • A Windows system on the VPC network where RDP lives.
At the same time, the Ubuntu VPC host is also running a webserver. This is to illustrate that everyone out on the Internet can scan the externally routable Elastic IP (107.23.221.41 in this case) associated with the Ubuntu host and see that Apache is listening on port 80. (Note that this IP is not currently associated with my Amazon account.) But, no other services are visible to a scanner on the Ubuntu system because iptables is deployed in a default-drop stance for everything except for port 80. Even so, fwknop is able to create access through the iptables policy for SSH and RDP through the Ubuntu system over port 80 (i.e. SPA "ghost services"). These two scripts that are shown in the demo video illustrate gaining access to SSH and RDP:
$ cat ssh_access.sh
#!/bin/sh -x

fwknop -A tcp/80 --server-port 40001 -R -D 107.23.221.41
sleep 2
ssh -p 80 -i ~/.ssh/cdyne-vpc-testing.pem ubuntu@107.23.221.41
exit

$ cat rdp_access.sh
#!/bin/sh -x

fwknop -A tcp/80 --server-port 40001 -N 10.0.0.223,3389 -R -D 107.23.221.41
sleep 2
rdesktop -u Administrator 107.23.221.41:80
exit

To see all of the above in action just watch the video. Hopefully the talk itself will be posted online soon, and when it is I'll post a link. More information can also be found in the talk slides.

Finally, I'd like to thank the ShmooCon organizers for putting on a great conference as usual, and I'm looking forward to next year.
« Previous | Next »