Unicode character bits thrown away

Bug #313472 reported by Matt Giuca
252
Affects Status Importance Assigned to Milestone
PySGP
Triaged
High
Unassigned

Bug Description

This is a bug in the original SuperGenPass implementation (http://supergenpass.com). We have faithfully reproduced this bug in PySGP, to maintain compatibility with the original. But we'd like to solve it, hence this bug report.

When SGP processes the salted string, it bitwise-ands each character's Unicode code point with 0xFF (I'm not sure if this was their intention). This means it's throwing away some information from the string - all but the lowest 8 bits of each character. This is a (minor) potential security vulnerability, as all strings with the same low 8 bits in each character will generate the same password.

To demonstrate, try sgp(passwd="foo", url="google.com").

With the master password "foo" (code points U+0066, U+006F, U+006F), you get "mSp9jooep2".
Now try the password "fůo" (code points U+0066, U+016F, U+006F). "mSp9jooep2".
Or even "Ѧůᑯ" (code points U+0466, U+016F, U+146F). "mSp9jooep2".
(ie. any characters where the last two hex digits are the same => same hash).

This is not very practical to exploit, but there is a small potential, IF unicode is allowed in domain names (which it currently isn't ~really~ but you sort of can, and will probably be in the future). I could create a site at "gůůgle.com" (try it) which will generate the same password as your login for google.com. If I assume you're using this algorithm, I can steal your Google password.

Not sure why you'd trust a site called "gůůgle.com", it's clearly a phishing site, but some people might.

Source Code Location

This "bug" is implemented in the function str2binl in supergenpass.py, and clearly marked in comments.

Plan For Fix

The correct behaviour would be to encode the string to UTF-8 first. I suspect this wasn't done originally because JavaScript has no built-in way to do this. Note that fixing it this way would also make it easier to port the algorithm to other languages which don't support Unicode at all, such as PHP and C.

I doubt you could fix this without breaking the algorithm for anybody who happens to have used Unicode in their password. An alternative version could be made with the correct behaviour, but it would have to be marked as incompatible. To be discussed.

Matt Giuca (mgiuca)
Changed in pysgp:
importance: Undecided → Wishlist
status: New → Confirmed
Revision history for this message
bugmenot (bugmenot) wrote :

This bug has been fixed in SuperGenPass 2.0 -- now live.

I stumbled upon this project by accident, and this was the first I'd heard of this bug. Wish you'd sent it to me, I would have fixed it much sooner.

Revision history for this message
Matt Giuca (mgiuca) wrote :

Oh, great.

Sorry we didn't contact you about it. We were trying to faithfully reproduce the algorithm and figured this "bug" was not actually fixable without breaking the algorithm itself.

Revision history for this message
goto (gotolaunchpad) wrote :

It does "break" the algorithm in the sense that master passwords or domains with Unicode characters now produce different passwords. But it was worth fixing, in my opinion, for the security concern (google.com/göögle.com).

Revision history for this message
Matt Giuca (mgiuca) wrote :

Now that this bug is fixed in the main SuperGenPass algorithm, we need to fix it to remain compatible.

Changed in pysgp:
importance: Wishlist → High
status: Confirmed → Triaged
To post a comment you must log in.
This report contains Public Security information  
Everyone can see this security related information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.