A while back 3APA3A and Arian Evans (Director of Operations, WhiteHat Security) left off on a Full-Disclosure thread about an interesting encoding bypass attack using the Unicode Left/Right-Pointing Double Angle Quotation Marks.
By the way: I saw the Unicode Left-Pointing Double Angle Quotation Mark (%u00AB) / Unicode Right-Pointing Double Angle Quotation Mark (%u00BB) are sometimes translated to '<' and '>'. Has anybody experimented with these in different environments to bypass filtering this way?
Arian promised to get back to 3APA3A after scanning several hundred production websites using WhiteHat Sentinel, a huge R&D benefit of the platform. Two years later there is data to share. We've been busy, but hey, better late than never, right? :) As it turned out, 3APA3A was correct! Arian discovered a small number of Web applications vulnerable to the encoding technique, and they add up if the sample pool is large enough. Sample pools ranged from 300 to roughly 1,000 websites. Remember, these are collapsed numbers, meaning multiple vulnerable inputs on the same Web application are grouped together and counted once.
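To make the bug class concrete, here is a minimal sketch of the pattern 3APA3A described: a filter strips literal angle brackets, but a later charset down-conversion best-fits the angle-quote lookalikes to ASCII '<' and '>'. The `BEST_FIT` table and function names below are illustrative assumptions, not any specific platform's mapping.

```python
# Hypothetical best-fit table: some charset conversions map angle-quote
# lookalikes down to ASCII angle brackets (illustrative mapping only).
BEST_FIT = {
    "\u00AB": "<",  # « LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
    "\u00BB": ">",  # » RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
    "\u2039": "<",  # ‹ SINGLE LEFT-POINTING ANGLE QUOTATION MARK
    "\u203A": ">",  # › SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
}

def naive_filter(s: str) -> str:
    """Strip literal angle brackets -- the only thing this filter checks."""
    return s.replace("<", "").replace(">", "")

def transliterate(s: str) -> str:
    """Later down-conversion step that best-fits lookalikes to ASCII."""
    return "".join(BEST_FIT.get(ch, ch) for ch in s)

# The payload contains no literal '<' or '>', so the filter passes it
# through untouched; transliteration then produces real markup.
payload = "\u00ABscript\u00BBalert(1)\u00AB/script\u00BB"
out = transliterate(naive_filter(payload))
print(out)  # <script>alert(1)</script>
```

The ordering is the whole bug: filter first, normalize second. Run the normalization before (or instead of relying on) the filter and the bypass disappears.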
11 exploitable XSS in 8 websites:
15 exploitable XSS in 12 sites:
〈script〉
2 in 2:
1 in 1:
‹script›
1 in 1:
1 in 1:
⟨script⟩
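The bracket lookalikes from the results above can be turned into a quick probe list for scanner input. The exact codepoints below (e.g. U+3008 vs. U+2329 for the angle brackets) are my reading of the rendered characters, so treat them as assumptions to verify against your own target.

```python
# Bracket-lookalike pairs seen in the results above (codepoints assumed
# from the rendered characters).
PAIRS = [
    ("\u00AB", "\u00BB"),  # « » double angle quotation marks
    ("\u2039", "\u203A"),  # ‹ › single angle quotation marks
    ("\u3008", "\u3009"),  # 〈 〉 CJK angle brackets
    ("\u27E8", "\u27E9"),  # ⟨ ⟩ mathematical angle brackets
]

# Build one probe string per pair and print each with its codepoints.
probes = [f"{left}script{right}" for left, right in PAIRS]
for probe in probes:
    print(probe, " ".join(f"U+{ord(c):04X}" for c in (probe[0], probe[-1])))
```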
Arian Evans, in his own words...
These are exploitable conditions where this was the ONLY way that arbitrary HTML could be created. There were many more sites that normalized these characters, where the same encoding could be used for filter-evasion/exploitation, but it was not the ONLY way to create arbitrary HTML in the application. Unfortunately the dataset does not count all of the ANDs/combinations right now, just the ONLYs. So if there was a simpler way to create arbitrary HTML, that is the only way it was counted. The rabbit hole goes much deeper: dozens of combinations and permutations lead to exploitation, and not just for XSS but for many types of syntax attacks. Still researching.
There are also MANY more of these in international-language code pages. Browser behavior gets really unpredictable with foreign-language character sets, which increases XSS and HTTP Response Splitting exploit options even more. There are also many more ways to use these when you start layering your encoding techniques. Yosuke Hasegawa did a great presentation on Japanese/Kanji character sets at BlackHat Tokyo 2008. For example, I found many of these attack vectors work at an even higher rate when URI-escaped or combined with other hex-encoding formats (or decimal, Base64, etc.).
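The layered-encoding idea can be sketched as a small generator: emit each probe character in several transport encodings (percent-encoded UTF-8 bytes, the legacy non-standard %uXXXX form mentioned in 3APA3A's post, and HTML numeric character references) so a scanner can try them all. The variant names are mine, purely for illustration.

```python
from urllib.parse import quote

def variants(ch: str) -> dict:
    """Return one probe character in several common transport encodings."""
    cp = ord(ch)
    return {
        "raw": ch,
        "pct_utf8": quote(ch, safe=""),  # percent-encoded UTF-8 bytes
        "pct_u": f"%u{cp:04X}",          # legacy non-standard %uXXXX form
        "dec_ref": f"&#{cp};",           # HTML decimal character reference
        "hex_ref": f"&#x{cp:X};",        # HTML hex character reference
    }

# « (U+00AB) in each layer of encoding.
print(variants("\u00AB"))
```

Feeding each variant through every input the scanner knows about is what turns one lookalike character into dozens of distinct test cases.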
3APA3A, thanks for opening my mind up to some new angles on filter-evasion tricks! :)