Sunday, May 10, 2015

Windows: directly examine cp1252

Let me preface this by saying: I am by no means a Windows programmer. Please help me by correcting any misunderstandings I may have.

My understanding is that Windows has both (legacy) single-byte string interfaces and modernized Unicode interfaces.

My goal is to closely examine cp1252 as implemented in the Windows kernel. I'll start with Windows XP, but I plan to check as many versions as I can.

I plan to write a program that dumps the byte-to-character mapping, with output similar in format to: http://ift.tt/1E9Ff6E

My primary question: which Windows API functions would I use to accomplish the above task? I think it's mbstowcs_s.

Secondarily: must I write C in order to examine the relevant interfaces? If so, what compiler would I use? I think Visual Studio Express 2010 is a good match, but I can't find any (legitimate) place to download it.
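
A minimal sketch of one way to probe this without a C compiler, assuming Python is available on the Windows machine being tested. It calls the Win32 function MultiByteToWideChar in kernel32 through ctypes. Choosing MultiByteToWideChar rather than mbstowcs_s (which belongs to the C runtime and follows the current locale) is an assumption, not something the question confirms. The helper names windows_decode_byte and format_row are hypothetical, and the tab-separated rows are a placeholder rather than the format of the linked index.

```python
import ctypes
import sys

CP_1252 = 1252
# Documented flag: make the call fail on invalid input instead of substituting.
MB_ERR_INVALID_CHARS = 0x00000008


def windows_decode_byte(byte_value, flags=0):
    """Decode one byte with MultiByteToWideChar (Windows only).

    Returns the resulting code point, or None if the call reports failure.
    """
    if sys.platform != "win32":
        raise OSError("MultiByteToWideChar is only available on Windows")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    func = kernel32.MultiByteToWideChar
    # (CodePage, dwFlags, lpMultiByteStr, cbMultiByte, lpWideCharStr, cchWideChar)
    func.argtypes = [ctypes.c_uint, ctypes.c_uint32, ctypes.c_char_p,
                     ctypes.c_int, ctypes.c_wchar_p, ctypes.c_int]
    func.restype = ctypes.c_int
    src = bytes([byte_value])
    dst = ctypes.create_unicode_buffer(4)
    written = func(CP_1252, flags, src, len(src), dst, len(dst))
    if written == 0:
        return None
    return ord(dst[0])


def format_row(byte_value, code_point):
    """Render one table row; the layout is a placeholder (hypothetical)."""
    if code_point is None:
        return "0x%02X\tUNDEFINED" % byte_value
    return "0x%02X\t0x%04X" % (byte_value, code_point)


if __name__ == "__main__" and sys.platform == "win32":
    # Only the 0x80-0x9F range is in dispute, so dump just that range.
    for b in range(0x80, 0xA0):
        print(format_row(b, windows_decode_byte(b)))
```

Running the dump once with flags=0 and once with flags=MB_ERR_INVALID_CHARS would show whether the disputed bytes are mapped, substituted, or rejected on a given Windows version.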


For those who must know the X to my Y: there are two competing standards and implementations of cp1252. They differ only slightly, but they do differ, and the difference matters to me.

The WHATWG specifies, and all browsers implement, this standard: http://ift.tt/1E9Ff6E

Microsoft specifies, and Python implements, this standard: http://ift.tt/1RrRQv9

The difference is in five byte values: 0x81, 0x8D, 0x8F, 0x90 and 0x9D. In the Microsoft spec they're entirely undefined, so these bytes cannot be round-tripped through cp1252. In the WHATWG spec (and all browsers), these bytes map to the non-printing control characters of the same value (U+0081, U+008D, and so on), as in latin1, meaning that those bytes can round-trip successfully through cp1252.
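
The Microsoft-spec side of this difference can be seen from Python, whose cp1252 codec follows the published Microsoft table. The sketch below is illustrative only: the helper names decode_or_none and find_undefined are made up for this example, and latin-1 stands in for the WHATWG behaviour on these particular bytes.

```python
def decode_or_none(byte_value, encoding):
    """Return the code point one byte decodes to, or None if it is undefined."""
    try:
        return ord(bytes([byte_value]).decode(encoding))
    except UnicodeDecodeError:
        return None


def find_undefined(encoding):
    """List every byte value that the codec refuses to decode."""
    return [b for b in range(256) if decode_or_none(b, encoding) is None]


if __name__ == "__main__":
    # Python's cp1252 leaves exactly five bytes undefined.
    print([hex(b) for b in find_undefined("cp1252")])
    # ['0x81', '0x8d', '0x8f', '0x90', '0x9d']

    # latin-1 maps each of them to the control character of the same value,
    # so they survive a decode/encode round trip.
    for b in find_undefined("cp1252"):
        assert bytes([b]).decode("latin-1").encode("latin-1") == bytes([b])
```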

I strongly suspect that Microsoft's implementation actually matches the WHATWG spec and browsers' implementations, rather than the spec they've published. This is what I'm trying to prove/disprove above.
