yumeyao wrote:The answer is I used win7 notepad. Just reproduced it in WinXP notepad.
So I suppose win7 notepad does something more in addition to calling IsTextUnicode, because Notepad2 fails (and Notepad2 is an example of a program that calls IsTextUnicode directly).
Indeed, Notepad in Windows 7 is different. Specifically, after it calls IsTextUnicode, it checks the reasons that IsTextUnicode gave for its result. If its only reason for flagging text as little-endian UTF-16 is a statistical/heuristic analysis (i.e., there was no BOM or other cues), then it will accept that result only if the file size is at least 100 bytes. Otherwise, it will disregard the statistical analysis.
The likelihood of a false positive in the statistical analysis is higher for short files, so they picked an arbitrary cutoff (100 bytes) and disregard the heuristics for anything smaller than that. It's pretty hackish, but it does help eliminate some false positives (of course, this will result in some new false negatives, but the avoided FPs probably outnumber the new FNs). Note that this check does not apply if the text is flagged as BE UTF-16; only LE results are subject to this adjustment.
I will make this same change to my next build of Notepad2 (and to HashCheck, which also uses IsTextUnicode when processing md5/sfv/etc. files).
Code:
notepad.exe Section .text (0x01001000)
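0x1009931: CALL DWORD PTR [ADVAPI32.DLL!IsTextUnicode]; (0x1001014)
0x1009937: TEST EAX,EAX ; did IsTextUnicode return nonzero?
0x1009939: JZ 0x1009949 ; no: return 0 as-is
0x100993B: CMP DWORD PTR [EBP-0x4],0x2 ; IS_TEXT_UNICODE_STATISTICS
0x100993F: JNE 0x1009949 ; flags are not exactly STATISTICS: keep the result
0x1009941: CMP DWORD PTR [EBP+0xC],0x64 ; iSize <=> 100 bytes
0x1009945: JGE 0x1009949 ; at least 100 bytes: keep the result
0x1009947: XOR EAX,EAX ; otherwise discard the result (return 0)
0x1009949: LEAVE
0x100994A: RET 0x8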
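
For anyone who wants to apply the same adjustment in their own code, here is a rough C rendering of what the disassembly above appears to do. This is only a sketch: the function name adjust_unicode_result and the macro MIN_STATISTICAL_SIZE are made up for illustration and are not taken from Notepad's source. The flag value 0x0002 is the one shown in the listing; it is defined locally only so the snippet compiles without <windows.h>.

```c
/* Value of IS_TEXT_UNICODE_STATISTICS as shown in the listing above.
   Defined here only so the sketch builds without <windows.h>. */
#ifndef IS_TEXT_UNICODE_STATISTICS
#define IS_TEXT_UNICODE_STATISTICS 0x0002
#endif

/* Hypothetical name; the value 100 (0x64) comes from the disassembly. */
#define MIN_STATISTICAL_SIZE 100

/* Hypothetical helper, not Notepad's actual function.
   is_unicode: the return value of IsTextUnicode
   flags:      the value IsTextUnicode wrote back through its result pointer
   size:       the buffer size in bytes that was passed to IsTextUnicode */
int adjust_unicode_result(int is_unicode, int flags, int size)
{
    /* Reject a positive result only when the statistical test is the
       sole reason given and the buffer is shorter than 100 bytes. */
    if (is_unicode &&
        flags == IS_TEXT_UNICODE_STATISTICS &&
        size < MIN_STATISTICAL_SIZE)
    {
        return 0;
    }

    /* In every other case the original result passes through untouched. */
    return is_unicode;
}
```

On Windows you would call IsTextUnicode first and feed its return value and output flags into this helper. Because the comparison is an exact match on the statistics flag, a result that carries any other flag is left alone, which matches the observation that only LE results with no other cues are affected.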