Home   Index   About
Ultimate Pack


Custom Search
Byte-order Mark

Always prefix a Unicode plain text file with a byte-order mark. Because Unicode plain text is a sequence of 16-bit codes, it is sensitive to the byte ordering used when the text was written.

A byte-order mark is not a control character that selects the byte order of the text; it simply informs an application receiving the file that the file is byte ordered.

Ideally, all Unicode text would follow only one set of byte-ordering rules. This is not possible, however, because microprocessors differ in the position of the least significant byte: IntelŪ and MIPSŪ processors have the least significant byte first; Motorola processors (and byte-reversed Unicode files) have it last. With only a single set of byte-ordering rules, users of one type of microprocessor would be forced to swap the byte order every time a plain text file is read from or written to, even if the file is never transferred to another system based on a different microprocessor.

The preferred place to specify byte order is in a file header, but text files do not have headers. Therefore, Unicode has defined a character (0xFEFF) and a noncharacter (0xFFFE) as byte-order marks. They are mirror byte-images of each other.

Since the sequence 0xFEFF is exceedingly rare at the outset of regular non-Unicode text files, it can serve as an implicit marker or signature to identify the file as a Unicode file. Applications written to read both Unicode and non-Unicode text files should use the presence of this sequence as a near-certain indicator that the file is a Unicode file. (Compare this technique to using the MS-DOS EOF marker to terminate text files.)

When an application finds 0xFEFF at the beginning of a text file, it typically processes the file as though it were a Unicode file, although it may also perform further heuristic checks to verify that this is true. Such a check could be as simple as a test of whether the variation in the low-order bytes is much higher than the variation in the high-order bytes. For example, if ASCII text is converted to Unicode text, every second byte is zero. Also, checking both for the linefeed and carriage-return characters (0x000A and 0x000D) and for even or odd file size can provide a strong indicator of the nature of the file.

When an application finds 0xFFFE at the beginning of a text file, it interprets it to mean the file is a byte-reversed Unicode file. The application can either swap the order of the bytes or alert the user that an error has occurred.

The Unicode byte-order mark character is not found in any code page, so it disappears if data is converted to ANSI. Unlike other Unicode characters, it is not replaced by a default character when it is converted. If a byte-order mark is found in the middle of a file, it is not interpreted as a Unicode character and has no effect on text output.

The Unicode value 0xFFFF is illegal in plain text files and cannot be passed between Windows functions. The value 0xFFFF is reserved for an application's private use.


Last news from Greatis Software

Nostalgia .Net     Nostalgia .Net     .Net is powerful, but not all-powerful, so sometimes we need to use Win32 API for our .Net applications. It's simple enough with Platform Invoke if you have Win32 skill, but we do not always have time to dig the ancient documentation, declare the special types that are compatible with Win32, find the values of the Win32's constants and so on. Nostalgia .Net offers several simple-to-use classes, and components that will allow you to forget about the headache of Win32 and just use the power of Win32 in your application the same way as you use the native. Net classes.  More »

Recommended software for developers

Ultimate Pack for Delphi and C++ Builder     Ultimate Pack     Component pack for Delphi and C++ Builder that contains runtime form designer, runtime object inspector, print suite and much more for the very special price.  More »

Form Designer .Net     Form Designer .Net     Unique runtime form design solution that allows to edit any form in .Net WinForms application at runtime with full source codes for only 300 euro!  More »

Print Suite .Net     Print Suite .Net     Print Suite .Net is a set of components for easy printing texts, images and grids from your WinForms applications. Full C# source codes are available  More »

Gradient Controls .Net     Gradient Controls .Net     Gradient Controls .Net offers controls with gradient background feature. Labels, panels and so on... Full C# source codes are available  More »

iGrid     Greatis iGrid     iGrid plots drawing grid right over your desktop, so you can use it everywhere, with any drawing application without any special plugins for different graphic editors.  More »


All the contacts and projects

Dmitry Vasiliev (just.dmitry)

Related Links

Software for Visual Studio .NET developers
Software for Delphi and C++ Builder developers
Software for Visual Basic 6 developers
Delphi Tips&Tricks
MegaDetailed.NET

More Online Helps

Win32 Programmer's Reference
Win32 Multimedia Programmer's Reference
OLE Programmer's Reference
Microsoft Windows Pen API Programmer's Reference
Microsoft Windows Sockets 2 Reference
Microsoft Windows Telephony API (TAPI) Programmer's Reference
Unix Manual Pages

Free Tech Secrets ;) Copyright © 2008-2012 Free Tech Secrets ;) greatis just4fun network just4fun