Octal and hexadecimal escape sequences do not undergo conversion; '\x12' has the value 0x12 regardless of the currently selected execution character set. All other escapes are replaced by the character in the source character set that they represent, then converted to the execution character set, just like unescaped characters.

Unless the experimental ‘-fextended-identifiers’ option is used, GCC does not permit the use of characters outside the ASCII range, nor ‘\u’ and ‘\U’ escapes, in identifiers. Even with that option, characters outside the ASCII range can only be specified with the ‘\u’ and ‘\U’ escapes, not used directly in identifiers.

1.2 Initial processing

The preprocessor performs a series of textual transformations on its input. These happen before all other processing. Conceptually, they happen in a rigid order, and the entire file is run through each transformation before the next one begins. CPP actually does them all at once, for performance reasons. These transformations correspond roughly to the first three “phases of translation” described in the C standard.

1. The input file is read into memory and broken into lines.

   Different systems use different conventions to indicate the end of a line. GCC accepts the ASCII control sequences LF, CR LF and CR as end-of-line markers. These are the canonical sequences used by Unix, DOS and VMS, and the classic Mac OS (before OSX) respectively. You may therefore safely copy source code written on any of those systems to a different one and use it without conversion. (GCC may lose track of the current line number if a file doesn’t consistently use one convention, as sometimes happens when it is edited on computers with different conventions that share a network file system.)

   If the last line of any input file lacks an end-of-line marker, the end of the file is considered to implicitly supply one. The C standard says that this condition provokes undefined behavior, so GCC will emit a warning message.

2. If trigraphs are enabled, they are replaced by their corresponding single characters. By default GCC ignores trigraphs, but if you request a strictly conforming mode with the ‘-std’ option, or you specify the ‘-trigraphs’ option, then it converts them.

   These are nine three-character sequences, all starting with ‘??’, that are defined by ISO C to stand for single characters. They permit obsolete systems that lack some of C’s punctuation to use C. For example, ‘??/’ stands for ‘\’, so '??/n' is a character constant for a newline.

   Trigraphs are not popular and many compilers implement them incorrectly....

(1) UTF-16 does not meet the requirements of the C standard for a wide character set, but the choice of 16-bit wchar_t is enshrined in some system ABIs so we cannot fix this.
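
The end-of-line rules in step 1 can be illustrated with a small sketch. The function below is a hypothetical helper written for this example (the name count_lines and its interface are assumptions, and it is not how GCC itself is implemented). It counts the lines in a buffer, treating LF, CR LF and CR each as a single end-of-line marker and letting the end of the buffer implicitly terminate an unfinished last line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not GCC's code: count the lines in buf, treating
   LF, CR LF and CR each as one end-of-line marker. */
size_t count_lines(const char *buf, size_t len)
{
    size_t lines = 0;
    size_t i = 0;

    assert(buf != NULL || len == 0);

    while (i < len) {
        if (buf[i] == '\r') {
            /* CR LF is a single marker; a lone CR also ends a line. */
            if (i + 1 < len && buf[i + 1] == '\n')
                i++;
            lines++;
        } else if (buf[i] == '\n') {
            lines++;
        }
        i++;
    }

    /* A last line without a marker is implicitly ended by end of input. */
    if (len > 0 && buf[len - 1] != '\n' && buf[len - 1] != '\r')
        lines++;

    return lines;
}
```

Under these assumptions, the 8-byte buffer "a\nb\r\nc\rd" mixes all three conventions and is counted as four lines, the last one terminated only by the end of the input.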