cs lecture nov 4

Unicode

codepoints
  Unicode assigns a numeric value (a codepoint) to over 100,000 characters from many languages (living & dead)
  range 0 -> 100,000 (or more)
  100,000 < 2^17, so ~17 bits ~ 3 bytes!! (3 bytes per character is too much???)
  a: codepoint 97 (U+0061), binary: 1100001, codepoint is the same as the ASCII value
  日 (sun): codepoint 26085 (U+65E5), binary: 110010111100101

Representing codepoints: one way is UTF-8
  UTF-8 is backwards compatible with ASCII
  variable-byte encoding, uses 1-4 bytes depending on the value of the codepoint
  (other encodings exist, e.g. UTF-16, etc.)

1-byte characters (UTF-8)
  Unicode codepoints 0 -> 127 are exactly the ASCII characters
  00000000 -> 01111111
  ^ when this leading bit is zero, it means a one-byte character

2-byte characters (UTF-8)
  codepoint xxxxxyyyyyy (up to 11 bits) in binary becomes: 110xxxxx 10yyyyyy
  encodes codepoints up to 11111111111 = 2^11 - 1 = 2047

3-byte characters (UTF-8)
  codepoint xxxxyyyyyyzzzzzz (up to 16 bits) becomes: 1110xxxx 10yyyyyy 10zzzzzz
  encodes codepoints up to 2^16 - 1 = 65535

4-byte characters (UTF-8)...
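
A rough sketch of the 1-, 2-, and 3-byte patterns from the notes, written in Python. The function name utf8_encode_sketch is made up for illustration and is not from the lecture. The 4-byte case is left out on purpose, since the notes cut off there, and the sketch does not reject the reserved surrogate range that a real encoder would. Each result is compared against Python's built-in str.encode("utf-8").

```python
def utf8_encode_sketch(codepoint: int) -> bytes:
    """Encode one codepoint using the 1-, 2-, and 3-byte patterns only.

    Hypothetical helper for illustration; not a full UTF-8 encoder.
    """
    if codepoint < 0:
        raise ValueError("codepoint must be non-negative")
    if codepoint <= 0x7F:
        # up to 7 bits: 0xxxxxxx (same byte as ASCII)
        return bytes([codepoint])
    if codepoint <= 0x7FF:
        # up to 11 bits: 110xxxxx 10yyyyyy
        return bytes([
            0b11000000 | (codepoint >> 6),
            0b10000000 | (codepoint & 0b111111),
        ])
    if codepoint <= 0xFFFF:
        # up to 16 bits: 1110xxxx 10yyyyyy 10zzzzzz
        return bytes([
            0b11100000 | (codepoint >> 12),
            0b10000000 | ((codepoint >> 6) & 0b111111),
            0b10000000 | (codepoint & 0b111111),
        ])
    # Assumption: larger codepoints are out of scope for this sketch.
    raise ValueError("4-byte case not covered in this sketch")


# 97 is 'a', 0xE9 is a 2-byte example, 0x65E5 is the sun character.
for cp in [97, 0xE9, 0x65E5]:
    encoded = utf8_encode_sketch(cp)
    # Cross-check against Python's own encoder.
    assert encoded == chr(cp).encode("utf-8")
    print(cp, bin(cp), [format(b, "08b") for b in encoded])
```

Printing each byte as 8 binary digits makes the leading marker bits (0, 110, 1110, and 10 on continuation bytes) easy to spot next to the payload bits.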