data | HaoIne

Binary

A computer can only process binary numbers, not decimal ones.

One binary digit is called a bit.

We group 8 bits into one byte.
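
As a small sketch of what this means in practice (the variable names are only for illustration): 8 bits give 2**8 possible patterns, so one byte can hold the values 0 to 255.

```python
# One byte is 8 bits, so it can hold 2**8 different values.
values_per_byte = 2 ** 8

# format(n, "08b") writes a number as 8 binary digits, i.e. one byte.
smallest = format(0, "08b")
largest = format(255, "08b")

print(values_per_byte)  # 256
print(smallest)         # 00000000
print(largest)          # 11111111
```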

Texts

We want the computer to store, display and process text, so we need to give each character a number to ‘encode’ it.
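
A minimal sketch of this character-to-number mapping, using Python's built-in ord() and chr() (the variable names are just for illustration):

```python
# ord() gives the number assigned to a character,
# chr() goes the other way, from number back to character.
letter = "A"
number = ord(letter)
back = chr(number)

print(number)  # 65
print(back)    # A
```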

The first coding standard was ASCII, which stands for American Standard Code for Information Interchange. It encodes each character as a number that fits in one byte stored in the computer.

But the problem is that it is too small: ASCII itself defines only 128 characters, and even using all 8 bits of a byte allows at most 256.

Then came Unicode, which is stored using a Unicode Transformation Format such as UTF-8 or UTF-16 (UTF-8 is now used by about 98% of websites). Unicode has room for over a million code points, enough to cover characters from writing systems around the world.

Another nice property of Unicode is that it is backward compatible with ASCII: the first 128 code points are the same, so plain ASCII text is also valid UTF-8.
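
To see the compatibility with ASCII concretely, here is a small sketch (the sample text and variable names are only for illustration). It compares the bytes produced by the "ascii" and "utf-8" codecs; note that this byte-level match holds for UTF-8, not for UTF-16.

```python
text = "hello"  # contains only ASCII characters

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# For pure ASCII text, both encodings produce exactly the same bytes.
same = ascii_bytes == utf8_bytes
print(same)  # True

# A character outside ASCII needs more than one byte in UTF-8.
emoji_bytes = "😄".encode("utf-8")
print(len(emoji_bytes))  # 4
```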

In Python

>>> w = 'hello hackers! 😄'
>>> type(w)
<class 'str'>
>>> w.encode()
b'hello hackers! \xf0\x9f\x98\x84'  # b'...' marks a bytes object. It is really just raw bits, but Python displays the bytes that are printable ASCII as characters so humans can read them.
>>> b = w.encode()
>>> type(b)
<class 'bytes'>
>>> b.decode()
'hello hackers! 😄'
>>> w.encode("utf-16")
b'\xff\xfeh\x00e\x00l\x00l\x00o\x00 \x00h\x00a\x00c\x00k\x00e\x00r\x00s\x00!\x00 \x00=\xd8\x04\xde'

Hex

We convert binary to hexadecimal to make it easier to read, because it is shorter: one hex digit stands for 4 bits.

But why don’t we store files as hex instead of binary?

For example, suppose we have a byte, say 0101 0010. In hex it is 52, which is shorter, right?
But to store it as text, the computer has to turn ‘5’ and ‘2’ into their ASCII codes, which take 8 bits each, so this one byte 0101 0010 turns into 2 bytes.
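
The size difference can be checked with a short sketch (variable names are illustrative). It takes the byte 0101 0010, writes it as hex text with bytes.hex(), and compares the sizes:

```python
raw = bytes([0b01010010])                  # one byte: 0101 0010
hex_text = raw.hex()                       # the same value written as hex characters
stored_as_text = hex_text.encode("ascii")  # each hex character becomes one byte

print(len(raw))             # 1
print(hex_text)             # 52
print(len(stored_as_text))  # 2
```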

Higher than hex

Now we know hex is base16, and there are also base32 and base64.

Here is an example of base64 encoding:
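
A minimal sketch using Python's standard base64 module; the input bytes are only an illustration. Base64 turns every 3 bytes into 4 printable characters and pads the end with '=' when the input length is not a multiple of 3.

```python
import base64

data = b"hello hackers!"             # 14 bytes of input
encoded = base64.b64encode(data)     # bytes -> base64 text (as bytes)
decoded = base64.b64decode(encoded)  # and back again

print(encoded)          # b'aGVsbG8gaGFja2VycyE='
print(len(encoded))     # 20
print(decoded == data)  # True
```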

In Python

If you print(n) a number or convert it to a string with str(n), the number will be represented in base 10.
You can get a hexadecimal string representation of a number using hex(n).
You can get a binary string representation of a number using bin(n).
Converting a string to a number with int(s) will read it as a base 10 number by default.
You can specify a different base to use with a second argument: int(s, 16) will interpret the string as hex, int(s, 2) will interpret it as binary.
You can try to auto-identify the number base using int(s, 0), which relies on the prefix of the string (0b for binary, 0o for octal, 0x for hex, nothing for decimal).
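
The conversions above can be tried out together in a short sketch. The value 82 is chosen only for illustration (it is the byte 0101 0010 from the hex section):

```python
n = 82

as_decimal = str(n)  # base 10 by default
as_hex = hex(n)
as_binary = bin(n)

print(as_decimal)  # 82
print(as_hex)      # 0x52
print(as_binary)   # 0b1010010

# Going back from strings to numbers.
print(int("82"))            # 82, base 10 by default
print(int("52", 16))        # 82, read as hex
print(int("1010010", 2))    # 82, read as binary
print(int("0x52", 0))       # 82, base detected from the 0x prefix
print(int("0b1010010", 0))  # 82, base detected from the 0b prefix
```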