data
Binary
Computers ultimately process everything in binary (0s and 1s), not in decimal.
One binary digit is called a bit.
We group 8 bits into one byte.
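A small Python sketch of the bit/byte relationship: 8 bits give 2**8 = 256 distinct patterns, so one byte holds a value from 0 to 255. The value 82 is just an arbitrary example.

```python
# One byte is 8 bits, so it has 2**8 distinct bit patterns.
BITS_PER_BYTE = 8
patterns = 2 ** BITS_PER_BYTE
print(patterns)                 # 256

# Show an arbitrary example value as an 8-bit pattern.
value = 82
print(format(value, '08b'))     # 01010010

# bytes() only accepts values that fit in one byte (0..255).
one_byte = bytes([value])
print(len(one_byte))            # 1
```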
Texts
We want computers to store, display, and process text, so we give each character a number that ‘encodes’ it.
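One of the earliest widely used standards is ASCII, the American Standard Code for Information Interchange, which gives each character a 7-bit code, usually stored in one byte.
The problem is that ASCII defines only 128 characters (even a full byte could hold at most 256), which is enough for English letters, digits, and punctuation but not for most of the world’s writing systems.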
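To see ASCII codes in Python, ord() and chr() convert between a character and its code number (for the first 128 characters this is the ASCII code). Encoding a character outside that range with the 'ascii' codec fails. The sample strings are arbitrary.

```python
# ord() gives a character's code number; for ASCII characters it is the ASCII code.
print(ord('A'))    # 65
print(chr(97))     # a

# ASCII defines only codes 0..127, so every ASCII code fits in 7 bits.
sample = 'Hello, World!'
print(all(ord(c) < 128 for c in sample))   # True

# A character outside that range cannot be encoded with the 'ascii' codec.
try:
    'é'.encode('ascii')
    encodable = True
except UnicodeEncodeError:
    encodable = False
print(encodable)   # False
```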
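Then came Unicode, which gives a number (a code point) to characters from writing systems all over the world; its code space has room for over a million code points. Unicode text is stored using a Unicode Transformation Format such as UTF-8 or UTF-16 (UTF-8 is now used on about 98% of websites).
Another nice property is that UTF-8 is backward-compatible with ASCII: the first 128 Unicode code points are exactly the ASCII characters, and UTF-8 stores them as the same single bytes, so any ASCII text is already valid UTF-8.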
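A quick check of the ASCII compatibility in Python, using an arbitrary pure-ASCII sample string:

```python
text = 'Hello, ASCII!'

# For pure-ASCII text, the ASCII and UTF-8 encodings produce identical bytes.
print(text.encode('ascii') == text.encode('utf-8'))   # True

# Each ASCII character is still a single byte in UTF-8.
print(len(text), len(text.encode('utf-8')))           # 13 13

# ASCII bytes also decode as valid UTF-8.
print(b'Hi!'.decode('utf-8'))                         # Hi!
```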
In Python
w = 'hello hackers! 😄'
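A short sketch, assuming Python 3.8+ (for bytes.hex with a separator), of what happens when the string w is encoded: the ASCII characters take one byte each in UTF-8, while the emoji takes four. UTF-16 is used in its little-endian form ('utf-16-le') so that no byte-order mark is added to the count.

```python
w = 'hello hackers! 😄'

print(len(w))                 # 16 characters (code points)

utf8 = w.encode('utf-8')
print(len(utf8))              # 19 bytes: 15 ASCII bytes + 4 for the emoji
print(utf8[-4:].hex(' '))     # f0 9f 98 84

print(hex(ord('😄')))          # 0x1f604, the emoji's code point

# In UTF-16 each ASCII character takes 2 bytes; the emoji is a 4-byte surrogate pair.
utf16 = w.encode('utf-16-le')
print(len(utf16))             # 34

# Decoding reverses the process.
print(utf8.decode('utf-8') == w)   # True
```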
Hex
We convert binary to hexadecimal (base16) because it is shorter and easier for people to read: each hex digit stands for exactly 4 bits, so one byte is always two hex digits.
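A minimal sketch of why hex is compact: the hypothetical helper bits_to_hex below converts a bit string to hex by reading 4 bits at a time. Python's built-in hex() gives the same digits with a 0x prefix.

```python
HEX_DIGITS = '0123456789abcdef'

def bits_to_hex(bits: str) -> str:
    """Hypothetical helper: convert a bit string to hex, 4 bits per digit."""
    bits = bits.replace(' ', '')
    bits = '0' * (-len(bits) % 4) + bits   # left-pad to a multiple of 4 bits
    return ''.join(HEX_DIGITS[int(bits[i:i + 4], 2)] for i in range(0, len(bits), 4))

print(bits_to_hex('0101 0010'))             # 52
print(bits_to_hex('1111 1111 0000 0001'))   # ff01
print(hex(0b0101_0010))                     # 0x52 (built-in, for comparison)
```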
But then why don’t we store files in hex instead of binary?
For example, take the byte 0101 0010. In hex it is 52, which looks shorter, right?
But hex is only a way of writing bits for humans. To store the text ‘52’, the computer has to encode the characters ‘5’ and ‘2’ as ASCII codes, each 8 bits long (0011 0101 and 0011 0010), so the single byte 0101 0010 turns into two bytes.
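The byte-versus-text argument can be checked in Python. This sketch stores the byte 0101 0010 as raw data and as the hex text "52" and compares their sizes.

```python
raw = bytes([0b0101_0010])     # the single byte 0101 0010
print(len(raw))                # 1
print(raw.hex())               # 52

# Storing the hex *text* "52" needs one ASCII byte per digit.
as_text = raw.hex().encode('ascii')
print(as_text)                 # b'52'
print(len(as_text))            # 2
print([format(b, '08b') for b in as_text])   # ['00110101', '00110010']

# Converting the hex text back recovers the original byte.
print(bytes.fromhex('52') == raw)   # True
```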
Higher than hex
Now we know hex is base16; there are also base32 and base64, which use 32 and 64 symbols, so each symbol carries 5 and 6 bits respectively.
Here is an example of base64 encoding:
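A minimal sketch in Python: the standard base64 module does the real work, while the hypothetical helper to_base64 illustrates the idea behind it, regrouping the input bits 6 at a time, mapping each group to one of 64 symbols, and padding with '='. It is for illustration only, not a replacement for the library.

```python
import base64
import string

# The standard base64 alphabet: A-Z, a-z, 0-9, '+', '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + '+/'

def to_base64(data: bytes) -> str:
    """Hypothetical helper: regroup 8-bit bytes into 6-bit chunks, one symbol each."""
    bits = ''.join(format(b, '08b') for b in data)
    bits += '0' * (-len(bits) % 6)          # pad with zero bits to a multiple of 6
    out = ''.join(ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6))
    return out + '=' * (-len(out) % 4)      # pad with '=' to a multiple of 4 symbols

data = b'hello'
print(to_base64(data))                          # aGVsbG8=
print(base64.b64encode(data).decode('ascii'))   # aGVsbG8=
print(base64.b32encode(data).decode('ascii'))   # NBSWY3DP
print(base64.b64decode('aGVsbG8=') == data)     # True
```

Because every 3 input bytes (24 bits) become 4 symbols, base64 output is about a third larger than the input; base32, with only 5 bits per symbol, grows it even more.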
In Python
- If you print a number with print(n) or convert it to a string with str(n), the number is represented in base 10.
- You can get a hexadecimal string representation of a number (with a 0x prefix) using hex(n).
- You can get a binary string representation of a number (with a 0b prefix) using bin(n).
- Converting a string to a number with int(s) reads it as a base 10 number by default.
- You can specify a different base with a second argument: int(s, 16) interprets the string as hex, and int(s, 2) interprets it as binary.
- You can let Python infer the base using int(s, 0), which reads it from the string’s prefix (0b for binary, 0o for octal, 0x for hex, nothing for decimal).
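The conversions in the list above, as a runnable sketch (82 is an arbitrary example value). format(), which the list does not mention, is included because it gives the digits without the 0x/0b prefix.

```python
n = 82

print(n, str(n))            # 82 82
print(hex(n))               # 0x52
print(bin(n))               # 0b1010010

print(int('82'))            # 82 (base 10 by default)
print(int('52', 16))        # 82 (hex)
print(int('1010010', 2))    # 82 (binary)

# Base 0 infers the base from the prefix.
print(int('0x52', 0), int('0b1010010', 0), int('82', 0))   # 82 82 82

# format() gives the digits without a prefix.
print(format(n, 'x'), format(n, 'b'))   # 52 1010010
```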