Shortcut to seniority

Home
Go to main page

Section level: Junior
A journey into the programming realm

Section level: Intermediate
The point of no return

Section level: Senior
Leaping into the unknown
Go to main page
A journey into the programming realm
The point of no return
Leaping into the unknown
In all numbering systems, the formula to calculate a number is as follows:
We calculate the value of each digit, and then we add them together.
Not clear? No problem, let's see some examples in decimal and binary systems.
In the decimal system, we have 10 available value: zero to nine.
Like i previously said, each digit of a number have a different value, based on its position in that number.
Let’s take for example the number 1234:
1 | 2 | 3 | 4 |
Now, let’s display the position for each digit:
1 | 2 | 3 | 4 |
103 = 1000 | 102 = 100 | 101 = 10 | 100 = 1 |
And this is how we calculate the value based on the digits:
(1 * 103) + (2 * 102) + (3 * 101) + (4 * 100) = 1000 + 200 + 30 + 4 = 1234
In the binary system, we only have two possible values, zero and one.
Similar to decimal system, each position also carries a value, but instead of having 10 as base
(100 = 1, 101 = 10, 102 = 100), we use the base two
(20 = 1, 21 = 2, 22 = 4).
Let’s take for example the value 1010:
1 | 0 | 1 | 0 |
Now, let’s display the position for each digit:
1 | 0 | 1 | 0 |
23 = 8 | 22 = 4 | 21 = 2 | 20 = 1 |
To calculate the value, we do:
(1 * 23) + (0 * 22) + (1 * 21) + (0 * 20) = 8 + 2 = 10.
With this, any number can be represented with ones and zeros, assuming we use enough bits for the job.
Decimal (base 10) | Binary (base 2) | ||||
---|---|---|---|---|---|
Digit | Power | Value | Digit | Power | Value |
1 | 3 | 103 = 1000 | 1 | 3 | 23 = 8 |
2 | 2 | 102 = 100 | 0 | 2 | 22 = 4 |
3 | 1 | 101 = 10 | 1 | 1 | 21 = 2 |
4 | 0 | 100 = 1 | 0 | 0 | 20 = 1 |
With 8 bits, we can store numbers between 0 and 255.
That is because each bit represent a value, which is the power of 2 based on their position.
The highest value is set when all bits are set to 1, which is equal to 255 (128 + 64 + 32 + 16 + 8 + 4 + 2 + 1).
1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
27 = 128 | 26 = 64 | 25 = 32 | 24 = 16 | 23 = 8 | 22 = 4 | 21 = 2 | 20 = 1 |
We call an octet a unit of information that consist of 8 bits. Most people use the word byte instead of octet, and that is not exactly correct because a byte is not always equal to 8 bits, but is guaranteed to contain at least 8 bits.
If this is not clear yet, let’s assume we want to calculate the following numbers:
Let’s apply the knowledge we got from school on how to add two numbers - by adding together two individual digits. If the sum of two digits is a two-digit number:
Let’s take them digit by digit and calculate their real value based on their position in the number.
Decimal 1234 = (1 * 1000) + (2 * 100) + (3 * 10) + (4 * 1)
Binary 1010 = (1 * 8) + (0 * 4) + (1 * 2) + (0 * 1) = 8 + 2 = 10 (decimal)
Note: In school, we learned that:
a / b = divisor (the number of times a is divided by b)
a % b = remainder (what remains after we divided the values – also called modulo)
if we want to divide ten items in groups of 3, we will have 3 full groups of 3 items each, and in the end we’re left with 1 remaining item.
In the decimal system, we cannot go higher than the digit 9, because we are in base 10.
When that happens, we take the remainder (13 % 10 = 3) and store 1 for the next digit operation.
In the binary system, we cannot go higher than the digit 1, because we are in base 2.
When that happens, we take the remainder (2 % 2 = 0) and store 1 for the next digit operation.
Therefore, in base 2, 1 + 1 = 10.
Decimal | Binary | Incremental with 1 | Logic |
---|---|---|---|
0 | 0000 | 0000 + 0001 = 0001 | 0 + 1 = 1 |
1 | 0001 | 0001 + 0001 = 0010 | 01 + 01 = |
2 | 0010 | 0010 + 0001 = 0011 | 10 + 01 = 11 |
3 | 0011 | 0011 + 0001 = 0100 | 11 + 01 = |
4 | 0100 | 0100 + 0001 = 0101 | 100 + 001 = 101 |
The hexadecimal system is widely used in programming, as it provides a more human-friendly representation of binary-coded values. A hexadecimal digit uses a four-bit aggregation (called a nibble) to represent its values.
The hexadecimal digit uses the decimal numbers (0-9) and six extra symbols for the values for 10 and beyond, taken from the english alphabet (hexadecimal A = decimal 10, B = 11, … F=15).
To avoid confusion with the decimal system, we write the numbers with the letter h after the value (25h = 25 hexadecimal), or with 0x before the number (0x25). Similar to the other systems, the value is the sum of the values calculated by multiplying the digit value with the base to the power.
Let’s take for example the value ABCD:
A | B | C | D |
163 | 162 | 161 | 160 |
To calculate the value, we do the following:
(A * 163) + (B * 162) + (C * 161) + (D * 160) = (10 * 163) + (11 * 162) + (12 * 161) + (13 * 160) = 43,981
Therefore, if we want to provide some values to the computer, it’s easier to write 0xCDEF, instead of writing 1100 1101 1110 1111.
Since 0xF is the value 1111, the value 0xFF will be a full byte (1111 1111).
Decimal | Hex | Binary |
---|---|---|
0 | 0 | 0000 |
1 | 1 | 0001 |
2 | 2 | 0010 |
3 | 3 | 0011 |
4 | 4 | 0100 |
5 | 5 | 0101 |
6 | 6 | 0110 |
7 | 7 | 0111 |
8 | 8 | 1000 |
9 | 9 | 1001 |
10 | A | 1010 |
11 | B | 1011 |
12 | C | 1100 |
13 | D | 1101 |
14 | E | 1110 |
15 | F | 1111 |
The conversion between hex and binary is quite easy.
From hex to binary, we must change from base 16 to base 2, and since 16 is a power of 2, we simply convert each hex digit to the four binary bits that correspond to that value.
Let’s take for example 0xABCD.
With the help of the values table, we can directly transpose them to binary.
0xABCD = 1010 1011 1100 1101
From binary to hex, we first need to assure that the number of bits are a multiple of 4. If that’s not the case, we will concatenate the value with leading 0s to make it so, because leading 0s add no value to the number (same as with decimals, the value 42 being equal to 0042). Afterwards, we group 4 bits at a time and convert them to their hex correspondent.
Let’s take for example the sequence 10 1100.
First step is to assure we have a multiple of 4, and trailing zeros are not relevant (value 12 in decimal = 012 = 0012).
So, our bit sequence with trailing 0s will be 0010 1100.
The conversion between decimal and other bases is a bit more difficult than that. The easiest way would be to divide by the base repeatedly, and then read the remainder.
Given the value 245, converting from decimal to binary (dividing by 2) is as follows:
Value | Remainder | Formula |
---|---|---|
245 | 1 | 245 / 2 = 122, remainder 1 |
122 | 0 | 122 / 2 = 61, remainder 0 |
61 | 1 | 61 / 2 = 30, remainder 1 |
30 | 0 | 30 / 2 = 15, remainder 0 |
15 | 1 | 15 / 2 = 7, remainder 1 |
7 | 1 | 7 / 2 = 3, remainder 1 |
3 | 1 | 3 / 2 = 1, remainder 1 |
1 | 1 | 1 / 2 = 0, remainder 1 |
Let’s read the remainder now (from lowest to highest). The value is 1111 0101.
The number 1111 0101
= (1 * 27) + (1 * 26) + (1 * 25) + (1 * 24) + (0 * 23) + (1 * 22) + (0 * 21) + (1 * 20)
= 128 + 64 + 32 + 16 + 4 + 1
= 245
Value | Remainder | Formula |
---|---|---|
245 | 5 | 245 / 16 = 15, remainder 5 |
15 | 15 | 15 / 16 = 0, remainder 15 |
Let’s read the remainder now (from lowest to highest). The value is 0x15 0x5.
Since 15 = 0xF, the hex equivalent is 0xF5.
Let’s convert to binary so we can confirm this with the math from above.
Let’s take an example to understand what’s going on.
Value | Operation | Result |
---|---|---|
0100 (4) | < < 1 | 1000 (8) |
1000 (8) | < < 1 | 0000 (0) |
1000 (8) | > > 1 | 0100 (4) |
0100 (4) | > > 1 | 0010 (2) |
0001 (2) | > > 1 | 0000 (0) |
The concept of bit masking is a bit more difficult to grasp, but I’ll try to explain it as simple as possible.
The bit masking is performed with the AND operator (&), and we will need to have a mask of bits that we apply to our data.
Given the binary sequence 0010 0011, let’s assume we want to get the value of the bit from the first position.
0b is just a way of saying that our number is in binary form. Not all programming languages support such feature, but we’ll use it here for making it more clear.
value = 0b0010'0011;
bit = value & 0x1;
We take the whole number, and then we do a logical AND with the value 0x1.
Keep in mind that 0x1 = 0000 0001, as we can have trailing 0s.
Doing an AND on this and any other bit sequence will return the following:
0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | & |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
This way, we managed to take only the bit value for the last bit. Let's try now with a mask that consists of multiple bits.
Assuming we want to take the first four bits (0xF):
0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | & |
0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | |
0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
We can see that the result is the same with what was set on those positions.
But what if we want to do a mask in the middle of a bit field? Let’s say we want to take the bits 2-3.
For this, we set 1 as many as bits we want to return (2 in our case, for bits from [2,3]) and we shift them to that location.
mask = 0b0000'0011 << 2;
0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | & |
0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
We can see that our end result is 00 because the bit values in the number we applied the mask on were 00 as well. If any bits were there with the value of 1, the AND result would have also been 1.
Since we need two letters to represent a byte, we can split them in two parts: high order, and low order.
For the value 0xAC (1010 1100), 0xA (1010) is the high order part, whereas 0xC (1100) is the low order part).
If we want to take them separately and then put them together, we need to perform a set of logical operations.
The low part (0x0C) will be taken by doing an AND call between our number and the range we’re interested in (0xF for 4 bits).
num = 0xAC; // 1010 1100
low = num & 0xF; // 0000 1111
1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | & |
0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | |
0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | LOW PART |
The high part (0x0A) will be taken by shifting the bits four positions to the right, overwriting the bits that are there. Since we shift them, what is left on the left side will be filled with 0’s, so we will remain only with the bits we’re interested in.
num = 0xAC; // 1010 1100
high = num >> 4; // 0000 1010
1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | >> 4 |
0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | HIGH PART |
When combining together, we need to move the high bits to where they are supposed to be, so we have to shift them back on that position. The OR operation will set the bit to 1 if there’s any bit to 1 in that operation, and since we use it with bits that are on different positions, it’s a simple concatenation of bits for our final number.
low = 0xC; // 1100
high = 0xA; // 1010
num = (high << 4) | low;
0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | Low |
0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | High |
1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | High << 4 |
1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | High << 4 | low |
What happens if we want to increment a value, but it’s the highest possible value for that data type?
Will the number of bytes that we store in memory will increase? Of course not.
This is called an integer overflow. When we want to go over the highest possible value in a data type, we overflow the range we can use to represent that value. The value that will be set after incrementing with 1 over the highest possible value will be the lowest possible value for that data type.
The most common result will be to wrap around the maximum – and be set to 0 (lowest possible value).
255 + 1 will not be equal to 256 since that would require the bit 29 to be set to 1, and we only have 8 bits in total.
In this case, 0 is the lowest possible value, because we use all bits to represent a positive number, therefore we can represent the range 0-255.
But there are cases in which we need to represent negative values as well. But how can we do that, if we represent numbers as sequences of bits? One way of representing this is through making use of the leftmost bit as a flag (sign bit).
A signed variable is a variable that can hold both positive and negative values. On the same principle, an unsigned variable can only hold positive values.
For an 8 bit data type, by using 1 bit for the sign, we are left with 7 bits for the data. Therefore, this means that the range will be from -128 to +127.
Why so?
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
The first bit is the sign, and 1 shows that we are a negative value. Therefore, the number will start with the – sign.
Having 0 everywhere means this is the smallest possible negative value.
0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
In this case, the sign is 0, so we are dealing with a positive value.
As we’re used to by now, we calculate the decimal value in the same way – by powers of 2.
111 1111 will be our data (127) and the sign is +, so our maximum value will be +127.
An integer underflow occurs when we substract over the minimum value. Decreasing with 1 from the minimum value will cause the data to reach the highest maximum value that can be stored on those bits.
On a computer, the memory is organized into bytes. When we assign a value to a data type, we write that value at a memory location.
Pointers are used to store the address location of another variable located in memory. Based on the type it points to, the compiler will know how many bytes to read starting from that location and how to interpret those bytes.
When we want to write in memory a value with a size larger than a byte, we need to decide how those bytes will be stored.
Endianness refers to the order in which bytes are written in memory.
When reading memory or receiving transmitted data from a different computer system, it is required to verify the endianness, and if they differ, to process and translate data between one and another.
Before we talk about endianness types, we need to first define what is the most significant byte, and the least significant byte.
In the number 123456, 1 is the most significant digit, and 6 is the least significant digit.
That is because changing 1 will have the highest impact upon the number, whereas changing 6 will not have such a big impact on the overall number.
Under the same logic, assuming we have the data 0x1234, 0x12 is the most significant byte, and 0x34 is the least significant byte.
Big-endian is a format in which the most significant byte is stored or sent first (has the lowest address), and then we send the following bytes in decreasing significance order.
Assuming we want to store the value 0x12345678 in big endian, we will store them on an incremental memory address, as follows:
Little-endian is a format that reverses the order. We send or store the least significant byte first (lowest address) and the most significant byte last (highest address).
Until now, we talked about bits and bytes, but how does computers represent other types of information, such as text, video, or audio? Well, all of these things can be represented with numbers.
If we think of all the letters in the alphabet, we can represent each of them with a different number.
A = 1, B = 2, C = 3, etc.
D | A | T | A |
4 | 1 | 20 | 1 |
0000 0100 | 0000 0001 | 0001 0100 | 0000 0001 |
With this, we can represent any word as a sequence of numbers. Every word we see is displayed through data.
But what about images, and videos?
Images and videos are made out of a big number of dots called pixels, and each pixel has a color, each of them being represented as numbers.
The most common system used to represent them is RGB (Red, Green, Blue), where each pixel is a combination of these 3 bytes, that goes from 0 - 255. A typical image has millions of pixels, and a video with 30 FPS (Frames per second) means that there are 30 frames drawn on the screen (rendered) each second, or in other words, 30 images per second.
Any sound is a series of vibration, and graphically, they are represented as a waveform. Any point on the waveform can be represented by a number.
With this, any sound can be broken down into a series of numbers. For a higher quality sound, you pick 32-bit audio over 8-bit audio, because more bits imply a higher range of numbers, so it is more precise.
When we want to create a variable to hold the value 1 for example, we need to specify to the computer where we want to store it, how much space (bytes) it should hold, and how to interpret the data that is at that location. But declaring all of that is really not as complicated as it looks like, the value 1 is an integer, so we need to use an integer type to store it, and the compiler will allocate enough space for an integer to be placed at that memory location.
Below, you’ll find a table with the standard types and their minimum size in bytes. I say minimal, because it’s not necessary to be of that size, but it’s the smallest possible size for each of the types – therefore these types can have a higher number of bits in different architectures or different programming languages.
These standard types are often called primitives, due to the fact that they are basic building blocks and they are integrated directly into the language.
Type | Keyword | Example | Range | Minimum size in bytes |
---|---|---|---|---|
Boolean | bool | True | Limited to true/false | 1 |
Character | char | a | -128 to +127 | 1 |
Small number | short | 3 | -32,768 to +32,767 | 2 |
Number | int | 40000 | -2,147,483,648 to +2,147,483,647 | 4 |
Long number | long | 9999999999999 | -263 to +(263 -1) | 8 |
Decimal number | float | 1,38 | 3.4E +/- 38 (7 digits) | 4 |
Decimal number (double precision) |
double | 1,400000000000000 | 1.7E +/- 308 (15 digits) | 8 |
An example of platform is the operating system, so we can include windows-related code only if we’re building the binary for windows, for example.
A pointer references a location in memory.
CPU cache is a small storage space that keeps the most frequently used data in order to reduce the accesses to the main memory.
Except for languages that contain garbage collector (which automatically frees memory that is no longer referenced).
NULL refer to the absence of a value, which means that it is not the same as having the value 0.
A Complete binary tree is a binary tree in which all nodes except leaves have two children, and the last level has all keys as left as possible.