Shortcut to seniority

Copy on write is a resource management technique used in programming to efficiently implement a copy operation. If the resource is duplicated but not modified, there is no need to create a new resource; we can share the same one until the first write occurs – hence the name. This significantly reduces resource consumption for unmodified copies, while adding a small overhead to operations that modify the resource.
In C++ for example, we have two ways of copying an object: deep and shallow copy.
A shallow copy is nothing more than a reference copy – the original and the copy point to the same data, the same location in memory.
A deep copy implies duplicating the object (allocating memory for it and copying all the data from the original to the new object).
If we want to implement the Copy on Write mechanism in our code, we will need to think of our object as a shared/sharable item.
Therefore, instead of accessing the data directly, we need to use a wrapper which also encapsulates a reference count.
In any function that modifies the data, we first need to detach from the shared owner (perform a deep copy) before modifying it.
However, in practice Copy on Write often turns out to be slower, due to the synchronization needed for the reference count and the overhead of checking it before every modification.
Copy on Write does have its utility, though.
Serialization is the process of translating data objects or an object’s state into a format that can be stored (in a file or a memory buffer) or transmitted over a network. Deserialization is the opposite: taking the bits and reconstructing the object (possibly in a different computer environment).
There are a number of use cases in which we’d want to serialize our data:
In games, for example, we want to store information such as the profile, the most recent save point, the character’s items, and so on. This way, the user will not start from the beginning each time the application crashes or is exited.
If we have an application and a server, and both of them keep a data structure with relevant information that should be transmitted between them, we need a way to translate between the raw bytes sent over the network and the in-memory objects.
Serialization can be done by using text (human-readable) or binary (non-human-readable) formats.
Text format:
Binary format:
When serializing data, a good piece of advice is to include a “magic” tag and a version number.
With the version number, we can make sure that the deserialization can be done properly.
In our application, we can either reject the file if the version has changed, or keep separate deserialization logic for each version if we decide to make a radical change in the format.
The single-precision floating-point format is a number format that represents a wide dynamic range of numeric values by using a floating radix point (at the cost of precision).
The two most common floating-point storage formats are single precision (32 bits, float in C++) and double precision (64 bits, double in C++).
The IEEE 754 standard defines number representations and operations for floating-point arithmetic.
In memory, we store floats by decomposing them into the following items: the sign (1 bit), the biased exponent (8 bits) and the fraction (23 bits).
The exponent is called biased because the stored value is offset by 127: the actual exponent is the stored value minus 127.
(stored 1 = 2^-126, stored 127 = 2^0, stored 254 = 2^127, stored 255 = infinity / NaN)
All numbers except 0.0 have a 1 as their first binary digit.
25.0 = 11001 in binary = 1.1001 * 2^4
In a decimal value such as 1.2345E6, 1.2345 is the mantissa and 6 is the exponent.
With floating-point numbers, we can always normalize the number as follows:
0.01234E5 = 0.1234E4 = 1.234E3
Because it’s a waste to store that 1 for every number, we can simply assume the number starts with 1, and get an extra bit of precision for free.
For the 0.0 case: if both the exponent and the fraction are 0, then the number is plus or minus 0.0 (based on the sign bit).
Subnormals are numbers where the exponent is zero, and they are a tradeoff between size and precision.
They are generally rare and most processors do not try to handle them efficiently.
The rule for subnormals is the following: when the exponent field is 0 but the fraction is non-zero, there is no implicit leading 1 and the exponent is fixed at -126, so the value is 0.fraction * 2^-126.
In the IEEE 754 standard, zero is signed, therefore we have a positive zero (+0) and a negative zero (-0).
The two values compare as equal, but some operations may return different results (for example, 1.0 / +0.0 is +infinity, while 1.0 / -0.0 is -infinity).
The infinities are also part of the IEEE 754 standard. The following results are defined when working with infinities: a non-zero number divided by zero gives plus or minus infinity, infinity + infinity = infinity and infinity * infinity = infinity, while infinity - infinity, 0 * infinity and infinity / infinity are invalid operations.
NaN (Not a Number) is a special value returned as the result of certain invalid operations (like the ones above). NaN also propagates: almost all operations that involve a NaN will result in a NaN.
An exception would be when there are some already known values that work for all floating point values, such as raising the value to the power of 0:
NaN^0 = 1
Because we have a finite number of bits, we can’t represent all possible real numbers, therefore errors will occur due to approximations.
Because of the rounding errors, the floating point numbers may end up slightly imprecise, causing numbers considered to be equal to fail a simple equality test.
double a = 0.15 + 0.15; // 0.2999999999999999889…
double b = 0.1 + 0.2;   // 0.3000000000000000444…
if (a == b) // can be false
if (a >= b) // can also be false
A solution would be to compare by using absolute error margins (whether their difference is very small). Usually, the condition threshold is called epsilon.
if (fabs(a - b) < 0.00001)
A fixed epsilon (which looks small in value) could still be too large when the numbers we compare are also very small.
When the numbers are very big, the epsilon could end up being smaller than the smallest rounding error, thus returning false in all situations.
We could check whether the relative error is smaller than epsilon:
if (fabs((a - b) / b) < 0.00001)
Now, the condition will still fail in some cases: when b is 0 (the division yields infinity or NaN), when a and b are both very close to zero (the relative error blows up), or when either value is infinity or NaN.
API is a set of defined methods of communication between components. For example, an API refers to the functions that are exposed by a library.
Documentation for the API is usually provided to facilitate usage and implementation. In the same way GUIs (graphical user interfaces) make it easier for people to use an application, APIs make it easier for developers to use the provided / exposed implementation and build software on top of it.
After the API has been released to the public, you’d want to make sure that the interface stays stable. Changing the internal logic is fine, but changing the interface itself, such as adding new required parameters to function calls, will break compatibility between the old and the new version.
ABI refers to how parameters are passed, where the return values are placed, how the classes/objects are placed in memory, etc.
ABIs cover multiple topics: calling conventions (how parameters are passed and how return values are retrieved), name mangling, the sizes and alignment of data types, the memory layout of classes/objects (including vtables), and system call conventions.
The machine does not know what a function is, so when you write a function, the compiler will emit a label derived from the function name. This label will eventually get resolved into an address by the assembler and linker.
When you call the function, the CPU will jump to the address of that label and continue the execution.
Before the function call, the compiler generates assembly code to save the caller-saved registers and to place the arguments in registers or on the stack, as dictated by the calling convention; the call instruction itself then pushes the return address and jumps to the function.
After the function call, the compiler generates assembly code to read the return value from the register designated by the calling convention, to clean up any stack space used by the arguments, and to restore the saved registers.
A library is binary compatible if a program that dynamically links against one version of it continues to run with newer versions of the library, without needing to be recompiled.
As a summary:
In the end, both of them are conventions on how you define compatibility.
A data race occurs when two or more threads access the same memory location at the same time, at least one of the accesses is a write, and the accesses are not synchronized.
Polymorphism – “poly” (many) + “morphe” (form)
Binary search: split the (sorted) range in half and check the middle element. Based on the result, you discard the half that cannot contain what you are looking for, halving the remaining work at every step.