Shortcut to seniority

Copy on write is a resource management technique used in programming to efficiently implement a copy operation. If the resource is duplicated but not modified, there is no need to create a new resource; we can share the same one until the first write occurs – hence the name. This significantly reduces resource consumption for unmodified copies, while adding a small overhead to operations that modify the resource.
In C++ for example, we have two ways of copying an object: deep and shallow copy.
A shallow copy is nothing more than a reference copy – the original and the copy point to the same data, the same location in memory.
A deep copy implies duplicating the object (allocating memory for it and copying all the data from the original to the new object).
If we want to implement the Copy on Write mechanism in our code, we will need to think of our object as a shared/sharable item.
Therefore, instead of accessing the data directly, we need to use a wrapper which also encapsulates a reference count.
In any function that modifies the data, we first need to detach from the shared owner (perform a deep copy) before modifying it.
However, in practice Copy on Write often turns out to be slower, due to the synchronization needed for the reference count and the overhead of checking it before every modification.
Copy on Write does have its utility, though.
Serialization is the process of translating data objects or an object’s state into a format that can be stored (in a file or a memory buffer) or transmitted over a network. Deserialization is the opposite: taking the bits and reconstructing the object (possibly in a different computer environment).
There are a number of use cases in which we’d want to serialize our data:
In games, for example, we want to store information such as the profile, the most recent save point, the character’s items, and so on. This way, the user will not start from the beginning each time the application crashes or is exited.
If we have an application and a server, and both of them keep a data structure with relevant information that should be transmitted between them, we need a way to translate between the raw bytes sent over the network and the in-memory objects.
Serialization can be done by using text (human-readable) or binary (non-human-readable) formats.
Text format:
Binary format:
When serializing data, a good piece of advice is to include a “magic” tag and a version number.
With the version number, we can make sure that the deserialization can be done properly.
In our application, we can either reject the file if the version has changed, or keep separate deserialization logic for each version if we decide to make a radical change in the format.
The single-precision floating-point format is a number format that represents a wide dynamic range of numeric values by using a floating radix point (at the cost of precision).
The two most common floating-point storage formats are single precision (32 bits, float in C++) and double precision (64 bits, double in C++).
The IEEE 754 standard defines number representations and operations for floating-point arithmetic.
In memory, we store floats by decomposing them into the following items: the sign (1 bit), the biased exponent (8 bits) and the fraction (23 bits).
The exponent is called biased because the stored value is offset by 127: the actual exponent is the stored value minus 127.
(stored 1 = 2^-126, stored 127 = 2^0, stored 254 = 2^127, stored 255 = infinity / NaN)
All numbers except 0.0 have a 1 as their first binary digit.
25.0 = 11001 in binary = 1.1001 * 2^4
In a decimal value such as 1.2345E6, 1.2345 is the mantissa and 6 is the exponent.
With floating-point numbers, we can always normalize the number as follows:
0.01234E5 = 0.1234E4 = 1.234E3
Because it’s a waste to store that 1 for every number, we can simply assume the number starts with 1, and get an extra bit of precision for free.
For the 0.0 case: if both the exponent and the fraction are 0, then the number is plus or minus 0.0 (based on the sign bit).
Subnormals are numbers where the exponent is zero, and they are a tradeoff between size and precision.
They are generally rare and most processors do not try to handle them efficiently.
The rule for subnormals is the following: when the exponent field is 0 but the fraction is non-zero, there is no implicit leading 1 and the exponent is fixed at -126, so the value is 0.fraction * 2^-126.
In the IEEE 754 standard, zero is signed, therefore we have a positive zero (+0) and a negative zero (-0).
The two values compare as equal, but some operations may return different results (for example, 1.0 / +0.0 is +infinity, while 1.0 / -0.0 is -infinity).
The infinities are also part of the IEEE 754 standard. The following results are defined when working with infinities: a non-zero number divided by zero gives plus or minus infinity, infinity + infinity = infinity and infinity * infinity = infinity, while infinity - infinity, 0 * infinity and infinity / infinity are invalid operations.
NaN (Not a Number) is a special value returned as the result of certain invalid operations (like the ones above). NaN also propagates: almost all operations that involve a NaN will result in a NaN.
An exception would be when there are some already known values that work for all floating point values, such as raising the value to the power of 0:
NaN^0 = 1
Because we have a finite number of bits, we can’t represent all possible real numbers, therefore errors will occur due to approximations.
Because of the rounding errors, the floating point numbers may end up slightly imprecise, causing numbers considered to be equal to fail a simple equality test.
double a = 0.15 + 0.15; // 0.2999999999999999889…
double b = 0.1 + 0.2;   // 0.3000000000000000444…
if (a == b) // can be false
if (a >= b) // can also be false
A solution would be to compare by using absolute error margins (whether their difference is very small). Usually, the condition threshold is called epsilon.
if (fabs(a - b) < 0.00001)
A fixed epsilon (which looks small in value) could still be too large when the numbers we compare are also very small.
When the numbers are very big, the epsilon could end up being smaller than the smallest rounding error, thus returning false in all situations.
We could check whether the relative error is smaller than epsilon:
if (fabs((a - b) / b) < 0.00001)
Now, the condition will still fail in some cases: when b is 0 (the division yields infinity or NaN), when a and b are both very close to zero (the relative error blows up), or when either value is infinity or NaN.
API is a set of defined methods of communication between components. For example, an API refers to the functions that are exposed by a library.
Documentation for the API is usually provided to facilitate usage and implementation. In the same way GUIs (graphical user interfaces) make it easier for people to use an application, APIs make it easier for developers to use the provided / exposed implementation and build software on top of it.
After the API has been released to the public, you’d want to make sure that the interface stays stable. Changing the internal logic is fine, but changing the interface itself, such as adding new required parameters to function calls, will break compatibility between the old and the new version.
ABI refers to how parameters are passed, where the return values are placed, how the classes/objects are placed in memory, etc.
ABIs cover multiple topics: calling conventions (how parameters are passed and how return values are retrieved), name mangling, the sizes and alignment of data types, the memory layout of classes/objects (including vtables), and system call conventions.
The machine does not know what a function is, so when you write a function, the compiler will emit a label derived from the function name. This label will eventually get resolved into an address by the assembler and linker.
When you call the function, the CPU will jump to the address of that label and continue the execution.
Before the function call, the compiler generates assembly code to save the caller-saved registers and to place the arguments in registers or on the stack, as dictated by the calling convention; the call instruction itself then pushes the return address and jumps to the function.
After the function call, the compiler generates assembly code to read the return value from the register designated by the calling convention, to clean up any stack space used by the arguments, and to restore the saved registers.
A library is binary compatible if a program that dynamically links against one version of it continues to run with newer versions of the library, without needing to be recompiled.
As a summary:
In the end, both of them are conventions on how you define compatibility.
A data race occurs when two or more threads access the same memory location at the same time, at least one of the accesses is a write, and the accesses are not synchronized.
Polymorphism – “poly” (many) + “morphe” (form)
Binary search: split the (sorted) range in half and check the middle element. Based on the result, you discard the half that cannot contain what you are looking for, halving the remaining work at every step.