Shortcut to seniority

To improve code quality, many development teams are using static code analysis tools which allow them to:

Perform change impact analysis to determine how changing code in one area may affect functionality in another
Assure compliance with coding standards
Measure code complexity
Find valid bugs

Specific terms

Technical debt

Technical debt is a concept that reflects the cost of additional rework caused by choosing the fast solution for now instead of going with a better, more elegant, but slower approach.

When you’re under pressure, you’ll sometimes have to choose between “doing it the quick way” or “doing it the right way”.

Sometimes, you’ll go with the quick solution, with the promise that you’ll come later and fix it.

The problem is that, most of the times, there will be new pressure and new problems, and therefore you’re stuck with technical debt.

There are times where you have to do a quick fix, to meet a deadline for example. In such cases, you should track the technical debt and pay it quickly.

If you leave the technical debt unpaid, more and more of such issues will stack, leaving the code in a bad shape and/or harder to refactor/scale the system.

Common causes of technical debt are:

Insufficient definition up-front: Occurs when requirements are still being defined during development, or when development starts before the design is over
Last minute specification changes: Occurs when requirements or change requests are defined during development phase, and when the project does not have time or budget to get through all development phases.
Business pressure: Occurs when the business considers getting something released fast.
Lack of a test suite: Encourages quickfixes and workarounds and may introduce regressions.
Lack of documentation: The work to create any supporting documentation is also part of technical debt
Lack of knowledge: Occurs when the developer is still young and with no experience, and doesn’t know how to write elegant code.

Key Performance Indicator (KPI)

A Key Performance Indicator (KPI) is a measurable value that demonstrates how effectively a company is achieving key business objectives. Organizations use KPIs at multiple levels to evaluate their success at reaching targets.

One way to evaluate the relevance of a KPI is to use the SMART criteria.

Smart comes from the following criterias:

Specific: Is the objective specific?
Measurable: Is the progress towards the goal measurable?
Attainable: Is the goal realistically attainable?
Relevant: Is the goal relevant for the organization?
Time-bound: Is the time-frame correctly estimated for achieving the goal?

Failure mode and effects analysis (FMEA)

FMEA is a technique for failure analysis, which involves reviewing the system in an attempt to identify failure modes, their causes, and their effects. All of the findings are written in a FMEA worksheet document.

Software Audit

A software audit review, or software audit, is a type of software review made by one or more auditors that are not members of the software development organization. The purpose of a software audit is to provide an evaluation of the product or processes to applicable regulations, standards, guidelines, plans, and procedures. Parts of the software audit can be done using static analysis tools.

Coding standards

Coding standards are a set of rules and guidelines for the project. Using a coding standard assures that we’re compliant with the industry standards, we have a consistent (minimum) of code quality, and a secure code.

At the start of the project, everyone agrees and try to follow the coding standards.

When the developer is under pressure, these are the first ones to go.

Other than that, following a coding standard is quite boring unless automated.

But why would we want to follow a coding standard?

A reason would be to format the code uniformly, to preserve the quality, and to make it easier for everyone to work under the same project.

Most of our programming time we spend navigating and reading code, finding the place where we want to make the change than actually typing, and therefore we should strive to optimize that.

Code that is uniformly indented is easy to scan.

Static source code analysis

Static source code analysis tools are the best way to enforce coding standards.

Static source code analysis

Static source code analysis refers to the automated testing of a program’s source code with the purpose of finding errors at compile time.

Errors detected by static code analysis have different priority levels, based on the risk they provide to the software.

Knowing the errors is a good starting point for writing safer code, as you will think about these errors when writing the software.

Prio 1

Null Pointer dereference

Dereferencing means reading from or writing to the memory value at a given memory address.

When you have a pointer to an object, to dereference a pointer means to read or write the data that the pointer is pointing to.

A null pointer dereference is an error that occurs when a program attempts to read or write to a memory from a null pointer.

A null pointer has a reserved value, indicating that it refers to no object. Since a null pointer does not refer to an object, a running program that contains a null pointer dereference immediately generates a segmentation fault.

                                            
                                                int *x = nullptr;    
                                                int y = *x;         // dereference x, trying to read it  
                                                *x = 0;             // dereference x, trying to write it

A solution would be to always verify the state of the pointer before accessing it.

                                            
                                                int *x = nullptr;  
                                                if (x) {  
                                                int y = *x;  
                                                *x = 0;  
                                                }

Division by zero

A division by zero occurs when a program attempts to divide by zero.

                                            
                                                int numerator = 1;  
                                                int denominator = 0;  
                                                int division_by_zero = numerator / denominator;

Depending on the programming environment and the type of the number that is being divided by zero (Floating point or integer), the application can generate positive or negative infinity, generate an exception, an error message, crash, etc.

A solution would be to create a helper function that returns some specific value or throws an exception in case the division was not possible.

                                            
                                                int divide(int numerator, int denominator) {    
                                                if (denominator == 0)     
                                                    throw overflow_error("Division by zero");    
                                                return numerator / denominator;    
                                                }    
                                                    
                                                int main (void) {    
                                                try {     
                                                    int i = divide(10, 0);     
                                                }    
                                                catch (overflow_error e) {    
                                                    cout << e.what();    
                                                }    
                                                return 0;    
                                                }

Double free / delete

Double free occurs when we attempt to de-allocate the same memory address twice.

                                            
                                                int *op = new int;  
                                                int *op2 = op;  
                                                delete op;  
                                                delete op2;

When a buffer is deallocated, a linked list of free buffers is read to rearrange and combine the chunks of free memory, to be able to allocate large buffers in the future. Unlinking an unused buffer (due to deallocation of a memory address) could allow an attacker to write arbitrary values in memory, overwriting valuable registers, and calling shellcode from that buffer.

Double free vulnerabilities have three common causes:

Error conditions, other exceptional circumstances.
Usage of the memory space after it’s freed.
Confusion on who is responsible for freeing the memory

Buffer overflow

A buffer is a sequential section of memory. In a buffer overflow, we provide to a fixed-length buffer more data than it can accomodate. The extra data overflows into an adjacent memory space, overwriting or corrupting the data that already exists there.

                                            
                                                struct user_account {   
                                                char name[16];  
                                                int permissions = 0;  
                                                };  
                                                
                                                user_account user;  
                                                for (int i = 0; i <= 16; ++i) {  
                                                user.name[i] = 1;  
                                                }

Therefore, you could say that buffer overflow occurs when you copy a bit too much.

We have a structure that contains 16 bytes for the name (1 char = 1 byte) and an integer.

We iterate in a for loop, from 0 until we reach 16 (less or equal compare). The problem is that in the range of 0-16 there are 17 numbers, so the last write (when i == 16) will be over the char buffer. That is because a buffer of size 16 is from 0 to 15. When we access the buffer through square brackets (user.name[i]), the language will calculate the memory address we want to write into, based on the index and data type.

User.name[16] will access the memory address of user[1 * 16] when writing into it, which is beyond the char [] buffer, and will write in the memory address of the element which is next of it, into our permissions integer.

Although this looks like a mistake that can be avoided by the programmer, the buffer overflow is a security risk and could overwrite the address we’re accessing next, and replace it with other code (code injection).

A solution would be to use static source code analysis to find such errors, as they are of big risk and impact, either by writing in memory addresses that would cause errors in the code, crashing the application or causing hard to find bugs, or allowing an attacker to take control of the system.

Null test after dereference

This occurs when the code analyzer finds a dereference and a null check for the same pointer, but not in the expected order. First, we dereference the pointer, so we can use it, and only afterwards we are checking it for null, meaning that if it was null when we dereferenced it first, it will crash the application.

                                            
                                                if(ptr && *ptr == value); // ok  
                                                if(*ptr == value && ptr); // crash

Let’s see another example of this.

                                            
                                                struct Obj {  
                                                int data;  
                                                };  
                                                
                                                Obj* getObj(bool value) {  
                                                if (value)   
                                                    return new Obj;  
                                                    
                                                return false;  
                                                }  
                                                
                                                Obj *obj = getObj(false);   
                                                obj-> data = 10;  
                                                
                                                if (!obj)   
                                                cout << "Cannot allocate object";

The problem is that we call a function that can return a null pointer, we dereference it, and then we make a null test on that pointer.

Prio 2

Uninitialized variable

An uninitialized variable is a variable that is declared but we do not set its value before using it.

In such cases, the variable will contain garbage data.

As best practice, we should set all variables to an initial value, right where we declare them, and we should declare them right before using them.

                                            
                                                int getSum(int i, int j) {  
                                                int sum;  
                                                sum += i;  
                                                sum += j;  
                                                return sum;  
                                                }

Unreachable call

Unreachable call is the part of the source code which can never be executed, and sometimes people refer to it as dead code. This could happen because there’s no way the compiler could reach that part of the code, either due to an early return, or wrongly put conditional statements.

Let’s see some examples.

                                            
                                                int value = 2;    
                                                if (value == 4) {    
                                                cout << "Will never print this line";    
                                                }

                                            
                                                int function(int x, int y) {  
                                                return x + y;  
                                                int deadCode = x + y;  
                                                }

Type mismatch

A type mismatch occurs when data types are not matching.

This error can occur when we pass arguments of type int and the function expects doubles, or when we return double from a function and the function actually returns int.

                                            
                                                int func(double i, double j) {    
                                                return i + j;    
                                                }    
                                                    
                                                int i = 0;    
                                                int j = 1;    
                                                double result = func(i, j);

Cast alters value

As the name occurs, casting from one type to another may occur in data loss.

For example, if we have an integer type (4 bytes) with the value being its maximum possible value, and a character type, which is stored in 1 byte, casting from one to another will always occur a data loss, because we cannot squeeze 4 bytes of data into 1 byte.

                                            
                                                int myInt = 10;  
                                                
                                                char c = (char)myInt;  
                                                char c = char(myInt);  
                                                char c = static_cast<char>(myInt);

Coercion alters value

Cast is an explicit conversion, whereas coercion is an implicit conversion.

Let’s assume we have a function called getChar(), which returns an unsigned char.

                                            
                                                unsigned char getChar();

We talk about an implicit conversion when we try to assign one type but we return another one.

This is similar with type mismatch, which is a more general error.

                                            
                                                unsigned char i = getChar();  
                                                char j = i;

Line 2 is a type mismatch and, more exactly, an implicit conversion performed by the compiler (Coercion).

Why is this problematic?

A char type requires 1 byte (8 bits) of memory. Therefore, with 1 byte, the maximum value it can have is 255 (1111 1111).

If the type is unsigned, it means that the value can be only positive, therefore they don’t need a bit sign.

Since all bits are used to store the value, and the value can be only positive, the values are between 0 and 255.

If the type is signed, it means that the value can be both positive and negative. Therefore, one bit is used as a bit sign.

Since one bit (the first one) stores the sign, there are only 7 bits left to store the value.

If the first bit is 0, the value is positive, and if the first bit is 1, the value is negative.

The highest value that can be stored in 7 bits is +127 (0111 1111).

The smallest value that can be stored in 7 bits is -127 (1111 1111).

If we decrement 1 from -127, the value will wrap to +127, and if we increment 1 from +127, the value will wrap to -127.

Therefore, the values are between [-127,-0] and [+0, 127]. For signed data, the value 0000 0000 will be -0, and the value 1000 0000 will be +0.

If getChar() returns a value between 0-127, the signed char returned will have the same value.

Othewise, if the value returned is between 128-255, the signed integer will overflow and the value will wrap around, going into the negative space.

For example, if getChar() will return 128, and we store it into a signed value, the value returned will be -127.

Negative Character Value

This issue refer to performing an operation outside the bounds of a memory address, due to not properly checking the index bounds on the array.

                                            
                                                int getValue(vector<int> vec, int index) {    
                                                int value = -1;    
                                                if (index < vec.length()) {    
                                                    value = vec[index];    
                                                    cout << "Value retrieved is" << value;    
                                                } else {    
                                                    cout << "Unable to get value";    
                                                }    
                                                    
                                                return value;    
                                                }

However, this method only verifies that the given array index is less than the maximum length of the array, but does not check for the minimum value. This will allow a negative value to be accepted as the input array index, which will result in an out of bounds read and may allow access to sensitive memory. The input array index should be checked to verify that is within the maximum and minimum range required for the array. In this example the if statement should be modified to include a minimum range check, as shown below.

                                            
                                                if (index >=0 && index < vec.length())

Double mutex lock / unlock

Double lock error occurs when we attempt to lock the same mutex twice, which leads to undefined behavior.

                                            
                                                mutex mx;    
                                                    
                                                void function() {    
                                                mx.lock();    
                                                mx.lock();    
                                                }

Double unlock occurs when we attempt to unlock the same mutex twice, which leads to undefined behavior.

                                            
                                                mutex mx;    
                                                    
                                                void function() {      
                                                mx.lock();      
                                                // do something      
                                                        
                                                mx.unlock();      
                                                mx.unlock();      
                                                }

Prio 3

Ignored return value

This error occurs when we do not check the return value of a function that return something other than void.

Assuming we have the following function:

                                            
                                                bool getResult(int& value) {  
                                                if (someCondition) {  
                                                    value = doGetValue();  
                                                    return true;  
                                                }   
                                                    
                                                return false;  
                                                }

In the next snippet we can see how the function may be called.

                                            
                                                int someValue;  
                                                getResult(someValue);  
                                                // use someValue

The problem is that we use someValue variable without checking the return value of the getResult function.

                                            
                                                bool state = getResult(someValue);  
                                                if (state) {  
                                                // use someValue  
                                                }

Useless assignment / unused value

Useless assignment is an error where we assign a value to a variable that is not used further in its scope.

                                            
                                                int sum(int a, int b) {  
                                                int sum1, sum2;  
                                                sum1 = a + b;  
                                                return sum2;  
                                                }

Empty if statement

Empty if statement refers to when we have no implementation for a branch of the if-case.

                                            
                                                if (something);  
                                                if (something) {}

We could have something like that when we care only about the if / else case.

A reason could be that we want to let other developers know that we thought of the code for that branch, or add brackets just for future developments:

                                            
                                                bool result = getResult();  
                                                if (result) {  
                                                    responseOK();  
                                                } else {  
                                                    // to be done  
                                                }

Redundant condition

The condition is redundant, meaning that the statement will always be true or false.

This occurs, for example, when we assign a bool to true/false directly, and then checking its value.

                                            
                                                bool value = true;  
                                                if (value) {  
                                                // redundant  
                                                }

Another example would be the following function: if val is smaller than threshold, it will surely be smaller than threshold summed with another positive value.

                                            
                                                bool isBiggerThan(int val, int threshold) {  
                                                    return (val < threshold && val < threshold + 1);  
                                                }

Software development metrics

Software development metrics are used to measure the quality of the software, and observing the metrics over time will provide some insight of the application, such as:

Is the application getting harder to maintain, or maybe the opposite?
Is the source code more tightly coupled?
Is there a lot of repetitive code going on?
Are the program structures too big or complex?

The metrics are there to identify the trends, to find areas that need improvement, and to allow the team to take proactive actions. They are only guidelines and ways to track the trends and should be in no way set as rules / compliance parameters.

Code Metrics

There are a number of metrics that analyze the source code, from simplest (program length, lines of code) to more complex ones, such as cyclomatic complexity.

Cyclomatic Complexity

Cyclomatic Complexity measures the unique logical paths that can be taken at the method level, such as if/else statements, switch instructions or early returns. For example, a function that simply returns has a Cyclomatic Complexity of 1, and each instruction that branches off (such as an if) will increase the logical path the application can take by one.

The bigger the number is, the more complex the function gets, because there are many branches that the developer needs to think of, so the goal is to have a small number.

Lines of Code

Lines of code is a metric that can be used both at the method level, but also at the class/project level. Theoretically, any method that cannot be entirely visible on a screen is probably too long and can be reduced through refactoring.

Number of fields/methods

Classes with too many fields/methods may be doing too much or are more difficult to maintain. Let’s say that having around 10 methods is a reasonable threshold, and having more should probably be a starting point for investigation.

Number of Parameters

Methods with too many parameters could indicate that they either exist in the wrong class or they are not stored in a class at all.

Number of Variables

A large number of variables declared within a method usually indicates that the function does too many things, and that also means that they are more difficult to maintain over time.

Efficiency

Efficiency measure the amount of productive code written by developers. This is proportionate to the amount of code churn, which measures code that is not productive. Code churn refers to the number of lines of code that were modified in a specified period of time.

Code reviews

A code review is an evaluation method used to identify errors in the code during the development process. When a programmer wants to submit their code to production, the changes that he/she added must first pass the code review.

The set of changes that were sent for review is called a changelist.

A code review consists of four phases: The request, the evaluation, the approval / feedback, and the developer action/acknowledgement.

Phase 1: The Request

The code review starts with the developer making a request for a code review. The code shall not be submitted until the code review is finished.

Phase 2: The Evaluation

The reviewers (which consists of developers, mainly those familiar with that specific part of the code) will review / evaluate the newly added code, trying to identify issues that could come up through the newly changed code.

Phase 3: The Approval / Feedback

Once the reviewer(s) has identified errors in the code, they provide feedback to the author. In this phase, the code is either approved or rejected.

Phase 4: Developer Action / Acknowledgement

In this phase, the author reads the review and the feedback, and makes any required changes to the code.
In case there are changes needed to be done, the code review is updated and the code review goes back to the evaluation phase.
If the code is approved, the developer can go to the next gate (usually meaning he/she can push the changes to production / development or master branch / etc.)

Benefits of Code review

Saving time and money

Code reviews are very important because errors are discovered earlier in the process, saving time and money for the company. Bugs fixed later in the SDLC are significantly more expensive to fix than bugs fixed early on.

Strengthen the skills

When code reviews are done right, they can strengthen the skills and increase the knowledge of development teams. Other benefits of code review include:

Knowledge transfer
Increase in team awareness
Development of alternative solutions to problems

Performing a code review

There are many things to look for when performing a code review.

First of all, you should look for the functionality. By the time the code reaches the code review, we assume / expect that the developer tested the code well-enough, that the application starts and the functionality that was changed or the interaction with the new changes is working correctly, in other words, that no regressions were introduced and that the feature was fully implemented. Testing refers to both manually and automatic testing, so unit tests should be part of the commit.

Another thing to look for is the complexity of the code – are the lines too complex, or maybe the functions, or the classes? When we say complex, we refer to the fact that they cannot be quickly understood by other developers, meaning that there are more chances bugs can hide in those areas, or there can be potential future bugs when others try to call or modify the code.

Are there unit tests commited? The code should also contain some unit tests to verify the new functionality, or the test results should be attached to the task / review to confirm that the old functionality is still working properly.

Other important things to look for are the namings, the existance and relevance of the comments, and that the new code conforms to the style guide used within the project.

When providing feedback to the person that implemented the code, through comments on the review, remember to be respectful and helpful. We do that by making sure we always comment the code, not comment about the developer.

Code reviews are done to improve the quality of the code and find bugs before being merged, and to improve the skills of the developers so that they require less review over time and become independent and acknowledgeable about the project. Therefore, you should not write code for them, nor go into too much details, but point out the problems and let them make decisions on how they would change the code. Common errors to look out for when doing code review are:

Syntax errors

Syntax errors are errors in the syntax of the code (such as typos/spelling errors), which result in build failure. The developer must make sure that the code builds, otherwise no testing was made anyway. These types of errors should not exist, and if they do, they should not pass the code review.

Runtime errors

Runtime errors are often triggered by illegal user input in code which is not handled correctly, but they can also occur when the code was not fully implemented (such as not handling all code paths, usually the error occurs here).

In most cases, the bugs are due to not thinking of all the scenarios that can occur.

Memory Leak: causes a program to use more RAM while the program is running.
Program Crash: happens when a program stops running (can be caused by memory leaks or programming errors).
Division by Zero: happens when a user inputs a zero in the denominator field.
Logic Errors: occurs when your code produces unintended outcomes. Logical errors are the hardest to catch. These errors are caused by an error in the logic of the application, and although the code compiles and the application starts, or even behaves properly in most of the cases, it will still be flawed when reaching the branch of the code where the logical error exists.

Shortcut to seniority

Badea Robert