Debugging is the process of finding and resolving defects or problems within software. Debugging tactics include control flow analysis, unit testing, integration testing, log file analysis, memory dumps, and profiling.
There are three types of errors: syntax errors, run-time errors, and logical errors.
A syntax error is an error that will usually be reported by the framework or compiler. In such cases, you should simply check what went wrong from a syntax point of view and fix it.
A run-time error occurs while the software is running and encounters an unexpected situation. These can be easy to find, especially if they reproduce every time, since the framework will throw an exception and you can look at the call stack and the values of the data to understand what’s going on.
Logical errors are the hardest to detect and cannot be caught by the compiler, since the code compiles without generating any error.
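As a minimal, hypothetical illustration, the following C++ snippet compiles and runs without a single complaint from the compiler, yet produces the wrong result because the loop condition is off by one – a logical error:

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Intended to sum all elements, but the loop stops one element too early.
// The compiler accepts this happily - it is a logical error, not a syntax error.
int sum(const std::vector<int>& values) {
    int total = 0;
    for (std::size_t i = 0; i + 1 < values.size(); ++i) {  // bug: skips the last element
        total += values[i];
    }
    return total;
}

int main() {
    std::vector<int> data{1, 2, 3, 4};
    std::cout << sum(data) << '\n';  // prints 6, while the expected result is 10
}
```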
Every problem can be solved by dividing it into four steps:
In order to identify the problem, you need to ask yourself a few questions:
| Repro steps | Expected results | Actual results |
|---|---|---|
| What have you done? | What should happen? | What actually happens? |
You should focus on understanding the problem.
If you know which function causes the problem, you should check the documentation for it.
If you don’t understand why something is in the code, you should read it and try to figure out what it’s used for and what its purpose is. If you can’t find out, you can check the commit history.
Now that you understand the problem, get rid of it. It sounds simple, but it can involve anything from changing a single character, a line of code, or a function, to refactoring a whole set of classes. The most important thing is to keep testing your code and not get carried away – if you write hundreds of lines of code and only then find out they don’t work, you’ll spend even more time figuring out why.
Once you have a solution, try to replicate the conditions that led to the bug, and confirm that it’s not there anymore. Also, check that you haven’t introduced new bugs with your change, and that the old functionality is still working properly.
Run your test cases and, if possible, create some new ones to validate your changes.
The term ‘bug’ refers to a defect in the software, causing it to behave unexpectedly.
A core dump (crash dump, memory dump) consists of the recorded state of the software’s memory at a specific moment, typically when the application has crashed or terminated abnormally. The generated core dump usually contains debug symbols and can be analyzed (the stack trace, the file and line where it crashed, the values of variables, etc.) in order to fix the problem.
A segmentation fault (segfault) is a failure condition raised by hardware with memory protection, notifying the operating system that the software has attempted to access a restricted area of memory (a memory access violation), or to access a memory location in a way that is not allowed (for example, attempting to write to a read-only location, or to overwrite part of the operating system).
Segmentation faults arise primarily from errors in the use of pointers, in programs written in languages that provide low-level memory access.
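As a minimal sketch in C++, dereferencing a null pointer is a typical way to trigger a segmentation fault – the code compiles cleanly, but on most systems it crashes at run time:

```cpp
#include <iostream>

int main() {
    int* ptr = nullptr;   // the pointer does not refer to any valid memory
    std::cout << *ptr;    // dereferencing it is undefined behaviour; on most
                          // systems the process is killed with a segfault
    return 0;
}
```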
Symbols are a collection of data that allows debuggers to match the names of variables and methods, line numbers, or source files to a running program. Symbol files are created when we build the application, and they are usually found next to the binary of our software.
When something unexpected occurs, an error (exception) can be raised and handled by the software – by displaying it, recovering from it, or in some other way. If the software does not handle the exception, it will terminate.
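A minimal sketch of the difference between handling an exception and letting it terminate the program, using std::vector::at, which throws std::out_of_range for an invalid index:

```cpp
#include <iostream>
#include <stdexcept>
#include <vector>

int main() {
    std::vector<int> data{1, 2, 3};
    try {
        std::cout << data.at(10) << '\n';   // at() throws for an invalid index
    } catch (const std::out_of_range& e) {
        // The software recovers by reporting the error instead of terminating.
        std::cerr << "Recovered from error: " << e.what() << '\n';
    }
    // Without the try/catch block, the same call would let the exception
    // propagate out of main() and terminate the program.
    return 0;
}
```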
Based on their impact on the software and the company, bugs can be classified by their severity and priority. The severity of an issue refers to the impact it has on the system – whether the software can recover, whether there’s a workaround for it, and so on.
The priority of an issue refers to the number of users it reaches – this could be related to the visibility of the issue, or to how often it reproduces. Basically, the probability of the bug happening.
The severity is associated with the standards, while the priority is associated with scheduling (urgency).
The levels of severity and priority can be expressed as numbers (1 for the most important, 5 for the least important) or as labels (critical, high, medium, low, trivial).
This category is for critical issues – the software crashes, a required feature is missing or unusable, and so on.
For example, if our software is showing pictures of cats, but there’s a problem with the database and we cannot load any pictures – that’s a top issue for the company. This means that we will not be able to provide our main functionality to the users.
In this category we keep issues that are very severe, but hard to encounter, either because they don’t reproduce very often, or because they require a large number of steps to reach.
Let’s say that we release a feature in the beta version, but it’s buggy – we care about it, but it doesn’t impact all the other clients, so we can fix it later and first work on the bugs that actually impact the majority of users.
This category is for issues that are not impacting the functionality of the software, but are very critical to the business, such as having the software name misspelled in a logo.
This category is for issues that do not affect the business and are not so important, such as cosmetic fixes – for example, paragraph misalignments, or issues that occurred only once. These issues are usually tagged as ‘nice to have’, and they are fixed only if there’s time allocated for them.
This may be obvious, but some people don’t pay much attention to error messages and try to find the error themselves. Most error messages are accurate and descriptive, and they tell you what went wrong with your code, or at least the line number it reached before it crashed.
If you have your source code stored in a version-control repository, you can check the last few commits and see whether you can spot the issue. If possible, you could also build the binary with a previous commit and see if the problem is still there.
If there are many commits, and you know that version #100 worked, but version #200 does not work, you should not take them one by one, but use binary search instead. Try to see if version #150 works, and if it does, the error must be between version #150 and #200, otherwise, the error must be between version #100 and #150. By doing so, you just skipped checking 50 versions. Continue with this until you find the problematic commit.
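A rough sketch of that search in code – buildAndTestVersion is a hypothetical helper that stands for ‘check out this commit, build it, and run the repro steps’; here it only simulates a bug introduced in version #137:

```cpp
#include <iostream>

// Hypothetical helper: returns true if the given version still works.
// In a real investigation this would check out the commit, build it,
// and run the failing scenario. Here it simulates a bug introduced in #137.
bool buildAndTestVersion(int version) {
    return version < 137;
}

// Finds the first broken version, assuming 'good' works and 'bad' does not.
int findFirstBrokenVersion(int good, int bad) {
    while (bad - good > 1) {
        int middle = good + (bad - good) / 2;
        if (buildAndTestVersion(middle)) {
            good = middle;   // the bug was introduced after 'middle'
        } else {
            bad = middle;    // the bug was introduced at or before 'middle'
        }
    }
    return bad;              // the first version where the test fails
}

int main() {
    std::cout << "First broken version: #" << findFirstBrokenVersion(100, 200) << '\n';
}
```

In practice, version-control tools can automate this search for you; Git, for instance, provides git bisect for exactly this purpose.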
You should have some unit tests already created, and it will be useful to check whether the unit tests still pass.
This step is mostly important when you make changes in the code.
If the unit tests pass, but you still have an issue in the code, it means that the problem is not yet covered by a test, that the issue doesn’t always reproduce, or that there are some prerequisites that are not covered by the test.
If there’s a unit test that fails, you already know where to look.
If possible, it would be fantastic if you could also write a test case that reproduces the bug, in order to be sure that the issue has been fixed once you have a solution (test fails before the fix, and passes after the fix).
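A minimal sketch of such a regression test, written with plain assertions around a hypothetical clamp function whose current implementation is deliberately buggy – the last assertion fails before the fix and passes once the lower bound is handled:

```cpp
#include <cassert>

// Hypothetical function under test: should clamp 'value' into [low, high].
// The current (buggy) version forgets to enforce the lower bound.
int clamp(int value, int low, int high) {
    if (value > high) return high;
    return value;                    // bug: values below 'low' pass through
}

int main() {
    // Regression test derived from the bug report's repro steps.
    assert(clamp(5, 0, 10) == 5);
    assert(clamp(42, 0, 10) == 10);
    assert(clamp(-3, 0, 10) == 0);   // fails with the buggy implementation above
    return 0;
}
```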
Rubber duck debugging is a well-known technique, in which the developer debugs the code by explaining it line by line to a rubber duck. Sometimes when we explain a problem to someone else, we reach the solution while in the process of explaining it. You could obviously talk to a colleague about the problem you’re having, but using a rubber duck will at least not interrupt anyone else, and it will still help you get unstuck.
Simple as that. It’s easy to get bogged down in all the details and start losing focus. Taking a break will reset your mind and allow you to tackle the issue from a different perspective. Usually, the idea of how to solve a bug occurs when you are not actively thinking about it. ‘Sleep on it’ actually works!
If you can reproduce the issue, it’s already a big success. It means you know how to trigger it, more or less where to look in the code, and you can check whether the issue was fixed once you do some changes.
It’s another way to tackle the problem – by whitelisting parts of the code. Instead of looking for where the problem is, you look for where the code is working properly, and you can take your mind off that area.
In the case of multithreaded applications, you should also consider adding the thread ID. This helps you differentiate between identical debug messages and filter only what happens on specific threads of interest.
You should know the values of your variables. If you’re not sure what those values are, the first step is to print them out. If there are a lot of messages, it can be helpful to print a fixed string (such as the name of the function) before the variable, so you can tell where the value is printed from.
You should at least log the input and output structures for validation.
In the case of multithreaded applications, you should also consider adding the thread ID.
Keep in mind that adding more log messages will change the timings, so in multithreaded code your problem might disappear entirely.
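A minimal C++ sketch of this kind of logging – each message carries the function name, the ID of the thread that produced it, and the input and output values (the function itself is purely illustrative):

```cpp
#include <iostream>
#include <sstream>
#include <thread>

// Illustrative worker: logs its input and output together with the function
// name and the ID of the thread that produced the message.
int doubleValue(int input) {
    int output = input * 2;
    std::ostringstream msg;
    msg << "[doubleValue][thread " << std::this_thread::get_id() << "] "
        << "input=" << input << " output=" << output << '\n';
    std::cout << msg.str();   // build the whole line first so threads don't interleave it
    return output;
}

int main() {
    std::thread t1(doubleValue, 10);
    std::thread t2(doubleValue, 20);
    t1.join();
    t2.join();
}
```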
If you are sure that some parts of the code do not affect the functionality or reproducibility of the issue, you can comment them out. In some functions, you can also overwrite the values received as parameters, to see how the function behaves.
A debug symbol is a special kind of symbol that attaches additional information to the symbol table of an object file. This information allows the debugger to gain access to information from the source code of the binary, such as variable or function names, etc. This information can be helpful while trying to investigate and fix a crashing application.
In case you are unable to recompile the source code, the options you have to find the issue are more limited, and they depend on the application you’re debugging.
If you have access to the binary that was used when the bug was found, you can use a debugger to set breakpoints, step into and over functions, and inspect memory during execution to get additional run-time information.
Profiling tools will help you find certain problems, such as memory leaks or multithreading issues. Simply checking the memory consumed by the application (through Task Manager on Windows, or with the top command on Linux) can indicate that you have a memory leak somewhere (the memory keeps increasing, either over time or when you execute a specific action/sequence).
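As a minimal, hypothetical illustration, the snippet below leaks memory on every call – watching the process in Task Manager or with top while it runs would show the memory footprint growing steadily:

```cpp
#include <chrono>
#include <thread>

void handleRequest() {
    int* buffer = new int[1024];   // allocated on every call...
    (void)buffer;                  // ...but never freed: a classic memory leak
}

int main() {
    // The process's memory usage keeps growing for as long as it runs.
    while (true) {
        handleRequest();
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}
```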
If you’re able to create some unit tests and validate the functionality of your application, do it. If the unit tests fail, you found the problem, and if they return success, you found where the problem is not.
If the application is using a network, you can use a packet sniffer to debug it.
A packet sniffer is a tool used to diagnose network-related issues – it works by intercepting and logging network traffic, so you can check what packets are being sent and received over the network.
If you are using encryption such as SSL, you’ll only be able to see the source and destination information, but the data will still be encrypted.
If the application is using a database, you can monitor the queries that are sent to it, or check the database logs to see what changes were made. If the problem you’re having is related to the database, it could be that the column names were misspelled, the data is not of the expected type, or a ‘required’ field was not set.
If a core dump was created, you should use a virtual machine to test on the same OS and hardware setup as the system that created the core dump. Some crashes occur only in a specific environment.
A data race occurs when two or more threads access the same memory location without synchronization and at least one of the accesses is a write.
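A minimal sketch of a data race in C++: two threads increment the same counter with no synchronization, so updates are lost and the final value is usually smaller than expected. Declaring the counter as std::atomic<int>, or protecting it with a mutex, removes the race:

```cpp
#include <iostream>
#include <thread>

int counter = 0;   // shared between threads, with no synchronization

void increment() {
    for (int i = 0; i < 100000; ++i) {
        ++counter;   // data race: unsynchronized read-modify-write from two threads
    }
}

int main() {
    std::thread t1(increment);
    std::thread t2(increment);
    t1.join();
    t2.join();
    std::cout << counter << '\n';   // expected 200000, but updates are usually lost
}
```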
Polymorphism – “poly” (many) + “morphe” (form)
Binary search: split the range in half and check the middle. Based on the result, you cut the remaining work in half, since you now know which half contains the problem.