Shortcut to seniority

Home
Go to main page

Section level: Junior
A journey into the programming realm

Section level: Intermediate
The point of no return

Section level: Senior
Leaping into the unknown
Go to main page
A journey into the programming realm
The point of no return
Leaping into the unknown
Code refactoring is the process of improving the design of the code, without changing its external behavior.
“Cleaning the code” makes it easier to extend and to maintain the code.
After we refactor the code, it will be simple, short, clear to understand, and without duplication.
We have many reasons for refactoring, but the most important are the following:
Many problems occur when the methods are too long. Long methods are troublesome because they are often involved in complex logic.
Extract method takes a snippet of code and moves it into a separate function.
void function_call() {
// do one thing
// do another thing
}
// =>
void function_call() {
do_one_thing();
do_another_thing();
}
This usually occurs either when we have a function that does multiple things, or when multiple functions contain the same copy-paste code.
So what we have is a function that does two things. For the code to be clearer and easier to reuse, we extract each piece of code in a separate function. Now, each function does only one thing.
Let’s give a real example now.
Response get(Request request) {
Response resp;
if ("" == request.cookies.login_token) {
// not logged in, redirect to login page
resp.result_code = 401;
resp.message = "You need to be logged in to access this page";
print("Error encountered code: 401");
return resp;
}
if ("" == request.url) {
// invalid url
resp.result_code = 400;
resp.message = "No url has been provided";
print("Error encountered code: 400");
return resp;
}
Data result_data = get_data(request.url);
if (false == result_data.is_valid) {
// error occured while connecting to servers
resp.result_code = 500;
resp.message = "Internal server error";
print("Error encountered code: 500");
return resp;
}
resp.result_code = 200;
resp.buffer = result_data;
return resp;
}
Let’s assume we have a function that receives a request and returns a response object, with some data in it, if everything was ok.
So, the function returns some error codes if the request was not ok, or if the server could not process it.
We can see that we do the same thing over and over again – check for error, pass a result code and a message, print / log the error, and return the response object.
Let’s use extract method on this code.
Response prepare_response(int status, std::string message="") {
Response resp;
resp.result_code = status;
resp.message = message;
return resp;
}
Response get(Request request) {
if ("" == request.cookies.login_token) {
// not logged in, redirect to login page
print("Error encountered code: 401");
return prepare_response(401, "You need to be logged in to access this page");
}
if ("" == request.url) {
// invalid url
print("Error encountered code: 400");
return prepare_response(400, "No url has been provided");
}
Data result_data = get_data(request.url);
if (false == result_data.is_valid) {
// error occured while connecting to servers
print("Error encountered code: 500");
return prepare_response(500, "Internal server error");
}
Response resp = prepare_response(200);
resp.buffer = result_data;
return resp;
}
What we did was to extract the creation of the response object.
You still think the code looks ugly?
Yeah, me too.
How would you refactor it further? Think for a bit, before checking the next page.
The ugliness comes from those print messages that are all over the code. If we want to handle another error, and add another result code, we’d also want to print that too, because we want to have some server logs for such errors.
Another problem here is that we have those error messages hardcoded – and even if they were moved to a constant string instead, we’d have to write the same error message on each request handle.
The solution would be to move those strings into a function that is handling the errors.
Response get(Request request) {
if ("" == request.cookies.login_token) {
// not logged in, redirect to login page
return handle_error(401);
}
if ("" == request.url) {
// invalid url
return handle_error(400);
}
Data result_data = get_data(request.url);
if (false == result_data.is_valid) {
// error occured while connecting to servers
return handle_error(500);
}
Response resp = prepare_response(200);
resp.buffer = result_data;
return resp;
}
Response handle_error(int code) {
Response response;
response.result_code = code;
switch (code) {
case 400:
response.message = "Invalid request";
break;
case 401:
response.message = "You need to be logged in to access this page";
break;
case 500:
response.message = "Internal server error";
default:
response.message = "An error has occured. Try again";
}
return response;
}
Printing of the error can be moved here, but then the prepare_response function would do two things – create the object, and log an error – so let’s move it one layer up – in the function that calls the get function.
Now, you can call the get from only one place and assure that an error will be logged, in case it occurs.
Response call_get(Request request) {
Response result = get(request);
if (result.result_code != 200)
print("Error encountered code: " + result.result_code);
return result;
}
You have a complex expression. The result of the expression (or part of it) can be moved in a temporary (local) variable with a name that explains its purpose.
if (favorites.find(movie.name) != favorites.end() && movie.seats > 0 &&
movie.price < 20.0 && movie.end_hour < 24) {
buy_ticket(movie);
}
After we refactor, we will have variables that are easy to read and that we can reuse further in the code. It also provides more information / intent about what the developer wanted.
is_favorite_movie = favorites.find(movie.name) != favorites.end();
available_tickets = movie.seats > 0;
affordable_tickets = movie.price < 20.0;
movie_ends_before_midnight = movie.end_hour < 24;
if (is_favorite_movie && available_tickets && affordable_tickets &&
movie_ends_before_midnight) {
buy_ticket(movie);
}
A class contains functionality that can be divided in multiple classes. Extract class implies creating a new class and moving the method/fields from one class to our new one.
Assuming we have the following structure containing Person related information:
class PersonInfo {
string first_name;
string last_name;
int age;
};
And the following implementation of the class People:
class People {
// people related functionality
vector<PersonInfo> _entries;
void read_from_csv(string path) {
// open csv as readable
// add data from the csv into _entries
// close the csv
}
void write_to_csv(string path) {
// open csv as writable
// write data from _entries into the csv
// close the csv
}
};
Assuming that the class is more than 17 lines and have more than these 2 functions, we can safely say that the reading and writing data from / to a csv can be moved into a separate class that has this sole functionality.
The CSV class is required to know about the PersonInfo class in order to be able to read and write from/into. However, the CSV class is not required to know about the People class, which uses the PersonInfo objects.
class People {
// people related functionality
vector<PersonInfo> _entries;
};
class CSV {
vector<PersonInfo> read(string path) {
// open csv as readable
// add data from the csv into the result vector
// close the csv
// return the result vector
}
bool write(string path, vector<PersonInfo>& data) {
// open csv as writeable
// write data into the csv
// close the csv
}
};
In case you have a primitive that requires additional data or behavior, you can wrap it into an object instead.
This is similar to extract class, which splits logic into classes based on functionality.
Assuming we have a class that contains a primitive, and we find out that we need to add additional functionality to that primitive, we may think of moving such functionality in another class.
string PersonalIdentificationNumber = "";
class Person {
string PersonalIdentificationNumber;
void setPIN(string pin) { PersonalIdentificationNumber = pin; }
};
That’s the current class, and we’re ok with this – we have a basic functionality.
But assuming we want to get some information from that string, we would have 4-5 functions poluting the Person class, although they are actually related to the Personal Identification Number. Also, by using an object, we limit the errors that can occur, since the function signatures will receive objects of that type, instead of strings, for example.
class Person {
PersonalIdentificationNumber pin;
void setPIN(PersonalIdentificationNumber newPin) {
pin = newPin;
}
};
class PersonalIdentificationNumber {
string pin;
setPIN(string newPin) {
pin = newPin;
}
string getSex() {
return pin[0];
}
string getYear() {
return pin.substr(1,2);
}
string getMonth() {
return pin.substr(3,4);
}
string getDay() {
return pin.substr(5,6);
}
string getIDSequence() {
return pin.substr(7,12);
}
};
Magic numbers are aliases for hardcoded values.
This is problematic because these magic numbers can be found in multiple files, and other than that, they don’t provide any specific meaning. In order to make the code more readable and easier to change the value if needed, there should be only one definition of this value, with a specific name.
float getTotalPrice(int basePrice) {
return basePrice + (basePrice * 24/100);
}
Is it clear what the following function does? What’s with that 24/100 ? What are we trying to calculate?
float getTotalPrice(int basePrice) {
float VAT_RATE = 24 / 100;
return basePrice + (basePrice * VAT_RATE);
}
I guess it looks more clear now, after we moved that magic number into a constant instead, and gave it a name.
The problem here was that we didn’t understood from the beginning what the function does.
However, that’s a clean example, because probably the 24/100 is really a constant that we may not want/need to change anytime soon, and probably is not found in so many files.
Assuming we have a GUI, a database, and some logic to fetch values from the database into the GUI, and we have a hardcoded value (let’s say 12) which will be the max number of items we can display from the database, we will have 12 in the following places:
That’s a lot of 12 in the code. And what happens if one day we decide to switch from 12 to 15? Find-replace all from 12 to 15? And are we confident that we won’t change other values from other objects? Or we need to have a look on each of them, one by one? Wouldn’t it be nicer if we’d have a constant somewhere in a config file, and read it from there on all sides?
It’s nice and clean, we know its purpose due to its naming instead of a magic value, and we have to modify the value only from one place.
In case a conditional statement (if – else) is too complicated, the conditional and the branches can be extracted into separate methods. This is pretty much refactoring by using extract method in both branches.
if (condition) {
// 100 lines of code follows
}
else {
// 100 lines of code follows
}
// =>
if (condition) {
function_A();
}
else {
function_B();
}
This is a good way to reduce the number of lines of your functions, and to allow the developer to reason better about the code, by moving it into another scope. What I mean with that is, if the developer goes through the initial function and all’s ok, he/she can forget about everything that happened there before calling the function.
void getResult(bool condition) {
EState state = STATE_UNKNOWN;
if (condition) {
state = onSuccess();
} else {
state = onError();
}
return state == STATE_OK;
}
bool booleanPass(bool condition) {
if (condition) {
return onConditionPass();
}
else {
return onConditionFail();
}
}
Both examples are ok. The first example is getting a state from each function and compares it when it returns, the other example returns directly from branches – they are both easy to reason and nothing needs to be remembered when entering the inner functions.
The same logic is called from all branches within a conditional expression. Moving it outside the expression simplifies the logic and the developer will no longer have to check all branches to make sure that a specific function is called.
if (condition) {
function_A();
function_common();
}
else {
function_B();
function_common();
}
// =>
if (condition) {
function_A();
}
else {
function_B();
}
function_common();
The reasoning behind this is very simple – We want to avoid duplication. It’s easier to understand the code if it’s less complicated.
void sendEmail(MailInformation info, BookDetails book, bool isPromo) {
if (isPromo) {
bool isValid = validateEmail(info);
if (isValid) {
book.price = book.price / 2;
info.subject = "Promotional sale";
MailSender::send(info.from, info.to, info.subject, book.price);
}
else {
Logger::log("Mail is not valid");
}
}
else {
bool isValid = validateEmail();
if (isValid) {
info.subject = "Regular sale";
MailSender::send(info.from, info.to, info.subject, book.price);
}
else {
Logger::log("Mail is not valid");
}
}
}
The problem with the code from above is that we have duplicated code. Other than that, we need to make sure that we send an e-mail for both cases.
That duplicated code can be moved outside the if/else statements, since it is executed for both of them.
Let’s see how that will look like.
void sendEmail(MailInformation info, BookDetails book, bool isPromo) {
if (false == validateEmail(info)) {
Logger::log("Mail is not valid");
return;
}
if (isPromo) {
book.price = book.price / 2;
}
info.subject = isPromo ? "Promotional sale" : "Regular sale";
MailSender::send(info.from, info.to, info.subject, book.price);
}
Now…does anyone have any doubt that we send the mail for both cases (promotional or regular sale) ? I don’t think so.
And imagine that the code could be much worse than that – and much more of a serious issue.
Let’s think of a case where we get a message (request) from a component, and that component awaits for a response from us, otherwise it will reset the system.
Now, this request function can have 3-4 nested if/else functions, with all of them sending the response before exiting the scope.
And what if, in one specific if-if-else statement, we don’t have the response? The system will restart, and the bug will be hard to catch, and will not reproduce all the time, because we will have to be in a specific state to reach that case.
void requestHeartbeat(RoutingInfo data, Message msg) {
Response rsp;
if (data.destination == MEMORY_ADDRESS_CURRENT_DEVICE) {
rsp.routing.destination = data.source;
rsp.routing.source = data.destination;
rsp.msg = HEARTBEAT_ACKNOWLEDGE;
if (system.is_ready_to_answer) {
if (Logger::is_active) {
Logger::log("Heartbeat sent");
}
else {
// No logger available
write_to_console("Heartbeat sent");
responseHeartbeat(rsp);
}
}
else {
// system is not ready to answer yet
if (!connected_to_server) {
// we need to wait to connect first - will be handled after
add_message_in_queue(rsp);
return;
}
else {
requestHeartbeatFromServer(data, msg);
}
}
}
else {
Logger::log("The heartbeat was not for us");
}
}
void responseHeartbeatFromServer(RoutingInfo data, Message msg) {
// Server acknolwedged us
Response rsp;
rsp.routing.destination = data.source;
rsp.routing.source = data.destination;
rsp.msg = HEARTBEAT_ACKNOWLEDGE;
responseHeartbeat(rsp);
}
void handle_queue() {
ResponseCache msg = getNextMessage();
if (queue.busy) {
return;
}
// ...handle the next item in the queue
}
So we have a queue, that handles messages, and we add message in the queue if we’re not yet connected, so probably we call handle_queue when we connect to server.
However, if we look closely, we can see that we don’t call responseHeartbeat on line 9, and the whole system will crash. The fix will be simply to move the response outside the if/else statement.
In case it’s not very clear what a method/variable does, just by reading its name, changing the name to make the intent clearer could be the best refactoring you could do.
Obviously, shorter names are generally better than longer names, as long as they are clear, but the focus should be on clarity.
string getDMY();
For example, what is the following line doing ? Is it clear? It returns a string, yes, but what’s in there?
string getDayMonthYear();
I also strongly suggest to rename methods when we have symmetrical actions, but with unexpected names (e.g. Open/Destroy instead of Open/Close).
However, this is not the only usecase in which we would like to rename a method or variable.
Let’s assume we have the following function signature:
void drawShape(Shape shapeObject);
What will we do if we want to draw something other than a shape, such as a human? Create another function, of course.
void drawHuman(Human humanObject);
Yes, that will work. So now we have the following functions:
void drawShape(Shape shapeObject);
void drawHuman(Human humanObject);
I don’t like this, both functions draw an object, be it shape or human, but they have different names. Now, as programmers, we need to keep in mind that we have functions with different names, based on the object they act on.
Even more than that, the name is redundant, we know we draw a shape, that’s what we provide.
And it could be even more verbose, if it were in a Shape class:
void Shape::drawShape();
That’s horrible. Instead of having an overloaded function, and be able to call it and let the compiler to figure out which function we want to call based on the parameter we give, we will have to explicitly call the one we want.
void draw(Shape shapeObject);
void draw(Human humanObject);
In case you have several methods that implement a very similar logic, we can create one function and customize the logic based on a parameter.
void sleep5sec() {
Logger::log("Sleeping for 5 seconds");
sleep(5);
}
void sleep3sec() {
Logger::log("Sleeping for 3 seconds");
sleep(3);
}
We clearly see that the logic is similar, so we can move it in one function, and add a parameter for the time.
void Sleep(int seconds) {
Logger::log("Sleeping for {0} seconds", seconds);
sleep(seconds);
}
This implementation will also be useful for future usage, since we will not need to duplicate the logic for other sleep values.
This refactoring method should occur only in such cases where the code can be reused, and not when the function logic changes based on the input.
In case two or more subclasses have the same function or variable, you should move it into the base class. This will assure that the logic is not duplicated. This works for classes within the same hierarchy.
On the next page, we have a code snippet that contains a base class, with some basic functionality, and some derived classes that inherit from this base class.
class Human {
void eat();
void sleep();
};
class Worker : Human {
virtual void work() = 0;
};
class Programmer : Worker {
string name;
};
class Manager : Worker {
string name;
};
Now, we have duplicated logic – which will probably extend to all classes, because all workers are humans, and all humans have a name. Therefore, we can move that variable and any other variables/methods that are common in the base class.
class Human {
void eat();
void sleep();
string name;
};
class Worker : Human {
virtual void work() = 0;
};
class Programmer : Worker {};
class Manager : Worker {};
In case a function or variable from a base class is used only in one specific subclass, we should move them there. This will assure that subclasses that inherit from the base class will contain only logic that is needed.
This is similar to pull up method/field, but it’s actually the opposite – we have some members in the base class that are not used by everyone who inherits from it.
class Vehicle {
float fuel_consumption_per_100km;
};
class Car : Vehicle {};
class Bicycle : Vehicle {};
class Skateboard : Vehicle {};
class RollerSkate : Vehicle {};
I know, it’s controversial, but bicycles, skateboards, and so on are still considered vehicles, although they don’t have any engine (maybe except for electrical bicycles?).
Nevertheless, they do not consume gas per 100km, not unlike the cars.
class Vehicle {};
class Car : Vehicle {
float fuel_consumption_per_100km;
};
class Bicycle : Vehicle {};
class Skateboard : Vehicle {};
class RollerSkate : Vehicle {};
There, much better.
And assuming we’ll have one more class that inherits from Vehicle and should use fuel_consumption_per_100km, but is not related to the Car class, we could create a base class for such type of vehicles.
Every class should have a single responsibility, and that responsibility should be entirely encapsulated by the class.