Shortcut to seniority

Home
Go to main page

Section level: Junior
A journey into the programming realm

Section level: Intermediate
The point of no return

Section level: Senior
Leaping into the unknown
Go to main page
A journey into the programming realm
The point of no return
Leaping into the unknown
Data structures and algorithms topics are tightly coupled, and both of them should be taken into account when designing our application.
An array is an area of allocated memory for a set of elements of a known (fixed) size. The data stored in the arrays is allocated in contiguous memory (memory blocks have consecutive addresses).
Let's say we have:
The following table displays the memory address for the elements in the array.
Index | Value | Memory address |
---|---|---|
0 | 30 | 0xABCDEF00 |
1 | 34 | 0xABCDEF04 |
2 | 27 | 0xABCDEF08 |
3 | 68 | 0xABCDEF0C |
Because the memory is contiguous and that we know the starting memory address of the array, we can directly read/write a value by knowing its index. That is because we can calculate its memory address and go directly to it by using the following formula:
Output = Starting memory address + (Size of element * index)
A vector is similar to an array, but the memory for it is allocated dynamically (It’s like an array with grow/shrink functions).
The vector implementation also contains a data type called capacity, which keeps the maximum items the vector can hold. When we attempt to add more items than the vector can hold, a new (and bigger) vector is created, and the data from the old one is moved / copied to the new one.
By having the flexibility of having dynamically allocated space, while still having contiguous memory for its elements, a vector is usually the go-to solution.
A list is another way to store a collection of elements.
Unlike vectors, the elements in a list are not subsequent in memory - each element in a list is stored in the form of a node. The node contains the value for that specific element, and a pointer to an address memory where we can find the next element.
The linked list name comes from the way the elements are linked to each other, like a chain.
The first node is always used as a reference to traverse the list, and it is called head. The last node will always point to NULL.
In a list, each node is separately allocated, and all traversal operations chase pointers to their corresponding memory address, most likely causing a cache miss (the value is not stored anywhere close to the CPU, where it would be easy to fetch it, but it needs to be read from the main memory, which is slow).
Inserting or removing an element in a list is as simple as changing two pointers, but in order to do that, we need to have access to the memory location of the node prior to the one we need to insert to or delete from, so we still need to make a list traversal up to that point.
In a single linked list, each node has a pointer towards the next one, allowing us to traverse only forward.
In a double linked list, each node has two pointers: one for the next element, and one for the previous one, allowing us to traverse both forwards and backwards.
Circular linked list is similar to the usual linked list (can be both single- and double- linked list), with the exception that the last element, instead of pointing to null, will point to the head of the list.
Stacks and Queues are both special-purpose lists, that restrict how the data can be accessed. Both data structures are very simple and can be implemented with either linked-lists or vectors.
Stacks are dynamic data structure that use the LIFO principle (Last In First Out): the last block that was added is the first one to be removed. For example, if we have a stack of dishes in the sink, the first dish that we take out of the stack is the one on the front, which is also the last one that was added.
In a stack, elements can be added or removed only from the top of the stack.
The insertion of an element in the stack is called push()
, while the removal is called pop()
.
Queues are dynamic data structures that use the FIFO principle (First In First Out): the first element that was added is the first one to be removed. As an analogy, a Queue can be explained as a line of people waiting for the bus: The person at the beginning of the line is the first one to enter the bus.
In a Queue, elements are always added at the back of the queue, and removed from the front of it.
The insertion of an element in the queue is called enqueue()
, while the removal is called dequeue()
.
Hash tables are data structures that utilize the concept of {key, value} pairs. The value is the data that we store in the container, whereas the key is something that is computed from the value, with the help of a hash function.
The idea of hashing is to distribute the pairs (entries) uniformly across an array.
When we attempt to add an element in the hash table, the hash function computes the key, which will point to an internal container called a bucket, in which we can find all the values that have the same key.
We can think of a bucket as a linked list in which all the nodes are elements with the same key computed by the hash function. We are using a list because there’s a many-to-one relation between keys and values => multiple values can have the same hash key.
Hash tables are very useful, because given a set of data big enough, we can still access the element of interest quite fast, by simply using the hash function. With an array, we were required to know the index of where we can find it, whereas with a linked list, we would have needed to iterate through the list items to find it.
We see that we want to insert multiple values into our hash table.
We calculate the hash value for a specific item (with the help of the hash function), and that hash value will be the key / bucket entry.
Collision is a term used In case multiple keys return the same hash value. We can see that in this case, a solution will be to use a linked list in each bucket, so we can store multiple items in the same bucket.
Another solution would be to expand the number of buckets and recalculate the hash values for all objects, so we will not have collisions anymore.
Example:
A graph data structure is a connection of nodes (vertices) connected together through lines called edges. Each node can point to any number of nodes. A graph can also have some edge value assigned to each edge, to represent some specific attribute (cost, capacity, length, etc.).
There are two types of graphs: Directed graphs, and undirected graphs.
Directed graphs contain edges that are directed from one node to another. The nodes may be traversed only by following the assigned direction on each node.
Undirected graphs contain edges that do not have any direction, and can be traversed in both directions.
A cycle in a graph refer to a situation in which nodes are connected in a way that they form a closed structure.
If we traverse a cyclic graph and we don’t keep track of what vertices we visit, we will iterate through the same nodes infinitely.
Example: A -> B -> D -> C -> A -> B -> D -> C -> A ...
A tree structure can be defined as a collection of nodes, where each node is a data structure consisting of a value, and a list of references to its direct descendent nodes.
The first node (entry point) of the tree is called the root. If this root node (or any other node) is connected to another node, the root is called the parent node (ancestor), whereas the connected node is a child (descendent).
The last nodes (terminal nodes – nodes without children) are called leaves. We call siblings all the nodes that are on the same tree level. A subtree of a tree refer to a partion of a tree that can be viewed as a complete tree itself.
Other properties of a tree include:
A binary tree is a tree where a parent node can only have 2 children: left and right. Each node contains the data (value) for the item, a pointer to the left child, and a pointer to the right child.
Common types of binary trees include:
A spanning tree of a graph is a tree that connects together all the vertices, but not all the edges. There can be multiple spanning trees in a graph. The weight of a spanning tree is the sum of weights given to each edge of that spanning tree.
A minimum spanning tree (MST) is a spanning tree with the lowest weight, in comparison with all the other possible spanning trees. A MST has (V-1) edges, where V is the number of vertices in the graph.
Binary Heap is a Complete Binary Tree, in which nodes are arranged based on their value. Every heap data structure has the following properties:
A Binary Heap is either a Min Heap or Max Heap.
A min heap is a specialized full binary tree in which every parent node contains lesser or equal value than its child nodes. In a min heap, the root has the minimum value than all other nodes in the min heap.
A max heap is a specialized full binary tree in which every parent node contains greater or equal value than its child nodes. In a max heap, the root node has the maximum value than all other nodes in the max heap.
A trie is a data structure used to store data (usually strings), and it is used to represent the retrieval of data (thus, the name).
The prefix of a string refers to any substring of it, starting from the beginning of the string.
Example:
Prefixes for the word Shortcut:
S, Sh, Sho, Shor, Short, Shortc, Shortcu
Tries are used on groups of strings, and by receiving a prefix of a word, we can find all the possible words we can find starting with that prefix.
Using the entire dictionary of a language, and a single string, we can find the maximum length from the dictionary strings that matches that given string. We can also do autocomplete when we search something on the web, for example.
An example of platform is the operating system, so we can include windows-related code only if we’re building the binary for windows, for example.
A pointer references a location in memory.
CPU cache is a small storage space that keeps the most frequently used data in order to reduce the accesses to the main memory.
Except for languages that contain garbage collector (which automatically frees memory that is no longer referenced).
NULL refer to the absence of a value, which means that it is not the same as having the value 0.
A Complete binary tree is a binary tree in which all nodes except leaves have two children, and the last level has all keys as left as possible.