Shortcut to seniority

Basic structures

Arrays
Vector

Arrays

An array is an area of allocated memory for a set of elements of a known (fixed) size. The data stored in the arrays is allocated in contiguous memory (memory blocks have consecutive addresses).

Let's say we have:

an array with 4 elements (30, 34, 27, 68)
the size of an element being 4 bytes
the array starting from memory address 0xABCDEF00

The following table displays the memory address for the elements in the array.

Index	Value	Memory address
0	30	0xABCDEF00
1	34	0xABCDEF04
2	27	0xABCDEF08
3	68	0xABCDEF0C

Because the memory is contiguous and that we know the starting memory address of the array, we can directly read/write a value by knowing its index. That is because we can calculate its memory address and go directly to it by using the following formula:
Output = Starting memory address + (Size of element * index)

Vector

A vector is similar to an array, but the memory for it is allocated dynamically (It’s like an array with grow/shrink functions).

The vector implementation also contains a data type called capacity, which keeps the maximum items the vector can hold. When we attempt to add more items than the vector can hold, a new (and bigger) vector is created, and the data from the old one is moved / copied to the new one.

By having the flexibility of having dynamically allocated space, while still having contiguous memory for its elements, a vector is usually the go-to solution.

Linked list

A list is another way to store a collection of elements.

Unlike vectors, the elements in a list are not subsequent in memory - each element in a list is stored in the form of a node. The node contains the value for that specific element, and a pointer to an address memory where we can find the next element.

The linked list name comes from the way the elements are linked to each other, like a chain.

The first node is always used as a reference to traverse the list, and it is called head. The last node will always point to NULL.

In a list, each node is separately allocated, and all traversal operations chase pointers to their corresponding memory address, most likely causing a cache miss (the value is not stored anywhere close to the CPU, where it would be easy to fetch it, but it needs to be read from the main memory, which is slow).

Inserting or removing an element in a list is as simple as changing two pointers, but in order to do that, we need to have access to the memory location of the node prior to the one we need to insert to or delete from, so we still need to make a list traversal up to that point.

Single-linked list

In a single linked list, each node has a pointer towards the next one, allowing us to traverse only forward.

image with caption — Figure 7 - Data structures - Single Linked list node

Double-linked list

In a double linked list, each node has two pointers: one for the next element, and one for the previous one, allowing us to traverse both forwards and backwards.

Circular linked list

Circular linked list is similar to the usual linked list (can be both single- and double- linked list), with the exception that the last element, instead of pointing to null, will point to the head of the list.

Stack and queues

Stacks and Queues are both special-purpose lists, that restrict how the data can be accessed. Both data structures are very simple and can be implemented with either linked-lists or vectors.

Stack
Queue

Stacks are dynamic data structure that use the LIFO principle (Last In First Out): the last block that was added is the first one to be removed. For example, if we have a stack of dishes in the sink, the first dish that we take out of the stack is the one on the front, which is also the last one that was added.

In a stack, elements can be added or removed only from the top of the stack.

The insertion of an element in the stack is called push(), while the removal is called pop().

Queues are dynamic data structures that use the FIFO principle (First In First Out): the first element that was added is the first one to be removed. As an analogy, a Queue can be explained as a line of people waiting for the bus: The person at the beginning of the line is the first one to enter the bus.

In a Queue, elements are always added at the back of the queue, and removed from the front of it.
The insertion of an element in the queue is called enqueue(), while the removal is called dequeue().

Hash tables

Hash tables are data structures that utilize the concept of {key, value} pairs. The value is the data that we store in the container, whereas the key is something that is computed from the value, with the help of a hash function.

The idea of hashing is to distribute the pairs (entries) uniformly across an array.

When we attempt to add an element in the hash table, the hash function computes the key, which will point to an internal container called a bucket, in which we can find all the values that have the same key.

We can think of a bucket as a linked list in which all the nodes are elements with the same key computed by the hash function. We are using a list because there’s a many-to-one relation between keys and values => multiple values can have the same hash key.

Hash tables are very useful, because given a set of data big enough, we can still access the element of interest quite fast, by simply using the hash function. With an array, we were required to know the index of where we can find it, whereas with a linked list, we would have needed to iterate through the list items to find it.

We see that we want to insert multiple values into our hash table.

We calculate the hash value for a specific item (with the help of the hash function), and that hash value will be the key / bucket entry.

Collision is a term used In case multiple keys return the same hash value. We can see that in this case, a solution will be to use a linked list in each bucket, so we can store multiple items in the same bucket.

Another solution would be to expand the number of buckets and recalculate the hash values for all objects, so we will not have collisions anymore.

Example:

If we have 10 items and we want to store them into 1 bucket – all will go in the same bucket
If there are 2 buckets – a few of them will go in one bucket, and the remaining ones in the other one
If there are 10 buckets, there’s a chance that each item will go into a separate bucket
If there are 100 buckets, there’s a very small chance that a collision will occur

Graphs

A graph data structure is a connection of nodes (vertices) connected together through lines called edges. Each node can point to any number of nodes. A graph can also have some edge value assigned to each edge, to represent some specific attribute (cost, capacity, length, etc.).

There are two types of graphs: Directed graphs, and undirected graphs.

Directed graphs
Undirected graphs

Directed graphs contain edges that are directed from one node to another. The nodes may be traversed only by following the assigned direction on each node.

Undirected graphs contain edges that do not have any direction, and can be traversed in both directions.

A cycle in a graph refer to a situation in which nodes are connected in a way that they form a closed structure.

If we traverse a cyclic graph and we don’t keep track of what vertices we visit, we will iterate through the same nodes infinitely.

Example: A -> B -> D -> C -> A -> B -> D -> C -> A ...

Trees

A tree structure can be defined as a collection of nodes, where each node is a data structure consisting of a value, and a list of references to its direct descendent nodes.

The first node (entry point) of the tree is called the root. If this root node (or any other node) is connected to another node, the root is called the parent node (ancestor), whereas the connected node is a child (descendent).

The last nodes (terminal nodes – nodes without children) are called leaves. We call siblings all the nodes that are on the same tree level. A subtree of a tree refer to a partion of a tree that can be viewed as a complete tree itself.

Other properties of a tree include:

Node height: The longest path towards a leaf
Tree depth: The depth of a node refer to the distance from that node to the root
Tree level: Defined as 1 + the number of connections between that node and the root (level is depth +1)

Binary trees

A binary tree is a tree where a parent node can only have 2 children: left and right. Each node contains the data (value) for the item, a pointer to the left child, and a pointer to the right child.

Common types of binary trees include:

Full binary tree: A binary tree in which all nodes except leaves7 have two children
Complete binary tree: A binary tree in which all nodes except leaves have two children, and the last level has all keys as left as possible
Perfect binary tree: A binary tree in which all nodes have two children and all leaves are at the same level
Balanced binary tree: A binary tree that automatically keeps its height small after an arbitrary item insertions and deletions. This means that the tree will possibly readjust after an insertion has been made.

Spanning trees

A spanning tree of a graph is a tree that connects together all the vertices, but not all the edges. There can be multiple spanning trees in a graph. The weight of a spanning tree is the sum of weights given to each edge of that spanning tree.

Minimum spanning tree (MST)

A minimum spanning tree (MST) is a spanning tree with the lowest weight, in comparison with all the other possible spanning trees. A MST has (V-1) edges, where V is the number of vertices in the graph.

Heap / Binary heaps

Binary Heap is a Complete Binary Tree, in which nodes are arranged based on their value. Every heap data structure has the following properties:

Property #1 (Ordering): Nodes must be arranged in an order according to their values.
Property #2 (Structural): All levels in a heap must be full, except last level (leafs). Nodes must be filled strictly from left to right.

A Binary Heap is either a Min Heap or Max Heap.

Min heap

A min heap is a specialized full binary tree in which every parent node contains lesser or equal value than its child nodes. In a min heap, the root has the minimum value than all other nodes in the min heap.

Max heap

A max heap is a specialized full binary tree in which every parent node contains greater or equal value than its child nodes. In a max heap, the root node has the maximum value than all other nodes in the max heap.

Trie

A trie is a data structure used to store data (usually strings), and it is used to represent the retrieval of data (thus, the name).

The prefix of a string refers to any substring of it, starting from the beginning of the string.

Example:
Prefixes for the word Shortcut:
S, Sh, Sho, Shor, Short, Shortc, Shortcu

Tries are used on groups of strings, and by receiving a prefix of a word, we can find all the possible words we can find starting with that prefix.

Using the entire dictionary of a language, and a single string, we can find the maximum length from the dictionary strings that matches that given string. We can also do autocomplete when we search something on the web, for example.

Shortcut to seniority

Badea Robert