How to Learn Data Structures and Algorithms. 20 Problem-Solving Techniques You Must Know

This is the article I wish I had read when I started coding. I will dive deep into 20 problem-solving techniques that you must know to excel at your next interview. They have helped me at work too and even given me ideas for a side project I am working on. Also, the last section includes a step-by-step guide explaining how to learn data structures and algorithms, with examples.

Furthermore, I recommend you read this post, where I outlined a high-level strategy to prepare for your next coding interview as well as the top mistakes to avoid.

I have grouped these techniques in:

  • Pointer based
  • Recursion based
  • Sorting and searching
  • Extending basic data structures
  • Miscellanea

I will explain each of them, show how to apply them to coding problems, and leave you some exercises so that you can practice on your own. For your convenience, I have copied here the problem statements, but I have left links to all of the exercises. You can copy-paste my solution and play around with it. I strongly recommend you code your solution and see if it passes the tests.

Some of the questions are better explained through an image or diagram. For these, I have left a comment asking you to open the link to get a graphical description of the problem.

This list is part of the study notes that I took before I applied to Amazon. I hope they will be as useful to you as they have been to me.

Pointer based techniques

1. Two Pointers

The idea is to use two (or more pointers) to split the array into different areas or groups based on some condition:

  • Elements smaller than, equal to, and greater than a certain value
  • Elements whose sum is too small or too large
  • Etc.

The following examples will help you understand this principle.

Two sum


  • Your returned answers (both index1 and index2) are not zero-based.
  • You may assume that each input would have exactly one solution and you may not use the same element twice.


  • Input: numbers = [2,7,11,15], target = 9
  • Output: [1,2]
  • Explanation: The sum of 2 and 7 is 9. Therefore index1 = 1, index2 = 2.


  • The largest sum is equal to the sum of the last 2 elements
  • The smallest sum is equal to the sum of the first 2 elements
  • For any index i in [0, a.size() — 1) => a[i + 1] >= a[i]

With this, we can design the following algorithm:

  • We keep 2 pointers: l, starting at the first element of the array, and r starting at to the last.
  • If the sum of a[l] + a[r] is smaller than our target, we increment l by one (to change the smallest operand in the addition for another one equal or larger than it at l+1); if it is larger than the target, we decrease r by one (to change our largest operand for another one equal or smaller at r-1).
  • We do this until a[l] + a[r] equals our target or l and r point to the same element (since we cannot use the same element twice) or have crossed, indicating there is no solution.

Here is a simple C++ implementation:

The time complexity is O(N), since we may need to traverse the N elements of the array to find the solution.

The space complexity is O(1), since we only need two pointers, regardless of how many elements the array contains.

There are other ways of solving this problem (using a hash table, for example), but I have used it just as an illustration of the two pointer technique.


This is a very common technique: transform a problem whose solution you don’t know into a problem that you can solve.

Remove duplicates from array

Do not allocate extra space for another array, you must do this by modifying the input array in-place with O(1) extra memory.

Example 1:

  • Given nums = [1,1,2],
  • Output = 2

Example 2:

  • Given nums = [0,0,1,1,1,2,2,3,3,4],
  • Output = 5

It doesn’t matter what values are set beyond the returned length.


  • You will need one pointer to iterate through the array, i.
  • And a second pointer, n, one to define the area that contains no duplicates: [0,n].

The logic is as follows. If the values of the elements at index i (except i = 0) and i-1 are:

  • The same, we don’t do anything — this duplicate will be overwritten by the next unique element in a.
  • Different: we add a[i] to the section of the array that contains no duplicates — delimited by n, and increment n by one.

This problem has linear time complexity and constant space complexity (it is usually the case for problems solved using this technique).

Sort colors


  • Input: [2,0,2,1,1,0]
  • Output: [0,0,1,1,2,2]


  • Smaller than 1
  • Equal to 1
  • Larger than 1

What we can achieve with 3 pointers.

This implementation is a bit tricky, so make sure you test it thoroughly.

Since need to traverse the array to sort it, the time complexity is O(N). The space complexity is O(1).

Just as a curiosity, this is an instance of the National Dutch flag problem described by Dijkstra.

Remove the n-th node from the end of a linked list


In this case, if we move one of the pointers, f, n times and then start advancing both at the same time by one node, when f reaches the end of the list the other pointer, s, will point to the node right before the node we want to delete.

Make sure you define what n = 1 means (last element or the element before the last element?), and avoid off-by-one errors.

The time and space complexity are the same as the previous problems’.

2. Pointers moving at different speeds

As usual, it will become clearer once I present some examples.

Find the middle node of a linked list of unknown size

Example 1:

  • Input: [1,2,3,4,5]
  • Output: Node 3 from this list


How can we use 2 pointers to find the middle element in one single pass?

If one of the pointers, f, moves twice as fast as the other one s, when f reaches the end, s will be in the middle of the list.

Here’s my solution in C++. Make sure you take into account edge cases (empty list, lists of odd and even sizes, etc) when you test your code.

Detect cycle in a linked list

Example 1:

  • Input: head = [3,2,0,-4], pos = 1
  • Output: true
  • Explanation: There is a cycle in the linked list, where the tail connects to the second node.


This has a time complexity of O(L), being L the length of the list, and space complexity of O(L), since in the worst case — no cycles — we need to add all the elements of the list to the hash set.

Time complexity cannot be improved. However, space complexity can be reduced to O(1). Think for a minute how this can be achieved with two pointers moving at different speeds.

Let’s call these pointers fast and slow. For each node slow visits, fast will move two nodes forward. Why?

  • If fast reaches the end of the list, the list does not contain any cycles.
  • If there is a cycle, since fast moves twice as fast as slow, it is just a matter of time (iterations, to be more precise) that the fast node catches the slow one, pointing both to the same node, which indicates the existence of a cycle.

Now, let’s translate this solution into code:

Find the duplicate number

Example 1:

  • Input: [1,3,4,2,2]
  • Output: 2


This is the same problem/solution as the previous problems, for arrays instead of linked lists.


3. Sliding Window

  • Longest subarray that …
  • Shortest substring containing …
  • Etc

You can think of it as another variation of the two pointer technique, where pointers are updated separately based on a certain condition. Below is the basic recipe for this type of problems, in pseudocode:

Create two pointers, l, and r
Create variable to keep track of the result (res)
Iterate until condition A is satisfied:
Based on condition B:
update l, r or both
Update res
Return res

Longest Substring Without Repeating Characters

Example 1:

  • Input: “abcabcbb”
  • Output: 3
  • Explanation: The answer is “abc”, with the length of 3


Based on the recipe I described above, you will need:

  • Two pointers, l and r, to define our substring s.
  • A variable, sol, to store the length of the longest substring we have seen so far.
  • A way of keeping track of the characters that form s: a set, seen, will be perfect for this.

While iterating through the string:

  • If the current character is in seen*you have to increment *l to start removing elements from the beginning of our s.
  • Otherwise, add the character to seen, move r forward and update sol.


There might be simpler solutions but focus on using this technique to get a better grasp of it.

Recursion based techniques

4. Dynamic Programming

5. Backtracking

Backtracking problems present you with a list of choices. Should you:

  • Place this piece in this position?
  • Add this number to the set?
  • Try this number in this position next?
  • Etc

After you have picked one of the options, it will get you a new list of choices, until you reach a state where there are no more choices: either you arrived at a solution or there is no solution.

Visually, you are moving from the root of a tree with every choice, until you get to a leaf. The basic high-level recipe (in pseudocode) for a backtracking algorithm is the following:

This can of course change depending on the problem:

  • If you need all solutions, the helper function returns nothing (void) to avoid stopping when we find the first solution.
  • To backtrack, you may have to bring your program to a previous state before you can move forward
  • After you choose a child, you need to detect if the candidate solution is viable or not: the definition of viable depends on the problem
  • Etc

But the core idea is the same: examine, in a systematic way, all paths and backtrack as soon as the current path is no longer viable.

N queens

Each solution contains a distinct board configuration of the n-queens’ placement, where ‘Q’ and ‘.’ both indicate a queen and an empty space respectively.


  • Input: 4
  • Output: [ [“.Q..”, “…Q”, “Q…”, “..Q.”], [“..Q.”, “Q…”, “…Q”, “.Q..”] ]
  • Explanation: There exist two distinct solutions to the 4-queens puzzle as shown above.


  • We need all solutions here, which is why the recursive function returns nothing as I explained in the introduction of this section.
  • Do not worry too much about the isViableSolution function for now. Try to see the recipe I gave (slightly modified) you in action.

Letter combination


  • Input: “23”
  • Output: [“ad”, “ae”, “af”, “bd”, “be”, “bf”, “cd”, “ce”, “cf”].


Note: Before you start solving any problem, try different approaches: dynamic programming, greedy algorithms, divide and conquer, a combination of algorithms and data structures, etc. Coding is the last step.

My solution, in C++:

Since I knew already the size of the solution, I initialize my candidate with that size and just modified the character at position idx. If the size is not known, this can be done instead:

Sudoku solver


Even though the code is long, it is mostly because of isViableSolution. Otherwise, it is not very different from other backtracking problems.


6. Subsets

  • Subsets
  • Combinations
  • Permutations
  • Etc

Because conceptually they are similar: take the elements of a container (array, string, etc) and create subsets from them according to some rule(s).

You will notice that most of these are backtracking problems or very similar.


Note: The solution set must not contain duplicate subsets.


  • Input: nums = [1,2,3]
  • Output: [ [3], [1], [2], [1,2,3], [1,3], [2,3], [1,2], [] ]


  • One that contains a[i]
  • One that does not contain a[i]

Until you get to the end of the array.

Here’s a simple implementation of what we have discussed.

Subsets — containing duplicates

Good luck!




  • Input: [1,2,3]
  • Output: [ [1,2,3], [1,3,2], [2,1,3], [2,3,1], [3,1,2], [3,2,1] ] #### Solution

Very similar to the previous problem, but this time we need to consider all the elements of the array in our candidates. The way we move them around is by swapping elements at different positions.

Permutations with repetitions


  • Input: [1,1,2]
  • Output: [[1,1,2], [1,2,1], [2,1,1] ]


As you can see, there is no need to learn every single data structure and algorithm in the literature to solve a vast amount of problems. What is valuable, and can be trained, is the ability to systematically think through a problem and skills to turn your ideas into code.


Sorting and searching algorithms

7. Sorting

When you are facing a new problem, ask yourself:

  • Can I sort the input?. For example, sometimes you have to return indices, therefore sorting is not an option
  • How would this problem change if the elements were sorted?
  • How does sorting affect the overall complexity?. The best sorting algorithms have a complexity of O(n log n) — sorting integers can be done in O(n)

For instance, our first problem (Two sum) can be solved in linear time with the two-pointer technique because the input is sorted. However, if we have to sort, the complexity becomes O(n log n).

Here you have a couple of problems that can be easily solved after sorting the input.

Other solutions trade space for time, so it is worth considering them before you start writing any code.

Valid anagram

Example 1:

  • Input: s = “anagram”, t = “nagaram”
  • Output: true

Example 2:

  • Input: s = “rat”, t = “car”
  • Output: false


Intersection of two arrays

Example 1:

  • Input: nums1 = [1,2,2,1], nums2 = [2,2]
  • Output: [2,2]


Imagine we have the following arrays:

  • A = [1,2,4,5]
  • B = [2,3,5,7,9,11]

And two pointers, l for A and r for B, starting at index 0 in each array.

  • A[l] = 1
  • B[r] = 2

We need to increment l to see if A has a 2: B can’t contain a 1 to the right of r, because it is sorted.

In short, we compare both elements:

  • If they are the same, we include them to the array representing the intersection of both arrays and advance both pointers
  • If they are different, we move the pointer pointing to the smallest of the two elements.

The time complexity of this approach is O(n log n) — even though the two-pointer part is linear — and the space complexity is O(1) — not including the space needed to store the intersection, of course, in which case we could say it is O(min(length(A), length(B)).


8. Intervals

  • Modeling the interval as an array of two elements, a pair/tuple or a custom class (this is the cleanest option)
  • Sorting the input
  • Iterating through the sorted array and deciding what to do based on the starts/ends of the intervals

You can see this as yet another type of problem that can be simplified after sorting the input, which is why I have included it in this section.

I’m leaving here my solution to two exercises, based on what I have just described. Try them before reading my solutions.

Merge intervals

Example 1:

  • Input: intervals = [[1,3],[2,6],[8,10],[15,18]]
  • Output: [[1,6],[8,10],[15,18]]
  • Explanation: Since intervals [1,3] and [2,6] overlaps, merge them into [1,6].


Interval list intersections


9. Variations on binary search

What you might not know is its implementation’s most common bug. Assuming we are performing a binary search in the elements whose range goes from [l,r], the element in the middle is usually computed as:

const int m = (l + r) / 2;

Can you spot the problem here?

This line can overflow. The safe way of computing this middle element is:

const int m = l + (r - l) / 2;

Here’s a basic implementation of binary search, using C++ templates. If you understand how it works and how to implement it correctly, you can solve many problems that just have a little tweak in the requirements or that are binary search problems in disguise.

Integer square root

Example 1:

  • Input: 4
  • Output: 2


It is not critical to know this, but we can optimize a little:

  • If x is 0 or 1, its square root is x. This lets us start testing numbers from 2.
  • The upper bound of the range can be reduced to x/2, since (x/2)² = x²/4 >= x => x >= 4, which is true for any integer in the range [2,…]

With this, we can search in [2, x/2] and speed things up a bit.


10. Breadth-First Search

At any given point in the iteration, BFS visits all the nodes at the same distance from the origin. This will become clearer after some of these examples.

Word ladder

  • Only one letter can be changed at a time.
  • Each transformed word must exist in the word list.


  • Return 0 if there is no such transformation sequence.
  • All words have the same length.
  • All words contain only lowercase alphabetic characters.
  • You may assume no duplicates in the word list.
  • You may assume beginWord and endWord are non-empty and are not the same.

Example 1:

  • Input:
  • beginWord = “hit”,
  • endWord = “cog”,
  • wordList = [“hot”,”dot”,”dog”,”lot”,”log”,”cog”]
  • Output: 5
  • Explanation: As one shortest transformation is “hit” -> “hot” -> “dot” -> “dog” -> “cog”, return its length 5.


  • Nodes represent words
  • Edges connect words that only differ by one letter

With this setup, this problem is equivalent to finding a path between two nodes in a graph, which BFS can solve. Since all edges have the same weight (1), we do not need Dijkstra or any other fancier algorithm.

After you visit the DFS section:

What would happen if we use DFS instead? Do you see any benefits/drawbacks?

Order level tree traversal

Open the link for a graphical description of the problem.


This is one approach that can be applied to many other problems.


11. Depth-First Search

It is usually shorter to implement than BFS. Some people find it easier. Others, because of the recursive calls, not so much. It is up to you. Just make sure you think about potential stack overflow issues if the size of the stack starts to get big.

Some problems are much easier to be solved with DFS/recursion, that it is worth practicing.

Number of islands


  • Input: grid = [ [“1″,”1″,”1″,”1″,”0"], [“1″,”1″,”0″,”1″,”0"], [“1″,”1″,”0″,”0″,”0"], [“0″,”0″,”0″,”0″,”0"]]
  • Output: 1


As soon as you see a matrix, think of a graph. This problem (and its variations) is very straightforward:

  • Iterate through the matrix
  • For every 1 you find, increase the counter and sink the island

To sink the island, you need to visit all the surrounding 1s in the matrix, which is equivalent to visit all the neighbors of that node and of all its neighbors, which sounds a lot like a recursive problem.

You can try to solve this using BFS too, but DFS is much shorter.

Symmetric tree


I will leave this one here as an exercise for you. Just use my solution in case you get stuck.


12. Topological sort

It is that simple.

The best way to intuitively understand what this algorithm achieves is to imagine a bunch of tasks, some of which depend on others (task 1 cannot start until task 2 has finished). A topological sort will list all these tasks preserving this structure of dependencies.

Let’s solve a problem using this algorithm.

There are a total of n courses you have to take, labeled from 0 to n-1. Some courses may have prerequisites, for example, to take course 0 you have to first take course 1, which is expressed as a pair: [0,1]

Course schedule II

Example 1:

  • Input: 2, [[1,0]]
  • Output: [0,1]
  • Explanation: There are a total of 2 courses to take. To take course 1 you should have finished course 0. So the correct course order is [0,1].

Example 2:

  • Input: 4, [[1,0],[2,0],[3,1],[3,2]]
  • Output: [0,1,2,3] or [0,2,1,3]
  • Explanation: There are a total of 4 courses to take. To take course 3 you should have finished both courses 1 and 2. Both courses 1 and 2 should be taken after you finished course 0. So one correct course order is [0,1,2,3]. Another correct ordering is [0,2,1,3].


A prerequisite to applying topological sort is that the graph is directed and acyclic. From the problem description, we can see that the graph is directed. We can detect if it contains any cycles and compute a topological sort in the same pass.

A more indirect but still valid approach would be to first check whether it has cycles and only if there are no cycles, compute a topological sort.


Have fun!

Extending basic data structures

13. Dequeues

That’s it. Simple.

Let’s see how such a minor change can simplify some kind of these problems.

Sliding window maximum

*Note: Open the link for a better understanding of the problem (there is an image). *


  • Input: nums = [1,3,-1,-3,5,3,6,7], and k = 3
  • Output: [3,3,5,5,6,7]


  1. Remove elements in the dequeue which are outside of the current sliding window (one per iteration)
  2. Remove all elements in the dequeue which are smaller than the current element we are at since they cannot represent the max of that sliding window
  3. Add the current element to the deque
  4. Once we have completed the first sliding window, we can start adding elements to our solution. By design, the element at the front of our dequeue will contain the maximum of the sliding window, which is what we are interested in.

This technique can be applied to find the minimum or other properties of a contiguous block of data in an array.


14. Trie

There are variants where you store suffixes instead of prefixes, where you use compression to reduce the size of the trie, etc. But at its basic, it is another type of tree.

They are used everywhere:

  • Auto-complete
  • Spell checkers
  • IP routing
  • Longest prefix/suffix matching
  • Etc

Implement a trie


  • An extra boolean to indicate whether that node marks the end of a word
  • A data structure to store pointers to the node’s children: a hash table, an array of characters, etc.

Word search II


  • Input: board = [ [‘o’,’a’,’a’,’n’], [‘e’,’t’,’a’,’e’], [‘i’,’h’,’k’,’r’], [‘i’,’f’,’l’,’v’] ] words = [“oath”,”pea”,”eat”,”rain”]
  • Output: [“eat”,”oath”]


At every step, you will have built some candidate string and need to check if it belongs to the dictionary. A hash set containing all the words in the dictionary would give a very good performance. Why bothering with a trie then? Because a trie can tell you whether that path is worth exploring or not, improving the efficiency of our solution.

In our previous example, imagine we form the string “oa”.

We can check if this prefix (potential word) exists in our trie. It exists since we have added the word “oath”. Now imagine we keep moving right through our board and form the string “oaa”.

In our trie, there are no words that contain the prefix “oaa”, so we can backtrack at this point.

With a hash set that contains all the words, you could not do this type of prefix matching, unless you create two different tables: one for prefixes and another one for words.

This is a complex problem because it combines different elements and it is not easy to implement, so do not get discouraged if it takes you a few tries (no pun intended) to get it right.


15. Two instances of the same data structure

  • Queues
  • Stacks
  • Arrays

Do not limit yourself to these.

Find median from a stream of data

For example,

  • [2,3,4], the median is 3
  • [2,3], the median is (2 + 3) / 2 = 2.5

Design a data structure that supports the following two operations:

  • void addNum(int num) — Add an integer number from the data stream to the data structure.
  • double findMedian() — Return the median of all elements so far.




16. Bit manipulation

This is the most comprehensive site I have found on this topic. Use it as a reference.

Missing number

Example 1:

  • Input: [3,0,1]
  • Output: 2


  • Any bit XORed with itself produces a 0 -> a ^ a = 0
  • Any bit XORed with 0 produces the original bit -> a ^ 0 = a
  • XOR is associative = a ^ (b ^ c) = (a ^ b ) ^ c

If we XOR all the numbers in the array (integers in the range [0,n]) with all the integers in [0 to n], the pairs will produce zeroes and the missing number will be XORed with 0 (resulting in itself), solving our problem.

Power of two


Powers of two can be expressed in binary as a leading 1 and some 0s:

With this, it is simple to figure out if a number is a power of two or not. You can achieve fast if you know what the following line does (I am sure you can figure it out on your own):

This trick is worth knowing since it is used a lot.

Number of 1s

Example 1:

  • Input: 00000000000000000000000000001011
  • Output: 3
  • Explanation: The input binary string 00000000000000000000000000001011 has a total of three ‘1’ bits


Try to use the trick I showed you in the previous exercise to improve the performance of this algorithm (in the average case).


17. Top ‘K’ Elements

  • The largest/smallest
  • The closest/furthest to a point
  • The most frequent in the list
  • Etc

I have seen some of the following (or some sort of variation) asked in interviews.

There is no single data structure that will always give the right solution, but the following elements are very useful:

  • Hash table
  • Priority queue
  • Sorting (the input or as an intermediate step)

Priority queues usually provide a better complexity.

Top k frequent words

Example 1:

  • Input: [“i”, “love”, “leetcode”, “i”, “love”, “coding”], k = 2
  • Output: [“i”, “love”]
  • Explanation: “i” and “love” are the two most frequent words. Note that “i” comes before “love” due to a lower alphabetical order.


For this last part, you either:

  • Put all the elements with its frequencies in an array and sort it


  • Use a priority queue

It is worth knowing this second approach since it can be applied to other problems.

Here is a simple solution in C++ using a priority queue.


K closest points to origin

You may return the answer in any order. The answer is guaranteed to be unique (except for the order that it is in.)

Example 1:

  • Input: points = [[1,3],[-2,2]], K = 1
  • Output: [[-2,2]]


We use a max heap, where the root of the heap is the maximum element of the heap. Why the maximum? Our heap contains the K points closest to the origin and the root points to the element that is the furthest from the origin amongst all the other points. If we find a point closer to this one, it may still be further than the rest, but we need to drop our current top and add this new point to the heap.

18. K-way merge

Merge 2 sorted lists

Output: 1->1->2->3->4->4


Merge k sorted lists

Merge all the linked-lists into one sort linked-list and return it.

Example 1:

  • Input: lists = [[1,4,5],[1,3,4],[2,6]]
  • Output: [1,1,2,3,4,4,5,6]


  • The linked-lists are: [1->4->5, 1->3->4, 2->6]
  • Merging them into one sorted list: [1->1->2->3->4->4->5->6]


Now, instead of two sorted lists, we have K. We need to create a list from picking the next minimum element of all the lists. Since they are sorted, it will be the first element of one of these lists. Luckily, there is a data structure that returns its minimum element in O(1) and has an insertion complexity of O(log K): a priority queue.

For every list, we add to the priority queue a pair containing:

  • The head of that list
  • An index to remember to which list that element belongs

After we pop an element, we add its value to our solution and add to the priority queue the new head of that list (the pair stores the value and the index of the list).

With this information, try to solve the problem. You will learn much more from it than from reading my solution without trying first.

19. Rolling hash

The basic ideas that you need to remember to be able to reconstruct this algorithm are the following:

  • This is an improvement over the brute force method, where you compare s and every candidate substring, c, which is inefficient.
  • If two strings are the same, they will produce the same hash.
  • The inverse is not true: two different strings may produce the same hash.
  • Using a rolling hash function we can compute the hash of a new string in O(1)

If we put all this together, we will efficiently compute hashes and only compare c and s when they have the same hash, reducing the number of comparisons and thus the average complexity of our algorithm (worst case, we need to compare all substrings and are back to the brute force method).

Find string in text

Example 1:

  • Input: haystack = “hello”, needle = “ll”
  • Output: 2

Example 2:

  • Input: haystack = “aaaaa”, needle = “bba”
  • Output: -1


20. How to learn data structures and algorithms

If you are wondering how to learn data structures and algorithms, I already showed how to do it when I described Rabin-Karp:

  1. Understand what problem this algorithm or data structure is trying to solve: Here, learning where other alternatives fall short is helpful.
  2. Break it down to the key elements or steps that define the algorithm. Understand them individually and how they connect.
  3. From this, code the algorithm.

Example 1: Binary search

  1. Two points: a)Since the array is sorted, comparing our target to the middle element of the array lets us discard half of the array at every step (the half containing elements that are too large or too small compared to the target. b)If the pointers cross, this is l > r, there is no solution.

Example 2: Quad-tree

What is a quad-tree?

A quadtree is a tree data structure in which each internal node has exactly four children. Quadtrees are the two-dimensional analog of octrees and are most often used to partition a two-dimensional space by recursively subdividing it into four quadrants or regions

What problem is it trying to solve?

Imagine we are trying to model a physical system made of a bunch of particles, each having a certain mass. Every particle attracts every other particle in the system. If we have N particles, at every step of our simulation, the number of interactions we need to process grows as N². This will be extremely slow for large values of N.

Since the attraction between two particles decays with the square of the distance, we can neglect the contribution of particles that are far. How do we know which particles are far without iterating through all the particles (the problem we are trying to avoid in the first place)? Here is where quadtrees come in handy.

Using quadtrees, we can efficiently query all the particles in a certain region. The lookup takes O(logN) time, making our overall time for every step O(n log n) since we have to do this for N particles.

Notice how at this point we do not care so much about the implementation. This is more of a high-level understanding of the data structure and to get an idea of where it can be useful.

In the next section, we will get into more detail.

What are the key elements of a quad-tree?

A quadtree is a tree data structure in which each internal node has exactly four children.

  • They decompose space into adaptable cells
  • Each cell (or bucket) has a maximum capacity. When maximum capacity is reached, the bucket splits
  • The tree directory follows the spatial decomposition of the quadtree.

So, our quadtree class is not very different from any other tree that you may have encountered (including the trie I presented earlier). Each node will contain:

  • An integer representing its maximum capacity
  • A boolean indicating whether it is a leaf or not (which means, the node has been divided because it reached its maximum capacity)
  • Pointers to it 4 children
  • An array of containing the points included in that node

Knowing what problem a data structure is trying to solve and breaking it down to its basic elements makes it much easier to implement it and especially to remember what the data structure is about and when it can be used. All you have to “memorize” is its definitions and properties — which are easier to remember if you think about what it is trying to solve.

Putting it all together

If you want to learn more about this data structure, I recommend these two videos to see a quadtree in action.

Nice to know


The point is not to learn every single data structure or algorithm in a book. As you can see, most of them are either basic techniques or small tweaks on basic data structures you already knew. I want to show you that you can achieve a lot with what you already know.

Originally published at on September 27, 2020.

Learn to think like a software engineer. Ex SDE @amazon. Currently SDE @ebay. Find me on Twitter @codinglanguages

Learn to think like a software engineer. Ex SDE @amazon. Currently SDE @ebay. Find me on Twitter @codinglanguages