I used Anki myself with Algo Deck and Design Deck, and it paid off. This method played a key role in helping me land a role as an L5 SWE (senior software engineer) at Google.
Here is a flashcard example:
The Anki versions (a clone of the flashcards from this repo) are available via one-time GitHub sponsorships:
Check the Anki version here.
"},{"location":"#array","title":"Array","text":""},{"location":"#algorithm-to-reverse-an-array","title":"Algorithm to reverse an array","text":"int i = 0;\nint j = a.length - 1;\nwhile (i < j) {\n swap(a, i++, j--);\n}\n
"},{"location":"#array-complexity-access-search-insert-delete","title":"Array complexity: access, search, insert, delete","text":"Access: O(1)
Search: O(n)
Insert: O(n)
Delete: O(n)
"},{"location":"#binary-search-in-a-sorted-array-algorithm","title":"Binary search in a sorted array algorithm","text":"int lo = 0, hi = a.length - 1;\n\nwhile (lo <= hi) {\n int mid = lo + ((hi - lo) / 2);\n if (a[mid] == key) {\n return mid;\n }\n if (a[mid] < key) {\n lo = mid + 1;\n } else {\n hi = mid - 1;\n }\n}\n
"},{"location":"#further-reading","title":"Further Reading","text":"Solution: binary search
Check first if the array is rotated. If not, apply normal binary search
If rotated, find pivot (smallest element, only element whose previous is bigger)
Then, check if the element is in 0..pivot-1 or pivot..len-1
int findElementRotatedArray(int[] a, int val) {\n // If array not rotated\n if (a[0] < a[a.length - 1]) {\n // We apply the normal binary search\n return binarySearch(a, val, 0, a.length - 1);\n }\n\n int pivot = findPivot(a);\n\n if (val >= a[0] && val <= a[pivot - 1]) {\n // Element is before the pivot\n return binarySearch(a, val, 0, pivot - 1);\n } else if (val >= a[pivot] && val <= a[a.length - 1]) {\n // Element is after the pivot\n return binarySearch(a, val, pivot, a.length - 1);\n }\n return -1;\n}\n
"},{"location":"#given-an-array-move-all-the-0-to-the-left-while-maintaining-the-order-of-the-other-elements","title":"Given an array, move all the 0 to the left while maintaining the order of the other elements","text":"Example: 1, 0, 2, 0, 3, 0 => 0, 0, 0, 1, 2, 3
Two pointers technique: read and write starting at the end of the array
If read is on a 0, decrement read. Otherwise swap, decrement both
public void move(int[] a) {\n int w = a.length - 1, r = a.length - 1;\n while (r >= 0) {\n if (a[r] == 0) {\n r--;\n } else {\n swap(a, r--, w--);\n }\n }\n}\n
Time complexity: O(n)
Space complexity: O(1)
"},{"location":"#how-to-detect-if-an-element-is-a-pivot-in-a-rotated-sorted-array","title":"How to detect if an element is a pivot in a rotated sorted array","text":"Only element whose previous is bigger (also the pivot is the smallest element)
"},{"location":"#how-to-find-a-pivot-element-in-a-rotated-array","title":"How to find a pivot element in a rotated array","text":"Check first if the array is rotated
Then, apply binary search (comparison with a[right] to know if we go left or right)
int findPivot(int[] a) {\n int left = 0, right = a.length - 1;\n\n // Array is not rotated\n if (a[left] < a[right]) {\n return -1;\n }\n\n while (left <= right) {\n int mid = left + ((right - left) / 2);\n if (mid > 0 && a[mid] < a[mid - 1]) {\n // Return the index of the pivot (smallest element)\n return mid;\n }\n\n if (a[mid] < a[right]) {\n // Pivot is on the left\n right = mid - 1;\n } else {\n // Pivot is on the right\n left = mid + 1;\n }\n }\n\n return -1;\n}\n
"},{"location":"#how-to-find-the-duplicates-in-an-array","title":"How to find the duplicates in an array","text":"When full, create a new array of twice the size, copy items (System.arraycopy is optimized for that)
Shrink: - Not when one-half full (otherwise the worst case is too expensive: alternating add/remove at the threshold triggers double-shrink-double-shrink, each in O(n)) - Solution: shrink when one-quarter full (a minimal sketch follows)
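A minimal sketch of this grow/shrink policy, assuming a simplified int-based structure (class and method names are illustrative):
class DynamicArray {\n private int[] a = new int[1];\n private int size = 0;\n\n public void add(int value) {\n // Full: double the capacity\n if (size == a.length) {\n resize(2 * a.length);\n }\n a[size++] = value;\n }\n\n public int removeLast() {\n int value = a[--size];\n // One-quarter full: halve the capacity\n if (size > 0 && size == a.length / 4) {\n resize(a.length / 2);\n }\n return value;\n }\n\n private void resize(int capacity) {\n int[] copy = new int[capacity];\n System.arraycopy(a, 0, copy, 0, size);\n a = copy;\n }\n}\n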
"},{"location":"#how-to-test-if-the-array-is-sorted-in-ascending-or-descending-order","title":"How to test if the array is sorted in ascending or descending order","text":"Test first and last element (no iteration)
"},{"location":"#rotate-an-array-by-n-elements-n-can-be-negative","title":"Rotate an array by n elements (n can be negative)","text":"Example: 1, 2, 3, 4, 5 with n = 3 => 3, 4, 5, 1, 2
void rotateArray(List<Integer> a, int n) {\n if (n < 0) {\n n = a.size() + n;\n }\n\n reverse(a, 0, a.size() - 1);\n reverse(a, 0, n - 1);\n reverse(a, n, a.size() - 1);\n}\n
Time complexity: O(n)
Memory complexity: O(1)
"},{"location":"#bit","title":"Bit","text":""},{"location":"#operator","title":"& operator","text":"AND bit by bit
"},{"location":"#operator_1","title":"<< operator","text":"Shift on the left
n * 2 <=> left shift by 1
n * 4 <=> left shift by 2
"},{"location":"#operator_2","title":">> operator","text":"Shift on the right
"},{"location":"#operator_3","title":">>> operator","text":"Logical shift (shift the sign bit as well)
"},{"location":"#operator_4","title":"^ operator","text":"XOR bit by bit
"},{"location":"#bit-vector-structure","title":"Bit vector structure","text":"Vector (linear sequence of numeric values stored contiguously in memory) in which each element is a bit (so either 0 or 1)
"},{"location":"#check-exactly-one-bit-is-set","title":"Check exactly one bit is set","text":"boolean checkExactlyOneBitSet(int num) {\n return num != 0 && (num & (num - 1)) == 0;\n}\n
"},{"location":"#clear-bits-from-i-to-0","title":"Clear bits from i to 0","text":"int clearBitsFromITo0(int num, int i) {\n int mask = (-1 << (i + 1));\n return num & mask;\n}\n
"},{"location":"#clear-bits-from-most-significant-one-to-i","title":"Clear bits from most significant one to i","text":"int clearBitsFromMsbToI(int num, int i) {\n int mask = (1 << i) - 1;\n return num & mask;\n}\n
"},{"location":"#clear-ith-bit","title":"Clear ith bit","text":"int clearBit(final int num, final int i) {\n final int mask = ~(1 << i);\n return num & mask;\n}\n
"},{"location":"#flip-ith-bit","title":"Flip ith bit","text":"int flipBit(final int num, final int i) {\n return num ^ (1 << i);\n}\n
"},{"location":"#get-ith-bit","title":"Get ith bit","text":"boolean getBit(final int num, final int i) {\n return ((num & (1 << i)) != 0);\n}\n
"},{"location":"#how-to-flip-one-bit","title":"How to flip one bit","text":"b ^ 1
"},{"location":"#how-to-represent-signed-integers","title":"How to represent signed integers","text":"Use the most significative bit to represent the sign. Yet, it is not enough (problem with this technique: 5 + (-5) != 0)
Two's complement technique: take the one complement and add one
-3: 1101
-2: 1110
-1: 1111
0: 0000
1: 0001
2: 0010
3: 0011
The most significant bit still represents the sign
Max integer value: 1...1 (31 bits)
-1: 1...1 (32 bits)
"},{"location":"#set-ith-bit","title":"Set ith bit","text":"int setBit(final int num, final int i) {\n return num | (1 << i);\n}\n
"},{"location":"#update-a-bit-from-a-given-value","title":"Update a bit from a given value","text":"int updateBit(int num, int i, boolean bit) {\n int value = bit ? 1 : 0;\n int mask = ~(1 << i);\n return (num & mask) | (value << i);\n}\n
"},{"location":"#x-0s","title":"x & 0s","text":"0
"},{"location":"#x-1s","title":"x & 1s","text":"x
"},{"location":"#x-x","title":"x & x","text":"x
"},{"location":"#x-0s_1","title":"x ^ 0s","text":"x
"},{"location":"#x-1s_1","title":"x ^ 1s","text":"~x
"},{"location":"#x-x_1","title":"x ^ x","text":"0
"},{"location":"#x-0s_2","title":"x | 0s","text":"x
"},{"location":"#x-1s_2","title":"x | 1s","text":"1s
"},{"location":"#x-x_2","title":"x | x","text":"x
"},{"location":"#xor-operations","title":"XOR operations","text":"0 ^ 0 = 0
1 ^ 0 = 1
0 ^ 1 = 1
1 ^ 1 = 0
n XOR 0 => keep
n XOR 1 => flip
"},{"location":"#operator_5","title":"| operator","text":"OR bit by bit
"},{"location":"#operator_6","title":"~ operator","text":"Complement bit by bit
"},{"location":"#complexity","title":"Complexity","text":"Big-O Cheat Sheet
"},{"location":"#01-knapsack-brute-force-complexity","title":"0/1 Knapsack brute force complexity","text":"Time complexity: O(2^n) with n the number of items
Space complexity: O(n)
"},{"location":"#01-knapsack-memoization-complexity","title":"0/1 Knapsack memoization complexity","text":"Time and space complexity: O(n * c) with n the number items and c the capacity
"},{"location":"#01-knapsack-tabulation-complexity","title":"0/1 Knapsack tabulation complexity","text":"Time and space complexity: O(n * c) with n the number of items and c the capacity
Space complexity could even be improved to O(2*c) = O(c) as we need to store only the last 2 rows (using row % 2):
int[][] dp = new int[2][c + 1];\n
"},{"location":"#amortized-complexity-definition","title":"Amortized complexity definition","text":"How much of a resource (time or memory) it takes to execute per operation on average
"},{"location":"#array-complexity-access-search-insert-delete_1","title":"Array complexity: access, search, insert, delete","text":"Access: O(1)
Search: O(n)
Insert: O(n)
Delete: O(n)
"},{"location":"#b-tree-complexity-access-insert-delete","title":"B-tree complexity: access, insert, delete","text":"All: O(log n)
"},{"location":"#bfs-and-dfs-graph-traversal-time-and-space-complexity","title":"BFS and DFS graph traversal time and space complexity","text":"Time: O(v + e) with v the number of vertices and e the number of edges
Space: O(v)
"},{"location":"#bfs-and-dfs-tree-traversal-time-and-space-complexity","title":"BFS and DFS tree traversal time and space complexity","text":"BFS: time O(v), space O(v)
DFS: time O(v), space O(h) (height of the tree)
"},{"location":"#big-o","title":"Big O","text":"Upper bound
"},{"location":"#big-omega","title":"Big Omega","text":"Lower bound (fastest)
"},{"location":"#big-theta","title":"Big Theta","text":"Theta(n) if both O(n) and Omega(n)
"},{"location":"#binary-heap-min-heap-or-max-heap-complexity-insert-get-min-max-delete-min-max","title":"Binary heap (min-heap or max-heap) complexity: insert, get min (max), delete min (max)","text":"Insert: O(log (n))
Get min (max): O(1)
Delete min: O(log n)
If not balanced O(n)
If balanced O(log n)
"},{"location":"#bst-delete-algo-and-complexity","title":"BST delete algo and complexity","text":"Find inorder successor and swap it
Average: O(log n)
Worst: O(h) if not self-balanced BST, otherwise O(log n)
"},{"location":"#bubble-sort-complexity-and-stability","title":"Bubble sort complexity and stability","text":"Time: O(n\u00b2)
Space: O(1)
Stable
"},{"location":"#complexity-of-a-function-making-multiple-recursive-subcalls","title":"Complexity of a function making multiple recursive subcalls","text":"Time: O(branches^depth) with branches the number of times each recursive call branches (english: 2 power 3)
Space: O(depth) to store the call stack
"},{"location":"#complexity-to-create-a-trie","title":"Complexity to create a trie","text":"Time and space: O(n * l) with n the number of words and l the longest word length
"},{"location":"#complexity-to-insert-a-key-in-a-trie","title":"Complexity to insert a key in a trie","text":"Time: O(k) with k the size of the key
Space: O(1) iterative, O(k) recursive
"},{"location":"#complexity-to-search-for-a-key-in-a-trie","title":"Complexity to search for a key in a trie","text":"Time: O(k) with k the size of the key
Space: O(1) iterative or O(k) recursive
"},{"location":"#counting-sort-complexity-stability-use-case","title":"Counting sort complexity, stability, use case","text":"Time complexity: O(n + k) // n is the number of elements, k is the range (the maximum element)
Space complexity: O(k)
Stable
Use case: known and small range of possible integers
"},{"location":"#doubly-linked-list-complexity-access-insert-delete","title":"Doubly linked list complexity: access, insert, delete","text":"Access: O(n)
Insert: O(1)
Delete: O(1)
"},{"location":"#hash-table-complexity-search-insert-delete","title":"Hash table complexity: search, insert, delete","text":"All: amortized O(1), worst O(n)
"},{"location":"#heapsort-complexity-stability-use-case","title":"Heapsort complexity, stability, use case","text":"Time: Theta(n log n)
Space: O(1)
Unstable
Use case: space constrained environment with O(n log n) time guarantee
Yet, not stable and not cache friendly
"},{"location":"#insertion-sort-complexity-stability-use-case","title":"Insertion sort complexity, stability, use case","text":"Time: O(n\u00b2)
Space: O(1)
Stable
Use case: partially sorted structure
"},{"location":"#linked-list-complexity-access-insert-delete","title":"Linked list complexity: access, insert, delete","text":"Access: O(n)
Insert: O(1)
Delete: O(1)
"},{"location":"#mergesort-complexity-stability-use-case","title":"Mergesort complexity, stability, use case","text":"Time: Theta(n log n)
Space: O(n)
Stable
Use case: good worst case time complexity and stable, good with linked list
"},{"location":"#quicksort-complexity-stability-use-case","title":"Quicksort complexity, stability, use case","text":"Time: best and average O(n log n), worst O(n\u00b2) if the array is already sorted in ascending or descending order
Space: O(log n) // In-place sorting algorithm
Not stable
Use case: in practice, quicksort is often faster than merge sort due to better locality (not applicable with linked list so in this case we prefer mergesort)
"},{"location":"#radix-sort-complexity-stability-use-case","title":"Radix sort complexity, stability, use case","text":"Time complexity: O(nk) // n is the number of elements, k is the maximum number of digits for a number
Space complexity: O(k)
Stable
Use case: if k < log(n) (for example 1M of elements from 0..1000 as 4 < log(1M))
"},{"location":"#recursivity-impacts-on-algorithm-complexity","title":"Recursivity impacts on algorithm complexity","text":"Space impact as each call is added to the call stack
Unless we use tail call recursion
"},{"location":"#red-black-tree-complexity-access-insert-delete","title":"Red-black tree complexity: access, insert, delete","text":"All: O(log n)
"},{"location":"#selection-sort-complexity","title":"Selection sort complexity","text":"Time: Theta(n\u00b2)
Space: O(1)
"},{"location":"#stack-implementations-and-insertdelete-complexity","title":"Stack implementations and insert/delete complexity","text":"Insert: O(1)
Delete: O(1)
Insert: O(n), amortized time O(1)
Delete: O(1)
"},{"location":"#time-complexity-to-build-a-binary-heap","title":"Time complexity to build a binary heap","text":"O(n)
Time and space: O(v + e)
"},{"location":"#dynamic-programming","title":"Dynamic Programming","text":""},{"location":"#dynamic-programming-concept","title":"Dynamic programming concept","text":"Break down a problem in smaller parts and store the results of these subproblems so that they only need to be computed once
A DP algorithm will search through all of the possible subproblems (main difference with greedy algorithms)
Based on either: - Memoization (top-down) - Tabulation (bottom-up)
"},{"location":"#memoization-vs-tabulation","title":"Memoization vs tabulation","text":"Optimization technique to cache previously computed results
Used by dynamic programming algorithms
Memoization: top-down (start with a large, complex problem and break it down into smaller sub-problems)
f(x) {\n if (mem[x] is undefined)\n mem[x] = f(x-1) + f(x-2)\n return mem[x]\n}\n
Tabulation: bottom-up (start with the smallest solution and then build up each solution until we arrive at the solution to the initial problem)
tabFib(n) {\n mem[0] = 0\n mem[1] = 1\n for i = 2...n\n mem[i] = mem[i-2] + mem[i-1]\n return mem[n]\n}\n
"},{"location":"#encoding","title":"Encoding","text":""},{"location":"#ascii-charset","title":"ASCII charset","text":"128 characters
"},{"location":"#difference-encodingcharset","title":"Difference encoding/charset","text":"Charset: set of characters to be used (e.g. ASCII 128 characters)
Encoding: translation of a list of characters in binary
Encoding is needed because we can't guarantee that 1 character = 1 byte for every charset
Example: UTF-8 encodes Unicode characters using from 1 byte (English) up to 4 bytes
"},{"location":"#unicode-charset","title":"Unicode charset","text":"Superset of ASCII with 2^21 characters
"},{"location":"#general","title":"General","text":""},{"location":"#before-finding-a-solution","title":"Before finding a solution","text":"1) Make sure to understand the problem by listing: - Inputs - Outputs (what do we search) - Constraints
2) Draw examples
"},{"location":"#comparator-implementation-to-order-two-integers","title":"Comparator implementation to order two integers","text":"Ordering, min-heap: (a, b) -> a - b
Reverse ordering, max-heap: (a, b) -> b - a
7 ways: 1. a and b do not overlap 2. a and b overlap, b ends after a 3. a completely overlaps b 4. a and b overlap, a ends after b 5. b completely overlaps a 6. a and b do not overlap 7. a and b are equal
"},{"location":"#different-ways-for-two-intervals-to-relate-to-each-other-if-ordered-by-start-then-end","title":"Different ways for two intervals to relate to each other if ordered by start then end","text":"2 different ways: - No overlap - Overlap // Merge intervals (start of the first interval, max of the two ends)
"},{"location":"#divide-and-conquer-algorithm-paradigm","title":"Divide and conquer algorithm paradigm","text":"Example with merge sort: 1. Split the array into two halves 2. Sort them (recursive call) 3. Merge the two halves
"},{"location":"#how-to-name-a-matrix-indexes","title":"How to name a matrix indexes","text":"Use m[row][col] instead of m[y][x]
"},{"location":"#if-stucked-on-a-problem","title":"If stucked on a problem","text":"Mutates an input
"},{"location":"#p-vs-np-problems","title":"P vs NP problems","text":"P (polynomial): set of problems that can be solved reasonably fast (example: multiplication, sorting, etc.)
Complexity is not exponential
NP (non-deterministic polynomial): set of problems where, given a solution, we can test if it is a correct one in a reasonable amount of time, but finding the solution is not fast (example: a 1M*1M sudoku grid, the travelling salesman problem, etc.)
NP-complete: hardest problems in the NP set
There are other sets of problems that are neither P nor NP as an answer is really hard to prove (example: best move in a chess game)
P = NP asks: does being able to quickly recognize correct answers mean there's also a quick way to find them?
"},{"location":"#solving-optimization-problems","title":"Solving optimization problems","text":"Preserve the original order of elements with equal key
"},{"location":"#what-do-to-after-having-designed-a-solution","title":"What do to after having designed a solution","text":"Testing on nominal cases then edge cases
Time and space complexity
"},{"location":"#graph","title":"Graph","text":""},{"location":"#a-algorithm","title":"A* algorithm","text":"Complete solution to find the shortest path to a target node
Algorithm: - Put initial state in a priority queue - While the priority queue is not empty: poll an element and insert all its neighbours - If the target is reached, update a min variable
Priority is computed using the evaluation function: f(n) = h + g where h is a heuristic (local cost to visit a node) and g is the cost so far (length of the path so far)
"},{"location":"#backedge-definition","title":"Backedge definition","text":"An edge from a node to itself or to an ancestor
"},{"location":"#best-first-search-algorithm","title":"Best-first search algorithm","text":"Greedy solution (non-complete) to find the shortest path to a target node
Algorithm: - Put initial state in a priority queue - While target not reached: poll an element and insert all its neighbours
Priority is computed using the evaluation function: f(n) = h where h is a heuristic (local cost to visit a node)
"},{"location":"#bfs-dfs-graph-traversal-use-cases","title":"BFS & DFS graph traversal use cases","text":"BFS: shortest path
DFS: does a path exist, does a cycle exist (memo: D for Does)
DFS stores a single path at a time and usually requires less memory than BFS (on average; the worst-case space complexity is the same)
"},{"location":"#bfs-and-dfs-graph-traversal-time-and-space-complexity_1","title":"BFS and DFS graph traversal time and space complexity","text":"Time: O(v + e) with v the number of vertices and e the number of edges
Space: O(v)
"},{"location":"#bidirectional-search","title":"Bidirectional search","text":"Run two simultaneous BFS, one from the source, one from the target
Once their searches collide, we found a path
If the branching factor of a tree is b and the distance to the target vertex is d, then the normal BFS/DFS searching time complexity would be O(b^d)
Here it is O(b^(d/2))
"},{"location":"#connected-graph-definition","title":"Connected graph definition","text":"If there is a path between every pair of vertices, the graph is called connected
Otherwise, the graph consists of multiple isolated subgraphs
"},{"location":"#difference-best-first-search-and-a-algorithms","title":"Difference Best-first search and A* algorithms","text":"Best-first search is a greedy solution: not complete // a solution can be not optimal
A*: complete
"},{"location":"#dijkstra-algorithm","title":"Dijkstra algorithm","text":"Input: graph, initial vertex
Output: for each vertex: shortest path and previous node // The previous node is the one we are coming from in the shortest path. To find the shortest path between two nodes, we need to iterate backwards. Example: A -> C => E, D, A
Algorithm: - Init the shortest distance to MAX except for the initial node - Init a priority queue where the comparator is on the total distance so far - Init a set to store all visited nodes - Add the initial vertex to the priority queue - While the queue is not empty: poll a vertex (mark it visited) and check the total distance to each neighbour (distance so far + edge distance); update the shortest and previous arrays if smaller. If the destination is unvisited, add it to the queue
void dijkstra(GraphAjdacencyMatrix graph, int initial) {\n Set<Integer> visited = new HashSet<>();\n\n int n = graph.vertex;\n int[] shortest = new int[n];\n int[] previous = new int[n];\n for (int i = 0; i < n; i++) {\n if (i != initial) {\n shortest[i] = Integer.MAX_VALUE;\n }\n }\n\n // Entry: key=vertex, value=distance so far\n PriorityQueue<Entry> minHeap = new PriorityQueue<>((e1, e2) -> e1.value - e2.value);\n minHeap.add(new Entry(initial, 0));\n\n while (!minHeap.isEmpty()) {\n Entry current = minHeap.poll();\n int source = current.key;\n int distanceSoFar = current.value;\n\n // Get neighbours\n List<GraphAjdacencyMatrix.Edge> edges = graph.getEdge(source);\n\n for (GraphAjdacencyMatrix.Edge edge : edges) {\n // For each neighbour, check the total distance\n int distance = distanceSoFar + edge.distance;\n if (distance < shortest[edge.destination]) {\n shortest[edge.destination] = distance;\n previous[edge.destination] = source;\n }\n\n // Add the element in the queue if not visited\n if (!visited.contains(edge.destination)) {\n minHeap.add(new Entry(edge.destination, distance));\n }\n }\n\n visited.add(source);\n }\n\n print(shortest);\n print(previous);\n}\n
"},{"location":"#dynamic-connectivity-problem","title":"Dynamic connectivity problem","text":"Given a set of nodes and edges: are two nodes connected (directly or in-directly)?
Two methods: - union(2, 5) // connect object 2 with object 5 - connected(1 , 6) // is object 1 connected to object 6?
"},{"location":"#further-reading_1","title":"Further Reading","text":"Array of integer of size N initialized with their index (0: 0, 1: 1 etc.).
If two indexes have the same value, they belong to the same group.
Init: integer array of size N
Interpretation: id[i] is parent of i, root parent if id[i] == i
Modify quick-union to avoid tall trees
Keep track of the size of each tree (number of nodes): extra array size[i] to count number of objects in the tree rooted at i
O(n) extra space
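A possible weighted quick-union implementation based on the notes above (a sketch, one variant among others):
class WeightedQuickUnion {\n private final int[] id; // id[i] is the parent of i, root if id[i] == i\n private final int[] size; // Number of objects in the tree rooted at i\n\n WeightedQuickUnion(int n) {\n id = new int[n];\n size = new int[n];\n for (int i = 0; i < n; i++) {\n id[i] = i;\n size[i] = 1;\n }\n }\n\n private int root(int i) {\n while (id[i] != i) {\n i = id[i];\n }\n return i;\n }\n\n public boolean connected(int p, int q) {\n return root(p) == root(q);\n }\n\n public void union(int p, int q) {\n int rootP = root(p), rootQ = root(q);\n if (rootP == rootQ) {\n return;\n }\n // Link the smaller tree below the bigger one to avoid tall trees\n if (size[rootP] < size[rootQ]) {\n id[rootP] = rootQ;\n size[rootQ] += size[rootP];\n } else {\n id[rootQ] = rootP;\n size[rootP] += size[rootQ];\n }\n }\n}\n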
Solution: topological sort
If there's a cycle in the relations, it means it is not possible to schedule all the tasks
There is a cycle if the produced sorted array size is different from n
"},{"location":"#graph-definition","title":"Graph definition","text":"A way to represent a network, or a collection of inteconnected objects
G = (V, E) with V a set of vertices (or nodes) and E a set of edges (or links)
"},{"location":"#graph-traversal-bfs","title":"Graph traversal: BFS","text":"Traverse broad into the graph by visiting the sibling/neighbor before children nodes (one level of children at a time)
Iterative using a queue
Algorithm: similar to the tree version except we need to mark the visited nodes; we can start from any node
Queue<Node> queue = new LinkedList<>();\nNode first = graph.nodes.get(0);\nqueue.add(first);\nfirst.markVisitied();\n\nwhile (!queue.isEmpty()) {\n Node node = queue.poll();\n System.out.println(node.name);\n\n for (Edge edge : node.connections) {\n if (!edge.end.visited) {\n queue.add(edge.end);\n edge.end.markVisited();\n }\n }\n}\n
"},{"location":"#graph-traversal-dfs","title":"Graph traversal: DFS","text":"Traverse deep into the graph by visiting the children before sibling/neighbor nodes (traverse down one single path)
Walk through a path, backtrack until we find a new path
Algorithm: recursive or iterative using a stack (same algo as BFS except we use a stack instead of a queue)
"},{"location":"#how-to-compute-the-shortest-path-between-two-nodes-in-an-unweighted-graph","title":"How to compute the shortest path between two nodes in an unweighted graph","text":"BFS traversal by using an array to keep track of the min distance distances[i] gives the shortest distance between the input node and the node of id i
Algorithm: no need to keep track of the visited node, it is replaced by a test on the distance array
Queue<Node> queue = new LinkedList<>();\nqueue.add(parent);\nint[] distances = new int[graph.nodes.size()];\nArrays.fill(distances, -1);\ndistances[parent.id] = 0;\n\nwhile (!queue.isEmpty()) {\n Node node = queue.poll();\n for (Edge edge : node.connections) {\n if (distances[edge.end.id] == -1) {\n queue.add(edge.end);\n distances[edge.end.id] = distances[node.id] + 1;\n }\n }\n}\n
"},{"location":"#how-to-detect-a-cycle-in-a-directed-graph","title":"How to detect a cycle in a directed graph","text":"Using DFS by marking the visited nodes, there is a cycle if a visited node is also part of the current stack
The stack can be managed as a boolean array
boolean isCyclic(DirectedGraph g) {\n boolean[] visited = new boolean[g.size()];\n boolean[] stack = new boolean[g.size()];\n\n for (int i = 0; i < g.size(); i++) {\n if (isCyclic(g, i, visited, stack)) {\n return true;\n }\n }\n return false;\n}\n\nboolean isCyclic(DirectedGraph g, int node, boolean[] visited, boolean[] stack) {\n if (stack[node]) {\n return true;\n }\n\n if (visited[node]) {\n return false;\n }\n\n stack[node] = true;\n visited[node] = true;\n\n List<DirectedGraph.Edge> edges = g.getEdges(node);\n for (DirectedGraph.Edge edge : edges) {\n int destination = edge.destination;\n if (isCyclic(g, destination, visited, stack)) {\n return true;\n }\n }\n\n // Backtrack\n stack[node] = false;\n\n return false;\n}\n
"},{"location":"#how-to-detect-a-cycle-in-an-undirected-graph","title":"How to detect a cycle in an undirected graph","text":"Using DFS
Idea: for every visited vertex v, if there is an adjacent u such that u is already visited and u is not the parent of v, then there is a cycle
public boolean isCyclic(UndirectedGraph g) {\n boolean[] visited = new boolean[g.size()];\n for (int i = 0; i < g.size(); i++) {\n if (!visited[i]) {\n if (isCyclic(g, i, visited, -1)) {\n return true;\n }\n }\n }\n return false;\n}\n\nprivate boolean isCyclic(UndirectedGraph g, int v, boolean[] visited, int parent) {\n visited[v] = true;\n\n List<UndirectedGraph.Edge> edges = g.getEdges(v);\n for (UndirectedGraph.Edge edge : edges) {\n if (!visited[edge.destination]) {\n if (isCyclic(g, edge.destination, visited, v)) {\n return true;\n }\n } else if (edge.destination != parent) {\n return true;\n }\n }\n return false;\n}\n
"},{"location":"#how-to-name-a-graph-with-directed-edges-and-without-cycle","title":"How to name a graph with directed edges and without cycle","text":"Directed Acyclic Graph (DAG)
"},{"location":"#how-to-name-a-graph-with-few-edges-and-with-many-edges","title":"How to name a graph with few edges and with many edges","text":"Sparse: few edges
Dense: many edges
"},{"location":"#how-to-name-the-number-of-edges","title":"How to name the number of edges","text":"Degree of a vertex
"},{"location":"#how-to-represent-the-edges-of-a-graph-structure-and-complexity","title":"How to represent the edges of a graph (structure and complexity)","text":"Using an adjacency matrix: two-dimensional array of boolean with a[i][j] is true if there is an edge between node i and j
Time complexity: O(1)
Problem: - If graph is undirected: half of the space is useless - If graph is sparse, we still have to consume O(v\u00b2) space
Using an adjacency list: array (or map) of linked list with a[i] represents the edges for the node i
Time complexity: O(d) with d the degree of a vertex
Time and space: O(v + e)
"},{"location":"#topological-sort-technique","title":"Topological sort technique","text":"If there is an edge from U to V, then U <= V
Possible only if the graph is a DAG
Algo: - Create a graph representation (adjacency list) and an in-degree counter (Map) - Zero them for each vertex - Fill the adjacency list and the in-degree counter for each edge - Add to a queue each vertex whose in-degree count is 0 (source vertex with no parent) - While the queue is not empty, poll a vertex, then decrement the in-degree of its children and add any child whose in-degree reaches 0
To check if there is a cycle, we must compare the size of the produced array to the number of vertices
List<Integer> sort(int vertices, int[][] edges) {\n if (vertices == 0) {\n return Collections.EMPTY_LIST;\n }\n\n List<Integer> sorted = new ArrayList<>(vertices);\n // Adjacency list graph\n Map<Integer, List<Integer>> graph = new HashMap<>();\n // Count of incoming edges for each vertex\n Map<Integer, Integer> inDegree = new HashMap<>();\n\n for (int i = 0; i < vertices; i++) {\n inDegree.put(i, 0);\n graph.put(i, new LinkedList<>());\n }\n\n // Init graph and inDegree\n for (int[] edge : edges) {\n int parent = edge[0];\n int child = edge[1];\n\n graph.get(parent).add(child);\n inDegree.put(child, inDegree.get(child) + 1);\n }\n\n // Create a source queue and add each source (a vertex whose inDegree count is 0)\n Queue<Integer> sources = new LinkedList<>();\n for (Map.Entry<Integer, Integer> entry : inDegree.entrySet()) {\n if (entry.getValue() == 0) {\n sources.add(entry.getKey());\n }\n }\n\n while (!sources.isEmpty()) {\n int vertex = sources.poll();\n sorted.add(vertex);\n\n // For each vertex, we will decrease the inDegree count of its children\n List<Integer> children = graph.get(vertex);\n for (int child : children) {\n inDegree.put(child, inDegree.get(child) - 1);\n if (inDegree.get(child) == 0) {\n sources.add(child);\n }\n }\n }\n\n // Topological sort is not possible as the graph has a cycle\n if (sorted.size() != vertices) {\n return new ArrayList<>();\n }\n\n return sorted;\n}\n
"},{"location":"#travelling-salesman-problem","title":"Travelling salesman problem","text":"Find the shortest possible route that visits every city (vertex) exactly once
Possible solutions: - Greedy: nearest neighbour - Dynamic programming: compute optimal solution for a path of length n by using information already known for partial tours of length n-1 (time complexity: n^2 * 2^n)
"},{"location":"#two-types-of-graphs","title":"Two types of graphs","text":"Directed graph (with directed edges)
Undirected graph (with undirected edges)
"},{"location":"#greedy","title":"Greedy","text":""},{"location":"#best-first-search-algorithm_1","title":"Best-first search algorithm","text":"Greedy solution (non-complete) to find the shortest path to a target node
Algorithm: - Put initial state in a priority queue - While target not reached: poll an element and inserts all neighbours
Priority is computed using the evaluation function: f(n) = h where h is an heuristic (local cost to visit a node)
"},{"location":"#greedy-algorithm","title":"Greedy algorithm","text":"Algorithm paradigm of making the locally optimal choice at each stage using a heuristic function
A locally optimal choice does not necessarily mean having no global context for taking a decision
Never reconsider a choice (main difference with dynamic programming)
Solution found may not be the most optimal one
"},{"location":"#greedy-algorithm-structure","title":"Greedy algorithm: structure","text":"Often, the global context is spread into a priority queue
"},{"location":"#greedy-technique","title":"Greedy technique","text":"Identify an optimal subproblem or substructure in the problem and determine how to reach it
Focus on what you have now (don't think about what comes next)
We may want to apply the traversal technique to have a global context for the identification part (a map of letters/positions etc.)
"},{"location":"#technique-optimization-problems-requiring-a-min-or-max","title":"Technique - Optimization problems requiring a min or max","text":"Greedy technique
"},{"location":"#hash-table","title":"Hash Table","text":""},{"location":"#hash-table-complexity-search-insert-delete_1","title":"Hash table complexity: search, insert, delete","text":"All: amortized O(1), worst O(n)
"},{"location":"#hash-table-implementation","title":"Hash table implementation","text":"Resize the array when a threshold is reached
In case of an extremely nonuniform distribution, the buckets (linked lists) could be replaced by BSTs
"},{"location":"#heap","title":"Heap","text":""},{"location":"#binary-heap-min-heap-or-max-heap-complexity-insert-get-min-max-delete-min-max_1","title":"Binary heap (min-heap or max-heap) complexity: insert, get min (max), delete min (max)","text":"Insert: O(log (n))
Get min (max): O(1)
Delete min: O(log n)
"},{"location":"#binary-heap-min-heap-or-max-heap-data-structure-used-for-the-implementation","title":"Binary heap (min-heap or max-heap) data structure used for the implementation","text":"Using an array
For a node at index i: - Left child: 2 * i + 1 - Right child: 2 * i + 2 - Parent: (i - 1) / 2
"},{"location":"#binary-heap-min-heap-or-max-heap-definition","title":"Binary heap (min-heap or max-heap) definition","text":"A binary heap is a a complete binary tree with min-heap or max-heap property ordering. Also called min heap or max heap.
Min heap: each node smaller than its children, min value element at the root.
Two operations: insert(), getMin()
Difference with a BST: in a BST, each smaller element is on the left and each greater element on the right; in a heap, a smaller element can be found on either the left or the right side.
"},{"location":"#binary-heap-min-heap-or-max-heap-delete-min","title":"Binary heap (min-heap or max-heap) delete min","text":"Replace min element (root) with the last node (left-most, lowest-level node because a binary heap is a complete binary tree)
If violations, swap with the smallest child (level by level)
"},{"location":"#binary-heap-min-heap-or-max-heap-insert-algorithm","title":"Binary heap (min-heap or max-heap) insert algorithm","text":"Insert node at the end (left-most spot because a binary heap is a complete binary tree)
If violations, swap with parents until no more violation
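A minimal array-based min-heap sketch of the delete min and insert operations described above (helper names are illustrative; swap is the usual array swap):
// Insert value at the end then swap with parents until no more violation\nvoid insert(int[] heap, int size, int value) {\n heap[size] = value;\n int i = size;\n while (i > 0 && heap[i] < heap[(i - 1) / 2]) {\n swap(heap, i, (i - 1) / 2);\n i = (i - 1) / 2;\n }\n}\n\n// Replace the root with the last node then sift it down\nint deleteMin(int[] heap, int size) {\n int min = heap[0];\n heap[0] = heap[size - 1];\n siftDown(heap, size - 1, 0);\n return min;\n}\n\nvoid siftDown(int[] heap, int size, int i) {\n while (true) {\n int smallest = i;\n int left = 2 * i + 1, right = 2 * i + 2;\n if (left < size && heap[left] < heap[smallest]) {\n smallest = left;\n }\n if (right < size && heap[right] < heap[smallest]) {\n smallest = right;\n }\n if (smallest == i) {\n return;\n }\n // Swap with the smallest child, level by level\n swap(heap, i, smallest);\n i = smallest;\n }\n}\n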
"},{"location":"#binary-heap-min-heap-or-max-heap-use-cases","title":"Binary heap (min-heap or max-heap) use-cases","text":"Priority queue
"},{"location":"#comparator-implementation-to-order-two-integers_1","title":"Comparator implementation to order two integers","text":"Ordering, min-heap: (a, b) -> a - b
Reverse ordering, max-heap: (a, b) -> b - a
"},{"location":"#convert-an-array-into-a-binary-heap-in-place","title":"Convert an array into a binary heap in place","text":"For i from 0 to n-1, swap recursively element a[i] until min/max heap violation on its node
"},{"location":"#find-the-median-of-a-stream-of-numbers-2-methods-insertint-and-int-findmedian","title":"Find the median of a stream of numbers, 2 methods insert(int) and int findMedian()","text":"Solution: two heap technique
Keep two heaps and maintain the balance by transferring an element from one heap to another if not balanced
Return the median (difference if even or odd)
// First half\nPriorityQueue<Integer> maxHeap = new PriorityQueue<>((a, b) -> b - a);\n// Second half\nPriorityQueue<Integer> minHeap = new PriorityQueue<>();\n\npublic void insertNum(int n) {\n // First element\n if (minHeap.isEmpty()) {\n minHeap.add(n);\n return;\n }\n\n // Insert into min or max heap\n Integer minSecondHalf = minHeap.peek();\n if (n >= minSecondHalf) {\n minHeap.add(n);\n } else {\n maxHeap.add(n);\n }\n\n // Is balanced?\n if (minHeap.size() > maxHeap.size() + 1) {\n maxHeap.add(minHeap.poll());\n } else if (maxHeap.size() > minHeap.size() + 1) {\n minHeap.add(maxHeap.poll());\n }\n}\n\npublic double findMedian() {\n // Even\n if (minHeap.size() == maxHeap.size()) {\n return (double) (minHeap.peek() + maxHeap.peek()) / 2;\n }\n\n // Odd\n if (minHeap.size() > maxHeap.size()) {\n return minHeap.peek();\n }\n return maxHeap.peek();\n}\n
"},{"location":"#given-an-unsorted-array-of-numbers-find-the-k-largest-numbers-in-it","title":"Given an unsorted array of numbers, find the K largest numbers in it","text":"Solution: using a min heap but we keep only K elements in it
public static List<Integer> findKLargestNumbers(int[] nums, int k) {\n PriorityQueue<Integer> minHeap = new PriorityQueue<>();\n\n // Put the first K numbers\n for (int i = 0; i < k; i++) {\n minHeap.add(nums[i]);\n }\n\n // Iterate on the rest of the array\n // Check whether the current element is bigger than the smallest one\n for (int i = k; i < nums.length; i++) {\n if (nums[i] > minHeap.peek()) {\n minHeap.poll();\n minHeap.add(nums[i]);\n }\n }\n\n return toList(minHeap);\n}\n\npublic static List<Integer> toList(PriorityQueue<Integer> minHeap) {\n List<Integer> list = new ArrayList<>(minHeap.size());\n while (!minHeap.isEmpty()) {\n list.add(minHeap.poll());\n }\n\n return list;\n}\n
Space complexity: O(k)
"},{"location":"#heapsort-algorithm","title":"Heapsort algorithm","text":"Stable
"},{"location":"#time-complexity-to-build-a-binary-heap_1","title":"Time complexity to build a binary heap","text":"O(n)
"},{"location":"#two-heaps-technique","title":"Two heaps technique","text":"Keep two heaps: - A max heap for the first half - Then a min heap for the second half
We may need to balance them so that their sizes differ by at most 1
"},{"location":"#why-binary-heap-over-bst-for-priority-queue","title":"Why binary heap over BST for priority queue?","text":"BST needs an extra pointer to the min or max value (otherwise finding the min or max is O(log n))
Implemented using an array: faster in practice (better locality, more cache friendly)
Building a binary heap is O(n), instead of O(n log n) for a BST
"},{"location":"#linked-list","title":"Linked List","text":""},{"location":"#algorithm-to-reverse-a-linked-list","title":"Algorithm to reverse a linked list","text":"public ListNode reverse(ListNode head) {\n ListNode previous = null;\n ListNode current = head;\n\n while (current != null) {\n // Keep temporary next node\n ListNode next = current.next;\n // Change link\n current.next = previous;\n // Move previous and current\n previous = current;\n current = next;\n }\n\n return previous;\n}\n
"},{"location":"#doubly-linked-list","title":"Doubly linked list","text":"Each node contains a pointer to the previous and the next node
"},{"location":"#doubly-linked-list-complexity-access-insert-delete_1","title":"Doubly linked list complexity: access, insert, delete","text":"Access: O(n)
Insert: O(1)
Delete: O(1)
"},{"location":"#get-the-middle-of-a-linked-list","title":"Get the middle of a linked list","text":"Using the runner technique
"},{"location":"#iterate-over-two-linked-lists","title":"Iterate over two linked lists","text":"while (l1 != null || l2 != null) {\n\n}\n
"},{"location":"#linked-list-complexity-access-insert-delete_1","title":"Linked list complexity: access, insert, delete","text":"Access: O(n)
Insert: O(1)
Delete: O(1)
"},{"location":"#linked-list-questions-prerequisite","title":"Linked list questions prerequisite","text":"Single or doubly linked list?
"},{"location":"#queue-implementations-and-insertdelete-complexity","title":"Queue implementations and insert/delete complexity","text":"Insert: O(1)
Delete: O(1)
Insert: O(1)
Delete: O(1)
"},{"location":"#ring-buffer-or-circular-buffer-structure","title":"Ring buffer (or circular buffer) structure","text":"Data structure using a single, fixed-sized buffer as if it were connected end-to-end
"},{"location":"#what-if-we-need-to-iterate-backwards-on-a-singly-linked-list-in-constant-space-without-mutating-the-input","title":"What if we need to iterate backwards on a singly linked list in constant space without mutating the input?","text":"Reverse the liked list (or a subpart only), implement the algo then reverse it again to the initial state
"},{"location":"#math","title":"Math","text":""},{"location":"#a-a-property","title":"a = a property","text":"Reflexive
"},{"location":"#if-a-b-and-b-c-then-a-c-property","title":"If a = b and b = c then a = c property","text":"Transitive
"},{"location":"#if-a-b-then-b-a-property","title":"If a = b then b = a property","text":"Symmetric
"},{"location":"#logarithm-definition","title":"Logarithm definition","text":"Inverse function to exponentiation
If odd: middle value
If even: average of the two middle values (1, 2, 3, 4 => (2 + 3) / 2 = 2.5)
"},{"location":"#n-choose-k-problems","title":"n-choose-k problems","text":"From a set of n items, choose k items with 0 <= k <= n
P(n, k)
Order matters: n! / (n - k)! // How many permutations
Order does not matter: n! / ((n - k)! k!) // How many combinations
"},{"location":"#probability-pa-b-inter","title":"Probability: P(a \u2229 b) // inter","text":"P(a \u2229 b) = P(a) * P(b)
"},{"location":"#probability-pa-b-union","title":"Probability: P(a \u222a b) // union","text":"P(a \u222a b) = P(a) + P(b) - P(a \u2229 b)
"},{"location":"#probability-pba-probability-of-a-knowing-b","title":"Probability: Pb(a) // probability of a knowing b","text":"Pb(a) = P(a \u2229 b) / P(b)
"},{"location":"#queue","title":"Queue","text":""},{"location":"#dequeue-data-structure","title":"Dequeue data structure","text":"Double ended queue for which elements can be added or removed from either the front (head) or the back (tail)
"},{"location":"#queue_1","title":"Queue","text":"FIFO (First In First Out)
"},{"location":"#queue-implementations-and-insertdelete-complexity_1","title":"Queue implementations and insert/delete complexity","text":"Insert: O(1)
Delete: O(1)
Insert: O(1)
Delete: O(1)
"},{"location":"#recursion","title":"Recursion","text":""},{"location":"#how-to-handle-a-recursive-function-that-need-to-return-a-list","title":"How to handle a recursive function that need to return a list","text":"Input: - Result List - Current iteration element
Output: void
void f(List<String> result, String current) {\n // Do something\n result.add(...);\n}\n
"},{"location":"#how-to-handle-a-recursive-function-that-need-to-return-a-maximum-value","title":"How to handle a recursive function that need to return a maximum value","text":"Implementation: return max(f(a), f(b))
"},{"location":"#loop-inside-of-a-recursive-function","title":"Loop inside of a recursive function?","text":"Might be a code smell. The iteration is already brought by the recursion itself.
"},{"location":"#sort","title":"Sort","text":""},{"location":"#bubble-sort-algorithm","title":"Bubble sort algorithm","text":"Walk through a collection and compares 2 elements at a time
If they are out of order, swap them
Continue until the entire collection is sorted
"},{"location":"#bubble-sort-complexity-and-stability_1","title":"Bubble sort complexity and stability","text":"Time: O(n\u00b2)
Space: O(1)
Stable
"},{"location":"#counting-sort-complexity-stability-use-case_1","title":"Counting sort complexity, stability, use case","text":"Time complexity: O(n + k) // n is the number of elements, k is the range (the maximum element)
Space complexity: O(k)
Stable
Use case: known and small range of possible integers
"},{"location":"#counting-sort-algorithm","title":"Counting sort algorithm","text":"If range r is known
1) Create an array of size r where each a[i] represents the number of occurrences of i
2) Modify the array to store the cumulative sum (if a=[1, 3, 0, 2] => [1, 4, 4, 6])
3) Right shift the array with a backward iteration (element at index 0 is 0 => [0, 1, 4, 4]). Now a[i] represents the first index of i if the array was sorted
4) Create the sorted array by filling the elements from their first index
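A possible implementation of these four steps (assuming non-negative integers and a known range r):
int[] countingSort(int[] a, int r) {\n // 1) Count the occurrences of each value\n int[] count = new int[r + 1];\n for (int n : a) {\n count[n]++;\n }\n\n // 2) Cumulative sum\n for (int i = 1; i <= r; i++) {\n count[i] += count[i - 1];\n }\n\n // 3) Right shift with a backward iteration: count[i] is the first index of i\n for (int i = r; i > 0; i--) {\n count[i] = count[i - 1];\n }\n count[0] = 0;\n\n // 4) Fill the sorted array from the first indexes (stable)\n int[] sorted = new int[a.length];\n for (int n : a) {\n sorted[count[n]++] = n;\n }\n return sorted;\n}\n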
"},{"location":"#heapsort-algorithm_1","title":"Heapsort algorithm","text":"Time: Theta(n log n)
Space: O(1)
Unstable
Use case: space constrained environment with O(n log n) time guarantee
Yet, not stable and not cache friendly
"},{"location":"#insertion-sort-algorithm","title":"Insertion sort algorithm","text":"From i to 0..n, insert a[i] to its correct position to the left (0..i)
Used by humans
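A minimal sketch:
void insertionSort(int[] a) {\n for (int i = 1; i < a.length; i++) {\n int current = a[i];\n int j = i - 1;\n // Shift the bigger elements to the right\n while (j >= 0 && a[j] > current) {\n a[j + 1] = a[j];\n j--;\n }\n // Insert a[i] at its correct position on the left\n a[j + 1] = current;\n }\n}\n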
"},{"location":"#insertion-sort-complexity-stability-use-case_1","title":"Insertion sort complexity, stability, use case","text":"Time: O(n\u00b2)
Space: O(1)
Stable
Use case: partially sorted structure
"},{"location":"#mergesort-algorithm","title":"Mergesort algorithm","text":"Splits a collection into 2 halves, sort the 2 halves (recursive call) then merge them together to form one sorted collection
void mergeSort(int[] a) {\n int[] helper = new int[a.length];\n mergeSort(a, helper, 0, a.length - 1);\n}\n\nvoid mergeSort(int a[], int helper[], int lo, int hi) {\n if (lo < hi) {\n int mid = (lo + hi) / 2;\n\n mergeSort(a, helper, lo, mid);\n mergeSort(a, helper, mid + 1, hi);\n merge(a, helper, lo, mid, hi);\n }\n}\n\nprivate void merge(int[] a, int[] helper, int lo, int mid, int hi) {\n // Copy into helper\n for (int i = lo; i <= hi; i++) {\n helper[i] = a[i];\n }\n\n int p1 = lo; // Pointer on the first half\n int p2 = mid + 1; // Pointer on the second half\n int index = lo; // Index of a\n\n // Copy the smallest values from either the left or the right side back to the original array\n while (p1 <= mid && p2 <= hi) {\n if (helper[p1] <= helper[p2]) {\n a[index] = helper[p1];\n p1++;\n } else {\n a[index] = helper[p2];\n p2++;\n }\n index++;\n }\n\n // Copy the eventual rest of the left side of the array into the target array\n while (p1 <= mid) {\n a[index] = helper[p1];\n index++;\n p1++;\n }\n}\n
"},{"location":"#further-reading_2","title":"Further Reading","text":"Time: Theta(n log n)
Space: O(n)
Stable
Use case: good worst case time complexity and stable, good with linked list
"},{"location":"#quicksort-algorithm","title":"Quicksort algorithm","text":"Sort a collection by repeatedly choosing a pivot and partitioning the collection around it (smaller before, larger after)
Here the pivot will be the last element of the subarray
In an ideal world, the pivot would be the middle element so that we partition the array in two subsets of equal size
The worst case is when the pivot is always the smallest or biggest element of the subarray (e.g. leftmost or rightmost index of a sorted subarray)
void quickSort(int[] a) {\n quickSort(a, 0, a.length - 1);\n}\n\nvoid quickSort(int a[], int lo, int hi) {\n if (lo < hi) {\n int pivot = partition(a, lo, hi);\n quickSort(a, lo, pivot - 1);\n quickSort(a, pivot + 1, hi);\n }\n}\n\n// Returns an index so that all element before that index are smaller\n// And all element after are bigger\nint partition(int a[], int lo, int hi) {\n int pivot = a[hi];\n int pivotIndex = lo; // Will represent the pivot index\n\n // Iterate using the two pointers technique\n for (int i = lo; i < hi; i++) {\n // If the current index is smaller, swap and increment pivot index\n if (a[i] <= pivot) {\n swap(a, pivotIndex++, i);\n }\n }\n\n swap(a, pivotIndex, hi);\n return pivotIndex;\n}\n
"},{"location":"#quicksort-complexity-stability-use-case_1","title":"Quicksort complexity, stability, use case","text":"Time: best and average O(n log n), worst O(n\u00b2) if the array is already sorted in ascending or descending order
Space: O(log n) // In-place sorting algorithm
Not stable
Use case: in practice, quicksort is often faster than merge sort due to better locality (not applicable with linked list so in this case we prefer mergesort)
"},{"location":"#radix-sort-algorithm","title":"Radix sort algorithm","text":"Sort by applying counting sort on one digit at a time (least to most significant) Each new level must be stable (if equals, keep the order of the previous level)
Example: sort by units digit, then tens, then hundreds (each pass stable); see the sketch below
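A hedged sketch using a stable counting sort pass per digit (base 10, non-negative integers assumed):
void radixSort(int[] a) {\n int max = 0;\n for (int n : a) {\n max = Math.max(max, n);\n }\n // One stable counting sort pass per digit, least significant first\n for (int exp = 1; max / exp > 0; exp *= 10) {\n countingSortByDigit(a, exp);\n }\n}\n\nvoid countingSortByDigit(int[] a, int exp) {\n int[] count = new int[10];\n for (int n : a) {\n count[(n / exp) % 10]++;\n }\n // Cumulative sum: count[d] is the index after the last element with digit d\n for (int i = 1; i < 10; i++) {\n count[i] += count[i - 1];\n }\n int[] sorted = new int[a.length];\n // Backward iteration keeps the pass stable\n for (int i = a.length - 1; i >= 0; i--) {\n sorted[--count[(a[i] / exp) % 10]] = a[i];\n }\n System.arraycopy(sorted, 0, a, 0, a.length);\n}\n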
Time complexity: O(nk) // n is the number of elements, k is the maximum number of digits for a number
Space complexity: O(k)
Stable
Use case: if k < log(n) (for example 1M of elements from 0..1000 as 4 < log(1M))
"},{"location":"#selection-sort-algorithm","title":"Selection sort algorithm","text":"From i to 0..n, find repeatedly the min element then swap it with i
"},{"location":"#selection-sort-complexity_1","title":"Selection sort complexity","text":"Time: Theta(n\u00b2)
Space: O(1)
"},{"location":"#shuffling-an-array","title":"Shuffling an array","text":"Fisher-Yates shuffle algorithm: - Iterate over each element (i) - Pick a random index (from 0 to i included) and swap with the current element
"},{"location":"#stack","title":"Stack","text":""},{"location":"#stack_1","title":"Stack","text":"LIFO (Last In First Out)
"},{"location":"#stack-implementations-and-insertdelete-complexity_1","title":"Stack implementations and insert/delete complexity","text":"Insert: O(1)
Delete: O(1)
Insert: O(n), amortized time O(1)
Delete: O(1)
"},{"location":"#string","title":"String","text":""},{"location":"#first-check-to-test-if-two-strings-are-a-permutation-or-a-rotation-of-each-other","title":"First check to test if two strings are a permutation or a rotation of each other","text":"Same length
"},{"location":"#how-to-print-all-the-possible-permutations-of-a-string","title":"How to print all the possible permutations of a string","text":"Recursion with backtracking
void permute(String s) {\n permute(s, 0);\n}\n\nvoid permute(String s, int index) {\n if (index == s.length() - 1) {\n System.out.println(s);\n return;\n }\n\n for (int i = index; i < s.length(); i++) {\n s = swap(s, index, i);\n permute(s, index + 1);\n s = swap(s, index, i);\n }\n}\n
"},{"location":"#rabin-karp-substring-search","title":"Rabin-Karp substring search","text":"Searching a substring s in a string b takes O(s(b-s)) time
Trick: compute the hash of each substring s
Sliding window of size s
Time complexity: O(b)
If the hash matches, check if the strings are equal (as two different strings can have the same hash)
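A simplified rolling hash sketch (plain polynomial hash kept in a long; real implementations usually apply a modulus by a large prime to avoid overflow):
int rabinKarp(String text, String pattern) {\n int s = pattern.length(), b = text.length();\n if (s > b) {\n return -1;\n }\n\n final int BASE = 256;\n // BASE^(s-1), used to remove the leading character from the window hash\n long high = 1;\n for (int i = 0; i < s - 1; i++) {\n high *= BASE;\n }\n\n long patternHash = 0, windowHash = 0;\n for (int i = 0; i < s; i++) {\n patternHash = patternHash * BASE + pattern.charAt(i);\n windowHash = windowHash * BASE + text.charAt(i);\n }\n\n for (int i = 0; i <= b - s; i++) {\n // If the hash matches, check the strings are equal (collisions are possible)\n if (windowHash == patternHash && text.regionMatches(i, pattern, 0, s)) {\n return i;\n }\n // Slide the window of size s: remove the leading char, add the next one\n if (i < b - s) {\n windowHash = (windowHash - text.charAt(i) * high) * BASE + text.charAt(i + s);\n }\n }\n return -1;\n}\n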
"},{"location":"#string-permutation-vs-rotation","title":"String permutation vs rotation","text":"Permutation: contains the same characters in an order that can be different (abdc and dabc)
Rotation: rotates according to a pivot
"},{"location":"#string-questions-prerequisite","title":"String questions prerequisite","text":"Case sensitive?
Encoding?
"},{"location":"#technique","title":"Technique","text":"14 Patterns to Ace Any Coding Interview Question by Fahim ul Haq
"},{"location":"#01-knapsack-brute-force-technique","title":"0/1 Knapsack brute force technique","text":"Recursive approach: solve f(c, i) with c is the remaining capacity and i is th current item index At each level, we branch with the item at index i (if enough capacity) and without it
public int knapsack(int[] profits, int[] weights, int c) {\n return knapsack(profits, weights, c, 0, 0);\n}\n\npublic int knapsack(int[] profits, int[] weights, int c, int i, int sum) {\n if (i == profits.length || c <= 0) {\n return sum;\n }\n\n // Not\n int sum1 = knapsack(profits, weights, c, i + 1, sum);\n\n // With\n int sum2 = 0;\n if (weights[i] <= c) {\n sum2 = knapsack(profits, weights, c - weights[i], i + 1, sum + profits[i]);\n }\n\n return Math.max(sum1, sum2);\n}\n
"},{"location":"#01-knapsack-memoization-technique","title":"0/1 Knapsack memoization technique","text":"Memoization: store a[c][i] (c is the remaining capacity, i is the current item index)
As we need to store the 0 capacity, we have to init the array this way:
int[][] a = new int[c + 1][n] // n is the number of items
Time and space complexity: O(n * c)
public int knapsack(int[] profits, int[] weights, int capacity) {\n // Indexed from 0 to capacity included to store the 0 capacity\n Integer[][] a = new Integer[capacity + 1][profits.length];\n return knapsack(profits, weights, capacity, 0, a);\n}\n\npublic int knapsack(int[] profits, int[] weights, int capacity, int i, Integer[][] a) {\n if (i == profits.length || capacity == 0) {\n return 0;\n }\n\n // If value already exists, return\n if (a[capacity][i] != null) {\n return a[capacity][i];\n }\n\n // Without the current item\n int sum1 = knapsack(profits, weights, capacity, i + 1, a);\n // With the current item\n int sum2 = 0;\n if (weights[i] <= capacity) {\n sum2 = profits[i] + knapsack(profits, weights, capacity - weights[i], i + 1, a);\n }\n\n a[capacity][i] = Math.max(sum1, sum2);\n return a[capacity][i];\n}\n
"},{"location":"#01-knapsack-tabulation-technique","title":"0/1 Knapsack tabulation technique","text":"Two dimensional array: a[n + 1][c + 1] // n the number of items and c the max capacity
First row and first column are set to 0
a[row][col] represent the max profit with items 1..row at capacity col
remainingWeight = col - itemWeight // col: current max capacity
a[row][col] = max(a[row - 1][col], itemValue + a[row - 1][remainingWeight]) // max between item not selected and item selected + max remaining weight
If remainingWeight < 0, we can't chose the item so a[row][col] = a[row - 1][col]
Return last element of the array
public int solveKnapsack(int[] profits, int[] weights, int capacity) {\n int[][] a = new int[profits.length + 1][capacity + 1];\n\n for (int row = 1; row < profits.length + 1; row++) {\n int value = profits[row - 1];\n int weight = weights[row - 1];\n for (int col = 1; col < capacity + 1; col++) {\n int remainingWeight = col - weight;\n if (remainingWeight < 0) {\n a[row][col] = a[row - 1][col];\n } else {\n a[row][col] = Math.max(\n a[row - 1][col],\n value + a[row - 1][remainingWeight]\n );\n }\n }\n }\n\n return a[profits.length][capacity];\n}\n
If we need to compute a result like \"determine if a subset exists\" that return a boolean, the array type is boolean[][]
As we are only interested in the previous row, we can also use an int[2][c + 1] array
"},{"location":"#backtracking-technique","title":"Backtracking technique","text":"Solution for solving a problem recursively
Loop: - apply() // Apply a change - try() // Try a solution - reverse() // Reverse apply
"},{"location":"#cyclic-sort-technique","title":"Cyclic sort technique","text":"Iterate over each number of an array and swap it to its correct position
At the end, we may iterate on the array to check which number is not at its correct position
If numbers are not within the 1 to n range, we can simply drop them
Alternative: marker technique (mark a result by setting a[i] to negative for example)
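A minimal sketch for numbers within the 1 to n range (swap is the usual array swap):
void cyclicSort(int[] a) {\n int i = 0;\n while (i < a.length) {\n // Correct position of value a[i] is index a[i] - 1\n int correct = a[i] - 1;\n if (a[i] != a[correct]) {\n swap(a, i, correct);\n } else {\n i++;\n }\n }\n}\n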
"},{"location":"#greedy-technique_1","title":"Greedy technique","text":"Identify an optimal subproblem or substructure in the problem and determine how to reach it
Focus on what you have now (don't think about what comes next)
We may want to apply the traversal technique to have a global context for the identification part (a map of letters/positions etc.)
"},{"location":"#k-way-merge-technique","title":"K-way merge technique","text":"Given K sorted array, technique to perform a sorted traversal of all the elements of all arrays
We need to keep track of which structure the min element comes from (tracking the array index or taking the next node if it's a linked list)
"},{"location":"#runner-technique","title":"Runner technique","text":"Iterate over the linked list with two pointers simultaneously either with: - One ahead by a fixed amount - One faster
This technique can also be applied on other problems where we need to find a cycle (f(slow) and f(f(fast)) may converge)
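A sketch of cycle detection with two runners (same ListNode as above):
boolean hasCycle(ListNode head) {\n ListNode slow = head;\n ListNode fast = head;\n while (fast != null && fast.next != null) {\n slow = slow.next; // One step\n fast = fast.next.next; // Two steps\n // If there is a cycle, the two runners converge\n if (slow == fast) {\n return true;\n }\n }\n return false;\n}\n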
"},{"location":"#simplification-technique","title":"Simplification technique","text":"Simplify the problem. If solvable, generalize to the initial problem.
Example: sort the array first
"},{"location":"#sliding-window-technique","title":"Sliding window technique","text":"Range of elements in a specific window size
Two pointers left and right: - Move right while condition is valid - Move left if condition is not valid
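A hedged example: smallest subarray with a sum of at least target (assuming positive numbers; grow the window on the right, shrink it on the left):
int smallestSubarrayWithSum(int[] a, int target) {\n int windowSum = 0, left = 0, min = Integer.MAX_VALUE;\n for (int right = 0; right < a.length; right++) {\n // Grow the window on the right\n windowSum += a[right];\n // Shrink the window on the left while the condition is met\n while (windowSum >= target) {\n min = Math.min(min, right - left + 1);\n windowSum -= a[left++];\n }\n }\n return min == Integer.MAX_VALUE ? 0 : min;\n}\n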
"},{"location":"#subsets-technique","title":"Subsets technique","text":"Technique to find all the possible permutations or combinations
Start with an empty set, for each element of the input, add them to all the existing subsets to create new subsets
Example: - Given [1, 5, 3] - => [] // Start - => [], [1] - => [], [1], [5], [1,5] - => [], [1], [5], [1,5], [3], [1,3], [5,3], [1,5,3]
For each level, we iterate from 0 to size // size is the fixed size of the list
List<List<Integer>> findSubsets(int[] a) {\n List<List<Integer>> subsets = new ArrayList<>();\n // Add subset []\n subsets.add(new ArrayList<>());\n\n for (int n : a) {\n // Fix the current size\n int size = subsets.size();\n for (int i = 0; i < size; i++) {\n // Copy subset\n ArrayList<Integer> newSubset = new ArrayList<>(subsets.get(i));\n // Add element\n newSubset.add(n);\n subsets.add(newSubset);\n }\n }\n\n return subsets;\n}\n
"},{"location":"#technique-dealing-with-cycles-in-a-linked-list-or-an-array","title":"Technique - Dealing with cycles in a linked list or an array","text":"Runner technique
"},{"location":"#technique-find-all-the-permutations-or-combinations","title":"Technique - Find all the permutations or combinations","text":"Subsets technique or recursion + backtracking
"},{"location":"#technique-find-an-element-in-a-sorted-array-or-linked-list","title":"Technique - Find an element in a sorted array or linked list","text":"Binary search
"},{"location":"#technique-find-or-calculate-something-among-all-the-contiguous-subarrays-of-a-given-size","title":"Technique - Find or calculate something among all the contiguous subarrays of a given size","text":"Sliding window technique
Example: - Given an array, find the average of all subarrays of size \u2018K\u2019 in it
"},{"location":"#technique-find-the-longestshortest-substring-or-subarray","title":"Technique - Find the longest/shortest substring or subarray","text":"Sliding window technique
Example: - Longest substring with K distinct characters - Longest substring without repeating characters
"},{"location":"#technique-find-the-smallestlargestmedian-element-of-a-set","title":"Technique - Find the smallest/largest/median element of a set","text":"Two heaps technique
"},{"location":"#technique-finding-a-certain-element-in-a-linked-list-eg-middle","title":"Technique - Finding a certain element in a linked list (e.g. middle)","text":"Runner technique
"},{"location":"#technique-given-a-sorted-array-find-a-set-of-elements-that-fullfill-certain-conditions","title":"Technique - Given a sorted array, find a set of elements that fullfill certain conditions","text":"Two pointers technique
Example: - Given a sorted array and a target sum, find a pair in the array whose sum is equal to the given target - Given an array of unsorted numbers, find all unique triplets in it that add up to zero - Comparing strings containing backspaces
"},{"location":"#technique-given-an-array-of-size-n-containing-integer-from-1-to-n-eg-with-one-duplicate","title":"Technique - Given an array of size n containing integer from 1 to n (e.g. with one duplicate)","text":"Cyclic sort technique
"},{"location":"#technique-given-time-intervals","title":"Technique - Given time intervals","text":"Traversal technique
Iterate with two pointers, one over the starts, another one over the ends
Handle the element with the lowest value first and generate an event
Example: how many rooms for n meetings => meeting started, meeting started, meeting ended etc.
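A minimal sketch of the meeting rooms example, assuming meetings given as {start, end} pairs:
int minRooms(int[][] meetings) {\n int n = meetings.length;\n int[] starts = new int[n];\n int[] ends = new int[n];\n for (int i = 0; i < n; i++) {\n starts[i] = meetings[i][0];\n ends[i] = meetings[i][1];\n }\n Arrays.sort(starts);\n Arrays.sort(ends);\n\n int rooms = 0, max = 0, s = 0, e = 0;\n while (s < n) {\n if (starts[s] < ends[e]) {\n // Meeting started event\n rooms++;\n s++;\n } else {\n // Meeting ended event\n rooms--;\n e++;\n }\n max = Math.max(max, rooms);\n }\n return max;\n}\n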
"},{"location":"#technique-how-to-get-the-k-biggestsmallestfrequent-elements","title":"Technique - How to get the K biggest/smallest/frequent elements","text":"Top K elements technique
"},{"location":"#technique-optimization-problems-requiring-a-min-or-max_1","title":"Technique - Optimization problems requiring a min or max","text":"Greedy technique
"},{"location":"#technique-problems-featuring-a-list-of-sorted-arrays-merge-or-find-the-smallest-element","title":"Technique - Problems featuring a list of sorted arrays (merge or find the smallest element)","text":"K-way merge technique
"},{"location":"#technique-scheduling-problem-with-n-tasks-where-each-task-can-have-constraints-to-be-completed-before-others","title":"Technique - Scheduling problem with n tasks where each task can have constraints to be completed before others","text":"Topological sort technique
"},{"location":"#technique-situations-like-priority-queue-or-scheduling","title":"Technique - Situations like priority queue or scheduling","text":"Heap data structure
Possibly two heaps technique
"},{"location":"#top-k-elements-technique-biggest-and-smallest","title":"Top K elements technique (biggest and smallest)","text":"Finding the K biggest elements: - Min heap - Add k elements - Then iterate over the remaining elements, if current > min => remove min, add current
Finding the k smallest elements: - Max heap - Add k elements - Then iterate over the remaining elements, if current < max => remove max, add current
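A minimal sketch for the K biggest elements (the K smallest version is symmetric with a max heap), assuming a.length >= k:
List<Integer> findKBiggest(int[] a, int k) {\n PriorityQueue<Integer> minHeap = new PriorityQueue<>();\n // Add the first k elements\n for (int i = 0; i < k; i++) {\n minHeap.add(a[i]);\n }\n // For the remaining elements: if current > min => remove min, add current\n for (int i = k; i < a.length; i++) {\n if (a[i] > minHeap.peek()) {\n minHeap.poll();\n minHeap.add(a[i]);\n }\n }\n return new ArrayList<>(minHeap);\n}\n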
"},{"location":"#topological-sort-technique_1","title":"Topological sort technique","text":"If there is an edge from U to V, then U <= V
Possible only if the graph is a DAG
Algo: - Create a graph representation (adjacency list) and an in degree counter (Map) - Zero them for each vertex - Fill the adjacency list and the in degree counter for each edge - Add in a queue each vertex whose in degree count is 0 (source vertex with no parent) - While the queue is not empty, poll a vertex from it then decrement the in degree of its children (no removal)
To check if there is a cycle, we must compare the size of the produced array to the number of vertices
List<Integer> sort(int vertices, int[][] edges) {\n if (vertices == 0) {\n return Collections.EMPTY_LIST;\n }\n\n List<Integer> sorted = new ArrayList<>(vertices);\n // Adjacency list graph\n Map<Integer, List<Integer>> graph = new HashMap<>();\n // Count of incoming edges for each vertex\n Map<Integer, Integer> inDegree = new HashMap<>();\n\n for (int i = 0; i < vertices; i++) {\n inDegree.put(i, 0);\n graph.put(i, new LinkedList<>());\n }\n\n // Init graph and inDegree\n for (int[] edge : edges) {\n int parent = edge[0];\n int child = edge[1];\n\n graph.get(parent).add(child);\n inDegree.put(child, inDegree.get(child) + 1);\n }\n\n // Create a source queue and add each source (a vertex whose inDegree count is 0)\n Queue<Integer> sources = new LinkedList<>();\n for (Map.Entry<Integer, Integer> entry : inDegree.entrySet()) {\n if (entry.getValue() == 0) {\n sources.add(entry.getKey());\n }\n }\n\n while (!sources.isEmpty()) {\n int vertex = sources.poll();\n sorted.add(vertex);\n\n // For each vertex, we will decrease the inDegree count of its children\n List<Integer> children = graph.get(vertex);\n for (int child : children) {\n inDegree.put(child, inDegree.get(child) - 1);\n if (inDegree.get(child) == 0) {\n sources.add(child);\n }\n }\n }\n\n // Topological sort is not possible as the graph has a cycle\n if (sorted.size() != vertices) {\n return new ArrayList<>();\n }\n\n return sorted;\n}\n
"},{"location":"#traversal-technique","title":"Traversal technique","text":"Traverse the input and generate another data structure or optional events
Start the problem from this new state
"},{"location":"#two-heaps-technique_1","title":"Two heaps technique","text":"Keep two heaps: - A max heap for the first half - Then a min heap for the second half
May be required to balance them to have at most a difference in terms of size of 1
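A minimal sketch for a running median (MedianFinder is an illustrative name; median assumes at least one element was added):
class MedianFinder {\n private final PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Collections.reverseOrder());\n private final PriorityQueue<Integer> minHeap = new PriorityQueue<>();\n\n void add(int n) {\n if (maxHeap.isEmpty() || n <= maxHeap.peek()) {\n maxHeap.add(n);\n } else {\n minHeap.add(n);\n }\n // Rebalance so the size difference is at most 1\n if (maxHeap.size() > minHeap.size() + 1) {\n minHeap.add(maxHeap.poll());\n } else if (minHeap.size() > maxHeap.size() + 1) {\n maxHeap.add(minHeap.poll());\n }\n }\n\n double median() {\n if (maxHeap.size() == minHeap.size()) {\n return (maxHeap.peek() + minHeap.peek()) / 2.0;\n }\n return maxHeap.size() > minHeap.size() ? maxHeap.peek() : minHeap.peek();\n }\n}\n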
"},{"location":"#two-pointers-technique","title":"Two pointers technique","text":"Two pointers iterating through the data structure in tandem until one or both pointers hit a certain condition
Often useful when structure is sorted. If not sorted, we may want to sort it first.
Most of the times (not always): first pointer is at the start, the second pointer is at the end
The two pointers can also be on two different data structures, still iterating in tandem (e.g., comparing strings containing backspaces)
Time complexity is linear
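A minimal sketch on the classic pair-with-target-sum example over a sorted array:
int[] pairWithTargetSum(int[] a, int target) {\n int left = 0, right = a.length - 1;\n while (left < right) {\n int sum = a[left] + a[right];\n if (sum == target) {\n return new int[]{left, right};\n }\n if (sum < target) {\n // We need a bigger sum\n left++;\n } else {\n // We need a smaller sum\n right--;\n }\n }\n return new int[]{-1, -1};\n}\n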
"},{"location":"#what-if-we-need-to-iterate-backwards-on-a-singly-linked-list-in-constant-space-without-mutating-the-input_1","title":"What if we need to iterate backwards on a singly linked list in constant space without mutating the input?","text":"Reverse the liked list (or a subpart only), implement the algo then reverse it again to the initial state
"},{"location":"#tree","title":"Tree","text":""},{"location":"#2-3-tree","title":"2-3 tree","text":"Self-balanced BST => O(log n) complexity
Either: - 2-node: contains a single value and has two children - 3-node: contains two values and has three children - Leaf: 1 or 2 keys
Insert: find the proper leaf and insert the value in-place. If the leaf now holds 3 values (called a temporary 4-node), split it into two 2-nodes and move the middle value up into the parent.
"},{"location":"#avl-tree","title":"AVL tree","text":"If tree is not balanced, rearange the nodes with single or double rotations
"},{"location":"#b-tree-complexity-access-insert-delete_1","title":"B-tree complexity: access, insert, delete","text":"All: O(log n)
"},{"location":"#b-tree-definition-and-use-case","title":"B-tree: definition and use case","text":"Self-balanced BST => O(log n) complexity
Can have more than two children (generalization of 2-3 tree)
Use case: huge amounts of data that cannot fit in main memory and must be stored on disk.
Height is kept low to reduce the disk accesses.
Matches how disk pages work
"},{"location":"#balanced-binary-tree-definition","title":"Balanced binary tree definition","text":"The balance factor of each node (the difference between the two subtree heights) should never exceed 1
Guarantee of O(log n) search
"},{"location":"#balanced-bst-use-case-b-tree-red-black-tree-avl-tree","title":"Balanced BST use case: B-tree, Red-black tree, AVL tree","text":"BFS: time O(v), space O(v)
DFS: time O(v), space O(h) (height of the tree)
"},{"location":"#binary-tree-bfs-traversal","title":"Binary tree BFS traversal","text":"Level order traversal (level by level)
Iterative algorithm: use a queue, put the root, iterate while queue is not empty
Queue<Node> queue = new LinkedList<>();\nqueue.add(root);\n\nwhile(!queue.isEmpty()) {\n Node node = queue.poll();\n visit(node);\n\n if(node.left != null) {\n queue.add(node.left);\n }\n if(node.right != null) {\n queue.add(node.right);\n }\n}\n
"},{"location":"#binary-tree-definition","title":"Binary tree definition","text":"Tree with each node having up to two children
"},{"location":"#binary-tree-dfs-traversal-in-order-pre-order-and-post-order","title":"Binary tree DFS traversal: in-order, pre-order and post-order","text":"It's depth first so:
"},{"location":"#binary-tree-complete","title":"Binary tree: complete","text":"Every level of the tree is fully filled, with last level filled from the left to the right
"},{"location":"#binary-tree-full","title":"Binary tree: full","text":"Each node has 0 or 2 children
"},{"location":"#binary-tree-perfect","title":"Binary tree: perfect","text":"2^l - 1 nodes with l the level: 1, 3, 7, etc. nodes
Every level is fully filled
"},{"location":"#bst-complexity-access-insert-delete","title":"BST complexity: access, insert, delete","text":"If not balanced O(n)
If balanced O(log n)
"},{"location":"#bst-definition","title":"BST definition","text":"Binary tree in which every node must fit the property: all left descendents <= n < all right descendents
Implementation: optional key, value, left, right
"},{"location":"#bst-delete-algo-and-complexity_1","title":"BST delete algo and complexity","text":"Find inorder successor and swap it
Average: O(log n)
Worst: O(h) if not self-balanced BST, otherwise O(log n)
"},{"location":"#bst-insert-algo","title":"BST insert algo","text":"Search for key or value (by recursively going left or right depending on the comparison) then insert a new node or reset the value (no swap)
Complexity: worst O(n)
public TreeNode insert(TreeNode root, int a) {\n if (root == null) {\n return new TreeNode(a);\n }\n\n if (a < root.val) { // Left\n root.left = insert(root.left, a);\n } else { // Right\n root.right = insert(root.right, a);\n }\n\n return root;\n}\n
"},{"location":"#bst-questions-prerequisite","title":"BST questions prerequisite","text":"Is it a self-balanced BST? (impacts: O(log n) time complexity guarantee)
"},{"location":"#complexity-to-create-a-trie_1","title":"Complexity to create a trie","text":"Time and space: O(n * l) with n the number of words and l the longest word length
"},{"location":"#complexity-to-insert-a-key-in-a-trie_1","title":"Complexity to insert a key in a trie","text":"Time: O(k) with k the size of the key
Space: O(1) iterative, O(k) recursive
"},{"location":"#complexity-to-search-for-a-key-in-a-trie_1","title":"Complexity to search for a key in a trie","text":"Time: O(k) with k the size of the key
Space: O(1) iterative or O(k) recursive
"},{"location":"#given-a-binary-tree-algorithm-to-populate-an-array-to-represent-its-level-by-level-traversal","title":"Given a binary tree, algorithm to populate an array to represent its level-by-level traversal","text":"Solution: BFS by popping only a fixed number of elements (queue.size)
public static List<List<Integer>> traverse(TreeNode root) {\n List<List<Integer>> result = new LinkedList<>();\n Queue<TreeNode> queue = new LinkedList<>();\n queue.add(root);\n while (!queue.isEmpty()) {\n List<Integer> level = new ArrayList<>();\n\n int levelSize = queue.size();\n // Pop only levelSize elements\n for (int i = 0; i < levelSize; i++) {\n TreeNode current = queue.poll();\n level.add(current.val);\n if (current.left != null) {\n queue.add(current.left);\n }\n if (current.right != null) {\n queue.add(current.right);\n }\n }\n result.add(level);\n }\n return result;\n}\n
"},{"location":"#how-to-calculate-the-path-number-of-a-node-while-traversing-using-dfs","title":"How to calculate the path number of a node while traversing using DFS?","text":"Example: 1 -> 7 -> 3 gives 173
Solution: sum = sum * 10 + n
private int dfs(TreeNode node, int sum) {\n if (node == null) {\n return 0;\n }\n\n sum = 10 * sum + node.val;\n\n // Leaf: the path number is complete\n if (node.left == null && node.right == null) {\n return sum;\n }\n\n // Sum the path numbers of both subtrees\n return dfs(node.left, sum) + dfs(node.right, sum);\n}\n
"},{"location":"#min-or-max-value-in-a-bst","title":"Min (or max) value in a BST","text":"Move recursively on the left (on the right)
"},{"location":"#red-black-tree","title":"Red-Black tree","text":"Self-balanced BST => O(log n) complexity
Binary Trees: Red Black by David Pynes
"},{"location":"#red-black-tree-complexity-access-insert-delete_1","title":"Red-black tree complexity: access, insert, delete","text":"All: O(log n)
"},{"location":"#reverse-a-binary-tree-algo","title":"Reverse a binary tree algo","text":"public void reverse(Node node) {\n if (node == null) {\n return;\n }\n\n Node temp = node.right;\n node.right = node.left;\n node.left = temp;\n\n reverse(node.left);\n reverse(node.right);\n}\n
"},{"location":"#trie-definition-implementation-and-use-case","title":"Trie definition, implementation and use case","text":"Tree-like data structure with empty root and where each node store characters
Each path down the tree represent a word (until a null node that represents the end of the word)
Usually implemented using a map of children (or a fixed size array with ASCII charset for example)
Use case: dictionary (save memory)
Also known as prefix tree
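A minimal sketch using a map of children and an end-of-word flag (TrieNode and isWord are illustrative names):
class TrieNode {\n Map<Character, TrieNode> children = new HashMap<>();\n boolean isWord;\n}\n\nclass Trie {\n private final TrieNode root = new TrieNode();\n\n void insert(String word) {\n TrieNode node = root;\n for (char c : word.toCharArray()) {\n node = node.children.computeIfAbsent(c, k -> new TrieNode());\n }\n // Mark the end of the word\n node.isWord = true;\n }\n\n boolean search(String word) {\n TrieNode node = root;\n for (char c : word.toCharArray()) {\n node = node.children.get(c);\n if (node == null) {\n return false;\n }\n }\n return node.isWord;\n }\n}\n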
"},{"location":"#why-to-use-bst-over-hash-table","title":"Why to use BST over hash table","text":"Sorted keys
"},{"location":"anki/","title":"Anki","text":"Anki is a free software (Windows/Mac/Linux/iPhone/Android) designed to help remembering information. Anki relies on the concept of spaced repetition which is a proven technique to increase the rate of memorization. Here's a 2-minute video that delves into spaced repetition:
Michael A. Nielsen, \"Augmenting Long-term Memory\"
The single biggest change that Anki brings about is that it means memory is no longer a haphazard event, to be left to chance. Rather, it guarantees I will remember something, with minimal effort. That is, Anki makes memory a choice.
I used Anki myself with Algo Deck and Design Deck and it paid off. This method played a key role in helping me land a role as L5 SWE at Google (senior software engineer).
Here is a flashcard example:
The Anki versions (a clone of the flashcards from this repo) are available via one-time GitHub sponsorships:
Trusted by over 100 developers.
"},{"location":"designdeck/","title":"Design Deck","text":"AnkiCheck the Anki version here.
"},{"location":"designdeck/#cache","title":"Cache","text":""},{"location":"designdeck/#cache-aside","title":"Cache aside","text":"Application is responsible for reading and writing to the DB (using write-through or write-back policy)
The cache doesn't interact with the storage directly
"},{"location":"designdeck/#cache-aside-vs-read-through","title":"Cache aside vs. read-through","text":"Cache aside: - Data model can be different from DB
Read-through: - Same data model as DB - Can use the refresh-ahead pattern
"},{"location":"designdeck/#cache-eviction-policy","title":"Cache eviction policy","text":"Cache to automatically refresh any recently accessed entry prior to its expiration
Used with read-through cache
Main difference: consistency
Write through: 1. Write to the cache and the DB in a single DB transaction (may still lead to cache inconsistency if the DB commit failed) 2. Return
Write back: 1. Write to the cache 2. Return 3. Asynchronously store in DB
"},{"location":"designdeck/#four-main-distributed-cache-benefits","title":"Four main distributed cache benefits","text":"Cache hit ratio: hits / total accesses
"},{"location":"designdeck/#read-through-cache","title":"Read-through cache","text":"Read-through cache sits in-line with the DB
Single entry point
"},{"location":"designdeck/#when-to-use-a-cache","title":"When to use a cache","text":"Content Delivery Network
Network of geographically dispersed servers used to deliver static content (images, CSS, Javascript files, etc.)
Two kinds of CDNs: - Push CDN: we are responsible for providing the content - Pull CDN: CDN is responsible for pulling the right content (expiration to be used)
Pull is easier to handle whereas push gives us more flexibility
Use case for pull: Docker Hub S3 layer
"},{"location":"designdeck/#db","title":"DB","text":""},{"location":"designdeck/#3-main-reasons-to-partition-data","title":"3 main reasons to partition data","text":"Atomic: all transaction succeeds or none does (all or nothing)
Consistency: from one valid state to another (invariants must always be true)
Not necessarily a property of the DB (e.g., foreign key constraint), can be a property of the application (e.g., credits and debits must be balanced)
Different from consistency in eventual consistency (which is more about convergence as the matter is replicating data)
Refers to serializability
Optimization to favor latency over consistency when writing to a DB (e.g., leaderless replication)
Background process that constantly looks for differences in data
Could be used as an alternative or in conjunction with read repair
"},{"location":"designdeck/#byzantine-fault-tolerant","title":"Byzantine fault-tolerant","text":"A system is Byzantine fault-tolerant if it continues to operate correctly if in the case of a Bizantine's problem (some of the nodes malfunctioning, not obeying the protocol or malicious attackers).
"},{"location":"designdeck/#calm-theorem","title":"CALM theorem","text":"Consistency As Logical Monotonicity
A program has a consistent, coordination-free (e.g., consensus-free) distributed implementation if and only if it is monotonic
Consistency in this context doesn't mean linearizability. It focuses on the consistency of the program's output while traditional consistency focuses on the consistency of reads and writes.
In CALM, a consistent program is one that produces the same output no matter in which order the inputs are processed and despite any conflicts.
Said differently: does the implementation produce the outcome we expect despite any race conditions that may arise?
"},{"location":"designdeck/#cap-theorem","title":"CAP theorem","text":"Consistency, availability, partition tolerance (e.g., one node cut off from the rest of the cluster because of a network partition) => pick 2 out of 3
C refers to linearizability
"},{"location":"designdeck/#caveat-of-serializability","title":"Caveat of serializability","text":"It's possible that serial order is different from the order in which transactions were actually run (latest may not win)
If not, we need a stricter isolation level: strict serializability (serializability + linearizability)
"},{"location":"designdeck/#chain-replication","title":"Chain replication","text":"Replication protocol that uses a different topology than leader based replication protocols like Raft
Left-most process referred to as the chain's head, right-most as the chain's tail: - Clients send writes to the head, which updates its local state and forwards to the next process in the chain - Next process updates its local state and forwards to the next process in the chain - Etc. - Once the update is received by the tail, the ack flows back to the head which replies to the client that the write succeeded
Fault tolerance is delegated to a dedicated component: control plane - If the head fails: the control plane removes it and makes the next process the new head - If an intermediate node fails: the control plane removes it temporarily from the chain, and then adds it back eventually as the tail - If the tail fails: the control plane removes it and makes its predecessor the new tail
Benefits: - Strongly consistent protocol - Reads are served from the tail without contacting other replicas first which allows a lower response time
Drawbacks: - Writes are slower than quorum-based replication. - A single slow node can slow down all writes. - As reads are served from a single node, it can't be scaled horizontally. A mitigation is to allow intermediate nodes to serve reads but they can do it only if a read is considered as clean (the ack for this object has been returned to the predecessor). // The tail serves as the authority of the latest clean version
Notes: - To avoid the overhead of having a single node handling the writes, we can find a way to shard data and handle multiple chains (see https://engineering.fb.com/2022/05/04/data-infrastructure/delta/)
"},{"location":"designdeck/#chain-replication-vs-consensus","title":"Chain replication vs. consensus","text":"Similar consistency guarantees
Chain replication: - Optimized for reads for CP systems - Better read availability: a chain of n nodes can tolerate up to n-2 nodes failure
Example with 5 nodes: - Chain replication: tolerate up to 3 nodes failure - Consensus with R=3 and W=3: tolerate up to 2 nodes failure
Consensus: - Optimized for writes for CP systems
"},{"location":"designdeck/#change-data-capture-cdc","title":"Change data capture (CDC)","text":"A datastore is selected as the authoritative source of data where all update operations are performed
An event log is then created from this datastore that is consumed by all the remaining operations the same way as in event sourcing
"},{"location":"designdeck/#concurrency-control","title":"Concurrency control","text":"Ensures that correct results for concurrent operations are generated
Pessimistic: lock (mutual exclusion)
Optimistic: checks for conflicts at the end of a transaction
In the end, concurrency control serves the same purpose as atomicity
"},{"location":"designdeck/#consensus","title":"Consensus","text":"Set of processes agreeing on some data value in a fault-tolerant way
Satisfies safety and liveness
"},{"location":"designdeck/#consistency-models","title":"Consistency models","text":"Describe what expectations clients might have in terms of possible returned values despite the existence of multiple copies of data and concurrent access to it
Not the C in ACID but the C in CAP (converging to an end state)
Eventual consistency: all the nodes converge to the same state (not necessarily the latest)
Writes follow reads: ensures that writes are ordered after writes that were observed by previous read operations
Example: - P1 reads value => foo - P1 updates value to bar => Every node will converge to bar (a process can't read bar, then foo, regardless of the process) Also known as session causality
Monotonic reads consistency: a client doing several reads in sequence will never go backward in time
Monotonic writes consistency: values originating from the same client appear in the order the client has executed them
Read-after-write consistency: if a client performs a write, then this write is visible during subsequent reads
Also known as read-your-writes consistency
Causal consistency: operations that are causally related need to be seen in the same order by all the nodes
Sequential consistency: operations appear to take place in some total order, and that order is consistent with the order of operations from each individual client
Twitter example: no guarantee between which tweet is seen first between two friends posting at the same time, but the ordering is guaranteed for the same friend
Even though there may be multiple replicas, the application does not need to worry about them
C in CAP
Real time guarantees
"},{"location":"designdeck/#cqrs","title":"CQRS","text":"Command Query Responsibility Segregation
Dissociate writes (command) from reads (query)
Pros: - Allows creating stores per use case (e.g., analytics, geospatial) - Scale the read and write parts independently
Cons: - Eventual consistency between the stores
"},{"location":"designdeck/#crdt","title":"CRDT","text":"Conflict-free Replicated Data Types
Data structure that is replicated across nodes: - Replicas are updated independently, concurrently and without coordination - An algo (part of the data type) can perform a deterministic conflict resolution - Replicas are guaranteed to eventually converge to the same state => Strong eventual consistency
Used in the context of collaborative applications
Note: CRDTs can be combined to form new CRDTs
"},{"location":"designdeck/#crdt-and-collaborative-applications-eg-google-docs","title":"CRDT and collaborative applications (e.g., Google Docs)","text":"Compared to OT, each character has a stable identifier (even if characters are added or deleted)
Example: 0 is the beginning of the document, 1 is the end of the document, every character has a fractional number as an ID
May lead to interleaving problems (e.g., two words inserted by two users are interleaved: \"Alice\", \"Bob\" => \"BoAlibce\")
Interleaving depends on the merging algorithm used (e.g., Treedoc doesn't lead to interleaving)
"},{"location":"designdeck/#db-indexes-tradeoff","title":"DB indexes tradeoff","text":"Speed up read query but slow down writes
"},{"location":"designdeck/#db-internal-components","title":"DB internal components","text":"Optimized state-based CRDTs where only recently applied changes to a state are replicated instead of the full state
"},{"location":"designdeck/#denormalization","title":"Denormalization","text":"Introduce some amount of duplication in a normalized dataset in order to speed up reads (e.g., denormalized document, cache or index)
Cons: - Requires more space - May slow down writes
"},{"location":"designdeck/#design-consideration-when-partitioning-data","title":"Design consideration when partitioning data","text":"Should match the primary access pattern
"},{"location":"designdeck/#downside-of-distributed-transactions","title":"Downside of distributed transactions","text":"Performance penalty
Example: distributed transactions in MySQL are reported to be over 10 times slower than single-node transactions
"},{"location":"designdeck/#event-sourcing","title":"Event sourcing","text":"Ensures that all changes to application state are stored as a sequence of events
"},{"location":"designdeck/#eventual-consistency-requirements","title":"Eventual consistency requirements","text":"Splits up DB by function
"},{"location":"designdeck/#fencing-token","title":"Fencing token","text":"Monotonically increasing token that increments whenever a client acquires a distributed lock
Use case: when writing to a DB, if the provided token has a lower value than the current one, rejects the write
Solves possible issues with leases, as an update has to be made with the latest token
"},{"location":"designdeck/#gossip-protocol","title":"Gossip protocol","text":"Peer-to-peer protocol based on the way epidemics spread
No central registry and the only way to spread common data is to rely on each member to pass it along to their neighbors
Useful when broadcasting to a large number of processes like thousands or more, where a deterministic protocol wouldn't scale
"},{"location":"designdeck/#graph-db-main-use-case","title":"Graph DB main use case","text":"Relational can handle simple cases of many-to-many relationships
Yet, if the connections become more complex, it's more natural to start modeling data as a graph
"},{"location":"designdeck/#hinted-handoff","title":"Hinted handoff","text":"Optimization to favor latency over consistency when writing to a DB
If a coordinator node cannot contact the necessary number of replicas, it stores the result of the operation locally and forwards it to the failed node(s) after they have recovered
Used in sloppy quorums
"},{"location":"designdeck/#hot-spot-in-partitioning","title":"Hot spot in partitioning","text":"Partition is heavily loaded compared to others
Also called skew
"},{"location":"designdeck/#in-a-database-strategy-to-handle-rebalancing","title":"In a database, strategy to handle rebalancing","text":"Not based on key hashing as a rebalancing would be huge
Simple solution: Create many more partitions than nodes and assign several partitions to each node (e.g., a db running on a cluster of 10 nodes may be split into 10k partitions). When a node is added to the cluster, it will steal a few partitions from every existing node
"},{"location":"designdeck/#isolation-levels","title":"Isolation levels","text":"Degree to which transactions are isolated from other concurrent execution transactions
Isolations come at a performance cost (more coordination and synchronization)
Dirty writes (a transaction overwrites a value previously written by another transaction that is still in flight) => Can violate integrity constraints
Dirty reads (a transaction observes a write from a transaction that hasn't committed yet) => Decisions can be taken based on data updates that can be rolled back
Fuzzy reads: a transaction reads a value twice but sees a different value in each read because a committed transaction updated the value between the two reads
Lost updates: two transactions read the same value and then try to update it to two different values; only one update survives
Example: Two transactions read the current inventory size (say 100 items), add respectively 5 and 10 items and then store back the size. Depending on the execution order, the final size can be 110 instead of 115.
Read skew: an integrity constraint seems to be violated because a transaction can only see partial results of another transaction
Write skew: two transactions read the same objects, then each updates some of those objects
Example: Two on-call doctors for a shift. Both feeling unwell, and decide to request leave. They both click the button at the same time. In the case of a write skew, the two transactions can succeed as for both, when reading the number of available doctors, it was more than one.
Example: Transaction A computes the max and average age of employees. Transaction B is interleaved and inserts a lot of old employees. Thus, the average age could be larger than the max.
"},{"location":"designdeck/#known-crdts","title":"Known CRDTs","text":"Counter: - Grow-only counter: increment only - Positive-negative counter: increment and decrement (combination of two grow only counter: one positive, one negative)
Register (a memory cell storing whatever): - LWW-register: total order using timestamps - Multi-value register: keep track of causality, in case of conflicts it returns all conflicting cases (analogy: Git with an interactive merge resolution)
Set: - Grow-only set: once an element is added it can't be removed - Two-phase set: elements can be added and removed (combination of two grow only set) - LWW-element set (last-write-wins): similar to two-phase set but we associate a timestamp for each element to resolve conflicts - Observed-remove set: use tags instead of timestamps; each element is associated to a list of add-tags and a list of remove-tags (example: vector clocks) - Sequence: used to build collaborative applications (e.g., Treedoc)
"},{"location":"designdeck/#last-write-wins-lww","title":"Last-write-wins (LWW)","text":"Conflict resolution based on timestamp
Used by DynamoDB or Cassandra to resolve conflicts
Shouldn't happen in single-master replication
"},{"location":"designdeck/#leader-election","title":"Leader election","text":"Algorithm to guarantee at most one leader at any given time (safety) and that an election eventually completes (liveness)
"},{"location":"designdeck/#lsm-tree","title":"LSM tree","text":"Log-Structured Merge tree
Consists of smaller mutable memory-resident (memtable) and larger immutable disk-resident (SSTable) components
Memtables data are sorted and flushed on disk when their size reaches a configurable threshold or periodically
Because a memtable is just a special case of a buffer, durability is not guaranteed (durability must be brought by replication)
Examples: Lucene, Cassandra, Bitcask, etc.
"},{"location":"designdeck/#lsm-tree-vs-b-tree","title":"LSM tree vs. B-tree","text":"LSM-tree faster for writes, slower for reads because it has to check multiple data structures (bigger read amplification): memtable and SSTable
Compaction can impact ongoing requests
B-tree faster for reads, slower for writes as it must write every piece of data at least twice in the WAL & tree itself (bigger write amplification)
Each key exists in exactly one place => easier to offer strong transactional semantics
"},{"location":"designdeck/#main-difference-between-consistency-models-and-isolation-levels","title":"Main difference between consistency models and isolation levels","text":"Consistency models: applies to single-object operations
Isolation levels: applies to multi-object operations
"},{"location":"designdeck/#merkle-tree","title":"Merkle tree","text":"A tree in which every leaf is labelled with the hash of a data block: - Level n contains the data blocks - Level n-1 the hash of one data block - Level n-2 the hash of 2 data blocks - Level 1 the hash of all the data blocks
Efficient and secure verification of the contents of a large data structure
Allows reducing the data transferred between a client and a server. For example, if we want to compare a Merkle tree stored on a server with one stored on the client, they can both exchange their top hash. If different, we can delve in and only get the data blocks which have changed.
"},{"location":"designdeck/#monotonic-reads-consistency-implementation","title":"Monotonic reads consistency implementation","text":"One way to achieve it is to make sure each user always makes their reads from the same replica
"},{"location":"designdeck/#mvcc","title":"MVCC","text":"Multiversion Concurrency Control
A possible implementation of optimistic concurrency control and snapshot isolation level
MVCC allows reads and writes to proceed with minimal coordination on the storage level since reads can continue accessing older values until the new ones are committed
"},{"location":"designdeck/#n1-select-problem","title":"N+1 select problem","text":"Assuming a one-to-many relationship between 2 tables A and B => A 1-* B
If we want to iterate through all the A and for each one, print the list of B, the naive implementation would be: - select * from A
- And then for each A, select * from B where A_ID = ?
Alternatively, we could reduce the number of round-trips to the DB from N+1 to 2 with a simple select * from B
Most ORM tools prevent N+1 selects
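A hypothetical sketch of the two approaches (queryA, queryB, A and B are illustrative names, not a real API):
// Naive: N+1 round-trips\nfor (A a : queryA(\"select * from A\")) {\n // One query per row of A\n List<B> bs = queryB(\"select * from B where A_ID = ?\", a.id());\n}\n\n// Better: 2 round-trips, the join is done in memory\nList<A> as = queryA(\"select * from A\");\nMap<Integer, List<B>> bsByAId = queryB(\"select * from B\").stream()\n .collect(Collectors.groupingBy(B::aId));\n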
"},{"location":"designdeck/#nosql-main-types-and-main-architecture-principles","title":"NoSQL: main types and main architecture principles","text":"Key-value store, document store, column-oriented store or graph DB
Commutative replicated data types
Replication is made in propagating the update operation
Operations characteristics: - Must be commutative. - Not necessarily idempotent. If idempotent, OK. If not, it's up to the delivery layer to ensure the operations are delivered without duplication. - Delivered in causal order.
"},{"location":"designdeck/#operational-transformation-ot-concept-and-main-drawback","title":"Operational transformation (OT): concept and main drawback","text":"A way to handle collaborative applications
Receive update operations and depending on the operations that occur concurrently, transform them
Example: - Initial state: \"helo\" - Concurrently: user 1 inserts \"l\" at position 3 and user 2 inserts \"!\" at position 4 - If the transaction for user 1 completes before the one of user 2, we end up with \"hell!o\" instead of \"hello!\" - OT will transform the transaction from user 2 into: insert \"!\" at position 5
Drawback: all the communications go through a central server (e.g., impossible with systems at scale such as Google Docs)
Replaced with CRDT
"},{"location":"designdeck/#optimistic-concurrency-control-pros-and-cons","title":"Optimistic concurrency control: pros and cons","text":"Perform badly if high contention as it leads to a high proportion of retry, thus making performance worse
If not much contention, it tends to perform better than pessimistic
"},{"location":"designdeck/#pacelc-theorem","title":"PACELC theorem","text":"If case of a network partition (P): we should choose between availability (A) or consistency (C)
Else, in the absence of partition (E): we should choose between latency (L) or consistency (C)
Most systems are either: - AP/EL - CP/EC
"},{"location":"designdeck/#partitioning-sharding","title":"Partitioning (sharding)","text":"Split up a large dataset that is too big for a single machine into smaller parts and spread them across several machines
Define the partition type based on the primary access pattern
"},{"location":"designdeck/#partitioning-criteria","title":"Partitioning criteria","text":"Range partitioning: keys are sorted and a partition owns all the keys from some minimum up to some maximum (example: MySQL RANGE COLUMNS partitioning) - Pros: efficient range queries - Cons: Risk of hot spots, requires repartitioning to potentially split a range into two subranges if a partition gets too big
Hash partitioning: hash function is applied to each key and a partition owns a range of hashes
"},{"location":"designdeck/#partitioning-methods","title":"Partitioning methods","text":"Horizontal partitioning: partition by rows
Vertical partitioning: partition by columns (create tables with fewer columns)
Rationale: if the subtables have different access patterns (e.g., a column is a blob that we rarely consume, we can create a vertical partitioning to store this blob not on the primary disk)
Also called normalization
"},{"location":"designdeck/#quorum","title":"Quorum","text":"Minimum number of nodes that need to vote on an operation before it can be considered successful
Usually: majority
"},{"location":"designdeck/#raft","title":"Raft","text":"Leader election and replication algorithms
"},{"location":"designdeck/#leader-election_1","title":"Leader election","text":"Using a state machine to elect a leader
Each process is in one of these three states: leader, candidate (part of the election process), follower
"},{"location":"designdeck/#replication","title":"Replication","text":"The leader stores the sequence of operations altering the state into a local ordered log
Then, this log is replicated across followers. Each entry is considered committed when it has been replicated on a majority of nodes
Replication enables consensus
"},{"location":"designdeck/#read-repair","title":"Read repair","text":"Optimization to favor latency over consistency when writing to a DB (e.g., leaderless replication)
If a coordinator node receives conflicting values from the contacted replicas (which shouldn't happen with single-master replication, for example), it handles them by: - Resolving the conflict (e.g., LWW) - Forwarding the resolved value to the stale replica - Responding to the read request
"},{"location":"designdeck/#relation-between-replication-factor-write-consistency-and-read-consistency","title":"Relation between replication factor, write consistency and read consistency","text":"Given: - N: number of replicas - W: number of nodes that have to ack a write for it to succeed - R: number of nodes that have to respond to a read operation for it to succeed
If R+W > N, the system can guarantee to return the most recent written value because there's always an overlap between read and write sets (consistency)
Notes: - In case of read-heavy systems, we want to minimize R - If W = 1 and R = N, durability isn't guaranteed in the presence of failure - If W < (N+1)/2, it may lead to write conflicts (e.g., W < 2 if 3 nodes) - If R+W <= N, weak/eventual consistency
"},{"location":"designdeck/#replication-vs-partition-impacts","title":"Replication vs. partition: impacts","text":"Replication: - Read-heavy - Availability > consistency
Partition: - Write-heavy (splitting up data across different shards)
"},{"location":"designdeck/#schema-on-read-vs-schema-on-write","title":"Schema-on-read vs. schema-on-write","text":"Schema-on-read: implicit schema but not enforced by the DB (also called schemaless but misleading)
Schema-on-write: explicit schema, the DB ensures all writes are conforming to it (e.g., relational DB)
"},{"location":"designdeck/#serializability","title":"Serializability","text":"I in ACID (strong isolation level)
Equivalent to serial execution (no interleaving due to concurrent transactions)
"},{"location":"designdeck/#serializable-snapshot-isolation-ssi","title":"Serializable Snapshot Isolation (SSI)","text":"Snapshot Isolation (SI) allows write skew
SSI is a stricter isolation level than SI preventing write skew: check at runtime for conflicts between transactions
Downside: increases the number of aborted transactions
"},{"location":"designdeck/#single-leader-multi-leader-leaderless-replication","title":"Single-leader, multi-leader, leaderless replication","text":""},{"location":"designdeck/#single-leader","title":"Single-leader","text":"All writes go through one leader
Pro: ensures consistency
Con: all writes go through a single node (bottleneck)
"},{"location":"designdeck/#multi-leader","title":"Multi-leader","text":"Rarely makes sense within a single datacenter (benefits rarely outweigh the added complexity) but used in multi-datacenter contexts
DB must resolve the conflicts in a convergent way
Use cases: - One leader per datacenter
Different topologies:
Most used: all-to-all
Pro: not limited to the write throughput of a single node
Con: possible write conflicts
"},{"location":"designdeck/#leaderless-replication","title":"Leaderless replication","text":"Client sends its writes to several replicas in parallel
Read requests are also sent in parallel to multiple replicas (this way, if a write hasn't been replicated yet to one replica, it won't lead to stale data)
Rely on read repair and anti-entropy mechanisms
Rely on quorum to know how long to wait for a request (not perfect: if a write fails because we didn't reach a quorum, what shall we do about the replicas where the write has already been committed)
Examples: Cassandra, DynamoDB, Riak
Pro: throughput
Con: quorums are not perfect, provide illusion of strong consistency when in reality, it's often not true
"},{"location":"designdeck/#sloppy-quorum","title":"Sloppy quorum","text":"In case of a quorum of w nodes to accept a write: if we can't reach w, the DB accepts the write replicate it to nodes that aren't among the ones on which the value usually lives
Relies on hinted handoff
"},{"location":"designdeck/#snapshot-isolation-si","title":"Snapshot Isolation (SI)","text":"Guarantee that all reads made in a transaction will see a consistent snapshot of the database
In practice, it reads the last committed values that existed at the time it started
Allows write skew
"},{"location":"designdeck/#snapshot-isolation-common-implementation","title":"Snapshot Isolation common implementation","text":"MVCC
"},{"location":"designdeck/#sstable","title":"SSTable","text":"Sorted String Table, immutable components of a LSM tree
Sorted immutable data structure
It consists of 2 components: index files and data files
The index (based on a hashtable or a B-tree) holds the keys and the data entries (offsets in the data file where the actual records are located)
Data files hold records in key order
"},{"location":"designdeck/#state-based-crdts-definition-and-requirements","title":"State-based CRDTs: definition and requirements","text":"Convergent replicated data types
Replication is made in propagating the full local state to replicas
States are merged with a function which must be: - Commutative - Idempotent - Associative => Updates monotonically increase the internal state according to the defined partial order rules (e.g., max of two values, union of two sets)
=> Delivery layer doesn't have to guarantee causal ordering nor idempotency, only eventual delivery
"},{"location":"designdeck/#strong-eventual-consistency-definition-and-requirements","title":"Strong eventual consistency: definition and requirements","text":"Stronger guarantee than eventual consistency
Based on the fact that we can define a deterministic outcome for any conflict
Requires: - Eventual delivery: every update applied to a replica is eventually applied to all replicas - Strong convergence: guarantees that replicas that have executed the same updates have the same state (with eventual consistency, the guarantee is that the replicas eventually reach the same state, once consensus is reached)
Strong convergence requires convergent replicated data types (part of CRDT family)
Main difference with eventual consistency: - Leaderless replication - No consensus needed, instead, it relies on a deterministic outcome for any conflict
A solution to the CAP theorem
"},{"location":"designdeck/#three-phase-commit-3pc","title":"Three-phase commit (3PC)","text":"Failure-resilient refinement of 2PC
Unlike 2PC, satisfies liveness but not safety
"},{"location":"designdeck/#transaction","title":"Transaction","text":"A unit of work performed in a database system, representing a change, which can be potentially composed of multiple operations
"},{"location":"designdeck/#two-main-approaches-to-partition-a-table-that-has-secondary-indexes","title":"Two main approaches to partition a table that has secondary indexes","text":"Partitioning secondary indexes by document: - Each partition maintains its own secondary index - Write: one partition - Query on the index: requires querying multiple partitions (scatter/gather)
Optimized for writes
Example: Elasticsearch, MongoDB, Cassandra, Riak, etc.
Partitioning secondary indexes by term: - Global index covering all the partitions (to be replicated) - Write: multiple partitions are updated (for resiliency) - Query on the index: served from one partition containing the index
Optimized for reads
"},{"location":"designdeck/#two-types-of-crdts","title":"Two types of CRDTs","text":"Operation-based and state-based
Operation-based require less bandwidth
State-based require fewer assumptions about the delivery layer
"},{"location":"designdeck/#two-phase-commit-2pc","title":"Two-phase commit (2PC)","text":"Protocol used to implement atomic transaction commits across multiple processes
Satisfies safety but not liveness
"},{"location":"designdeck/#wal","title":"WAL","text":"Write-ahead log (or redo log)
Append-only file to which every modification must be written
Used for restoration in the event of a DB crash: - Durability - Atomicity (allows identifying the operations in progress and completing or undoing them)
"},{"location":"designdeck/#when-relational-vs-when-document","title":"When relational vs. when document","text":"Relational (schema-on-write): - Better support for joins - Many-to-one and many-to-many relationships - ACID
Document (schema-on-read): - Schema flexibility - Better performance due to locality - Closer to the data structures used by the application - In general not ACID - In general write-heavy
"},{"location":"designdeck/#when-to-use-a-column-oriented-store","title":"When to use a column-oriented store","text":"Because columns are stored contiguously: analytical workloads (computing average values, finding trends, etc.)
Flexible schema
Limited space (storing same data type together offers a better compression ratio)
"},{"location":"designdeck/#why-db-schemaless-is-misleading","title":"Why DB schemaless is misleading","text":"There is an implicit schema but not enforced by the DB
More accurate term: schema-on-read
Different from relational DBs with schema-on-write where the schema is explicit and the DB ensures all written data conforms to it
Similar to dynamic vs. static type checking in a programming language
"},{"location":"designdeck/#why-is-in-memory-faster","title":"Why is in-memory faster","text":"Not necessarily because they don't need to read from disk (even a disk-based storage engine may never need to read from disk if enough memory)
Can be faster because they avoid the overhead of encoding in a form that can be written to disk
"},{"location":"designdeck/#write-and-read-amplification","title":"Write and read amplification","text":"Ratio of the amount of data written/read to the disk versus the amount of data intended to be written
"},{"location":"designdeck/#write-heavy-and-replication-type","title":"Write heavy and replication type","text":"Do not rely on single-master replication as it heavily impacts the scaling of write-heavy systems
Instead, rely on leaderless replication
Trade off: consistency is harder to guarantee
"},{"location":"designdeck/#design","title":"Design","text":""},{"location":"designdeck/#auditing","title":"Auditing","text":"Checking the integrity of data
"},{"location":"designdeck/#backward-vs-forward-compatibility","title":"Backward vs. forward compatibility","text":""},{"location":"designdeck/#bloom-filter","title":"Bloom filter","text":"Probabilistic, memory-efficient data structure for approximating the content of a set
Can tell if a key does not appear in the DB
"},{"location":"designdeck/#causality","title":"Causality","text":"Causal dependency: one event causing another
Happened-before relationship
"},{"location":"designdeck/#concurrent-operations","title":"Concurrent operations","text":"Not only operations that happen at the same time but also operations made without knowing about each other
Example: - Concurrent to-do list operations with a current \"Buy milk\" item - User 1 deletes it - User 2 doesn't have an internet connection, modifies it into \"Buy soy milk\", and then is connected again => this modification may have been done one hour after user 1 deletion
"},{"location":"designdeck/#consistent-hashing","title":"Consistent hashing","text":"Special kind of hashing such that when a resize occurs, only 1/n percent of the keys need to be rebalanced (n: number of nodes)
Solutions: - Ring consistent hash with virtual nodes to improve the distribution - Jump consistent hash: faster but nodes must be numbered sequentially (e.g., if we have 3 servers foo, bar, and baz => we can't decide to remove bar)
"},{"location":"designdeck/#design-impacts-of-sharing","title":"Design impacts of sharing","text":"May decrease: - Availability - Performance - Scalability
"},{"location":"designdeck/#design-read-heavy-vs-write-heavy-impacts","title":"Design: read-heavy vs. write-heavy impacts","text":"Read heavy: - Leverage replication - Leverage denormalization
Write heavy: - Leverage partition (usually) - Leverage normalization
"},{"location":"designdeck/#different-types-of-message-failure","title":"Different types of message failure","text":"Event log: - Consumers are free to select the point of the log they want to consume messages from, which is not necessarily the head - Log is immutable, messages cannot be removed by consumers (removed by a GC running periodically)
"},{"location":"designdeck/#exactly-once-delivery","title":"Exactly-once delivery","text":"Impossible to achieve
However, we can achieve exactly-once processing using a dedup or by requiring the consumers to be idempotent
"},{"location":"designdeck/#flp-impossibility","title":"FLP impossibility","text":"In an asynchronous distributed system, there's no consensus algorithm that can satisfy: - Agreement - Validity - Termination - And fault tolerance
"},{"location":"designdeck/#geohashing","title":"Geohashing","text":"Encode geographic coordinates into a short string called a cell with varying resolutions
The more letters in the string, the more precise the location
Main use case: - Proximity searches in O(1)
"},{"location":"designdeck/#hashing-definition-and-size-of-md5-and-sha256","title":"Hashing definition and size of MD5 and SHA256","text":"Map data of arbitrary size to fixed-size values
Examples: - MD5: 16 bytes - SHA256: 32 bytes
"},{"location":"designdeck/#hdfs","title":"HDFS","text":"Distributed filesystem: - Fault tolerant - Scalable - Optimised for batch operations
Architecture: - Single master (maintains filesystem metadata, informs clients about which server stores a specific part of a file) - Multiple data nodes
Leverage: - Partitioning: each file is partitioned into multiple chunks => performance - Replication => availability
Read: communicates with the master node to identify the servers containing the relevant chunks
Write: chain replication
"},{"location":"designdeck/#how-to-reduce-sharing","title":"How to reduce sharing","text":"Used to approximate cardinality of a set
Optimization for space over perfect accuracy
"},{"location":"designdeck/#backing-idea","title":"Backing idea","text":"Coin flip game: you flip a coin, if head, flip again, if tail stop
If a player reaches n flips, it means that on average, he tried 2^(n+1) times
"},{"location":"designdeck/#algo","title":"Algo","text":"For an ID, we will count how many consecutive 0 (head) bits on the left
Example: 001110 => 2
Hence, on average we should have seen 2^(2+1) = 8 visitors
Requirement: visitor IDs need to be uniformly distributed => either the IDs are randomly generated, or we hash them (if the IDs are auto-incremented, for example)
Required memory: log(log(m)) with m the number of unique visitors
Problem with this algo: it depends on luck. For example, if user 00000001 connects every day => the system will always approximate 2^8 visitors
"},{"location":"designdeck/#bucketing","title":"Bucketing","text":"Distribute to multiple counters and aggregate the results (possible because each counter is very small)
If we want 4 counters, we distribute the ID based on the first 2 bits
Result: 2^((n1 + n2 + n3 + n4) / 4)
Problem: mean is highly impacted with large outliers
Solution: use harmonic mean
"},{"location":"designdeck/#idempotent","title":"Idempotent","text":"If executed more than once it has the same effect as if it was executed once
"},{"location":"designdeck/#latency-numbers-every-programmer-should-know","title":"Latency numbers every programmer should know","text":"Lock with an expiry timeout after which the lock is automatically released
May lead to situations where two nodes believe they hold the lock (for example, when the expiry signal hasn't been caught yet by the first node because of a GC or CPU throttling)
Can be solved using a fencing token
"},{"location":"designdeck/#least-loaded-endpoint-load-balancing-strategy","title":"Least loaded endpoint load balancing strategy","text":"Not efficient
A more efficient option is to randomly pick two servers and route the request to the least-loaded one of the two
"},{"location":"designdeck/#liveness-property","title":"Liveness property","text":"Something good will eventually occur
Example: leader is elected, eventual consistency
"},{"location":"designdeck/#load-balancing","title":"Load balancing","text":"Route requests across a pool of servers
"},{"location":"designdeck/#load-shedding","title":"Load shedding","text":"Action to reduce the load on something
Example: when the CPU utilization reaches a threshold, the server can start returning errors
A special form of load shedding is selective client throttling, where an application assigns different quotas to each of its clients
"},{"location":"designdeck/#locality","title":"Locality","text":"Performance optimization to put several pieces of data in the same place
"},{"location":"designdeck/#log","title":"Log","text":"Append-only, totally ordered sequence of messages
Each message is: - Appended at the end of the log - Assigned a unique sequential index
Example: Kafka
"},{"location":"designdeck/#log-compaction","title":"Log compaction","text":"Throw away duplicate keys in the log and keep only the most recent update for each key
"},{"location":"designdeck/#main-drawback-of-shared-nothing-architectures","title":"Main drawback of shared-nothing architectures","text":"Reduce flexibility
If the application needs to access to new data access patterns in an efficient way, it might be hard to provide it given the system's data have been partitioned in a specific way
Example: attempting to query by a secondary attribute that is not the partitioning key might require to access all the nodes of the system
"},{"location":"designdeck/#mapreduce","title":"MapReduce","text":"Programming model for processing large amounts of data in bulk across many machines: - Map: processes a set of key/value pairs and produces as output another set of intermediate key/value pairs. - Reduce: receives all the values for each key and returns a single value, essentially merging all the values according to some logic
"},{"location":"designdeck/#microservices-pros-and-cons","title":"Microservices: pros and cons","text":"Pros: - Organizational (each team dictates its own release schedule, etc.) - Codebase is easier to digest - Strong boundaries - Independent scaling - Independent data model
Cons: - Eventual consistency - Remote calls - Harder to operate (more complex)
"},{"location":"designdeck/#number-of-values-to-generate-to-reach-50-chances-of-collision-32-bit-64-bit-and-128-bit-hash","title":"Number of values to generate to reach 50% chances of collision: 32-bit, 64-bit, and 128-bit hash","text":"Orchestration: single central system responsible for coordinating the execution
Choreography: no need for a central coordinator, each system is aware of the previous and the next
"},{"location":"designdeck/#outbox-pattern","title":"Outbox pattern","text":"Used to update a DB and publish an event in a transactional fashion
Within a transaction, persist in the DB (insert, update or delete) and insert at the same time a new row in an event table
Implements a worker that checks the event table, publishes an event and deletes the row (at least once guarantee)
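A minimal JDBC sketch of the transactional part (table and column names are illustrative assumptions):
void createOrder(Connection conn, String orderId, String payload) throws SQLException {\n conn.setAutoCommit(false);\n try (PreparedStatement order = conn.prepareStatement("INSERT INTO orders (id, payload) VALUES (?, ?)");\n PreparedStatement outbox = conn.prepareStatement("INSERT INTO outbox_events (order_id, type) VALUES (?, 'ORDER_CREATED')")) {\n order.setString(1, orderId);\n order.setString(2, payload);\n order.executeUpdate();\n outbox.setString(1, orderId);\n outbox.executeUpdate();\n // Both rows become visible atomically; a worker later publishes and deletes the event row\n conn.commit();\n } catch (SQLException e) {\n conn.rollback();\n throw e;\n }\n}\n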
"},{"location":"designdeck/#perfect-hashing","title":"Perfect hashing","text":"No collision, only possible if we know the keys up front
Given k elements, the hashing function returns an int between 0 and k - 1
"},{"location":"designdeck/#quadtree","title":"Quadtree","text":"Tree data structure where each internal node has exactly four children: NE, NW, SE, SW
Main use case: - Improve geospatial caching (e.g., 1km in an urban area isn't the same as 1km outside cities)
Source: https://engblog.yext.com/post/geolocation-caching
"},{"location":"designdeck/#rate-limiting-throttling-definition-and-algos","title":"Rate-limiting (throttling): definition and algos","text":"Mechanism that rejects a request when a specific quota is exceeded
"},{"location":"designdeck/#token-bucket-algo","title":"Token bucket algo","text":"Token of a pre-defined capacity, put back in the bucket periodically:
"},{"location":"designdeck/#leaking-bucket-algo","title":"Leaking bucket algo","text":"Uses a FIFO queue When a request arrives, checks if the queue is full: - If yes: request is dropped - If not: added to the queue => Requests pulled from the queue at regular intervals
"},{"location":"designdeck/#rebalancing","title":"Rebalancing","text":"Move data or services from one node to another in order to spread the load fairly
"},{"location":"designdeck/#rest","title":"REST","text":"Architectural style where the server exposes a set of resources
All communications must be stateless and cacheable
Relies mainly on HTTP but not mandatory
"},{"location":"designdeck/#rest-vs-grpc","title":"REST vs. gRPC","text":"REST (architectural style): - Universality - Standardization (status code, ETag, If-Match, etc.)
gRPC (RPC framework): - Contract - Binary protocol (faster, less bandwidth) // We could use HTTP/2 without gRPC and leverage binary protocols but it would require more effort - Bidirectional
"},{"location":"designdeck/#safety-property","title":"Safety property","text":"Something bad will never happen
Example: at most one leader can be elected at a time
"},{"location":"designdeck/#saga","title":"Saga","text":"Distributed transaction composed of a set of local transactions
Each transaction has a corresponding compensation action to undo its changes
Usually, a Saga is implemented with an orchestrator that manages the execution of the transactions and handles the compensations if needed
"},{"location":"designdeck/#scalability","title":"Scalability","text":"System's ability to cope with increased load
"},{"location":"designdeck/#scalability-ceiling","title":"Scalability ceiling","text":"Hard limit (e.g., device maximum throughput)
"},{"location":"designdeck/#shared-nothing-architectures","title":"Shared-nothing architectures","text":"Reduce coordination and contention so that every request can be processed independently by a single node or group of nodes
Increase availability, performance, and scalability
"},{"location":"designdeck/#source-of-truth","title":"Source of truth","text":"Holds the authoritative version of the data
"},{"location":"designdeck/#split-brain","title":"Split-brain","text":"Network partition => nodes unable to communicate with each other => multiple nodes believing they are the leader
As a node is unaware that another node is still functioning, it can lead to data corruption or data loss
"},{"location":"designdeck/#throughput","title":"Throughput","text":"The rate of work performed
"},{"location":"designdeck/#total-vs-partial-order","title":"Total vs. partial order","text":"Total order: a binary relation that can be used to compare any 2 elements of a set with each other
Partial order: a binary relation that can be used to compare only some of the elements of a set with each other
Total ordering in distributed systems is rarely mandatory
"},{"location":"designdeck/#uuid","title":"UUID","text":"128-bit number
Collision probability: after generating 1 billion UUIDs every second for ~100 years, the probability of creating a single duplicate reaches 50%
"},{"location":"designdeck/#validation-vs-verification","title":"Validation vs. verification","text":"Validation: process of analyzing the parts of the system and building mental models that reflects the interaction of those parts
Example: validate the quality of water by inspecting all the pipes and infrastructure to capture, clean and deliver water
Verification: process of analyzing output at a system boundary
Example: verify the quality of water by testing the water (output) coming from a sink
"},{"location":"designdeck/#vector-clock","title":"Vector clock","text":"Algorithm that generates partial ordering of events and detects causality violation
"},{"location":"designdeck/#why-asynchronous-communication","title":"Why asynchronous communication","text":"Reduce temporal coupling (not connected at the same time) => processes execute at independent rates, without blocking the sender
Also suited when the interaction pattern isn't request/response with the client blocking until it receives the response
"},{"location":"designdeck/#http","title":"HTTP","text":""},{"location":"designdeck/#301-vs-302","title":"301 vs. 302","text":"301: redirect permanently
302: redirect temporarily
"},{"location":"designdeck/#403-or-404","title":"403 or 404?","text":"Retuning 403 can leak existence of a resource
Example: Apple is secretly working on super cars and creates an internal GET https://apple.com/supercar
endpoint
Returning 403 means the user doesn't have the rights to access the resource, but leaks the existence of /supercar
Small files stored on a user's computer to hold specific data (e.g., language preference)
Requests made by the browser will contain cookies data
Types of cookies: - Session cookies: only lasts for the duration of a session - Persistent cookies: outlast user session - Third-party cookies: used for advertising
"},{"location":"designdeck/#four-main-http2-features","title":"Four main HTTP/2 features","text":"HTTP live streaming: video streaming protocol
"},{"location":"designdeck/#http_1","title":"HTTP","text":"Request/response protocol used to encode and transport information between a client and a server Stateless (each request is executed independently)
The request and the response are 2 standard message types exchanged in a single HTTP transaction - Request: method, URL, HTTP version, headers, body - Response: HTTP version, status, reason, headers, body
Example of a POST request:
```http
POST https://example.com HTTP/1.0
Host: example.com
User-Agent: Mozilla/4.0
Content-Length: 5

Hello
```
Application layer protocol (OSI level 7)
Relies on a transport protocol (OSI level 4, TCP most of the time but not mandatory) for error detection, flow control, reliability, etc.
"},{"location":"designdeck/#http-cache-control-header","title":"HTTP cache-control header","text":"Allows setting how long to cache a response
Part of the response header (hence, cached by the browser) but can be part of the request header too (hence, cached on server side)
If the response is marked as private, the result is intended for a single user (then won't be cached by a load balancer, for example)
"},{"location":"designdeck/#http-etag","title":"HTTP Etag","text":"Entity tag header that allows clients to make conditional requests
Server returns an ETag identifying a specific version of the resource (e.g., a hash of its content or its last modification timestamp)
Client sends an If-Match header to update a resource only if it has the most recent version
Maintain a persistent TCP connection (reduces the number of TCP and TLS handshakes)
"},{"location":"designdeck/#http-methods-safeness-and-idempotence","title":"HTTP methods: safeness and idempotence","text":"Doesn't have any visible side effects and can be cached
"},{"location":"designdeck/#http-status-code-429","title":"HTTP status code 429","text":"When clients are throttled, the most common way is to return a 429 (Too Many Requests)
The response can also include a Retry-After header indicating how long to wait before making a new request (in seconds)
"},{"location":"designdeck/#http-status-codes","title":"HTTP status codes","text":"Source: https://github.com/alex/what-happens-when
"},{"location":"designdeck/#kafka","title":"Kafka","text":""},{"location":"designdeck/#consumer-types","title":"Consumer types","text":"Without consumer group: each consumer will receive all the messages in a topic
With consumer group: each consumer will receive a subset of the messages
Each consumer is assigned to multiple partitions (zero to many)
A partition is always assigned to only one consumer
If there are more consumers than partitions, some consumers will not be assigned to any partition (scalability ceiling)
"},{"location":"designdeck/#durabilityavailability-and-latencythroughput-tradeoffs","title":"Durability/availability and latency/throughput tradeoffs","text":"Source: https://developers.redhat.com/articles/2022/05/03/fine-tune-kafka-performance-kafka-optimization-theorem#kafka_priorities_and_the_cap_theorem
"},{"location":"designdeck/#log-compaction_1","title":"Log compaction","text":"Log compaction is a mechanism to give per-record retention to a topic
It ensures that Kafka will always retain at least the last message for each key of a given partition
A partition that is not yet compacted may have more than one message with the same key
Property: - retention.ms: maximum time the topic will retain old log segments before deleting or compacting them (default: 7 days)
For low-throughput topics (topics whose segments are rolled because of segment.ms rather than segment.bytes), we should ensure that segment.ms is lower than retention.ms
A strictly increasing identifier per partition
"},{"location":"designdeck/#partition","title":"Partition","text":"Topics are divided into partitions
A partition is an ordered, immutable log of messages
No guaranteed ordering per topic with multiple partitions
Yet, the ordering is guaranteed per partition
"},{"location":"designdeck/#partition-distribution","title":"Partition distribution","text":"The client implements a partitioner based on the key (e.g., hash(key) % number of partitions)
This is not done on Kafka's side
If key is empty: round-robin
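A simplified sketch of the key-based partitioner described above (Kafka's default actually hashes the key bytes with murmur2; String.hashCode is used here only for illustration):
int partitionFor(String key, int numPartitions) {\n // Mask the sign bit so the modulo is never negative\n return (key.hashCode() & 0x7fffffff) % numPartitions;\n}\n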
"},{"location":"designdeck/#rebalancing_1","title":"Rebalancing","text":"Not possible to decrease the number of partitions: topic has to be recreated
Possible to increase the number of partitions
Possible issue: no more guaranteed ordering as one key may be assigned to a different partition
"},{"location":"designdeck/#segment","title":"Segment","text":"Each partition is divided into segments
Instead of storing all the messages of a partition in a single file, Kafka splits them into chunks called segments. A log segment is a file identified by the first message offset it contains
Properties: - segment.bytes: maximum segment file size before creating a new segment (default: 1GB) - segment.ms: period after which a new segment is created, even if the segment is not full (default: 7 days)
Distribute messages
All the consumers from one consumer group receive a portion of the messages
One partition is assigned to one consumer, one consumer can listen to multiple partitions
"},{"location":"designdeck/#math","title":"Math","text":""},{"location":"designdeck/#associative-property","title":"Associative property","text":"A binary operation is associative if rearranging the parentheses in an expression will not change the result
Example: + is associative; e.g., (2 + 3) + 4 = 2 + (3 + 4)
A binary operation is commutative if changing the order of the operands doesn't change the result
Example: + is commutative, / isn't commutative
x1: probability of p1 (e.g. 0.5)
Less sensitive to large outliers
"},{"location":"designdeck/#network","title":"Network","text":""},{"location":"designdeck/#arp-protocol","title":"ARP protocol","text":"Map an IP address to a MAC address
"},{"location":"designdeck/#average-connection-speed-in-usa","title":"Average connection speed in USA","text":"42 Mbps
"},{"location":"designdeck/#backpressure","title":"Backpressure","text":"A node limits its own rate of sending in order to avoid overloading. Queueing is done on the sender side.
Also known as flow control
Example: TCP flow control
"},{"location":"designdeck/#bandwidth","title":"Bandwidth","text":"Maximum amount of data that can be transferred in a unit of time
"},{"location":"designdeck/#bgp","title":"BGP","text":"Border Gateway Protocol: Routing system of the internet
When a client submits data via the Internet, BGP is responsible for looking at all of the available paths that data could travel and picking the best route
Note: The chosen route isn't necessarily the fastest one, it can be the cheapest one. See https://technology.riotgames.com/news/fixing-internet-real-time-applications-part-i.
"},{"location":"designdeck/#cors","title":"CORS","text":"Cross-origin resource sharing
Mechanism to allow restricted resources on a page to be requested from another domain outside the domain from which the resource was served
It extends and adds flexibility to SOP (Same-Origin Policy, same domain)
Example: User visits A and the page attempts to fetch data from B: 1. Browser sends a GET request to B with Origin header A 2. Server may respond with: - Access-Control-Allow-Origin (ACAO) header set to the domain A - ACAO set to a wildcard (*) indicating that the requests from all domains are allowed - An error if the server does not allow a cross-origin request
"},{"location":"designdeck/#difference-ping-heartbeat","title":"Difference ping & heartbeat","text":"Ping: sends messages to a process and expects a response within a specified time period (request-reply)
Heartbeat: a process is actively notifying its peers that it's still running by sending a message (notification)
"},{"location":"designdeck/#difference-tcp-udp","title":"Difference TCP & UDP","text":"A view is just an abstraction (SQL request is rewritten to match the actual schema)
A materialized view is a copy (written to disk)
"},{"location":"designdeck/#dns","title":"DNS","text":"Domain Name System: automatic translation between a name and an IP address
Notes: - Usually the local DNS configuration is the ISP one (config initialized from the router or static config) - The browser, the OS and the DNS resolver all use caches internally - A TTL is used to inform the cache how long the entry is valid
"},{"location":"designdeck/#dns-lookup-push-or-pull","title":"DNS lookup: push or pull","text":"DNS is based on the pull mode: - If record is present: DNS will return it - If record isn't present: DNS will pull the value, store it, and then return it
Notes: - New DNS records are visible immediately (nothing is cached yet) - DNS updates are slow because of TTLs (there is no propagation; we wait for cached records to expire)
"},{"location":"designdeck/#health-checks-passive-vs-active","title":"Health checks: passive vs. active","text":"Passive: performed by the load balancer as it routes incoming requests (e.g., 503)
Active: the load balancer actively checking the health of the servers via a query to their health endpoint
"},{"location":"designdeck/#internet-model","title":"Internet model","text":"A network of networks
"},{"location":"designdeck/#layer-4-vs-layer-7-load-balancer","title":"Layer 4 vs. layer 7 load balancer","text":"Layer 4 is faster and requires less computing resources than layer 7 is but less flexible
Layer 4: look at the info at the transport layer to distribute the requests (source, destination, port)
Forward packet using NAT
Layer 7: look at the info at the application layer to distribute the requests (header, message, etc.)
Terminates the network traffic, reads the request, then opens a connection to the target server
A layer 7 can de-multiplex individual HTTP requests where multiple concurrent streams are multiplexed on the same TCP connection
"},{"location":"designdeck/#mac-address","title":"MAC address","text":"A unique identifier assigned to a network interface
"},{"location":"designdeck/#max-size-of-a-tcp-packet","title":"Max size of a TCP packet","text":"64K
"},{"location":"designdeck/#mqtt-lwt","title":"MQTT LWT","text":"Last Will and Testament
Whenever a client disconnects ungracefully (e.g., heartbeat failure), the broker publishes the client's pre-registered will message to a particular topic
"},{"location":"designdeck/#ntp","title":"NTP","text":"Network Time Protocol: used to synchronize clocks
"},{"location":"designdeck/#osi-model","title":"OSI model","text":"7 layers: 1. Physical: transmission of raw bits over a physical link (e.g., USB, Bluetooth) 2. Data link: responsible from moving a packet of data from one node to a neighbouring node 3. Network: provides a way of sending packets between nodes that are not directly linked and might belong to other networks (e.g., IP, iptables routing) 4. Transport: application to application communication, based on ports when multiple applications on the same node wants to communicate (e.g., TCP, UDP) 5. Session 6. Presentation 7. Application: protocol of exchanges between the two sides (e.g., DNS, HTTP)
"},{"location":"designdeck/#routers","title":"Routers","text":"A way to connect networks that are connected with each other (used for the Internet)
Capable of routing packets properly across networks so that they reach their destination successfully
Based on the fact that an IP has a network prefix
"},{"location":"designdeck/#routers-buffering","title":"Routers buffering","text":"Routers use queuing (buffering) to address network congestion
A buffer has a fixed size and a fixed number of packets
If no available buffer: packet is dropped
Note: not a way to increase the throughput
"},{"location":"designdeck/#routers-processing","title":"Routers processing","text":"Per-packet processing, no buffering
Impacts: - It's faster to route 10 packets of 1000 bytes than 20 packets of 500 bytes - Sending small packets more frequently can fill the router buffer more quickly
Source: https://technology.riotgames.com/news/fixing-internet-real-time-applications-part-i
"},{"location":"designdeck/#routing-table","title":"Routing table","text":"Example:
Destination    Network mask     Gateway      Interface
0.0.0.0        0.0.0.0          240.1.1.3    if1
240.1.1.0      255.255.255.0    0.0.0.0      if1
"},{"location":"designdeck/#service-mesh","title":"Service mesh","text":"All network traffic from a client goes through a process co-located on the same machine (sidecar)
Used to facilitate service-to-service communications
"},{"location":"designdeck/#switch","title":"Switch","text":"Receive frame and forward to specific links they are addressed to. Used for local networks.
Example: Ethernet frame
To do this, the switch maintains a switch table that maps MAC addresses to the corresponding interfaces that lead to them
At first, the switch table is empty; if an entry is missing, the frame is forwarded to all the interfaces (switches are self-learning)
"},{"location":"designdeck/#tcp-congestion-control","title":"TCP congestion control","text":"Determine dynamically the throughput (the number of segments that can be sent without an ack): - Increase exponentially for every segment ack - Decrease with a missed ack
Upon a new connection, the size of the window is set to a system default
It's one of the reasons why reusing a TCP connection leads to a performance increase
"},{"location":"designdeck/#tcp-connection-backlog","title":"TCP connection backlog","text":"SYN requests are queued before being accepted by a user-mode process
When there are too many requests for the process, the backlog reaches a limit and SYN packets are dropped (to be later retransmitted by the client)
"},{"location":"designdeck/#tcp-flow-control","title":"TCP flow control","text":"A receiver communicates back to the sender the size of the buffer when acknowledging a segment
Backpressure mechanism
"},{"location":"designdeck/#tcp-handshake","title":"TCP handshake","text":"3-way handshake - syn (sender to receiver) - syn-ack (receiver to sender) // ack the segment number received - ack (sender to receiver) // ack the segment number received
"},{"location":"designdeck/#websocket","title":"Websocket","text":"Communication protocol (layer 7) provides a full-duplex communication channel over a single TCP connection and bidirectional streaming capabilities
Different from HTTP but compatible with HTTP (starts as an HTTP connection, which is then upgraded via a well-defined handshake)
Obsolete with HTTP/2
"},{"location":"designdeck/#why-cant-we-rely-on-the-system-clock-in-distributed-systems","title":"Why can't we rely on the system clock in distributed systems?","text":"Provides guaranteed fault isolation by design
Based on the idea of partitioning a shared resource to isolate failures
"},{"location":"designdeck/#cascading-failure","title":"Cascading failure","text":"A process in a system of interconnected parts in which the failure of one or few parts can trigger the failure of other parts and so on
"},{"location":"designdeck/#causal-consistency-implementation","title":"Causal consistency implementation","text":"When a replica receives a new write, it doesn't apply it locally immediately. First, it checks whether the write's dependencies have been committed locally. If not, it waits until the required version appears.
"},{"location":"designdeck/#circuit-breaker","title":"Circuit breaker","text":"Used to prevent a network or service failure from cascading to other failures
Implemented on the client-side
Three states: - Closed: accept requests - Open: do not accept requests and fail immediately - Half-open: give the service another chance (can also be implemented using a probe)
The circuit can be opened when the health endpoint of the service is down or when the number of consecutive errors reaches a threshold
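A minimal sketch of the three-state machine (threshold and timeout values are illustrative):
class CircuitBreaker {\n enum State { CLOSED, OPEN, HALF_OPEN }\n\n private State state = State.CLOSED;\n private int consecutiveErrors = 0;\n private long openedAt;\n private static final int ERROR_THRESHOLD = 5;\n private static final long RETRY_AFTER_MS = 30_000;\n\n synchronized boolean allowRequest() {\n if (state == State.OPEN && System.currentTimeMillis() - openedAt > RETRY_AFTER_MS) {\n state = State.HALF_OPEN; // Give the service another chance\n }\n return state != State.OPEN; // Open: fail immediately\n }\n\n synchronized void onSuccess() {\n consecutiveErrors = 0;\n state = State.CLOSED;\n }\n\n synchronized void onError() {\n consecutiveErrors++;\n if (state == State.HALF_OPEN || consecutiveErrors >= ERROR_THRESHOLD) {\n state = State.OPEN;\n openedAt = System.currentTimeMillis();\n }\n }\n}\n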
"},{"location":"designdeck/#exponential-backoff","title":"Exponential backoff","text":"Wait time increased exponentially after every retry attempt
"},{"location":"designdeck/#fault-tolerance","title":"Fault tolerance","text":"Property of a system that can continue operating correctly in the presence of failure of its components
"},{"location":"designdeck/#jitter","title":"Jitter","text":"Introduces a part of randomness to avoid synchronized retry spikes experienced during cascading failures
"},{"location":"designdeck/#knee-point","title":"Knee point","text":"Moment when linear scalability is not possible anymore
"},{"location":"designdeck/#phi-accrual-failure-detector","title":"Phi-accrual failure detector","text":"Instead of treating failure node failure as a binary problem (up or down), a phi-accrual failure detector has a continuous scale, capturing the probability of the monitored process's crash
Works by maintaining a sliding window, collecting arrival times of the most recent heartbeats
Used to approximate the arrival time of the next heartbeat and compute a suspicion level (how certain the failure detector is about a failure)
"},{"location":"designdeck/#retry-amplification","title":"Retry amplification","text":"Having retries at multiple levels of the dependency chain can amplify the number of retry
The deeper a service in the chain, the higher the load it will be exposed to due to amplification:
In case of a long dependency chain, perhaps we should only retry at a single level of the chain
"},{"location":"designdeck/#security","title":"Security","text":""},{"location":"designdeck/#authentication","title":"Authentication","text":"Process of determining whether someone or something is who or what it declares itself to be
"},{"location":"designdeck/#certificate-authorities","title":"Certificate authorities","text":"Organizations issuing certificates by signing them
"},{"location":"designdeck/#cipher","title":"Cipher","text":"Encryption algorithm
"},{"location":"designdeck/#confidentiality","title":"Confidentiality","text":"Process of protecting information from being accessed by unauthorized parties
Mainly achieved via encryption
"},{"location":"designdeck/#integrity","title":"Integrity","text":"The process of preserving the accuracy and completeness of data over its entire lifecycle, so that they cannot be modified in an unauthorized or undetected manner
"},{"location":"designdeck/#mutual-tls","title":"Mutual TLS","text":"Add client authentication using a certificate
"},{"location":"designdeck/#oauth-2","title":"OAuth 2","text":"Standard for access delegation
Process: - Client gets a token from an authorization server - Makes a request to a server using the token - Server validates the token against the authorization server
Notes: some token types like JWT are self-contained, meaning the validation can be done by the server without a call to the authorization server
"},{"location":"designdeck/#public-key-infrastructure-pki","title":"Public key infrastructure (PKI)","text":"System for managing, storing, and distributing certificates
Relies on certificate revocation lists (CRLs)
"},{"location":"designdeck/#tls-handshake","title":"TLS handshake","text":"With mutual TLS:
One way: the session key is generated by the client
"},{"location":"designdeck/#two-main-uses-of-encryption","title":"Two main uses of encryption","text":"Encryption in transit
Encryption at rest
"},{"location":"designdeck/#two-types-of-encryption","title":"Two types of encryption","text":"Symmetric: key is shared between a client and a server (faster)
Asymmetric: two keys are used, a private and a public one - Client encrypts a message with the public key - Server decrypts the message with its private key
"},{"location":"designdeck/#what-does-digital-signature-provide","title":"What does digital signature provide","text":"Integrity and authentication
"},{"location":"designdeck/#what-does-tls-provide","title":"What does TLS provide?","text":"Check the Anki version here.
"},{"location":"#array","title":"Array","text":""},{"location":"#algorithm-to-reverse-an-array","title":"Algorithm to reverse an array","text":"int i = 0;\nint j = a.length - 1;\nwhile (i < j) {\n swap(a, i++, j--);\n}\n
"},{"location":"#array-complexity-access-search-insert-delete","title":"Array complexity: access, search, insert, delete","text":"Access: O(1)
Search: O(n)
Insert: O(n)
Delete: O(n)
"},{"location":"#binary-search-in-a-sorted-array-algorithm","title":"Binary search in a sorted array algorithm","text":"int lo = 0, hi = a.length - 1;\n\nwhile (lo <= hi) {\n int mid = lo + ((hi - lo) / 2);\n if (a[mid] == key) {\n return mid;\n }\n if (a[mid] < key) {\n lo = mid + 1;\n } else {\n hi = mid - 1;\n }\n}\n
"},{"location":"#further-reading","title":"Further Reading","text":"Solution: binary search
Check first if the array is rotated. If not, apply normal binary search
If rotated, find pivot (smallest element, only element whose previous is bigger)
Then, check if the element is in 0..pivot-1 or pivot..len-1
int findElementRotatedArray(int[] a, int val) {\n // If array not rotated\n if (a[0] < a[a.length - 1]) {\n // We apply the normal binary search\n return binarySearch(a, val, 0, a.length - 1);\n }\n\n int pivot = findPivot(a);\n\n if (val >= a[0] && val <= a[pivot - 1]) {\n // Element is before the pivot\n return binarySearch(a, val, 0, pivot - 1);\n } else if (val >= a[pivot] && val < a.length - 1) {\n // Element is after the pivot\n return binarySearch(a, val, pivot, a.length - 1);\n }\n return -1;\n}\n
"},{"location":"#given-an-array-move-all-the-0-to-the-left-while-maintaining-the-order-of-the-other-elements","title":"Given an array, move all the 0 to the left while maintaining the order of the other elements","text":"Example: 1, 0, 2, 0, 3, 0 => 0, 0, 0, 1, 2, 3
Two pointers technique: read and write starting at the end of the array
If read is on a 0, decrement read. Otherwise swap, decrement both
public void move(int[] a) {\n int w = a.length - 1, r = a.length - 1;\n while (r >= 0) {\n if (a[r] == 0) {\n r--;\n } else {\n swap(a, r--, w--);\n }\n }\n}\n
Time complexity: O(n)
Space complexity: O(1)
"},{"location":"#how-to-detect-if-an-element-is-a-pivot-in-a-rotated-sorted-array","title":"How to detect if an element is a pivot in a rotated sorted array","text":"Only element whose previous is bigger (also the pivot is the smallest element)
"},{"location":"#how-to-find-a-pivot-element-in-a-rotated-array","title":"How to find a pivot element in a rotated array","text":"Check first if the array is rotated
Then, apply binary search (comparison with a[right] to know if we go left or right)
int findPivot(int[] a) {\n int left = 0, right = a.length - 1;\n\n // Array is not rotated\n if (a[left] < a[right]) {\n return -1;\n }\n\n while (left <= right) {\n int mid = left + ((right - left) / 2);\n if (mid > 0 && a[mid] < a[mid - 1]) {\n return a[mid];\n }\n\n if (a[mid] < a[right]) {\n // Pivot is on the left\n right = mid - 1;\n } else {\n // Pivot is on the right\n left = mid + 1;\n }\n }\n\n return -1;\n}\n
"},{"location":"#how-to-find-the-duplicates-in-an-array","title":"How to find the duplicates in an array","text":"When full, create a new array of twice the size, copy items (System.arraycopy is optimized for that)
Shrink: - Not when one-half full (otherwise the worst case is too expensive: double-shrink-double-shrink, etc.) - Solution: shrink when one-quarter full
"},{"location":"#how-to-test-if-the-array-is-sorted-in-ascending-or-descending-order","title":"How to test if the array is sorted in ascending or descending order","text":"Test first and last element (no iteration)
"},{"location":"#rotate-an-array-by-n-elements-n-can-be-negative","title":"Rotate an array by n elements (n can be negative)","text":"Example: 1, 2, 3, 4, 5 with n = 3 => 3, 4, 5, 1, 2
void rotateArray(List<Integer> a, int n) {\n if (n < 0) {\n n = a.size() + n;\n }\n\n reverse(a, 0, a.size() - 1);\n reverse(a, 0, n - 1);\n reverse(a, n, a.size() - 1);\n}\n
Time complexity: O(n)
Memory complexity: O(1)
"},{"location":"#bit","title":"Bit","text":""},{"location":"#operator","title":"& operator","text":"AND bit by bit
"},{"location":"#operator_1","title":"<< operator","text":"Shift on the left
n * 2 <=> left shift by 1
n * 4 <=> left shift by 2
"},{"location":"#operator_2","title":">> operator","text":"Shift on the right
"},{"location":"#operator_3","title":">>> operator","text":"Logical shift (shift the sign bit as well)
"},{"location":"#operator_4","title":"^ operator","text":"XOR bit by bit
"},{"location":"#bit-vector-structure","title":"Bit vector structure","text":"Vector (linear sequence of numeric values stored contiguously in memory) in which each element is a bit (so either 0 or 1)
"},{"location":"#check-exactly-one-bit-is-set","title":"Check exactly one bit is set","text":"boolean checkExactlyOneBitSet(int num) {\n return num != 0 && (num & (num - 1)) == 0;\n}\n
"},{"location":"#clear-bits-from-i-to-0","title":"Clear bits from i to 0","text":"int clearBitsFromITo0(int num, int i) {\n int mask = (-1 << (i + 1));\n return num & mask;\n}\n
"},{"location":"#clear-bits-from-most-significant-one-to-i","title":"Clear bits from most significant one to i","text":"int clearBitsFromMsbToI(int num, int i) {\n int mask = (1 << i) - 1;\n return num & mask;\n}\n
"},{"location":"#clear-ith-bit","title":"Clear ith bit","text":"int clearBit(final int num, final int i) {\n final int mask = ~(1 << i);\n return num & mask;\n}\n
"},{"location":"#flip-ith-bit","title":"Flip ith bit","text":"int flipBit(final int num, final int i) {\n return num ^ (1 << i);\n}\n
"},{"location":"#get-ith-bit","title":"Get ith bit","text":"boolean getBit(final int num, final int i) {\n return ((num & (1 << i)) != 0);\n}\n
"},{"location":"#how-to-flip-one-bit","title":"How to flip one bit","text":"b ^ 1
"},{"location":"#how-to-represent-signed-integers","title":"How to represent signed integers","text":"Use the most significative bit to represent the sign. Yet, it is not enough (problem with this technique: 5 + (-5) != 0)
Two's complement technique: take the one complement and add one
-3: 1101
-2: 1110
-1: 1111
0: 0000
1: 0001
2: 0010
3: 0011
The most significant bit still represents the sign
Max integer value: 1...1 (31 bits)
-1: 1...1 (32 bits)
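A quick check in Java (Integer.toBinaryString prints the raw 32-bit pattern):
int n = 3;\nint neg = ~n + 1; // One's complement plus one\nSystem.out.println(neg); // -3\nSystem.out.println(Integer.toBinaryString(neg)); // 11111111111111111111111111111101\n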
"},{"location":"#set-ith-bit","title":"Set ith bit","text":"int setBit(final int num, final int i) {\n return num | (1 << i);\n}\n
"},{"location":"#update-a-bit-from-a-given-value","title":"Update a bit from a given value","text":"int updateBit(int num, int i, boolean bit) {\n int value = bit ? 1 : 0;\n int mask = ~(1 << i);\n return (num & mask) | (value << i);\n}\n
"},{"location":"#x-0s","title":"x & 0s","text":"0
"},{"location":"#x-1s","title":"x & 1s","text":"x
"},{"location":"#x-x","title":"x & x","text":"x
"},{"location":"#x-0s_1","title":"x ^ 0s","text":"x
"},{"location":"#x-1s_1","title":"x ^ 1s","text":"~x
"},{"location":"#x-x_1","title":"x ^ x","text":"0
"},{"location":"#x-0s_2","title":"x | 0s","text":"x
"},{"location":"#x-1s_2","title":"x | 1s","text":"1s
"},{"location":"#x-x_2","title":"x | x","text":"x
"},{"location":"#xor-operations","title":"XOR operations","text":"0 ^ 0 = 0
1 ^ 0 = 1
0 ^ 1 = 1
1 ^ 1 = 0
n XOR 0 => keep
n XOR 1 => flip
"},{"location":"#operator_5","title":"| operator","text":"OR bit by bit
"},{"location":"#operator_6","title":"~ operator","text":"Complement bit by bit
"},{"location":"#complexity","title":"Complexity","text":"Big-O Cheat Sheet
"},{"location":"#01-knapsack-brute-force-complexity","title":"0/1 Knapsack brute force complexity","text":"Time complexity: O(2^n) with n the number of items
Space complexity: O(n)
"},{"location":"#01-knapsack-memoization-complexity","title":"0/1 Knapsack memoization complexity","text":"Time and space complexity: O(n * c) with n the number items and c the capacity
"},{"location":"#01-knapsack-tabulation-complexity","title":"0/1 Knapsack tabulation complexity","text":"Time and space complexity: O(n * c) with n the number of items and c the capacity
Space complexity could even be improved to O(2*c) = O(c) as we need to store only the last 2 lines (using row%2):
int[][] dp = new int[2][c + 1];\n
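A sketch of the full tabulation with the two rolling rows (weights w, values v, and capacity c are the assumed inputs):
int knapsack(int[] w, int[] v, int c) {\n int n = w.length;\n int[][] dp = new int[2][c + 1];\n for (int i = 1; i <= n; i++) {\n for (int j = 0; j <= c; j++) {\n // Either skip item i or take it (if it fits)\n dp[i % 2][j] = dp[(i - 1) % 2][j];\n if (w[i - 1] <= j) {\n dp[i % 2][j] = Math.max(dp[i % 2][j], v[i - 1] + dp[(i - 1) % 2][j - w[i - 1]]);\n }\n }\n }\n return dp[n % 2][c];\n}\n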
"},{"location":"#amortized-complexity-definition","title":"Amortized complexity definition","text":"How much of a resource (time or memory) it takes to execute per operation on average
"},{"location":"#array-complexity-access-search-insert-delete_1","title":"Array complexity: access, search, insert, delete","text":"Access: O(1)
Search: O(n)
Insert: O(n)
Delete: O(n)
"},{"location":"#b-tree-complexity-access-insert-delete","title":"B-tree complexity: access, insert, delete","text":"All: O(log n)
"},{"location":"#bfs-and-dfs-graph-traversal-time-and-space-complexity","title":"BFS and DFS graph traversal time and space complexity","text":"Time: O(v + e) with v the number of vertices and e the number of edges
Space: O(v)
"},{"location":"#bfs-and-dfs-tree-traversal-time-and-space-complexity","title":"BFS and DFS tree traversal time and space complexity","text":"BFS: time O(v), space O(v)
DFS: time O(v), space O(h) (height of the tree)
"},{"location":"#big-o","title":"Big O","text":"Upper bound
"},{"location":"#big-omega","title":"Big Omega","text":"Lower bound (fastest)
"},{"location":"#big-theta","title":"Big Theta","text":"Theta(n) if both O(n) and Omega(n)
"},{"location":"#binary-heap-min-heap-or-max-heap-complexity-insert-get-min-max-delete-min-max","title":"Binary heap (min-heap or max-heap) complexity: insert, get min (max), delete min (max)","text":"Insert: O(log (n))
Get min (max): O(1)
Delete min: O(log n)
If not balanced O(n)
If balanced O(log n)
"},{"location":"#bst-delete-algo-and-complexity","title":"BST delete algo and complexity","text":"Find inorder successor and swap it
Average: O(log n)
Worst: O(h) if not self-balanced BST, otherwise O(log n)
"},{"location":"#bubble-sort-complexity-and-stability","title":"Bubble sort complexity and stability","text":"Time: O(n\u00b2)
Space: O(1)
Stable
"},{"location":"#complexity-of-a-function-making-multiple-recursive-subcalls","title":"Complexity of a function making multiple recursive subcalls","text":"Time: O(branches^depth) with branches the number of times each recursive call branches (english: 2 power 3)
Space: O(depth) to store the call stack
"},{"location":"#complexity-to-create-a-trie","title":"Complexity to create a trie","text":"Time and space: O(n * l) with n the number of words and l the longest word length
"},{"location":"#complexity-to-insert-a-key-in-a-trie","title":"Complexity to insert a key in a trie","text":"Time: O(k) with k the size of the key
Space: O(1) iterative, O(k) recursive
"},{"location":"#complexity-to-search-for-a-key-in-a-trie","title":"Complexity to search for a key in a trie","text":"Time: O(k) with k the size of the key
Space: O(1) iterative or O(k) recursive
"},{"location":"#counting-sort-complexity-stability-use-case","title":"Counting sort complexity, stability, use case","text":"Time complexity: O(n + k) // n is the number of elements, k is the range (the maximum element)
Space complexity: O(k)
Stable
Use case: known and small range of possible integers
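A minimal sketch for plain ints in 0..k (a stable variant for records would compute prefix sums over the counts instead):
int[] countingSort(int[] a, int k) {\n int[] count = new int[k + 1];\n for (int value : a) {\n count[value]++; // O(n) counting pass\n }\n int[] out = new int[a.length];\n int idx = 0;\n for (int value = 0; value <= k; value++) {\n for (int c = 0; c < count[value]; c++) {\n out[idx++] = value; // O(n + k) write-back pass\n }\n }\n return out;\n}\n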
"},{"location":"#doubly-linked-list-complexity-access-insert-delete","title":"Doubly linked list complexity: access, insert, delete","text":"Access: O(n)
Insert: O(1)
Delete: O(1)
"},{"location":"#hash-table-complexity-search-insert-delete","title":"Hash table complexity: search, insert, delete","text":"All: amortized O(1), worst O(n)
"},{"location":"#heapsort-complexity-stability-use-case","title":"Heapsort complexity, stability, use case","text":"Time: Theta(n log n)
Space: O(1)
Unstable
Use case: space constrained environment with O(n log n) time guarantee
Yet, not stable and not cache friendly
"},{"location":"#insertion-sort-complexity-stability-use-case","title":"Insertion sort complexity, stability, use case","text":"Time: O(n\u00b2)
Space: O(1)
Stable
Use case: partially sorted structure
"},{"location":"#linked-list-complexity-access-insert-delete","title":"Linked list complexity: access, insert, delete","text":"Access: O(n)
Insert: O(1)
Delete: O(1)
"},{"location":"#mergesort-complexity-stability-use-case","title":"Mergesort complexity, stability, use case","text":"Time: Theta(n log n)
Space: O(n)
Stable
Use case: good worst case time complexity and stable, good with linked list
"},{"location":"#quicksort-complexity-stability-use-case","title":"Quicksort complexity, stability, use case","text":"Time: best and average O(n log n), worst O(n\u00b2) if the array is already sorted in ascending or descending order
Space: O(log n) // In-place sorting algorithm
Not stable
Use case: in practice, quicksort is often faster than merge sort due to better locality (not applicable with linked list so in this case we prefer mergesort)
"},{"location":"#radix-sort-complexity-stability-use-case","title":"Radix sort complexity, stability, use case","text":"Time complexity: O(nk) // n is the number of elements, k is the maximum number of digits for a number
Space complexity: O(k)
Stable
Use case: if k < log(n) (for example 1M of elements from 0..1000 as 4 < log(1M))
"},{"location":"#recursivity-impacts-on-algorithm-complexity","title":"Recursivity impacts on algorithm complexity","text":"Space impact as each call is added to the call stack
Unless we use tail call recursion
"},{"location":"#red-black-tree-complexity-access-insert-delete","title":"Red-black tree complexity: access, insert, delete","text":"All: O(log n)
"},{"location":"#selection-sort-complexity","title":"Selection sort complexity","text":"Time: Theta(n\u00b2)
Space: O(1)
"},{"location":"#stack-implementations-and-insertdelete-complexity","title":"Stack implementations and insert/delete complexity","text":"Insert: O(1)
Delete: O(1)
Insert: O(n), amortized time O(1)
Delete: O(1)
"},{"location":"#time-complexity-to-build-a-binary-heap","title":"Time complexity to build a binary heap","text":"O(n)
Time and space: O(v + e)
"},{"location":"#dynamic-programming","title":"Dynamic Programming","text":""},{"location":"#dynamic-programming-concept","title":"Dynamic programming concept","text":"Break down a problem in smaller parts and store the results of these subproblems so that they only need to be computed once
A DP algorithm will search through all of the possible subproblems (main difference with greedy algorithms)
Based on either: - Memoization (top-down) - Tabulation (bottom-up)
"},{"location":"#memoization-vs-tabulation","title":"Memoization vs tabulation","text":"Optimization technique to cache previously computed results
Used by dynamic programming algorithms
Memoization: top-down (start with a large, complex problem and break it down into smaller sub-problems)
f(x) {\n if (x <= 1) return x // Base cases (Fibonacci), otherwise the recursion never terminates\n if (mem[x] is undefined)\n mem[x] = f(x-1) + f(x-2)\n return mem[x]\n}\n
Tabulation: bottom-up (start with the smallest solution and then build up each solution until we arrive at the solution to the initial problem)
tabFib(n) {\n mem[0] = 0\n mem[1] = 1\n for i = 2...n\n mem[i] = mem[i-2] + mem[i-1]\n return mem[n]\n}\n
"},{"location":"#encoding","title":"Encoding","text":""},{"location":"#ascii-charset","title":"ASCII charset","text":"128 characters
"},{"location":"#difference-encodingcharset","title":"Difference encoding/charset","text":"Charset: set of characters to be used (e.g. ASCII 128 characters)
Encoding: translation of a list of characters in binary
Encoding is used because we can't guarantee 1 character = 1 byte for every charset
Example: UTF-8 encodes Unicode characters using from 1 byte (English) up to 4 bytes
"},{"location":"#unicode-charset","title":"Unicode charset","text":"Superset of ASCII with 2^21 characters
"},{"location":"#general","title":"General","text":""},{"location":"#before-finding-a-solution","title":"Before finding a solution","text":"1) Make sure to understand the problem by listing: - Inputs - Outputs (what do we search) - Constraints
2) Draw examples
"},{"location":"#comparator-implementation-to-order-two-integers","title":"Comparator implementation to order two integers","text":"Ordering, min-heap: (a, b) -> a - b
Reverse ordering, max-heap: (a, b) -> b - a
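Usage sketch (note: (a, b) -> a - b can overflow on extreme values; Integer::compare is the safe equivalent):
PriorityQueue<Integer> minHeap = new PriorityQueue<>((a, b) -> a - b);\nPriorityQueue<Integer> maxHeap = new PriorityQueue<>((a, b) -> b - a);\nminHeap.add(3); minHeap.add(1); minHeap.add(2);\nSystem.out.println(minHeap.poll()); // 1\n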
7 ways: 1. a and b do not overlap 2. a and b overlap, b ends after a 3. a completely overlaps b 4. a and b overlap, a ends after b 5. b completely overlaps a 6. b and a do not overlap 7. a and b are equal
"},{"location":"#different-ways-for-two-intervals-to-relate-to-each-other-if-ordered-by-start-then-end","title":"Different ways for two intervals to relate to each other if ordered by start then end","text":"2 different ways: - No overlap - Overlap // Merge intervals (start of the first interval, max of the two ends)
"},{"location":"#divide-and-conquer-algorithm-paradigm","title":"Divide and conquer algorithm paradigm","text":"Example with merge sort: 1. Split the array into two halves 2. Sort them (recursive call) 3. Merge the two halves
"},{"location":"#how-to-name-a-matrix-indexes","title":"How to name a matrix indexes","text":"Use m[row][col] instead of m[y][x]
"},{"location":"#if-stucked-on-a-problem","title":"If stucked on a problem","text":"Mutates an input
"},{"location":"#p-vs-np-problems","title":"P vs NP problems","text":"P (polynomial): set of problems that can be solved reasonably fast (example: multiplication, sorting, etc.)
Complexity is not exponential
NP (non-deterministic polynomial): set of problems where, given a solution, we can test if it is a correct one in a reasonable amount of time, but finding the solution is not fast (example: a 1M*1M sudoku grid, the traveling salesman problem, etc.)
NP-complete: hardest problems in the NP set
There are other sets of problems that are neither P nor NP, as an answer is really hard to prove (example: best move in a chess game)
P = NP asks: does being able to quickly recognize correct answers mean there's also a quick way to find them?
"},{"location":"#solving-optimization-problems","title":"Solving optimization problems","text":"Preserve the original order of elements with equal key
"},{"location":"#what-do-to-after-having-designed-a-solution","title":"What do to after having designed a solution","text":"Testing on nominal cases then edge cases
Time and space complexity
"},{"location":"#graph","title":"Graph","text":""},{"location":"#a-algorithm","title":"A* algorithm","text":"Complete solution to find the shortest path to a target node
Algorithm: - Put the initial state in a priority queue - While the priority queue is not empty: poll an element and insert all its neighbours - If the target is reached, update a min variable
Priority is computed using the evaluation function: f(n) = h + g where h is a heuristic (local cost to visit a node) and g is the cost so far (length of the path so far)
"},{"location":"#backedge-definition","title":"Backedge definition","text":"An edge from a node to itself or to an ancestor
"},{"location":"#best-first-search-algorithm","title":"Best-first search algorithm","text":"Greedy solution (non-complete) to find the shortest path to a target node
Algorithm: - Put the initial state in a priority queue - While the target is not reached: poll an element and insert all its neighbours
Priority is computed using the evaluation function: f(n) = h where h is a heuristic (local cost to visit a node)
"},{"location":"#bfs-dfs-graph-traversal-use-cases","title":"BFS & DFS graph traversal use cases","text":"BFS: shortest path
DFS: does a path exist, does a cycle exist (memo: D for Does)
DFS stores a single path at a time, requires less memory than BFS (on average but same space complexity)
"},{"location":"#bfs-and-dfs-graph-traversal-time-and-space-complexity_1","title":"BFS and DFS graph traversal time and space complexity","text":"Time: O(v + e) with v the number of vertices and e the number of edges
Space: O(v)
"},{"location":"#bidirectional-search","title":"Bidirectional search","text":"Run two simultaneous BFS, one from the source, one from the target
Once their searches collide, we found a path
If the branching factor of a tree is b and the distance to the target vertex is d, then the normal BFS/DFS searching time complexity would be O(b^d)
Here it is O(b^(d/2))
"},{"location":"#connected-graph-definition","title":"Connected graph definition","text":"If there is a path between every pair of vertices, the graph is called connected
Otherwise, the graph consists of multiple isolated subgraphs
"},{"location":"#difference-best-first-search-and-a-algorithms","title":"Difference Best-first search and A* algorithms","text":"Best-first search is a greedy solution: not complete // a solution can be not optimal
A*: complete
"},{"location":"#dijkstra-algorithm","title":"Dijkstra algorithm","text":"Input: graph, initial vertex
Output: for each vertex: shortest path and previous node // The previous node is the one we are coming from in the shortest path. To find the shortest path between two nodes, we need to iterate backwards. Example: A -> C => E, D, A
Algorithm: - Init the shortest distance to MAX except for the initial node - Init a priority queue where the comparator will be on the total distance so far - Init a set to store all visited node - Add initial vertex to the priority queue - While queue is not empty: Poll a vertex (mark it visited) and check the total distance to each neighbour (current distance + distance so far), update shortest and previous arrays if smaller. If destination was unvisited, adds it to the queue
void dijkstra(GraphAjdacencyMatrix graph, int initial) {\n Set<Integer> visited = new HashSet<>();\n\n int n = graph.vertex;\n int[] shortest = new int[n];\n int[] previous = new int[n];\n for (int i = 0; i < n; i++) {\n if (i != initial) {\n shortest[i] = Integer.MAX_VALUE;\n }\n }\n\n // Entry: key=vertex, value=distance so far\n PriorityQueue<Entry> minHeap = new PriorityQueue<>((e1, e2) -> e1.value - e2.value);\n minHeap.add(new Entry(initial, 0));\n\n while (!minHeap.isEmpty()) {\n Entry current = minHeap.poll();\n int source = current.key;\n int distanceSoFar = current.value;\n\n // Get neighbours\n List<GraphAjdacencyMatrix.Edge> edges = graph.getEdge(source);\n\n for (GraphAjdacencyMatrix.Edge edge : edges) {\n // For each neighbour, check the total distance\n int distance = distanceSoFar + edge.distance;\n if (distance < shortest[edge.destination]) {\n shortest[edge.destination] = distance;\n previous[edge.destination] = source;\n }\n\n // Add the element in the queue if not visited\n if (!visited.contains(edge.destination)) {\n minHeap.add(new Entry(edge.destination, distance));\n }\n }\n\n visited.add(source);\n }\n\n print(shortest);\n print(previous);\n}\n
"},{"location":"#dynamic-connectivity-problem","title":"Dynamic connectivity problem","text":"Given a set of nodes and edges: are two nodes connected (directly or in-directly)?
Two methods: - union(2, 5) // connect object 2 with object 5 - connected(1 , 6) // is object 1 connected to object 6?
"},{"location":"#further-reading_1","title":"Further Reading","text":"Array of integer of size N initialized with their index (0: 0, 1: 1 etc.).
If two indexes have the same value, they belong to the same group.
Init: integer array of size N
Interpretation: id[i] is the parent of i; i is a root if id[i] == i
Modify quick-union to avoid tall trees
Keep track of the size of each tree (number of nodes): extra array size[i] to count number of objects in the tree rooted at i
O(n) extra space
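A sketch of weighted quick-union putting these pieces together (path compression left out for brevity):
class UnionFind {\n private final int[] id;\n private final int[] size;\n\n UnionFind(int n) {\n id = new int[n];\n size = new int[n];\n for (int i = 0; i < n; i++) { id[i] = i; size[i] = 1; }\n }\n\n private int root(int i) {\n while (id[i] != i) i = id[i]; // Follow parents up to the root\n return i;\n }\n\n boolean connected(int p, int q) { return root(p) == root(q); }\n\n void union(int p, int q) {\n int rp = root(p), rq = root(q);\n if (rp == rq) return;\n // Attach the smaller tree under the bigger one to avoid tall trees\n if (size[rp] < size[rq]) { id[rp] = rq; size[rq] += size[rp]; }\n else { id[rq] = rp; size[rp] += size[rq]; }\n }\n}\n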
Solution: topological sort
If there's a cycle in the relations, it means it is not possible to schedule all the tasks
There is a cycle if the produced sorted array size is different from n
"},{"location":"#graph-definition","title":"Graph definition","text":"A way to represent a network, or a collection of inteconnected objects
G = (V, E) with V a set of vertices (or nodes) and E a set of edges (or links)
"},{"location":"#graph-traversal-bfs","title":"Graph traversal: BFS","text":"Traverse broad into the graph by visiting the sibling/neighbor before children nodes (one level of children at a time)
Iterative using a queue
Algorithm: similar with tree except we need to mark the visited nodes, can start with any nodes
Queue<Node> queue = new LinkedList<>();\nNode first = graph.nodes.get(0);\nqueue.add(first);\nfirst.markVisited();\n\nwhile (!queue.isEmpty()) {\n Node node = queue.poll();\n System.out.println(node.name);\n\n for (Edge edge : node.connections) {\n if (!edge.end.visited) {\n queue.add(edge.end);\n edge.end.markVisited();\n }\n }\n}\n
"},{"location":"#graph-traversal-dfs","title":"Graph traversal: DFS","text":"Traverse deep into the graph by visiting the children before sibling/neighbor nodes (traverse down one single path)
Walk through a path, backtrack until we found a new path
Algorithm: recursive, or iterative using a stack (same algo as BFS except we use a stack instead of a queue)
"},{"location":"#how-to-compute-the-shortest-path-between-two-nodes-in-an-unweighted-graph","title":"How to compute the shortest path between two nodes in an unweighted graph","text":"BFS traversal by using an array to keep track of the min distance distances[i] gives the shortest distance between the input node and the node of id i
Algorithm: no need to keep track of the visited node, it is replaced by a test on the distance array
Queue<Node> queue = new LinkedList<>();\nqueue.add(parent);\nint[] distances = new int[graph.nodes.size()];\nArrays.fill(distances, -1);\ndistances[parent.id] = 0;\n\nwhile (!queue.isEmpty()) {\n Node node = queue.poll();\n for (Edge edge : node.connections) {\n if (distances[edge.end.id] == -1) {\n queue.add(edge.end);\n distances[edge.end.id] = distances[node.id] + 1;\n }\n }\n}\n
"},{"location":"#how-to-detect-a-cycle-in-a-directed-graph","title":"How to detect a cycle in a directed graph","text":"Using DFS by marking the visited nodes, there is a cycle if a visited node is also part of the current stack
The stack can be managed as a boolean array
boolean isCyclic(DirectedGraph g) {\n boolean[] visited = new boolean[g.size()];\n boolean[] stack = new boolean[g.size()];\n\n for (int i = 0; i < g.size(); i++) {\n if (isCyclic(g, i, visited, stack)) {\n return true;\n }\n }\n return false;\n}\n\nboolean isCyclic(DirectedGraph g, int node, boolean[] visited, boolean[] stack) {\n if (stack[node]) {\n return true;\n }\n\n if (visited[node]) {\n return false;\n }\n\n stack[node] = true;\n visited[node] = true;\n\n List<DirectedGraph.Edge> edges = g.getEdges(node);\n for (DirectedGraph.Edge edge : edges) {\n int destination = edge.destination;\n if (isCyclic(g, destination, visited, stack)) {\n return true;\n }\n }\n\n // Backtrack\n stack[node] = false;\n\n return false;\n}\n
"},{"location":"#how-to-detect-a-cycle-in-an-undirected-graph","title":"How to detect a cycle in an undirected graph","text":"Using DFS
Idea: for every visited vertex v, if there is an adjacent u such that u is already visited and u is not the parent of v, then there is a cycle
public boolean isCyclic(UndirectedGraph g) {\n boolean[] visited = new boolean[g.size()];\n for (int i = 0; i < g.size(); i++) {\n if (!visited[i]) {\n if (isCyclic(g, i, visited, -1)) {\n return true;\n }\n }\n }\n return false;\n}\n\nprivate boolean isCyclic(UndirectedGraph g, int v, boolean[] visited, int parent) {\n visited[v] = true;\n\n List<UndirectedGraph.Edge> edges = g.getEdges(v);\n for (UndirectedGraph.Edge edge : edges) {\n if (!visited[edge.destination]) {\n if (isCyclic(g, edge.destination, visited, v)) {\n return true;\n }\n } else if (edge.destination != parent) {\n return true;\n }\n }\n return false;\n}\n
"},{"location":"#how-to-name-a-graph-with-directed-edges-and-without-cycle","title":"How to name a graph with directed edges and without cycle","text":"Directed Acyclic Graph (DAG)
"},{"location":"#how-to-name-a-graph-with-few-edges-and-with-many-edges","title":"How to name a graph with few edges and with many edges","text":"Sparse: few edges
Dense: many edges
"},{"location":"#how-to-name-the-number-of-edges","title":"How to name the number of edges","text":"Degree of a vertex
"},{"location":"#how-to-represent-the-edges-of-a-graph-structure-and-complexity","title":"How to represent the edges of a graph (structure and complexity)","text":"Using an adjacency matrix: two-dimensional array of boolean with a[i][j] is true if there is an edge between node i and j
Time complexity: O(1)
Problem: - If the graph is undirected: half of the space is useless - If the graph is sparse, we still have to consume O(v²) space
Using an adjacency list: array (or map) of linked list with a[i] represents the edges for the node i
Time complexity: O(d) with d the degree of a vertex
Time and space: O(v + e)
"},{"location":"#topological-sort-technique","title":"Topological sort technique","text":"If there is an edge from U to V, then U <= V
Possible only if the graph is a DAG
Algo: - Create a graph representation (adjacency list) and an in degree counter (Map) - Zero them for each vertex - Fill the adjacency list and the in degree counter for each edge - Add in a queue each vertex whose in degree count is 0 (source vertex with no parent) - While the queue is not empty, poll a vertex from it then decrement the in degree of its children (no removal)
To check if there is a cycle, we must compare the size of the produced array to the number of vertices
List<Integer> sort(int vertices, int[][] edges) {\n if (vertices == 0) {\n return Collections.EMPTY_LIST;\n }\n\n List<Integer> sorted = new ArrayList<>(vertices);\n // Adjacency list graph\n Map<Integer, List<Integer>> graph = new HashMap<>();\n // Count of incoming edges for each vertex\n Map<Integer, Integer> inDegree = new HashMap<>();\n\n for (int i = 0; i < vertices; i++) {\n inDegree.put(i, 0);\n graph.put(i, new LinkedList<>());\n }\n\n // Init graph and inDegree\n for (int[] edge : edges) {\n int parent = edge[0];\n int child = edge[1];\n\n graph.get(parent).add(child);\n inDegree.put(child, inDegree.get(child) + 1);\n }\n\n // Create a source queue and add each source (a vertex whose inDegree count is 0)\n Queue<Integer> sources = new LinkedList<>();\n for (Map.Entry<Integer, Integer> entry : inDegree.entrySet()) {\n if (entry.getValue() == 0) {\n sources.add(entry.getKey());\n }\n }\n\n while (!sources.isEmpty()) {\n int vertex = sources.poll();\n sorted.add(vertex);\n\n // For each vertex, we will decrease the inDegree count of its children\n List<Integer> children = graph.get(vertex);\n for (int child : children) {\n inDegree.put(child, inDegree.get(child) - 1);\n if (inDegree.get(child) == 0) {\n sources.add(child);\n }\n }\n }\n\n // Topological sort is not possible as the graph has a cycle\n if (sorted.size() != vertices) {\n return new ArrayList<>();\n }\n\n return sorted;\n}\n
"},{"location":"#travelling-salesman-problem","title":"Travelling salesman problem","text":"Find the shortest possible route that visits every city (vertex) exactly once
Possible solutions: - Greedy: nearest neighbour - Dynamic programming: compute optimal solution for a path of length n by using information already known for partial tours of length n-1 (time complexity: n^2 * 2^n)
"},{"location":"#two-types-of-graphs","title":"Two types of graphs","text":"Directed graph (with directed edges)
Undirected graph (with undirected edges)
"},{"location":"#greedy","title":"Greedy","text":""},{"location":"#best-first-search-algorithm_1","title":"Best-first search algorithm","text":"Greedy solution (non-complete) to find the shortest path to a target node
Algorithm: - Put the initial state in a priority queue - While the target is not reached: poll an element and insert all its neighbours
Priority is computed using the evaluation function: f(n) = h(n) where h is a heuristic (local cost to visit a node)
"},{"location":"#greedy-algorithm","title":"Greedy algorithm","text":"Algorithm paradigm of making the locally optimal choice at each stage using a heuristic function
Making the locally optimal choice does not necessarily mean we cannot rely on a global context to take a decision
Never reconsider a choice (main difference with dynamic programming)
Solution found may not be the most optimal one
"},{"location":"#greedy-algorithm-structure","title":"Greedy algorithm: structure","text":"Often, the global context is spread into a priority queue
"},{"location":"#greedy-technique","title":"Greedy technique","text":"Identify an optimal subproblem or substructure in the problem and determine how to reach it
Focus on what you have now (don't think about what comes next)
We may want to apply the traversal technique to have a global context for the identification part (a map of letters/positions etc.)
"},{"location":"#technique-optimization-problems-requiring-a-min-or-max","title":"Technique - Optimization problems requiring a min or max","text":"Greedy technique
"},{"location":"#hash-table","title":"Hash Table","text":""},{"location":"#hash-table-complexity-search-insert-delete_1","title":"Hash table complexity: search, insert, delete","text":"All: amortized O(1), worst O(n)
"},{"location":"#hash-table-implementation","title":"Hash table implementation","text":"Resize the array when a threshold is reached
If extreme nonuniform distribution, could be replaced by array of BST
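A minimal separate-chaining sketch (simplified; as a reference point, Java's HashMap also resizes on a load factor threshold and converts long buckets into balanced trees):
class ChainedHashMap<K, V> {\n private static class Entry<K, V> {\n final K key;\n V value;\n Entry(K key, V value) { this.key = key; this.value = value; }\n }\n\n private List<Entry<K, V>>[] buckets = newBuckets(16);\n private int size;\n\n public V get(K key) {\n for (Entry<K, V> e : buckets[index(key, buckets.length)]) {\n if (e.key.equals(key)) {\n return e.value;\n }\n }\n return null;\n }\n\n public void put(K key, V value) {\n // Resize the array when the load factor threshold is reached\n if (size >= 0.75 * buckets.length) {\n resize(buckets.length * 2);\n }\n for (Entry<K, V> e : buckets[index(key, buckets.length)]) {\n if (e.key.equals(key)) {\n e.value = value;\n return;\n }\n }\n buckets[index(key, buckets.length)].add(new Entry<>(key, value));\n size++;\n }\n\n private void resize(int capacity) {\n List<Entry<K, V>>[] old = buckets;\n buckets = newBuckets(capacity);\n for (List<Entry<K, V>> bucket : old) {\n for (Entry<K, V> e : bucket) {\n buckets[index(e.key, capacity)].add(e);\n }\n }\n }\n\n private int index(K key, int capacity) {\n return (key.hashCode() & 0x7fffffff) % capacity;\n }\n\n private static <K, V> List<Entry<K, V>>[] newBuckets(int capacity) {\n List<Entry<K, V>>[] b = new List[capacity];\n for (int i = 0; i < capacity; i++) {\n b[i] = new LinkedList<>();\n }\n return b;\n }\n}\n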
"},{"location":"#heap","title":"Heap","text":""},{"location":"#binary-heap-min-heap-or-max-heap-complexity-insert-get-min-max-delete-min-max_1","title":"Binary heap (min-heap or max-heap) complexity: insert, get min (max), delete min (max)","text":"Insert: O(log (n))
Get min (max): O(1)
Delete min: O(log n)
"},{"location":"#binary-heap-min-heap-or-max-heap-data-structure-used-for-the-implementation","title":"Binary heap (min-heap or max-heap) data structure used for the implementation","text":"Using an array
For a node at index i: - Left child: 2 * i + 1 - Right child: 2 * i + 2 - Parent: (i - 1) / 2
"},{"location":"#binary-heap-min-heap-or-max-heap-definition","title":"Binary heap (min-heap or max-heap) definition","text":"A binary heap is a a complete binary tree with min-heap or max-heap property ordering. Also called min heap or max heap.
Min heap: each node smaller than its children, min value element at the root.
Two operations: insert(), getMin()
Difference with a BST: in a BST, smaller elements are on the left and greater elements on the right; in a heap, a smaller element can be found on either the left or the right side.
"},{"location":"#binary-heap-min-heap-or-max-heap-delete-min","title":"Binary heap (min-heap or max-heap) delete min","text":"Replace min element (root) with the last node (left-most, lowest-level node because a binary heap is a complete binary tree)
If violations, swap with the smallest child (level by level)
"},{"location":"#binary-heap-min-heap-or-max-heap-insert-algorithm","title":"Binary heap (min-heap or max-heap) insert algorithm","text":"Insert node at the end (left-most spot because a binary heap is a complete binary tree)
If violations, swap with parents until no more violation
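A minimal array-based min-heap sketch of the two operations above (simplified: fixed capacity, deleteMin assumes a non-empty heap):
class MinHeap {\n private final int[] a = new int[64]; // Simplified: fixed capacity\n private int size;\n\n public void insert(int value) {\n // Insert at the end then swap with the parent until no more violation\n a[size] = value;\n int i = size++;\n while (i > 0 && a[i] < a[(i - 1) / 2]) {\n swap(i, (i - 1) / 2);\n i = (i - 1) / 2;\n }\n }\n\n public int deleteMin() {\n int min = a[0];\n // Replace the root with the last node then swap with the smallest child\n a[0] = a[--size];\n int i = 0;\n while (true) {\n int smallest = i;\n int left = 2 * i + 1;\n int right = 2 * i + 2;\n if (left < size && a[left] < a[smallest]) {\n smallest = left;\n }\n if (right < size && a[right] < a[smallest]) {\n smallest = right;\n }\n if (smallest == i) {\n return min;\n }\n swap(i, smallest);\n i = smallest;\n }\n }\n\n private void swap(int i, int j) {\n int tmp = a[i];\n a[i] = a[j];\n a[j] = tmp;\n }\n}\n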
"},{"location":"#binary-heap-min-heap-or-max-heap-use-cases","title":"Binary heap (min-heap or max-heap) use-cases","text":"Priority queue
"},{"location":"#comparator-implementation-to-order-two-integers_1","title":"Comparator implementation to order two integers","text":"Ordering, min-heap: (a, b) -> a - b
Reverse ordering, max-heap: (a, b) -> b - a
"},{"location":"#convert-an-array-into-a-binary-heap-in-place","title":"Convert an array into a binary heap in place","text":"For i from 0 to n-1, swap recursively element a[i] until min/max heap violation on its node
"},{"location":"#find-the-median-of-a-stream-of-numbers-2-methods-insertint-and-int-findmedian","title":"Find the median of a stream of numbers, 2 methods insert(int) and int findMedian()","text":"Solution: two heap technique
Keep two heaps and maintain the balance by transferring an element from one heap to another if not balanced
Return the median (difference if even or odd)
// First half\nPriorityQueue<Integer> maxHeap = new PriorityQueue<>((a, b) -> b - a);\n// Second half\nPriorityQueue<Integer> minHeap = new PriorityQueue<>();\n\npublic void insertNum(int n) {\n // First element\n if (minHeap.isEmpty()) {\n minHeap.add(n);\n return;\n }\n\n // Insert into min or max heap\n Integer minSecondHalf = minHeap.peek();\n if (n >= minSecondHalf) {\n minHeap.add(n);\n } else {\n maxHeap.add(n);\n }\n\n // Is balanced?\n if (minHeap.size() > maxHeap.size() + 1) {\n maxHeap.add(minHeap.poll());\n } else if (maxHeap.size() > minHeap.size() + 1) {\n minHeap.add(maxHeap.poll());\n }\n}\n\npublic double findMedian() {\n // Even\n if (minHeap.size() == maxHeap.size()) {\n return (double) (minHeap.peek() + maxHeap.peek()) / 2;\n }\n\n // Odd\n if (minHeap.size() > maxHeap.size()) {\n return minHeap.peek();\n }\n return maxHeap.peek();\n}\n
"},{"location":"#given-an-unsorted-array-of-numbers-find-the-k-largest-numbers-in-it","title":"Given an unsorted array of numbers, find the K largest numbers in it","text":"Solution: using a min heap but we keep only K elements in it
public static List<Integer> findKLargestNumbers(int[] nums, int k) {\n PriorityQueue<Integer> minHeap = new PriorityQueue<>();\n\n // Put the first K numbers\n for (int i = 0; i < k; i++) {\n minHeap.add(nums[i]);\n }\n\n // Iterate on the rest of the array\n // Check whether the current element is bigger than the smallest one\n for (int i = k; i < nums.length; i++) {\n if (nums[i] > minHeap.peek()) {\n minHeap.poll();\n minHeap.add(nums[i]);\n }\n }\n\n return toList(minHeap);\n}\n\npublic static List<Integer> toList(PriorityQueue<Integer> minHeap) {\n List<Integer> list = new ArrayList<>(minHeap.size());\n while (!minHeap.isEmpty()) {\n list.add(minHeap.poll());\n }\n\n return list;\n}\n
Time complexity: O(n log k)
Space complexity: O(k)
"},{"location":"#heapsort-algorithm","title":"Heapsort algorithm","text":"Stable
"},{"location":"#time-complexity-to-build-a-binary-heap_1","title":"Time complexity to build a binary heap","text":"O(n)
"},{"location":"#two-heaps-technique","title":"Two heaps technique","text":"Keep two heaps: - A max heap for the first half - Then a min heap for the second half
May be required to balance them to have at most a difference in terms of size of 1
"},{"location":"#why-binary-heap-over-bst-for-priority-queue","title":"Why binary heap over BST for priority queue?","text":"BST needs an extra pointer to the min or max value (otherwise finding the min or max is O(log n))
Implemented using an array: faster in practice (better locality, more cache friendly)
Building a binary heap is O(n), instead of O(n log n) for a BST
"},{"location":"#linked-list","title":"Linked List","text":""},{"location":"#algorithm-to-reverse-a-linked-list","title":"Algorithm to reverse a linked list","text":"public ListNode reverse(ListNode head) {\n ListNode previous = null;\n ListNode current = head;\n\n while (current != null) {\n // Keep temporary next node\n ListNode next = current.next;\n // Change link\n current.next = previous;\n // Move previous and current\n previous = current;\n current = next;\n }\n\n return previous;\n}\n
"},{"location":"#doubly-linked-list","title":"Doubly linked list","text":"Each node contains a pointer to the previous and the next node
"},{"location":"#doubly-linked-list-complexity-access-insert-delete_1","title":"Doubly linked list complexity: access, insert, delete","text":"Access: O(n)
Insert: O(1)
Delete: O(1)
"},{"location":"#get-the-middle-of-a-linked-list","title":"Get the middle of a linked list","text":"Using the runner technique
"},{"location":"#iterate-over-two-linked-lists","title":"Iterate over two linked lists","text":"while (l1 != null || l2 != null) {\n\n}\n
"},{"location":"#linked-list-complexity-access-insert-delete_1","title":"Linked list complexity: access, insert, delete","text":"Access: O(n)
Insert: O(1)
Delete: O(1)
"},{"location":"#linked-list-questions-prerequisite","title":"Linked list questions prerequisite","text":"Single or doubly linked list?
"},{"location":"#queue-implementations-and-insertdelete-complexity","title":"Queue implementations and insert/delete complexity","text":"Insert: O(1)
Delete: O(1)
Insert: O(1)
Delete: O(1)
"},{"location":"#ring-buffer-or-circular-buffer-structure","title":"Ring buffer (or circular buffer) structure","text":"Data structure using a single, fixed-sized buffer as if it were connected end-to-end
"},{"location":"#what-if-we-need-to-iterate-backwards-on-a-singly-linked-list-in-constant-space-without-mutating-the-input","title":"What if we need to iterate backwards on a singly linked list in constant space without mutating the input?","text":"Reverse the liked list (or a subpart only), implement the algo then reverse it again to the initial state
"},{"location":"#math","title":"Math","text":""},{"location":"#a-a-property","title":"a = a property","text":"Reflexive
"},{"location":"#if-a-b-and-b-c-then-a-c-property","title":"If a = b and b = c then a = c property","text":"Transitive
"},{"location":"#if-a-b-then-b-a-property","title":"If a = b then b = a property","text":"Symmetric
"},{"location":"#logarithm-definition","title":"Logarithm definition","text":"Inverse function to exponentiation
"},{"location":"#median-definition","title":"Median definition","text":"If odd: middle value
If even: average of the two middle values (1, 2, 3, 4 => (2 + 3) / 2 = 2.5)
"},{"location":"#n-choose-k-problems","title":"n-choose-k problems","text":"From a set of n items, choose k items with 0 <= k <= n
Order matters: P(n, k) = n! / (n - k)! // How many permutations
Order does not matter: C(n, k) = n! / ((n - k)! k!) // How many combinations
Example: n = 4, k = 2 => P(4, 2) = 12 permutations, C(4, 2) = 6 combinations
"},{"location":"#probability-pa-b-inter","title":"Probability: P(a \u2229 b) // inter","text":"P(a \u2229 b) = P(a) * P(b)
"},{"location":"#probability-pa-b-union","title":"Probability: P(a \u222a b) // union","text":"P(a \u222a b) = P(a) + P(b) - P(a \u2229 b)
"},{"location":"#probability-pba-probability-of-a-knowing-b","title":"Probability: Pb(a) // probability of a knowing b","text":"Pb(a) = P(a \u2229 b) / P(b)
"},{"location":"#queue","title":"Queue","text":""},{"location":"#dequeue-data-structure","title":"Dequeue data structure","text":"Double ended queue for which elements can be added or removed from either the front (head) or the back (tail)
"},{"location":"#queue_1","title":"Queue","text":"FIFO (First In First Out)
"},{"location":"#queue-implementations-and-insertdelete-complexity_1","title":"Queue implementations and insert/delete complexity","text":"Insert: O(1)
Delete: O(1)
Insert: O(1)
Delete: O(1)
"},{"location":"#recursion","title":"Recursion","text":""},{"location":"#how-to-handle-a-recursive-function-that-need-to-return-a-list","title":"How to handle a recursive function that need to return a list","text":"Input: - Result List - Current iteration element
Output: void
void f(List<String> result, String current) {\n // Do something\n result.add(...);\n}\n
"},{"location":"#how-to-handle-a-recursive-function-that-need-to-return-a-maximum-value","title":"How to handle a recursive function that need to return a maximum value","text":"Implementation: return max(f(a), f(b))
"},{"location":"#loop-inside-of-a-recursive-function","title":"Loop inside of a recursive function?","text":"Might be a code smell. The iteration is already brought by the recursion itself.
"},{"location":"#sort","title":"Sort","text":""},{"location":"#bubble-sort-algorithm","title":"Bubble sort algorithm","text":"Walk through a collection and compares 2 elements at a time
If they are out of order, swap them
Continue until the entire collection is sorted
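A minimal sketch (assuming the same swap helper as the other snippets); the early exit stops as soon as a pass performs no swap:
void bubbleSort(int[] a) {\n for (int i = 0; i < a.length - 1; i++) {\n boolean swapped = false;\n // After each pass, the biggest remaining element is bubbled up to the end\n for (int j = 0; j < a.length - 1 - i; j++) {\n if (a[j] > a[j + 1]) {\n swap(a, j, j + 1);\n swapped = true;\n }\n }\n if (!swapped) {\n return; // Collection already sorted\n }\n }\n}\n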
"},{"location":"#bubble-sort-complexity-and-stability_1","title":"Bubble sort complexity and stability","text":"Time: O(n\u00b2)
Space: O(1)
Stable
"},{"location":"#counting-sort-complexity-stability-use-case_1","title":"Counting sort complexity, stability, use case","text":"Time complexity: O(n + k) // n is the number of elements, k is the range (the maximum element)
Space complexity: O(k)
Stable
Use case: known and small range of possible integers
"},{"location":"#counting-sort-algorithm","title":"Counting sort algorithm","text":"If range r is known
1) Create an array of size r where each a[i] represents the number of occurrences of i
2) Modify the array to store the cumulative sum (if a=[1, 3, 0, 2] => [1, 4, 4, 6])
3) Right shift the array with a backward iteration (element at index 0 is 0 => [0, 1, 4, 4]) Now a[i] represents the first index of i if array was sorted
4) Create the sorted array by filling the elements from their first index
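A minimal sketch of the four steps (assuming elements in the 0..r-1 range):
int[] countingSort(int[] a, int r) {\n // 1) Count the occurrences of each value\n int[] count = new int[r];\n for (int n : a) {\n count[n]++;\n }\n // 2) Cumulative sum\n for (int i = 1; i < r; i++) {\n count[i] += count[i - 1];\n }\n // 3) Right shift: count[i] is now the first index of i in the sorted array\n for (int i = r - 1; i > 0; i--) {\n count[i] = count[i - 1];\n }\n count[0] = 0;\n // 4) Fill the sorted array from the first indexes (iterating in order keeps it stable)\n int[] sorted = new int[a.length];\n for (int n : a) {\n sorted[count[n]++] = n;\n }\n return sorted;\n}\n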
"},{"location":"#heapsort-algorithm_1","title":"Heapsort algorithm","text":"Time: Theta(n log n)
Space: O(1)
Unstable
Use case: space constrained environment with O(n log n) time guarantee
Yet, not stable and not cache friendly
"},{"location":"#insertion-sort-algorithm","title":"Insertion sort algorithm","text":"From i to 0..n, insert a[i] to its correct position to the left (0..i)
Used by humans
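A minimal sketch; the sorted prefix grows one element at a time:
void insertionSort(int[] a) {\n for (int i = 1; i < a.length; i++) {\n int current = a[i];\n int j = i - 1;\n // Shift the bigger elements of the sorted prefix to the right\n while (j >= 0 && a[j] > current) {\n a[j + 1] = a[j];\n j--;\n }\n a[j + 1] = current;\n }\n}\n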
"},{"location":"#insertion-sort-complexity-stability-use-case_1","title":"Insertion sort complexity, stability, use case","text":"Time: O(n\u00b2)
Space: O(1)
Stable
Use case: partially sorted structure
"},{"location":"#mergesort-algorithm","title":"Mergesort algorithm","text":"Splits a collection into 2 halves, sort the 2 halves (recursive call) then merge them together to form one sorted collection
void mergeSort(int[] a) {\n int[] helper = new int[a.length];\n mergeSort(a, helper, 0, a.length - 1);\n}\n\nvoid mergeSort(int a[], int helper[], int lo, int hi) {\n if (lo < hi) {\n int mid = lo + ((hi - lo) / 2);\n\n mergeSort(a, helper, lo, mid);\n mergeSort(a, helper, mid + 1, hi);\n merge(a, helper, lo, mid, hi);\n }\n}\n\nprivate void merge(int[] a, int[] helper, int lo, int mid, int hi) {\n // Copy into helper\n for (int i = lo; i <= hi; i++) {\n helper[i] = a[i];\n }\n\n int p1 = lo; // Pointer on the first half\n int p2 = mid + 1; // Pointer on the second half\n int index = lo; // Index of a\n\n // Copy the smallest values from either the left or the right side back to the original array\n while (p1 <= mid && p2 <= hi) {\n if (helper[p1] <= helper[p2]) {\n a[index] = helper[p1];\n p1++;\n } else {\n a[index] = helper[p2];\n p2++;\n }\n index++;\n }\n\n // Copy the remaining elements of the left side (the rest of the right side is already in place)\n while (p1 <= mid) {\n a[index] = helper[p1];\n index++;\n p1++;\n }\n}\n
"},{"location":"#further-reading_2","title":"Further Reading","text":"Time: Theta(n log n)
Space: O(n)
Stable
Use case: good worst case time complexity and stable, good with linked list
"},{"location":"#quicksort-algorithm","title":"Quicksort algorithm","text":"Sort a collection by repeatedly choosing a pivot and partitioning the collection around it (smaller before, larger after)
Here the pivot will be the last element of the subarray
In an ideal world, the pivot would be the middle element so that we partition the array in two subsets of equal size
The worst case is when the chosen pivot is always the smallest or largest element of the subarray
void quickSort(int[] a) {\n quickSort(a, 0, a.length - 1);\n}\n\nvoid quickSort(int a[], int lo, int hi) {\n if (lo < hi) {\n int pivot = partition(a, lo, hi);\n quickSort(a, lo, pivot - 1);\n quickSort(a, pivot + 1, hi);\n }\n}\n\n// Returns an index so that all element before that index are smaller\n// And all element after are bigger\nint partition(int a[], int lo, int hi) {\n int pivot = a[hi];\n int pivotIndex = lo; // Will represent the pivot index\n\n // Iterate using the two pointers technique\n for (int i = lo; i < hi; i++) {\n // If the current index is smaller, swap and increment pivot index\n if (a[i] <= pivot) {\n swap(a, pivotIndex++, i);\n }\n }\n\n swap(a, pivotIndex, hi);\n return pivotIndex;\n}\n
"},{"location":"#quicksort-complexity-stability-use-case_1","title":"Quicksort complexity, stability, use case","text":"Time: best and average O(n log n), worst O(n\u00b2) if the array is already sorted in ascending or descending order
Space: O(log n) // In-place sorting algorithm
Not stable
Use case: in practice, quicksort is often faster than merge sort due to better locality (not applicable with linked list so in this case we prefer mergesort)
"},{"location":"#radix-sort-algorithm","title":"Radix sort algorithm","text":"Sort by applying counting sort on one digit at a time (least to most significant) Each new level must be stable (if equals, keep the order of the previous level)
Example:
Time complexity: O(nk) // n is the number of elements, k is the maximum number of digits for a number
Space complexity: O(k)
Stable
Use case: if k < log(n) (for example 1M of elements from 0..1000 as 4 < log(1M))
"},{"location":"#selection-sort-algorithm","title":"Selection sort algorithm","text":"From i to 0..n, find repeatedly the min element then swap it with i
"},{"location":"#selection-sort-complexity_1","title":"Selection sort complexity","text":"Time: Theta(n\u00b2)
Space: O(1)
"},{"location":"#shuffling-an-array","title":"Shuffling an array","text":"Fisher-Yates shuffle algorithm: - Iterate over each element (i) - Pick a random index (from 0 to i included) and swap with the current element
"},{"location":"#stack","title":"Stack","text":""},{"location":"#stack_1","title":"Stack","text":"LIFO (Last In First Out)
"},{"location":"#stack-implementations-and-insertdelete-complexity_1","title":"Stack implementations and insert/delete complexity","text":"Insert: O(1)
Delete: O(1)
Insert: O(n), amortized time O(1)
Delete: O(1)
"},{"location":"#string","title":"String","text":""},{"location":"#first-check-to-test-if-two-strings-are-a-permutation-or-a-rotation-of-each-other","title":"First check to test if two strings are a permutation or a rotation of each other","text":"Same length
"},{"location":"#how-to-print-all-the-possible-permutations-of-a-string","title":"How to print all the possible permutations of a string","text":"Recursion with backtracking
void permute(String s) {\n permute(s, 0);\n}\n\nvoid permute(String s, int index) {\n if (index == s.length() - 1) {\n System.out.println(s);\n return;\n }\n\n for (int i = index; i < s.length(); i++) {\n s = swap(s, index, i);\n permute(s, index + 1);\n s = swap(s, index, i);\n }\n}\n
"},{"location":"#rabin-karp-substring-search","title":"Rabin-Karp substring search","text":"Searching a substring s in a string b takes O(s(b-s)) time
Trick: compute the hash of each substring s
Sliding window of size s
Time complexity: O(b)
If hash matches, check if the string are equals (as two different strings can have the same hash)
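A minimal rolling hash sketch (the base and modulus values are arbitrary choices):
int indexOf(String s, String b) {\n int sLen = s.length(), bLen = b.length();\n if (sLen > bLen) {\n return -1;\n }\n final int base = 256;\n final int mod = 1_000_000_007;\n // base^(sLen - 1) % mod, used to remove the leading character\n long pow = 1;\n for (int i = 0; i < sLen - 1; i++) {\n pow = (pow * base) % mod;\n }\n long target = 0, hash = 0;\n for (int i = 0; i < sLen; i++) {\n target = (target * base + s.charAt(i)) % mod;\n hash = (hash * base + b.charAt(i)) % mod;\n }\n for (int i = 0; ; i++) {\n // If the hash matches, check if the strings are equal (possible collision)\n if (hash == target && b.regionMatches(i, s, 0, sLen)) {\n return i;\n }\n if (i + sLen == bLen) {\n return -1;\n }\n // Slide the window: remove b[i], add b[i + sLen]\n hash = (hash - b.charAt(i) * pow % mod + mod) % mod;\n hash = (hash * base + b.charAt(i + sLen)) % mod;\n }\n}\n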
"},{"location":"#string-permutation-vs-rotation","title":"String permutation vs rotation","text":"Permutation: contains the same characters in an order that can be different (abdc and dabc)
Rotation: rotates according to a pivot
"},{"location":"#string-questions-prerequisite","title":"String questions prerequisite","text":"Case sensitive?
Encoding?
"},{"location":"#technique","title":"Technique","text":"14 Patterns to Ace Any Coding Interview Question by Fahim ul Haq
"},{"location":"#01-knapsack-brute-force-technique","title":"0/1 Knapsack brute force technique","text":"Recursive approach: solve f(c, i) with c is the remaining capacity and i is th current item index At each level, we branch with the item at index i (if enough capacity) and without it
public int knapsack(int[] profits, int[] weights, int c) {\n return knapsack(profits, weights, c, 0, 0);\n}\n\npublic int knapsack(int[] profits, int[] weights, int c, int i, int sum) {\n if (i == profits.length || c <= 0) {\n return sum;\n }\n\n // Without the current item\n int sum1 = knapsack(profits, weights, c, i + 1, sum);\n\n // With the current item (if it fits)\n int sum2 = 0;\n if (weights[i] <= c) {\n sum2 = knapsack(profits, weights, c - weights[i], i + 1, sum + profits[i]);\n }\n\n return Math.max(sum1, sum2);\n}\n
"},{"location":"#01-knapsack-memoization-technique","title":"0/1 Knapsack memoization technique","text":"Memoization: store a[c][i] (c is the remaining capacity, i is the current item index)
As we need to store the 0 capacity, we have to init the array this way:
int[][] a = new int[c + 1][n] // n is the number of items
Time and space complexity: O(n * c)
public int knapsack(int[] profits, int[] weights, int capacity) {\n // Capacity from 0 to capacity included\n Integer[][] a = new Integer[capacity + 1][profits.length];\n return knapsack(profits, weights, capacity, 0, a);\n}\n\npublic int knapsack(int[] profits, int[] weights, int capacity, int i, Integer[][] a) {\n if (i == profits.length || capacity == 0) {\n return 0;\n }\n\n // If value already exists, return\n if (a[capacity][i] != null) {\n return a[capacity][i];\n }\n\n // Without the current item\n int sum1 = knapsack(profits, weights, capacity, i + 1, a);\n // With the current item (if it fits)\n int sum2 = 0;\n if (weights[i] <= capacity) {\n sum2 = profits[i] + knapsack(profits, weights, capacity - weights[i], i + 1, a);\n }\n\n a[capacity][i] = Math.max(sum1, sum2);\n return a[capacity][i];\n}\n
"},{"location":"#01-knapsack-tabulation-technique","title":"0/1 Knapsack tabulation technique","text":"Two dimensional array: a[n + 1][c + 1] // n the number of items and c the max capacity
First row and first column are set to 0
a[row][col] represent the max profit with items 1..row at capacity col
remainingWeight = col - itemWeight // col: current max capacity
a[row][col] = max(a[row - 1][col], itemValue + a[row - 1][remainingWeight]) // max between item not selected and item selected + max remaining weight
If remainingWeight < 0, we can't choose the item so a[row][col] = a[row - 1][col]
Return last element of the array
public int solveKnapsack(int[] profits, int[] weights, int capacity) {\n int[][] a = new int[profits.length + 1][capacity + 1];\n\n for (int row = 1; row < profits.length + 1; row++) {\n int value = profits[row - 1];\n int weight = weights[row - 1];\n for (int col = 1; col < capacity + 1; col++) {\n int remainingWeight = col - weight;\n if (remainingWeight < 0) {\n a[row][col] = a[row - 1][col];\n } else {\n a[row][col] = Math.max(\n a[row - 1][col],\n value + a[row - 1][remainingWeight]\n );\n }\n }\n }\n\n return a[profits.length][capacity];\n}\n
If we need to compute a result like \"determine if a subset exists\" that return a boolean, the array type is boolean[][]
As we are only interested in the previous row, we can also use an int[2][c + 1] array
"},{"location":"#backtracking-technique","title":"Backtracking technique","text":"Solution for solving a problem recursively
Loop: - apply() // Apply a change - try() // Try a solution - reverse() // Reverse apply
"},{"location":"#cyclic-sort-technique","title":"Cyclic sort technique","text":"Iterate over each number of an array and swap it to its correct position
At the end, we may iterate on the array to check which number is not at its correct position
If numbers are not within the 1 to n range, we can simply drop them
Alternative: marker technique (mark a result by setting a[i] to negative for example)
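A minimal find-the-missing-number sketch (assuming values in the 1 to n range, possibly with a duplicate replacing the missing number; uses the same swap helper as the other snippets):
int findMissingNumber(int[] a) {\n int i = 0;\n while (i < a.length) {\n int correct = a[i] - 1; // a[i] belongs at index a[i] - 1\n if (a[i] >= 1 && a[i] <= a.length && a[i] != a[correct]) {\n swap(a, i, correct);\n } else {\n i++;\n }\n }\n // Check which number is not at its correct position\n for (i = 0; i < a.length; i++) {\n if (a[i] != i + 1) {\n return i + 1;\n }\n }\n return a.length + 1;\n}\n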
"},{"location":"#greedy-technique_1","title":"Greedy technique","text":"Identify an optimal subproblem or substructure in the problem and determine how to reach it
Focus on what you have now (don't think about what comes next)
We may want to apply the traversal technique to have a global context for the identification part (a map of letters/positions etc.)
"},{"location":"#k-way-merge-technique","title":"K-way merge technique","text":"Given K sorted array, technique to perform a sorted traversal of all the elements of all arrays
We need to keep track of which structure the min element comes from (tracking the array index or taking the next node if it's a linked list)
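A minimal sketch over K sorted arrays; each heap entry tracks which array the value comes from:
List<Integer> mergeKSortedArrays(int[][] arrays) {\n // Entry: {value, array index, index in that array}\n PriorityQueue<int[]> minHeap = new PriorityQueue<>((x, y) -> x[0] - y[0]);\n for (int i = 0; i < arrays.length; i++) {\n if (arrays[i].length > 0) {\n minHeap.add(new int[] {arrays[i][0], i, 0});\n }\n }\n\n List<Integer> result = new ArrayList<>();\n while (!minHeap.isEmpty()) {\n int[] top = minHeap.poll();\n result.add(top[0]);\n // Push the next element of the same array\n int next = top[2] + 1;\n if (next < arrays[top[1]].length) {\n minHeap.add(new int[] {arrays[top[1]][next], top[1], next});\n }\n }\n return result;\n}\n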
"},{"location":"#runner-technique","title":"Runner technique","text":"Iterate over the linked list with two pointers simultaneously either with: - One ahead by a fixed amount - One faster
This technique can also be applied to other problems where we need to find a cycle (iterating slow = f(slow) and fast = f(f(fast)), the two pointers may converge)
"},{"location":"#simplification-technique","title":"Simplification technique","text":"Simplify the problem. If solvable, generalize to the initial problem.
Example: sort the array first
"},{"location":"#sliding-window-technique","title":"Sliding window technique","text":"Range of elements in a specific window size
Two pointers left and right: - Move right while condition is valid - Move left if condition is not valid
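A minimal sketch for one flavor (longest substring with at most k distinct characters): the right pointer extends the window, the left pointer shrinks it when the condition breaks:
int longestSubstringKDistinct(String s, int k) {\n Map<Character, Integer> frequencies = new HashMap<>();\n int left = 0, max = 0;\n for (int right = 0; right < s.length(); right++) {\n frequencies.merge(s.charAt(right), 1, Integer::sum);\n // Move left while the condition is not valid (more than k distinct characters)\n while (frequencies.size() > k) {\n char c = s.charAt(left++);\n if (frequencies.merge(c, -1, Integer::sum) == 0) {\n frequencies.remove(c);\n }\n }\n max = Math.max(max, right - left + 1);\n }\n return max;\n}\n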
"},{"location":"#subsets-technique","title":"Subsets technique","text":"Technique to find all the possible permutations or combinations
Start with an empty set, for each element of the input, add them to all the existing subsets to create new subsets
Example: - Given [1, 5, 3] - => [] // Start - => [], [1] - => [], [1], [5], [1,5] - => [], [1], [5], [1,5], [3], [1,3], [5,3], [1,5,3]
For each level, we iterate from 0 to size // size is the fixed size of the list
List<List<Integer>> findSubsets(int[] a) {\n List<List<Integer>> subsets = new ArrayList<>();\n // Add subset []\n subsets.add(new ArrayList<>());\n\n for (int n : a) {\n // Fix the current size\n int size = subsets.size();\n for (int i = 0; i < size; i++) {\n // Copy subset\n ArrayList<Integer> newSubset = new ArrayList<>(subsets.get(i));\n // Add element\n newSubset.add(n);\n subsets.add(newSubset);\n }\n }\n\n return subsets;\n}\n
"},{"location":"#technique-dealing-with-cycles-in-a-linked-list-or-an-array","title":"Technique - Dealing with cycles in a linked list or an array","text":"Runner technique
"},{"location":"#technique-find-all-the-permutations-or-combinations","title":"Technique - Find all the permutations or combinations","text":"Subsets technique or recursion + backtracking
"},{"location":"#technique-find-an-element-in-a-sorted-array-or-linked-list","title":"Technique - Find an element in a sorted array or linked list","text":"Binary search
"},{"location":"#technique-find-or-calculate-something-among-all-the-contiguous-subarrays-of-a-given-size","title":"Technique - Find or calculate something among all the contiguous subarrays of a given size","text":"Sliding window technique
Example: - Given an array, find the average of all subarrays of size \u2018K\u2019 in it
"},{"location":"#technique-find-the-longestshortest-substring-or-subarray","title":"Technique - Find the longest/shortest substring or subarray","text":"Sliding window technique
Example: - Longest substring with K distinct characters - Longest substring without repeating characters
"},{"location":"#technique-find-the-smallestlargestmedian-element-of-a-set","title":"Technique - Find the smallest/largest/median element of a set","text":"Two heaps technique
"},{"location":"#technique-finding-a-certain-element-in-a-linked-list-eg-middle","title":"Technique - Finding a certain element in a linked list (e.g. middle)","text":"Runner technique
"},{"location":"#technique-given-a-sorted-array-find-a-set-of-elements-that-fullfill-certain-conditions","title":"Technique - Given a sorted array, find a set of elements that fullfill certain conditions","text":"Two pointers technique
Example: - Given a sorted array and a target sum, find a pair in the array whose sum is equal to the given target - Given an array of unsorted numbers, find all unique triplets in it that add up to zero - Comparing strings containing backspaces
"},{"location":"#technique-given-an-array-of-size-n-containing-integer-from-1-to-n-eg-with-one-duplicate","title":"Technique - Given an array of size n containing integer from 1 to n (e.g. with one duplicate)","text":"Cyclic sort technique
"},{"location":"#technique-given-time-intervals","title":"Technique - Given time intervals","text":"Traversal technique
Iterate with two pointers, one over the starts, another one over the ends
Handle the element with the lowest value first and generate an event
Example: how many rooms for n meetings => meeting started, meeting started, meeting ended etc.
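A minimal sketch of the meeting rooms example (two pointers, one over the sorted starts, one over the sorted ends):
int minMeetingRooms(int[][] meetings) {\n int n = meetings.length;\n int[] starts = new int[n];\n int[] ends = new int[n];\n for (int i = 0; i < n; i++) {\n starts[i] = meetings[i][0];\n ends[i] = meetings[i][1];\n }\n Arrays.sort(starts);\n Arrays.sort(ends);\n\n int rooms = 0, max = 0;\n int s = 0, e = 0;\n // Handle the event with the lowest value first\n while (s < n) {\n if (starts[s] < ends[e]) {\n rooms++; // Meeting started\n s++;\n } else {\n rooms--; // Meeting ended\n e++;\n }\n max = Math.max(max, rooms);\n }\n return max;\n}\n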
"},{"location":"#technique-how-to-get-the-k-biggestsmallestfrequent-elements","title":"Technique - How to get the K biggest/smallest/frequent elements","text":"Top K elements technique
"},{"location":"#technique-optimization-problems-requiring-a-min-or-max_1","title":"Technique - Optimization problems requiring a min or max","text":"Greedy technique
"},{"location":"#technique-problems-featuring-a-list-of-sorted-arrays-merge-or-find-the-smallest-element","title":"Technique - Problems featuring a list of sorted arrays (merge or find the smallest element)","text":"K-way merge technique
"},{"location":"#technique-scheduling-problem-with-n-tasks-where-each-task-can-have-constraints-to-be-completed-before-others","title":"Technique - Scheduling problem with n tasks where each task can have constraints to be completed before others","text":"Topological sort technique
"},{"location":"#technique-situations-like-priority-queue-or-scheduling","title":"Technique - Situations like priority queue or scheduling","text":"Heap data structure
Possibly two heaps technique
"},{"location":"#top-k-elements-technique-biggest-and-smallest","title":"Top K elements technique (biggest and smallest)","text":"Finding the K biggest elements: - Min heap - Add k elements - Then iterate over the remaining elements, if current > min => remove min, add current
Finding the k smallest elements: - Max heap - Add k elements - Then iterate over the remaining elements, if current < max => remove max, add current
"},{"location":"#topological-sort-technique_1","title":"Topological sort technique","text":"If there is an edge from U to V, then U <= V
Possible only if the graph is a DAG
Algo: - Create a graph representation (adjacency list) and an in degree counter (Map) - Zero them for each vertex - Fill the adjacency list and the in degree counter for each edge - Add in a queue each vertex whose in degree count is 0 (source vertex with no parent) - While the queue is not empty, poll a vertex from it then decrement the in degree of its children (no removal)
To check if there is a cycle, we must compare the size of the produced array to the number of vertices
List<Integer> sort(int vertices, int[][] edges) {\n if (vertices == 0) {\n return Collections.EMPTY_LIST;\n }\n\n List<Integer> sorted = new ArrayList<>(vertices);\n // Adjacency list graph\n Map<Integer, List<Integer>> graph = new HashMap<>();\n // Count of incoming edges for each vertex\n Map<Integer, Integer> inDegree = new HashMap<>();\n\n for (int i = 0; i < vertices; i++) {\n inDegree.put(i, 0);\n graph.put(i, new LinkedList<>());\n }\n\n // Init graph and inDegree\n for (int[] edge : edges) {\n int parent = edge[0];\n int child = edge[1];\n\n graph.get(parent).add(child);\n inDegree.put(child, inDegree.get(child) + 1);\n }\n\n // Create a source queue and add each source (a vertex whose inDegree count is 0)\n Queue<Integer> sources = new LinkedList<>();\n for (Map.Entry<Integer, Integer> entry : inDegree.entrySet()) {\n if (entry.getValue() == 0) {\n sources.add(entry.getKey());\n }\n }\n\n while (!sources.isEmpty()) {\n int vertex = sources.poll();\n sorted.add(vertex);\n\n // For each vertex, we will decrease the inDegree count of its children\n List<Integer> children = graph.get(vertex);\n for (int child : children) {\n inDegree.put(child, inDegree.get(child) - 1);\n if (inDegree.get(child) == 0) {\n sources.add(child);\n }\n }\n }\n\n // Topological sort is not possible as the graph has a cycle\n if (sorted.size() != vertices) {\n return new ArrayList<>();\n }\n\n return sorted;\n}\n
"},{"location":"#traversal-technique","title":"Traversal technique","text":"Traverse the input and generate another data structure or optional events
Start the problem from this new state
"},{"location":"#two-heaps-technique_1","title":"Two heaps technique","text":"Keep two heaps: - A max heap for the first half - Then a min heap for the second half
May be required to balance them to have at most a difference in terms of size of 1
"},{"location":"#two-pointers-technique","title":"Two pointers technique","text":"Two pointers iterating through the data structure in tandem until one or both pointers hit a certain condition
Often useful when structure is sorted. If not sorted, we may want to sort it first.
Most of the time (not always): the first pointer is at the start, the second pointer at the end
The two pointers can also be on two different data structures, still iterating in tandem (e.g., comparing strings containing backspaces)
Time complexity is linear
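A minimal sketch for the pair-with-target-sum example (the array must be sorted):
int[] findPairWithTargetSum(int[] a, int target) {\n int left = 0, right = a.length - 1;\n while (left < right) {\n int sum = a[left] + a[right];\n if (sum == target) {\n return new int[] {left, right};\n }\n if (sum < target) {\n left++; // We need a bigger sum\n } else {\n right--; // We need a smaller sum\n }\n }\n return new int[] {-1, -1};\n}\n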
"},{"location":"#what-if-we-need-to-iterate-backwards-on-a-singly-linked-list-in-constant-space-without-mutating-the-input_1","title":"What if we need to iterate backwards on a singly linked list in constant space without mutating the input?","text":"Reverse the liked list (or a subpart only), implement the algo then reverse it again to the initial state
"},{"location":"#tree","title":"Tree","text":""},{"location":"#2-3-tree","title":"2-3 tree","text":"Self-balanced BST => O(log n) complexity
Either: - 2-node: contains a single value and has two children - 3-node: contains two values and has three children - Leaf: 1 or 2 keys
Insert: find the proper leaf and insert the value in-place. If the leaf has 3 values (called a temporary 4-node), split it into two 2-nodes and insert the middle value into the parent.
"},{"location":"#avl-tree","title":"AVL tree","text":"If tree is not balanced, rearange the nodes with single or double rotations
"},{"location":"#b-tree-complexity-access-insert-delete_1","title":"B-tree complexity: access, insert, delete","text":"All: O(log n)
"},{"location":"#b-tree-definition-and-use-case","title":"B-tree: definition and use case","text":"Self-balanced BST => O(log n) complexity
Can have more than two children (generalization of 2-3 tree)
Use-case: huge amount of data that cannot fit in main memory and must be stored on disk.
Height is kept low to reduce the disk accesses.
Matches how disk pages work
"},{"location":"#balanced-binary-tree-definition","title":"Balanced binary tree definition","text":"The balance factor of each node (the difference between the two subtree heights) should never exceed 1
Guarantee of O(log n) search
"},{"location":"#balanced-bst-use-case-b-tree-red-black-tree-avl-tree","title":"Balanced BST use case: B-tree, Red-black tree, AVL tree","text":"BFS: time O(v), space O(v)
DFS: time O(v), space O(h) (height of the tree)
"},{"location":"#binary-tree-bfs-traversal","title":"Binary tree BFS traversal","text":"Level order traversal (level by level)
Iterative algorithm: use a queue, put the root, iterate while queue is not empty
Queue<Node> queue = new LinkedList<>();\nqueue.add(root);\n\nwhile(!queue.isEmpty()) {\n Node node = queue.poll();\n visit(node);\n\n if(node.left != null) {\n queue.add(node.left);\n }\n if(node.right != null) {\n queue.add(node.right);\n }\n}\n
"},{"location":"#binary-tree-definition","title":"Binary tree definition","text":"Tree with each node having up to two children
"},{"location":"#binary-tree-dfs-traversal-in-order-pre-order-and-post-order","title":"Binary tree DFS traversal: in-order, pre-order and post-order","text":"It's depth first so:
"},{"location":"#binary-tree-complete","title":"Binary tree: complete","text":"Every level of the tree is fully filled, with the last level filled from the left to the right
"},{"location":"#binary-tree-full","title":"Binary tree: full","text":"Each node has 0 or 2 children
"},{"location":"#binary-tree-perfect","title":"Binary tree: perfect","text":"2^l - 1 nodes with l the level: 1, 3, 7, etc. nodes
Every level is fully filled
"},{"location":"#bst-complexity-access-insert-delete","title":"BST complexity: access, insert, delete","text":"If not balanced O(n)
If balanced O(log n)
"},{"location":"#bst-definition","title":"BST definition","text":"Binary tree in which every node must fit the property: all left descendents <= n < all right descendents
Implementation: optional key, value, left, right
"},{"location":"#bst-delete-algo-and-complexity_1","title":"BST delete algo and complexity","text":"Find inorder successor and swap it
Average: O(log n)
Worst: O(h) if not self-balanced BST, otherwise O(log n)
"},{"location":"#bst-insert-algo","title":"BST insert algo","text":"Search for key or value (by recursively going left or right depending on the comparison) then insert a new node or reset the value (no swap)
Complexity: worst O(n)
public TreeNode insert(TreeNode root, int a) {\n if (root == null) {\n return new TreeNode(a);\n }\n\n if (a <= root.val) { // Left\n root.left = insert(root.left, a);\n } else { // Right\n root.right = insert(root.right, a);\n }\n\n return root;\n}\n
"},{"location":"#bst-questions-prerequisite","title":"BST questions prerequisite","text":"Is it a self-balanced BST? (impacts: O(log n) time complexity guarantee)
"},{"location":"#complexity-to-create-a-trie_1","title":"Complexity to create a trie","text":"Time and space: O(n * l) with n the number of words and l the longest word length
"},{"location":"#complexity-to-insert-a-key-in-a-trie_1","title":"Complexity to insert a key in a trie","text":"Time: O(k) with k the size of the key
Space: O(1) iterative, O(k) recursive
"},{"location":"#complexity-to-search-for-a-key-in-a-trie_1","title":"Complexity to search for a key in a trie","text":"Time: O(k) with k the size of the key
Space: O(1) iterative or O(k) recursive
"},{"location":"#given-a-binary-tree-algorithm-to-populate-an-array-to-represent-its-level-by-level-traversal","title":"Given a binary tree, algorithm to populate an array to represent its level-by-level traversal","text":"Solution: BFS by popping only a fixed number of elements (queue.size)
public static List<List<Integer>> traverse(TreeNode root) {\n List<List<Integer>> result = new LinkedList<>();\n if (root == null) {\n return result;\n }\n\n Queue<TreeNode> queue = new LinkedList<>();\n queue.add(root);\n while (!queue.isEmpty()) {\n List<Integer> level = new ArrayList<>();\n\n int levelSize = queue.size();\n // Pop only levelSize elements\n for (int i = 0; i < levelSize; i++) {\n TreeNode current = queue.poll();\n level.add(current.val);\n if (current.left != null) {\n queue.add(current.left);\n }\n if (current.right != null) {\n queue.add(current.right);\n }\n }\n result.add(level);\n }\n return result;\n}\n
"},{"location":"#how-to-calculate-the-path-number-of-a-node-while-traversing-using-dfs","title":"How to calculate the path number of a node while traversing using DFS?","text":"Example: 1 -> 7 -> 3 gives 173
Solution: sum = sum * 10 + n
private int dfs(TreeNode node, int sum) {\n if (node == null) {\n return 0;\n }\n\n sum = 10 * sum + node.val;\n\n // Do something\n}\n
"},{"location":"#min-or-max-value-in-a-bst","title":"Min (or max) value in a BST","text":"Move recursively on the left (on the right)
"},{"location":"#red-black-tree","title":"Red-Black tree","text":"Self-balanced BST => O(log n) complexity
Binary Trees: Red Black by David Pynes
"},{"location":"#red-black-tree-complexity-access-insert-delete_1","title":"Red-black tree complexity: access, insert, delete","text":"All: O(log n)
"},{"location":"#reverse-a-binary-tree-algo","title":"Reverse a binary tree algo","text":"public void reverse(Node node) {\n if (node == null) {\n return;\n }\n\n Node temp = node.right;\n node.right = node.left;\n node.left = temp;\n\n reverse(node.left);\n reverse(node.right);\n}\n
"},{"location":"#trie-definition-implementation-and-use-case","title":"Trie definition, implementation and use case","text":"Tree-like data structure with empty root and where each node store characters
Each path down the tree represents a word (until a null node that represents the end of the word)
Usually implemented using a map of children (or a fixed size array with ASCII charset for example)
Use case: dictionary (save memory)
Also known as prefix tree
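A minimal sketch using a map of children:
class Trie {\n private static class Node {\n Map<Character, Node> children = new HashMap<>();\n boolean endOfWord; // Marks the end of a word\n }\n\n private final Node root = new Node(); // Empty root\n\n void insert(String word) {\n Node node = root;\n for (char c : word.toCharArray()) {\n node = node.children.computeIfAbsent(c, k -> new Node());\n }\n node.endOfWord = true;\n }\n\n boolean search(String word) {\n Node node = root;\n for (char c : word.toCharArray()) {\n node = node.children.get(c);\n if (node == null) {\n return false;\n }\n }\n return node.endOfWord;\n }\n}\n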
"},{"location":"#why-to-use-bst-over-hash-table","title":"Why to use BST over hash table","text":"Sorted keys
"},{"location":"anki/","title":"Anki","text":"Anki is a free software (Windows/Mac/Linux/iPhone/Android) designed to help remembering information. Anki relies on the concept of spaced repetition which is a proven technique to increase the rate of memorization. Here's a 2-minute video that delves into spaced repetition:
Michael A. Nielsen, \"Augmenting Long-term Memory\"
The single biggest change that Anki brings about is that it means memory is no longer a haphazard event, to be left to chance. Rather, it guarantees I will remember something, with minimal effort. That is, Anki makes memory a choice.
I used Anki myself with Algo Deck and Design Deck and it paid off. This method played a key role in helping me land a role as L5 SWE at Google (senior software engineer).
Here is a flashcard example:
The Anki versions (a clone of the flashcards from this repo) are available via one-time GitHub sponsorships:
Trusted by over 100 developers.
"},{"location":"designdeck/","title":"Design Deck","text":"AnkiCheck the Anki version here.
"},{"location":"designdeck/#cache","title":"Cache","text":""},{"location":"designdeck/#cache-aside","title":"Cache aside","text":"Application is responsible for reading and writing to the DB (using write-through or write-back policy)
The cache doesn't interact with the storage directly
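A minimal read-path sketch (cache and db are hypothetical interfaces; writes go through the application with a write-through or write-back policy):
V get(K key) {\n V value = cache.get(key);\n if (value == null) {\n // Cache miss: the application reads from the DB and populates the cache\n value = db.read(key);\n cache.put(key, value);\n }\n return value;\n}\n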
"},{"location":"designdeck/#cache-aside-vs-read-through","title":"Cache aside vs. read-through","text":"Cache aside: - Data model can be different from DB
Read-through: - Same data model as DB - Can use the refresh-ahead pattern
"},{"location":"designdeck/#cache-eviction-policy","title":"Cache eviction policy","text":"Cache to automatically refresh any recently accessed entry prior to its expiration
Used with read-through cache
Main difference: consistency
Write through: 1. Write to the cache and the DB in a single DB transaction (may still lead to cache inconsistency if the DB commit failed) 2. Return
Write back: 1. Write to the cache 2. Return 3. Asynchronously store in DB
"},{"location":"designdeck/#four-main-distributed-cache-benefits","title":"Four main distributed cache benefits","text":"Cache hit ratio: hits / total accesses
"},{"location":"designdeck/#read-through-cache","title":"Read-through cache","text":"Read-through cache sits in-line with the DB
Single entry point
"},{"location":"designdeck/#when-to-use-a-cache","title":"When to use a cache","text":"Content Delivery Network
Network of geographically dispersed servers used to deliver static content (images, CSS, Javascript files, etc.)
Two kinds of CDNs: - Push CDN: we are responsible for providing the content - Pull CDN: CDN is responsible for pulling the right content (expiration to be used)
Pull is easier to handle whereas push gives us more flexibility
Use case for pull: Docker Hub S3 layer
"},{"location":"designdeck/#db","title":"DB","text":""},{"location":"designdeck/#3-main-reasons-to-partition-data","title":"3 main reasons to partition data","text":"Atomic: all transaction succeeds or none does (all or nothing)
Consistency: from one valid state to another (invariants must always be true)
Not necessarily a property of the DB (e.g., foreign key constraint), can be a property of the application (e.g., credits and debits must be balanced)
Different from consistency in eventual consistency (which is more about convergence as the matter is replicating data)
Refers to serializability
Optimization to favor latency over consistency when writing to a DB (e.g., leaderless replication)
Background process to constantly looks for differences in data
Could be used as an alternative or in conjunction with read repair
"},{"location":"designdeck/#byzantine-fault-tolerant","title":"Byzantine fault-tolerant","text":"A system is Byzantine fault-tolerant if it continues to operate correctly if in the case of a Bizantine's problem (some of the nodes malfunctioning, not obeying the protocol or malicious attackers).
"},{"location":"designdeck/#calm-theorem","title":"CALM theorem","text":"Consistency As Logical Monotonicity
A program has a consistent, coordination-free (e.g., consensus-free) distributed implementation if and only if it is monotonic
Consistency in this context doesn't mean linearizability. It focuses on the consistency of the program's output while traditional consistency focus on the consistency of reads and writes.
In CALM, a consistent program is one that produces the same output no matter in which order the inputs are processed and despite any conflicts.
Said differently, does the implementation produce the outcome we expect despite any race condition that may arise.
"},{"location":"designdeck/#cap-theorem","title":"CAP theorem","text":"Consistency, availability, partition tolerance (e.g., one node cut off from the rest of the cluster because of a network partition) => pick 2 out of 3
C refers to linearizability
"},{"location":"designdeck/#caveat-of-serializability","title":"Caveat of serializability","text":"It's possible that serial order is different from the order in which transactions were actually run (latest may not win)
If not, we need a stricter isolation level: strict serializability (serializability + linearizability)
"},{"location":"designdeck/#chain-replication","title":"Chain replication","text":"Replication protocol that uses a different topology than leader based replication protocols like Raft
Left-most process referred as the chain's head, right-most as the chain's tail: - Client send writes to the head, which updates its local state and forwards to the next process in the chain - Next process updates its local state and forwards to the next process in the chain - Etc. - Once the update is received by the tail, the ack flows back to the head which replies to the client that the write succeeded
Fault tolerance is delegated to a dedicated component: control plane - If head fails: the control plane removes it and makes the next as the head - If intermediate node fails: the control plane removes it temporarily from the chain, and then adds it back eventually as the tail - If tail fails: the control plane removes it and makes the predecessor as the new tail
Benefits: - Strongly consistent protocol - Reads are served from the tail without contacting other replicas first which allows a lower response time
Drawbacks: - Writes are slower than quorum-based replication. - A single slow node can slow down all writes. - As reads are served from a single node, it can't be scaled horizontally. A mitigation is to allow intermediate nodes to serve reads but they can do it only if a read is considered as clean (the ack for this object has been returned to the predecessor). // The tail serves as the authority of the latest clean version
Notes: - To avoid the overhead of having a single node handling the writes, we can find a way to shard data and handle multiple chains (see https://engineering.fb.com/2022/05/04/data-infrastructure/delta/)
"},{"location":"designdeck/#chain-replication-vs-consensus","title":"Chain replication vs. consensus","text":"Similar consistency guarantees
Chain replication: - Optimized for reads for CP systems - Better read availability: a chain of n nodes can tolerate up to n-2 nodes failure
Example with 5 nodes: - Chain replication: tolerate up to 3 nodes failure - Consensus with R=3 and W=3: tolerate up to 2 nodes failure
Consensus: - Optimized for writes for CP systems
"},{"location":"designdeck/#change-data-capture-cdc","title":"Change data capture (CDC)","text":"A datastore is selected as the authoritative source of data where all update operations are performed
An event log is then created from this datastore that is consumed by all the remaining operations the same way as in event sourcing
"},{"location":"designdeck/#concurrency-control","title":"Concurrency control","text":"Ensures that correct results for concurrent operations are generated
Pessimistic: lock (mutual exclusion)
Optimistic: checks for conflicts at the end of a transaction
In the end, concurrency control serves the same purpose as atomicity
"},{"location":"designdeck/#consensus","title":"Consensus","text":"Set of processes agreeing on some data value in a fault-tolerant way
Satisfies safety and liveness
"},{"location":"designdeck/#consistency-models","title":"Consistency models","text":"Describe what expectations clients might have in terms of possible returned values despite the existence of multiple copies of data and concurrent access to it
Not the C in ACID but the C in CAP (converging to an end state)
Eventual consistency: all the nodes converge to the same state (not necessarily the latest)
Write follow reads: ensures that writes are ordered after writes that were observed by previous read operations
Example: - P1 reads value => foo - P1 updates value to bar => Every node will converge to bar (a process can't read bar, then foo, regardless of the process) Also known as session causality
Monotonic reads consistency: a client doing several reads in sequence will never go backward in time
Monotonic writes consistency: values originating from the same client appear in the order the client has executed them
Read-after-write-consistency: if a client performs a write, then this write if visible during subsequent reads
Also known as read-your-writes consistency
Causal consistency: operations that are causally related need to be seen in the same order by all the nodes
Sequential consistency: operations appear to take place in some total order, and that order is consistent with the order of operations from each individual clients
Twitter example: no guarantee between which tweet is seen first between two friends posting at the same time, but the ordering is guaranteed for the same friend
Even though there may be multiple replicas, the application does not need to worry about them
C in CAP
Real time guarantees
"},{"location":"designdeck/#cqrs","title":"CQRS","text":"Command Query Responsibility Segregation
Dissociate writes (command) from reads (query)
Pros: - Allows creating stores per use case (e.g., analytics, geospatial) - Scale the read and write parts independently
Cons: - Eventual consistency between the stores
"},{"location":"designdeck/#crdt","title":"CRDT","text":"Conflict-free Replicated Data Types
Data structure that is replicated across nodes: - Replicas are updated independently, concurrently and without coordination - An algo (part of the data type) can perform a deterministic conflict resolution - Replicas are guaranteed to eventually converge to the same state => Strong eventual consistency
Used in the context of collaborative applications
Note: CRDTs can be combined to form new CRDTs
"},{"location":"designdeck/#crdt-and-collaborative-applications-eg-google-docs","title":"CRDT and collaborative applications (e.g., Google Docs)","text":"Compared to OT, each character has a stable identifier (even if characters are added or deleted)
Example: 0 is the beginning of the document, 1 is the end of the document, every character has a fractional number as an ID
May lead to interleaving problems (e.g;, two inserted words by two users are interleaved: \"Alice\", \"Bob\" => \"BoAlibce\"
Interleaving depends on the merging algorithm used (e.g., Treedoc doesn't lead to interleaving)
"},{"location":"designdeck/#db-indexes-tradeoff","title":"DB indexes tradeoff","text":"Speed up read query but slow down writes
"},{"location":"designdeck/#db-internal-components","title":"DB internal components","text":"Optimized state-based CRDTs where only recently applied changes to a state are replicated instead of the full state
"},{"location":"designdeck/#denormalization","title":"Denormalization","text":"Introduce some amount of duplication in a normalized dataset in order to speed up reads (e.g., denormalized document, cache or index)
Cons: - Requires more space - May slow down writes
"},{"location":"designdeck/#design-consideration-when-partitioning-data","title":"Design consideration when partitioning data","text":"Should match the primary access pattern
"},{"location":"designdeck/#downside-of-distributed-transactions","title":"Downside of distributed transactions","text":"Performance penalty
Example: distributed transactions in MySQL are reported to be over 10 times slower than single-node transactions
"},{"location":"designdeck/#event-sourcing","title":"Event sourcing","text":"Ensures that all changes to application state are stored as a sequence of events
"},{"location":"designdeck/#eventual-consistency-requirements","title":"Eventual consistency requirements","text":"Splits up DB by function
"},{"location":"designdeck/#fencing-token","title":"Fencing token","text":"Monotonically increasing token that increments whenever a client acquires a distributed lock
Use case: when writing to a DB, if the provided token has a lower value than the current one, rejects the write
Solve possible issues with lease as an update has to be made from the latest token
"},{"location":"designdeck/#gossip-protocol","title":"Gossip protocol","text":"Peer-to-peer protocol based on the way epidemics spread
No central registry and the only way to spread common data is to rely on each member to pass it along to their neighbors
Useful when broadcasting to a large number of processes like thousands or more, where a deterministic protocol wouldn't scale
"},{"location":"designdeck/#graph-db-main-use-case","title":"Graph DB main use case","text":"Relational can handle simple cases of many-to-many relationships
Yet, if the connections become more complex, it's more natural to start modeling data as a graph
"},{"location":"designdeck/#hinted-handoff","title":"Hinted handoff","text":"Optimization to favor latency over consistency when writing to a DB
If a coordinator node cannot contact the necessary number of replicas, it stores locally the result of the operation and forward it to the failed node(s) after they recovered
Used in sloppy quorums
"},{"location":"designdeck/#hot-spot-in-partitioning","title":"Hot spot in partitioning","text":"Partition is heavily loaded compared to others
Also called skew
"},{"location":"designdeck/#in-a-database-strategy-to-handle-rebalancing","title":"In a database, strategy to handle rebalancing","text":"Not based on key hashing as a rebalancing would be huge
Simple solution: Create many more partitions than nodes and assign several partitions to each node (e.g., a db running on a cluster of 10 nodes may be split into 10k partitions). When a node is added to the cluster, it will steal a few partitions from every existing node
"},{"location":"designdeck/#isolation-levels","title":"Isolation levels","text":"Degree to which transactions are isolated from other concurrent execution transactions
Isolations come at a performance cost (more coordination and synchronization)
Dirty writes (overwriting data another in-flight transaction has written) => can violate integrity constraints - Dirty reads (seeing uncommitted writes) => decisions can be taken based on data updates that can be rolled back
Fuzzy reads: a transaction reads a value twice but sees a different value in each read because a committed transaction updated the value between the two reads
Lost updates: two transactions read the same value and then try to update it to two different values; only one update survives
Example: two transactions read the current inventory size (say 100 items), add respectively 5 and 10 items, and then store back the new size. Depending on the execution order, the final size can be 110 instead of 115.
Read skew: an integrity constraint seems to be violated because a transaction can only see partial results of another transaction
Write skew: when two transactions read the same objects and then update some of those objects
Example: Two on-call doctors for a shift. Both feeling unwell, and decide to request leave. They both click the button at the same time. In the case of a write skew, the two transactions can succeed as for both, when reading the number of available doctors, it was more than one.
Example: Transaction A computes the max and average age of employees. Transaction B is interleaved and inserts a lot of old employees. Thus, the average age could be larger than the max.
"},{"location":"designdeck/#known-crdts","title":"Known CRDTs","text":"Counter: - Grow-only counter: increment only - Positive-negative counter: increment and decrement (combination of two grow only counter: one positive, one negative)
Register (a memory cell storing whatever): - LWW-register: total order using timestamps - Multi-value register: keep track of causality, in case of conflicts it returns all conflicting cases (analogy: Git with an interactive merge resolution)
Set: - Grow-only set: once an element is added it can't be removed - Two-phase set: elements can be added and removed (combination of two grow only set) - LWW-element set (last-write-wins): similar to two-phase set but we associate a timestamp for each element to resolve conflicts - Observed-remove set: use tags instead of timestamps; each element is associated to a list of add-tags and a list of remove-tags (example: vector clocks) - Sequence: used to build collaborative applications (e.g., Treedoc)
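Illustrative Java sketch of the grow-only counter above (assumes a fixed, known set of replicas):
class GCounter {\n private final long[] counts; // one slot per replica\n private final int replicaId;\n\n GCounter(int nReplicas, int replicaId) {\n this.counts = new long[nReplicas];\n this.replicaId = replicaId;\n }\n\n void increment() { counts[replicaId]++; } // a replica only increments its own slot\n\n long value() { long sum = 0; for (long c : counts) sum += c; return sum; }\n\n // Merge is commutative, associative, and idempotent (element-wise max)\n void merge(GCounter other) {\n for (int i = 0; i < counts.length; i++) {\n counts[i] = Math.max(counts[i], other.counts[i]);\n }\n }\n}\n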
"},{"location":"designdeck/#last-write-wins-lww","title":"Last-write-wins (LWW)","text":"Conflict resolution based on timestamp
Used by DynamoDB or Cassandra to resolve conflicts
Conflicts shouldn't happen in single-master replication
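Minimal LWW-register merge in Java (sketch; the tie-breaking rule is an assumption):
class LwwRegister<T> {\n T value;\n long timestamp;\n\n // Keep the value carrying the newest timestamp\n void merge(LwwRegister<T> other) {\n if (other.timestamp > timestamp) {\n value = other.value;\n timestamp = other.timestamp;\n }\n // On equal timestamps, a deterministic tie-breaker (e.g., replica ID) would be needed\n }\n}\n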
"},{"location":"designdeck/#leader-election","title":"Leader election","text":"Algorithm to guarantee at most one leader at any given time (safety) and that an election eventually completes (liveness)
"},{"location":"designdeck/#lsm-tree","title":"LSM tree","text":"Log-Structured Merge tree
Consists of smaller mutable memory-resident (memtable) and larger immutable disk-resident (SSTable) components
Memtable data is sorted and flushed to disk when its size reaches a configurable threshold or periodically
Because a memtable is just a special case of a buffer, durability is not guaranteed (durability must be brought by replication)
Examples: Lucene, Cassandra, Bitcask, etc.
"},{"location":"designdeck/#lsm-tree-vs-b-tree","title":"LSM tree vs. B-tree","text":"LSM-tree faster for writes, slower for reads because it has to check multiple data structures (bigger read amplification): memtable and SSTable
Compaction can impact ongoing requests
B-tree faster for reads, slower for writes as it must write every piece of data at least twice in the WAL & tree itself (bigger write amplification)
Each key exists in exactly one place => easier to offer strong transactional semantics
"},{"location":"designdeck/#main-difference-between-consistency-models-and-isolation-levels","title":"Main difference between consistency models and isolation levels","text":"Consistency models: applies to single-object operations
Isolation levels: applies to multi-object operations
"},{"location":"designdeck/#merkle-tree","title":"Merkle tree","text":"A tree in which every leaf is labelled with the hash of a data block: - Level n contains the data blocks - Level n-1 the hash of one data block - Level n-2 the hash of 2 data blocks - Level 1 the hash of all the data blocks
Efficient and secure verification of the contents of a large data structure
Allows reducing the data transferred between a client and a server. For example, if we want to compare a Merkle tree stored on a server with one stored on the client, they can both exchange their top hash. If different, we can delve in and only fetch the data blocks that have changed.
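Illustrative Java sketch of computing the top hash bottom-up (assumes SHA-256, a non-empty list of leaf hashes, and duplicating the last node on odd levels):
import java.security.MessageDigest;\nimport java.util.ArrayList;\nimport java.util.List;\n\nstatic byte[] merkleRoot(List<byte[]> blockHashes) throws Exception {\n MessageDigest sha = MessageDigest.getInstance(\"SHA-256\");\n List<byte[]> level = blockHashes;\n while (level.size() > 1) {\n List<byte[]> parent = new ArrayList<>();\n for (int i = 0; i < level.size(); i += 2) {\n sha.update(level.get(i));\n sha.update(level.get(Math.min(i + 1, level.size() - 1))); // duplicate last node if odd\n parent.add(sha.digest()); // digest() also resets the instance\n }\n level = parent;\n }\n return level.get(0); // top hash\n}\n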
"},{"location":"designdeck/#monotonic-reads-consistency-implementation","title":"Monotonic reads consistency implementation","text":"One way to achieve it is to make sure each user always makes their reads from the same replica
"},{"location":"designdeck/#mvcc","title":"MVCC","text":"Multiversion Concurrency Control
A possible implementation of optimistic concurrency control and snapshot isolation level
MVCC allows reads and writes to proceed with minimal coordination on the storage level since reads can continue accessing older values until the new ones are committed
"},{"location":"designdeck/#n1-select-problem","title":"N+1 select problem","text":"Assuming a one-to-many relationship between 2 tables A and B => A 1-* B
If we want to iterate through all the A and for each one, print the list of B, the naive implementation would be: - select * from A
- And then for each A, select * from B where A_ID = ?
Alternatively, we could reduce the number of round-trips to the DB from N+1 to 2 with a simple select * from B
Most ORM tools prevent N+1 selects
"},{"location":"designdeck/#nosql-main-types-and-main-architecture-principles","title":"NoSQL: main types and main architecture principles","text":"Key-value store, document store, column-oriented store or graph DB
"},{"location":"designdeck/#operation-based-crdts-definition-and-requirements","title":"Operation-based CRDTs: definition and requirements","text":"Commutative replicated data types
Replication is done by propagating the update operation
Operations characteristics: - Must be commutative. - Not necessarily idempotent. If idempotent, OK. If not, it's up to the delivery layer to ensure the operations are delivered without duplication. - Delivered in causal order.
"},{"location":"designdeck/#operational-transformation-ot-concept-and-main-drawback","title":"Operational transformation (OT): concept and main drawback","text":"A way to handle collaborative applications
Receive update operations and depending on the operations that occur concurrently, transform them
Example: - Initial state: \"helo\" - Concurrently: user 1 inserts \"l\" at position 3 and user 2 inserts \"!\" at position 4 - If the transaction from user 1 completes before the one from user 2, we end up with \"hell!o\" instead of \"hello!\" - OT will transform the transaction from user 2 into: insert \"!\" at position 5
Drawback: all the communications go through a central server (e.g., impossible with systems at scale such as Google Docs)
Replaced with CRDT
"},{"location":"designdeck/#optimistic-concurrency-control-pros-and-cons","title":"Optimistic concurrency control: pros and cons","text":"Perform badly if high contention as it leads to a high proportion of retry, thus making performance worse
If not much contention, it tends to perform better than pessimistic
"},{"location":"designdeck/#pacelc-theorem","title":"PACELC theorem","text":"If case of a network partition (P): we should choose between availability (A) or consistency (C)
Else, in the absence of partition (E): we should choose between latency (L) or consistency (C)
Most systems are either: - AP/EL - CP/EC
"},{"location":"designdeck/#partitioning-sharding","title":"Partitioning (sharding)","text":"Split up a large dataset that is too big for a single machine into smaller parts and spread them across several machines
Define the partition type based on the primary access pattern
"},{"location":"designdeck/#partitioning-criteria","title":"Partitioning criteria","text":"Range partitioning: keys are sorted and a partition owns all the keys from some minimum up to some maximum (example: MySQL RANGE COLUMNS partitioning) - Pros: efficient range queries - Cons: Risk of hot spots, requires repartitioning to potentially split a range into two subranges if a partition gets too big
Hash partitioning: hash function is applied to each key and a partition owns a range of hashes
"},{"location":"designdeck/#partitioning-methods","title":"Partitioning methods","text":"Horizontal partitioning: partition by rows
Vertical partitioning: partition by columns (create tables with fewer columns)
Rationale: if the subtables have different access patterns (e.g., if a column is a blob that we rarely consume, vertical partitioning can store this blob off the primary disk)
Also called normalization
"},{"location":"designdeck/#quorum","title":"Quorum","text":"Minimum number of nodes that need to vote on an operation before it can be considered successful
Usually: majority
"},{"location":"designdeck/#raft","title":"Raft","text":"Leader election and replication algorithms
"},{"location":"designdeck/#leader-election_1","title":"Leader election","text":"Using a state machine to elect a leader
Each process is in one of these three states: leader, candidate (part of the election process), follower
"},{"location":"designdeck/#replication","title":"Replication","text":"The leader stores the sequence of operations altering the state into a local ordered log
Then, this log is replicated across followers. Each entry is considered committed when it has been replicated on a majority of nodes
Replication enables consensus
"},{"location":"designdeck/#read-repair","title":"Read repair","text":"Optimization to favor latency over consistency when writing to a DB (e.g., leaderless replication)
If a coordinator node receives conflicting values from the contacted replicas (which shouldn't happen in case of single-master replication for example), it resolves the conflict by: - Resolving the conflict (e.g., LWW) - Forwarding it to the stale replica - Responding to the read request
"},{"location":"designdeck/#relation-between-replication-factor-write-consistency-and-read-consistency","title":"Relation between replication factor, write consistency and read consistency","text":"Given: - N: number of replicas - W: number of nodes that have to ack a write for it to succeed - R: number of nodes that have to respond to a read operation for it to succeed
If R+W > N, the system can guarantee to return the most recent written value because there's always an overlap between read and write sets (consistency)
Notes: - In case of read-heavy systems, we want to minimize R - If W = 1 and R = N, durability isn't guaranteed in the presence of failure - If W < (N+1)/2, it may lead to write conflicts (e.g., W < 2 with 3 nodes) - If R+W <= N, weak/eventual consistency
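Worked example: with N = 3, W = 2 and R = 2, R + W = 4 > 3, so the set of 2 nodes that acked a write and the set of 2 nodes contacted for a read always overlap in at least one node holding the latest value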
"},{"location":"designdeck/#replication-vs-partition-impacts","title":"Replication vs. partition: impacts","text":"Replication: - Read-heavy - Availability > consistency
Partition: - Write-heavy (splitting up data across different shards)
"},{"location":"designdeck/#schema-on-read-vs-schema-on-write","title":"Schema-on-read vs. schema-on-write","text":"Schema-on-read: implicit schema but not enforced by the DB (also called schemaless but misleading)
Schema-on-write: explicit schema, the DB ensures all writes are conforming to it (e.g., relational DB)
"},{"location":"designdeck/#serializability","title":"Serializability","text":"I in ACID (strong isolation level)
Equivalent to serial execution (no interleaving due to concurrent transactions)
"},{"location":"designdeck/#serializable-snapshot-isolation-ssi","title":"Serializable Snapshot Isolation (SSI)","text":"Snapshot Isolation (SI) allows write skew
SSI is a stricter isolation level than SI that prevents write skew: it checks at runtime for conflicts between transactions
Downside: increases the number of aborted transactions
"},{"location":"designdeck/#single-leader-multi-leader-leaderless-replication","title":"Single-leader, multi-leader, leaderless replication","text":""},{"location":"designdeck/#single-leader","title":"Single-leader","text":"All writes go through one leader
Pro: ensure consistency
Con: all writes go through a single node (bottleneck)
"},{"location":"designdeck/#multi-leader","title":"Multi-leader","text":"Rarely makes sense within a single datacenter (benefits rarely outweigh the added complexity) but used in multi-datacenter contexts
DB must resolve the conflicts in a convergent way
Use cases: - One leader per datacenter
Different topologies; most used: all-to-all
Pro: not limited to the write throughput of a single node
Con: possible write conflicts
"},{"location":"designdeck/#leaderless-replication","title":"Leaderless replication","text":"Client sends its writes to several replicas in parallel
Read requests are also sent in parallel to multiple replicas (this way, if a write hasn't been replicated yet to one replica, it won't lead to stale data)
Rely on read repair and anti-entropy mechanisms
Rely on quorum to know how long to wait for a request (not perfect: if a write fails because we didn't reach a quorum, what shall we do about the replicas where the write has already been committed?)
Examples: Cassandra, DynamoDB, Riak
Pro: throughput
Con: quorums are not perfect; they provide an illusion of strong consistency when in reality it's often not true
"},{"location":"designdeck/#sloppy-quorum","title":"Sloppy quorum","text":"In case of a quorum of w nodes to accept a write: if we can't reach w, the DB accepts the write replicate it to nodes that aren't among the ones on which the value usually lives
Relies on hinted handoff
"},{"location":"designdeck/#snapshot-isolation-si","title":"Snapshot Isolation (SI)","text":"Guarantee that all reads made in a transaction will see a consistent snapshot of the database
In practice, it reads the last committed values that existed at the time it started
Allows write skew
"},{"location":"designdeck/#snapshot-isolation-common-implementation","title":"Snapshot Isolation common implementation","text":"MVCC
"},{"location":"designdeck/#sstable","title":"SSTable","text":"Sorted String Table, immutable components of a LSM tree
Sorted immutable data structure
It consists of 2 components: index files and data files
The index (based on a hashtable or a B-tree) holds the keys and the data entries (offsets in the data file where the actual records are located)
Data files hold records in key order
"},{"location":"designdeck/#state-based-crdts-definition-and-requirements","title":"State-based CRDTs: definition and requirements","text":"Convergent replicated data types
Replication is done by propagating the full local state to replicas
States are merged with a function which must be: - Commutative - Idempotent - Associative => Updates monotonically increase the internal state according to some defined partial order rules (e.g., max of two values, union of two sets)
=> Delivery layer doesn't have to guarantee causal ordering nor idempotency, only eventual delivery
"},{"location":"designdeck/#strong-eventual-consistency-definition-and-requirements","title":"Strong eventual consistency: definition and requirements","text":"Stronger guarantee than eventual consistency
Based on the fact that we can define a deterministic outcome for any conflict
Requires: - Eventual delivery: every update applied to a replica is eventually applied to all replicas - Strong convergence: guarantees that replicas that have executed the same updates have the same state (with eventual consistency, the guarantee is that the replicas eventually reach the same state, once consensus is reached)
Strong convergence requires convergent replicated data types (part of CRDT family)
Main difference with eventual consistency: - Leaderless replication - No consensus needed, instead, it relies on a deterministic outcome for any conflict
A solution to the CAP theorem
"},{"location":"designdeck/#three-phase-commit-3pc","title":"Three-phase commit (3PC)","text":"Failure-resilient refinement of 2PC
Unlike 2PC, satisfies liveness but not safety
"},{"location":"designdeck/#transaction","title":"Transaction","text":"A unit of work performed in a database system, representing a change, which can be potentially composed of multiple operations
"},{"location":"designdeck/#two-main-approaches-to-partition-a-table-that-has-secondary-indexes","title":"Two main approaches to partition a table that has secondary indexes","text":"Partitioning secondary indexes by document: - Each partition maintains its own secondary index - Write: one partition - Query on the index: requires querying multiple partitions (scatter/gather)
Optimized for writes
Example: Elasticsearch, MongoDB, Cassandra, Riak, etc.
Partitioning secondary indexes by term: - Global index covering all the partitions (to be replicated) - Write: multiple partitions are updated (for resiliency) - Query on the index: served from one partition containing the index
Optimized for reads
"},{"location":"designdeck/#two-types-of-crdts","title":"Two types of CRDTs","text":"Operation-based and state-based
Operation-based CRDTs require less bandwidth
State-based CRDTs require fewer assumptions about the delivery layer
"},{"location":"designdeck/#two-phase-commit-2pc","title":"Two-phase commit (2PC)","text":"Protocol used to implement atomic transaction commits across multiple processes
Satisfies safety but not liveness
"},{"location":"designdeck/#wal","title":"WAL","text":"Write-ahead log (or redo log)
Append-only file to which every modification must be written
Used for restoration in the event of a DB crash: - Durability - Atomicity (allows identifying the operations in progress and completing or undoing them)
"},{"location":"designdeck/#when-relational-vs-when-document","title":"When relational vs. when document","text":"Relational (schema-on-write): - Better support for joins - Many-to-one and many-to-many relationships - ACID
Document (schema-on-read): - Schema flexibility - Better performance due to locality - Closer to the data structures used by the application - In general not ACID - In general write-heavy
"},{"location":"designdeck/#when-to-use-a-column-oriented-store","title":"When to use a column-oriented store","text":"Because columns are stored contiguously: analytical workloads (computing average values, finding trends, etc.)
Flexible schema
Limited space (storing same data type together offers a better compression ratio)
"},{"location":"designdeck/#why-db-schemaless-is-misleading","title":"Why DB schemaless is misleading","text":"There is an implicit schema but not enforced by the DB
More accurate term: schema-on-read
Different from relational DBs with schema-on-write where the schema is explicit and the DB ensures all written data conforms to it
Similar to dynamic vs. static type checking in a programming language
"},{"location":"designdeck/#why-is-in-memory-faster","title":"Why is in-memory faster","text":"Not necessarily because they don't need to read from disk (even a disk-based storage engine may never need to read from disk if enough memory)
Can be faster because they avoid the overhead of encoding in a form that can be written to disk
"},{"location":"designdeck/#write-and-read-amplification","title":"Write and read amplification","text":"Ratio of the amount of data written/read to the disk versus the amount of data intended to be written
"},{"location":"designdeck/#write-heavy-and-replication-type","title":"Write heavy and replication type","text":"Do not rely on single-master replication as it heavily impacts the scaling of write-heavy systems
Instead, rely on leaderless replication
Trade off: consistency is harder to guarantee
"},{"location":"designdeck/#design","title":"Design","text":""},{"location":"designdeck/#auditing","title":"Auditing","text":"Checking the integrity of data
"},{"location":"designdeck/#backward-vs-forward-compatibility","title":"Backward vs. forward compatibility","text":""},{"location":"designdeck/#bloom-filter","title":"Bloom filter","text":"Probabilistic, memory-efficient data structure for approximating the content of a set
Can tell if a key does not appear in the DB
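Illustrative Java sketch (m bits and k hash functions; the seeded hash below is a weak stand-in for k independent hash functions):
import java.util.BitSet;\n\nclass BloomFilter {\n private final BitSet bits;\n private final int m, k;\n\n BloomFilter(int m, int k) { this.bits = new BitSet(m); this.m = m; this.k = k; }\n\n void add(String key) {\n for (int i = 0; i < k; i++) bits.set(index(key, i));\n }\n\n // false: definitely absent; true: maybe present (false positives possible)\n boolean mightContain(String key) {\n for (int i = 0; i < k; i++) if (!bits.get(index(key, i))) return false;\n return true;\n }\n\n private int index(String key, int seed) {\n int h = key.hashCode() * 31 + seed;\n return (h & 0x7FFFFFFF) % m; // mask keeps the hash non-negative\n }\n}\n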
"},{"location":"designdeck/#causality","title":"Causality","text":"Causal dependency: one event causing another
Happened-before relationship
"},{"location":"designdeck/#concurrent-operations","title":"Concurrent operations","text":"Not only operations that happen at the same time but also operations made without knowing about each other
Example: - Concurrent to-do list operations with a current \"Buy milk\" item - User 1 deletes it - User 2 doesn't have an internet connection, modifies it into \"Buy soy milk\", and then is connected again => this modification may have been done one hour after user 1 deletion
"},{"location":"designdeck/#consistent-hashing","title":"Consistent hashing","text":"Special kind of hashing such that when a resize occurs, only 1/n percent of the keys need to be rebalanced (n: number of nodes)
Solutions: - Ring consistent hash with virtual nodes to improve the distribution - Jump consistent hash: faster but nodes must be numbered sequentially (e.g., if we have 3 servers foo, bar, and baz => we can't decide to remove bar)
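Illustrative Java sketch of a ring with virtual nodes (Object.hashCode stands in for a stronger hash; assumes at least one node):
import java.util.Map;\nimport java.util.TreeMap;\n\nclass HashRing {\n private final TreeMap<Integer, String> ring = new TreeMap<>();\n\n void addNode(String node, int virtualNodes) {\n for (int v = 0; v < virtualNodes; v++) {\n ring.put((node + \"#\" + v).hashCode(), node); // virtual nodes improve distribution\n }\n }\n\n String nodeFor(String key) {\n // First node clockwise from the key's position; wrap around if none\n Map.Entry<Integer, String> e = ring.ceilingEntry(key.hashCode());\n return (e != null ? e : ring.firstEntry()).getValue();\n }\n}\n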
"},{"location":"designdeck/#design-impacts-of-sharing","title":"Design impacts of sharing","text":"May decrease: - Availability - Performance - Scalability
"},{"location":"designdeck/#design-read-heavy-vs-write-heavy-impacts","title":"Design: read-heavy vs. write-heavy impacts","text":"Read heavy: - Leverage replication - Leverage denormalization
Write heavy: - Leverage partition (usually) - Leverage normalization
"},{"location":"designdeck/#different-types-of-message-failure","title":"Different types of message failure","text":"Event log: - Consumers are free to select the point of the log they want to consume messages from, which is not necessarily the head - Log is immutable, messages cannot be removed by consumers (removed by a GC running periodically)
"},{"location":"designdeck/#exactly-once-delivery","title":"Exactly-once delivery","text":"Impossible to achieve
However, we can achieve exactly-once processing using deduplication or by requiring consumers to be idempotent
"},{"location":"designdeck/#flp-impossibility","title":"FLP impossibility","text":"In an asynchronous distributed system, there's no consensus algorithm that can satisfy: - Agreement - Validity - Termination - And fault tolerance
"},{"location":"designdeck/#geohashing","title":"Geohashing","text":"Encode geographic coordinates into a short string called a cell with varying resolutions
The more letters in the string, the more precise the location
Main use case: - Proximity searches in O(1)
"},{"location":"designdeck/#hashing-definition-and-size-of-md5-and-sha256","title":"Hashing definition and size of MD5 and SHA256","text":"Map data of arbitrary size to fixed-size values
Examples: - MD5: 16 bytes - SHA256: 32 bytes
"},{"location":"designdeck/#hdfs","title":"HDFS","text":"Distributed filesystem: - Fault tolerant - Scalable - Optimised for batch operations
Architecture: - Single master (maintains filesystem metadata, informs clients about which server stores a specific part of a file) - Multiple data nodes
Leverage: - Partitioning: each file is partitioned into multiple chunks => performance - Replication => availability
Read: communicates with the master node to identify the servers containing the relevant chunks
Write: chain replication
"},{"location":"designdeck/#how-to-reduce-sharing","title":"How to reduce sharing","text":"Used to approximate cardinality of a set
Optimization for space over perfect accuracy
"},{"location":"designdeck/#backing-idea","title":"Backing idea","text":"Coin flip game: you flip a coin, if head, flip again, if tail stop
If a player reaches n flips, it means that on average, he tried 2^(n+1) times
"},{"location":"designdeck/#algo","title":"Algo","text":"For an ID, we will count how many consecutive 0 (head) bits on the left
Example: 001110 => 2
Hence, on average we should have seen 2^(2+1) = 8 visitors
Requirement: visitor IDs need to be uniform => either because the ID is randomly generated or by hashing it (if the ID is auto-incremented, for example)
Required memory: log(log(m)) with m the number of unique visitors
Problem with this algo: it depends on luck. For example, if user 00000001 connects every day => the system will always approximate 2^8 visitors
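Illustrative Java sketch of this single-counter version (the multiplication is a cheap stand-in for a real hash; real implementations use bucketing, described below):
int maxLeadingZeros = 0;\n\nvoid observe(long id) {\n long h = id * 0x9E3779B97F4A7C15L; // stand-in mixing step for a real hash\n maxLeadingZeros = Math.max(maxLeadingZeros, Long.numberOfLeadingZeros(h));\n}\n\nlong estimate() {\n return 1L << (maxLeadingZeros + 1); // ~2^(n+1) unique visitors\n}\n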
"},{"location":"designdeck/#bucketing","title":"Bucketing","text":"Distribute to multiple counters and aggregate the results (possible because each counter is very small)
If we want 4 counters, we distribute the ID based on the first 2 bits
Result: 2^((n1 + n2 + n3 + n4) / 4)
Problem: mean is highly impacted with large outliers
Solution: use harmonic mean
"},{"location":"designdeck/#idempotent","title":"Idempotent","text":"If executed more than once it has the same effect as if it was executed once
"},{"location":"designdeck/#latency-numbers-every-programmer-should-know","title":"Latency numbers every programmer should know","text":"Lock with an expiry timeout after which the lock is automatically released
May lead to situations where two nodes believe they hold the lock (for example, when the expiry signal hasn't been caught yet by the first node because of a GC or CPU throttling)
Can be solved using a fencing token
"},{"location":"designdeck/#least-loaded-endpoint-load-balancing-strategy","title":"Least loaded endpoint load balancing strategy","text":"Not efficient
A more efficient option is to randomly pick two servers and route the request to the least-loaded one of the two
"},{"location":"designdeck/#liveness-property","title":"Liveness property","text":"Something good will eventually occur
Example: leader is elected, eventual consistency
"},{"location":"designdeck/#load-balancing","title":"Load balancing","text":"Route requests across a pool of servers
"},{"location":"designdeck/#load-shedding","title":"Load shedding","text":"Action to reduce the load on something
Example: when the CPU utilization reaches a threshold, the server can start returning errors
A special form of load shedding is selective client throttling, where an application assigns different quotas to each of its clients
"},{"location":"designdeck/#locality","title":"Locality","text":"Performance optimization to put several pieces of data in the same place
"},{"location":"designdeck/#log","title":"Log","text":"Append-only, totally ordered sequence of messages
Each message is: - Appended at the end of the log - Assigned a unique sequential index
Example: Kafka
"},{"location":"designdeck/#log-compaction","title":"Log compaction","text":"Throw away duplicate keys in the log and keep only the most recent update for each key
"},{"location":"designdeck/#main-drawback-of-shared-nothing-architectures","title":"Main drawback of shared-nothing architectures","text":"Reduce flexibility
If the application needs to access to new data access patterns in an efficient way, it might be hard to provide it given the system's data have been partitioned in a specific way
Example: attempting to query by a secondary attribute that is not the partitioning key might require to access all the nodes of the system
"},{"location":"designdeck/#mapreduce","title":"MapReduce","text":"Programming model for processing large amounts of data in bulk across many machines: - Map: processes a set of key/value pairs and produces as output another set of intermediate key/value pairs. - Reduce: receives all the values for each key and returns a single value, essentially merging all the values according to some logic
"},{"location":"designdeck/#microservices-pros-and-cons","title":"Microservices: pros and cons","text":"Pros: - Organizational (each team dictates its own release schedule, etc.) - Codebase is easier to digest - Strong boundaries - Independent scaling - Independent data model
Cons: - Eventual consistency - Remote calls - Harder to operate (more complex)
"},{"location":"designdeck/#number-of-values-to-generate-to-reach-50-chances-of-collision-32-bit-64-bit-and-128-bit-hash","title":"Number of values to generate to reach 50% chances of collision: 32-bit, 64-bit, and 128-bit hash","text":"Orchestration: single central system responsible for coordinating the execution
Choreography: no need for a central coordinator, each system is aware of the previous and the next
"},{"location":"designdeck/#outbox-pattern","title":"Outbox pattern","text":"Used to update a DB and publish an event in a transactional fashion
Within a transaction, persist in the DB (insert, update or delete) and insert at the same time a new row in an event table
Then, a worker checks the event table, publishes the event, and deletes the row (at-least-once guarantee)
"},{"location":"designdeck/#perfect-hashing","title":"Perfect hashing","text":"No collision, only possible if we know the keys up front
Given k elements, the hashing function returns an int between 0 and k
"},{"location":"designdeck/#quadtree","title":"Quadtree","text":"Tree data structure where each internal node has exactly four children: NE, NW, SE, SW
Main use case: - Improve geospatial caching (e.g., 1km in an urban area isn't the same as 1km outside cities)
Source: https://engblog.yext.com/post/geolocation-caching
"},{"location":"designdeck/#rate-limiting-throttling-definition-and-algos","title":"Rate-limiting (throttling): definition and algos","text":"Mechanism that rejects a request when a specific quota is exceeded
"},{"location":"designdeck/#token-bucket-algo","title":"Token bucket algo","text":"Token of a pre-defined capacity, put back in the bucket periodically:
"},{"location":"designdeck/#leaking-bucket-algo","title":"Leaking bucket algo","text":"Uses a FIFO queue When a request arrives, checks if the queue is full: - If yes: request is dropped - If not: added to the queue => Requests pulled from the queue at regular intervals
"},{"location":"designdeck/#rebalancing","title":"Rebalancing","text":"Move data or services from one node to another in order to spread the load fairly
"},{"location":"designdeck/#rest","title":"REST","text":"Architectural style where the server exposes a set of resources
All communications must be stateless and cacheable
Relies mainly on HTTP but not mandatory
"},{"location":"designdeck/#rest-vs-grpc","title":"REST vs. gRPC","text":"REST (architectural style): - Universality - Standardization (status code, ETag, If-Match, etc.)
gRPC (RPC framework): - Contract - Binary protocol (faster, less bandwidth) // We could use HTTP/2 without gRPC and leverage binary protocols but it would require more efforts - Bidirectional
"},{"location":"designdeck/#safety-property","title":"Safety property","text":"Something bad will never happen
Example: at most one leader at any given time
"},{"location":"designdeck/#saga","title":"Saga","text":"Distributed transaction composed of a set of local transactions
Each transaction has a corresponding compensation action to undo its changes
Usually, a Saga is implemented with an orchestrator that manages the execution of the transactions and handles the compensations if needed
"},{"location":"designdeck/#scalability","title":"Scalability","text":"System's ability to cope with increased load
"},{"location":"designdeck/#scalability-ceiling","title":"Scalability ceiling","text":"Hard limit (e.g., device maximum throughput)
"},{"location":"designdeck/#shared-nothing-architectures","title":"Shared-nothing architectures","text":"Reduce coordination and contention so that every request can be processed independently by a single node or group of nodes
Increase availability, performance, and scalability
"},{"location":"designdeck/#source-of-truth","title":"Source of truth","text":"Holds the authoritative version of the data
"},{"location":"designdeck/#split-brain","title":"Split-brain","text":"Network partition => nodes unable to communicate with each other => multiple nodes believing they are the leader
As a node is unaware that another node is still functioning, it can lead to data corruption or data loss
"},{"location":"designdeck/#throughput","title":"Throughput","text":"The rate of work performed
"},{"location":"designdeck/#total-vs-partial-order","title":"Total vs. partial order","text":"Total order: a binary relation that can be used to compare any 2 elements of a set with each other
Partial order: a binary relation that can be used to compare only some of the elements of a set with each other
Total ordering in distributed systems is rarely mandatory
"},{"location":"designdeck/#uuid","title":"UUID","text":"128-bit number
Collision probability: after generating 1 billion UUIDs every second for ~100 years, the probability of creating a single duplicate reaches 50%
"},{"location":"designdeck/#validation-vs-verification","title":"Validation vs. verification","text":"Validation: process of analyzing the parts of the system and building mental models that reflects the interaction of those parts
Example: validate the quality of water by inspecting all the pipes and infrastructure to capture, clean and deliver water
Verification: process of analyzing output at a system boundary
Example: verify the quality of water by testing the water (output) coming from a sink
"},{"location":"designdeck/#vector-clock","title":"Vector clock","text":"Algorithm that generates partial ordering of events and detects causality violation
"},{"location":"designdeck/#why-asynchronous-communication","title":"Why asynchronous communication","text":"Reduce temporal coupling (not connected at the same time) => processes execute at independent rates, without blocking the sender
Also suited when the interaction pattern isn't request/response with the client blocking until it receives the response
"},{"location":"designdeck/#http","title":"HTTP","text":""},{"location":"designdeck/#301-vs-302","title":"301 vs. 302","text":"301: redirect permanently
302: redirect temporarily
"},{"location":"designdeck/#403-or-404","title":"403 or 404?","text":"Retuning 403 can leak existence of a resource
Example: Apple is secretly working on super cars and creates an internal GET https://apple.com/supercar endpoint
Returning 403 means the user doesn't have the rights to access the resource, but leaks the existence of /supercar
"},{"location":"designdeck/#cookie","title":"Cookie","text":"Small files stored on a user's computer to hold specific data (e.g., language preference)
Requests made by the browser will contain cookies data
Types of cookies: - Session cookies: only lasts for the duration of a session - Persistent cookies: outlast user session - Third-party cookies: used for advertising
"},{"location":"designdeck/#four-main-http2-features","title":"Four main HTTP/2 features","text":"HTTP live streaming: video streaming protocol
"},{"location":"designdeck/#http_1","title":"HTTP","text":"Request/response protocol used to encode and transport information between a client and a server Stateless (each request is executed independently)
The request and the response are 2 standard message types exchanged in a single HTTP transaction - Request: method, URL, HTTP version, headers, body - Response: HTTP version, status, reason, headers, body
Example of a POST request:
POST https://example.com HTTP/1.0\nHost: example.com\nUser-Agent: Mozilla/4.0\nContent-Length: 5\n\nHello\n
Application layer protocol (OSI level 7)
Relies on a transport protocol (OSI level 4, TCP most of the time but not mandatory) for error detection, flow control, reliability, etc.
"},{"location":"designdeck/#http-cache-control-header","title":"HTTP cache-control header","text":"Allows setting how long to cache a response
Part of the response header (hence, cached by the browser) but can be part of the request header too (hence, cached on server side)
If the response is marked as private, the results are intended for a single user (then won't be cached by a load balancer, for example)
"},{"location":"designdeck/#http-etag","title":"HTTP Etag","text":"Entity tag header that allows clients to make conditional requests
Server returns an ETag identifying the version of a resource (e.g., based on the date and time of its last update)
Client sends an If-Match header to update a resource only if it has the most recent version
"},{"location":"designdeck/#http-keep-alive","title":"HTTP keep-alive","text":"Maintain a persistent TCP connection (reduces the number of TCP and HTTPS handshakes)
"},{"location":"designdeck/#http-methods-safeness-and-idempotence","title":"HTTP methods: safeness and idempotence","text":"Doesn't have any visible side effects and can be cached
"},{"location":"designdeck/#http-status-code-429","title":"HTTP status code 429","text":"When clients are throttled, the most common way is to return a 429 (Too Many Requests)
The response can also include a Retry-After header indicating how long to wait before making a new request (in seconds)
"},{"location":"designdeck/#http-status-codes","title":"HTTP status codes","text":"Source: https://github.com/alex/what-happens-when
"},{"location":"designdeck/#kafka","title":"Kafka","text":""},{"location":"designdeck/#consumer-types","title":"Consumer types","text":"Without consumer group: each consumer will receive all the messages in a topic
With consumer group: each consumer will receive a subset of the messages
Each consumer is assigned to multiple partitions (zero to many)
A partition is always assigned to only one consumer
If there are more consumers than partitions, some consumers will not be assigned to any partition (scalability ceiling)
"},{"location":"designdeck/#durabilityavailability-and-latencythroughput-tradeoffs","title":"Durability/availability and latency/throughput tradeoffs","text":"Source: https://developers.redhat.com/articles/2022/05/03/fine-tune-kafka-performance-kafka-optimization-theorem#kafka_priorities_and_the_cap_theorem
"},{"location":"designdeck/#log-compaction_1","title":"Log compaction","text":"Log compaction is a mechanism to give per-record retention to a topic
It ensures that Kafka will always retain at least the last message for each key of a given partition
A partition that is not yet compacted may have more than one message with the same key
Property: - retention.ms: maximum time the topic will retain old log segments before deleting or compacting them (default: 7 days)
For low-throughput topics (topics whose segments are rolled because of segment.ms rather than segment.bytes), we should ensure that segment.ms is lower than retention.ms
"},{"location":"designdeck/#offset","title":"Offset","text":"A strictly increasing identifier per partition
"},{"location":"designdeck/#partition","title":"Partition","text":"Topics are divided into partitions
A partition is an ordered, immutable log of messages
No guaranteed ordering per topic with multiple partitions
Yet, the ordering is guaranteed per partition
"},{"location":"designdeck/#partition-distribution","title":"Partition distribution","text":"The client implements a partitioner based on the key (e.g., hash(key) % number of partitions)
This is not done on Kafka's side
If key is empty: round-robin
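Illustrative Java sketch (the real default partitioner uses murmur2 and sticky batching; Arrays.hashCode and the random pick are stand-ins):
import java.util.Arrays;\nimport java.util.concurrent.ThreadLocalRandom;\n\nint partitionFor(byte[] key, int numPartitions) {\n if (key == null) {\n return ThreadLocalRandom.current().nextInt(numPartitions); // stand-in for round-robin\n }\n return (Arrays.hashCode(key) & 0x7FFFFFFF) % numPartitions; // mask keeps the hash non-negative\n}\n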
"},{"location":"designdeck/#rebalancing_1","title":"Rebalancing","text":"Not possible to decrease the number of partitions: topic has to be recreated
Possible to increase the number of partitions
Possible issue: no more guaranteed ordering as one key may be assigned to a different partition
"},{"location":"designdeck/#segment","title":"Segment","text":"Each partition is divided into segments
Instead of storing all the messages of a partition in a single file, Kafka splits them into chunks called segments. A log segment is a file identified by the first message offset it contains
Properties: - segment.bytes: maximum segment file size before creating a new segment (default: 1GB) - segment.ms: period after which a new segment is created, even if the segment is not full (default: 7 days)
Distribute messages
All the consumers from one consumer group receive a portion of the messages
One partition is assigned to one consumer, one consumer can listen to multiple partitions
"},{"location":"designdeck/#math","title":"Math","text":""},{"location":"designdeck/#associative-property","title":"Associative property","text":"A binary operation is associative if rearranging the parentheses in an expression will not change the result
Example: + is associative; e.g., (2 + 3) + 4 = 2 + (3 + 4)
"},{"location":"designdeck/#commutative-property","title":"Commutative property","text":"A binary operation is commutative if changing the order of the operands doesn't change the result - Example: + is commutative, / isn't commutative
"},{"location":"designdeck/#expected-value","title":"Expected value","text":"E[X] = p1 * x1 + ... + pn * xn (p1: probability of x1, e.g., 0.5)
"},{"location":"designdeck/#harmonic-mean","title":"Harmonic mean","text":"n / (1/x1 + ... + 1/xn) - Less sensitive to large outliers
"},{"location":"designdeck/#network","title":"Network","text":""},{"location":"designdeck/#arp-protocol","title":"ARP protocol","text":"Map an IP address to a MAC address
"},{"location":"designdeck/#average-connection-speed-in-usa","title":"Average connection speed in USA","text":"42 Mbps
"},{"location":"designdeck/#backpressure","title":"Backpressure","text":"A node limits its own rate of sending in order to avoid overloading. Queueing is done on the sender side.
Also known as flow control
Example: TCP flow control
"},{"location":"designdeck/#bandwidth","title":"Bandwidth","text":"Maximum amount of data that can be transferred in a unit of time
"},{"location":"designdeck/#bgp","title":"BGP","text":"Border Gateway Protocol: Routing system of the internet
When a client submits data via the Internet, BGP is responsible for looking at all of the available paths that data could travel and picking the best route
Note: The chosen route isn't necessarily the fastest one, it can be the cheapest one. See https://technology.riotgames.com/news/fixing-internet-real-time-applications-part-i.
"},{"location":"designdeck/#cors","title":"CORS","text":"Cross-origin resource sharing
Mechanism to allow restricted resources on a page to be requested from another domain outside the domain from which the resource was served
It extends and adds flexibility to SOP (Same-Origin Policy, same domain)
Example: User visits A and the page attempts to fetch data from B: 1. Browser sends a GET request to B with Origin header A 2. Server may respond with: - Access-Control-Allow-Origin (ACAO) header set to the domain A - ACAO set to a wildcard (*) indicating that the requests from all domains are allowed - An error if the server does not allow a cross-origin request
"},{"location":"designdeck/#difference-ping-heartbeat","title":"Difference ping & heartbeat","text":"Ping: sends messages to a process and expects a response within a specified time period (request-reply)
Heartbeat: a process is actively notifying its peers that it's still running by sending a message (notification)
"},{"location":"designdeck/#difference-tcp-udp","title":"Difference TCP & UDP","text":"A view is just an abstraction (SQL request is rewritten to match the actual schema)
A materialized view is a copy (written to disk)
"},{"location":"designdeck/#dns","title":"DNS","text":"Domain Name System: automatic translation between a name and an IP address
Notes: - Usually the local DNS configuration is the ISP one (config initialized from the router or static config) - The browser, the OS and the DNS resolver all use caches internally - A TTL is used to inform the cache how long the entry is valid
"},{"location":"designdeck/#dns-lookup-push-or-pull","title":"DNS lookup: push or pull","text":"DNS is based on the pull mode: - If record is present: DNS will return it - If record isn't present: DNS will pull the value, store it, and then return it
Notes: - New DNS records are visible immediately (nothing is cached yet) - DNS updates are slow because of TTLs (there is no push propagation; we wait for cached records to expire)
"},{"location":"designdeck/#health-checks-passive-vs-active","title":"Health checks: passive vs. active","text":"Passive: performed by the load balancer as it routes incoming requests (e.g., 503)
Active: the load balancer actively checking the health of the servers via a query to their health endpoint
"},{"location":"designdeck/#internet-model","title":"Internet model","text":"A network of networks
"},{"location":"designdeck/#layer-4-vs-layer-7-load-balancer","title":"Layer 4 vs. layer 7 load balancer","text":"Layer 4 is faster and requires less computing resources than layer 7 is but less flexible
Layer 4: look at the info at the transport layer to distribute the requests (source, destination, port)
Forward packet using NAT
Layer 7: look at the info at the application layer to distribute the requests (header, message, etc.)
Terminate the network traffic, read then open a connection to the target server
A layer 7 can de-multiplex individual HTTP requests where multiple concurrent streams are multiplexed on the same TCP connection
"},{"location":"designdeck/#mac-address","title":"MAC address","text":"A unique identifier assigned to a network interface
"},{"location":"designdeck/#max-size-of-a-tcp-packet","title":"Max size of a TCP packet","text":"64K
"},{"location":"designdeck/#mqtt-lwt","title":"MQTT LWT","text":"Last Will and Testament
Whenever a client is marked as disconnected (proper disconnection or heartbeat failure), it triggers sending a message to a particular topic
"},{"location":"designdeck/#ntp","title":"NTP","text":"Network Time Protocol: used to synchronize clocks
"},{"location":"designdeck/#osi-model","title":"OSI model","text":"7 layers: 1. Physical: transmission of raw bits over a physical link (e.g., USB, Bluetooth) 2. Data link: responsible from moving a packet of data from one node to a neighbouring node 3. Network: provides a way of sending packets between nodes that are not directly linked and might belong to other networks (e.g., IP, iptables routing) 4. Transport: application to application communication, based on ports when multiple applications on the same node wants to communicate (e.g., TCP, UDP) 5. Session 6. Presentation 7. Application: protocol of exchanges between the two sides (e.g., DNS, HTTP)
"},{"location":"designdeck/#routers","title":"Routers","text":"A way to connect networks that are connected with each other (used for the Internet)
Capable of routing packets properly across networks so that they reach their destination successfully
Based on the fact that an IP has a network prefix
"},{"location":"designdeck/#routers-buffering","title":"Routers buffering","text":"Routers use queuing (buffering) to address network congestion
A buffer has a fixed size and a fixed number of packets
If no available buffer: packet is dropped
Note: not a way to increase the throughput
"},{"location":"designdeck/#routers-processing","title":"Routers processing","text":"Per-packet processing, no buffering
Impacts: - It's faster to route 10 packets of 1000 bytes than 20 packets of 500 bytes - Sending small packets more frequently can fill the router buffer more quickly
Source: https://technology.riotgames.com/news/fixing-internet-real-time-applications-part-i
"},{"location":"designdeck/#routing-table","title":"Routing table","text":"Example:
Destination | Network mask | Gateway | Interface\n0.0.0.0 | 0.0.0.0 | 240.1.1.3 | if1\n240.1.1.0 | 255.255.255.0 | 0.0.0.0 | if1"},{"location":"designdeck/#service-mesh","title":"Service mesh","text":"All network traffic from a client goes through a process co-located on the same machine (sidecar)
Used to facilitate service-to-service communications
"},{"location":"designdeck/#switch","title":"Switch","text":"Receive frame and forward to specific links they are addressed to. Used for local networks.
Example: Ethernet frame
To do this, the switch maintains a switch table that maps MAC addresses to the corresponding interfaces that lead to them
At first, the switch table is empty. If the entry is empty, a frame is forwarded to all the interfaces (switches are self-learning)
"},{"location":"designdeck/#tcp-congestion-control","title":"TCP congestion control","text":"Determine dynamically the throughput (the number of segments that can be sent without an ack): - Increase exponentially for every segment ack - Decrease with a missed ack
Upon a new connection, the size of the window is set to a system default
It's one of the reasons why reusing a TCP connection leads to a performance increase
"},{"location":"designdeck/#tcp-connection-backlog","title":"TCP connection backlog","text":"SYN requests are queued before being accepted by a user-mode process
When there are too many requests for the process, the backlog reaches a limit and SYN packets are dropped (to be later retransmitted by the client)
"},{"location":"designdeck/#tcp-flow-control","title":"TCP flow control","text":"A receiver communicates back to the sender the size of the buffer when acknowledging a segment
Backpressure mechanism
"},{"location":"designdeck/#tcp-handshake","title":"TCP handshake","text":"3-way handshake - syn (sender to receiver) - syn-ack (receiver to sender) // ack the segment number received - ack (sender to receiver) // ack the segment number received
"},{"location":"designdeck/#websocket","title":"Websocket","text":"Communication protocol (layer 7) provides a full-duplex communication channel over a single TCP connection and bidirectional streaming capabilities
Different from HTTP but compatible with HTTP (starts as an HTTP connection, which is then upgraded via a well-defined handshake to the WebSocket protocol over the same TCP connection)
Obsolete with HTTP/2
"},{"location":"designdeck/#why-cant-we-rely-on-the-system-clock-in-distributed-systems","title":"Why can't we rely on the system clock in distributed systems?","text":"Provides guaranteed fault isolation by design
Based on the idea of partitioning a shared resource to isolate failures
"},{"location":"designdeck/#cascading-failure","title":"Cascading failure","text":"A process in a system of interconnected parts in which the failure of one or few parts can trigger the failure of other parts and so on
"},{"location":"designdeck/#causal-consistency-implementation","title":"Causal consistency implementation","text":"When a replica receives a new write, it doesn't apply it locally immediately. First, it checks whether the write's dependencies have been committed locally. If not, it waits until the required version appears.
"},{"location":"designdeck/#circuit-breaker","title":"Circuit breaker","text":"Used to prevent a network or service failure from cascading to other failures
Implemented on the client-side
Three states: - Closed: accept requests - Open: do not accept requests and fail immediately - Half-open: give the service another chance (can also be implemented using a probe)
The circuit can be opened when the health endpoint of the service is down or when the number of consecutive errors reaches a threshold
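Illustrative Java sketch of the three states (threshold and cool-down values assumed):
enum State { CLOSED, OPEN, HALF_OPEN }\n\nclass CircuitBreaker {\n private State state = State.CLOSED;\n private int failures = 0;\n private long openedAt = 0;\n private final int threshold = 5;\n private final long coolDownMillis = 30_000;\n\n synchronized boolean allowRequest() {\n if (state == State.OPEN && System.currentTimeMillis() - openedAt > coolDownMillis) {\n state = State.HALF_OPEN; // give the service another chance\n }\n return state != State.OPEN; // OPEN: fail immediately\n }\n\n synchronized void onSuccess() { failures = 0; state = State.CLOSED; }\n\n synchronized void onFailure() {\n failures++;\n if (state == State.HALF_OPEN || failures >= threshold) {\n state = State.OPEN;\n openedAt = System.currentTimeMillis();\n }\n }\n}\n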
"},{"location":"designdeck/#exponential-backoff","title":"Exponential backoff","text":"Wait time increased exponentially after every retry attempt
"},{"location":"designdeck/#fault-tolerance","title":"Fault tolerance","text":"Property of a system that can continue operating correctly in the presence of failure of its components
"},{"location":"designdeck/#jitter","title":"Jitter","text":"Introduces a part of randomness to avoid synchronized retry spikes experienced during cascading failures
"},{"location":"designdeck/#knee-point","title":"Knee point","text":"Moment when linear scalability is not possible anymore
"},{"location":"designdeck/#phi-accrual-failure-detector","title":"Phi-accrual failure detector","text":"Instead of treating failure node failure as a binary problem (up or down), a phi-accrual failure detector has a continuous scale, capturing the probability of the monitored process's crash
Works by maintaining a sliding window, collecting arrival times of the most recent heartbeats
Used to approximate the arrival time of the next heartbeat and compute a suspicion level (how certain the failure detector is about a failure)
"},{"location":"designdeck/#retry-amplification","title":"Retry amplification","text":"Having retries at multiple levels of the dependency chain can amplify the number of retry
The deeper a service in the chain, the higher the load it will be exposed to due to amplification:
In case of a long dependency chain, perhaps we should only retry at a single level of the chain
"},{"location":"designdeck/#security","title":"Security","text":""},{"location":"designdeck/#authentication","title":"Authentication","text":"Process of determining whether someone or something is who or what it declares itself to be
"},{"location":"designdeck/#certificate-authorities","title":"Certificate authorities","text":"Organizations issuing certificates by signing them
"},{"location":"designdeck/#cipher","title":"Cipher","text":"Encryption algorithm
"},{"location":"designdeck/#confidentiality","title":"Confidentiality","text":"Process of protecting information from being accessed by unauthorized parties
Mainly achieved via encryption
"},{"location":"designdeck/#integrity","title":"Integrity","text":"The process of preserving the accuracy and completeness of data over its entire lifecycle, so that they cannot be modified in an unauthorized or undetected manner
"},{"location":"designdeck/#mutual-tls","title":"Mutual TLS","text":"Add client authentication using a certificate
"},{"location":"designdeck/#oauth-2","title":"OAuth 2","text":"Standard for access delegation
Process: - Client gets a token from an authorization server - Makes a request to a server using the token - Server validates the token against the authorization server
Notes: some token types like JWT are self-contained, meaning the validation can be done by the server without a call to the authorization server
"},{"location":"designdeck/#public-key-infrastructure-pki","title":"Public key infrastructure (PKI)","text":"System for managing, storing, and distributing certificates
Relies on certificate revocation lists (CRLs)
"},{"location":"designdeck/#tls-handshake","title":"TLS handshake","text":"With mutual TLS:
One way: the session key is generated by the client
"},{"location":"designdeck/#two-main-uses-of-encryption","title":"Two main uses of encryption","text":"Encryption in transit
Encryption at rest
"},{"location":"designdeck/#two-types-of-encryption","title":"Two types of encryption","text":"Symmetric: key is shared between a client and a server (faster)
Asymmetric: two keys are used, a private and a public one - Client encrypts a message with the public key - Server decrypts the message with its private key
"},{"location":"designdeck/#what-does-digital-signature-provide","title":"What does digital signature provide","text":"Integrity and authentication
"},{"location":"designdeck/#what-does-tls-provide","title":"What does TLS provide?","text":"