From 274df77ef79340cada6bd849cac05c03b0e3a3a2 Mon Sep 17 00:00:00 2001
From: David Marx <david.marx84@gmail.com>
Date: Wed, 20 Mar 2024 23:42:39 -0700
Subject: [PATCH] Create thinking_fast_and_slow.md

---
 thinking_fast_and_slow.md | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
 create mode 100644 thinking_fast_and_slow.md

diff --git a/thinking_fast_and_slow.md b/thinking_fast_and_slow.md
new file mode 100644
index 000000000..4ce6a4e4e
--- /dev/null
+++ b/thinking_fast_and_slow.md
@@ -0,0 +1,21 @@
+# thinking fast and slow
+
+labels: experimental
+
+weights = W
+
+decompose W as W = W1 + W2, s.t. W1 and W2 both have the same shape as W
+
+set a (alpha) to be a mixing rate, which starts at zero.
+
+Learn W as W = W1 + a * W2, increasing `a` throughout training in proportion to lr
+
+W2 are the "slow" weights and are learned conventionally
+
+W1 is learned through a hyperlora parameterization; these are our "fast" weights.
+
+let W1 = VZ where V is a learnable vector and Z is a fixed, randomly initialized orthonormal matrix (i.e. random projections)
+
+the "slow" weights are essentially a residual.
+
+we could "stack" residuals if we wanted higher-order granularity
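
Note on the patch above: a minimal sketch of the proposed parameterization, W = W1 + a * W2 with W1 = VZ, in plain Python. It sits after the end of the hunk and is not part of the added file. Several details are assumptions that the notes leave open: Z is taken to have k orthonormal rows of length m*n, the product VZ is reshaped to the shape of W, and `a` grows by `rate * lr` per step up to a cap. All function names and the `rate` and `cap` parameters are hypothetical.

```python
import math
import random


def random_orthonormal_rows(k, d, seed=0):
    """Return k orthonormal rows of length d (Gram-Schmidt on Gaussian draws)."""
    if k > d:
        raise ValueError("need k <= d for orthonormal rows")
    rng = random.Random(seed)
    rows = []
    while len(rows) < k:
        r = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for q in rows:  # remove the components along the rows found so far
            dot = sum(x * y for x, y in zip(r, q))
            r = [x - dot * y for x, y in zip(r, q)]
        norm = math.sqrt(sum(x * x for x in r))
        if norm > 1e-8:  # skip the (unlikely) degenerate draw
            rows.append([x / norm for x in r])
    return rows


def fast_weights(v, z, shape):
    """W1 = VZ, reshaped to `shape` (the reshape is an assumption)."""
    m, n = shape
    flat = [sum(v[i] * z[i][j] for i in range(len(v))) for j in range(m * n)]
    return [flat[r * n:(r + 1) * n] for r in range(m)]


def effective_weights(v, z, w2, a):
    """W = W1 + a * W2, with W2 stored as a list of rows."""
    shape = (len(w2), len(w2[0]))
    w1 = fast_weights(v, z, shape)
    return [[w1[r][c] + a * w2[r][c] for c in range(shape[1])]
            for r in range(shape[0])]


def advance_alpha(a, lr, rate=1.0, cap=1.0):
    """Increase `a` by rate * lr per step, capped (hypothetical schedule)."""
    return min(cap, a + rate * lr)
```

With `a = 0` the effective weights reduce to the fast weights W1 alone, so early training would only move the k entries of V. The slow weights W2 contribute more as `a` grows.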