From 274df77ef79340cada6bd849cac05c03b0e3a3a2 Mon Sep 17 00:00:00 2001
From: David Marx
Date: Wed, 20 Mar 2024 23:42:39 -0700
Subject: [PATCH] Create thinking_fast_and_slow.md

---
 thinking_fast_and_slow.md | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
 create mode 100644 thinking_fast_and_slow.md

diff --git a/thinking_fast_and_slow.md b/thinking_fast_and_slow.md
new file mode 100644
index 000000000..4ce6a4e4e
--- /dev/null
+++ b/thinking_fast_and_slow.md
@@ -0,0 +1,21 @@
+# thinking fast and slow
+
+labels: experimental
+
+weights = W
+
+decompose W into W = W1 + W2 s.t. W1 and W2 have same dimension
+
+set a (alpha) to be a mixing rate, which starts at zero.
+
+Learn W as W = W1 + a * W2, increasing `a` throughout training in proportion to lr
+
+W2 are the "slow" weights and are learned conventionally
+
+W1 will be learned parameterized as a hyperlora, and so are our "fast" weights.
+
+let W1 = VZ where V is a learnable vector and Z is a fixed, randomly initialized orthonormal matrix (i.e. random projections)
+
+the "slow" weights are essentially a residual.
+
+we could "stack" residuals if we wanted higher-order granularity
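Reviewer sketch (not part of the patch): the note describes W = W1 + a * W2 with W1 = VZ, which could be prototyped roughly as below. This is a minimal NumPy sketch under several assumptions that the note does not state: W1 is formed by reshaping the vector `v @ z` to the shape of W2, `z` has orthonormal rows obtained from a QR factorization, `a` grows by `ramp * lr` per step and is capped at 1.0, and the task is a toy linear regression. The names `make_projection`, `effective_weight`, `train`, and `ramp` are hypothetical.

```python
import numpy as np


def make_projection(k, d, rng):
    """Return a fixed (k, d) matrix Z with orthonormal rows; requires k <= d."""
    q, _ = np.linalg.qr(rng.standard_normal((d, k)))  # q: (d, k), orthonormal columns
    return q.T


def effective_weight(v, z, w2, a):
    """W = W1 + a * W2, with W1 = v @ z reshaped to W2's shape (assumed reading of VZ)."""
    return (v @ z).reshape(w2.shape) + a * w2


def train(steps=200, lr=0.1, ramp=0.1, k=5, seed=0):
    """Toy regression: gradient descent on v (fast) and w2 (slow) while a ramps up."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((64, 4))
    y = x @ rng.standard_normal((4, 3))      # noiseless toy target
    z = make_projection(k, 4 * 3, rng)       # fixed, never updated
    v = np.zeros(k)                          # learnable "fast" parameters
    w2 = np.zeros((4, 3))                    # learnable "slow" weights
    a = 0.0                                  # mixing rate starts at zero
    losses = []
    for _ in range(steps):
        err = x @ effective_weight(v, z, w2, a) - y
        losses.append(0.5 * np.mean(np.sum(err ** 2, axis=1)))
        g = x.T @ err / len(x)               # dL/dW
        v = v - lr * (z @ g.ravel())         # dL/dv = Z vec(dL/dW)
        w2 = w2 - lr * a * g                 # dL/dW2 = a * dL/dW
        a = min(1.0, a + ramp * lr)          # assumed schedule: grow with lr, cap at 1
    return losses, a
```

Calling `losses, a = train()` returns the per-step loss and the final mixing rate. Because `a` starts at zero, only the low-dimensional `v` receives gradient on the first step, and `w2` picks up the residual as `a` grows, which matches the note's description of the slow weights as a residual.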