Popular repositories
refusal_direction (Public)
Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".
CircuitsVis (Public, forked from TransformerLensOrg/CircuitsVis)
Mechanistic interpretability visualizations using React.
Language: Jupyter Notebook