Also: explainable moral judgments, resisting deception, constrained decoding with speculative lookahead
[papers] deliberative alignment, faking…