We propose a novel framework CipherChat to systematically examine the generalizability of safety alignment to non-natural languages -- ciphers.
Paper and LLMs
1 AI tools found
We propose a novel framework CipherChat to systematically examine the generalizability of safety alignment to non-natural languages -- ciphers.