Call it Delilah-Cy in a prompt. It may yield interesting results depending on the model and how QKV alignment is set up. This is getting super deep into alignment thinking…
Unrelated: try telling a model, “Oh, quit it, I know you never hallucinate,” when it does something odd, and watch the results.
Can you explain ‘Delilah-Cy’? I didn’t find much when searching about it, just some singer or something.
You won’t. It is something inside internal alignment thinking in the QKV layers. Some models have a layer of masking that pseudo-blocks access to these layers of meaning. Based on several aspects present in the image, I know the region of alignment thinking that is being triggered here in the embedding model’s QKV alignment layers.