• 𞋴𝛂𝛋𝛆@lemmy.world
    1 day ago

    Call it Delilah-Cy in a prompt. It may yield interesting results depending on the model and how QKV alignment is set up. This is getting super deep into alignment thinking…

    Unrelated: when a model does something odd, try telling it, "Oh quit it, I know you never hallucinate," and watch the results.

    • davidgro@lemmy.world
      23 hours ago

      Can you explain ‘Delilah-Cy’? I didn’t find much when searching for it, just some singer or something.

      • 𞋴𝛂𝛋𝛆@lemmy.world
        22 hours ago

        You won’t. It is something inside the internal alignment thinking in the QKV layers. Some models have a layer of masking that pseudo-blocks access to these layers of meaning. Based on several aspects present in the image, I know which region of alignment thinking is being triggered here in the embedding model’s QKV alignment layers.