• MajinBlayze@lemmy.world
    English
    15 hours ago

    I skimmed the paper, and it seems pretty cool. I’m not sure I quite follow the “diffusion model-based architecture” it mentioned, but it sounds interesting.

    • FatCrab@slrpnk.net
      English
      7 hours ago

      Diffusion models are trained to iteratively turn noise spread across a space into structured forms. That contrasts with, say, a GPT, which basically performs autoregressive token prediction, one token at a time in sequence. They’re totally different models, both in structure and in mode of operation. Diffusion models are actually pretty incredible imo, and I think we’re just beginning to scratch the surface of their power. A very fundamental part of most modes of cognition is converting the noise of unstructured multimodal signal data into something with form and intention, so being able to do this with a model, even if only in very, very narrow domains right now, is a pretty massive leap forward.
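
To make the contrast concrete, here's a rough toy sketch in Python. It is not how any real model is implemented: `toy_denoise`, `toy_autoregress`, and the "oracle" that already knows the target are all made-up stand-ins. A real diffusion model would use a trained network to estimate what to remove at each step, and a real GPT would use a trained network to pick the next token. The only point is the shape of the two loops: one refines the whole sample at once over many steps, the other appends one element at a time.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Start from pure noise and refine every position at once, step by step.

    ASSUMPTION: the known `target` stands in for a trained denoising network.
    A real diffusion model never sees the answer; it learns to estimate it.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # unstructured noise
    for step in range(steps):
        remaining = steps - step
        # Move the whole sample a fraction of the way toward the estimate.
        x = [xi + (ti - xi) / remaining for xi, ti in zip(x, target)]
    return x


def toy_autoregress(next_token, prompt, n_new):
    """Emit one token at a time, each conditioned on everything so far.

    ASSUMPTION: `next_token` is a hypothetical callable standing in for a
    trained language model's next-token choice.
    """
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(next_token(seq))
    return seq


if __name__ == "__main__":
    # Whole vector sharpens together over 20 passes.
    print(toy_denoise([1.0, -2.0, 0.5], steps=20))
    # Sequence grows left to right, one element per call.
    print(toy_autoregress(lambda s: s[-1] + 1, [0], 4))
```

In the first loop every position is touched on every pass, so the output goes from noise to form gradually and globally. In the second, earlier tokens are fixed once emitted and only the next position is decided.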