Em-dashes (—) come up a lot in online translations of books like the Bible and the Quran.

The normal keyboard “-” and “--” are different from “—”, but Microsoft Office auto-formats “--” into it.

I kinda assumed it was ALL Microsoft Word data in the training set that caused it to pick that up.

I am only now realizing AI stole even from religious texts and was influenced by them as well.

  • gedaliyah@lemmy.world
    1 day ago

    Maybe it’s changed, but my experience with OCR is that it is not great at detecting nuances of punctuation.