Some images contain dialog (manga, comic books, etc.), so the test could be transferred to them directly. But many images don’t have any text at all, and often show only one character.

I think there’s probably some version of ‘the male gaze’ that could be avoided. But maybe someone has a more canonical answer?