(CW: literary prescriptivism, for some definition of ‘literary’)
Adding dry captions(*) to incidental images on Facebook/Tumblr/etc. posts (or: alt text to images on web pages) is as close to a definite ceteris paribus improvement as I can think of. Readers using screenreading software or its accessibility-oriented ilk to browse the web are able to read [sic] those captions, aiding their consumption of the text. For “typical” readers (i.e. those without visual impairments, browsing the web without additional software assistance) the additional text hardly poses a nuisance: in the case of alt text, they don’t even engage with it unless they specifically go looking for it, whilst suitably demarcated image captions are easy to skim past.
A good caption is also generally a good inferential bridge, conveying most of the context the image provides. Sure, a text description of an image might not produce the exact same mental-emotional experience (“affect”, I believe the kids are calling it) as the image itself, but if “same mental-emotional experience” were an end goal we’d be done for anyway, given that any such experience is subjective.
Take, for instance, the screenshot above. The caption swiftly communicates the most important emotional aspects of the image, both the implicit (it’s a couple; they’re happy) and the explicit (they are facing the camera; her chin rests on his shoulder). This gives a good sense of the ‘value add’ of the image: it’s plenty to go on whether you’re making sense of someone else’s comment on the image (“they are the cutest thing” / “I love their expressions”), or just interested in how it complements the piece.