They're exaggerating the problem. You can get a user's IP and user-agent string, but only in a vacuum, not linked to a username or anything else. And even if you could link them, this wouldn't be much of a problem, because this information gets passed to basically everyone and doesn't reveal much (only a rough location, the nearest big city in the worst case, and what browser and operating system you're using). Comparing this to email tracking pixels is a misleading comparison, because these can be connected to a single person (the recipient of the email), making the information more valuable (and adding another layer of information, such as the time the email was first opened).
It doesn't give up the link between IP and user on its own, but it could be a component in that. You could deanonymize people by correlating image requests with vote data. That is, if only one user and one IP have both viewed an inline image in and voted on a given set of posts, then the IP and the user probably go together.
EDIT: That being said, I doubt that it's that much worse than ordinary, non-inline images in that regard, unless users are voting on images without actually viewing them.
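To make the vote-correlation idea concrete, here's a rough sketch in Python. It assumes the attacker has two logs: which IPs fetched the inline image for each post, and which users voted on each post. The function name `correlate`, the log format, and the sample IPs and usernames are all made up for illustration; this isn't taken from Lemmy's code or API.

```python
from collections import defaultdict


def correlate(image_hits, votes):
    """Guess user -> IP pairs from two hypothetical logs.

    image_hits: iterable of (post_id, ip) from the image host's access log.
    votes: iterable of (post_id, user) from vote data.
    """
    ips_by_post = defaultdict(set)
    for post_id, ip in image_hits:
        ips_by_post[post_id].add(ip)

    posts_by_user = defaultdict(set)
    for post_id, user in votes:
        posts_by_user[user].add(post_id)

    # For each user, keep only the IPs that fetched the image on every
    # post that user voted on.
    candidates = {}
    for user, posts in posts_by_user.items():
        ip_sets = [ips_by_post.get(post_id, set()) for post_id in posts]
        candidates[user] = set.intersection(*ip_sets)

    # Count how many users each uniquely matched IP points at.
    users_per_ip = defaultdict(int)
    for ips in candidates.values():
        if len(ips) == 1:
            users_per_ip[next(iter(ips))] += 1

    # Only report a pair when one user and one IP line up unambiguously.
    matches = {}
    for user, ips in candidates.items():
        if len(ips) == 1:
            ip = next(iter(ips))
            if users_per_ip[ip] == 1:
                matches[user] = ip
    return matches


image_hits = [
    ("p1", "203.0.113.5"), ("p1", "198.51.100.7"),
    ("p2", "203.0.113.5"), ("p2", "192.0.2.9"),
    ("p3", "198.51.100.7"), ("p3", "192.0.2.9"),
]
votes = [("p1", "alice"), ("p2", "alice"), ("p3", "bob")]
print(correlate(image_hits, votes))  # {'alice': '203.0.113.5'}
```

In this toy data, alice voted on two posts and only one IP fetched the image on both, so she gets matched. bob voted on a single post that two IPs viewed, so he stays ambiguous. The more posts a user votes on, the smaller the candidate set gets.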
Comparing this to email tracking pixels is a misleading comparison, because these can be connected to a single person (the recipient of the email), making the information more valuable...
Lemmy private messages are directly comparable to the individual-recipient scenario of email, and they support tracking images like this.