
Why would a UTF-8 MySQL backup contain invalid UTF-8 characters?

I’ve been running into several problems restoring MySQL backups. Specifically, the backups come from a different environment than the one I’m working in, so I’m forced to remove the superuser commands they contain.

The problem is that when I try to remove those commands, I constantly get UTF-8 encoding errors because the files contain loads of invalid byte sequences.
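One way to avoid the decode errors entirely is to never decode the file: open the dump in binary mode and strip the statements with byte-level regular expressions, so whatever bytes are in the data pass through untouched. The sketch below assumes that the "superuser commands" are `SET @@GLOBAL.GTID_PURGED` / `SET @@SESSION.SQL_LOG_BIN` lines and `DEFINER=` clauses; `strip_superuser` and `clean_dump` are hypothetical helper names, not part of any MySQL tool.

```python
import re

# ASSUMPTION: these are the privileged statements to remove; adjust for your dump.
# Patterns are bytes, so lines are never decoded and invalid UTF-8 cannot raise.
SUPERUSER_LINE = re.compile(
    rb"^\s*SET\s+@@(GLOBAL|SESSION)\.(GTID_PURGED|SQL_LOG_BIN)\b", re.IGNORECASE
)
# ASSUMPTION: DEFINER clauses use backtick-quoted user and host, e.g. DEFINER=`root`@`%`.
DEFINER_CLAUSE = re.compile(rb"\s*DEFINER\s*=\s*`[^`]*`@`[^`]*`", re.IGNORECASE)


def strip_superuser(lines):
    """Yield each dump line (bytes) with privileged statements removed."""
    for line in lines:
        if SUPERUSER_LINE.match(line):
            continue  # drop the whole SET @@... line
        yield DEFINER_CLAUSE.sub(b"", line)  # drop only the DEFINER clause


def clean_dump(src_path, dst_path):
    """Copy src_path to dst_path without the privileged statements."""
    # Binary mode on both ends: bytes are copied as-is, whatever their encoding.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.writelines(strip_superuser(src))
```

This assumes each statement to remove sits on a single line; a `GTID_PURGED` value wrapped across several lines would need a multi-line match instead.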

Why would MySQL encode a backup as UTF-8 if the data isn’t actually UTF-8? This feels like bad design to me.


18 comments
  • Encoding is hard, especially when your data comes from web forms or CSV files. MySQL needed three tries to get UTF-8 right (its `utf8` charset, for instance, is a 3-byte subset; full UTF-8 is `utf8mb4`), and you need DB admins, and often programmers as well, who know this. So not everything MySQL calls UTF-8 actually is.

    And often enough it took a long while before a system actually switched to UTF-8. Idiots who never converted the existing data end up with databases holding a mixture of encodings.
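To see where such a mixture actually sits in a dump, a small scanner can report every line that fails to decode as UTF-8. This is a minimal sketch; `find_invalid_utf8` and `report` are hypothetical helpers, not MySQL tools. It only locates the bad bytes; showing them as Latin-1 is a hint, not proof, that a line came from a Latin-1 or cp1252 source.

```python
def find_invalid_utf8(lines):
    """Return (line_no, byte_offset, snippet) for each line that is not valid UTF-8.

    `lines` is any iterable of bytes, e.g. a file opened in binary mode.
    """
    problems = []
    for line_no, line in enumerate(lines, start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first byte the decoder rejected.
            problems.append((line_no, exc.start, line[exc.start:exc.start + 4]))
    return problems


def report(path):
    """Print every line of the dump at `path` that is not valid UTF-8."""
    with open(path, "rb") as f:
        for line_no, offset, snippet in find_invalid_utf8(f):
            # Latin-1 maps every byte to a character, so this decode never fails;
            # readable output suggests the line was written as Latin-1/cp1252.
            print(f"line {line_no}, byte {offset}: {snippet!r} "
                  f"(as Latin-1: {snippet.decode('latin-1')!r})")
```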
