Stories like this don't surprise me so much as make me ask: how the fuck do you store and process this much data to get anything useful out of it?
I don't think they're saying that method would yield 100% clean data, but it would give you all the "necessary" data with the absolute bare minimum storage requirement. At some point people will log into their email, and for most people, if you have their email password, you have the password they use for everything.
I could be wrong, and this is a generalization that applies to any country you can name, but my impression is that data is stored on everyone, so when they decide someday to look you up, they already have all the data collected. It's not really processed until needed.
The real answer is compute power. At the moment it's very expensive to run the computations necessary for big LLMs; I've heard some companies are even developing specialized chips to run them more efficiently. On the other hand, you probably don't want your phone's keyboard app burning out its tiny CPU and draining your battery, so it's not worth throwing anything other than a simple model at the problem.
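
To give a rough idea of what a "simple model" could look like, here's a minimal sketch of a bigram next-word predictor in Python. This is only an illustration, not how any particular keyboard app actually works; the function names `train_bigrams` and `suggest` and the tiny training string are made up for the example.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1
    return following


def suggest(following, prev_word, k=3):
    """Return up to k of the most frequent words seen after prev_word."""
    counts = following.get(prev_word.lower(), Counter())
    return [word for word, _ in counts.most_common(k)]


# Hypothetical toy training data; a real app would learn from far more text.
model = train_bigrams("see you later . see you soon . see you later today")
print(suggest(model, "you"))  # ['later', 'soon']
```

A lookup table like this costs almost nothing to query, which is the whole point: a dictionary lookup and a sort over a handful of counts, versus billions of multiply-adds per token for a big LLM.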