Also CS doesn't really do academia like other sciences, being somewhere on the intersection of maths, engineering, and tinkering. Shit's definitely not invalid just because it hasn't been submitted to a journal. This could've been a blog post, but there's academics involved, so publish or perish applies.
Or, differently put: If you want to review it, bloody hell, do it, it's open access. A quick skim tells me "way more thorough than I care to read for the quite less than extraordinary claim".
You are overrating peer reviewing. It's basically a tool to help editors to understand if a paper "sells", to improve readability and to discard clear garbage.
If methodologies are not extremely flawed, peer reviewing almost never impacts the quality of the results, as reviewers do not redo the work. From the "trustworthy" point of view, peer reviewing is comparable to a biased RNG. Google for actual reproducibility of published experiments and peer-reviewing biases for more details.
Most peer-reviewed papers are not reproducible. Peer review has the primary purpose of telling the editor how sellable a paper is in a small community he only superficially knows, and of making it more attractive to that community by suggesting rephrasing of paragraphs, additional references, and additional supporting experiments to clarify unclear points.
But it doesn't guarantee that the methodology is not flawed. Editors choose reviewers very superficially, reviews are mainly driven by biases, and reviewers cannot judge the quality of research because they do not reproduce it.
The honesty of researchers is what guarantees the quality of a paper.
There is some variation across disciplines; I do think that in general the process does catch a lot of frank rubbish (and discourages submission of obvious rubbish), but from time to time I do come across inherently flawed work in so-called “high impact factor” and allegedly “prestigious” journals.
In the end, even after peer review, you need to have a good understanding of the field and to have developed and applied your critical appraisal skills.
And TBF just getting on arXiv also means you jumped a bullshit hurdle: Roughly speaking you need to be in a position in academia, or someone there needs to vouch for the publication. At the same time, getting something published there isn't exactly prestigious, so there's no real incentive to game the system. As such, the bar is quite low but consistent.
arXiv is a preprint archive. Many very prestigious researchers put their preprints there. It is as credible as any journal (more than many out there nowadays). Its presentation is just less curated and there is no selection, because there is no editor. Readers of a paper must know what they are reading, and must critically assess it.
Mostly, when it comes to the types of papers I read, them being shoddy involves issues of the type "yeah this has good asymptotic performance and even the constants are good but we're completely thrashing caches and to get it published we cherry-picked the algorithms we benchmark against so we still come out on top, or near the top but can say that our way to do things is simpler". Or even better "let's not do benchmarks at all but overload the paper with Greek and call it theory in the hopes nobody ever tries to implement it".
And I'm not even blaming people for it. The issue is that these kinds of results should be published for the sake of science and so that nobody has to duplicate the work, but people need to jazz them up to get their papers accepted. The metric for "contribution to the field" is fucked: It was a valiant effort, it didn't really pan out, can't hit the target without missing a couple of times first, and with each try you learn, and so did I from reading the paper. "Algorithm doesn't actually produce the output it's supposed to produce" is virtually unheard of, at least in a fraudulent manner. It's after all much easier to get things to be correct than to get them to be fast.
This paper isn't your usual CS paper though: "having humans do stuff and analyse what they did and what they think of it" isn't exactly a CS methodology. What happens in those cases is that researchers ask for help from a random researcher down the hallway working in a field which uses suitable methods. Peer review at USENIX won't check that methodology for sanity because the peers there have no real idea either.
As to the novelty of the claim: Pretty much restricted to "this annoys humans more than it annoys bots". That captchas can be beaten by bots is well-established in the field (both in the "academic" and "wearing a BOFH t-shirt" sense), and that they're annoying is so painfully obvious only psychologists would dare to challenge it, so the claim is indeed restricted to "have they lost 99% or 110% of their value when you value the sanity of your human users".
Absolutely. One needs to know what one is reading. That's why preprints are fine.
High impact factor journals are full of purposely wrong work, produced because the authors want the results that readers are looking for (that is the easiest way to get published in a high impact factor journal).