what debugging regex feels like
I found your email address:
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
I was about to ruin your day by finding a valid email address that would be rejected by your regex, but it doesn't even parse correctly on regex101.com
The only valid regex for email is .+@.+
btw, that \\. part doesn't look right, but what do I know. Apparently control codes are valid elsewhere, so a literal backslash followed by any character, even a space or a newline, might actually be valid there.
"Yeah, my e-mail address is abc, carriage return, three backspaces and a terminal bell at example dot com. ... What do you mean your mail program doesn't support it?"
It helps if you break it apart into its component parts. Which is like anything else, really, but we've all accepted that regexes are supposed to run together in an unreadable mess. No reason it has to be that way.
If they are Perl regexes, like all regexes are supposed to be, you can have non-semantic whitespace and comments.
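A minimal sketch of what that buys you, assuming Python's `re.VERBOSE` flag as a stand-in for Perl's `/x`. The pattern is just the dotted-quad piece of the email regex quoted above; the names `IPV4` and `is_ipv4` are made up for illustration.

```python
import re

# The dotted-quad part of the email regex, written with re.VERBOSE:
# whitespace in the pattern is ignored and "#" starts a comment.
IPV4 = re.compile(
    r"""
    (?: (?: 25[0-5]           # 250-255
          | 2[0-4][0-9]       # 200-249
          | [01]?[0-9][0-9]?  # 0-199, leading zeros allowed
        ) \.                  # literal dot after each of the first three parts
    ){3}
    (?: 25[0-5] | 2[0-4][0-9] | [01]?[0-9][0-9]? )   # fourth part, no dot
    """,
    re.VERBOSE,
)

def is_ipv4(text: str) -> bool:
    """Return True if the whole string is a dotted quad."""
    return IPV4.fullmatch(text) is not None
```

Same regex, same behaviour, but each alternative gets its own line and its own comment.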
But if you are using some system that enforces something different, you are out of luck.
Not necessarily. For just debugging purposes, you can still break them up to help understand them. Even ignoring that, there are options in languages that don't implement /x.
At my company we store our regex in the database with line breaks in it, but when it's actually called to be used those line breaks are stripped out. That way the part of the regex that looks for X can all be on one line and actually readable.
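One way this could look, sketched in Python. The storage format, the date pattern, and the helper name `load_pattern` are all assumptions for illustration, not the commenter's actual system.

```python
import re

# Hypothetical stored form: one logical piece per line, as it might sit
# in a database text column. Here: a date written as YYYY-MM-DD.
STORED = r"""
(\d{4})
-
(0[1-9]|1[0-2])
-
(0[1-9]|[12]\d|3[01])
"""

def load_pattern(stored: str) -> re.Pattern:
    """Strip the line breaks, then compile what is left."""
    return re.compile(stored.replace("\r", "").replace("\n", ""))

DATE = load_pattern(STORED)
```

Only line breaks are removed in this sketch, so spaces inside a stored line would still be part of the pattern.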
The comments flag needs more support.
I have found chatgpt to be very good at writing regex. I also don't know how to write regex.
In my experience, it is good at simple to medium complexity regex. For the harder ones it starts being quite useless though, at best providing a decent starting point to begin debugging from.
well, you won't get better using chatgpt for it
Just pop them into regex101 or a similar tool, add sample data, see the mistake, fix the mistake, continue to do other stuff.
Just pop them into regex101 or a similar tool, add sample data, it works there, pull hair
FTFY
I usually do
# What we are doing (high level)
# Why we need regex
# Regex step by step
# Examples of matches
regex
And I still rewrite it the next time
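Filled in for a small made-up case, that comment template might look like the sketch below. It is in Python; the log format and the names `LOG_LINE` and `parse` are invented for illustration.

```python
import re

# What we are doing (high level):
#   Pull the level and message out of lines like "[WARN] disk almost full".
# Why we need regex:
#   The level sits inside brackets and the message is free text.
# Regex step by step:
#   ^\[        literal opening bracket at the start of the line
#   ([A-Z]+)   group 1: the level, upper-case letters only
#   \]         literal closing bracket
#   \s+        at least one whitespace character
#   (.*)$      group 2: the rest of the line
# Examples of matches:
#   "[WARN] disk almost full" -> ("WARN", "disk almost full")
#   "no brackets here"        -> no match
LOG_LINE = re.compile(r"^\[([A-Z]+)\]\s+(.*)$")

def parse(line: str):
    """Return (level, message), or None if the line does not match."""
    m = LOG_LINE.match(line)
    return m.groups() if m else None
```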
// abandon all hope ye who commit here
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
Edit: dammit, not the first to post this abomination
I haven't laughed so hard in a month. Thank you for that.
Regexr.com is my go-to.
This is the one I use! Might have to look at Regexr though
Me checking my own docs: "this is some voodoo shit, idk how it works"
That's what my comments say
Never debug regex, just generate a new one. It's not worth the hassle to figure out not only what it does, but what it was meant to do.
Better yet, just write it out in code, and never use regex. Tis a stupid thing that never should have been made.
Hard disagree. The function regex serves in programs like Notepad++ can't be easily replaced by "writing it out in code". With a very small number of characters you can get complex search patterns and capturing groups. It's hard to read but incredibly useful.
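For instance, a one-line search and replace with capturing groups. This is sketched with Python's `re.sub` rather than Notepad++ itself, and the sample text is made up.

```python
import re

# Swap "Last, First" into "First Last" on every line.
# Group 1 captures the last name, group 2 the first name.
NAME = re.compile(r"^(\w+),\s*(\w+)$", re.MULTILINE)

text = "Lovelace, Ada\nHopper, Grace"
swapped = NAME.sub(r"\2 \1", text)
print(swapped)
```

Doing the same by hand means splitting lines, finding the comma, trimming, and reassembling, which is a lot more typing for a throwaway edit.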
Can't upvote twice, have a low effort comment instead
Feels like that's a Notepad++ problem? In general, breaking it out into manageable, human-ingestible chunks is A Good Idea.
Regex is a write only language.
I love regex and I use it a lot, but I very rarely use it in any kind of permanent solution. When I do, I make sure to keep it as minimal as possible, supplementing with higher level programming where possible. Backreferences and assertions are a cardinal sin and should never be used.
Downvoted so that everyone can know I'm cool since I understand regex better than the idiot who made that meme.
I know I'm weird, but I love regex.
If I have a complex regular expression to code into my app, I write it in pomsky, then copy paste the compiled regex to my source file, but also keep the pomsky source nearby. Much more maintainable.
This is basically code refactoring on a simplified level. You're basically renaming a whole bunch of functions/tokens at once.
Let's say you're renaming the variable 'count' under the method 'buttplug'. First off, what do you rename it to?
You start by replacing every instance of buttplug.count with a unique token, let's say tnuoc.gulpttub.
Then you replace that buttplug with a unique buttplug.
Simple.
Then you replace that buttplug with a unique buttplug.
Rare buttplugs with good affixes are better than unique buttplugs.
Aziz! LIGHT!
There are a few online regex testing tools that will analyse your efforts and give you the opportunity to provide sample data.
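The same idea works offline too. A rough sketch in Python, assuming a deliberately naive pattern and made-up sample strings: run every sample through the regex and list the ones whose result differs from what you expected.

```python
import re

# Deliberately naive pattern under test: "something@something.tld".
PATTERN = re.compile(r"[a-z0-9.]+@[a-z0-9.]+\.[a-z]+")

# Sample data: (input, should it match?)
SAMPLES = [
    ("alice@example.com", True),
    ("bob.smith@mail.example.org", True),
    ("no-at-sign.example.com", False),
    ("Carol@Example.com", True),   # upper case: the naive pattern misses this
]

def failures(pattern, samples):
    """Return the samples whose match result differs from the expectation."""
    return [
        (text, expected)
        for text, expected in samples
        if (pattern.fullmatch(text) is not None) != expected
    ]

for text, expected in failures(PATTERN, SAMPLES):
    print(f"MISMATCH: {text!r} expected match={expected}")
```

Keeping the sample list next to the pattern means the next person to touch the regex can rerun it instead of guessing.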
LOL yeah that's about right.
Ohhhhh it was this extra '
There are no bugs, it's just not doing what you expect it to be doing...
... which, now that I think of it, can be said about all software in general.
awk-ward
I think he found the Road Runner.
Elisp has a nice notation for maintainably composing regexes like any other programming expression.
Only language I've seen offer that.
So instead of "/\\*\\(?:[^*]\\|\\*[^/]\\)*\\*+/", the regular expression to match C block comments could be expressed (with inline comments):

(rx "/*"                          ; Initial /*
    (zero-or-more
     (or (not (any "*"))          ; Either non-*,
         (seq "*"                 ; or * followed by
              (not (any "/")))))  ; non-/
    (one-or-more "*")             ; At least one star,
    "/")                          ; and the final /
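Other languages can approximate this with ordinary functions. A rough Python sketch of the same C block comment pattern follows; the helper names (`lit`, `seq`, `either`, and so on) are invented here and only loosely imitate `rx`, they are not part of the `re` module.

```python
import re

# Tiny combinators that return pattern strings. All names are made up
# for this sketch.
def lit(text):            # match text literally
    return re.escape(text)

def seq(*parts):          # parts in order, wrapped in a non-capturing group
    return "(?:" + "".join(parts) + ")"

def either(*parts):       # any one of the alternatives
    return "(?:" + "|".join(parts) + ")"

def not_char(chars):      # any single character not in chars
    return "[^" + re.escape(chars) + "]"

def zero_or_more(part):
    return "(?:" + part + ")*"

def one_or_more(part):
    return "(?:" + part + ")+"

C_COMMENT = re.compile(seq(
    lit("/*"),                                # initial /*
    zero_or_more(either(
        not_char("*"),                        # either non-*,
        seq(lit("*"), not_char("/")),         # or * followed by non-/
    )),
    one_or_more(lit("*")),                    # at least one star,
    lit("/"),                                 # and the final /
))
```

The generated pattern is uglier than a hand-written one because of the extra groups, but nobody has to read it; the composition is the source of truth.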
Ffs just rewrite it
This would honestly be a lot easier
I don't know any hieroglyphs, but I do know cuneiform. Would rather read cuneiform than regex!
Others: "Oh god, regexes are so hard to understand!"
Me, an intellectual: writing code that does the same.