The famous Regular Expressions (RegEx)


In my “how to handle strings” series, we saw:

How could we not talk about regular expressions, or RegEx?

RegEx are indeed an essential tool for handling, searching and replacing character strings, so it is impossible to leave them out when dealing with this type of data.

Before you start …

Before starting, I would like to make clear that the goal of this article is not to give a course on RegEx or to serve as a reference. Far be it from me to lecture on RegEx. It would not be of much use anyway, because RegEx is something you learn by practicing. And in any case, there are plenty of great sites on the web.

We will first go through some useful links for learning RegEx. Next, we will see how to practice effectively (with cheat sheets, tools, etc.). Finally, we will look at implementation examples in languages such as Python and Java.

The idea is therefore to provide a practical, useful RegEx fact sheet!

Cheat sheets

You will find a plethora of one-page RegEx cheat sheets on the internet, so I am not going to redo what has already been done well. On the other hand, here are some useful links where you can find this information:

  • htregular-expressions-cheat-sheet-v2 (a PDF in English, worth keeping)
  • OpenClassrooms (in French)
  • RexEgg (in English)

This list is far from exhaustive, as a quick search in your favorite search engine will show.

The essential tools

As I mentioned above, regular expressions are learned through practice. In fact, they are mostly a matter of testing: to be more precise, we often spend our time adjusting our RegEx. Without tools, by trial and error, you can spend (or waste) hours finding the right recipe.

We therefore need tools that let us test and “debug” our regular expressions. Here are a few (free and easily accessible on the web):

regex101

Accessible at https://regex101.com/

A very practical tool, with highlighting to better understand what is happening, built-in help and, of course, live display of the results as you type. The tool can even generate the corresponding code (Python, Java, etc.) on request, and it also offers a library of ready-to-use regular expressions. In short, a great tool!

debuggex

Accessible at https://www.debuggex.com/

A very practical tool with a slightly different approach, because it is graphical and somewhat more interactive. Its diagram breaks your RegEx down into its components, which makes it more visual (very useful for very large regular expressions). The slider is also very useful: it lets you move the cursor along the test string.

Other tools

There are a great many other tools for testing regular expressions; I would just mention these in addition:

Some useful examples

Check an email address (lowercase only):

\b[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}\b
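To try this pattern in Python, here is a small sketch: re.findall extracts every lowercase address from a sample text (the text is made up for illustration), and passing the re.IGNORECASE flag is one way to accept uppercase addresses as well.

```python
import re

# Pattern from the article: matches lowercase email addresses only
EMAIL_RE = re.compile(r"\b[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}\b")

# Hypothetical sample text
text = "Contact: john.doe@example.com or sales@my-company.fr (not JOHN@EXAMPLE.COM)"

# Lowercase only: the uppercase address is skipped
print(EMAIL_RE.findall(text))

# Same pattern with re.IGNORECASE: the uppercase address is found too
print(re.findall(EMAIL_RE.pattern, text, flags=re.IGNORECASE))
```

The first call returns the two lowercase addresses; the second one also returns JOHN@EXAMPLE.COM.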

Check the format of an IPv4 address (this only checks the format: values above 255, such as 999.999.999.999, are still accepted):

\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
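Because the pattern only checks the shape of the address, one way to also reject out-of-range values is to add a numeric check on each octet once the regex matches. A minimal sketch in Python (is_valid_ipv4 is a hypothetical helper name, and it does not handle every edge case, such as leading zeros):

```python
import re

# Pattern from the article: four groups of 1 to 3 digits separated by dots
IP_RE = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")

def is_valid_ipv4(candidate: str) -> bool:
    """Hypothetical helper: format check with the regex, then a 0-255 range check."""
    if not IP_RE.fullmatch(candidate):
        return False
    return all(int(octet) <= 255 for octet in candidate.split("."))

for ip in ["192.168.1.10", "999.999.999.999", "10.0.0"]:
    # regex result alone vs. regex + range check
    print(ip, IP_RE.fullmatch(ip) is not None, is_valid_ipv4(ip))
```

Here 999.999.999.999 passes the regex alone but is rejected by the range check, while 10.0.0 fails both.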

Check the format of a VISA card number, i.e. 13 or 16 digits starting with 4 (be careful: this RegEx only works for VISA, not for Mastercard, for example):

^4[0-9]{12}(?:[0-9]{3})?$
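A minimal sketch of this check in Python, assuming that spaces and hyphens typed between digit groups should simply be ignored (looks_like_visa is a hypothetical helper name, and the card numbers are test values). Note that it only checks the format; it does not verify the card's checksum.

```python
import re

# Pattern from the article: "4" followed by 12 digits, optionally 3 more (13 or 16 digits)
VISA_RE = re.compile(r"^4[0-9]{12}(?:[0-9]{3})?$")

def looks_like_visa(number: str) -> bool:
    # Assumption: spaces and hyphens are only separators, so strip them first
    digits = re.sub(r"[ -]", "", number)
    return VISA_RE.fullmatch(digits) is not None

for card in ["4111 1111 1111 1111", "4222222222222", "5500 0000 0000 0004", "41111111111111"]:
    print(card, looks_like_visa(card))
```

The first two numbers (16 and 13 digits) are accepted; the third starts with 5 and the fourth has 14 digits, so both are rejected.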

Check that a password meets certain criteria (at least 1 letter, 1 digit and 1 special character among & - + ! * $ @ % _, with a total length of 8 to 15 characters):

^(?=.*[A-Za-z])(?=.*\d)(?=.*[&\-+!*$@%_])([&\-+!*$@%_\w]{8,15})$
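The position of the hyphen inside a character class matters. In [&-+!*$@%_], the sequence &-+ is read as a range from & to + (that is & ' ( ) * +), not as the three characters &, - and +; escaping it as \- makes the hyphen literal. A quick comparison of the two versions in Python (the sample passwords are made up):

```python
import re

# "&-+" unescaped: parsed as the range & ' ( ) * +, so "-" is NOT allowed
ORIGINAL_RE = re.compile(r"^(?=.*[A-Za-z])(?=.*\d)(?=.*[&-+!*$@%_])([&-+!*$@%_\w]{8,15})$")
# Hyphen escaped: the special characters are & - + ! * $ @ % _
FIXED_RE = re.compile(r"^(?=.*[A-Za-z])(?=.*\d)(?=.*[&\-+!*$@%_])([&\-+!*$@%_\w]{8,15})$")

for pwd in ["abc-1234", "abc(1234", "abcd1234"]:
    print(pwd, ORIGINAL_RE.fullmatch(pwd) is not None, FIXED_RE.fullmatch(pwd) is not None)
```

With the unescaped range, "abc-1234" is rejected while "abc(1234" is accepted; the escaped version does the opposite, and "abcd1234" (no special character) is rejected by both.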

Use with Python

Using RegEx in Python could not be simpler: just use the re module (import re). In the example below, we retrieve two elements (groups) from each match found in a string:

import re

# Group 1: one uppercase letter; group 2: the digits that follow it
regex = r"([A-Z])([0-9]*)"
matches = re.finditer(regex, "A122 Z3")

for match_num, match in enumerate(matches, start=1):
    print(f"Match {match_num} | {match.start()}-{match.end()}: string {match.group()}")
    # Groups are numbered from 1 (group 0 is the whole match)
    for group_num in range(1, len(match.groups()) + 1):
        print(f"Group {group_num} found: {match.start(group_num)}-{match.end(group_num)}: {match.group(group_num)}")

Result:

Match 1 | 0-4: string A122
Group 1 found: 0-1: A
Group 2 found: 1-4: 122
Match 2 | 5-7: string Z3
Group 1 found: 5-6: Z
Group 2 found: 6-7: 3
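When only the captured values are needed, not their positions, re.findall is a more compact alternative to the finditer loop: with several groups in the pattern, it returns one tuple per match.

```python
import re

# Two groups in the pattern, so each match becomes a (group 1, group 2) tuple
pairs = re.findall(r"([A-Z])([0-9]*)", "A122 Z3")
print(pairs)  # [('A', '122'), ('Z', '3')]
```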

Use with Java

In Java it is no more complicated, since the java.util.regex package (java.util.regex.*) makes our life just as simple:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExample {
    public static void main(String[] args) {
        // Group 1: one uppercase letter; group 2: the digits that follow it
        final Pattern pattern = Pattern.compile("([A-Z])([0-9]*)");
        final Matcher matcher = pattern.matcher("A122 Z3");

        // Each call to find() moves on to the next match
        while (matcher.find()) {
            System.out.println("Found: " + matcher.group(0));
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}

Benoit Cayla

In more than 15 years, I have built up solid experience on various integration projects (data & applications). I have, indeed, worked in nine different companies and successively adopted the points of view of the service provider, the customer and the software editor. This experience, which made me almost omniscient in my field, naturally led me to be involved in large-scale projects around the digitalization of business processes, mainly in sectors such as insurance and finance. Really passionate about AI (Machine Learning, NLP and Deep Learning), I joined Blue Prism in 2019 as a pre-sales solution consultant, where I can combine my subject matter skills with automation to help my customers automate complex business processes more efficiently. In parallel with my professional activity, I run a blog aimed at showing how to understand and analyze data as simply as possible: datacorner.fr. Learning, convincing through arguments and passing on my knowledge could be my characteristic triptych.


