Regex is short for regular expressions, a way of searching a string of characters (text) for specific patterns. An easy way to understand the idea is the find function, Ctrl + F (Command + F on a Mac), available in most applications. If you have ever wanted to find a word in a long essay, you press Ctrl + F and a small search box appears. Enter a character or word, and it returns all of the exact matches found on that page. That is pattern matching in its simplest form; regular expressions extend the idea from exact text to general patterns, such as “three digits followed by a dash.”
A large amount of the information we work with follows a distinct pattern. For example, if you live in the United States or Canada, you know that phone numbers follow the model xxx-xxx-xxxx. Other common text patterns include email addresses, social security numbers, and website URLs. Because the format is predictable, we can write regular expressions that identify these items quite reliably. Within cybersecurity, there are many situations where you will need to search through thousands of lines of characters for certain patterns. These patterns can be IP addresses (xxx.x.x.x), web domains (www.*.com), login error codes, timestamps, or any other piece of information that could help in an investigation. Regex makes working with enormous data sets practical, and it’s a good skill to have. Here, I go over some of the use cases of regex and the best programming languages for implementing it:
One of the most useful places to apply regex is manual log analysis. A log file can have thousands of recorded events, and you may not have a tool that can run automated searches. After a security incident, for example, you will often need to look for IP addresses that could indicate someone tried to break into the system.
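As a minimal sketch of that kind of log search, the Python snippet below pulls IPv4-looking addresses out of a few log lines and counts how often each one appears. The sample log lines, the IPV4_PATTERN name, and the count_ips helper are all made up for illustration; real log formats vary, so the pattern would likely need adjusting.

```python
import re
from collections import Counter

# Hypothetical sample log lines; real formats differ from system to system.
SAMPLE_LOG = """\
2023-01-05 10:15:01 Failed login for admin from 203.0.113.7
2023-01-05 10:15:03 Failed login for admin from 203.0.113.7
2023-01-05 10:16:40 Accepted login for alice from 198.51.100.23
2023-01-05 10:17:12 Failed login for root from 203.0.113.7
"""

# Four groups of 1-3 digits separated by dots. This is a loose pattern:
# it does not check that each number falls in the range 0-255.
IPV4_PATTERN = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def count_ips(log_text):
    """Return a Counter mapping each IP-like string to how often it occurs."""
    return Counter(IPV4_PATTERN.findall(log_text))


if __name__ == "__main__":
    # Print the addresses from most to least frequent.
    for ip, hits in count_ips(SAMPLE_LOG).most_common():
        print(ip, hits)
```

An address that shows up far more often than the others, especially next to failed logins, is usually a reasonable place to start looking.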
Web scraping is the process of extracting data from websites. Regular expressions can be used to pull useful information, such as phone numbers or email addresses, out of the large body of text on a web page.
One example is a regex script in Python that accepts the text you have saved to your clipboard (by highlighting the text and pressing Ctrl + C) as input, searches it for any phone numbers or email addresses, and saves them to a list.
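A script along those lines could look like the following sketch. It is not the original script: the pattern names and the extract_contacts function are illustrative, the phone pattern only covers common North American formats, and the email pattern is deliberately simple. To keep it self-contained, the text is passed in as a string; reading the clipboard itself would need a third-party package such as pyperclip, which is an assumption here and is left as a comment.

```python
import re

# North American style phone numbers such as 415-555-0123 or (415) 555-0199.
PHONE_PATTERN = re.compile(
    r"""
    (?:\(\d{3}\)\s*|\b\d{3}[-.\s])   # area code, with or without parentheses
    \d{3}[-.\s]                      # first three digits plus a separator
    \d{4}\b                          # last four digits
    """,
    re.VERBOSE,
)

# A simple email pattern; it will not cover every valid address.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_contacts(text):
    """Return (phones, emails) found in text, in order, without duplicates."""
    phones = list(dict.fromkeys(PHONE_PATTERN.findall(text)))
    emails = list(dict.fromkeys(EMAIL_PATTERN.findall(text)))
    return phones, emails


if __name__ == "__main__":
    # With pyperclip installed (an assumption), the input could instead be
    # taken from the clipboard: text = pyperclip.paste()
    text = (
        "Call 415-555-0123 or (415) 555-0199. "
        "Email alice@example.com or bob@example.org."
    )
    phones, emails = extract_contacts(text)
    print(phones)
    print(emails)
```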
Alternatively, some people build scripts that extract information from web pages without any manual copying. This means getting the information directly from the web. You could use programming libraries to pull the content by targeting specific HTML elements or CSS selectors, then use regex to extract the data needed. As another example, you could write a script that collects IP addresses and domains and then feeds that information into VirusTotal to find out whether any of them are known to be malicious.
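The sketch below illustrates that two-step idea under some assumptions: it uses the standard library's html.parser module in place of a dedicated scraping library, it only looks at text inside paragraph elements, and the sample page, the ParagraphText class, and the extract_indicators function are hypothetical. Downloading the page and submitting the results to a service such as VirusTotal are left out.

```python
import re
from html.parser import HTMLParser

# Hypothetical page content; a real script would download it first.
SAMPLE_HTML = """
<html><body>
  <h1>Incident notes</h1>
  <p>Traffic to evil-updates.example.net came from 203.0.113.7.</p>
  <p>A second host, 198.51.100.23, contacted cdn.example.org.</p>
  <script>var ignored = "10.0.0.1";</script>
</body></html>
"""

# Loose patterns: neither validates that the match is a real address or domain.
IPV4_PATTERN = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")
DOMAIN_PATTERN = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)


class ParagraphText(HTMLParser):
    """Collect the text found inside <p> elements only."""

    def __init__(self):
        super().__init__()
        self._in_p = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self.chunks.append(data)


def extract_indicators(html):
    """Return (ips, domains) found in the paragraph text, sorted and unique."""
    parser = ParagraphText()
    parser.feed(html)
    text = " ".join(parser.chunks)
    ips = sorted(set(IPV4_PATTERN.findall(text)))
    domains = sorted(set(DOMAIN_PATTERN.findall(text)))
    return ips, domains


if __name__ == "__main__":
    ips, domains = extract_indicators(SAMPLE_HTML)
    print(ips)
    print(domains)
```

Selecting elements first keeps the regex from matching content you do not care about, such as the address inside the script tag in the sample page.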
Best languages for writing regex-based scripts
Python: One of the most popular languages for automation, and a common choice for regex work as well. Its standard library includes a module called re, which lets you use regular expressions. A free online book, Automate the Boring Stuff with Python (automatetheboringstuff.com), explains how to use it; the book focuses on automating day-to-day tasks with Python and covers regular expressions.
Bash: This is the default command-line shell and scripting language on most Linux systems, and it’s another place where regex is quite easy to use. Since Linux is a very popular operating system for security work, this is probably the second-best option you have for learning how to use regex to search files.
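To give a feel for the Python option, here is a minimal sketch of three commonly used functions in the re module: re.search, re.findall, and re.sub. The log line and its field names (user=, code=) are invented for illustration.

```python
import re

# Hypothetical log line; the user= and code= fields are made up.
line = "2023-01-05 10:15:01 ERROR login failed for user=alice code=401"

# re.search finds the first match anywhere in the string, or returns None.
# Named groups (?P<name>...) make the captured pieces easy to read back.
match = re.search(r"user=(?P<user>\w+) code=(?P<code>\d+)", line)
if match:
    print(match.group("user"), match.group("code"))

# re.findall returns every non-overlapping match as a list of strings.
numbers = re.findall(r"\d+", line)
print(numbers)

# re.sub replaces each match; here the username is masked before sharing.
masked = re.sub(r"user=\w+", "user=***", line)
print(masked)
```

Which function fits depends on the task: search for a single yes-or-no lookup, findall when every occurrence matters, and sub when the text has to be cleaned up or redacted.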
Regular expressions can save you a ton of time, allowing you to search through huge datasets in one step. This will be very helpful in obtaining things like IP addresses, domains, user account names, login errors, and other helpful pieces of information that could indicate that you have been hacked. The ability to extract this information will help make your investigations more effective and faster, which will lead to shorter response times for your company.