PHP Regular Expressions 101
Regular expressions are one of those things that I never really took the time to learn. Rather, whenever a situation arose I would discern and 'learn' the specific thing that I needed to resolve my problem.
Given that I built Whois Browser, as well as many other internal tools that depend heavily on regular expressions, this is a somewhat bemusing approach.
If you have a similar approach, this post seeks to serve as a basic overview of what you need to know to utilise regular expressions within PHP (at the most basic level).
For advanced level regular expressions check out http://www.regular-expressions.info/ - it is an awesome resource. It is well worth setting aside an afternoon and reading it all.
preg functions
As outlined in the PHP manual there are a number of preg functions contained within the PHP language. These functions all 'perform (a search for) a regular expression' and then do something.
You can for example use:
- preg_replace to find any numerical values preceded by a £ sign and replace them with something else.
- preg_match_all to find all phone numbers in a given text block.
These functions serve as your main interface for utilising regular expressions within PHP. They are incredibly powerful.
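As a rough sketch of those two uses: the sample text, the £XX placeholder, and the very simplistic phone number pattern are all my own assumptions for illustration. Real phone numbers vary far more than this.

```php
<?php
// Hypothetical sample text; the prices and phone numbers are invented.
$text = "Call 01234 567890 or 09876 543210. Prices: £25 and £4.50.";

// Replace any number that follows a £ sign (with optional decimals) with a placeholder.
$masked = preg_replace('#£[0-9]+(\.[0-9]+)?#', '£XX', $text);

// Collect every run of five digits, a space, then six digits.
// This pattern is an assumption, not a general phone number matcher.
preg_match_all('#[0-9]{5} [0-9]{6}#', $text, $phoneMatches);

echo $masked, "\n";
print_r($phoneMatches[0]);
```

preg_replace returns the new string, whereas preg_match_all fills the array passed as its third argument; index 0 of that array holds the full matches.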
Perl Compatible Regular Expressions (PCRE)
PHP utilises Perl Compatible Regular Expressions (PCRE). This is a format for writing regular expressions - the things that you are searching for.
The PHP manual has its own page on pattern syntax.
The basics
A basic code snippet outlining usage of the preg_match function would be as follows:
$regex = "#cat#";
preg_match($regex, $contentToSearch, $outputArray);
This would search your $contentToSearch for the word cat and populate $outputArray with the first instance of cat that it found, since preg_match stops after the first match (this is a completely pointless example).
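To make the difference between a single match and all matches concrete, here is a minimal sketch. The sample sentence is made up; the function behaviour shown is as I understand it from the PHP manual.

```php
<?php
// Hypothetical sample content containing the word twice.
$contentToSearch = "The cat sat next to another cat.";
$regex = "#cat#";

// preg_match returns 1 on a match, 0 on no match, and false on error.
$found = preg_match($regex, $contentToSearch, $outputArray);

// preg_match_all returns the number of full matches it found.
$count = preg_match_all($regex, $contentToSearch, $allMatches);

var_dump($found);      // 1: a match was found
print_r($outputArray); // a single element, the first "cat"
var_dump($count);      // 2: both occurrences were counted
```

Because preg_match can return either 0 or false, it is safer to compare the result with === 1 than to rely on a loose truthiness check.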
The # symbol delimits the start and the end of the expression. You can utilise any "non-alphanumeric, non-backslash, non-whitespace character" as a delimiter, but it is advisable to use something that is obvious and will not be utilised within the expression itself (otherwise you'll have to escape characters).
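A quick sketch of why the delimiter choice matters, using a made-up file path as the content to search:

```php
<?php
// Hypothetical path to search; it contains forward slashes.
$contentToSearch = "/var/www/html";

// With / as the delimiter, every literal slash in the pattern needs a backslash.
$withSlashes = '/\/www\//';

// With # as the delimiter, the same pattern needs no escaping.
$withHashes = '#/www/#';

$a = preg_match($withSlashes, $contentToSearch);
$b = preg_match($withHashes, $contentToSearch);

var_dump($a, $b); // both patterns match
```

Both patterns match the same thing; the second is simply easier to read.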
Modifiers
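After the closing delimiter you can add any number of modifiers. These modify the expression so for example you can:
- make your expression case-insensitive (the i modifier)
- make your expression work across multiple lines (the m modifier changes how the start and end of lines are matched, and the s modifier lets . match newlines)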
If my regex above were $regex = "#cat#i"; it would match cat, CAT, Cat etc.
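Here is a small sketch of the i and m modifiers in action. It uses the ^ and $ anchors (start and end of the string, or of each line when m is set), which this post does not otherwise cover, and the two-line input is invented.

```php
<?php
// Hypothetical two-line input.
$contentToSearch = "Dog\nCAT";

// No modifiers: matching is case-sensitive, so "CAT" does not match "cat".
$plain = preg_match('#cat#', $contentToSearch);

// i: case-insensitive matching.
$insensitive = preg_match('#cat#i', $contentToSearch);

// Without m, ^ and $ only match at the start and end of the whole string.
$withoutM = preg_match('#^cat$#i', $contentToSearch);

// With m, ^ and $ also match at the start and end of each line.
$withM = preg_match('#^cat$#im', $contentToSearch);

var_dump($plain, $insensitive, $withoutM, $withM); // 0, 1, 0, 1
```

Modifiers can be combined, as in the last pattern, simply by writing them one after another.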
Captures
Any sub-expression contained within brackets (( and )) is captured. That is to say it is appended to your $outputArray.
$contentToSearch = "123-ABC-456";
$regex = "#([0-9]*)-[A-Z]*-([0-9]*)#";
preg_match($regex, $contentToSearch, $outputArray);
In this example the contents of $outputArray would be:
Array
(
[0] => 123-ABC-456
[1] => 123
[2] => 456
)
- The first value is the whole expression match.
- The second and third values are the values of the two capturing blocks.
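A minimal sketch of putting those captures to use. The variable names $first, $second, and $swapped are my own, and the preg_replace call relies on the $1 and $2 replacement references to the capturing groups.

```php
<?php
$contentToSearch = "123-ABC-456";
$regex = "#([0-9]*)-[A-Z]*-([0-9]*)#";

if (preg_match($regex, $contentToSearch, $outputArray) === 1) {
    // Index 0 is the whole match; 1 and 2 are the capturing groups in order.
    $first = $outputArray[1];
    $second = $outputArray[2];
    echo "first=$first, second=$second\n";
}

// In preg_replace, $1 and $2 in the replacement refer to the same groups.
$swapped = preg_replace($regex, '$2-$1', $contentToSearch);
echo $swapped, "\n"; // 456-123
```

Note that the letters in the middle are matched but not captured, because that part of the expression is not wrapped in brackets.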
Matches
[ and ] enclose acceptable sets of characters.
- [0-9] allows any character from 0-9
- [A9] allows the characters A or 9
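A short sketch of both sets, run against some invented strings:

```php
<?php
// [0-9] matches a single digit anywhere in the string.
$digit = preg_match('#[0-9]#', "route 7");
$noDigit = preg_match('#[0-9]#', "route");

// [A9] matches only the character A or the character 9, nothing in between.
preg_match_all('#[A9]#', "A1B9A", $matches);

var_dump($digit, $noDigit); // 1, then 0
print_r($matches[0]);       // A, 9, A
```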
How many
- [A-Z] will match one character from A-Z
- [A-Z]* will match any number of characters from A-Z (including none)
- [A-Z]{2,} will match two or more characters from A-Z
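To see the three forms side by side, here is a small sketch against a made-up string. The last check uses the ^ and $ anchors (start and end of the string), which are an addition of mine, to show that * is happy with zero characters.

```php
<?php
// Hypothetical sample content.
$contentToSearch = "A BC DEF";

// One character at a time: every letter is its own match.
preg_match_all('#[A-Z]#', $contentToSearch, $single);

// Two or more: only the runs "BC" and "DEF" qualify.
preg_match_all('#[A-Z]{2,}#', $contentToSearch, $twoPlus);

// * allows zero characters, so this matches even an empty string.
$zeroAllowed = preg_match('#^[A-Z]*$#', "");

print_r($single[0]);
print_r($twoPlus[0]);
var_dump($zeroAllowed); // 1
```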
Greediness
$contentToSearch = "abc123 welcome back";
$regex = "#.* #";
preg_match($regex, $contentToSearch, $outputArray);
$regexTwo = "#.* #U";
preg_match($regexTwo, $contentToSearch, $outputArrayTwo);
In this example $outputArray would contain the following:
Array
(
[0] => abc123 welcome
)
The . character matches any character (except a newline, by default). So this expression searches for any number of any characters followed by a space.
The match is by default greedy so it wants the largest possible match. abc123 welcome (plus the trailing space) is the largest set of any number of any characters followed by a space.
You can use the U modifier to make the regex ungreedy by default.
$outputArrayTwo would contain:
Array
(
[0] => abc123
)
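If you only want one part of an expression to be ungreedy, an alternative to the U modifier is to put a ? straight after the quantifier. This is a sketch of that standard PCRE feature rather than something the examples above rely on:

```php
<?php
$contentToSearch = "abc123 welcome back";

// Greedy by default: .* takes as much as it can before the final space.
preg_match('#.* #', $contentToSearch, $greedy);

// A ? after the quantifier makes just that quantifier ungreedy, no U needed.
preg_match('#.*? #', $contentToSearch, $lazy);

var_dump($greedy[0]); // "abc123 welcome " (trailing space included)
var_dump($lazy[0]);   // "abc123 " (trailing space included)
```

Be aware that the U modifier inverts this: with U set, a quantifier followed by ? becomes greedy again, so mixing the two can get confusing.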
There you go
Whilst I am happy to answer any questions that you may have, there is no shortage of questions (and answers), tutorials, and websites outlining the usage of regular expressions.
This post is a bare minimum introduction to regular expressions. That is to say with that outlined above you can probably achieve most simple things.
Whilst the actual expressions often look ugly, they are in fact incredibly simple. Don't be put off.
Take the time now to get a grasp on regular expressions and you will be thanking yourself the next time you reach for a strpos and str_replace combo and realise that it is a hell of a lot easier with preg_replace ;)
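As one hedged illustration of that last point, here is a sketch of collapsing runs of spaces both ways. The input string and variable names are invented, and this is just one task where the regex version happens to be shorter.

```php
<?php
// Hypothetical input with uneven runs of spaces.
$input = "too    many   spaces";

// strpos + str_replace: keep replacing double spaces until none remain.
$looped = $input;
while (strpos($looped, "  ") !== false) {
    $looped = str_replace("  ", " ", $looped);
}

// preg_replace: one call replaces every run of two or more spaces.
$collapsed = preg_replace('# {2,}#', ' ', $input);

var_dump($looped === $collapsed); // both give "too many spaces"
```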