Building Regular Expressions

The DLP engine contains 3000+ predefined data identifiers that can be used in DLP rules. It also supports custom data identifiers that use either a keyword search or a regular expression search. This page describes how to write custom data identifiers for DLP using regular expressions.

Syntax

This section describes the regular expression syntax that the DLP engine supports. The DLP engine parser interprets regular expression syntax identically to the UNIX regular expression syntax.

Supported Operators

Operator    Matched Pattern
\           Quote the next metacharacter.
^           Match the beginning of a line.
$           Match the end of a line.
.           Match any character (except newline).
|           Alternation.
( )         Used for grouping to force operator precedence.
[xy]        Character x or y.
[x-z]       The range of characters between x and z.
[^z]        Any character except z.
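
The operators above can be tried out before a pattern is added to a rule. The following is a minimal sketch that uses Python's re module as a stand-in for the DLP engine, on the assumption that both treat these basic operators the same way. The sample strings are illustrative only.

```python
import re

# '\' quotes a metacharacter, so '\.' matches only a literal dot.
literal_dot = re.fullmatch(r"1\.5", "1.5") is not None
wildcard_dot = re.fullmatch(r"1\.5", "1x5") is not None

# '.' matches any character except a newline.
dot_matches_letter = re.fullmatch(r"a.c", "abc") is not None
dot_matches_newline = re.fullmatch(r"a.c", "a\nc") is not None

# '|' inside '( )' grouping: accepts "gray" or "grey" only.
spellings = [w for w in ["gray", "grey", "groy"] if re.fullmatch(r"gr(a|e)y", w)]

# Character classes: a set, a range, and a negated class.
in_set = re.findall(r"[xy]", "wxyz")
in_range = re.findall(r"[x-z]", "wxyz")
not_z = re.findall(r"[^z]", "wxyz")

# '^' and '$' anchor to line boundaries. Python needs re.MULTILINE for that.
whole_lines = re.findall(r"^ID\d$", "ID1\nxID2\nID3", re.MULTILINE)
```

Only lines that consist entirely of ID plus one digit end up in whole_lines, because ^ and $ anchor the pattern to both ends of a line.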

Supported Quantifiers

Operator    Matched Pattern
*           Match 0 or more times.
+           Match 1 or more times.
?           Match 0 or 1 times.
{n}         Match exactly n times.
{n,}        Match at least n times.
{n,m}       Match at least n times, but no more than m times.
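
Each quantifier can be checked against a string that should match and one that should not. This is a small sketch that again uses Python's re module as an assumed stand-in for the DLP engine. The helper name full and the test strings are made up for illustration.

```python
import re

def full(pattern: str, text: str) -> bool:
    """Return True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# (pattern, text, expected result)
cases = [
    (r"ab*c", "ac", True),         # * allows zero occurrences of b
    (r"ab*c", "abbbc", True),
    (r"ab+c", "ac", False),        # + requires at least one b
    (r"ab+c", "abc", True),
    (r"colou?r", "color", True),   # ? allows zero or one u
    (r"colou?r", "colouur", False),
    (r"\d{3}", "123", True),       # {n} exactly n digits
    (r"\d{3}", "12", False),
    (r"\d{2,}", "1", False),       # {n,} at least n digits
    (r"\d{2,}", "12345", True),
    (r"\d{2,4}", "123", True),     # {n,m} between n and m digits
    (r"\d{2,4}", "12345", False),
]

# Collect any case whose actual result differs from the expected one.
mismatches = [(p, t) for p, t, expected in cases if full(p, t) != expected]
```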

Note

Unrestricted greedy quantifiers applied to arbitrary characters, such as .* or .+, are not allowed. If you intend to include these characters in a character class or set, reverse their order. For example, write *. instead of .*.
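
The engine's own validation logic is not described on this page. As a rough pre-check, the sketch below flags a pattern when an unescaped . outside a character class is followed directly by * or +, which is one reading of the note above. The function name has_unrestricted_dot_quantifier is hypothetical, and the class handling is simplified (it does not treat a leading ] inside a class as a literal).

```python
def has_unrestricted_dot_quantifier(pattern: str) -> bool:
    """Return True if an unescaped '.' outside [...] is followed by '*' or '+'."""
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2              # skip the backslash and the character it quotes
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "." and i + 1 < len(pattern) and pattern[i + 1] in "*+":
            return True
        i += 1
    return False

flagged = [p for p in [r".*", r"a.+b", r"\.*", r"[*.]", r".{0,20}"]
           if has_unrestricted_dot_quantifier(p)]
```

A bounded form such as .{0,20} is not flagged by this sketch. Whether the DLP engine accepts it is an assumption that should be confirmed against the product.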

Metacharacters

Operator    Matched Pattern
\t          Match tab.
\n          Match newline.
\r          Match return.
\f          Match form feed.
\a          Match alarm (bell, beep, and so on).
\e          Match escape.
\v          Match vertical tab.
\21         Match octal character (in this example, 21 octal).
\xF0        Match hex character (in this example, F0 hex).
\x{263a}    Match wide hex character (Unicode).
\w          Match word character (alphanumeric plus ‘_’).
\W          Match non-word character.
\s          Match whitespace character. This metacharacter also includes \n and \r.
\S          Match non-whitespace character.
\d          Match digit character.
\D          Match non-digit character.
\b          Match word boundary.
\B          Match non-word boundary.
\A          Match start of string (never matches at line breaks).
\Z          Match end of string. Never matches at line breaks; only matches at the end of the final buffer of text submitted for matching.
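
The difference between the line anchors and \A or \Z, and between \b and \B, is easiest to see on short inputs. The sketch below uses Python's re module as an assumed stand-in for the DLP engine. The escapes \e, \x{263a}, and two-digit octal are left out because Python's re module does not accept that notation.

```python
import re

# \w covers letters, digits, and underscore; \s covers \n and \r as well.
words = re.findall(r"\w+", "user_1 + x")
whitespace_run = re.fullmatch(r"\s+", " \t\n\r") is not None

# \d and \D split a string into digit and non-digit characters.
digits = re.findall(r"\d", "a1b22")
non_digits = re.findall(r"\D", "a1b22")

# \b requires a word boundary; \B requires the absence of one.
whole_word = re.findall(r"\bcat\b", "cat concat cat.")
inside_word = re.findall(r"\Bcat", "cat concat")

# ^ matches at every line start in multiline mode; \A and \Z never
# match at line breaks, only at the ends of the whole string.
line_starts = re.findall(r"^\d+", "12\n34", re.MULTILINE)
string_start = re.findall(r"\A\d+", "12\n34", re.MULTILINE)
string_end = re.findall(r"\d+\Z", "12\n34", re.MULTILINE)

# \xHH matches the character with that hex code (0x41 is "A").
hex_escape = re.fullmatch(r"\x41", "A") is not None
```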

Examples of Regular Expressions

  • Regex to detect 16-digit credit card number
    Regex
    \d{4}-?\d{4}-?\d{4}-?\d{4} 
    \d – Matches a digit character.
    {4} – Matches exactly 4 times, so each group must contain exactly 4 digits.
    -? – Allows an optional hyphen between groups. ? means 0 or 1 times.

    This simple regex matches a 16-digit number whose groups of four digits are optionally separated by hyphens.

    Example matches
    The regex would match 1234-5678-9123-4567 or 1234567891234567.
  • Regex to validate if the 16-digit credit card number is from a major credit card issuer

    Matches major credit cards including Visa (length 16, prefix 4) or MasterCard (length 16, prefix 51-55)

    Regex
     ^((4\d{3})|(5[1-5])\d{2})-?\d{4}-?\d{4}-?\d{4} 
    ^ – Matches the beginning of the line.
    4 – Requires the first digit to be 4. Visa card numbers start with 4.
    \d{3} – Followed by 3 digits.
    | – Alternation matches one regular expression out of several possible regular expressions.
    (5[1-5])\d{2} – Matches the MasterCard prefix 51 to 55 followed by 2 digits.
    -? – Allows an optional hyphen between groups. ? means 0 or 1 times.

    Example matches
    The regex would match 4001123456781234 or 5100123456781234.
  • Regex to check the medical record number

    Assume you have a medical record number that is 16 characters long: the prefix “NWH”, which indicates that the patient record is from Northwestern Hospital, followed by the first 3 letters of the first name and the first 3 letters of the last name, followed by 7 digits.

    Regex
     \b(NWH)-?[a-zA-Z]{3}-?[a-zA-Z]{3}-?\d{7}\b 
    \b – Matches a word boundary.
    (NWH) – Looks for the prefix NWH.
    -? – Allows 0 or 1 occurrence of “-”.
    [a-zA-Z]{3} – Matches three alphabetic characters, each from a-z or A-Z.
    \d{7} – Matches seven digit characters.

    Example matches
    The regex would match NWHCARVAN0000001 or NWH-TIM-BRO-0000002.
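
The three example patterns can be exercised against the sample values before they are saved as custom data identifiers. This sketch uses Python's re module as an assumed stand-in for the DLP engine. The variable names and the non-matching inputs are added for illustration and do not come from the product.

```python
import re

# Patterns copied from the three examples above.
CARD_16 = r"\d{4}-?\d{4}-?\d{4}-?\d{4}"
CARD_ISSUER = r"^((4\d{3})|(5[1-5])\d{2})-?\d{4}-?\d{4}-?\d{4}"
MEDICAL_RECORD = r"\b(NWH)-?[a-zA-Z]{3}-?[a-zA-Z]{3}-?\d{7}\b"

def found(pattern: str, text: str) -> bool:
    """Return True when the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

card_results = {
    "1234-5678-9123-4567": found(CARD_16, "1234-5678-9123-4567"),
    "1234567891234567": found(CARD_16, "1234567891234567"),
    "1234-5678": found(CARD_16, "1234-5678"),            # too short
}

issuer_results = {
    "4001123456781234": found(CARD_ISSUER, "4001123456781234"),  # Visa prefix
    "5100123456781234": found(CARD_ISSUER, "5100123456781234"),  # MasterCard prefix
    "5600123456781234": found(CARD_ISSUER, "5600123456781234"),  # prefix 56 is out of range
}

record_results = {
    "NWHCARVAN0000001": found(MEDICAL_RECORD, "NWHCARVAN0000001"),
    "NWH-TIM-BRO-0000002": found(MEDICAL_RECORD, "NWH-TIM-BRO-0000002"),
    "NWHCARVAN000001": found(MEDICAL_RECORD, "NWHCARVAN000001"),      # only 6 digits
    "XNWHCARVAN0000001": found(MEDICAL_RECORD, "XNWHCARVAN0000001"),  # no boundary before NWH
}
```

Because the first pattern has no anchors or word boundaries, it also matches 16 digits embedded in a longer run of digits. Adding \b on both sides, as the medical record example does, is one way to tighten it.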