Netskope Help

Building Regular Expressions

The DLP engine contains 3000+ predefined data identifiers that can be used in DLP rules. It also supports custom data identifiers that use either a keyword search or a regular expression search. This page describes how to write custom data identifiers for DLP using regular expressions.

Syntax

This section describes the regular expression syntax that the DLP engine supports. The DLP engine parser interprets regular expressions identically to the UNIX regular expression syntax.

Supported Operators

Operator    Matched Pattern
\           Quote the next metacharacter.
^           Match the beginning of a line.
$           Match the end of a line.
.           Match any character (except newline).
|           Alternation
( )         Used for grouping to force operator precedence
[xy]        Character x or y
[x-z]       The range of characters between x and z
[^z]        Any character except z
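
The operators above can be tried outside the DLP engine before a rule is created. The following is a minimal sketch that uses Python's re module as a stand-in, which is an assumption made for illustration; the DLP engine parser may differ in edge cases. The sample text is hypothetical.

```python
import re

# Python's re module stands in for the DLP engine here (an assumption).
# re.MULTILINE makes ^ and $ match at line boundaries, as the table describes.
text = "cat 1\ndog 2\ncow 3"

# ^ anchors to the beginning of a line; | alternates; ( ) groups.
line_starts = re.findall(r"^(cat|cow)", text, flags=re.MULTILINE)

# $ anchors to the end of a line; [1-2] matches a range of characters.
line_ends = re.findall(r"[1-2]$", text, flags=re.MULTILINE)

# [^o ] matches any character except "o" or a space.
no_o = re.findall(r"^[^o ]{3} ", text, flags=re.MULTILINE)

# \ quotes the next metacharacter so that "." matches a literal period.
literal_dot = re.findall(r"3\.5", "3.5 and 345")

print(line_starts, line_ends, no_o, literal_dot)
```

Without the backslash, the last pattern would also match 345, because an unquoted . matches any character.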

Supported Quantifiers

Operator    Matched Pattern
*           Match 0 or more times
+           Match 1 or more times
?           Match 0 or 1 times
{n}         Match exactly n times
{n,}        Match at least n times
{n,m}       Match at least n times, but no more than m times

Note

Unrestricted greedy quantifiers applied to arbitrary characters, such as .* or .+, are not allowed. If you are attempting to include these characters in a class or set, reverse their order. For example, *.
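
To see how each quantifier changes what is accepted, the sketch below applies every quantifier in the table to the same small set of strings. It uses Python's re module as a stand-in for the DLP engine, which is an assumption; the sample strings and the full_matches helper are hypothetical.

```python
import re

# Python's re module is used as a stand-in for the DLP engine (an assumption).
samples = ["a", "ab", "abb", "abbb"]

def full_matches(pattern):
    """Return the samples that the pattern matches from start to end."""
    return [s for s in samples if re.fullmatch(pattern, s)]

star = full_matches(r"ab*")         # a followed by 0 or more b
plus = full_matches(r"ab+")         # a followed by 1 or more b
optional = full_matches(r"ab?")     # a followed by 0 or 1 b
exactly = full_matches(r"ab{2}")    # a followed by exactly 2 b
at_least = full_matches(r"ab{2,}")  # a followed by at least 2 b
between = full_matches(r"ab{1,2}")  # a followed by 1 to 2 b

for name, result in [("*", star), ("+", plus), ("?", optional),
                     ("{2}", exactly), ("{2,}", at_least), ("{1,2}", between)]:
    print(name, result)
```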

Metacharacters

Operator    Matched Pattern
\t          Match tab
\n          Match newline
\r          Match return
\f          Match form feed
\a          Match alarm (bell, beep, and so on)
\e          Match escape
\v          Match vertical tab
\021        Match octal character (in this example, 21 octal)
\xF0        Match hex character (in this example, F0 hex)
\x{263a}    Match wide hex character (Unicode)
\w          Match word character (alphanumeric plus '_')
\W          Match non-word character
\s          Match whitespace character. This metacharacter also includes \n and \r
\S          Match non-whitespace character
\d          Match digit character
\D          Match non-digit character
\b          Match word boundary
\B          Match non-word boundary
\A          Match start of string (never match at line breaks)
\Z          Match end of string. Never match at line breaks; only match at the end of the final buffer of text submitted for matching
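
A few of the metacharacters above are illustrated in the sketch below. It assumes Python's re module as a stand-in for the DLP engine. Python does not accept every escape in the table (for example, \e and \x{263a}), so only escapes that Python also supports are shown. The sample text is hypothetical.

```python
import re

# Python's re module is a stand-in for the DLP engine (an assumption).
text = "ID_42\tcost: 7 USD\n"

digits = re.findall(r"\d+", text)          # runs of digit characters
words = re.findall(r"\w+", text)           # runs of word characters, including _
spaces = re.findall(r"\s", text)           # whitespace, including \t and \n
bounded = re.findall(r"\b\d\b", text)      # a single digit standing on its own
hex_escape = re.findall(r"\x41", "ABC")    # 41 hex is the letter A
octal_escape = re.findall(r"\101", "ABC")  # 101 octal is the letter A
start_only = re.findall(r"\AID", text)     # \A matches only at the string start

print(digits, words, bounded, hex_escape, octal_escape, start_only)
```

The \b\d\b pattern skips the digits in ID_42 because the underscore counts as a word character, so there is no word boundary before the 4.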

Examples of Regular Expressions
  • Regex to detect a 16-digit credit card number

    Regex

    \d{4}-?\d{4}-?\d{4}-?\d{4}
    

    \d - Matches a digit character.

    {4} - Matches exactly 4 times, so each group must contain exactly 4 digits.

    -? - Matches an optional hyphen between the digit groups. ? indicates 0 or 1 times.

    This simple regex matches a 16-digit number whose groups of 4 digits are optionally separated by -.

    Example matches

    The regex would match 1234-5678-9123-4567 or 1234567891234567.

  • Regex to validate whether the 16-digit credit card number is from a major credit card issuer

    Matches major credit cards including: Visa (length 16, prefix 4) or MasterCard (length 16, prefix 51-55)

    Regex

     ^((4\d{3})|(5[1-5])\d{2})-?\d{4}-?\d{4}-?\d{4}
    

    ^ - Matches beginning of the line

    4 - Matches 4 as the first digit. Visa card numbers start with 4

    \d{3} - followed by 3 digits

    | - Alternation is used for matching a single regular expression out of many possible regular expressions

    (5[1-5])\d{2} - Matches MasterCard prefix 51 to 55 followed by 2 digits

    -? - Matches an optional hyphen between the digit groups. ? indicates 0 or 1 times.

    Example matches

    The regex would match 4001123456781234 or 5100123456781234.

  • Regex to check the medical record number

    Assume a medical record number that is 16 characters long: the prefix "NWH", which indicates that the patient record is from Northwestern Hospital, followed by the first 3 letters of the first name, 3 letters of the last name, and 7 digits.

    Regex

     \b(NWH)-?[a-zA-Z]{3}-?[a-zA-Z]{3}-?\d{7}\b
    

    \b - Match the word boundary

    (NWH) - Looks for prefix NWH

    -? - Checks for 0 or 1 occurrence of "-"

    [a-zA-Z]{3} - Checks for three alphabetic characters. Each can be any character from a-z or A-Z

    \d{7} - Checks for seven digit characters

    Example matches

    The regex would match NWHCARVAN0000001 or NWH-TIM-BRO-0000002.
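
The three example patterns can be checked against their documented example matches before they are entered as custom data identifiers. The sketch below is one way to do that, assuming Python's re module as a stand-in for the DLP engine. The is_match helper, the variable names, and the non-matching inputs are hypothetical and used only for illustration.

```python
import re

# Patterns copied from the three examples above. Python's re module stands in
# for the DLP engine (an assumption made for illustration).
CARD_16 = r"\d{4}-?\d{4}-?\d{4}-?\d{4}"
CARD_ISSUER = r"^((4\d{3})|(5[1-5])\d{2})-?\d{4}-?\d{4}-?\d{4}"
MRN = r"\b(NWH)-?[a-zA-Z]{3}-?[a-zA-Z]{3}-?\d{7}\b"

def is_match(pattern, text):
    """Return True if the pattern is found anywhere in text."""
    return re.search(pattern, text) is not None

results = {
    # Example matches listed in the documentation.
    "card_hyphens": is_match(CARD_16, "1234-5678-9123-4567"),
    "card_plain": is_match(CARD_16, "1234567891234567"),
    "visa": is_match(CARD_ISSUER, "4001123456781234"),
    "mastercard": is_match(CARD_ISSUER, "5100123456781234"),
    "mrn_plain": is_match(MRN, "NWHCARVAN0000001"),
    "mrn_hyphens": is_match(MRN, "NWH-TIM-BRO-0000002"),
    # Hypothetical inputs that should not match.
    "card_too_short": is_match(CARD_16, "1234-5678-9123"),
    "other_prefix": is_match(CARD_ISSUER, "6011123456781234"),
    "mrn_short_digits": is_match(MRN, "NWH-TIM-BRO-002"),
}

for name, matched in results.items():
    print(f"{name}: {matched}")
```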