- Create XML file.
- Import XML into Exchange 2013 or Office 365.
- Create new DLP Policy.
- Create rule based off the DLP policy.
Pretty simple.
However, as you are about to find out, the most difficult part will be the creation of the XML file that is needed to define the policy settings. There are numerous resources out there to help with the creation of the XML file. What I have found to be most useful when creating the file is the following:
- GUID Generator – the XML file has a few GUIDs that need to be replaced with new ones to ensure uniqueness of GUIDs. A good GUID generator can be found at this site.
*** Note that GUIDs can be generated in PowerShell as well (see part two of this series for more information).
- RegEx checker – RegEx is the syntax that allows the Exchange DLP feature to know what to look for in a message. Regexr is a good website to use for this purpose. The site allows testing of the regex syntax and can determine if test text matches the criteria you specified.
- Good Notepad Editor – this is key for being able to properly work on XML files. Notepad++ is a prime example of this type of editor. While I believe it has some UTF16 limitations, UTF8 formatting works fine.
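As a quick illustration of the PowerShell option mentioned in the note above, here is a minimal sketch that generates a handful of fresh GUIDs locally. The variable names `$count` and `$newGuids` are just placeholders for this example; adjust the count to however many GUIDs your file needs.

```powershell
# Sketch: generate new GUIDs locally instead of using a website.
# $count and $newGuids are illustrative names, not part of any DLP tooling.
$count = 3

$newGuids = 1..$count | ForEach-Object {
    # [guid]::NewGuid() returns a random GUID; ToString() gives the
    # lowercase, hyphenated form used in the rule package XML.
    [guid]::NewGuid().ToString()
}

# Print the GUIDs so they can be pasted into the XML file.
$newGuids
```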
XML Creation
Where do we start? A good place is a Microsoft TechNet article which contains some sample code for a DLP XML file. The sample XML looks like this:
<?xml version="1.0" encoding='UTF-8'?>
<RulePackage xmlns="http://schemas.microsoft.com/office/2011/mce">
  <RulePack id="b4b4c60e-2ff7-47b2-a672-86e36cf608be">
    <Version major="1" minor="0" build="0" revision="0"/>
    <Publisher id="7ea13c35-0e58-472a-b864-5f2e717edec6"/>
    <Details defaultLangCode="en-us">
      <LocalizedDetails langcode="en-us">
        <PublisherName>DLP by the Cloud Master</PublisherName>
        <Name>Custom SSN Classification</Name>
        <Description>Custom SSN Classification</Description>
      </LocalizedDetails>
    </Details>
  </RulePack>
  <Rules>
    <!-- SSN -->
    <Entity id="0ba2cb9d-4ef1-4fdd-bd16-c3f431363d4b" patternsProximity="300" recommendedConfidence="75">
      <Pattern confidenceLevel="85">
        <IdMatch idRef="FormattedSSN" />
      </Pattern>
      <Pattern confidenceLevel="85">
        <IdMatch idRef="UnformattedSSN" />
      </Pattern>
    </Entity>
    <Regex id="FormattedSSN">
      (?!\b(\d)\1+-(\d)\1+-(\d)\1+\b)(?!123-45-6789|219-09-9999|078-05-1120)(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}
    </Regex>
    <Regex id="UnformattedSSN">
      (?!\b(\d)\1+\b)(?!123456789|219099999|078051120)(?!666|000|9\d{2})\d{3}(?!00)\d{2}(?!0{4})\d{4}
    </Regex>
    <LocalizedStrings>
      <Resource idRef="0ba2cb9d-4ef1-4fdd-bd16-c3f431363d4b">
        <Name default="true" langcode="en-us">
          Custom Social Security Number
        </Name>
        <Description default="true" langcode="en-us">
          A custom classification for detecting Social Security numbers
        </Description>
      </Resource>
    </LocalizedStrings>
  </Rules>
</RulePackage>
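What exactly is all of this? Let’s start with the easy part: GUIDs. There are three GUIDs in this file, and one of them is listed twice. To generate new ones, we can use either the website or PowerShell. Once we have three new GUIDs, we need to replace the existing ones at these locations within the file: the id of the RulePack element, the id of the Publisher element, and the id of the Entity element, which is repeated as the idRef of the matching Resource element.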
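If you would rather script the swap than edit by hand, the following is a rough sketch of one way to do it. It works on a small fragment holding the four id lines from the sample above; the variable names (`$xmlText`, `$guidPattern`, `$map`, `$updated`) are made up for this example. The point is that each distinct old GUID maps to exactly one new GUID, so the value that appears twice stays in sync.

```powershell
# Sketch: replace every GUID in the XML text with a freshly generated one.
# The fragment below stands in for the real file content.
$xmlText = @'
<RulePack id="b4b4c60e-2ff7-47b2-a672-86e36cf608be">
<Publisher id="7ea13c35-0e58-472a-b864-5f2e717edec6"/>
<Entity id="0ba2cb9d-4ef1-4fdd-bd16-c3f431363d4b" patternsProximity="300" recommendedConfidence="75">
<Resource idRef="0ba2cb9d-4ef1-4fdd-bd16-c3f431363d4b">
'@

# Matches the usual 8-4-4-4-12 hexadecimal GUID layout.
$guidPattern = '[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}'

# Build a lookup of old GUID -> new GUID, one entry per distinct value.
$map = @{}
foreach ($m in [regex]::Matches($xmlText, $guidPattern)) {
    if (-not $map.ContainsKey($m.Value)) {
        $map[$m.Value] = [guid]::NewGuid().ToString()
    }
}

# Apply the replacements; a GUID used twice gets the same new value twice.
$updated = $xmlText
foreach ($old in $map.Keys) {
    $updated = $updated.Replace($old, $map[$old])
}

$updated
```

For a real file, the text could be read with Get-Content -Raw and written back with Set-Content once you are happy with the result.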
Once the GUIDs are replaced, we can concentrate on the name and descriptions to help better identify the purpose of this DLP policy.
- Line 8 – who created the rule (person, company, or department as some examples).
- Line 9 – provide a name of your DLP policy (Social Security, Bank Account Numbers, etc).
- Line 10 – provide a description of the policy, as short or as long as is needed.
- Line 18 – Name of the condition to be referenced lower. Repeat for each condition.
- Line 24 – repeat of line 18 and subsequent rule names
- Line 33 – Name of the DLP policy.
The last section to concentrate on is the set of RegEx conditions the rule uses to determine when to trigger:
For the RegEx code, the best option is to test it before trying to add it to the XML file:
In this case I have an SSN rule that is looking for a pattern of ###-##-#### or ### ## ####. There are plenty of options for this pattern, and you can search the Internet for the various flavors and iterations. However, this one works for my client. Once you have your RegEx syntax, it can be tested against live text to see if there are any matches. Once the RegEx syntax is perfected, place it into the XML file. Once the XML file is completed, it can be imported into Exchange 2013 on-premises servers or Office 365 servers.
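Besides a website, the pattern can also be tried out locally. The sketch below runs the FormattedSSN expression from the sample XML against a few made-up strings using the .NET regex engine that PowerShell exposes. The sample strings and variable names are invented for illustration, and this only exercises the regex itself; it makes no claim about how Exchange evaluates confidence levels or proximity.

```powershell
# Sketch: test the FormattedSSN pattern from the sample XML against sample text.
$formattedSsn = '(?!\b(\d)\1+-(\d)\1+-(\d)\1+\b)(?!123-45-6789|219-09-9999|078-05-1120)(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}'

# Made-up test strings: one plausible number, one excluded placeholder,
# one with an invalid "000" area number, and one with no number at all.
$samples = @(
    'SSN on file: 111-22-3333',
    'Placeholder value 123-45-6789',
    'Invalid area number 000-12-3456',
    'No number in this line'
)

$results = foreach ($text in $samples) {
    $match = [regex]::Match($text, $formattedSsn)
    [pscustomobject]@{
        Text    = $text
        Matched = $match.Success
        Value   = $match.Value
    }
}

# Only the first sample should report a match.
$results | Format-Table -AutoSize
```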
To import the rule into Exchange 2013, simply run this one liner which will then make this DLP policy available for Transport rules:
New-ClassificationRuleCollection -FileData ([Byte[]]$(Get-Content -Path "<file name and full path>" -Encoding Byte -ReadCount 0))
To import the rule into Office 365, we need to connect to Exchange Online with remote PowerShell and run these lines:
# Prompt for Office 365 admin credentials
$LiveCred = Get-Credential
# Open a remote session to Exchange Online and import its cmdlets
$Session = New-PSSession -Name ExchangeOnline -ConfigurationName Microsoft.Exchange -ConnectionUri https://ps.outlook.com/powershell/ -Credential $LiveCred -Authentication Basic -AllowRedirection
Import-PSSession $Session
# Upload the XML file as a byte array
New-ClassificationRuleCollection -FileData ([Byte[]]$(Get-Content -Path "C:\temp\ssn2.xml" -Encoding Byte -ReadCount 0))
Once the import is successful, you will see this:
You can verify the collection via PowerShell:
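A minimal sketch of what that check could look like is below. It assumes the Exchange cmdlets are already loaded (Exchange Management Shell on premises, or the imported remote session for Office 365) and that the custom type's name contains "Social Security", as in the sample XML. The `$exchangeLoaded` variable is just a guard added for this example so the snippet does not fail on a machine without those cmdlets.

```powershell
# Sketch only: assumes the Exchange cmdlets are available in this session.
$exchangeLoaded = [bool](Get-Command Get-ClassificationRuleCollection -ErrorAction SilentlyContinue)

if ($exchangeLoaded) {
    # List every imported classification rule collection.
    Get-ClassificationRuleCollection

    # Look for the sensitive information type defined in the XML file.
    Get-DataClassification | Where-Object { $_.Name -like '*Social Security*' }
} else {
    Write-Warning 'Exchange cmdlets not found; run this from the Exchange Management Shell or the imported session.'
}
```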
Now that the XML file is imported, we can create a DLP rule that references this:
Create a new DLP rule:
Select Sensitive Information Type:
Select the custom rule you created:
Click OK.
Once the rule is created, the DLP rule can be tested with a new email (See the Policy Tip that appears):
NOTES
When trying to import the XML file into Office 365 I was consistently getting an encoding error.
Thinking I had an issue with the way the file was saved, I tried Notepad, WordPad and Notepad++ without success. After doing some line-by-line verification, I found that my RegEx syntax was not quite correct. Once this was changed, I was able to import the XML file.
Another issue is the format of the XML file itself. To test the formatting, simply open the file with your favorite web browser. If there is an issue, the page will either be blank or some information will appear:
If the XML file is correct, something like this will appear:
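The same well-formedness check can be done without a browser. The following is a small sketch, with a hypothetical helper named `Test-XmlWellFormed`, that asks the .NET XML parser to load the text and reports whether it succeeded. It only confirms that the XML parses; it does not validate the file against the rule package schema.

```powershell
# Sketch: check whether a piece of XML text is well-formed.
# Test-XmlWellFormed is a hypothetical helper written for this example.
function Test-XmlWellFormed {
    param([string]$XmlText)
    try {
        $doc = New-Object System.Xml.XmlDocument
        $doc.LoadXml($XmlText)   # throws if the XML cannot be parsed
        return $true
    } catch {
        # Show the parser's complaint, which usually names the line and position.
        Write-Warning $_.Exception.Message
        return $false
    }
}

# A well-formed fragment and one with a missing closing tag.
$good = '<RulePackage><Rules></Rules></RulePackage>'
$bad  = '<RulePackage><Rules></RulePackage>'

$goodResult = Test-XmlWellFormed $good
$badResult  = Test-XmlWellFormed $bad

"Well-formed sample: $goodResult"
"Broken sample:      $badResult"
```

To check a file instead of a string, pass in the output of Get-Content -Raw for that file.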
Final Word
There is so much more that can be done with the custom DLP policies and this article only scratches the surface. Good sources of information:
http://technet.microsoft.com/en-us/library/dn781122(v=exchg.150).aspx
http://blogs.technet.com/b/govcloud/archive/2014/04/15/dlp-creating-custom-rules.aspx#.VGzf2_9OUiR
Hi, thank you for this. It’s very helpful information.
I wish to know if there is a way to create a policy which checks for a 7 digit value match from a dictionary. We have certain 7 digit numbers and we want to make a policy that if any of those numbers show up in an email, it should be blocked.
We are using RSA DLP currently which allows us to import a text file which looks something like this:
1264967
2052781
1506868
2044661
1551890
2137657
1855740
1336706
1484340
1653824
1823220
1964979
1298132
…
…
Can Office DLP provide this functionality?
Thank you
Yes, you can create RegEx to handle these numbers. RegEx usually is about matching just patterns, but it can be coded to handle a series of numbers. The RegEx search pattern would look like this:
^(1264967|2052781|1506868|2044661|1551890|2137657|1855740|1336706|1484340|1653824|1823220|1964979|1298132)$
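For a longer list, the alternation can be built from the values rather than typed by hand. The sketch below uses the first few numbers from the question and, as an assumption on my part, wraps the group in \b word boundaries instead of ^ and $, since ^ and $ only match when the number is the entire input. Whether boundaries or anchors are the better fit depends on how the text reaches the rule, so test both against your own messages.

```powershell
# Sketch: build one alternation pattern from a list of values.
# Only the first few numbers from the question are used here.
$numbers = @('1264967', '2052781', '1506868', '2044661')

# \b keeps the match to whole numbers, so a longer digit run is not a hit.
$pattern = '\b(' + ($numbers -join '|') + ')\b'
$pattern

# A listed number inside a sentence is found...
$hit = 'Order ref 2052781 attached' -match $pattern

# ...but the same digits inside a longer number are not.
$miss = 'Order ref 20527810 attached' -match $pattern

"Listed number found:       $hit"
"Longer number also found:  $miss"
```

With a text file holding one number per line, the `$numbers` array could be filled from Get-Content instead.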
I got this error after loading my regex expression in xml in a DLP.
The following rule(s) reference more than 20 distinct regular expression text processor(s) and may impact performance:
Can anyone help with this?
Need more information. What is the RegEx you are using and what are you trying to match?
I am using regular expressions for the C and Java languages, covering all of the syntax. How do I import that into XML? I am getting this error: The following rule(s) reference more than 20 distinct regular expression text processor(s) and may impact performance:
I have created a policy to identify C and Java syntax using regex in an XML file, but I was getting this error: “The following rule(s) reference more than 20 distinct regular expression text processor(s) and may impact performance”. Do you have any idea about this?