
Over the course of your Office 365 administration duties, you may be called on to locate data matching particular patterns (such as a regular expression or a Sensitive Information Type), either for eDiscovery or for data classification purposes. The good news is you can actually do that. In this post, we’re going to walk through a couple of ways of identifying sensitive data using the custom DLP rule package entities from my previous post. The sensitive information type we’re going to look for is U.S. Social Security Numbers (but these steps will work for any of the sensitive information types).

Content Search and eDiscovery

In this set of steps, we’re going to choose a sensitive information type to search for using either PowerShell or the portal, and then use either Content Search or an eDiscovery case to look for matching content.

Connect to the Office 365 PowerShell for the Compliance Center or navigate to the Security & Compliance Center | Classifications | Sensitive Information Types page and look for the name of the sensitive information type you wish to identify in Office 365.

If doing it through PowerShell:

Connect to Office 365 PowerShell:

$Credential = Get-Credential
$Session = New-PSSession -ConfigurationName Microsoft.Exchange -ConnectionUri https://outlook.office365.com/powershell-liveid/ -Credential $Credential -Authentication Basic -AllowRedirection
Import-PSSession $Session
Connect-MsolService -Credential $Credential
$ComplianceSession = New-PSSession -ConfigurationName Microsoft.Exchange -ConnectionUri https://ps.compliance.protection.outlook.com/powershell-liveid -Credential $Credential -Authentication Basic -AllowRedirection
Import-PSSession $ComplianceSession -AllowClobber

Since the sensitive information types we’re looking for have “Undocumented Features” as the publisher and Social Security as part of the name, we can run this cmdlet to return the names we want:
```
(Get-DlpSensitiveInformationType | ? { $_.Publisher -eq "Undocumented Features" -and $_.Name -like "*social security*" }).Name
```
Keep these names handy. You’ll need to copy and paste them into a search box later on.

If doing it through the Security & Compliance Center user interface, open a separate browser window or tab to the Sensitive Information Types page and keep the page handy. You’ll need to copy and paste these names into a search box later on.
Open the Security & Compliance Center | Search & Investigation and either create an eDiscovery case with a search or do a Content Search. In this example, I’m just going to do a Content Search (since the search interface and process are nearly identical for an eDiscovery case).
Create a search. You can select + Guided Search and follow the bouncing ball, or select + New Search and enter data directly in the keywords box. I did the wizard (Guided search) just because I like steps.
Name the search.
Select locations and click Next.
On the Condition Card, enter a search for a Sensitive Information Type using the names of the sensitive information types you identified under Classifications | Sensitive Information Types or from the PowerShell cmdlet earlier. The format is SensitiveType:"<name>" (with straight double quotes around the name). To search for more than one type, join the terms with a capital OR. In my case, I’m going to use:
```
SensitiveType:"Social Security Number Only (Function)" OR SensitiveType:"Social Security Number Only (Regular Expression)"
```
Click Finish.
Review the results.
Export the results as necessary.
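
If you would rather stay in PowerShell, the same search can be sketched with the compliance cmdlets. This is a minimal sketch, not the only way to do it: the helper New-SensitiveTypeQuery and the search name 'SSN Content Search' are hypothetical names I made up for illustration, and searching all Exchange and SharePoint locations is an assumption you should adjust for your tenant. The cmdlet calls are guarded so they only run when the Compliance Center session from earlier has been imported.

```powershell
# Hypothetical helper: build a KQL query with one SensitiveType clause per name.
function New-SensitiveTypeQuery {
    param(
        [Parameter(Mandatory)]
        [string[]]$Name
    )
    # Wrap each name in straight double quotes and join the clauses with a capital OR.
    ($Name | ForEach-Object { 'SensitiveType:"{0}"' -f $_ }) -join ' OR '
}

# The names returned by Get-DlpSensitiveInformationType earlier.
$names = @(
    'Social Security Number Only (Function)',
    'Social Security Number Only (Regular Expression)'
)
$query = New-SensitiveTypeQuery -Name $names

# Only attempt the search if the compliance cmdlets are available in this session.
if (Get-Command New-ComplianceSearch -ErrorAction SilentlyContinue) {
    # 'SSN Content Search' is an illustrative name; the locations are an assumption.
    New-ComplianceSearch -Name 'SSN Content Search' -ExchangeLocation All -SharePointLocation All -ContentMatchQuery $query
    Start-ComplianceSearch -Identity 'SSN Content Search'
}
```

Building the query from a list keeps the quoting and the capital OR consistent when you add more sensitive information types later.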

Applying labels for classification and search

You can create a label/classification for content and search for that in your tenant as well. When using a label, you can either publish it (so that users can choose to apply it to relevant content) or you can publish and automatically apply it, meaning that if the content matches the rules of the sensitive information type, the label will be applied to the content automatically. Note: if you are applying labels automatically, it will take some time for them to show up (from a few hours, up to a week or so, depending on the amount of content in Office 365 and when the various indexing processes run). Using this process, newly created content will be tagged automatically and will show up much sooner in search results.

Create a label in the Security & Compliance Center. I created one called Social Security Label. You can find detailed instructions for doing so here, but the gist is launch https://protection.office.com/#/tagslibrary, click + Create label, and then enter a name for it.
Publish the label. You can apply the label automatically (EMS E5) or have users apply it manually (EMS E3). I’m going to do auto application. To do so, select the label, and then select Auto apply label.
Verify that the label is correct and click Next.
Select the radio button Apply label to content that contains sensitive information and click Next.
Select Custom and click Next.
On the Sensitive information picker page, click + Add.
Select the sensitive information types from the list displayed, and then click Add.
Confirm the sensitive information types show up in the list and then click Done.
Verify settings and click Next.
Name the label policy (and optionally, you can provide a description) and click Next.
Select a location and click Next.
Confirm and click Auto-apply. Note the name of the label that will be applied (located at the bottom of the page).

Searching for a label

Once you have applied labels to your content, you can then use Content Search to look for content carrying those labels. The user interface search term we’re going to leverage is Compliance Tag.

Launch Search & Investigation and either create a case and a new search or just do a content search. For this example, we’re just going to navigate to content search, but the same process applies to eDiscovery cases.
Click + New search.
Select the appropriate Locations radio button (I’m just going to search everything because it’s easy in this example), and then click the + Add conditions button.
Select Compliance Tag out of the list, and then click Add.
In the Conditions box, enter Social Security Label (the value specified in the last step of the previous section) and then click Save & run. Don’t enter quotes, or you’ll receive this error later: The query of the search is invalid: Specified argument was out of the range of valid values. Parameter name: Double quotation mark in the middle of string property value is not supported by KQL.
Enter a name for the search and click Save.
After the search completes, you can preview up to 500 of the returned results or export as you normally would.
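
The label search can also be sketched in PowerShell. Treat this as an assumption-laden sketch: the helper New-ComplianceTagQuery and the search name 'SSN Label Search' are hypothetical, and I’m assuming the raw KQL form takes the label name in straight double quotes when it contains spaces (the portal’s condition box appears to add its own quoting, which is why you don’t type quotes there). As before, the cmdlets only run if the Compliance Center session has been imported.

```powershell
# Hypothetical helper: build a KQL clause that matches a single label name.
function New-ComplianceTagQuery {
    param(
        [Parameter(Mandatory)]
        [string]$Label
    )
    # Assumption: a multi-word label is quoted once, with straight double quotes.
    'ComplianceTag:"{0}"' -f $Label
}

$labelQuery = New-ComplianceTagQuery -Label 'Social Security Label'

# Only attempt the search if the compliance cmdlets are available in this session.
if (Get-Command New-ComplianceSearch -ErrorAction SilentlyContinue) {
    # 'SSN Label Search' is an illustrative name; the locations are an assumption.
    New-ComplianceSearch -Name 'SSN Label Search' -ExchangeLocation All -SharePointLocation All -ContentMatchQuery $labelQuery
    Start-ComplianceSearch -Identity 'SSN Label Search'
    # The search runs asynchronously, so the status may still show as in progress here.
    Get-ComplianceSearch -Identity 'SSN Label Search' | Select-Object Name, Status, Items
}
```

If the label was applied automatically, remember that it can take a while to show up, so an empty result right after creating the policy doesn’t necessarily mean the query is wrong.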

Congratulations! You’ve got this.