Make your Data Processing more efficient with PowerShell

This is the second post in the Data Manipulations with PowerShell series, where we dive into a specific phase of the data operations process and apply PowerShell along the way.

Read the first post: Empower your Data Retrieval with PowerShell

During this phase, data can undergo multiple processing tasks. The main goal is to shape data into the desired form through filter, sort and group operations.

We are going to focus on three data processing actions:

  1. Data Filtering – selecting specific elements from a collection based on certain criteria.
  2. Data Sorting – arranging data in a specific order based on one or more properties. 
  3. Data Grouping – categorizing data based on a specific property and then performing operations on each group.

You can rarely achieve the desired data processing outcome with only one action. Usually it takes a combination of at least two of them, and more commonly a mix of all three.
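To make the combination of actions concrete, here is a minimal sketch of a single pipeline that filters, sorts and then groups. The sample records and their File, Label and Size properties are hypothetical and exist only for this illustration.

```powershell
# Hypothetical sample data: File, Label and Size are illustrative property names
$DataCollection = @(
    [pscustomobject]@{ File = 'report.docx';  Label = 'Internal';     Size = 120 }
    [pscustomobject]@{ File = 'budget.xlsx';  Label = 'Confidential'; Size = 300 }
    [pscustomobject]@{ File = 'notes.txt';    Label = 'Internal';     Size = 15 }
    [pscustomobject]@{ File = 'plan.pptx';    Label = 'Public';       Size = 80 }
    [pscustomobject]@{ File = 'salaries.csv'; Label = 'Confidential'; Size = 45 }
)

# 1. Filter: keep only items larger than 20
# 2. Sort: largest items first
# 3. Group: bucket the remaining items by Label
$Result = $DataCollection |
    Where-Object Size -gt 20 |
    Sort-Object -Property Size -Descending |
    Group-Object -Property Label

# Print each group's name, item count and the files it contains
$Result | ForEach-Object {
    '{0}: {1} item(s) -> {2}' -f $_.Name, $_.Count, ($_.Group.File -join ', ')
}
```

Because sorting happens before grouping, the members inside each group keep the descending Size order.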

As always, first we will check out of the box supplied functionality and then explore PowerShell Gallery modules.

Data Filtering

Out of the Box

PowerShell provides several ways to filter data: the Where-Object cmdlet, the -match operator and the Select-String cmdlet.

Where-Object is part of the Microsoft.PowerShell.Core module. As stated in the documentation, it selects objects from a collection based on their property values. Standard use cases include selecting files created after a certain date, events with a particular ID, or computers that run a particular version of Windows. There are two ways to construct a Where-Object command: with a comparison statement or with a script block.

A comparison statement reads much more like natural language. The following two commands produce the same result:

$DataCollection | Where-Object -Property Label -eq -Value "Internal"
$DataCollection | Where-Object Label -eq "Internal"
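The comparison statement syntax is not limited to -eq. The sketch below, built on hypothetical sample records, shows a few other operators that Where-Object accepts in this form: wildcard matching with -like, membership with -in and negation with -ne.

```powershell
# Hypothetical sample data used only for this illustration
$DataCollection = @(
    [pscustomobject]@{ File = 'report.docx'; Label = 'Internal' }
    [pscustomobject]@{ File = 'budget.xlsx'; Label = 'Confidential' }
    [pscustomobject]@{ File = 'plan.pptx';   Label = 'Public' }
)

# -like uses wildcard matching: labels starting with "Int"
$Wildcard = $DataCollection | Where-Object Label -like 'Int*'

# -in checks whether the property value is one of the listed values
$Restricted = $DataCollection | Where-Object Label -in 'Internal', 'Confidential'

# -ne keeps every object whose Label is not "Public"
$NotPublic = $DataCollection | Where-Object Label -ne 'Public'
```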

A script block lets you specify the property name, a comparison operator and a property value inside a block of code. Where-Object returns all objects for which the script block evaluates to true.

$DataCollection | Where-Object {$_.Label -eq "Internal"}

All PowerShell comparison operators (e.g., `-eq`, `-ne`, `-gt`, `-lt`, `-like`, `-match`) are valid in both the script block and the comparison statement formats.
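Where the two formats differ is in combining conditions: a comparison statement holds a single condition, while a script block can join several with logical operators such as -and and -or. A minimal sketch, again with hypothetical sample records:

```powershell
# Hypothetical sample data
$DataCollection = @(
    [pscustomobject]@{ File = 'report.docx'; Label = 'Internal';     Size = 120 }
    [pscustomobject]@{ File = 'notes.txt';   Label = 'Internal';     Size = 15 }
    [pscustomobject]@{ File = 'budget.xlsx'; Label = 'Confidential'; Size = 300 }
    [pscustomobject]@{ File = 'plan.pptx';   Label = 'Public';       Size = 80 }
)

# Two conditions joined with -and: Internal items larger than 100
$LargeInternal = $DataCollection | Where-Object { $_.Label -eq 'Internal' -and $_.Size -gt 100 }

# Parentheses make the grouping of -or and -and explicit
$Sensitive = $DataCollection | Where-Object {
    ($_.Label -eq 'Confidential' -or $_.Label -eq 'Internal') -and $_.Size -ge 15
}
```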

Where-Object is a very PowerShell-native way to filter data, but there is always the hardcore old-school option: regular expressions.

PowerShell supports regular expressions (for example, through the `-match` operator) for more advanced pattern matching and filtering, letting you define complex search patterns.
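As a rough illustration, -match behaves differently depending on its left-hand side: applied to a collection it returns the matching elements, and applied to a single string it returns a Boolean and fills the automatic $Matches variable. The ID format used below is hypothetical.

```powershell
# Hypothetical document identifiers in the form DEPT-YEAR-SEQUENCE
$Ids = 'FIN-2023-001', 'HR-2022-017', 'FIN-2021-104', 'notes', 'OPS-2023-009'

# Collection on the left: -match filters and returns the matching elements
# (here: IDs from the year 2023)
$Ids2023 = $Ids -match '^[A-Z]+-2023-\d{3}$'

# Single string on the left: -match returns $true/$false and stores
# the named capture groups in the automatic $Matches variable
if ('FIN-2021-104' -match '^(?<Dept>[A-Z]+)-(?<Year>\d{4})-(?<Seq>\d{3})$') {
    $Department = $Matches.Dept   # 'FIN'
    $Year       = $Matches.Year   # '2021'
}

# -match also works in a Where-Object comparison statement on a property
$Records = @(
    [pscustomobject]@{ Id = 'FIN-2023-001'; Label = 'Confidential' }
    [pscustomobject]@{ Id = 'HR-2022-017';  Label = 'Internal' }
)
$Finance = $Records | Where-Object Id -match '^FIN-'
```

Note that -match is case-insensitive by default; -cmatch is its case-sensitive counterpart.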

Select-String is part of the Microsoft.PowerShell.Utility module and is used to find text in strings and files.

Like the -match operator, it uses regular expressions to find the needed data.

$DataCollection | ForEach-Object { $_.Label } | Select-String -Pattern 'Confidential'
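One detail worth keeping in mind is that Select-String emits MatchInfo objects rather than the original strings. The sketch below, using hypothetical log lines, reads the matched text from the Line property and shows two switches that change the filter: -NotMatch and -SimpleMatch.

```powershell
# Hypothetical log lines; Select-String accepts plain strings from the pipeline
$LogLines = @(
    'INFO  2023-05-01 Document report.docx labelled Internal'
    'WARN  2023-05-01 Document budget.xlsx labelled Confidential'
    'INFO  2023-05-02 Document plan.pptx labelled Public'
)

# MatchInfo objects are returned; the matched text is in the Line property
$Hits = $LogLines | Select-String -Pattern 'Confidential|Internal'
$MatchedLines = $Hits | ForEach-Object { $_.Line }

# -NotMatch inverts the filter and returns the lines that do not match
$Others = $LogLines | Select-String -Pattern 'Confidential|Internal' -NotMatch

# -SimpleMatch treats the pattern as a literal string instead of a regex,
# so the dot in "plan.pptx" matches only a literal dot
$Literal = $LogLines | Select-String -Pattern 'plan.pptx' -SimpleMatch
```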

Community Provided

Number of PowerShell Gallery modules with *filter* in the name: 10.

I would like to highlight a few of them:

  1. MapReduceFilter by Lonnie VanZandt – provides functionality to perform functional-style operations on each element of a sequence or stream of objects in a PowerShell pipeline.
  2. poco by JasonMArcher – interactive pipeline filtering in PowerShell (a port of peco).
  3. New-BloomFilter by Lee Holmes – creates a data set that stores, in a highly efficient manner (a Bloom filter), the existence of items supplied to the function.

Data Sorting

Out of the Box

Out of the box, we are provided with the Sort-Object cmdlet. It is part of the Microsoft.PowerShell.Utility module, which is one of the most useful and important modules available in PowerShell. You can read about the functionality and use cases of this module here.

The name of the cmdlet is self-explanatory: its main use case is to sort objects in ascending or descending order.

Sorting is done by object properties. If no sort properties are specified, PowerShell uses the default sort properties of the first input object. If the object type has none, PowerShell attempts to compare the objects themselves. For each property, Sort-Object uses the Compare method; if a property value does not implement IComparable, the cmdlet converts it to a string and uses the Compare method of System.String.

$DataCollection | Sort-Object -Property Label
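Beyond a single ascending key, Sort-Object accepts several properties, a -Descending switch, and hashtables that set the direction per property. The following sketch uses hypothetical sample records to show these options side by side, plus -Unique for distinct values.

```powershell
# Hypothetical sample data
$DataCollection = @(
    [pscustomobject]@{ File = 'report.docx';  Label = 'Internal';     Size = 120 }
    [pscustomobject]@{ File = 'budget.xlsx';  Label = 'Confidential'; Size = 300 }
    [pscustomobject]@{ File = 'notes.txt';    Label = 'Internal';     Size = 15 }
    [pscustomobject]@{ File = 'salaries.csv'; Label = 'Confidential'; Size = 45 }
)

# Descending order on a single property
$BySizeDesc = $DataCollection | Sort-Object -Property Size -Descending

# Several properties: sort by Label first, then by Size within each Label
$ByLabelThenSize = $DataCollection | Sort-Object -Property Label, Size

# Hashtables set the direction per property: Label ascending, Size descending
$Mixed = $DataCollection | Sort-Object -Property @{ Expression = 'Label'; Ascending = $true },
                                                 @{ Expression = 'Size';  Descending = $true }

# -Unique removes duplicates from the sorted output (here: the distinct labels)
$Labels = $DataCollection | ForEach-Object { $_.Label } | Sort-Object -Unique
```

Since Size holds integers, it is compared numerically rather than as text.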

Community Provided

Number of PowerShell Gallery modules with *sort* in the name: 5.

I would like to highlight a few of them:

  1. pstools.sortlibrary by Hannes Palmquist (aka hanpq) – contains alternative sort algorithms.
  2. PSSort by Hannes Palmquist (aka hanpq)

Data Grouping

Out of the Box

Once again, the Microsoft.PowerShell.Utility module provides us with the very neat Group-Object cmdlet. In a nutshell, it displays objects in groups based on the value of a specified property. It returns a table with one row for each property value and a column that displays the number of items with that value. The -NoElement parameter omits the members of each group from the results.

$DataCollection | Group-Object -Property Label -NoElement
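Grouping is most useful when you then perform an operation on each group. A minimal sketch, assuming hypothetical records with a numeric Size property: without -NoElement, each group keeps its members in the Group property, so Measure-Object can compute per-group statistics; -AsHashTable with -AsString turns the result into a hashtable that can be indexed by the group's string value.

```powershell
# Hypothetical sample data
$DataCollection = @(
    [pscustomobject]@{ File = 'report.docx';  Label = 'Internal';     Size = 120 }
    [pscustomobject]@{ File = 'budget.xlsx';  Label = 'Confidential'; Size = 300 }
    [pscustomobject]@{ File = 'notes.txt';    Label = 'Internal';     Size = 15 }
    [pscustomobject]@{ File = 'salaries.csv'; Label = 'Confidential'; Size = 45 }
    [pscustomobject]@{ File = 'plan.pptx';    Label = 'Public';       Size = 80 }
)

# Run an aggregation (total and maximum Size) for every group
$Summary = $DataCollection | Group-Object -Property Label | ForEach-Object {
    $Stats = $_.Group | Measure-Object -Property Size -Sum -Maximum
    [pscustomobject]@{
        Label     = $_.Name
        Items     = $_.Count
        TotalSize = $Stats.Sum
        Largest   = $Stats.Maximum
    }
}

# -AsHashTable -AsString returns a hashtable keyed by the Label string,
# which is convenient for direct lookups of a single group
$ByLabel = $DataCollection | Group-Object -Property Label -AsHashTable -AsString
$InternalFiles = $ByLabel['Internal'] | ForEach-Object { $_.File }
```

The -AsString switch converts the keys to strings, which keeps lookups such as `$ByLabel['Internal']` predictable.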

Community Provided

I did not find any modules for grouping data.

Let me know in the comments section below if I missed any cmdlets or modules that you use and believe more people should be aware of.

Finally, I think the out-of-the-box data processing capabilities are more than enough, and there is little need to extend them with custom modules and scripts. That is why we see so few community-provided modules in the PowerShell Gallery.

Next: Data Transformation Made Easy with PowerShell.

Data icons created by Freepik – Flaticon.

Thanks a lot for reading.
