Exploring the Power of the KQL Evaluate Operator
Unleashing Advanced Functions for Machine Learning and Beyond
The Kusto Query Language (KQL) is a powerful tool for querying large datasets in Azure Data Explorer and Microsoft Fabric. Among its many operators, the Evaluate operator stands out for its ability to invoke advanced query extensions, known as plugins. This operator is essential for performing complex functions, including machine learning models, which require a deeper understanding of KQL extensions.
What is the Evaluate Operator?
The Evaluate operator is a tabular operator that allows users to invoke service-side query extensions. These extensions, or plugins, are not bound by the relational nature of KQL, meaning they can produce dynamic output schemas based on the data they process.
Syntax and Parameters
The syntax for the Evaluate operator is straightforward:
[T | ] evaluate [evaluateParameters] PluginName ([PluginArgs])
T: A tabular input to the plugin.
evaluateParameters: Space-separated parameters controlling the behavior of the operation.
PluginName: The name of the plugin being invoked.
PluginArgs: Arguments provided to the plugin.
Key Plugins
The Evaluate operator supports various plugins, each designed for specific tasks. Some notable plugins include:
autocluster: Automatically clusters data based on similarities.
pivot: Transforms data from rows to columns.
R plugin: Integrates R scripts for advanced statistical analysis.
rolling-percentile: Calculates rolling percentiles over a dataset.
Distribution Hints
Distribution hints specify how the plugin execution is distributed across cluster nodes:
single: A single instance runs over the entire query data.
per_node: Instances run on each node over the data they contain.
per_shard: Instances run over each shard of the data.
NOTE: To use the following queries, see: Sample Gallery
Example 1: Using the autocluster
Plugin
This example demonstrates how to automatically cluster data based on similarities using the autocluster
plugin.
StormEvents
| evaluate autocluster()
This query takes the StormEvents
table and applies the autocluster
plugin to group similar events together.
Example 2: Using the pivot
Plugin
This example shows how to transform data from rows to columns using the pivot
plugin.
StormEvents
| summarize Count = count() by State, EventType
| evaluate pivot(State, EventType, Count)
Here, the pivot
plugin is used to create a pivot table that shows the count of events by state and event type.
Example 3: Using the R
Plugin for Advanced Statistical Analysis
This example integrates an R script to perform advanced statistical analysis.
StormEvents
| evaluate script("R", "summary(lm(DamageProperty ~ State + EventType, data))")
In this query, the R
plugin runs an R script that performs a linear regression analysis on the DamageProperty
based on State
and EventType
.
Example 4: Using the rolling-percentile
Plugin
This example calculates rolling percentiles over a dataset.
StormEvents
| evaluate rolling_percentile(90, Timestamp, DamageProperty)
This query calculates the 90th percentile of DamageProperty
values over time.
Example 5: Using Distribution Hints
This example demonstrates how to use distribution hints with the pivot
plugin.
StormEvents
| summarize Count = count() by State, EventType
| evaluate hint.distribution = per_node pivot(State, EventType, Count)
In this query, the hint.distribution = per_node
hint is used to distribute the pivot
operation across nodes.
TLDR
The Evaluate operator is a versatile tool in KQL, enabling advanced functions and machine learning models. By understanding its syntax, parameters, and supported plugins, users can leverage its full potential to perform complex data analysis tasks.