Powerful KQL Operators You Didn't Know You Needed
Because 'summarize' is just the tip of the KQL iceberg
The Kusto Query Language (KQL) is a robust tool for querying and analyzing data in Azure Data Explorer, Log Analytics, and other Microsoft services. While most users are familiar with basic operators like where
, summarize
, and join
, KQL offers a treasure trove of lesser-known operators that can supercharge your data analysis. In this blog post, we'll explore five powerful KQL operators that you might not have realized you needed, complete with practical examples to demonstrate their utility.
1. mv-expand
: Unraveling Arrays and Dynamic Types
The mv-expand
operator is a game-changer when dealing with dynamic data types, such as arrays or JSON objects. It expands multi-valued columns into individual rows, making it easier to analyze nested or complex data structures.
Example: Analyzing Log Data with Arrays
Suppose you have a table Logs
with a column Tags
containing arrays of tags associated with each log entry. You want to analyze the frequency of each tag.
Logs
| where Timestamp > ago(7d)
| mv-expand Tags
| summarize TagCount = count() by tostring(Tags)
| order by TagCount desc
What it does: This query takes the Tags
array, expands it so each tag gets its own row, and then counts the occurrences of each tag. The tostring
function ensures the tag is treated as a string for grouping. This is invaluable for log analysis where tags or labels are stored as arrays.
Why you need it: Without mv-expand
, you'd struggle to process array data without complex parsing or scripting. It simplifies working with JSON-like structures, which are common in telemetry and log data.
2. make-series
: Time Series Analysis Made Simple
The make-series
operator creates a time series from your data, aggregating values over regular time intervals. It's perfect for trend analysis, especially when dealing with time-based data like metrics or logs.
Example: Tracking CPU Usage Over Time
Imagine a Metrics
table with CPU usage data. You want to visualize average CPU usage per hour over the last day.
Metrics
| where Timestamp > ago(1d)
| make-series AvgCPU = avg(CPUUsage) default=0 on Timestamp from ago(1d) to now() step 1h by ServerName
| project ServerName, AvgCPU, Timestamp
What it does: This query creates a time series of average CPU usage (AvgCPU
) for each ServerName
, with data points every hour. The default=0
ensures missing intervals are filled with zeros, and the project
operator formats the output for clarity.
Why you need it: make-series
is essential for creating smooth, continuous time series data for charting or anomaly detection, saving you from manual bucketing or scripting.
3. parse
: Extracting Data from Strings
The parse
operator is a powerful tool for extracting structured data from unstructured or semi-structured strings, such as log messages or URLs. It allows you to define a pattern and map parts of the string to new columns.
Example: Parsing URLs in Web Logs
Suppose you have a WebLogs
table with a RequestUrl
column containing URLs like https://example.com/api/v1/users/123
. You want to extract the endpoint and user ID.
WebLogs
| parse RequestUrl with * "/api/v1/" Endpoint "/" UserId
| summarize RequestCount = count() by Endpoint, UserId
What it does: The parse
operator matches the URL pattern, extracting the Endpoint
(e.g., users
) and UserId
(e.g., 123
) into new columns. The summarize
operator then counts requests by endpoint and user.
Why you need it: Parsing unstructured data like logs or URLs is a common challenge. The parse
operator simplifies this without requiring complex regular expressions or external tools.
4. top-nested
: Hierarchical Aggregation
The top-nested
operator is a hidden gem for hierarchical data analysis. It allows you to drill down into data across multiple dimensions, returning the top N results at each level.
Example: Top Users by Region and Activity
Consider a UserActivity
table with columns Region
, UserId
, and ActivityType
. You want to find the top 2 regions and, within each, the top 3 users by activity count.
UserActivity
| where Timestamp > ago(30d)
| top-nested 2 of Region by RegionCount = count(),
top-nested 3 of UserId by ActivityCount = count()
| project Region, RegionCount, UserId, ActivityCount
What it does: This query first finds the top 2 regions by activity count, then within each region, it finds the top 3 users by activity count. The result is a hierarchical view of the data.
Why you need it: top-nested
is perfect for multi-level analysis, such as identifying top customers, regions, or products in a hierarchical structure, without writing multiple queries.
5. evaluate bag_unpack
: Flattening Dynamic Columns
The evaluate bag_unpack
operator (used with the bag_unpack
plugin) is ideal for flattening dynamic columns (like JSON objects) into individual columns. This is particularly useful when dealing with semi-structured data.
Example: Unpacking JSON Metadata
Assume a Events
table has a Metadata
column containing JSON objects like {"Environment": "Prod", "Version": "1.2.3"}
. You want to extract these fields as columns.
Events
| evaluate bag_unpack(Metadata)
| summarize EventCount = count() by Environment, Version
What it does: The bag_unpack
operator transforms the Metadata
JSON into columns Environment
and Version
. The summarize
operator then groups events by these new columns.
Why you need it: JSON data is common in modern datasets, but analyzing it can be cumbersome. bag_unpack
makes it easy to convert JSON properties into queryable columns without manual parsing.
TLDR
These five KQL operators—mv-expand
, make-series
, parse
, top-nested
, and evaluate bag_unpack
—are powerful tools that can simplify complex data analysis tasks. Whether you're unraveling arrays, building time series, parsing strings, analyzing hierarchies, or flattening JSON, these operators can save you time and effort. Next time you're writing a KQL query, consider whether one of these hidden gems can make your life easier.
Try them out in your next KQL query and unlock new insights from your data!