KQL Treasure Hunt: Solve a Data Mystery with Queries
Your mission: use KQL to analyze the station’s data logs, uncover clues, and solve the mystery before it’s too late.
Welcome, detective, to the Starlight Outpost, a bustling space station orbiting a distant planet in the year 2247. A mysterious signal has disrupted the station’s operations, and the crew is in chaos. Your mission: use KQL (Kusto Query Language) to analyze the station’s data logs, uncover clues, and solve the mystery before it’s too late. Each query you craft will reveal a piece of the puzzle, teaching you KQL concepts like filtering, joins, and aggregations along the way. Ready to dive into the data? Let’s begin!
Download the dataset for this article from https://github.com/rod-trent/kql-treasure-hunt
The Setup: The Starlight Outpost Dataset
The station’s logs are stored in three KQL tables:
CrewLogs: Tracks crew activities.
Columns: Timestamp (datetime), CrewId (string), Location (string), Action (string).
SystemAlerts: Records system warnings and errors.
Columns: Timestamp (datetime), AlertId (string), System (string), Severity (int).
CommSignals: Logs communication signals.
Columns: Timestamp (datetime), SignalId (string), Source (string), Message (string).
Here’s a sample of the data (simplified for brevity):
CrewLogs:
Timestamp | CrewId | Location | Action
---------------------|--------|----------------|--------
2247-06-29T10:00:00Z | C001 | Bridge | Login
2247-06-29T10:05:00Z | C002 | Engineering | Maintenance
2247-06-29T10:10:00Z | C001 | CommRoom | Access
SystemAlerts:
Timestamp | AlertId | System | Severity
---------------------|---------|------------|---------
2247-06-29T10:02:00Z | A001 | CommSystem | 3
2247-06-29T10:08:00Z | A002 | PowerGrid | 1
2247-06-29T10:12:00Z | A003 | CommSystem | 5
CommSignals:
Timestamp | SignalId | Source | Message
---------------------|----------|------------|--------
2247-06-29T10:11:00Z | S001 | Unknown | "Initiate Phase 1"
2247-06-29T10:13:00Z | S002 | Unknown | "Access granted"
Your goal is to use KQL to analyze these logs, find the source of the mysterious signal, and uncover the hidden plot. Let’s start the investigation!
Clue #1: Suspicious Activity in the CommRoom
The station’s AI reports that the mysterious signal began around 10:00 AM on June 29, 2247, and the CommRoom is a likely source. Let’s check the CrewLogs to see who was in the CommRoom around that time.
KQL Concept: Filtering with where
The where operator filters rows based on a condition. Let’s query CrewLogs to find entries where the Location is "CommRoom" and the Timestamp is around 10:00 AM.
Query:
CrewLogs
| where Location == "CommRoom"
| where Timestamp >= datetime(2247-06-29T10:00:00Z) and Timestamp <= datetime(2247-06-29T10:15:00Z)
Result (example):
Timestamp | CrewId | Location | Action
---------------------|--------|-----------|-------
2247-06-29T10:10:00Z | C001 | CommRoom | Access
Discovery: Crew member C001 accessed the CommRoom at 10:10 AM. That’s suspicious timing! Let’s keep C001 in mind and dig deeper.
KQL Tip: The where operator is case-sensitive for string comparisons (use == for exact matches). You can chain multiple where clauses to refine your filter, as shown above.
Clue #2: Linking Crew Actions to System Alerts
The station’s systems logged several alerts around the same time. Could C001’s actions in the CommRoom have triggered something? Let’s join the CrewLogs with SystemAlerts to see if any alerts occurred close to C001’s CommRoom access.
KQL Concept: Joining Tables with join
The join operator combines two tables based on a condition, like matching timestamps. We’ll use a kind=inner join to find records where the timestamps are within 5 minutes of each other.
Query:
CrewLogs
| where CrewId == "C001" and Location == "CommRoom"
| join kind=inner (
SystemAlerts
| where System == "CommSystem"
) on $left.Timestamp == $right.Timestamp
| project CrewTimestamp=Timestamp, CrewAction=Action, AlertId, Severity
Result (example):
CrewTimestamp | CrewAction | AlertId | Severity
---------------------|------------|---------|---------
2247-06-29T10:10:00Z | Access | A003 | 5
Discovery: A high-severity (Severity=5) alert (A003) in the CommSystem occurred at the exact time C001 accessed the CommRoom. This can’t be a coincidence! The alert suggests a system breach.
KQL Tip: The join operator requires a common column (here, Timestamp). Use project to select and rename columns for clarity. If timestamps don’t match exactly, you can use join with a time window (e.g., | join ... on $left.Timestamp ~ $right.Timestamp with a bin function).
Clue #3: Decoding the Mysterious Signal
The CommSystem alerts point to unauthorized signals. Let’s check the CommSignals table to see what messages were sent around 10:10 AM, when C001 was in the CommRoom and the alert fired.
KQL Concept: Aggregating with summarize
The summarize operator groups data and computes aggregates, like counting or collecting unique values. Let’s group CommSignals by Source to see how many signals came from "Unknown" sources around the critical time.
Query:
CommSignals
| where Timestamp >= datetime(2247-06-29T10:00:00Z) and Timestamp <= datetime(2247-06-29T10:15:00Z)
| summarize SignalCount=count(), Messages=make_list(Message) by Source
Result (example):
Source | SignalCount | Messages
--------|-------------|-------------------------------
Unknown | 2 | ["Initiate Phase 1", "Access granted"]
Discovery: Two signals from an "Unknown" source were sent, with cryptic messages: "Initiate Phase 1" and "Access granted." This suggests a coordinated breach, possibly tied to C001’s actions.
KQL Tip: summarize is powerful for grouping data. Use functions like count(), sum(), or make_list() to aggregate values. make_list() creates an array of values, perfect for collecting messages like we did here.
Clue #4: Visualizing the Timeline
To piece it all together, let’s create a timeline of crew actions, alerts, and signals to confirm the sequence of events. This will help us visualize the breach.
KQL Concept: Visualizing with render
The render operator generates charts, like timecharts, to visualize data trends. Let’s combine all three tables and plot events over time.
Query:
CrewLogs
| where CrewId == "C001"
| project Timestamp, EventType="CrewAction", Description=Action
| union (
SystemAlerts
| where System == "CommSystem"
| project Timestamp, EventType="SystemAlert", Description=tostring(Severity)
)
| union (
CommSignals
| where Source == "Unknown"
| project Timestamp, EventType="Signal", Description=Message
)
| where Timestamp >= datetime(2247-06-29T10:00:00Z) and Timestamp <= datetime(2247-06-29T10:15:00Z)
| render timechart
Result: A timechart showing:
10:10 AM: C001 accesses CommRoom.
10:10 AM: High-severity CommSystem alert (A003).
10:11 AM: Signal "Initiate Phase 1."
10:13 AM: Signal "Access granted."
Discovery: The timeline confirms the sequence: C001’s access triggered a CommSystem alert, followed by unauthorized signals. This points to C001 initiating a breach, possibly granting access to an external entity.
KQL Tip: Use union to combine multiple tables or queries. The render operator supports visualizations like timechart, barchart, or piechart. Specify columns with by or with for custom visuals.
The Final Reveal: Uncovering the Plot
With the evidence in hand, let’s summarize the findings:
Crew member C001 accessed the CommRoom at 10:10 AM.
A high-severity CommSystem alert fired simultaneously, indicating a breach.
Unknown signals with messages "Initiate Phase 1" and "Access granted" were sent shortly after, suggesting an external entity gained access.
The mystery is solved: C001 is likely a rogue operative who accessed the CommSystem to transmit a signal, granting unauthorized access to an unknown entity. The station’s security team must detain C001 and trace the "Unknown" source to prevent further breaches.
Your Next Mission
Congratulations, detective! You’ve cracked the case using KQL, mastering:
Filtering with where to isolate relevant data.
Joining tables with join to connect related events.
Aggregating with summarize to summarize patterns.
Visualizing with render to see the big picture.
Want to keep honing your KQL skills? Try these challenges:
Query CrewLogs to find other crew members in the CommRoom that day (where Location == "CommRoom").
Use join to correlate SystemAlerts with CommSignals by timestamp to find more suspicious activity.
Create a summarize query to count alerts by System and visualize with render barchart.
Explore the Starlight Outpost dataset further, or create your own mystery dataset to solve with KQL. The galaxy is full of data waiting to be uncovered!
Note: Want to try these queries yourself? Set up a free Azure Data Explorer instance, load the sample data, and start querying. Check out Microsoft’s KQL documentation for more tips.
What did you think of this KQL adventure? Share your own data mysteries or KQL queries in the comments below!