Threat Hunting with Python and Jupyter Notebook.

Jan 8, 2023

In this post, we will delve into the powerful combination of Python and Jupyter notebooks for detecting the use of PsExec, a utility that can be leveraged by attackers to run processes on remote systems. By following along with this tutorial, you’ll learn how to use these tools to uncover hidden threats lurking within your network, and discover the numerous benefits of building out a threat hunting playbook in Jupyter notebooks.

So, why is it worth the effort to build a threat hunting playbook in Jupyter notebooks? For starters, Jupyter notebooks make collaboration easy: team members can add, edit or remove content seamlessly. Notebooks are also well suited to documenting an idea, hypothesis or thought process in a way that others can follow. Finally, Jupyter can integrate other tools and libraries, which makes it a viable platform for a wide variety of tasks.

Join me as we explore how to use Python and Jupyter notebooks for effective threat hunting and discover the many advantages of creating a playbook in this powerful platform.

Threat hunting is a critical aspect of cybersecurity because it allows organizations to proactively search for and detect hidden threats within their networks. One tool attackers frequently use is PsExec, a command-line utility that lets users run processes on remote systems. In this post, we'll explore an incident in which PsExec was used to extract LSA secrets from a compromised host. You can follow along by visiting my GitHub repo and downloading the materials provided here.

To make the most of our event log data, we first need to set up the objects required to manipulate and analyze the dataset.

This can be achieved through the following steps:

1. Import the necessary libraries: pandas, json and re.

2. Read the log data from the provided JSON file with the pandas read_json() function and store it in a dataframe named eventlog_df.

The following code snippet is what I used to set up the dataset.

import pandas as pd
import json
import re

# The log file is newline-delimited JSON (one event per line), hence lines=True.
eventlog_df = pd.read_json('cmd_psexec_lsa_secrets_dump_2020-10-1903305471.json', lines=True)

Before analyzing anything, it's worth previewing the dataframe to understand its structure and content and to confirm the data loaded correctly. The head() method is a quick way to do this: it returns the first few rows (five by default), giving us a feel for the data before we dive into more detailed analysis.

For example, we might use the following code to preview the first five rows of our eventlog_df dataframe:
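Here's a minimal sketch of that preview. Because the original JSON file isn't bundled here, the snippet builds a tiny hypothetical stand-in for eventlog_df; in the notebook you would call head() on the dataframe loaded above.

```python
import pandas as pd

# Hypothetical stand-in for eventlog_df; in the notebook this is the
# dataframe loaded with pd.read_json(..., lines=True).
eventlog_df = pd.DataFrame(
    {
        "EventID": [4688, 7045, 4688, 4624, 4688, 4624],
        "Channel": ["Security", "System", "Security", "Security", "Security", "Security"],
    }
)

# head() returns the first n rows; n defaults to 5.
preview = eventlog_df.head()
print(preview)
```

In Jupyter, leaving eventlog_df.head() as the last line of a cell renders it as a table without needing print().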

I also like to loop over all the column names and print them out for reference.
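A sketch of that loop follows. The column names are assumptions chosen to resemble a Windows event log export, not the dataset's actual schema; in the notebook you'd iterate over the real eventlog_df.columns.

```python
import pandas as pd

# Hypothetical empty dataframe with assumed column names, standing in for eventlog_df.
eventlog_df = pd.DataFrame(
    columns=["@timestamp", "Hostname", "EventID", "CommandLine", "ServiceName", "Message"]
)

# Print each column name for reference and keep a list for later lookups.
column_names = []
for column in eventlog_df.columns:
    print(column)
    column_names.append(column)
```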

Hunting For PsExec

One way to detect the use of PsExec is to look for evidence of its execution artifacts, such as prefetch files, amcache entries, and shimcache entries. These artifacts can often provide clues about the programs and processes that have been run or installed on a system.

In addition, PsExec itself often leaves trace evidence in event logs. For example, when PsExec installs itself as a service on a destination host, the service is named PSEXESVC and its service file name is recorded as PSEXESVC.exe. The Windows System event log records the service installation as Event ID 7045.

Another helpful method, if you have access to an EDR or Sysmon, is to search for command lines that contain the string "psexec". For example, PsExec shows a EULA prompt the first time it runs, so threat actors commonly pass the -accepteula flag to suppress it. The resulting command line may look similar to:

PsExec.exe -accepteula "whatever command"

Additionally, a registry value named EulaAccepted is set to 1 under the HKCU\Software\Sysinternals\PsExec key to record that the EULA has been accepted.
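If your dataset includes Sysmon registry telemetry, this artifact can be hunted as well. The sketch below assumes Sysmon Event ID 13 (registry value set) events with the modified path in a TargetObject column, and that your Sysmon configuration actually records this key; the sample rows are hypothetical. Sysmon typically logs HKCU paths as HKU\<SID>, so the pattern matches on the key suffix rather than the HKCU prefix.

```python
import pandas as pd

# Hypothetical sample rows; assumes Sysmon Event ID 13 records with a TargetObject column.
eventlog_df = pd.DataFrame(
    {
        "EventID": [13, 13, 4688],
        "TargetObject": [
            r"HKU\S-1-5-21-1111\Software\Sysinternals\PsExec\EulaAccepted",
            r"HKU\S-1-5-21-1111\Software\Microsoft\Windows\CurrentVersion\Run\Updater",
            None,
        ],
    }
)

# Each \\ in the regex matches one literal backslash; $ anchors the match to the end.
eula_pattern = r"\\Software\\Sysinternals\\PsExec\\EulaAccepted$"

# na=False treats rows without a TargetObject as non-matches.
eula_hits = eventlog_df[
    (eventlog_df["EventID"] == 13)
    & eventlog_df["TargetObject"].str.contains(eula_pattern, case=False, na=False, regex=True)
]
print(eula_hits)
```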

Here's a code snippet that defines a regular expression matching the string "psexec", iterates over the CommandLine column, and prints any rows that match. It's a useful way to identify PsExec usage if your organization records process command lines in event logs or via Sysmon. If you have an EDR, a similar search should reveal the same activity.
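The original snippet isn't reproduced here, so this is a sketch of the same idea under a few assumptions: the column is named CommandLine, events without a command line hold None or NaN, and the sample rows (including the host name WORKSTATION5) are hypothetical.

```python
import re

import pandas as pd

# Hypothetical sample rows standing in for eventlog_df.
eventlog_df = pd.DataFrame(
    {
        "EventID": [4688, 4688, 1],
        "CommandLine": [
            r"PsExec.exe -accepteula -s \\WORKSTATION5 cmd.exe /c whoami",
            "notepad.exe report.txt",
            None,
        ],
    }
)

# Case-insensitive, so PsExec, PSEXEC and psexec all match.
psexec_pattern = re.compile(r"psexec", re.IGNORECASE)

matches = []
for index, command_line in eventlog_df["CommandLine"].items():
    # Skip events that have no command line (None or NaN).
    if isinstance(command_line, str) and psexec_pattern.search(command_line):
        print(index, command_line)
        matches.append(index)
```

A vectorized alternative that skips the explicit loop is eventlog_df[eventlog_df["CommandLine"].str.contains("psexec", case=False, na=False)], which returns the matching rows as a dataframe.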

Nice! Here we see evidence of the threat actor issuing a PsExec command with the -s argument (run with SYSTEM privileges). They use the reg save command to save a copy of the secrets registry key to the user's local profile. The attacker will then attempt to decrypt any secrets contained in the saved registry file offline.

Hunting for service install events with a service file name that contains PSEXESVC

As mentioned earlier, when PsExec installs itself as a service on a host, the service file name is recorded as PSEXESVC.exe, and the Windows System event log records the service installation as Event ID 7045.

Knowing this, we can write a simple query to search for this behavior in our eventlog_df dataframe. Here's an example of how this can be accomplished:
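A sketch of such a query, following the description of selecting Event ID 7045 rows whose ServiceName is PSEXESVC. The ImagePath column name and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample rows; ImagePath is an assumed column name for the service file name.
eventlog_df = pd.DataFrame(
    {
        "EventID": [7045, 7045, 4688],
        "ServiceName": ["PSEXESVC", "UpdaterSvc", None],
        "ImagePath": [r"%SystemRoot%\PSEXESVC.exe", r"C:\Program Files\Updater\svc.exe", None],
    }
)

# Keep only service-install events (7045) for the PsExec service.
psexec_services = eventlog_df[
    (eventlog_df["EventID"] == 7045) & (eventlog_df["ServiceName"] == "PSEXESVC")
]

# Print each match for review.
for _, row in psexec_services.iterrows():
    print(row["EventID"], row["ServiceName"], row["ImagePath"])
```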

This code is essentially selecting rows from the eventlog_df dataframe where the value of the “EventID” column is 7045 and the value of the “ServiceName” column is “PSEXESVC”, and then iterating through those rows to print matches.

Stacking

Let's talk about stacking.

Stacking is a less targeted approach, but it's another way to help identify unexpected or uncommon behavior: you count how often each value of a field occurs and review the rarest values first. In this exercise, I'll be stacking Event IDs, but stacking shouldn't be limited to that. Depending on the hunt, it can be useful to stack Amcache, Prefetch or Shimcache entries, IP addresses, user agents and so on.

It's important to note that while stacking is a great way to identify malicious activity, finding the needle in the haystack is rarely as easy in the real world as it is in this scenario. This dataset is much smaller and only recorded events around the time the simulated attack occurred. In a real environment, you'll need to pivot and correlate across other data sources and filter out numerous false positives.

By stacking the Event IDs contained in the dataframe, we can identify patterns of behavior that may indicate malicious activity. In this case, we'll focus on events that occur fewer than 10 times, since rarer events are more likely to be worth further investigation. Adding these events to a timeline makes the data easier to review and analyze and helps build a clearer understanding of any potential threats. Here's how I accomplished this:

First, I created a query that stacks all the Event IDs and sorts them from least to most frequent.
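A minimal sketch of that stacking query, assuming the Event ID column is named EventID; the sample data is hypothetical. value_counts() does the counting, and ascending=True puts the rarest Event IDs at the top.

```python
import pandas as pd

# Hypothetical sample standing in for eventlog_df.
eventlog_df = pd.DataFrame({"EventID": [4688, 4688, 4688, 4624, 4624, 7045, 5145]})

# Count occurrences of each EventID, rarest first.
eventid_stack = eventlog_df["EventID"].value_counts(ascending=True)
print(eventid_stack)
```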

Next, I wrote a query to identify all of the Event IDs that occurred fewer than 10 times and stored those events in a new dataframe called stacked_df. Then I iterated through each row in stacked_df and printed the Event ID and message for each row.
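A sketch of the filtering step, using hypothetical sample rows and assuming EventID and Message columns. It computes the counts, keeps the Event IDs below the threshold of 10, and uses isin() to pull every matching event into stacked_df.

```python
import pandas as pd

# Hypothetical sample: one common Event ID (12 occurrences) and two rare ones.
eventlog_df = pd.DataFrame(
    {
        "EventID": [4688] * 12 + [7045, 4672],
        "Message": ["A new process has been created."] * 12
        + ["A service was installed in the system.", "Special privileges assigned to new logon."],
    }
)

# Event IDs that occur fewer than 10 times.
counts = eventlog_df["EventID"].value_counts()
rare_ids = counts[counts < 10].index

# Keep every event whose EventID is rare.
stacked_df = eventlog_df[eventlog_df["EventID"].isin(rare_ids)]

for _, row in stacked_df.iterrows():
    print(row["EventID"], row["Message"])
```

eventlog_df.groupby("EventID").filter(lambda g: len(g) < 10) is an equivalent one-liner for building stacked_df.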

Lastly, I wrote some code to get the events from stacked_df into a timeline. A timeline makes the data easier to review and collects the evidence in a readable format that's useful when building a narrative.
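One way to build that timeline, assuming stacked_df has an @timestamp column holding ISO 8601 strings and a Hostname column (both assumptions, as are the sample rows): parse the timestamps, sort chronologically, and keep the columns a reviewer needs.

```python
import pandas as pd

# Hypothetical stand-in for stacked_df.
stacked_df = pd.DataFrame(
    {
        "@timestamp": ["2023-01-01T00:00:05Z", "2023-01-01T00:00:02Z"],
        "Hostname": ["WORKSTATION5", "WORKSTATION5"],
        "EventID": [7045, 4672],
        "Message": ["A service was installed in the system.", "Special privileges assigned to new logon."],
    }
)

# Parse timestamps so sorting is chronological, then keep the review columns.
timeline_df = stacked_df.assign(timestamp=pd.to_datetime(stacked_df["@timestamp"], utc=True))
timeline_df = (
    timeline_df.sort_values("timestamp")[["timestamp", "Hostname", "EventID", "Message"]]
    .reset_index(drop=True)
)

# With no path, to_csv() returns the CSV text; pass a filename to write it to disk.
csv_text = timeline_df.to_csv(index=False)
print(csv_text)
```

Writing the sorted events to CSV keeps them easy to paste or merge into a master timeline.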

Here's an example of the output. While doing a hunt or an investigation, you would spend some time going over all of these events, forming the narrative, and moving relevant evidence to a master timeline. This post is not about log analysis, so I won't walk through every relevant piece of information, but if you're interested in viewing the output, you can find the timeline along with the workbook here.

This is just one example of how to create a threat hunting playbook with Jupyter notebooks. Maybe you work in an industry that a specific threat actor has been targeting lately, and one of that actor's TTPs is stealing LSA secrets via PsExec. This would be a great way to document your hypothesis and the steps you took to search your environment for that particular activity. The ability to easily collaborate and document your process makes Jupyter notebooks an ideal tool for creating a comprehensive threat hunting playbook, whether you are responding to a specific threat or conducting a more general investigation. If nothing else, it's just fun to use Python for investigations. Happy hunting!

Written by Mike Dockry