Back in 2022, I gave a Suricon presentation titled Jupyter Playbooks for Suricata. This led to a blog series, also available as a notebook, meant to expand on the topic and provide more context for those interested in notebooks who lack the experience to get started.
This post can be thought of as a continuation of that blog series, though it takes a slightly new direction. Whereas the original Jupyter series focused on core Suricata, we now look toward the Stamus Networks software platforms. Note that I do not mean SELKS, our turnkey Suricata-based open-source IDS/NSM threat hunting system, nor Stamus Security Platform, our commercial network-based threat detection and response (NDR) system.
Instead, the focus of this article is on a software component that is shared between the two solutions. Our goal is to extend the functionality of these products beyond what is already implemented in the user interface. Both Stamus Security Platform and SELKS (via Stamus Community Edition, or Stamus CE) share a Django web application that implements our management and backend functionality, and that application was chosen for this task. For simplicity, we’ll refer to it by its historical name, Scirius, in this article.
Interfacing our Jupyter data connectors to open-source Stamus CE means we can contribute to the Suricata community while also enhancing the product for our customers. Unfortunately, Suricata is still seen as nothing more than a rule-based IDS engine, whereas in reality it produces a ton of additional useful NSM events. Our goal is to expose this data to users in a meaningful manner, and Jupyter notebooks are the tool we chose for this task.
So far we've talked about processing simple EVE log files with pandas and Jupyter notebooks. However, this approach does not scale. Pandas is designed to be simple to use, not for ingesting and transforming vast amounts of data. All processing is done in memory, and pandas is not frugal with it. Suricata, on the other hand, can produce vast amounts of NSM events. Does that mean pandas is not fit for processing Suricata EVE JSON logs at scale? No, not at all!
Pandas is an amazing tool for interacting with data and for gaining quick insights. The problem is filtering and transforming large datasets. Scirius is already able to do the former by relaying queries to the backend Elasticsearch database. Our commercial Stamus Security Platform also has a powerful streaming pipeline to enhance core Suricata events, which addresses the data preparation. But even without it, the core Suricata EVE logs have so much to offer that most users never dig into.
The REST API
A little-known feature of our products is the ability to query the REST API. REST, which stands for REpresentational State Transfer, is a standard paradigm for building web applications whereby the backend server exposes data and functionality to frontend components through API requests. In our case, most frontend components simply fetch and display data from backend URLs. The important part is that we have already implemented a number of API endpoints for fetching useful data. It's also fairly simple to add new endpoints.
But before we can discuss newly added endpoints or even how anyone could contribute to adding them, we must first explore how API queries work. In short, anyone with a proper API token is able to issue authenticated requests to endpoints. To generate that token, we must first navigate to the “Account Settings” section which is available at the top right corner of the title menu.
Then, on the left-hand side, we choose Edit Token.
Finally, the token will be visible in the Token field. If empty, then simply click the “Regenerate” button to create a new one. Then, copy the value to a keychain or password safe of your choice.
Once we have our token, we can start issuing queries to the Scirius REST API (it’s still called “Scirius” in the code). We can even fetch data from the command line! Simply point your web client to the appliance IP or fully qualified domain name, with the API endpoint in the URI path. The API token must be passed in the Authorization header.
curl -XGET "https://$SELKS_OR_SSP/rest/rules/es/alerts_count/" \
-H "Authorization: Token $TOKEN" \
-H 'Content-Type: application/json'
This very simple endpoint returns the number of alerts that match within the given time period. If left undefined, it will default to 30 days in the past to now.
We can pull data directly from any SELKS or SSP instance, directly from the command line. That's pretty cool! But let's look at something more powerful.
Suricata Analytics Project
The most difficult aspect of working with notebooks is data ingestion. Most notebooks become unusable over time since they depend on CSV or JSON files for input. Even worse, those files might be preprocessed, so the notebook assumes the existence of fields that are not actually present in raw data. Existing notebooks often end up serving only as references when writing new ones, simply because they cannot be run without the exact data they were originally developed with. This clearly diminishes their usefulness. By using Scirius as our data ingestion point, we're able to mitigate that problem. We can make assumptions about what data is present and how it's formatted without shipping it with notebooks.
This was one of the critical factors that motivated us to start the Suricata Analytics project. If the REST API is the server component, then Suricata Analytics notebooks are the clients. Those notebooks use Python to interact with Scirius REST API. The next section will explain how it works.
Scirius REST API with Python
First, we need to point our notebooks to the right host. We also need to store the authentication token along with any parameters that might alter the connection. Hard-coding variables like this into each notebook would severely diminish their usability. To make matters worse, committing and pushing API tokens is a security breach. To keep things simple, we decided to use .env files. In fact, our SELKS on Docker setup uses the same method, so it was only natural to use it for notebooks as well. It can be set up as described in the Suricata Analytics main README file.
SCIRIUS_HOST=<IP or Hostname>
For now we handle a very limited set of options: the token value itself, the server IP or hostname, and an option to disable TLS verification when using self-signed certificates. Self-signed certificates are the default for most lab setups and out-of-the-box SELKS installations.
Python has a “dotenv” package for importing the variables in this file into a Python session. Once imported, dotenv_values lets us use the variables in the environment file like any other Python dictionary. Note that the Suricata Analytics project includes a reference Docker container which mounts the environment file from the project root directory into the home folder of the container. The subsequent example is written with this in mind.
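As a rough sketch of what can happen next, the helper below takes the dictionary that dotenv_values returns and turns it into typed connection settings. Only SCIRIUS_HOST appears in the listing above; the key names SCIRIUS_TOKEN and SCIRIUS_TLS_VERIFY, the accepted "no"-style values, and the `~/.env` location in the usage comment are assumptions made for illustration, so check the project README for the actual names.

```python
from typing import Mapping, NamedTuple, Optional


class SciriusSettings(NamedTuple):
    host: str
    token: str
    tls_verify: bool


def connection_settings(env: Mapping[str, Optional[str]]) -> SciriusSettings:
    """Turn a dotenv-style mapping into typed connection settings.

    SCIRIUS_TOKEN and SCIRIUS_TLS_VERIFY are assumed key names.
    """
    host = env.get("SCIRIUS_HOST")
    token = env.get("SCIRIUS_TOKEN")
    if not host or not token:
        raise ValueError("SCIRIUS_HOST and SCIRIUS_TOKEN must both be set")
    # Only an explicit "no"-style value switches certificate verification off.
    raw = (env.get("SCIRIUS_TLS_VERIFY") or "yes").strip().lower()
    tls_verify = raw not in {"no", "false", "0", "off"}
    return SciriusSettings(host.strip(), token.strip(), tls_verify)


# In a notebook (requires the python-dotenv package):
#   from pathlib import Path
#   from dotenv import dotenv_values
#   settings = connection_settings(dotenv_values(Path.home() / ".env"))
```

Keeping the parsing in one small function means a missing token fails early with a clear message instead of surfacing later as an HTTP authentication error.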
We can use the Python requests package to interact with Scirius REST API. But before we do, we need to set up some parameters. Like before, the API token is passed with the Authorization header. Though this time it's more structured. We can also use the environment dictionary to dynamically build the URL and authentication.
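A minimal sketch of that setup might look like the following. The header format and the `/rest/rules/es/alerts_count/` path mirror the curl example shown earlier; the helper names `auth_headers` and `endpoint_url` are our own invention for this illustration.

```python
from typing import Dict


def auth_headers(token: str) -> Dict[str, str]:
    """Headers for token authentication, mirroring the curl example."""
    return {
        "Authorization": f"Token {token}",
        "Content-Type": "application/json",
    }


def endpoint_url(host: str, endpoint: str) -> str:
    """Build the full URL of an endpoint below the /rest/ prefix."""
    return f"https://{host.strip('/')}/rest/{endpoint.strip('/')}/"


# With the requests package the pieces come together like this:
#   import requests
#   response = requests.get(
#       endpoint_url(host, "rules/es/alerts_count"),
#       headers=auth_headers(token),
#       verify=tls_verify,  # False only for self-signed lab certificates
#   )
```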
Each API endpoint usually defines its own parameters, but some are common to most. The important ones are:
- “qfilter” for passing a KQL-style query to the endpoint;
- “from_date”, a Unix epoch timestamp defining the point in time from which we want to retrieve events;
- “to_date”, a Unix epoch timestamp defining the point in time up to which data should be retrieved;
- “page_size”, the number of documents to fetch.
It is important to note that we can pass any Kibana style query to the endpoint using the “qfilter” parameter, essentially allowing us to fetch any data we want. We can also modify the query period. The default is to fetch data from the last 30 days. This is something to be careful with since many queries might match more documents than what's returned by Elasticsearch. A wide query over the past 30 days with default page size would return a tiny sample of overall data, and would thus not be very useful.
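To make the common parameters concrete, here is a small sketch that narrows the query window to the last hour instead of relying on the 30 day default. The function name `query_params` is hypothetical, and the sketch assumes the epoch values are expressed in milliseconds; adjust the multiplier if your instance expects seconds.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def query_params(qfilter: str,
                 hours_back: float = 1.0,
                 page_size: int = 1000,
                 now: Optional[datetime] = None) -> Dict[str, Any]:
    """Build the common query parameters for a relative time window."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(hours=hours_back)
    return {
        "qfilter": qfilter,
        # Assumption: the API expects Unix epoch values in milliseconds.
        "from_date": int(start.timestamp() * 1000),
        "to_date": int(end.timestamp() * 1000),
        "page_size": page_size,
    }


params = query_params("event_type: http", hours_back=1, page_size=1000)
```

A narrow window combined with a specific filter keeps the number of matching documents close to the page size, so the result is less likely to be a small sample of the data.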
Ideally, we would fetch something specific. For example, we might be interested in HTTP events where the URI contains a command injection attempt.
Most data can simply be fetched with HTTP GET requests. A very powerful API endpoint to get started with is events_tail which allows the user to query raw EVE events.
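The sketch below shows one way such a request could be written. It assumes that events_tail lives under the same `/rest/rules/es/` prefix as the alerts_count endpoint from the curl example, and the wildcard filter is only one illustrative way to look for command injection. The function takes a `requests.Session` (or anything with a compatible `get` method) as an argument, which is a design choice of this sketch rather than something the project prescribes.

```python
from typing import Any, Dict, List

# Illustrative filter: HTTP events whose URL mentions wget.
QFILTER = "event_type: http AND http.url: *wget*"


def fetch_events(session, host: str, token: str, params: Dict[str, Any],
                 verify: bool = True, timeout: float = 30.0) -> List[dict]:
    """Fetch raw EVE events and return the list under the "results" key.

    `session` is a requests.Session or any object with a compatible get().
    """
    # Assumption: events_tail shares the /rest/rules/es/ prefix.
    url = f"https://{host}/rest/rules/es/events_tail/"
    response = session.get(
        url,
        headers={"Authorization": f"Token {token}"},
        params=params,
        verify=verify,
        timeout=timeout,
    )
    response.raise_for_status()
    return response.json().get("results", [])


# In a notebook:
#   import requests
#   events = fetch_events(requests.Session(), host, token,
#                         {"qfilter": QFILTER, "page_size": 1000})
```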
Once the data is retrieved, we can simply load the values from the “results” JSON key and pass them to Pandas json_normalize helper to build a flat dataframe of EVE events. Once done, we can interact with the data as described in previous posts.
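As a quick illustration of the flattening step, the snippet below runs json_normalize on a tiny response-shaped payload. The two records are invented placeholders, not real API output; only the “results” key and the EVE-style nesting follow the description above.

```python
import pandas as pd

# Invented placeholder payload shaped like an API response.
payload = {
    "results": [
        {
            "timestamp": "2023-01-01T12:00:00+0000",
            "event_type": "http",
            "src_ip": "192.0.2.10",
            "http": {
                "hostname": "shop.example.org",
                "url": "/index.php?cmd=wget",
                "http_user_agent": "curl/7.88.1",
            },
        },
        {
            "timestamp": "2023-01-01T12:00:05+0000",
            "event_type": "http",
            "src_ip": "192.0.2.11",
            "http": {
                "hostname": "blog.example.org",
                "url": "/feed",
                "http_user_agent": "Mozilla/5.0",
            },
        },
    ]
}

# Nested objects become dot-separated columns such as "http.url".
df = pd.json_normalize(payload["results"])
print(sorted(df.columns))
```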
We can simply measure how many events were fetched.
Or we could subset the data frame for a quick glance.
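Both steps take a line each. The sketch below uses a small hand-made data frame with already flattened column names as a stand-in for the fetched events; the rows are invented placeholders.

```python
import pandas as pd

# Invented placeholder rows standing in for flattened EVE events.
df = pd.DataFrame([
    {"timestamp": "2023-01-01T12:00:00+0000", "src_ip": "192.0.2.10",
     "http.hostname": "shop.example.org", "http.url": "/index.php?cmd=wget"},
    {"timestamp": "2023-01-01T12:00:05+0000", "src_ip": "192.0.2.11",
     "http.hostname": "blog.example.org", "http.url": "/feed"},
    {"timestamp": "2023-01-01T12:00:09+0000", "src_ip": "192.0.2.10",
     "http.hostname": "shop.example.org", "http.url": "/login"},
])

# How many events were fetched?
event_count = len(df)

# Keep only the columns of interest and peek at the first rows.
columns = ["timestamp", "src_ip", "http.hostname", "http.url"]
subset = df[columns].head(5)
print(event_count)
print(subset)
```

Comparing `event_count` with the requested page size is a cheap sanity check: if the two are equal, the query probably matched more documents than were returned.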
Naturally, a more useful interaction would be some kind of aggregate report. For example, we could see which URLs were accessed and which user agents were used for each HTTP host.
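One possible shape for such a report is sketched below using pandas named aggregation. The rows are invented placeholders, and the helper `unique_sorted` is our own; the column names follow the flattened EVE field names for HTTP events.

```python
import pandas as pd

# Invented placeholder rows standing in for flattened HTTP events.
df = pd.DataFrame([
    {"http.hostname": "shop.example.org", "http.url": "/index.php?cmd=wget",
     "http.http_user_agent": "curl/7.88.1"},
    {"http.hostname": "shop.example.org", "http.url": "/login",
     "http.http_user_agent": "Mozilla/5.0"},
    {"http.hostname": "shop.example.org", "http.url": "/login",
     "http.http_user_agent": "curl/7.88.1"},
    {"http.hostname": "blog.example.org", "http.url": "/feed",
     "http.http_user_agent": "Mozilla/5.0"},
])


def unique_sorted(values: pd.Series) -> str:
    """Join the distinct non-null values of a column into one string."""
    return ", ".join(sorted(values.dropna().unique()))


# One row per HTTP host: event count, distinct URLs, distinct user agents.
report = df.groupby("http.hostname").agg(
    events=("http.url", "size"),
    unique_urls=("http.url", "nunique"),
    urls=("http.url", unique_sorted),
    user_agents=("http.http_user_agent", unique_sorted),
)
print(report)
```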
This is really powerful but involves some boilerplate. In the next section we'll see how Suricata Analytics improves on this.
Suricata Analytics Data Connector
Boilerplate refers to code that repeats in many parts of the code with little variation, but it must be there to set up some other functionality. In our case, users would need to import the API token and Scirius server address in every notebook using dotenv. If we ever changed how they are stored, then every notebook would break. Secondly, we would need to import requests and set up HTTP query parameters all the time.
Notebooks can become really complex, especially when weighed down with code that is not actually relevant to exploring data. Having discarded many notebooks for that reason, we decided to write a Python data connector to move this complexity from notebooks to importable libraries. This connector is also part of the Suricata Analytics project and can be installed with pip install from the project root directory. The idea was very much inspired by MSTIC Jupyter and Python Security Tools, developed by the Microsoft Threat Intelligence team (MSTIC). Like our project, it provides data connectors to quickly import security data into Jupyter notebooks and analyze it.
Once installed, the connector can be imported into any notebook.
Then we create a new connector object. The environment file is automatically detected on object initialization, though the user can override the parameters with object arguments as well.
The object maintains a persistent state so the user only needs to set certain parameters once. The page size parameter is one that is easily overlooked. Users might execute one query with a modified page size yet forget to pass that argument in the next. That could skew the results since the second data fetch might be partial, due to more documents matching the query than would be returned by Elasticsearch.
The object allows the user to simply set the parameter once. All subsequent queries would then use the value until it's once again updated.
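The idea of persistent state can be illustrated with a deliberately tiny stand-in. `SketchConnector`, its methods, and its default page size are hypothetical and exist only for this illustration; they are not the Suricata Analytics connector API.

```python
from typing import Any, Dict


class SketchConnector:
    """Hypothetical stand-in showing persistent query state.

    This is not the Suricata Analytics connector; names are illustrative.
    """

    def __init__(self, page_size: int = 100) -> None:
        self.page_size = page_size

    def set_page_size(self, size: int) -> None:
        """Remember the page size for every later query."""
        if size <= 0:
            raise ValueError("page size must be a positive integer")
        self.page_size = size

    def build_params(self, qfilter: str) -> Dict[str, Any]:
        """Combine the per-query filter with the stored state."""
        return {"qfilter": qfilter, "page_size": self.page_size}


connector = SketchConnector()
connector.set_page_size(5000)

# Both queries use the stored page size without repeating the argument.
first = connector.build_params("event_type: http")
second = connector.build_params("event_type: dns")
```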
The same is true for defining the query time period. Relative time queries are very common when working with NSM data. Most users simply need to know what happened within the last X amount of time, and might not really care about setting exact timestamps.
We provided a helper method that handles this calculation automatically. Likewise, the time frame will apply to all subsequent queries once set.
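To show what such a calculation involves, here is a hedged sketch of a relative time window that is set once and then reused. `QueryWindow` and `set_delta` are hypothetical names for this illustration, not the library's actual helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


class QueryWindow:
    """Hypothetical holder for a query time frame that persists once set."""

    def __init__(self) -> None:
        self.from_date: Optional[datetime] = None
        self.to_date: Optional[datetime] = None

    def set_delta(self, now: Optional[datetime] = None, **delta: float) -> None:
        """Set the window to end now and start one timedelta earlier.

        Keyword arguments are passed to datetime.timedelta, e.g. hours=1.
        """
        if not delta:
            raise ValueError("pass at least one unit, for example hours=1")
        span = timedelta(**delta)
        if span <= timedelta(0):
            raise ValueError("the time span must be positive")
        end = now or datetime.now(timezone.utc)
        self.from_date, self.to_date = end - span, end


window = QueryWindow()
window.set_delta(hours=1)  # "what happened in the last hour?"
```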
Naturally, the user could also explicitly set the from and to timestamps as RFC3339 formatted strings, Unix epochs, or parsed Python timestamp objects. Our library handles basic validation, such as ensuring that the two timestamps are not reversed.
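The kind of normalization and validation described here can be sketched as follows. This is not the library's code: the function names are hypothetical, numeric input is assumed to be epoch seconds, and naive datetimes are assumed to be UTC.

```python
from datetime import datetime, timezone
from typing import Tuple, Union

Timestamp = Union[str, int, float, datetime]


def to_datetime(value: Timestamp) -> datetime:
    """Normalize an RFC3339 string, epoch number, or datetime to aware UTC."""
    if isinstance(value, datetime):
        parsed = value
    elif isinstance(value, (int, float)):
        # Assumption: numeric input is a Unix epoch in seconds.
        parsed = datetime.fromtimestamp(value, tz=timezone.utc)
    elif isinstance(value, str):
        # fromisoformat only accepts a trailing "Z" from Python 3.11 onward.
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    else:
        raise TypeError(f"unsupported timestamp type: {type(value).__name__}")
    if parsed.tzinfo is None:
        # Assumption: naive timestamps are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def validate_timeframe(from_date: Timestamp,
                       to_date: Timestamp) -> Tuple[datetime, datetime]:
    """Return (start, end), refusing a reversed or empty window."""
    start, end = to_datetime(from_date), to_datetime(to_date)
    if start >= end:
        raise ValueError("from_date must be earlier than to_date")
    return start, end


# Mixed input types are fine as long as the order is right.
start, end = validate_timeframe("2023-01-01T00:00:00Z", 1672574400)
```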
These are just some of the ways we can easily prepare the following method call. That call would then be functionally identical to the “requests” example that was shown in the prior section, albeit with fewer lines of code. We also do not need to worry about parsing the results. Our library automatically converts the resulting JSON into a normalized pandas data frame, further reducing redundant code.
In this post we introduced the Scirius REST API, the application programming interface to Stamus CE and Stamus Security Platform. Interacting with it simply requires generating an API token, which can then be passed via the HTTP Authorization header. Requests can easily be made from the command line, though a programming language like Python opens up a heap of possibilities.
Recognizing this potential for extending our products, we created the Suricata Analytics project, which implements a Python data connection and widgeting library, notebooks for hunting and data exploration, and a containerized environment that enables anyone to get started quickly.
So far we have focused purely on the basics: an introduction to the API concept, token setup, and simple data retrieval. In the future, we will introduce open-source API endpoints that we created purely for this project and highlight how they can be useful for threat hunting.