Sometimes you need to gather threat intelligence data as quickly as possible, and Rapid7’s Project Sonar Opendata can provide great insights.
However, there’s a challenge: you can’t easily grep the HTTP response body with the lovely jq tool because the data field in the resulting JSON is base64 encoded:
{
"data": "SFRUUC8xLjAgNDAwIEJhZCBSZXF1ZXN0DQpTZXJ2ZXI6IEFrYW1haUdIb3N0DQpNaW1lLVZlcnNpb246IDEuMA0KQ29udGVudC1UeXBlOiB0ZXh0L2h0bWwNCkNvbnRlbnQtTGVuZ3RoOiAyMDgNCkV4cGlyZXM6IE1vbiwgMjMgQXByIDIwMTggMDc6NDA6MjkgR01UDQpEYXRlOiBNb24sIDIzIEFwciAyMDE4IDA3OjQwOjI5IEdNVA0KQ29ubmVjdGlvbjogY2xvc2UNCg0KPEhUTUw+PEhFQUQ+CjxUSVRMRT5JbnZhbGlkIFVSTDwvVElUTEU+CjwvSEVBRD48Qk9EWT4KPEgxPkludmFsaWQgVVJMPC9IMT4KVGhlIHJlcXVlc3RlZCBVUkwgIiYjOTE7bm8mIzMyO1VSTCYjOTM7IiwgaXMgaW52YWxpZC48cD4KUmVmZXJlbmNlJiMzMjsmIzM1OzkmIzQ2OzFmMzEzMjE3JiM0NjsxNTI0NDY5MjI5JiM0NjsyNDFiOGFmCjwvQk9EWT48L0hUTUw+Cg==",
"host": "REDACTED",
"ip": "REDACTED",
"path": "/",
"port": 80,
"vhost": "REDACTED"
}
While you could probably grep this using a decent bash script, I believe I have a better option.
Update:
Apparently the latest
jq
version has a@base64d
filter so there is definitely another way to accomplish this using only jq. But still it’s nice to have multiple options.
My Solution
To overcome this issue, I wrote a quick and dirty app called sonargrep. It accepts gzipped data from stdin and tries to find host entries that contain the data you’re looking for. You can install sonargrep with the following command (assuming Go is already installed):
$ go install github.com/ilyaglow/sonargrep
Usage
For example, here’s how to get a list of WordPress-related IPs:
$ curl -L -s https://opendata.rapid7.com/sonar.https/2018-04-24-1524531601-https_get_443.json.gz \
| sonargrep -w wordpress -i \
| jq -r '.ip'
It accomplishes the following:
- Greps records containing “wordpress” (case-insensitive) in their HTTP response body from the
sonar.https
dataset. - Extracts IPs using jq.
- Does all this without saving the 50GB file to disk.
Update: I’ve created a Docker image for the latest jq GitHub version, so you can perform the same task like this:
$ alias jq="sudo docker run -i --rm ilyaglow/jq"
$ curl -L -s https://opendata.rapid7.com/sonar.https/2018-04-24-1524531601-https_get_443.json.gz \
| gunzip \
| jq -r 'select((.data | @base64d) | match(".*wordpress.*", "i")) | .ip'
Workflow
Here’s how my opendata research workflow with sonargrep now looks:
- Spin up a DigitalOcean droplet
- Start grepping
- Wait for an hour or so, while playing with the results that come in real-time
- Terminate the droplet
This approach allows for quick and efficient analysis of large datasets without the need for extensive local storage or processing power.