Data Apps in Practice: Cybersecurity on Snowflake
An interview with Omer Singer, Head of Cybersecurity Strategy at Snowflake
Apps on Snowflake have become a popular discussion topic in the data community but can often be hard to conceptualize from the 30,000-foot view most thought pieces (mine included) are written from. To bring some of the ideas of my last Snowflake post to life, I spoke with Omer Singer, Head of Cybersecurity Strategy at Snowflake, about how security companies are actually building on top of Snowflake.
I learned a lot from speaking with Omer about how the innovations that fueled Snowflake’s success in analytics can be translated into security, and how critical partners (usually earlier-stage startups) are in that journey. I hope it offers some concrete examples of the ways new companies can build on the shoulders of recently emerged giants.
This interview has been edited for clarity and brevity.
Thanks for doing this, Omer! Let’s start at the top. There are a lot of places where Snowflake could focus—why did you decide to zero in on security use cases?
The journey actually started with what we needed to secure ourselves. At Snowflake, customers trust us with their most sensitive asset—their data. So for us, security was paramount. As we were growing, the only way that we found we could get the visibility and the automation that we needed to successfully secure everything we were doing at scale was by using Snowflake as a single source of truth.
We had many, many terabytes a day being generated and needed to analyze that. We didn't want to build a huge team to just throw bodies at the problem. Instead, we wanted to really take an analytics approach to it. So, we found that Snowflake worked really well for our security program.
And then, as we were telling customers about how we're protecting their data, they said, “You know what? That's interesting because we actually are struggling with this as well.” So we started talking to customers about their approach to doing detection and response at cloud scale, to identifying risks in their environment, and to measuring the improvement of their security program over time. We found that everybody was struggling with this and that the current standard approach was not working well in this new cloud-centric reality that we were all in.
And we found that we were able to identify and work with vendors that saw the world the way that we did: wouldn't it be nice if customers could own all their data in their own data platform, with the vendor enabling that? So the ecosystem started coming in and it became much easier. We started seeing that there's a real problem here, especially around SIEM (Security Information and Event Management), with customers very much struggling and actively looking for alternatives.
At the same time, we saw open threat detection solutions that embrace the idea of running on Snowflake gaining more and more customers. So we decided that this would be a key focus for the company.
I'd love to hear a bit about the SIEM use case. It seems like that's been the biggest area of focus for you so far. What are the initial tangible benefits that you see customers realizing when they move their SIEM to Snowflake?
The existing SIEM model is one that almost feels personal for me. On the one hand, I see the promise that the traditional SIEM vendors make when they say, “We will centralize all your data in one place.” And then I see the reality. You talk to their customers and none of them have all their data in one place. The customers I've spoken to who rely on a traditional SIEM with a traditional architecture under the hood have some of the data in the SIEM for some of the time. They have a lot of cloud data sitting out there in cloud buckets and a lot of data that they're never pulling in from their source systems, and that puts them in a very challenging position to detect threats. How can you detect threats across the different sources if they're not centralized?
It also makes it impossible to do effective behavior analytics, because you would need to be able to understand what a single user is doing on the endpoint, in the cloud, etc. And as long as that data is siloed, the behavior analytics don't work.
And then there’s the broader analytics issue. If you look at the kind of analytics that the traditional SIEM supports, it's very limited. It's really just search. If you know what you're looking for, you can search for it and find it. But the reality in security is that oftentimes you want to identify TTPs (Tactics, Techniques, and Procedures), which are pretty fuzzy concepts. You're not actually looking for something very specific. If you knew exactly what you were looking for, you'd block it to begin with.
So the search approach, while it's the standard, is actually not very good. What’s much more effective for threat detection and response is identifying emergent properties through true analytics and joining data sets. These things aren’t even supported by the query languages that the traditional SIEMs have.
And then there's how you make this data accessible. Because the traditional systems don't work with enterprise BI, you have security working in a totally separate stack. They need to take screenshots of their insights in order to share them with their teams. Those are all things that we're changing.
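Note: To make the contrast between search and joining data sets more concrete, here is a minimal sketch of a detection expressed as a join across two log sources. It is an illustration only and was not part of the interview. It uses Python's built-in SQLite module as a stand-in for a data platform, and every table name, column name, and event value is a hypothetical example. The date arithmetic is SQLite-specific and would be written differently in Snowflake SQL.

```python
import sqlite3

# In-memory SQLite database standing in for a security data lake.
# All tables, columns, and rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE endpoint_events (user_name TEXT, host TEXT, process TEXT, ts TEXT);
CREATE TABLE cloud_events (user_name TEXT, event_name TEXT, ts TEXT);

INSERT INTO endpoint_events VALUES
  ('alice', 'laptop-1', 'powershell.exe', '2022-07-01 10:00:00'),
  ('bob',   'laptop-2', 'chrome.exe',     '2022-07-01 10:05:00');

INSERT INTO cloud_events VALUES
  ('alice', 'CreateAccessKey', '2022-07-01 10:03:00'),
  ('bob',   'ListBuckets',     '2022-07-01 11:30:00');
""")

# Neither event is necessarily alarming on its own. The detection looks for
# the combination: a scripting shell on the endpoint followed within ten
# minutes by a new cloud access key for the same user.
DETECTION_SQL = """
SELECT e.user_name, e.host, e.process, c.event_name
FROM endpoint_events AS e
JOIN cloud_events AS c
  ON c.user_name = e.user_name
 AND c.ts BETWEEN e.ts AND datetime(e.ts, '+10 minutes')
WHERE e.process = 'powershell.exe'
  AND c.event_name = 'CreateAccessKey'
"""


def run_detection(connection):
    """Return one row per user whose endpoint and cloud activity match."""
    return connection.execute(DETECTION_SQL).fetchall()


findings = run_detection(conn)
for row in findings:
    print(row)
```

A keyword search could surface either event separately, but only if the analyst already knew to look for it. The join surfaces the combination, which is only possible when both sources live in the same place.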
How much of this paradigm change do you attribute to a difference in pricing approach, namely not penalizing people for ingesting data and just charging them for compute instead?
Pricing is just a side effect of architecture. That's why, when the traditional SIEM solutions out there try to change their pricing models and say, “Oh, we don't charge you based on ingest. We charge you based on virtual CPU usage or whatever,” it doesn't actually fix the problem for the customer. The underlying architecture still requires more servers to bring in more data, and they're still storing that data in those servers very much in a traditional data center approach.
To solve the problem, you must have the separation of storage from compute. You must use cloud native storage directly for storing the data. And then that architecture, which is something unique that Snowflake brings to the mix here, then enables the pricing.
It's significant not because it reduces the budget for customers, since security teams have a zero-fail mission and they've got to get this right. It's significant because they're not held back from having complete visibility, and they're not trying to manage a multi-tier storage approach with hot, warm, cold, frozen, rehydrate, restore, and replay. That may look okay on paper but doesn’t work well when you have Log4j or SolarWinds and you've got to go back 13 months and figure out whether you've been affected and where. That's all enabled by the architecture, and that's kind of an easy one.
I think the analytics piece is a bit more contentious. I do hear from people, “But security doesn't know SQL” and all that. And I think that's why our “connected app” model is important to reduce the learning curve. You have these connected apps that have the kind of interface that you’re used to and do the translation behind the scenes. But the reality is that, long term, security is absolutely learning analytics and seeing a ton of value.
I just spoke to a customer today, and they have a new team with a new lead for an Infosec Analytics team. That's so cool, right? And they're able to do it because they're using Snowflake. So I think all of that is much more significant, especially in the long run, than the pricing. But the significantly lower cost on a per-ingest, per-retention basis removes a limitation that is holding back the industry in a big way.
There's the technical component of ensuring there are benefits to moving your security onto Snowflake. And then there’s the go-to-market component of this as well, where Snowflake has historically been a tool that data teams buy and that is used for analytics. How do you go about getting security teams comfortable with moving away from tools that they might have used for a decade or two to a new paradigm?
Yeah, it's interesting that security is so far out, in its own separate stack, even among customers who are doing tremendous things in their business. It's a conversation that we oftentimes involve our partners in. We do have this ecosystem that has a very strong DNA in these areas. So you look at companies in this space like Hunters, Panther, and Securonix: their customers love them, they have the awareness, and they're able to have SOC-to-SOC conversations.
But at the same time, we bring in new concepts where a lot of times there is nothing happening today. A lot of that could be key risk indicators and key performance indicators for the security program. It makes a ton of sense. You want to have that. When we talk to security teams about that, they get it. They just don't know where to get started because they've never been able to do it. So we talk about it not from a syntax perspective, as in here's what it will mean to use this language or that language. That’s not of interest. It's about the new outcomes that you could drive. You can start measuring your security posture, giving visibility to your leadership and to other stakeholders, and involving them in ways that you couldn't before.
We do increasingly see a partnership forming between data and security. And that helps because security has been working alone for so long. They assume initially that they'll need to really take all this on themselves. And we tell them, “No, you can actually have a partnership.” If you look at marketing and finance, those are much less technically savvy departments on the whole than cybersecurity, but they still do amazing things with data because they're able to explain in English what they care about. They work with the data team to translate that into SQL, Python, BI, data science, whatever it is, so that they can measure churn and forecast growth. That's now happening on the security side as well.
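Note: As an illustration of what a key performance indicator for a security program could look like once the data is queryable, here is a minimal sketch that was not part of the interview. It computes the mean time to remediate critical findings per month, plus a count of findings still open. It uses Python's built-in SQLite module as a stand-in for a data platform, and the table, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical vulnerability findings table with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE findings (id INTEGER, severity TEXT, opened TEXT, closed TEXT);
INSERT INTO findings VALUES
  (1, 'critical', '2022-05-02', '2022-05-12'),
  (2, 'critical', '2022-05-20', '2022-05-26'),
  (3, 'critical', '2022-06-03', '2022-06-07'),
  (4, 'critical', '2022-06-15', NULL);
""")

# Mean days from open to close, grouped by the month the finding was opened.
# AVG skips unresolved findings (NULL), which are counted separately.
KPI_SQL = """
SELECT strftime('%Y-%m', opened) AS month,
       AVG(julianday(closed) - julianday(opened)) AS mean_days_to_remediate,
       SUM(closed IS NULL) AS still_open
FROM findings
WHERE severity = 'critical'
GROUP BY month
ORDER BY month
"""

rows = conn.execute(KPI_SQL).fetchall()
for month, mean_days, still_open in rows:
    print(month, mean_days, still_open)
```

The security team only has to say in plain language what it cares about, for example "how fast do we close critical findings?". The translation into a query like this is the kind of work a data team partnership can take on.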
When you see real pushback from the security team on moving to Snowflake, what are the typical reasons? What are typical blockers that might prevent somebody from wanting to make the switch?
One of the most common areas of concern is around managing the security data lake, because security teams got burned in the past. You know, once burned, twice shy. You look back to 2014ish. There was a lot of excitement around what Hadoop was going to do for security, and the security data lake was going to be this magical thing that gives you all your data in one place. And it failed in a pretty spectacular way to deliver on its promise. A big reason was that Hadoop is a notoriously heavy lift: it's a lot of work to deploy, a lot of work to maintain, and a lot of work to get value from.
And so the security leaders who have been around for a while remember what happened with Hadoop, and they say, “I don't want a repeat of that.” Now, Snowflake is a very different animal. We are delivered as a service. There is very little overhead involved in getting started or in scaling it up to petabyte scale.
To that point, there's managing scaling and then there’s managing the data model itself. Are you seeing best practices emerging around managing data models within a security data lake, especially given that you might share this Snowflake environment with other teams within your organization?
That concern comes up pretty often, though not as often as you would expect, because I think this whole area is so new that a lot of security teams don’t even know that they should be asking about that kind of modeling. They're used to just tossing everything into a search engine and then kind of Googling for it. It's easy to get started that way, but the results are jumbled. That's why no analytics team in any other department would take that approach, yet we just kind of accepted it in security.
But data modeling is a really interesting area, and at the moment Snowflake is pretty agnostic, meaning you can just put JSON into it and query it schema-on-read. And we're seeing customers build out very sophisticated data models, like star schemas. Actually, the Comcast Security Analytics team won the Data Driver of the Year Award last year for their work building a very sophisticated star schema where they define: this is an asset, this is a user. It doesn't matter if it's a Windows laptop, a MacBook, or a Linux server. They're able to treat it the same because they've done the modeling.
They also have a significant team dedicated to it. I think this is again where the ecosystem is stepping up. You do see a lot of thought being put into data modeling by the security solutions that run on top of Snowflake, which make sure that what they collect is normalized. I think it's going to get interesting. For example, what about the model between the proactive data, the vulnerability data that involves assets, and the activity logs that also involve assets but are more on the detection and response side? How do you model those consistently? And you do see companies now like Monad Security going out there and saying, “We have the Monad object model.” They put a lot of thought into it, and hopefully something like that becomes a standard so that there is consistency. But the good news for customers is that they don't have to do it themselves because there are these off-the-shelf options for the modeling.
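Note: To illustrate the idea of modeling raw events so that "an asset is an asset" regardless of operating system, here is a minimal sketch that was not part of the interview. It parses raw JSON at read time and maps source-specific fields onto shared asset and user dimensions, with a fact row per event. The field names, the mapping table, and the sample events are all hypothetical. This is not the Comcast model or the Monad object model.

```python
import json

# Raw events as they might arrive from three sources (hypothetical samples).
RAW_EVENTS = [
    '{"source": "windows", "ComputerName": "WIN-LT-01", "User": "alice", "EventID": 4624}',
    '{"source": "macos", "hostname": "mbp-02", "username": "bob", "event": "login"}',
    '{"source": "linux", "host": "srv-03", "uid_name": "carol", "msg": "session opened"}',
]

# Hypothetical mapping from each source's field names to the shared model.
FIELD_MAP = {
    "windows": {"asset": "ComputerName", "user": "User"},
    "macos": {"asset": "hostname", "user": "username"},
    "linux": {"asset": "host", "user": "uid_name"},
}


def normalize(raw_line):
    """Parse one raw JSON event and return it in the shared shape."""
    event = json.loads(raw_line)
    mapping = FIELD_MAP[event["source"]]
    return {
        "asset": event[mapping["asset"]].lower(),
        "user": event[mapping["user"]],
        "source": event["source"],
    }


def build_star_schema(raw_lines):
    """Build asset and user dimensions plus a fact list that references them."""
    assets, users, facts = {}, {}, []
    for line in raw_lines:
        record = normalize(line)
        # Assign a surrogate key the first time an asset or user is seen.
        asset_id = assets.setdefault(record["asset"], len(assets) + 1)
        user_id = users.setdefault(record["user"], len(users) + 1)
        facts.append({"asset_id": asset_id, "user_id": user_id, "source": record["source"]})
    return assets, users, facts


assets, users, facts = build_star_schema(RAW_EVENTS)
print(assets)
print(facts)
```

Once every event points at the same asset and user dimensions, a query about a user's activity no longer needs to know which operating system produced each log line.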
We keep circling around some of these partnerships that you've formed with other companies in the security world, especially earlier-stage ones. I'm curious what your framework looks like for deciding where you want to build capabilities in-house versus where you want to rely on partners to ensure a good solution for your customers.
It's very simple—if it's a data platform capability, we're going to build it. So you heard announcements at the Summit around our new streaming ingest capability, lower latency, lower costs, huge scale. That makes sense for Snowflake to build. And our partners can just take advantage of it. Many now are looking to move to this new streaming framework. Same thing with indexing. So how do you do faster point lookups? That's on us.
When it comes to shipping detection capabilities, behavioral analytics, or a point-and-click interface for security investigations or for the compliance space (for example, automatically validating that an organization meets SOC 2 or ISO requirements), that's security-specific content and visualizations, right? That's strictly left to our partners. So we're able to draw a very clear line. We're very committed to this ecosystem model. We don't want to compete with our partners.
It seems like there's this traditional partnerships model of you as a larger company giving leads and working on closing customers with these partners. There's kind of the other approach you’re alluding to of helping partners see real architectural and engineering benefits to building in conjunction with Snowflake. I'm curious: How do you view a partnership with Snowflake affecting the engineering approaches that these companies need to take on from their earliest days?
I think that it's a major enabler to compete with the larger players because you have some entrenched players who really tried to do it all. And being the best data platform for security data is already a full-time job. It's hard. Security players, especially newcomers, should not have to invest time building data platform capabilities when they are competing with other companies around building security capabilities. So our partners are able to take advantage of what Snowflake is delivering in the data cloud and focus on building security-specific capabilities. They're able to go to market much faster, but also able to go to market in multiple clouds, in multiple regions, without needing to take on a lot of overhead because we support that out of the box. Things like disaster recovery across regions and even cross-cloud failover are supported by Snowflake. That is very significant, and they're able to get it off the shelf from Snowflake.
Where we're going is to a place where the CISO will be able to go into the Snowflake Marketplace, click on the use case that they're thinking about—whether it's threat detection or compliance automation or cloud security or identity, or whatever it is that they're concerned about. Click in, find the right solution, and instantly deploy it on their existing source of truth.
That's going to be super valuable for them and a big accelerator for our partners. You can imagine a company that maybe is doing privilege rightsizing, and they can instantly get access to three years’ worth of activity logs, the full user directory, and all the IAM configurations. All they need to do is be very good at making excellent recommendations. They can focus on that and solve that problem to an extent that it wasn't solved before. And a POC can go from taking two weeks to taking two hours.
Yeah, once the CISO has standardized on Snowflake, the incremental integration cost is very limited.
Exactly. Why start from scratch with every solution you’re looking at?
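Note: The privilege rightsizing example mentioned above comes down to comparing what each identity is allowed to do with what the activity logs show it actually did. Here is a minimal sketch of that comparison, which was not part of the interview. The user names, permission strings, and log entries are hypothetical, and a real product would work from actual IAM configurations and far more history.

```python
from collections import defaultdict

# Hypothetical granted permissions per user (from IAM configuration).
GRANTED = {
    "alice": {"s3:GetObject", "s3:PutObject", "iam:CreateUser"},
    "bob": {"s3:GetObject"},
}

# Hypothetical activity log entries as (user, permission exercised).
ACTIVITY_LOG = [
    ("alice", "s3:GetObject"),
    ("alice", "s3:GetObject"),
    ("bob", "s3:GetObject"),
]


def unused_permissions(granted, activity_log):
    """Return, per user, the granted permissions never seen in the log."""
    used = defaultdict(set)
    for user, permission in activity_log:
        used[user].add(permission)

    recommendations = {}
    for user, permissions in granted.items():
        unused = permissions - used[user]
        if unused:
            # Sorted so the recommendation list is stable between runs.
            recommendations[user] = sorted(unused)
    return recommendations


recommendations = unused_permissions(GRANTED, ACTIVITY_LOG)
print(recommendations)
```

The comparison itself is simple. The hard part in practice is having the logs, the directory, and the configurations in one place to begin with, which is the point being made in the interview.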
SIEM makes sense as an obvious starting point for Snowflake in security, but what comes next from here? What other security use cases are top of mind for you?
The momentum in the ecosystem is incredible right now and we're seeing a lot of partners joining. The areas we're particularly excited about include compliance automation. I've been shocked at how manual a lot of compliance is. Today we're seeing innovative vendors step up and say, “We're going to go from whatever cadence you're doing it today to doing it continuously. And we're going to collect the evidence into your single source of truth.”
Cloud security is another use case that’s very data-intensive, and it's a big focus area for security teams. So that's one I think where you can unlock a lot of value. And as a result, you can do a lot more with your SOAR by having all your data ready to query.
Note: Two GGV portfolio companies are already partnering with Snowflake to solve these issues. Drata, a compliance automation platform, is beginning to continuously push data collected in compliance processes to Snowflake to provide better context for the security team. Orca, a cloud security platform, is extending the value of its product by letting customers intermingle its findings with other security data, contributing to better threat hunting, among other use cases.
Super interesting. To tie this all together, I’d love to touch on the recent announcements of Unistore and Hybrid Tables. From your perspective, what types of new security use cases do these features help Snowflake address?
Hybrid Tables, as well as significant improvements to the Search Optimization service, are key developments in Snowflake's offering for SaaS providers, in my view. Cybersecurity products today rely on an assortment of databases: Postgres for OLTP and Elasticsearch for data-heavy user interfaces, for instance. Being able to unify all of the application's data storage and processing in a single platform would shorten time to market, especially for features that rely on analytics.
That's a big deal in the hypercompetitive cybersecurity market. It also paves the way for cybersecurity applications to run entirely within the customer's Snowflake. Ultimately, we want to provide the CISO with an app store experience where innovative new solutions can tap into the single source of truth easily and securely.