Are Proprietary Data Warehousing Solutions Better Than Open Data Platforms? Here’s a Look

Snowflake has made a few controversial remarks recently. First, they posted an entry on the company blog site, “Where open helps and where it hurts”, where they said: “Open is often understood to encompass two broad characteristics, open standards as well as open source. In the appropriate context, these characteristics can enhance value to users of technology systems. However, these characteristics are not universally positive or without drawbacks. For many organizations who have fallen into the trap of assuming that open is synonymous with innovative and cost-effective, they have learned the hard way that neither is the case.”

Then, during the Q1 2022 Earnings Call, CEO Frank Slootman was asked a question that was basically, “how is it that Snowflake is able to keep the hyperscalers (i.e., Microsoft, AWS, and Google) at bay in the data warehouse arena?” Slootman replied that Snowflake started with a clean slate and reinvented data warehousing for the cloud.

He said, paraphrased, that the hyperscalers can’t compete because they straddled on-prem and cloud and implied the hyperscalers are “sitting on a not good architecture.”

While these two items are different, they point to the same thing. Snowflake feels that its proprietary offering is better than the open data platforms available. Is that right? Can a proprietary solution be better than an open solution? It’s worked for Oracle for many years. Microsoft too.

Does Snowflake Match Up to the Hyperscalers?

Snowflake is no hyperscaler, and it would be hard-pressed to match any stats related to Microsoft’s, Amazon’s or Google’s cloud offerings. Those three push more data in a day than Snowflake does in a month. They offer more services and bring in more dollars. But in terms of just data warehousing, does Snowflake (and Slootman) have a point?

You can read a good analysis in the Datanami blog post, “Do Customers Want Open Data Platforms?” In that post, the author references a Dremio co-founder who says the comments are just an assault on open data platforms, meant to get customers locked into the Snowflake landscape. Once you are locked in, you are a lifelong customer. He seemed to be referring to Teradata, but my thought was Oracle.

Don’t get me wrong. I love Oracle as a data guy and think it is still a top-tier database. Their cloud offering gets better year by year. But, and it is a big but, they are masters at locking you in. Snowflake seems to be taking lessons from them.

While I absolutely see it as a push towards lock-in, they are also not really wrong. Oracle has really only ever been the leader in database technology. They have strong entries across the IT board, but it is the database where they lead. Why do they lead? They do things the way they think is best without any regard to open standards. They do what is best for the customer, or best for performance, or best for some other category of need. In short, they are closed, so they can respond however they think is best. That closed environment also works to lock in their customers.

Snowflake is leading the big data cloud warehouse quadrant now, but they are not alone. Let’s talk about the hyperscalers and their data warehouse offerings. All three of the biggies do offer Hadoop/Spark-style data warehousing. But that is not really what Snowflake is competing against. The distributed, file-based, open data platforms are evolving into more traditional, although MPP, data solutions.

Snowflake is competing against the easier to use MPP SQL databases like Azure Synapse SQL from Microsoft, Redshift from Amazon, and BigQuery from Google. Guess what? All three of those databases are proprietary and closed source.

Of those three, I would say BigQuery is the closest in style and use case to Snowflake. But all three play in the same sandbox. And this is where the comment from Slootman at the earnings call comes into play. First, push a closed source solution as better (for real reasons) and then shoot down the competitors (although they are so much larger that you really can’t call them competitors) for the technical architecture.

What does that leave? If you buy into it, that means Snowflake stands alone. If it stands alone, that’s a lock.

I don’t buy into it. Again, I am not saying he is totally wrong. Open is good until it’s not. A vendor should be able to extend a solution for performance, extra functionality, etc., without worrying about the open label. I also don’t feel every solution has to be open. But being closed and trying to scare potential customers into your environment just to lock them in is predatory.

Slootman didn’t call out any company’s tech stack by name, but the implication was fairly obvious. Of the three I mentioned above, Redshift is the closest in lineage to an open data platform (it was birthed from PostgreSQL). However, of the three, it is the hardest to configure, maintain and scale, which are three things that Snowflake excels at. I’d say Slootman scores a point on that one.

The Microsoft solution (Azure Synapse) is a mix of dedicated and serverless and is a unique architecture. Not Snowflake-ish, but much easier to configure and scale elastically.

As I mentioned above, BigQuery is closest to Snowflake in form and functionality. It auto-scales as your queries run. You pay for storage and compute time. You use SQL to access your data and both have great access to data external to the actual database (for loading and, in some cases, reading).

Bottom Line

Snowflake says open data platforms are on the way out. Are they? Right now, I have to agree. SQL-based databases have become sophisticated enough that they can handle the workloads that previously required file-based MPP like Hadoop and Spark. SQL and the UIs on these databases are more accessible to analysts and other users, so they definitely get the nod.

Snowflake tears down their competitors as having “not good architecture.” True? I don’t see it. It’s true that more traditional data warehouse solutions like Redshift are a bit behind the curve. Still, I would be really surprised if Amazon is not working on a next-gen database (maybe a serverless Redshift evolution). Microsoft and Google are also neck and neck with Snowflake in their data-warehouse offerings.

Finally, are Snowflake’s recent comments an aggressive push for greater market share? I’d say yes, but there’s nothing really wrong with an aggressive push for customers. Which company doesn’t do that? Being closed source is not evil and going after customers is not either.

I’d say forced lock-in is evil. And I hope Snowflake will see that it is in their best interests to be, at least somewhat, open. For one thing, Redshift, Synapse and BigQuery each run in their vendor’s own hyperscale cloud. Snowflake wouldn’t really exist without those hyperscalers. Snowflake doesn’t run in a vacuum, and they don’t offer much beyond a data warehouse.

As long as you can export your data from any of these large warehouses and transport it as needed, they are as open as we are likely to see. The competitive edge right now comes from being “the best” solution, and proprietary looks the best at the moment. AWS Athena and other tools like that are a good first step toward a more open approach. Still, they can’t match the ease of use, performance and accessibility of the proprietary integrated solutions from the hyperscalers.
