How to slash Snowflake costs by at least 30%

Why you need a data warehouse

In today’s Big Data world, organizations have data coming from diverse sources. In order to integrate this data for analytical purposes or ML, companies build a data warehouse to optimize and benefit from the data they have. The data warehouse provides them a platform that can be used for business intelligence, data analysis, reporting and many other use cases. Learn how to load large volumes of data to Snowflake fast

The Snowflake Data Warehouse is an analytical cloud data warehouse, and is faster, easier to use, and more flexible than most data warehouses (on-premises or Cloud). However it can prove expensive if you don’t follow some simple rules. SQL Server to Snowflake in 4 Easy Steps

What is the Snowflake Data Warehouse?

The Snowflake Data Warehouse is a data warehouse based on the Software-as-a-Service (SaaS) concept. Snowflake offers immense scalability, is extremely easy to use and offers more flexibility than traditional data warehouses. Every user and workload has access to instantly and infinitely scale compute resources with a platform that does not need management. Build a Snowflake Data Lake or Data Warehouse

Snowflake Data Warehouse Benefits

Snowflake makes it easy for organizations to incorporate data into business by integrating business applications and sharing live data with business partners and customers. Organizations can integrate and analyse previously siloed data to gain valuable insights while adhering to policies of compliance and governance. What makes Snowflake unique is that there is no infrastructure to be installed, configured, managed or maintained, no software or hardware to be installed. It is all handled by Snowflake that runs solely on cloud infrastructure. It uses virtual compute instances for compute requirements and a storage service to store data. BryteFlow as an alternative to Matillion and Fivetran for SQL Server to Snowflake Migration

Snowflake Pricing

Snowflake offers various payment plans for their cloud data platform. ‘Snowflake on Demand^TM’is a usage based, per second Snowflake pricing plan that is fast and easy. Pre-purchased Snowflake capacity plans are also available. Compute usage is billed to users on a per second basis, minimum being 60 seconds. View Snowflake Pricing Details

Here’s how you can save on Snowflake costs with some simple tips:

Understand the pricing model

Snowflake separates data warehousing into three distinct functions: Compute Resources, Data Storage, and Cloud Services. The costs associated with using Snowflake are based on your usage of each of these functions. SQL Server to Snowflake in 4 Easy Steps

It’s always good to track down the usage which can be done by:

Analysing usage data to see which workflows are the most expensive. Oracle to Snowflake: Everything You Need to Know
Using Account Usage Views for tracking history, performance and cost Snowflake CDC With Streams and a Better CDC Method
Creating roles by business functions to track spending of various departments
Using Resource Monitors to cut off DWs when you hit predefined credit amount limits

Size efficiently to reduce Snowflake costs

Scale, create, and optimize your cloud compute resources. E.g. scale up or out appropriately. Turn on or off the warehouse for certain workloads or when not in use. Use warehouse monitoring to size and limit the cost per workload. Databricks vs Snowflake: 18 differences you should know

When you have a queuing up of workloads, then add more compute clusters
When the workload is slow without queuing, then increase warehouse size

Query control parameters can help to control Snowflake compute costs

Query processing and concurrency can be controlled by setting certain parameters:

Max_concurrency_level
Statement queued timeout in seconds
Statement timeout in seconds

Data Loading needs to be optimal

Sort on ingestion: Data is automatically partitioned in Snowflake on an ingestion order. Sorting an S3 bucket before bulk load via COPY could be a lot faster than inserting with an ORDER BY Why You Need Snowflake Stages
CSV (Gzipped) is the best format for loading to Snowflake (2 to 3 times faster than Parquet or ORC) Postgres to Snowflake : 2 Easy Methods of Migration
Use COPY INTO instead of INSERT because it uses the more efficient bulk loading processes. Why automate your Data Pipeline?

Efficient script writing lowers Snowflake compute costs

Poorly written code can cause severe bottlenecks in any database or warehouse and drive up costs. It’s worth your effort to write efficient SQL code rather than adding more and more compute. ELT in Data Warehouse

Get rid of temporary and transient tables when done
Don’t use “CREATE TABLE AS”. Instead, use “CREATE OR REPLACE”
Use COPY INTO not INSERT INTO
Use staging tables to manage transformation of imported data
Use ANSI Joins because they offer more efficient optimization
Don’t use ORDER BY if you can. Sorting can prove expensive.
Don’t handle duplicate data using DISTINCT or GROUP BY

Storage practices for better ROI on Snowflake

Storage plays an important role with Snowflake costs. Optimize storage to keep costs low

Opt for S3 buckets in the same geographic region
Your bucket specifications be in line with how files are coming across (e.g. by date or application)
Files should be kept between 60–100 MB to benefit from parallel loading
Avoid using materialized views except in specific use cases (e.g. pre-aggregating)

Save big on Snowflake costs by loading data right

The most important cost savings by far, lie in how you load your data and how your queries are written. If you are replicating your data constantly with incremental loads, this becomes a very important factor to saving costs. BryteFlow as an alternative to Matillion and Fivetran for SQL Server to Snowflake Migration

BryteFlow enables the lowest compute costs on Snowflake

Our customers tell us that BryteFlow uses the lowest compute when replicating data to Snowflake. This is because it uses all the best practices implicitly without any coding and hence is a massive return on investment. Not only does it get the data to Snowflake really fast, reducing development and implementation time to deliver quick time to value, but also reduces Snowflake compute costs on average by 30%. Oracle to Snowflake: Everything You Need to Know

The BryteFlow software has two modes of getting ready-to-use data into your Snowflake database:

1) Directly to the Snowflake data warehouse: In this method the source data is ready-to-use with Type2 SCD history or without history for Snowflake hosted on AWS, Azure or GCP. This uses the internal stage on Snowflake for optimization, and saves on setting additional security measures on the cloud platform. SQL Server to Snowflake in 4 Easy Steps

2) Indirectly: Prepares and transforms data on Amazon S3 and then pushes it to your Snowflake database for data analytics. This can include real-time data transformations, after which the data is loaded directly to the Snowflake Data Warehouse. With this method you have the option of data storage on S3 (very economical) and pushing only the data you need for analytics to Snowflake, so in effect you pay only for the compute, reducing your overall Snowflake costs. Databricks vs Snowflake: 18 differences you should know

BryteFlow follows all recommended best practices when uploading to Snowflake out of the box.

Provides support for CSV (Gzipped) file format
Controls the number of files to be prepared for loading when using COPY command Postgres to Snowflake : 2 Easy Methods of Migration
Multi-thread parallel loading capability to speed up the loading process
With its Auto-recovery and Resume feature, you can stop-start your warehouses on-demand. No need to keep it up and running all the time. Turn it off when not in use and that alone can save you a ton of money.

Other important features of BryteFlow for data replication tool include:

Transaction log-based Change Data Capture with high performance and zero impact on source besides other CDC options
Real-time automated data replication and transformation with real-time access
Very high throughput – 6x faster than Oracle Goldengate for Oracle sources
Automated file merges on your Snowflake database with automated SCD Type2 history
Automated optimisation for the Snowflake Data Warehouse Snowflake CDC With Streams and a Better CDC Method
Option for remote Oracle log mining Oracle to Snowflake: Everything You Need to Know
Full extract and CDC – high performance and support for large volumes
Automated reconciliation out-of-the-box for your Snowflake data to source (at column level)
Automated, light weight Change Data Capture for Oracle, SAP, SQL Server, PostgreSQL and MYSQL
Delivers ready-to-use data or can be used further in the pipeline for real-time data preparation
Referential integrity, Metadata and Data lineage
Cost control mechanisms to lower costs and high availability out-of -the-box
Enterprise grade security using KMS, SSE, masking and tokenization for sensitive data
Recover automatically from network drop outs and source dropouts with constant retry mechanism
Customisable alerting and monitoring- Integration to CloudWatch Logs, Metrics and SNS
Swap instances whenever required with configuration
Automated dashboard displaying data latency, duration etc.

These are some simple options to reduce data costs on Snowflake. Or you can check out BryteFlow for data integration on Snowflake. BryteFlow automates the best practices you need to keep Snowflake costs well in check, so cost savings are also automated! Get a Free Trial of BryteFlow

How to slash Snowflake costs by at least 30%

Why you need a data warehouse

What is the Snowflake Data Warehouse?

Snowflake Data Warehouse Benefits

Snowflake Pricing

Here’s how you can save on Snowflake costs with some simple tips:

Save big on Snowflake costs by loading data right

BryteFlow enables the lowest compute costs on Snowflake

The BryteFlow software has two modes of getting ready-to-use data into your Snowflake database:

BryteFlow follows all recommended best practices when uploading to Snowflake out of the box.

Other important features of BryteFlow for data replication tool include:

Flexibility. Choice. Open Architecture.

RELATED POSTS

SQL Server to Snowflake in 4 Easy Steps (No Coding)

How to load terabytes of data to Snowflake fast

Oracle to Snowflake: Everything You Need to Know

Here’s Why You Need Snowflake Stages (Internal & External)

SOLUTIONS

PRODUCTS

CUSTOMERS

RESOURCES

INDUSTRIES