Here’s Why You Need Snowflake Stages (Internal & External)

Snowflake stages are a vital component of the Snowflake platform, facilitating data movement within a Snowflake database and between Snowflake and external locations. This blog takes an in-depth look at Snowflake stages, both internal and external, covering their definition, the various types of Snowflake stages, their use cases and features, and how BryteFlow, as an automated replication tool, uses these stages.

About Snowflake Cloud

Snowflake Cloud is a cloud-native platform that has a well-deserved reputation for flexibility, speed, reliability, and seamless operations. Snowflake runs on multiple clouds including AWS, Azure and GCP. It connects businesses across the world, can handle almost any type of data, and provides near-infinite scalability to process heavy data volumes and different types of workloads. Snowflake Cloud is a fully managed platform that is completely automated and allows for collaborative data sharing with a single source of data. Snowflake architecture consists of 3 layers – Storage, Compute and Services. In this blog we will address the Snowflake storage layer, specifically the Snowflake stage aspect of it. Create a Snowflake Data Warehouse automatically

What are Snowflake Stages?

Snowflake Stages serve as storage areas for data files, from where data can be loaded into and unloaded from database tables in Snowflake. These stages help move data within and outside the Snowflake environment, with the option to use either internal or external stage locations. Snowflake stages also give businesses the capability to manage data transfer operations while ensuring the security of their data. They offer encryption features that enhance data security and protect against unauthorized access. They empower organizations to efficiently transfer data between external sources like Amazon S3 or ADLS Gen2 and internal Snowflake tables, offering flexibility and control in data management processes. Snowflake stages can be used in multiple ways to load or unload data from Snowflake. For example, a business can use a Snowflake stage to land its data from Azure Blob Storage or an S3 bucket before moving it to an internal Snowflake table on Azure or AWS respectively. How to load heavy data to Snowflake fast

What is the Difference Between a Snowflake Stage and a Snowflake Table?

A stage in Snowflake is a temporary holding or storage area that is a named object for storing data files that will be loaded into a Snowflake table. This data can be stored within Snowflake’s internal storage (internal stage) or in an external storage service (external stage) such as ADLS Gen2, Amazon S3 or Google Cloud Storage. The Snowflake stage serves to manage and control access to the data being moved into Snowflake. Compare Snowflake to Databricks

In comparison, a table in Snowflake represents a systematic and permanent aggregation of data. Data can be loaded into a Snowflake table from a stage either in one go or in multiple tranches. After the data has been loaded into a table, it becomes available for querying, transformation, and manipulation, like other data stored natively within Snowflake. Think of the Snowflake stage as a temporary transit camp for holding the data while a Snowflake table represents its permanent residence. How to cut costs on Snowflake
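The "transit camp vs. permanent residence" idea above can be sketched in a few lines of Snowflake SQL. The stage, file and table names here are hypothetical; note that PUT (run from SnowSQL, not the web UI) gzip-compresses the file by default, which is why the COPY references a `.gz` name.

```sql
-- Create a named internal stage: the temporary holding area
CREATE STAGE my_stage;

-- From SnowSQL: upload a local file into the stage (compressed to .gz by default)
PUT file:///tmp/customers.csv @my_stage;

-- Load the staged file into the permanent table, its "permanent residence"
COPY INTO customers
  FROM @my_stage/customers.csv.gz
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);
```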

Snowflake Stages: Some Use Cases

Here are a few examples of how Snowflake stages can be used. Learn how to create Snowflake stages

  • Data Export: Snowflake stages offer the capability to export data from Snowflake to various destinations, including files, databases, and other cloud storage services. Replicate SAP HANA to Snowflake
  • Data Ingestion from Cloud Storage: Snowflake stages can be employed for importing data from Azure Blob Storage, Amazon S3, and Google Cloud Storage into Snowflake. An external stage can establish a location in cloud storage for file storage, and Snowflake’s Snowpipe can be used to load data from the external stage into a Snowflake table. An automated tool like BryteFlow can make this even easier. Migrate from Teradata to Snowflake
  • Data Import: Stages can be harnessed to import data into Snowflake from a range of sources, including files, databases, and other cloud storage services. For example, a stage can be employed to import data from an Excel file into a Snowflake table. Oracle to Snowflake with BryteFlow
  • Integrating Snowflake with Other Data Pipelines: Snowflake stages can be seamlessly integrated with other data pipelines to enable the transfer of data between Snowflake and other data processing tools. For instance, you can create an external stage on S3 to land data coming from an SAP application, allowing Snowflake to consume and load the data into a table for further analysis. SAP replication to Snowflake
  • Processing Real-Time Data: By using Snowflake’s STREAMS feature, a stream can be created and an internal stage used to store the incoming data. Snowflake’s SQL capabilities can then be used to process the data. Oracle to Snowflake: Everything You Need to Know
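The cloud-storage ingestion use case above, with Snowpipe loading from an external stage, can be sketched roughly as follows. The pipe, stage and table names are hypothetical, and auto-ingest additionally assumes cloud event notifications (e.g. S3 event notifications to SQS) have been configured for the bucket.

```sql
-- Snowpipe: automatically load files as they arrive in the external stage
CREATE PIPE load_orders
  AUTO_INGEST = TRUE
AS
  COPY INTO orders
  FROM @ext_stage/orders/
  FILE_FORMAT = (TYPE = JSON);
```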

Different Types of Snowflake Stages

Snowflake Stages are broadly of 2 types: Snowflake External Stage and Snowflake Internal Stage. Postgres to Snowflake : 2 Easy Methods of Migration

Snowflake External Stages

Snowflake external stages, unlike internal named stages, are storage locations outside the Snowflake environment, situated on platforms like Amazon S3, Azure Blob Storage, or Google Cloud Storage. They serve as a gateway for accessing and importing data from external cloud storage platforms. These stages contain essential connection details and access credentials required to retrieve data from external platforms. SQL Server to Snowflake: an Alternative to Fivetran & Matillion

Once the data is inside Snowflake, it becomes readily available for further processing and transformation through SQL queries or other Snowflake tools. This encompasses tasks such as data cleansing, data mapping, enrichment, and aggregation. A Snowflake external stage empowers users to monitor the data transfer process and safeguards data against unauthorized access, ensuring secure data storage. Move data from Salesforce to Snowflake

To strengthen data security, businesses can also opt to encrypt their data within the external stage. Besides facilitating data ingestion from diverse external sources, spanning on-premise systems, cloud storage, and third-party platforms, external stages play a critical role in Snowflake’s integration capabilities. It’s important to note that the external stage itself is not stored within Snowflake’s storage area, as Snowflake neither holds nor manages the stage. Learn about SAP Snowflake Integration
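A minimal sketch of creating an external stage on S3 follows, using a storage integration so that no access keys are embedded in the stage definition. All names, the role ARN and the bucket path are hypothetical placeholders.

```sql
-- Storage integration: delegates S3 access to an IAM role (hypothetical ARN)
CREATE STORAGE INTEGRATION s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-access-role'
  STORAGE_ALLOWED_LOCATIONS = ('s3://my-bucket/landing/');

-- External stage pointing at the bucket, via the integration
CREATE STAGE ext_stage
  URL = 's3://my-bucket/landing/'
  STORAGE_INTEGRATION = s3_int;
```

Using a storage integration rather than raw credentials is also in line with the security point above: access is controlled through IAM rather than keys stored in Snowflake objects.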

Features of Snowflake External Tables

  • External tables in Snowflake are housed outside the Snowflake ecosystem, in external cloud providers like AWS, GCP, or Azure. Postgres to Snowflake : 2 Easy Methods of Migration
  • Snowflake external tables enable access to files residing in Snowflake’s external stages, which could be on Amazon S3, GCP Bucket, or Azure Blob Storage. Create an S3 data lake in minutes
  • Metadata about the data files is stored within Snowflake external tables but not the actual data. Snowflake CDC With Streams and a Better CDC Method
  • Snowflake external tables are designed solely for read-only purposes, prohibiting any DML operations.
  • Snowflake external tables permit the querying of files stored in Snowflake’s external stages just like regular tables, eliminating the need to transfer data from files to Snowflake tables, saving on time and storage costs. Databricks vs Snowflake: 18 differences you should know
  • The external tables in Snowflake support query and join operations, facilitating the creation of views, ensuring security, and enabling materialized views.
  • The external tables in Snowflake allow for querying of data across multiple files by combining them into a unified table.
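The features listed above can be illustrated with a small external table sketch. The stage, table and column names are hypothetical; the columns are virtual, defined as expressions over the `VALUE` variant that Snowflake exposes for each staged file row, and only metadata is stored in Snowflake.

```sql
-- Read-only external table over Parquet files in an external stage
CREATE EXTERNAL TABLE sales_ext (
  sale_date DATE   AS (TO_DATE(value:sale_date::STRING)),
  amount    NUMBER AS (value:amount::NUMBER)
)
LOCATION = @ext_stage/sales/
FILE_FORMAT = (TYPE = PARQUET)
AUTO_REFRESH = FALSE;

-- Query it like a regular table, without copying the data into Snowflake
SELECT sale_date, SUM(amount)
FROM sales_ext
GROUP BY sale_date;
```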

Prerequisites for Snowflake External Table Configuration

  • Access to the external storage holding your data. SQL Server to Snowflake in 4 easy steps
  • Access to a Snowflake account, with the necessary rights to create a Snowflake external stage and external table.
  • Setting up a Snowflake external stage that links to the specific location of your data within the external storage system. Oracle to Snowflake with BryteFlow
  • Definition of the external table’s structure, encompassing column names, data types, and additional table attributes.
  • Knowing your data’s format, whether it’s CSV, JSON, or Parquet. How to choose between Parquet, ORC and AVRO
  • A basic knowledge of SQL to design and query external tables from within Snowflake.
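The "knowing your data's format" prerequisite is typically captured once as a named file format object, which stages and COPY statements can then reference. The name below is hypothetical.

```sql
-- Named file format: declare the data's shape once, reuse it everywhere
CREATE FILE FORMAT my_csv_format
  TYPE = CSV
  SKIP_HEADER = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = '"';
```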

Snowflake External Stage Use Cases

Some common scenarios where external stages can be used include:

  • Dealing with Extensive Data: When dealing with heavy datasets in external storage systems, external stages are helpful, since they are more suitable for handling large volumes of data and can be used to store and then load the data into Snowflake for further scrutiny or manipulation. Managing huge volumes of data with BryteFlow Ingest
  • Safeguarding and Retrieving Data: In situations requiring data backup or data recovery within Snowflake, external stages prove indispensable for unloading and reloading data to and from external storage systems.

Snowflake External Stages – The Security Protocols

When utilizing external stages for replicating to Snowflake, it’s imperative to implement the appropriate security protocols. This involves:

  • Safeguarding the External Data File Location: Make sure that the external data files are stored in a securely restricted location accessible solely by authorized personnel.
  • Data Validation: Prior to importing data from external files into Snowflake, validate the data to ensure its integrity and to eliminate potential security threats like malicious code. Data reconciliation with BryteFlow
  • Monitoring and Logging: Continuously monitor activities related to the activation and use of external stages and maintain a log of any suspicious events for auditing purposes.

Snowflake Internal Stages

An internal stage in Snowflake is a named storage area within a Snowflake account used for temporarily housing data files during specific operations like data loading or unloading from tables. Snowflake Internal Stage stores data files internally within the Snowflake platform and can be set up as either a permanent or temporary storage area. Once data undergoes transformation and processing within Snowflake, internal stages step in to facilitate the creation of a file format that can be consumed by other systems. SQL Server to Snowflake with BryteFlow

User and table stages are generated and managed automatically by Snowflake, while named internal stages are created explicitly by users. All of them function as an interim storage location for data files, either before loading into a table or after unloading from a table. For anyone looking to store data files within Snowflake, internal stages offer an excellent choice, combining flexibility, security, and strong performance. Oracle to Snowflake: Everything You Need to Know

Thus, internal stages play a vital role in Snowflake’s data manipulation and integration capabilities, enabling organizations to swiftly transfer data between various systems and accounts, while making it available for access by other systems. Snowflake ETL with BryteFlow

Snowflake Internal Stages: 3 Different Types

There are 3 types of Snowflake Internal Stages:

  • User Stage: Every Snowflake user has a default, personal stage for storing files. A User Stage is allocated automatically, cannot be altered or removed, and is accessible only to its own user, which makes it unsuitable for sharing files among multiple users. Snowflake also stores all the worksheets created by a user in that user’s stage.
  • Table Stage: Each table in Snowflake has an associated default stage for file storage, which cannot be modified or removed. These stages are accessible to multiple users but are only meant for loading data into the specific table they are assigned to. While table stages are efficient for quickly importing a small number of files into a particular table, they are not suitable for scenarios where data needs to be loaded into multiple tables. These stages are essentially storage locations within a table object, but their accessibility is restricted to that specific table, preventing other tables from accessing the stored files.
  • Internal Named Stage: The User and Table stages should not be viewed as distinct database entities; instead, they are inherent stages linked to the user and the table, respectively. Named stages, in contrast, are more versatile database objects for data loading, effectively overcoming the constraints associated with both user and table stages. It’s worth noting that Internal Named stages require manual creation, distinguishing them from User and Table stages. You have the option to create an internal named stage using either the Web Interface or SQL.

Named stages in Snowflake are accessible to all users with the necessary privileges and can be used to load data into multiple tables. These stages are storage location objects within a Snowflake Database/Schema and are subject to the same security permissions as other database objects. Unlike user and table stages, internal named stages are not created automatically, but they offer greater flexibility for importing files into various tables and allowing multiple users to access the same stage. In essence, internal named stages are the way to go if you need to share files with different users and load them into multiple tables within Snowflake. How to load terabytes of data to Snowflake fast
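The three internal stage types described above are addressed with different syntax. This sketch uses hypothetical table, stage and role names; only the named stage is created explicitly and can be shared via privileges.

```sql
LIST @~;         -- user stage: one per user, created automatically
LIST @%orders;   -- table stage: one per table, created automatically

-- Named internal stage: created explicitly, shareable across users and tables
CREATE STAGE shared_stage;
GRANT USAGE ON STAGE shared_stage TO ROLE analyst_role;
LIST @shared_stage;
```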

Snowflake Internal Stage Use Cases

There are several scenarios where internal stages can be used, including:

  • Exporting Data: Snowflake allows you to export data in various file formats to external systems using internal stages. Subsequently, you can save the exported data in a local system file. This can be done either using named Snowflake stages or Snowflake table stages when unloading data from a specific table. Snowflake CDC With Streams and a Better CDC Method
  • Data Load: By employing internal Snowflake Stages, you can load source files from the local system into multiple Snowflake tables, enabling subsequent data processing. This seamless process ensures that the processed data finds its way into the target table. Postgres to Snowflake
  • Data Processing: If you require data transformation or processing within Snowflake before converting it into a specific file format, Snowflake internal stages offer a convenient way to temporarily store and manage the transformed data.
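The data export use case above pairs COPY INTO a stage with a GET from SnowSQL. Stage, path and table names below are hypothetical placeholders.

```sql
-- Unload a table into a named internal stage as gzipped CSV files
COPY INTO @my_stage/export/orders_
  FROM orders
  FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
  OVERWRITE = TRUE;

-- From SnowSQL: download the unloaded files to the local machine
GET @my_stage/export/ file:///tmp/exports/;
```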

Snowflake Internal Stage Features

Leveraging Snowflake’s internal stages for data loading provides numerous advantages, making the process more efficient, secure, and seamlessly integrated. Here are some of the primary benefits:

  • Optimized Data Handling: Snowflake’s architecture enhances data loading performance and scalability through its internal stages. Leveraging distributed computing and parallel processing, Snowflake efficiently handles large-scale data loading operations. It enables both – rapid data loading into tables by processing multiple data files in parallel, and swift unloading by writing multiple data files concurrently, significantly boosting performance.
  • Enhanced Security: Data stored on Snowflake’s internal stages enjoys the utmost security. Employ Snowflake’s built-in security features to control access, permissions, and encryption, ensuring your data’s confidentiality and integrity.
  • Cost-Effective Data Staging: Snowflake’s internal stages remove the need for third-party cloud storage services for data staging before loading into Snowflake, potentially resulting in cost savings, particularly for organizations dealing with substantial data volumes. How to reduce Snowflake costs
  • Data Format Flexibility: Internal stages support an array of data formats like CSV, JSON, Parquet, Avro, and more. This flexibility enables seamless handling of various data types and formats during the loading process. How to choose between Parquet, ORC and Avro
  • Version Control and Historical Data Management: Use Snowflake’s internal stages to manage different versions of data files, especially useful when maintaining historical records or tracking changes over time.
  • Reliable Error Management and Recovery: Internal Snowflake stages offer robust error-handling and recovery mechanisms, simplifying the identification and resolution of issues during data loading, ensuring a more reliable process.
  • Seamless Integration: Snowflake’s internal stages smoothly integrate with other Snowflake features, including data warehousing capabilities and SQL querying. This integration streamlines data transformations, analysis, and reporting post-data loading. ELT for Snowflake with BryteFlow
  • Cross-Region Data Loading: Snowflake’s internal stages load data from diverse geographical regions, enabling data loading into Snowflake from various locations while maintaining peak performance. How to load terabytes of data to Snowflake fast
  • Convenience in Data Uploading: Uploading data to internal stages is simple and user-friendly, whether through Snowflake’s UI, SnowSQL command-line tool, or API integrations. This convenience simplifies the process of transferring data from external sources to Snowflake.
  • Managed Storage: Snowflake internal stages offer a managed storage solution within Snowflake, eliminating the need for you to manage or provision external storage resources, streamlining the overall data loading process.

Differences between a Snowflake External Stage and a Snowflake Internal Stage

Below is a table presenting the primary distinctions between Snowflake internal stages and Snowflake’s external stages.

| Feature | Snowflake External Stages | Snowflake Internal Stages |
| --- | --- | --- |
| Table data access method | Accessed via a Snowflake external stage | Accessed using standard SQL statements |
| Data storage location | Outside Snowflake, in an external storage system (e.g., S3, Azure Blob Storage, GCS) | Inside Snowflake, within Snowflake’s internal storage system |
| Read/Write operations | Read-only by default; new data can be loaded using Snowpipe | Supports both read and write operations |
| Table CREATE statement | CREATE EXTERNAL TABLE | CREATE TABLE |
| Data loading | Data is accessed directly from the external storage system | Data is accessed from Snowflake’s internal storage |
| Data ownership | Owned and managed by the external storage system | Owned and managed by Snowflake |
| Use cases | Storing data that is accessed less frequently or kept for backup | Staging data that is actively loaded and updated |
| Auto-archiving features | Auto-archiving can be set up easily | Need to build your own stored procedure |
| Data load to Snowflake table | Snowpipe auto-ingest or a third-party tool like BryteFlow Ingest | COPY command or REST API calls |
| Data storage cost | Cheaper; ideal for backup or later use | More expensive; files should normally be deleted after loading |
| Encryption and compression | Done by the user | Managed entirely by Snowflake |
| Viewing data | User interface available to create folders and browse files | Must use SnowSQL or LIST commands |

Snowflake Stages – Best Practices

Some of the best practices to manage Snowflake stages are as follows:

  • Establish a dedicated stage for every external data source to facilitate seamless access and efficient transfer of data to the Snowflake table, ensuring user convenience. Postgres to Snowflake : 2 Easy Methods of Migration
  • If you have a requirement to load a file into multiple tables, utilizing the User Stage is the ideal approach.
  • If your task involves loading data into a single table only, opt for the Table Stage.
  • Utilize Snowflake’s table staging procedure, enabling users to load data into a temporary table before transferring it to the primary Snowflake table. This ensures a seamless data loading process, minimizing potential issues during transfer.
  • When dealing with large files, it is advisable to employ an External Named Stage.
  • Make use of the Snowflake COPY INTO stage table method to maintain control over the data transfer process and safeguard your data against unauthorized access.
  • The Internal Named Stages are accessible to multiple users and can be employed for loading data into multiple tables. If your goal is to share files with different users and load them into multiple tables, then Internal Named Stage is the recommended choice.
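The COPY INTO best practice above can be made safer still by validating files before the actual load and purging staged files afterwards, since internal storage is billed. Table, stage and format names here are hypothetical.

```sql
-- Dry run: report file problems without loading any rows
COPY INTO customers FROM @my_stage
  FILE_FORMAT = (FORMAT_NAME = my_csv_format)
  VALIDATION_MODE = RETURN_ERRORS;

-- Real load: abort on error, delete staged files after a successful load
COPY INTO customers FROM @my_stage
  FILE_FORMAT = (FORMAT_NAME = my_csv_format)
  ON_ERROR = ABORT_STATEMENT
  PURGE = TRUE;
```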

BryteFlow, our No-Code CDC Replication Tool

BryteFlow is a data replication tool that automates the real-time data replication process from numerous sources. It has a user-friendly interface and doesn’t require coding for any task including data extraction, merging, masking, or type 2 history. It does not need external integration with tools like Apache Hudi. Data is consumption-ready and made available at the destination almost immediately. BryteFlow’s CDC to Snowflake

BryteFlow Ingest ensures automatic merging of new change files with existing data using log-based Change Data Capture, keeping your data synced with source without impacting source systems. The tool provides high throughput and time-series data for easy point-in-time analytics. It also offers options to maintain the complete transaction history and supports automated data archiving for historical and predictive trend analysis. Additionally, BryteFlow automates schema and table creation on the destination, eliminating the need for time-consuming data preparation tasks. How BryteFlow works

How BryteFlow uses Snowflake stages in replicating data to Snowflake on various Clouds

Direct and Indirect Loading to Snowflake on AWS: 3 Staging Methods

Directly: using the Snowflake internal stage to replicate to Snowflake on AWS

BryteFlow Ingest facilitates direct data transfer to the Snowflake data warehouse on AWS. It begins by extracting data from your source, loading it into the Snowflake internal stage, and subsequently transferring it to the actual tables on Snowflake. This approach ensures that your source data is readily available for use, whether you need it with or without Type 2 SCD history (BryteFlow provides an option to retain Type 2 SCD history). Build a Data Lake on Snowflake

Indirectly after data transformation using the Snowflake external stage on Amazon S3

BryteFlow enables you to prepare and transform your data on Amazon S3 before delivering it to your Snowflake database for analytical purposes. The process involves BryteFlow Ingest extracting data from the source and loading it into the Amazon S3 external stage. Subsequently, BryteFlow Blend takes over and transforms the data into a consumable format, such as Parquet, ORC, or Avro. The transformed data is then loaded directly into the Snowflake cloud data warehouse. BryteFlow Blend offers a user-friendly drag-and-drop interface and employs straightforward SQL for data transformation, including real-time changes. ELT for Snowflake with BryteFlow

Indirectly using the Snowflake external stage on S3

BryteFlow Ingest initiates the data transfer by loading it into Amazon S3 before moving the data to Snowflake. It starts by extracting data from the source and loading it into the S3 external stage. BryteFlow software provides log-based CDC for efficiently managing transactional data within the S3 data lake. It creates a variety of files for new record inserts, updates, and deletions. Utilizing an optimized in-memory engine with Amazon EMR, BryteFlow continuously merges new change files with existing data in the S3 bucket, ensuring that your data remains up-to-date. Subsequently, the data is pushed from the S3 data lake to the data warehouse on Snowflake for querying purposes.

Snowflake on Azure: Direct Loading using the Snowflake Internal Stage

BryteFlow Ingest transfers data directly to the internal stage in Snowflake on Azure. It extracts data from the source systems, loads it into the Snowflake internal stage, and then transfers the data to the actual tables on Snowflake in Azure. You have the option to retain SCD Type 2 history.

Snowflake on GCP: Direct Loading using the Snowflake Internal Stage

BryteFlow Ingest loads data directly to the internal stage in Snowflake on GCP (Google Cloud Platform). It replicates data from the source systems, loads it into the Snowflake internal stage, and then moves the data to the actual tables in Snowflake on GCP. An option is provided to retain SCD Type 2 history if required.

Real-time CDC to Snowflake with BryteFlow: Highlights

  • BryteFlow moves data to Snowflake on AWS, Azure and GCP from legacy databases like Oracle, SQL Server, SAP, Postgres, SAP HANA, Teradata, Netezza and more, and from applications like Salesforce etc. in real-time using enterprise-level, log-based CDC.
  • BryteFlow is a completely no-code tool that automates data replication, CDC, data extraction, data type conversion, data mapping, SCD Type 2 history, DDL (Data Definition Language) and more.
  • It creates the schemas automatically on Snowflake with the optimal data types and implements replication using Snowflake best practices. Source to Target Mapping Guide (What, Why, How)
  • For very heavy datasets, BryteFlow XL Ingest implements the first full refresh of data with parallel, multi-threaded loading and configurable, smart partitioning, delivering petabytes of data in minutes. BryteFlow Ingest handles incremental data, merging changes automatically with initial data. Oracle to Snowflake: Everything You Need to Know
  • BryteFlow has an intuitive, user-friendly interface that can be used easily by business users.
  • It has a very high throughput of approx. 1,000,000 rows in 30 seconds (6x faster than GoldenGate).  About GoldenGate CDC
  • It delivers analytics-ready data on Snowflake that can be queried immediately with BI tools of choice.
  • BryteFlow has automated network catch-up and resumes operations automatically from the point where it stopped, when normal conditions are restored.

Conclusion

This article has introduced you to the different Snowflake stages to help you decide which one to adopt. Internal stages enable efficient data processing within Snowflake, while external stages offer flexibility and cost-effectiveness for handling large datasets stored in external cloud storage.

By combining both, Snowflake internal and external stages, you can optimize data processing, storage, and analysis, and create secure and efficient data management workflows. Snowflake’s robust security features ensure data can be safely stored and accessed in both types of stages, ensuring compliance with data governance and privacy regulations. Take a firsthand look at BryteFlow’s CDC technology for Snowflake. Get in touch with us for a Demo.