Executives and business leaders often ask about AWS data security for their Amazon S3 Data Lakes. Data is a valuable corporate asset and needs to be protected.
In this blog post we look at AWS Data Lake security best practices and how you can implement them using individual AWS services and BryteFlow to provide watertight security, so that your data remains safe. Read about BryteFlow for AWS ETL.
Ensure data resides in your Virtual Private Cloud (VPC)
A virtual private cloud (VPC) is a virtual network dedicated to your AWS account. This isolation separates your data from that of other companies, both in transit and inside the cloud provider's network, helping to create a more secure environment.
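One common way to keep Amazon S3 traffic inside your VPC is a bucket policy that denies any request not arriving through a specific VPC endpoint. The sketch below only builds that policy document. The bucket name and endpoint ID are hypothetical placeholders, and the approach assumes you already have an S3 VPC endpoint; `aws:SourceVpce` is the condition key AWS provides for this check.

```python
import json


def vpc_only_bucket_policy(bucket: str, vpc_endpoint_id: str) -> dict:
    """Build a bucket policy that denies S3 access from outside one VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAccessOutsideVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",      # the bucket itself
                    f"arn:aws:s3:::{bucket}/*",    # every object in it
                ],
                # Deny whenever the request did NOT come through this endpoint.
                "Condition": {
                    "StringNotEquals": {"aws:SourceVpce": vpc_endpoint_id}
                },
            }
        ],
    }


# Hypothetical bucket name and endpoint ID, for illustration only.
policy = vpc_only_bucket_policy("example-data-lake-raw", "vpce-0123456789abcdef0")
policy_json = json.dumps(policy, indent=2)
```

The resulting JSON could be attached with the S3 PutBucketPolicy operation. A deny like this also blocks access from outside the VPC, including the console, so it is worth trying on a non-production bucket first.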
Use Server-Side Encryption
Use Server-Side Encryption (SSE) to encrypt your data when you store it in your Amazon S3 Data Lake. Amazon S3 encrypts each object with a unique key. As an additional safeguard, it encrypts the key itself with a master key that it rotates regularly. Amazon S3 server-side encryption uses the 256-bit Advanced Encryption Standard (AES-256) to encrypt your data.
With BryteFlow, enabling SSE is as easy as ticking a checkbox.
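If you write to Amazon S3 directly rather than through BryteFlow, SSE can be requested per object. The following is a minimal sketch that only assembles the request arguments; the bucket name, object key and body are hypothetical, and the commented call assumes boto3 is installed and credentials are configured.

```python
def sse_s3_put_args(bucket: str, key: str, body: bytes) -> dict:
    """Arguments for an S3 PutObject request that asks for SSE with AES-256."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        # "AES256" selects Amazon S3-managed keys (SSE-S3).
        "ServerSideEncryption": "AES256",
    }


# Hypothetical values, for illustration only.
args = sse_s3_put_args("example-data-lake-raw", "sales/orders.csv", b"id,amount\n")

# With boto3 available, the request could be sent as:
#   import boto3
#   boto3.client("s3").put_object(**args)
```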
Use AWS Key Management Service or KMS
AWS KMS is a managed service that makes it easy for you to create and control the encryption keys used to encrypt your data when you transfer and store it. The customer master keys that you create in AWS KMS are protected by hardware security modules (HSMs).
With BryteFlow, enabling KMS is as easy as providing your KMS key; BryteFlow does the rest for you.
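Outside BryteFlow, one way to apply a KMS key to everything written to a bucket is default bucket encryption. The sketch below builds the configuration that the S3 PutBucketEncryption operation expects; the key ARN is a made-up placeholder, and the commented call assumes boto3 and suitable permissions.

```python
def kms_default_encryption(kms_key_arn: str) -> dict:
    """Default-encryption configuration that encrypts new objects with a KMS key."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                }
            }
        ]
    }


# Hypothetical key ARN, for illustration only.
config = kms_default_encryption(
    "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"
)

# With boto3 available, this could be applied as:
#   import boto3
#   boto3.client("s3").put_bucket_encryption(
#       Bucket="example-data-lake-raw",
#       ServerSideEncryptionConfiguration=config,
#   )
```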
Data masking and Tokenization
BryteFlow provides data masking and tokenization out of the box. Some data, such as financial or credit card data, can be very sensitive, so masking or tokenizing it is essential.
In BryteFlow, you mask a data element by ticking the checkbox next to it.
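To make the two terms concrete, here is a small generic illustration. It is not BryteFlow's implementation, and the function names are hypothetical. Masking hides most of a value while keeping its shape; the tokenization shown here is a one-way keyed hash (HMAC-SHA256), which is only one of several approaches and, unlike a token vault, cannot be reversed.

```python
import hashlib
import hmac


def mask_card_number(card_number: str, visible: int = 4) -> str:
    """Replace all but the last `visible` digits with '*', keeping separators."""
    digits_total = sum(ch.isdigit() for ch in card_number)
    to_mask = max(digits_total - visible, 0)
    out = []
    for ch in card_number:
        if ch.isdigit() and to_mask > 0:
            out.append("*")
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


def tokenize(value: str, secret_key: bytes) -> str:
    """Deterministic keyed token: the same value and key always give the same token."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


masked = mask_card_number("4111 1111 1111 1111")  # "**** **** **** 1111"
token = tokenize("4111 1111 1111 1111", b"demo-key-not-for-production")
```

Because the token is deterministic, the same card number maps to the same token across tables, so joins still work on tokenized columns.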
User access control
You can manage access to your Amazon S3 resources using access policy options. By default, all Amazon S3 resources (buckets, objects, and related sub-resources) are private: only the resource owner, the AWS account that created them, can access the resources. The resource owner can then grant access permissions to others by writing an access policy.
Permissions to access data assets can be tied to user roles and permissions for the data processing and analytics services and tools that your data lake users will use. User policies are associated with the AWS Identity and Access Management (IAM) service, which allows you to securely control access to AWS services and resources. With IAM, you can create IAM users, groups, and roles in accounts and then attach access policies to them that grant access to AWS resources, including Amazon S3.
BryteFlow Ingest keeps the ready-to-use raw data in one set of buckets, and the data transformed by BryteFlow Blend goes into a separate set of buckets, making it easy to give access as required. Only a select group should have access to raw data, and this can be managed very easily through AWS IAM.
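Keeping raw and transformed data in separate buckets means an IAM policy can simply name the bucket a group is allowed to read. The sketch below builds a read-only policy for a transformed-data bucket; the bucket name is a hypothetical placeholder, and the policy would still need to be attached to an IAM group or role.

```python
import json


def read_only_bucket_policy(bucket: str) -> dict:
    """IAM policy document allowing list and read access to a single bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


# Hypothetical bucket holding transformed data; the raw bucket is not
# mentioned, so analysts with only this policy cannot read raw data.
analyst_policy = read_only_bucket_policy("example-data-lake-transformed")
analyst_policy_json = json.dumps(analyst_policy, indent=2)
```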
In 2019, we also have AWS Lake Formation
AWS Lake Formation is an AWS service that is now generally available (GA) in certain regions. It helps you set up a secure data lake with little effort.
It plugs into AWS Big Data services, so you don't need to set up security for each one individually.
It takes the effort out of security management by providing a single place to centrally define security, governance, and auditing policies versus doing these tasks per service, and then enforcing those policies for users across analytics applications.
Superior access control: With S3 alone, the finest level at which you can control access is a single object or file. With AWS Lake Formation, you get fine-grained control down to individual columns.
BryteFlow provides an automated solution for building and maintaining a data lake and interfaces with Glue Catalog and AWS Lake Formation, making it easy to set up security across the AWS eco-system.
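As an illustration of column-level control, the sketch below builds the arguments for a Lake Formation GrantPermissions request that gives a principal SELECT on chosen columns of a catalog table. The role ARN, database, table and column names are all hypothetical, and the commented call assumes boto3 and that the table is already registered in the Glue Catalog.

```python
def column_select_grant(principal_arn: str, database: str, table: str, columns) -> dict:
    """Arguments granting SELECT on specific columns of one catalog table."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


# Hypothetical names: analysts may read order totals but not card numbers.
grant = column_select_grant(
    "arn:aws:iam::111122223333:role/ExampleAnalystRole",
    "example_sales_db",
    "orders",
    ["order_id", "order_date", "total_amount"],
)

# With boto3 available, this could be sent as:
#   import boto3
#   boto3.client("lakeformation").grant_permissions(**grant)
```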
Audit Access frequently
As in the following example, IAM policies can be further complemented with metadata solutions that combine tools such as AWS Lambda, Elasticsearch and Kibana. This gives clients real-time visibility and alerts across Amazon S3 data search and access activity. In the example below, a client can see the list of users that failed authentication by filtering on the agentname, agentid and ruledescription fields.
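The same kind of filter can be expressed in a few lines of code. This is a minimal sketch over in-memory records, not the Kibana setup itself. The field names agentname, agentid and ruledescription come from the example above, while the sample records, the function name and the matching phrase are assumptions made for illustration.

```python
def failed_auth_users(events, phrase: str = "authentication failed"):
    """Return sorted unique (agentname, agentid) pairs whose ruledescription mentions `phrase`."""
    matches = {
        (e["agentname"], e["agentid"])
        for e in events
        if phrase in e.get("ruledescription", "").lower()
    }
    return sorted(matches)


# Hypothetical records shaped like the fields named above.
events = [
    {"agentname": "host-a", "agentid": "001", "ruledescription": "Authentication failed for user"},
    {"agentname": "host-b", "agentid": "002", "ruledescription": "Login succeeded"},
    {"agentname": "host-a", "agentid": "001", "ruledescription": "authentication failed (repeat)"},
]

failed = failed_auth_users(events)
```

In a real deployment, the records would come from the access logs indexed in Elasticsearch rather than from a hard-coded list.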