Build a hybrid data lake on the AWS Cloud with WANdisco Fusion and AWS services
The Quick Start provides the option to deploy a Docker container that stands in for your on-premises Hadoop cluster for demonstration purposes, so you can gain hands-on experience with the hybrid data lake architecture. WANdisco Fusion continuously replicates data from the Docker container to Amazon S3, ensuring strong consistency between data residing on premises and data in the cloud. You can then use Amazon Athena to view and analyze the replicated data.
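As a rough illustration of the Athena step, the sketch below builds a CREATE EXTERNAL TABLE statement that points Athena at an S3 prefix holding replicated files. It is not taken from the Quick Start: the function name, database, table, column list, bucket name, and the assumption that the replicated files are comma-delimited text are all hypothetical placeholders.

```python
def build_athena_ddl(database, table, columns, s3_location):
    """Return Athena DDL for an external table over an S3 prefix.

    All argument values used with this helper are illustrative; the
    Quick Start does not prescribe these names or this schema.
    """
    if not s3_location.startswith("s3://"):
        raise ValueError("s3_location must start with 's3://'")
    # Athena table locations are prefixes (folders), so end with a slash.
    if not s3_location.endswith("/"):
        s3_location += "/"
    column_sql = ",\n".join(
        f"  `{name}` {sql_type}" for name, sql_type in columns
    )
    # Assumes comma-delimited text files; adjust the row format to
    # match whatever format is actually replicated.
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{column_sql}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{s3_location}'"
    )


if __name__ == "__main__":
    # Hypothetical database, table, columns, and bucket.
    print(
        build_athena_ddl(
            "demo_db",
            "trips",
            [("id", "int"), ("city", "string")],
            "s3://example-bucket/replicated",
        )
    )
```

The resulting statement can be pasted into the Athena query editor or submitted programmatically; once the table exists, ordinary SELECT queries run directly against the files in Amazon S3.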
You can also customize the Quick Start to enable a disaster recovery scenario for your on-premises Hadoop cluster by provisioning an Amazon EMR cluster that references the data replicated into Amazon S3.
AWS CloudFormation templates automate the deployment and provide customization options for network resources, WANdisco Fusion, and AWS services. You can choose to build a new virtual private cloud (VPC) infrastructure that’s configured for security, scalability, and high availability, or use your existing VPC infrastructure for the hybrid data lake.
Read the entire article here: Build a hybrid data lake on the AWS Cloud with WANdisco Fusion and AWS services.
Via the fine folks at Amazon Web Services.