Microsoft: Cloudera now supports Azure Data Lake Store
With the release of Cloudera Enterprise Data Hub 5.11, you can now run Spark, Hive, and MapReduce workloads in a Cloudera cluster on Azure Data Lake Store (ADLS). Running on ADLS has the following benefits:
- Grow or shrink a cluster independent of the size of the data.
- Data persists independently as you spin up or tear down a cluster. Other clusters and compute engines, such as Azure Data Lake Analytics or Azure SQL Data Warehouse, can run workloads on the same data.
- Enable role-based access controls integrated with Azure Active Directory and authorize users and groups with fine-grained POSIX-based ACLs.
- Cloud HDFS with performance optimized for analytics workloads, supporting concurrent reads and writes of hundreds of terabytes of data.
- No limits on account size or individual file size.
- Data is encrypted at rest by default using service-managed or customer-managed keys in Azure Key Vault, and is encrypted with SSL while in transit.
- High data durability at lower cost: Data Lake Store manages data replication and exposes it through an HDFS-compatible interface, so you do not have to replicate data both in HDFS and at the cloud storage infrastructure level.
To get started, you can use the Cloudera Enterprise Data Hub template or the Cloudera Director template on Azure Marketplace to create a Cloudera cluster. Once the cluster is up, use one or both of the following approaches to enable ADLS.
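As a rough illustration only (this is not one of the approaches the full article lists), the sketch below assembles a `hadoop fs -ls` command that passes ADLS OAuth2 settings as `-D` options. It assumes the `dfs.adls.oauth2.*` property names used by the Hadoop ADLS connector of that era; newer Hadoop releases may name these keys differently, so check your distribution's documentation. The store name, client ID, credential, and tenant placeholder are hypothetical values, and the helper functions are invented for this example.

```python
import shlex


def adls_uri(store_name, path="/"):
    """Build an adl:// URI for a Data Lake Store account (store_name is a placeholder)."""
    if not path.startswith("/"):
        path = "/" + path
    return f"adl://{store_name}.azuredatalakestore.net{path}"


def hadoop_ls_command(store_name, client_id, credential, refresh_url, path="/"):
    """Return the argument list for listing an ADLS path with per-command credentials.

    The property names are an assumption based on the Hadoop ADLS connector
    shipped around CDH 5.11; verify them for your version.
    """
    props = {
        "dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
        "dfs.adls.oauth2.client.id": client_id,
        "dfs.adls.oauth2.credential": credential,
        "dfs.adls.oauth2.refresh.url": refresh_url,
    }
    args = ["hadoop", "fs"]
    # Generic -D options go before the file system subcommand (-ls).
    args += [f"-D{key}={value}" for key, value in props.items()]
    args += ["-ls", adls_uri(store_name, path)]
    return args


if __name__ == "__main__":
    cmd = hadoop_ls_command(
        store_name="mystore",                      # hypothetical account name
        client_id="<client-id>",                   # hypothetical service principal ID
        credential="<client-secret>",              # hypothetical secret
        refresh_url="https://login.microsoftonline.com/<tenant-id>/oauth2/token",
    )
    # Print a shell-quoted command line for inspection; nothing is executed.
    print(shlex.join(cmd))
```

Passing credentials on the command line suits quick tests; for shared clusters, storing the same properties in cluster configuration avoids exposing secrets in shell history.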
Read the entire article here: Cloudera now supports Azure Data Lake Store
via the fine folks at Microsoft.