Chaos Sumo BETA FAQ – Smart Object Storage on Amazon S3
Q: What is Chaos Sumo?
Chaos Sumo is a Smart Object Storage service built on top of Amazon S3. With Chaos Sumo’s powerful discovery capabilities, visual data refinement studio, and RESTful API, organizations can quickly get more out of their cloud object storage. Users can easily explore, process, and prepare data for analytics in the cloud, and build intelligent data pipelines without moving files or replicating storage. With Chaos Sumo, Amazon S3 is streamlined and organized, and through a programmable API, data can quickly be transformed, coded, and queried on the fly. Chaos Sumo turns S3 into an intelligent data platform.
Q: Who can benefit from Chaos Sumo?
Chaos Sumo is a general-purpose Smart Object Storage service with built-in data preparation and analytics capabilities. Customers who want immediate insight into their S3 buckets, lower costs, or better performance and manageability of their data within S3 will get immediate value from Chaos Sumo. In addition, Chaos Sumo can augment services such as AWS EMR, AWS Athena, AWS Redshift, or AWS Elasticsearch for data insights and analytics. Several types of users benefit from the platform: DevOps engineers, web developers, data analysts, and data scientists.
Q: How does Chaos Sumo work?
Chaos Sumo is a serverless solution that creates an abstraction layer on S3. It uses S3 to store both customers’ data (with no change or movement) and Chaos Sumo’s internal metadata. Just as S3 is a cost-effective, scalable storage solution, Chaos Sumo was designed to scale in a distributed fashion through its use of the Scala/Akka framework and its unique Data Edging technology.
Q: What can I do with Chaos Sumo?
Chaos Sumo has been designed to add immediate value for anyone who uses Amazon S3. The service turbocharges S3 so that users can immediately understand what’s in their buckets, organize and view their data, and gain first-order insights via both relational and text search analytic queries across a multitude of data sources (e.g. CSV, LOG, JSON, TEXT). Data can also be easily prepared for use with higher-level visualization tools.
Q: What are some example use cases?
• Data discovery – With the growing amount of data exhaust piling up in S3, Chaos Sumo is a great way to quickly and easily discover what’s in your buckets, with high-level aggregate views of your data as well as the ability to look into specific files.
• CSV and log file management – Quickly draw insights from standard and malformed files. With many services outputting data in CSV or log formats, there is often value locked inside those files, but the files can be large, messy, and poorly organized. Chaos Sumo makes it easy to organize files into logical groupings and query across these new object groups.
• Detecting sources of risk – What are the shared behaviors of people trying to hack my website?
• Stream analytics and content recommendations – If users bought this type of gardening gloves, what other products might they be interested in?
• Identifying relationships – Which people on Stack Overflow have expertise in both Hadoop-related and Python-related technologies?
Q: How do I get started with Chaos Sumo?
Chaos Sumo is an always-on service with a web portal that you can access from a browser. Specify your AWS S3 public credentials (see the portal “how to” documentation) and start storing new data or discovering existing data to derive business value simply and directly on S3, without incurring the expensive overhead of other data platforms and transformation processes. To get started, log into the Chaos Sumo service on AWS using your S3 credentials.
Q: What are the service limits associated with Chaos Sumo during the Beta period?
Please click here to learn more about Chaos Sumo Beta service limits and guardrails.
Q: What skill set do I need to utilize Chaos Sumo?
Chaos Sumo was designed from the outset to bring data analytics to the masses. The overarching principle is to couple the simplicity of S3 with a performant data analytics platform while keeping it as easy to use as native S3. Chaos Sumo is S3 with simple API extensions for the data discovery, refinement, and queries that are traditionally done through complicated and expensive scaffolding. Even if you are not familiar with S3’s API, Chaos Sumo’s graphical user interface is all you need to perform any phase of analytics for business insights. The technology is suited to a broad range of users, including developers, DevOps engineers, IT admins, product managers, business analysts, and data scientists.
Q: Can I use Chaos Sumo’s console like I use the S3 console?
Yes. All S3 functionality is available within the Chaos Sumo console. You will not need to toggle back to S3 once in Chaos Sumo. In the Chaos Sumo console, you can:
- Create, delete and empty physical buckets.
- Upload and delete files.
- Create and delete folders.
- Get the size of files and download them.
Q: What new capabilities does the Chaos Sumo console offer over S3?
Chaos Sumo extends S3 capabilities in both the console and the API. You can discover, model, and query what’s in your buckets. For the first time, you will be able to quickly understand not only what files you have on S3, but also what’s in each file. You will then be able to easily group and link files of different types and analyze them, all without having to copy or move them.
Q: What functionality would I still need to do in the S3 console?
Bucket permissions and properties should be configured through the S3 console.
Q: What do you mean by discover / “Instant Insights”?
You can quickly discover the overall content distribution, structure, type, size, and history of the objects in your S3 buckets. This provides comprehensive analysis of, and visibility into, your S3 buckets and associated data. The process is completely non-intrusive and supports all data formats: CSV, LOG, JSON, Text, Image, Video. Chaos Sumo never touches or moves your data.
Q: Can I do anything with the data that’s been discovered?
Chaos Sumo turns your S3 buckets into queryable result sets. S3 bucket views are extended with customizable filters that we call “object groups”. Object groups can be further refined into result sets called “virtual buckets”, which can be queried or published to higher-level services for further analysis. All of this is performed in an intelligent data studio using an intuitive drag-and-drop interface.
Q: What is a Chaos Sumo object group?
Chaos Sumo object groups are customizable filters for viewing what’s in your buckets, enabling fine-grained object analysis. Object groups are also the entry point for data refining and modeling.
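Conceptually, an object group behaves like a saved filter over object keys. The sketch below is illustrative only: `ObjectGroup`, its fields, the wildcard syntax, and the sample keys are hypothetical and are not taken from the Chaos Sumo console or API.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class ObjectGroup:
    """Hypothetical model of an object group: a named filter over object keys."""
    name: str
    pattern: str  # shell-style wildcard (an assumption), e.g. "logs/*.log"

    def select(self, keys):
        # Keep only the keys that match the pattern; nothing is copied or moved.
        return [k for k in keys if fnmatchcase(k, self.pattern)]


# Made-up keys standing in for a bucket listing.
bucket_keys = [
    "logs/app-01.log",
    "logs/app-02.log",
    "exports/orders.csv",
    "images/banner.png",
]

web_logs = ObjectGroup(name="web-logs", pattern="logs/*.log")
selected = web_logs.select(bucket_keys)
print(selected)
```

The point of the sketch is that the group is only a view: the underlying objects stay where they are, and the filter decides which of them take part in later refining and modeling.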
Q: What do you mean by object modeling?
Chaos Sumo object modeling is a deep data analysis of S3 content based on an object group filter. Once the analysis is complete, a comprehensive report of your data is displayed (e.g. data structure, trending analysis) that can be used to initiate the creation of virtual buckets.
Q: What are “Virtual Buckets” and what can you do with them?
Chaos Sumo is a service, but also an abstraction layer on top of S3. The S3 service has a concept called buckets, where one can put, get, and list information as in a file system. The Chaos Sumo service extends this bucket concept into a capability we call “virtual buckets”. After requesting that object data from an S3 bucket be grouped, the customer can drag and drop the groups into the refinery studio. From this studio, one can shape, filter, join, and order the groupings into result sets and place the results into a new type of bucket, which we call a “virtual bucket”. Once a virtual bucket has been created, one can perform queries through the Chaos Sumo graphical user interface, the API, or both.
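To make the shape, filter, join, and order steps concrete, here is a minimal sketch in plain Python. It is not the refinery studio or the Chaos Sumo API: the function `build_virtual_bucket`, the column names, and the CSV contents are all hypothetical, and the "virtual bucket" is modeled simply as an ordered list of rows.

```python
import csv
import io

# Two small in-memory CSV "objects" standing in for grouped files (made-up data).
ORDERS_CSV = "order_id,customer_id,total\n1,c1,30\n2,c2,75\n3,c1,120\n"
CUSTOMERS_CSV = "customer_id,name\nc1,Ada\nc2,Grace\n"


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def build_virtual_bucket(orders, customers, min_total):
    """Hypothetical refine step: filter, join, shape, and order rows into a result set."""
    names = {c["customer_id"]: c["name"] for c in customers}
    rows = [
        # shape: keep three columns and convert numeric fields
        {"order_id": int(o["order_id"]), "name": names[o["customer_id"]], "total": int(o["total"])}
        for o in orders
        if int(o["total"]) >= min_total      # filter on order total
        and o["customer_id"] in names        # join on customer_id
    ]
    return sorted(rows, key=lambda r: r["total"], reverse=True)  # order by total, descending


virtual_bucket = build_virtual_bucket(
    read_rows(ORDERS_CSV), read_rows(CUSTOMERS_CSV), min_total=50
)
for row in virtual_bucket:
    print(row)
```

The source CSV text is never modified; the result set is a derived view, which mirrors the idea that a virtual bucket holds refined results rather than copies of the files.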
Q: How does Chaos Sumo extend the S3 API?
Chaos Sumo exposes a fully S3-compatible RESTful API, with the underlying object storage (e.g. S3) as the backing store. The Chaos Sumo service can be seen as an abstraction or virtualization layer providing discovery, refinement, and query functionality directly within object storage. These extensions are in keeping with the look and feel of S3 and are easy to integrate into your application and/or service.
Q: What’s the difference between the API and the Chaos Sumo console and when should I use each?
The Chaos Sumo API does everything the console does but allows for much more querying and analysis capability. You do not need to use the API to discover, refine and perform basic data analysis. To perform advanced data analysis or to build an application upon Chaos Sumo you will need to use the API.
Q: What do I need to do to make Chaos Sumo work with S3?
Q: How do I query data and what queries does Chaos Sumo support?
Chaos Sumo offers analytic queries via REST-based semantics in keeping with the S3 API experience. To perform analytic queries, one first discovers S3 buckets, then refines their content, and ultimately creates “named” result sets that we call virtual buckets. This discover/refine process allows for the aggregation of data sources that can be shaped, joined, and ordered. Once a virtual bucket has been created, queries can be executed via the graphical interface and/or the S3 API analytic extensions. Chaos Sumo supports all standard relational and text search queries (see the Quick Start Guide, Features, and API documentation for the actual Beta feature set).
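As an illustration of the two query styles mentioned above, the sketch below runs one relational (aggregate) query and one text search over a small result set. It uses Python's built-in SQLite module purely as a stand-in: the table name, columns, and rows are made up, and the SQL shown is not Chaos Sumo query syntax.

```python
import sqlite3

# An in-memory table standing in for a virtual bucket's result set (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_log (path TEXT, status INTEGER, message TEXT)")
conn.executemany(
    "INSERT INTO access_log VALUES (?, ?, ?)",
    [
        ("/login", 200, "user signed in"),
        ("/login", 401, "invalid password"),
        ("/admin", 403, "access denied"),
        ("/login", 401, "invalid password"),
    ],
)

# Relational query: count requests per status code.
status_counts = conn.execute(
    "SELECT status, COUNT(*) FROM access_log GROUP BY status ORDER BY status"
).fetchall()

# Text search query: find rows whose message mentions "invalid".
invalid_paths = conn.execute(
    "SELECT path FROM access_log WHERE message LIKE ? ORDER BY rowid",
    ("%invalid%",),
).fetchall()

print(status_counts)
print(invalid_paths)
conn.close()
```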
Q: What do you mean by “Intelligent Data Platform”?
Chaos Sumo turns S3 into an Intelligent Data Platform that includes a programmable, S3-compliant API with discover, refine, and query extensions. You can code directly on S3 and build virtual data pipelines, transforms, and triggers. Data can be queried in Chaos Sumo through the studio interface, through the API, or using our command line interface. Data can also be prepared and packaged to be published to other services, such as Redshift or Athena for analytics, or Tableau or QuickSight for viewing.
Q: What do you mean by “code directly on S3”?
Chaos Sumo turns object storage into an application framework. The ability to not only store data, but organize, structure, transform, and ultimately query (all within S3) is metamorphic. With Chaos Sumo, developers can build web, mobile, or really any application around one smart object storage service.
Q: What do you mean by virtual data pipelines?
Chaos Sumo provides the ability to create data pipelines by configuring a virtual bucket to be “live”. When a virtual bucket is live, any update to a physical bucket that falls within an object group filter will update the virtual bucket’s result set.
Q: What data types can Chaos Sumo discover?
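The "live" behavior can be pictured as a filter applied to every new object event. The following is a conceptual sketch only: `LiveVirtualBucket`, `on_object_created`, and the wildcard filter are hypothetical names chosen for illustration, not part of the Chaos Sumo API.

```python
from fnmatch import fnmatchcase


class LiveVirtualBucket:
    """Hypothetical model: a result set kept in sync with an object group filter."""

    def __init__(self, pattern):
        self.pattern = pattern  # object group filter (shell-style wildcard, an assumption)
        self.keys = []          # current result set

    def on_object_created(self, key):
        # Called for every new object in the physical bucket; only keys
        # inside the object group filter change the result set.
        if fnmatchcase(key, self.pattern):
            self.keys.append(key)
            return True
        return False


live = LiveVirtualBucket("logs/*.log")
updates = [
    live.on_object_created(k)
    for k in ("logs/app-01.log", "exports/orders.csv", "logs/app-02.log")
]
print(updates)
print(live.keys)
```

In this model the CSV upload is ignored because it falls outside the filter, while both log files flow into the result set as they arrive.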
Chaos Sumo discovers just about all data types found in S3 (e.g. CSV, LOG, Text, XML/JSON, Images, Video).
Q: What data types can Chaos Sumo model?
Chaos Sumo is a service designed to model all data types. However, at the start of the Beta period, only CSV and LOG files are supported. This limitation will be removed at a later date during the Beta period. Known LOG types (i.e. schemas) are discovered automatically. For proprietary LOG files, the discovery user interface allows for manual configuration.
Q: Does Chaos Sumo support images and/or video data?
Chaos Sumo’s Data Edging technology can discover, refine, and query any type of data, including images and videos. During the Beta test period, however, only CSV, LOG, and other delimited file formats are discoverable.
Q: In which AWS regions is Chaos Sumo available?
During the Beta testing period Chaos Sumo is only mapped to AWS us-east-1.
Q: What is the difference between Chaos Sumo and Amazon’s EMR, Athena, Redshift, and Elasticsearch?
Chaos Sumo is fundamentally different from the services currently on the market. At its base, Chaos Sumo is “intelligent cloud storage”. This intelligence manifests itself in built-in services that are critical to successfully creating analytic solutions. Chaos Sumo offers three phases of data analytics: Discover, Refine, and Query. Amazon’s Athena, Redshift, and Elasticsearch offer query (but require much data wrangling and manipulation of S3 data prior to any query). Amazon’s EMR offers refine, but is extremely manual. None of these offer discovery. Without all three phases, data lakes quickly become data swamps.
Q: When should you use a full featured enterprise data warehouse instead?
Chaos Sumo is a cloud-first service. If you are using or planning to use object storage such as AWS S3, Chaos Sumo is a good fit for your data warehousing needs. However, if you have not yet moved to the cloud and/or use advanced features of a traditional data warehouse solution, Chaos Sumo might not be the right fit.
Q: Can I pump stream data into Chaos Sumo like Amazon’s Kinesis Firehose?
Chaos Sumo turns object storage into a real-time streaming service. During the Beta period, this capability is turned off; in the meantime, like AWS S3, Chaos Sumo can store a Kinesis stream. At General Availability this fall, Chaos Sumo can replace a Kinesis-type service, further reducing your overall big data architecture scaffolding.
Q: I have huge quantities of log data in S3, can I use Chaos Sumo to query it?
Chaos Sumo is designed for simplicity and scale via its distributed architecture. During the Beta test period, gigabyte-scale data sets are allowed (though we recommend smaller data sets for the best experience). At General Availability, terabytes and petabytes will be supported.
Q: Can I use Amazon Redshift, QuickSight, Tableau or Looker with Chaos Sumo?
Chaos Sumo supports the entire S3 ecosystem. Chaos Sumo is S3-compatible object storage and, as a result, visualization tools such as QuickSight, Tableau, and Looker work seamlessly.
Q: Is this like AWS Data Pipeline? What are the differences?
Chaos Sumo’s built-in, unified discover, refine, and query services form an internal data pipeline. The difference is that Chaos Sumo provides instant “virtual ETL”, which requires no data movement or wrangling prior to query. This design increases efficiency by decreasing time to results, and ultimately to business insights, while reducing complexity and cost.
Q: Is Chaos Sumo secure and durable?
Chaos Sumo is built on top of AWS, making your data highly available and durable. Amazon S3 provides durable infrastructure to store important data and is designed for durability of 99.999999999% of objects. Your data is redundantly stored across multiple facilities and multiple devices in each facility. Chaos Sumo uses S3 as its backing store with all the security and reliability you are familiar with. And since Chaos Sumo does not move your data, there is no change in its security.
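To put the eleven-nines durability figure in perspective, the short calculation below converts it into an expected loss rate. It assumes the figure is read as an annual per-object rate, and the count of 10,000,000 stored objects is an arbitrary example chosen for illustration.

```python
from fractions import Fraction

# Design durability quoted above: 99.999999999% of objects (assumed to be per year).
durability = Fraction(99_999_999_999, 100_000_000_000)
annual_loss_rate = 1 - durability  # expected fraction of objects lost per year

stored_objects = 10_000_000  # example size, chosen for illustration
expected_losses_per_year = stored_objects * annual_loss_rate
years_per_lost_object = 1 / expected_losses_per_year

print(annual_loss_rate)       # 1/100000000000
print(years_per_lost_object)  # 10000
```

Under these assumptions, storing ten million objects corresponds to an expected loss of about one object every 10,000 years.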
Q: Can I provide cross-account access to someone else’s S3 bucket?
Chaos Sumo uses S3 and its role-based access paradigm. Cross-account access is not supported during the Beta period. As Chaos Sumo moves from Beta to General Availability, we plan to add this capability with respect to data governance and automated policies.
Q: What does Chaos Sumo cost and how do I pay for it?
We haven’t determined pricing and packaging for the service yet. We’re looking for feedback during Beta so let us know what you think.