Data warehouses are valuable resources for business intelligence. Unlike Excel and similar tools, which are mostly used for ad hoc reporting, a data warehouse lets you capture significantly more data and process it for many different uses. From powering applications to driving the reports and charts used on Wall Street, databases are the backbone of our technology infrastructure.
Like most of the technology industry, data warehousing has gone through a major shift. Cloud data warehouses are replacing the older on-premises systems. They’re easier to manage and upgrade, and several large SaaS warehouses from major providers have emerged. Businesses that need to analyze terabytes or petabytes of structured data are finding that a cloud service does the job best.
WHAT IS CLOUD DATA WAREHOUSING?
Any data warehouse has certain characteristics. It’s very large, generally in the terabyte range or higher. It holds structured data, much like a regular database. It’s optimized for batch updates and queries on vast amounts of data, using massively parallel processing (MPP). The data sources include information streams, business databases, and structured documents.
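To make the MPP idea more concrete, here is a minimal sketch of the scatter-and-gather pattern behind it: the data is split into slices, each worker aggregates its own slice, and the partial results are merged. The function names and the sample sales rows are made up for illustration, and Python threads stand in for the separate nodes a real warehouse would use.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n_workers):
    """Split rows into n_workers round-robin slices, one per worker."""
    return [rows[i::n_workers] for i in range(n_workers)]

def local_sum(rows):
    """Each worker totals sales per region for its own slice only."""
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    return totals

def merge(partials):
    """Combine the per-worker partial totals into one final result."""
    totals = {}
    for partial in partials:
        for region, amount in partial.items():
            totals[region] = totals.get(region, 0) + amount
    return totals

def parallel_sum_by_region(rows, n_workers=4):
    # Threads are a stand-in here; an MPP warehouse spreads slices across nodes.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(local_sum, partition(rows, n_workers)))
    return merge(partials)

# Hypothetical sample data: (region, sale amount)
sales = [("east", 100), ("west", 250), ("east", 50), ("north", 75), ("west", 25)]
totals = parallel_sum_by_region(sales, n_workers=3)
print(totals)
```

The answer is the same no matter how many workers share the load, which is why this style of processing scales by adding more of them.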
Unlike some other forms of big data, the contents of a data warehouse are validated and stored according to a schema. The two major approaches to data warehouse design are the dimensional and the normalized. Dimensional data warehouses organize data as “facts” such as sales figures and “dimensions” which identify their context, such as the product sold or the date of the sale. Normalized data warehouses split data into related tables that minimize redundancy, much like an ordinary relational database.
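A small example may help picture the dimensional approach. The sketch below uses Python's built-in sqlite3 module as a stand-in for a warehouse and builds one fact table with two dimension tables. The table names, column names, and sales figures are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions describe the context of each sale.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT, quarter TEXT);

-- The fact table holds the measurements and points at the dimensions.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    amount REAL
);

INSERT INTO dim_product VALUES
    (1, 'Widget', 'Hardware'), (2, 'Gizmo', 'Hardware'), (3, 'Manual', 'Books');
INSERT INTO dim_date VALUES (1, '2020-01-15', 'Q1'), (2, '2020-04-20', 'Q2');
INSERT INTO fact_sales VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 1, 20.0), (1, 2, 70.0);
""")

# A typical analytic query: total the facts, grouped by dimension attributes.
rows = conn.execute("""
    SELECT p.category, d.quarter, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date d ON d.date_id = f.date_id
    GROUP BY p.category, d.quarter
    ORDER BY p.category, d.quarter
""").fetchall()

for category, quarter, total in rows:
    print(category, quarter, total)
```

The query reads the way analysts tend to ask questions: sum a fact, then slice it by one or more dimensions.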
Traditional data warehousing is based on appliances which consist of a number of parallel modules. Management can add modules as the amount of data grows. Cloud data warehousing provides the equivalent functionality on cloud servers. To expand, system managers only need to upgrade their plan. They don’t have to make a large up-front investment in hardware.
A cloud data warehouse can be integrated easily with other cloud services for ingesting and analyzing data. Getting an on-premises device to work well with data sources and BI tools is often a major task.
THREE MAJOR PLAYERS
Amazon Redshift is a leading competitor and part of the AWS family. It’s based on PostgreSQL and designed for large analytic workloads. It claims to be “the world’s fastest cloud data warehouse” and to offer “virtually unlimited concurrency.” The pricing scheme supports small organizations as well as Fortune 500 enterprises. It uses predictive techniques to optimize performance and caching to improve response times. A rich selection of AWS integrations is available. Because Redshift is an AWS product, it is hosted by AWS and tends to work best with an AWS tech stack. Be prepared to go all in on Amazon products if your team plans to build on Redshift.
Snowflake is a Silicon Valley data warehousing company whose service runs on AWS, Google Cloud and Microsoft Azure. Snowflake separates the compute layer from storage, allowing users to tailor their database to specific needs. The platform is built on “virtual warehouses,” which are fully independent clusters of computing resources. Users can add compute power when they need performance and shut it down when systems are not in use. A great example of this is reporting for the stock market. The system is queried by millions of users throughout the working day and scales up to handle the load, then scales down at night when traffic drops, saving the provider money. This separation of storage and compute simplifies scaling and is one of many reasons the company recently received a $12 billion valuation.
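The savings from shutting down idle compute are easy to see with a toy model. The sketch below is not Snowflake's billing logic. It simply assumes a hypothetical warehouse that is billed per hour of running compute, and compares leaving it on all day with suspending it during hours that have no queries. The load figures are invented.

```python
def compute_hours(hourly_queries, suspend_when_idle):
    """Count billable compute hours for one day.

    hourly_queries: query counts, one entry per hour of the day.
    suspend_when_idle: if True, hours with zero queries are not billed.
    """
    if suspend_when_idle:
        return sum(1 for queries in hourly_queries if queries > 0)
    return len(hourly_queries)

# Hypothetical load: idle overnight, busy from 9:00 to 17:00, idle again after.
load = [0] * 9 + [1000] * 8 + [0] * 7

always_on = compute_hours(load, suspend_when_idle=False)
elastic = compute_hours(load, suspend_when_idle=True)
print(always_on, elastic)
```

Because storage is kept separate from compute in this design, the data stays available while the compute side is switched off.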
Google BigQuery takes a different approach. Google calls the service “serverless,” meaning that Google provisions and manages the underlying infrastructure, so there are no servers or clusters for you to size or maintain. Setup is very simple, with little to configure. It’s built on Google’s Dremel query system. Queries are issued in a form of SQL, and the REST interface returns results in JSON format. Access can be through a REST interface, a client library, or a command-line interface. BigQuery includes a BI engine and machine learning, making it well suited for analytics. It’s available on the Google Cloud Platform.
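Since results come back as JSON, a little reshaping is usually needed before they are convenient to work with. The sketch below parses a hand-written sample payload rather than calling the service. Its shape (a schema with named fields, and rows made of "f" and "v" cells with values sent as strings) is modeled on the BigQuery REST query response as an assumption, so check it against the current API reference. The helper rows_to_dicts and the CONVERTERS table are hypothetical names.

```python
import json

# Hand-written sample payload, not real query output.
sample_response = json.loads("""
{
  "jobComplete": true,
  "schema": {"fields": [
    {"name": "region", "type": "STRING"},
    {"name": "total_sales", "type": "INTEGER"}
  ]},
  "rows": [
    {"f": [{"v": "east"}, {"v": "150"}]},
    {"f": [{"v": "west"}, {"v": "275"}]}
  ]
}
""")

# Map a few column types to Python converters; anything else stays a string.
CONVERTERS = {"INTEGER": int, "FLOAT": float, "STRING": str}

def rows_to_dicts(response):
    """Turn schema + positional cells into a list of {column: value} dicts."""
    fields = response["schema"]["fields"]
    records = []
    for row in response.get("rows", []):
        record = {}
        for field, cell in zip(fields, row["f"]):
            convert = CONVERTERS.get(field["type"], str)
            value = cell["v"]
            record[field["name"]] = None if value is None else convert(value)
        records.append(record)
    return records

records = rows_to_dicts(sample_response)
print(records)
```

In practice a client library handles this conversion for you, which is one reason to prefer it over raw REST calls.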
Cloud data warehousing increases the value of the information a business has. It brings everything together to provide analysis and insights for better business decisions. The scalability and integrations it offers have led to a massive shift away from on-premises warehouses.
Choosing the best cloud data warehouse for your business requires considering the existing services you use, the types of analysis you want to do, and the cost. At Helios, we help businesses turn ideas into actions by helping you select the best data warehouse for your application and fully managing your data warehouse deployment. Connect with our team to learn more!