Yesterday’s data is predictable and easy. Our challenge as data engineers is getting ready for tomorrow’s unpredictable new data. With the push toward more flexible solutions, let’s discuss how you can set up your data engineering team for success with a flexible, scalable, and future-proof data architecture.
DATA IS BIG AND GROWING IN ALL DIRECTIONS. SCALE ISN’T THE PROBLEM. PREDICTABILITY IS.
Today’s businesses need to prepare to leverage new data to inform data-driven decisions. Businesses can’t rely only on old varieties of data to make decisions. They need access to advanced analytics run against new varieties of data such as computer logs, IoT readings, and clickstream data. This blog post explores how new and changing sources of data are causing yesterday’s analytics database structures and foundations to strain under the pressure of today’s new data.
Businesses lust for true data-driven decision making. The days of gut calls based on a hunch are long gone. Boardroom decisions are filled with charts and visualizations of analytical results. Today’s business world understands how to process familiar data in order to make a decision. But what happens to data infrastructure, and thus to the analytics team, when real change comes in the form of new and unfamiliar data? Can traditional data warehouses handle new types of data? Will today’s data revolution push the limits of what was expected when the database structure was designed?
PRACTICAL EXAMPLE: A TRUCKING COMPANY GOES GREEN
As an example, how does a trucking company’s data warehouse respond when the company adds electric vehicles to its existing fleet? Currently, routes are optimized based on gas price analytics; the company’s old data warehouse drove decisions based on miles per gallon. Now the warehouse has to be configured to process a new dimension of information. The company needs to start loading and processing new data so new analytics can be run to optimize the fleet around the electric vehicles. The existing database needs to load and process highly variable electricity costs, account for tax incentives, ingest readings from new maintenance sensors that report energy usage, and so on. But the current data warehouse isn’t built for this. There is no place for this new data to be accepted into the warehouse, and thus it cannot be processed for analytics. Without the right foundation for decision making, the company can’t leverage all this new data.
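To make the schema problem concrete, here is a minimal sketch of a rigid, fixed-schema loader of the kind the trucking example describes. All table and column names here are invented for illustration, not from any real warehouse:

```python
# Hypothetical sketch: a loader that only accepts records matching a
# fixed warehouse schema. Column names are illustrative assumptions.
EXPECTED_COLUMNS = {"truck_id", "route_id", "miles", "gallons", "gas_price"}

def load_trip_record(record: dict) -> dict:
    """Accept a trip record only if it matches the warehouse schema exactly."""
    if set(record) != EXPECTED_COLUMNS:
        unknown = sorted(set(record) - EXPECTED_COLUMNS)
        raise ValueError(f"schema mismatch, unexpected fields: {unknown}")
    return record

# A familiar diesel trip loads fine...
diesel_trip = {"truck_id": 1, "route_id": 7, "miles": 120.0,
               "gallons": 15.0, "gas_price": 3.89}
load_trip_record(diesel_trip)

# ...but the first EV trip, measured in kWh and electricity price rather
# than gallons and gas price, is rejected outright: there is simply no
# place in the schema for the new fields.
ev_trip = {"truck_id": 2, "route_id": 7, "miles": 120.0,
           "kwh": 40.0, "electricity_price": 0.14}
try:
    load_trip_record(ev_trip)
except ValueError as err:
    print(err)
```

The point isn’t that the code is hard to change; it’s that every such change ripples through loading, storage, and downstream analytics before the new data can drive a single decision.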
DATA WAREHOUSES CAN’T PIVOT QUICKLY OR EASILY
Traditional, predictable data-driven decision making was straightforward: a team prepared a data warehouse as the foundation of its data architecture, and warehouses are well-suited to predictable quantities of familiar data. However, data variety has become unpredictable. New types of data are constantly being introduced, and traditional types of data are changing shape. Data warehouses aren’t fluid. Warehouses are built to specifications for a known future, and they can’t pivot.
Warehouses can’t pivot.
BUSINESSES LEVERAGING YESTERDAY’S CLEAN DATA ARE ALREADY LOSING
Sensors are ubiquitous and are throwing off so much information that storage is merely a question of cost. Predictable analytical processes are often run on a single type of information and look for anomalies or outliers. These alerts trigger decisions, such as a recommendation for early maintenance or some general digital cry for human attention. These are not algorithms that drive business decisions; they request decisions from human oversight. Sometimes these processes can find new outliers, and the real challenge becomes deciding what buzzword to use. Should the team continue to label “if-then” statements as “machine learning”? Or should they stretch to reach the golden PowerPoint ring and say they’re now using “artificial intelligence”?
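For clarity, the kind of one-dimensional “if-then” alert described above really is this simple. The threshold and readings below are invented for illustration:

```python
# Hypothetical sketch of a single-signal anomaly check: a fixed threshold
# on one sensor reading, flagging outliers for human attention.
TEMP_THRESHOLD_F = 230.0  # engine temperature considered anomalous (assumed)

def check_engine_temps(readings):
    """Return the indices of readings that should trigger a maintenance alert."""
    return [i for i, temp in enumerate(readings) if temp > TEMP_THRESHOLD_F]

readings = [195.0, 210.5, 241.2, 198.7, 260.0]
alerts = check_engine_temps(readings)
for i in alerts:
    print(f"reading {i}: {readings[i]} F, recommend early maintenance")
```

Call it machine learning in the deck if you must, but note that the human operator still makes the actual decision.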
“Iceberg, dead ahead.”
These sorts of one- and two-dimensional analyses are usually impacted only by increases in volume of previously known types of data, and the best way to prepare for that change is to ask “but does it scale?” of your analytics infrastructure. As long as nothing is truly changing in your data, the vast majority of business decisions can still be made relying on a few join clauses and simple regression. Size may slow things down, but that’s about it. This is the comfortable process enjoyed by data warehouses whose foundations are built upon predictable data. However, as the data becomes more diverse and unpredictable, the rigid data warehouse construct becomes a limiting cage for the analytics team’s ability to use the new data… iceberg, dead ahead.
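“A few join clauses and simple regression” can be sketched in a dozen lines. The tables and numbers below are made up to continue the trucking illustration; the point is how little machinery familiar data requires:

```python
import numpy as np
import pandas as pd

# Two familiar, predictable tables (illustrative data, not real figures).
trips = pd.DataFrame({"route_id": [1, 1, 2, 2],
                      "miles":    [100, 120, 200, 220],
                      "gallons":  [12, 15, 26, 29]})
prices = pd.DataFrame({"route_id": [1, 2],
                       "gas_price": [3.50, 3.90]})

# One join clause...
df = trips.merge(prices, on="route_id")
df["fuel_cost"] = df["gallons"] * df["gas_price"]

# ...and a simple linear regression: fuel cost as a function of miles.
slope, intercept = np.polyfit(df["miles"], df["fuel_cost"], 1)
print(f"estimated fuel cost per mile: ${slope:.2f}")
```

This works beautifully right up until a trip record arrives with kilowatt-hours instead of gallons, at which point the join itself has nothing to join against.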
TODAY’S NEW DATA MUST BE CONSIDERED IN YESTERDAY’S ANALYSES
Data analysts and business intelligence teams are quick to produce beautiful graphics and interpretations of yesterday’s comfortable data. This is data that has been used before, and it has been checked and cleaned. There won’t be loading issues, and there won’t be surprises with this data. In fact, the business insight in this data is often already widely known or understood by the decision makers in the company. The analytics team is merely being asked to produce a chart to support a hypothesis that has already been made obvious, just perhaps not officially proven. This approach is fine as long as there isn’t any new data that should be considered to answer the business’s question.
…but the data is better over here.
This phenomenon of repeatedly relying on yesterday’s clean and easy data, without fully leveraging the data that is available, is a technological echo of the classic 1920s story of the man looking for his lost keys. When asked where he lost them, he responded, “over there, but the light is better over here.” The business must embrace all the available data that is relevant to a decision rather than just that which is easy.
For true data-driven insight, the data needs to drive the decision making, and the only way to enable this is through a future-proof data architecture. That means new information pushes a business to make new decisions. Today’s industries are generating massive quantities of data. The icebergs lying in wait are found in the way that this data is changing. It is not in a predictable format. It does not always behave like it did yesterday. New engines are being added to trucks, new types of sensors are being developed, and entirely new dimensions are being measured. This new data forces the data engineering team to pause its current work and spend an indeterminate amount of time learning about the new data: how to store it and how to prepare it for analytics. Rebuilding the data warehouse each time is repetitive and can grind the business insight process to a halt. To let new data lead a business to rapid and timely decisions, the business needs to revolutionize the way it gets its hands around this valuable data in order to effectively drive insight.
DATA ENGINEERING PLATFORMS ARE THE ANSWER
A data engineering platform allows the user – the business – to accommodate today’s dynamic data, and it provides a future-proof foundation for an organization’s flexible data architecture. A cloud-based data engineering platform can scale to accept all sorts of new data, even as data velocity increases and scale changes.
At Silectis, we are positioned to meet this need with Magpie, our data engineering platform. We have built a data management tool that embraces this trend of data growth in every direction. Magpie welcomes new types of data into a business’s environment, processing and assessing the new data as it enters the business’s analytics space. Magpie allows data teams to start seeing value quickly. With Magpie, analysts can easily uncover new trends by leveraging new data. Users can build upon the work that has been done by other teams, rather than repeat old engineering work over and over in search of a small refinement or correction.
This serves the business by allowing the entire organization to embrace new data as quickly as possible with minimal effort. Businesses need new data to make tomorrow’s decisions; the old data was used to make yesterday’s.
Brendan Freehart is a Data Engineer at Silectis.
You can find him on LinkedIn.