To better understand data lakes, it is useful to think about how they evolved and to examine some of the earlier architectures used to support data management for analytics.
This blog post describes the main architectural approaches that have been used to support analytics in the enterprise and how they have changed over time.
By reflecting on the specific problems being solved and how each approach tries to address the shortcomings of earlier approaches, we can illustrate how to apply them today and bring the strengths of more modern architectures into sharper relief.
To some extent, data lakes are a happy accident, born out of the availability of distributed storage solutions and distributed computing capacity on commodity hardware, and helped along by the mainstreaming of the cloud. The Apache Hadoop project enabled scalability and variety in analytics that would previously have been impossible, and the advent of cloud services like AWS made it possible to experiment with these tools without an enormous capital outlay up front.
Once these innovations took hold, a supporting infrastructure for analytics that could scale more flexibly and handle a more diverse computational workload was a natural outcome.
The emergence of data lakes affords us the opportunity to rethink the architectural model for supporting analytics in enterprises. We can collectively elevate our game and address some of the flaws in previous generations of analytics architectures like data warehouses.
This post focuses on defining the data lake, describing the benefits and use cases, describing the technology components that power data lakes, and providing some guidance on how to get started from a technical perspective.
If you are new to the underlying technology that supports analytics, or just unfamiliar with some of the key concepts associated with data lakes, this blog post is for you. It is intended to be a short primer for both technical and non-technical audiences.
The term data lake carries a lot of baggage, being both a hot tech buzzword and being somewhat loosely defined. Early on, there was a strong focus on the underlying technology (Apache Hadoop) as the defining characteristic of data lakes, muddying the waters for the uninitiated. Fortunately, it isn’t that difficult to put together a working definition of a data lake that can be broadly understood.
At the core, data lakes are a means to an end. They are a mechanism for letting analytics within an organization happen efficiently, flexibly, and with the necessary controls in place. Data lakes work by bringing information of all kinds together in a central repository, surrounded by tools that let users prepare data and conduct analysis.
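To make that definition concrete, here is a minimal, purely illustrative sketch of the pattern: raw data of different shapes lands as-is in one central store, and a shared query tool sits on top. An in-memory SQLite database stands in for the distributed storage a real data lake would use, and all of the table and variable names are invented for this example.

```python
# Illustrative sketch of the data lake pattern, not any specific product:
# ingest heterogeneous raw data into one central repository, query centrally.
import json
import sqlite3

# Two raw sources of different kinds: structured rows and semi-structured JSON.
structured_rows = [("2021-01-01", "permit", 12), ("2021-01-02", "permit", 7)]
json_events = ['{"date": "2021-01-01", "kind": "inspection"}']

# The "central repository" (SQLite here, for the sake of a runnable example).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_structured (date TEXT, kind TEXT, count INTEGER)")
conn.execute("CREATE TABLE raw_events (date TEXT, kind TEXT)")

# Land both sources in the repository without forcing them into one schema first.
conn.executemany("INSERT INTO raw_structured VALUES (?, ?, ?)", structured_rows)
for doc in json_events:
    event = json.loads(doc)
    conn.execute("INSERT INTO raw_events VALUES (?, ?)", (event["date"], event["kind"]))

# Analysts can now work across sources from a single place.
total = conn.execute("SELECT SUM(count) FROM raw_structured").fetchone()[0]
print(total)
```

The key idea the sketch captures is that ingestion and analysis are decoupled: data arrives in its original form, and structure is applied when someone queries it.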
Here at Silectis, we spend a good deal of time thinking about how the landscape for analytics is changing. As a technology-focused company, the lion’s share of our energy goes to making the technology work effectively and giving data engineering and data science practitioners tools that make their lives easier.
However, we realize that our technology lives in a broader context. It has to dovetail with the processes and organizations that surround it. Ultimately, it needs to serve the consumers of the data, those generating the insight, and those applying it to drive decision-making. Getting value out of data requires a lot of work outside of just the technology.
To that end, we have put together a series of white papers that capture some of the observations we have made over the course of our client work and express our point of view about the broader structure of a successful analytics program in today's rapidly evolving environment.
The first in the series, Effective Insight Delivery, is now available. We are starting with a focus on how analytics get delivered to stakeholders.
Changing user expectations, more complex analytics, and rapidly changing data require a fresh look at how data insights are packaged and made available. Moving beyond conventional reporting and BI, we examine the new modes of analytics delivery and their implications for upstream processes and technology.
Hopefully, you will find this insight helpful. We welcome your questions and comments.