Securing and Auditing the Data Lake

Security is critical in data lakes, since a data lake often contains the majority of an organization’s data. Giving users access to the data they need to do their jobs should not mean giving them access to all of the data in the lake. Unfortunately, many data lake tools, especially custom do-it-yourself (DIY) implementations, do not provide an easy way to manage and audit data access.

In Magpie, security is a core concept: every user action is verified and logged for future review. In this post, we’ll demonstrate how to secure your data lake quickly and effectively with Magpie.

From Zero to Data Lake with Silectis Magpie

More and more companies are turning to data lakes as a way to unify and get value out of their growing collections of data. However, it can be challenging to navigate the ever-changing technology landscape around these lakes, set one up, and quickly get value from it.

In this post, we’ll walk through a technical tutorial of how Magpie can help companies get up and running with a data lake quickly. We’ll show how companies can easily configure and explore a set of enterprise data sources, enrich that enterprise data with third party sources, and perform initial analysis.

Filling the Data Lake - Job Management in Magpie

When first setting up a data lake, it is common for organizations to start with a static export of data. This enables users to immediately take advantage of the advanced analytics capabilities of the data lake without having to develop the sometimes-complex logic for periodically updating the data. While performing this initial analysis, users can pull in ad hoc data sources as needed to enrich their existing data and begin to crystallize a list of data sources and tables that will be useful moving forward.

Inevitably, though, the data in the lake will need to be updated in a repeatable manner, derived tables will need to be rebuilt, and the entire refresh process will need to be monitored. Our platform, Magpie, provides a robust job management infrastructure to guide organizations through this process.