As some of you are aware, we have been working on something pretty big for the last few years. We have been helping clients to build their data and analytics infrastructure, but that isn’t the entire story…
MAGPIE: THE DATA ENGINEERING PLATFORM
We are finally ready to lift the curtain on our cloud data engineering platform, Magpie.
Magpie helps companies organize their data so they can get to the exciting part: generating insights. Magpie is built to serve as the central hub for data engineering, enabling advanced analytics and supporting ETL, data lake, and data warehousing needs.
We have been working on it while bootstrapping our consulting business and helping customers solve hard data management and analytics problems every day.
Magpie addresses the real-world challenges of data integration, data organization, and data exploration. As practitioners building data warehouses and pulling together complex business analytics, we were continually frustrated by the need to pivot across multiple tools and jump through hoops to get the infrastructure we needed.
With Magpie, we have pulled all of the core tools needed into one cloud-based platform that readily plugs into existing architectures and lets organizations of all sizes participate in the revolution in data and analytics that is transforming science, business, and government.
POWERED BY METADATA
The driving principle behind Magpie is the need to understand and organize data before diving into analytics. To that end, much of our focus has been on building a robust, extensible, and powerful data catalog or metadata repository. The catalog makes it easy for us to capture the structure of data and then use that structure to build scalable pipelines for data integration and analysis.
It also allows us to quickly deliver new features that add value to data by capturing more and more knowledge about the meaning, content, and history of each piece of information.
As Magpie evolves, we will build on this foundation to allow users to collaborate more effectively around analysis, and to further automate learning from data. Our first step in this direction is the profiling capability available today, which helps Magpie users understand the contents of any table with a one-line command.
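Magpie's own profiling command isn't reproduced here, but the kind of per-column summary such a command produces can be sketched in plain Python. The `rows` table and `profile` helper below are illustrative assumptions, not Magpie's API:

```python
from statistics import mean

# Hypothetical rows-as-dicts "table" standing in for a cataloged dataset.
rows = [
    {"city": "Boston", "sales": 120.0},
    {"city": "Boston", "sales": 95.5},
    {"city": "Chicago", "sales": None},
]

def profile(rows):
    """Compute simple per-column statistics: counts, nulls, distinct values."""
    report = {}
    for col in rows[0].keys():
        values = [r[col] for r in rows]
        non_null = [v for v in values if v is not None]
        stats = {
            "count": len(values),
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
        }
        # Add numeric summaries when every non-null value is numeric.
        if non_null and all(isinstance(v, (int, float)) for v in non_null):
            stats.update(min=min(non_null), max=max(non_null),
                         mean=mean(non_null))
        report[col] = stats
    return report

print(profile(rows))
```

A real profiler would also capture type inference, histograms, and sample values, but even this minimal pass shows why profiling is the natural first feature to build on top of a metadata catalog.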
BUILT ON OPEN SOURCE AND OPEN STANDARDS
We have been able to build this platform in a short time and with very limited resources by leveraging the amazing work of the open source community.
In particular, by leveraging Apache Spark to build our core distributed computing engine, we can achieve world class performance and scalability. It has freed us to focus on adding value to the end-to-end analytics process instead of the core processing engine.
MADE TO SOLVE REAL WORLD PROBLEMS
Most of all, Magpie is grounded in the realities of managing a complex data environment. We are working to eliminate the tedium and frustration that comes from dealing with the alphabet soup of technologies needed to set up a “big” data infrastructure. With Magpie, we are shifting the focus to creating real meaning and value from the data.
We have built Magpie to allow data engineers, analysts, and data scientists to ramp up quickly without having to learn a whole new language or a new way of working. Most data transformations and jobs can be built using our SQL-like scripting language. Coupled with our support for Python and Scala, this means that there are skilled people out there today who can get productive with Magpie in hours, not months.
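Magpie's scripting syntax isn't shown here, but the style of SQL-based transformation described above can be illustrated with standard SQL, run via Python's built-in sqlite3 module. The `orders` table, its columns, and its contents are invented for the example:

```python
import sqlite3

# In-memory database standing in for a data platform.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 100.0), ("east", 250.0), ("west", 75.0)],
)

# A typical analyst transformation: aggregate raw orders into a summary table.
conn.execute("""
    CREATE TABLE region_totals AS
    SELECT region, SUM(amount) AS total, COUNT(*) AS n_orders
    FROM orders
    GROUP BY region
""")

for row in conn.execute("SELECT * FROM region_totals ORDER BY region"):
    print(row)  # prints ('east', 350.0, 2) then ('west', 75.0, 1)
```

Because transformations like this are plain SQL plus a general-purpose language, an analyst who already knows those tools has little new to learn.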
Our process automation means that you can get from idea to operation quickly without needing to set up separate scheduling, logging, and monitoring. Our robust security layer lets you keep control of your data while still providing flexible access to data engineers, data scientists, and business analysts.
In the coming months, we will continue working with our customers and partners to enhance and expand the Magpie platform and deploy across more organizations. If you are interested in learning more about Magpie, feel free to email us at firstname.lastname@example.org.