You shouldn’t Start Without Metadata

notebook chicken.jpg

This blog post gives a brief overview of the importance of metadata in today’s world… with a nod to our school days.

METADATA: A SIMPLE EXPLANATION OF WHAT YOU CAN DO WITH IT, AND WHY YOU shouldn’t START WITHOUT IT

The Dewey Decimal System is an often forgotten example of the value of metadata. Though increasingly arcane, the system gave a library visitor information about a book. The system told a user where to find a book, who wrote it, and what it was about.

Side note for younger readers: In this context, a “Library” is a room built to house physical books where, after receiving minimal vetting, members of the public can come to learn from these physical volumes, but the books must be returned when learning is complete. This practice is fairly recent and not from the times of the Pharaohs. It is also not the inspiration for Amazon Prime free returns, though the parallels are surprising.

The Dewey Decimal System told the library and the users of the library all about the book: its title, who wrote it, its topics, if it was fiction or not, its length, etc. But to see the content – the words inside – one must read the book. And reading the book is an often time intensive process and might not be the best first step to trying to determine what the book is about. 

The value of such a data-driven system is that it allows the user to understand the entire catalogue without having to open or read a single book. This system functions no matter the number of books, the number of authors… the system can grow in different dimensions without requiring a reorganization. This is the value of metadata. It is the ideal first step. Metadata is information that describes and gives you data about other information.

In the big data world, the value of metadata is often overlooked, as the attention always goes to the important fact or the piece of information that makes the difference… the golden nugget. But without metadata, the whole team doesn’t know where to look… where to dig.

The entire data world nods knowingly, because without a proper understanding of what is in a data set, there is no other option but to start wading around in information. Looking at a table here and a row there, hoping to figure out the edges of what is contained in the data set. This is why analysts and scientists are so often looking at huge tables, hoping they can just piece together a big picture view of the data. Sometimes we are lucky, and the source of the data is so well documented that it can be relied upon to give us appropriate clues. But generally, this process is akin to relying on only a book’s title and a few random selected paragraphs of text to assess if a book may hold the answers to a given question. Without the Dewey Decimal System – without a proper metadata catalog – it would be very difficult indeed to learn how one does kill a mockingbird.

MAGPIE: POWERED BY METADATA

Silectis’s big data platform, Magpie, is built on an understanding of the importance of using and leveraging the power of metadata. Without understanding the big picture and organizing data, it is practically impossible to know where to start looking for answers, conclusions, tips, and discoveries. When the Silectis team gets new data, we start by running our custom processes to build an understanding of the data. We run a profile of the data so we can have a high-level perspective before we dig into the information. We can see and explain the potential in the data, and then decide how to add more, or slice and dice the information to find the value that has been shown by examining the metadata.

The availability of data in today’s world is overwhelming, and the amounts of data are growing, as is the seemingly never-ending arrival of yet another new type of data. Each new collection creates a new technique, and the resultant scale can overwhelm any user. Silectis feels this puts a heavy burden on the analyst, while the entire data operations process, from data loading to analysis, can benefit from the value hidden in metadata. By using Magpie and leveraging the power of the Silectis, your data team can harness the power of metadata to understand what is in a data set much in the same way we all grew up relying on the Dewey Decimal System.

To learn more about our data lake platform, Magpie, click here.

Brendan Freehart is a Data Engineer at Silectis.
You can find him on LinkedIn.