Big Data

Thinking about Analytics Holistically

Here at Silectis, we spend a good deal of time thinking about how the landscape for analytics is changing. As a technology-focused company, the lion’s share of our energy goes to making the technology work effectively and giving data engineering and data science practitioners tools that make their lives easier.

However, we realize that our technology lives in a broader context. It has to dovetail with the processes and organizations that surround it. Ultimately, it needs to serve the consumers of the data, those generating the insight, and those applying it to drive decision-making. Getting value out of data requires a lot of work outside of just the technology.

To that end, we have put together a series of white papers that capture some of the observations we have made over the course of our client work and express our point of view about the broader structure of a successful analytics program in today’s rapidly evolving environment.

The first in the series, EFFECTIVE INSIGHT DELIVERY, is now available. We are starting with a focus on how analytics get delivered to stakeholders.

Changing user expectations, more complex analytics, and rapidly changing data require a fresh look at how data insights are packaged and made available. Moving beyond conventional reporting and BI, we examine the new modes of analytics delivery and their implications for upstream processes and technology.

Hopefully, you will find this insight helpful. We welcome your questions and comments.

You can find it here.

Demetrios Kotsikopoulos is the CEO of Silectis. 
You can find him on LinkedIn and Twitter.

Introducing Magpie

As some of you are aware, we have been working on something pretty big for the last few years. We have been helping clients to build their data and analytics infrastructure, but that isn’t the entire story…

THE MAGPIE PLATFORM

We are finally ready to lift the curtain on our cloud data management platform, Magpie.

Magpie helps companies organize their data so they can get to the exciting part: generating insights. Magpie is built to serve as the central hub for analytics data, supporting ETL, Data Lake, and Data Warehousing needs.

We have been working on it while bootstrapping our consulting business and helping customers solve hard data management and analytics problems every day.

Magpie addresses the real-world challenges of data integration, data organization, and data exploration. As practitioners building data warehouses or pulling together complex business analytics, we were continuously frustrated by the need to pivot across multiple tools and jump through hoops to get the infrastructure we needed.

With Magpie, we have pulled all of the core tools needed into one cloud-based platform that readily plugs into existing architectures and lets organizations of all sizes participate in the revolution in data and analytics that is transforming science, business, and government.

POWERED BY METADATA

The driving principle behind Magpie is the need to understand and organize data before diving into analytics. To that end, much of our focus has been on building a robust, extensible, and powerful data catalog or metadata repository. The catalog makes it easy for us to capture the structure of data and then use that structure to build scalable pipelines for data integration and analysis.

It also allows us to quickly deliver new features that add value to data by capturing more and more knowledge about the meaning, content, and history of each piece of information.

As Magpie evolves, we will build on this foundation to allow users to collaborate more effectively around analysis, and to further automate learning from data. Our first step in this direction is the profiling available today, which helps Magpie users understand the data content in any table with a one-line command.

BUILT ON OPEN SOURCE AND OPEN STANDARDS

We have been able to build this platform in a short time and with very limited resources by leveraging the amazing work of the open source community. 

In particular, by leveraging Apache Spark to build our core distributed computing engine, we can achieve world-class performance and scalability. It has freed us to focus on adding value to the end-to-end analytics process instead of the core processing engine.

MADE TO SOLVE REAL WORLD PROBLEMS

Most of all, Magpie is grounded in the realities of managing a complex data environment. We are working to eliminate the tedium and frustration that come from dealing with the alphabet soup of technologies needed to set up a “big” data infrastructure. With Magpie, we are shifting the focus to creating real meaning and value from the data.

We have built Magpie to allow data engineers, analysts, and data scientists to ramp up quickly without having to learn a whole new language or new way of working. Most data transformations and jobs can be built using our SQL-like scripting language. Coupled with our support for Python and Scala, this means that there are skilled people available today who can become productive with Magpie in hours, not months.

Our process automation means that you can get from idea to operation quickly without needing to set up separate scheduling, logging, and monitoring. Our robust security layer lets you keep control of your data while still providing flexible access to data engineers, data scientists, and business analysts.

WHAT’S NEXT

In the coming months, we will continue working with our customers and partners to enhance and expand the Magpie platform and deploy it across more organizations. If you are interested in learning more about Magpie, feel free to email us at info@silect.is.

Demetrios Kotsikopoulos is the CEO of Silectis. 
You can find him on LinkedIn and Twitter.

Why Small Data Matters

As I mentioned in my previous blog post, there seems to be a widespread tendency to focus on “Big Data” and advanced analytics before addressing the basics of managing data at smaller scales within an organization. Clearly, Big Data is becoming a more important part of the landscape, but in most cases, it is not the most pressing data issue within an enterprise.

Getting “Small Data” right is likely to deliver greater near-term value. There are a number of reasons why Small Data is important and why it needs to be addressed separately from a broad Big Data initiative.

  • Small Data can often answer core strategic questions about your business that should drive the best application of Big Data and more advanced analytics.

  • Most organizations don’t have what can properly be categorized as big data yet, but all organizations have some data from which they can begin to gather insight.

  • Mastering Small Data is a critical step in the journey toward overall data management excellence within an organization.

  • Using “Big Data” approaches to handle Small Data problems can introduce unnecessary complexity.

What exactly does “small” mean?

Even businesses that generate billions of dollars in revenue often don’t have anything that can truly be called Big Data. Millions, and in some cases billions, of records do not necessarily amount to Big Data.

Placing your data into the wrong category can be costly, leading to complex technology, more challenging user experiences, and less stability. Big Data technology is inherently more temperamental than Small Data technology (think Hadoop as opposed to PostgreSQL), and there are fewer technologists skilled in it.

Let’s define Small Data as structured or unstructured information that is in the sub-terabyte range in scale. In most businesses, this can include all core sales information, operational performance data, or purchasing data over the course of several years. You should be asking yourself whether this core data is integrated, accessible, and useful before adding other, larger, less-structured data sets to the mix.
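To make the working definition concrete, here is a minimal sketch of how you might sanity-check which category a data set falls into. It assumes a decimal terabyte (10^12 bytes) as the cutoff, and the names `TERABYTE` and `is_small_data` are purely illustrative; they do not come from any particular tool.

```python
# Illustrative sketch only. The 1 TB cutoff mirrors the working definition
# of Small Data in this post; the names below are hypothetical.
TERABYTE = 10**12  # decimal terabyte, expressed in bytes


def is_small_data(total_bytes: int) -> bool:
    """Return True when a data set is in the sub-terabyte range."""
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    return total_bytes < TERABYTE


# Example: a few years of core sales data at 250 GB versus a 3 TB clickstream.
sales_history_bytes = 250 * 10**9
clickstream_bytes = 3 * TERABYTE

sales_is_small = is_small_data(sales_history_bytes)
clickstream_is_small = is_small_data(clickstream_bytes)
```

A size check like this is only a first pass. Whether the data is integrated, accessible, and useful matters more than where it sits relative to the threshold.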

Small Data can drive big insights.

Small Data often provides the clarity and intuition about your business that complex analytical magic can’t necessarily provide. Big data and predictive analytics often help you do those things that you are already doing faster, more efficiently, or in a more targeted way. Small data can often tell you whether you are doing the right things in the first place. The kinds of questions Small Data answers go to the heart of strategic clarity and excellence in execution.

Until your business users are able to answer at least some of these questions on a continuous basis, more advanced analytics and exercises in optimization may simply be allowing you to do the wrong things faster and more efficiently.

Jeff Hassemer does a great job focusing on the more tactical uses of small data to target customers in his excellent Advertising Age article.

Getting Small Data right is hard.

Although the questions above are straightforward, it is often difficult to get at the underlying information needed to even begin building intuition around them. Scale isn’t necessarily the biggest problem around integrating data and making it available to a user base. Small Data presents its own set of challenges that are further complicated when Big Data is added to the mix.

My experience has been that data analysis is often project-oriented: a number of different people spanning lines of business and IT might need to get involved to pull together the answer to what seems like a simple question. An analyst might spend days joining and aggregating data through Excel wizardry. This slows down the decision process and burns valuable time that could be used to take action rather than waiting for analysis.

So, what are some of the things preventing organizations from successfully getting value out of their small data? Based on my observations and informal discussions with peers and customers, there are a few recurring themes that are worth sharing:

Even small data sets present many of the same challenges as large data sets – The hard part tends to be getting the business meaning of the data right, linking it to reference data, and handling the exceptions. The scale of data doesn’t have as much impact on how hard it is to integrate as one might think.

It can be difficult for business users to explore data because of technical challenges – IT organizations may not have the right specialized expertise to wring every drop of performance out of their data-oriented systems, or they are so preoccupied with fire-fighting that they can’t focus on decision systems that are competing with core business applications for attention. This means that handling performance and usability problems is often deferred.

The business has trouble accessing the data because of the way the data is organized – The semantics of data often aren’t clear and it becomes challenging for the business to use the information effectively. This article from HBR does a great job of addressing both the role of small data and the need for agreement on business rules (semantics).

The tools provided to the business are still too hard to use – Self-service analytics tools often still rely on technical skills or knowledge of database structure that business users aren’t going to have. This leads to the emergence of a new class of specialists who are charged with pulling data from the “self-service” tool and handing it off to others for analysis in Excel and presentation in PowerPoint.

What can we do about it?

There are a few key things that can help simplify getting more value out of Small Data and addressing some of the challenges identified above.

Think globally and create a vision – I am not suggesting that you should pursue the holy grail of an “Enterprise Data Architecture”, but there should be a cohesive sense of where data will reside, how it will get there, the technology that will support it, and the semantics associated with that data. This vision gives incremental efforts something to aim for and can be adjusted as the organization gets smarter about its data.

Start small and deliver value fast – Ensure that you are able to quickly deliver useful, impactful business insight. What does this mean? Within a few weeks of starting, have new capabilities in users’ hands. Do not try to build a complete enterprise data architecture and attendant processes before going to work with real data, solving real business problems.

Don’t focus on sunk costs – Transitioning away from a legacy environment can be daunting, but the tools and technology available have progressed rapidly. Don’t be afraid to walk away from past investments that are no longer working for the organization. Building a cloud-based infrastructure could be surprisingly low cost.

Don’t forget about organization and roles – Think about the separation of concerns across organizations. Don’t expect business analysts to also be data scientists. Don’t expect data scientists to be software engineers. Be realistic about the skills you need and about where and how those skills can be sourced.

From a more technical perspective, there are a number of established technologies that can help support making better use of small data. There are a few areas in particular that warrant attention:

Look to the cloud first – This is probably obvious at this point, but I would suggest that any organization that is looking to build new infrastructure in the next 12 months should look to the public cloud first. The economics have become very attractive at all but the very largest scales, and unless you have highly specialized regulatory or security requirements, there are few functional barriers.

Look for tools that work well at the scale you already have – Don’t create complexity in anticipation of requirements that may never materialize. The cost and complexity of tools that operate well at Small Data scales have been dropping precipitously. Big Data tools like those included in the Hadoop stack can be useful at smaller scales, but not necessarily enough to offset the added complexity until scale increases into the multi-terabyte range.

Partner for scarce skills – Most organizations will not be able to attract the specialized skills that they need to succeed in fully realizing the value of their data. Look to external providers to fill the gaps. Specialists should be favored over broad-based integrators/consultants who operate on a high-leverage model that could saddle you with expensive novices.

Look forward to scaling up when you need to – Don’t completely ignore the potential needs of the future. Identify small pilot projects that can help your organization build expertise and confidence without committing to an infrastructure that you just don’t need yet. The cloud makes it very cost effective to set up and tear down infrastructure for experimentation without a substantial, long-term capital outlay. Find opportunities to start leveraging a Big Data platform that can guide future investments.

Conclusions

While this post challenges the prevailing emphasis on Big Data and advanced analytics, these are both areas that are important to today’s enterprise and are becoming more so.

However, it's clear that starting with core enterprise information and delivering key analytics to the business, without over-engineering the process and infrastructure, is an important step on the journey to a more comprehensive approach that encompasses both “Big” and “Small” Data.

Further Reading

There is a lot of great material out there that touches on this and related topics.  Here are just a few of the items that I have run across that expand on some of the ideas in this post:

Demetrios Kotsikopoulos is the founder of Silectis.
You can find him on LinkedIn and Twitter.