
Why Small Data Matters


As I mentioned in my previous blog post, there seems to be a widespread tendency to focus on “Big Data” and advanced analytics before addressing the basics of managing data at smaller scales within an organization. Clearly, Big Data is becoming a more important part of the landscape, but in most cases, it is not the most pressing data issue within an enterprise.

Getting “Small Data” right is likely to deliver greater near-term value. There are a number of reasons why Small Data is important and why it needs to be addressed separately from a broad Big Data initiative.

  • Small Data can often answer core strategic questions about your business that should drive the best application of Big Data and more advanced analytics.

  • Most organizations don’t yet have what can properly be categorized as Big Data, but all organizations have some data from which they can begin to gather insight.

  • Mastering Small Data is a critical step in the journey toward overall data management excellence within an organization.

  • Using “Big Data” approaches to handle Small Data problems can introduce unnecessary complexity.

What exactly does “small” mean?

Even businesses that generate billions of dollars in revenue often don’t have anything that can truly be called Big Data. Millions, and in some cases billions, of records do not necessarily amount to Big Data.

Placing your data into the wrong category can be costly, leading to complex technology, more challenging user experiences, and less stability. Big Data technology is inherently more temperamental than Small Data technology (think Hadoop as opposed to PostgreSQL), and there are fewer skilled technologists available to run it.

Let’s define Small Data as structured or unstructured information that is in the sub-terabyte range in scale. In most businesses, this can include all core sales information, operational performance data, or purchasing data over the course of several years. You should be asking yourself whether this core data is integrated, accessible, and useful before adding other, larger, less-structured data sets to the mix.

Small Data can drive big insights.

Small Data often provides the clarity and intuition about your business that complex analytical magic can’t necessarily provide. Big Data and predictive analytics often help you do the things that you are already doing faster, more efficiently, or in a more targeted way. Small Data can often tell you whether you are doing the right things in the first place. The kinds of questions Small Data answers go to the heart of strategic clarity and excellence in execution.

Until your business users are able to answer at least some of these questions on a continuous basis, more advanced analytics and exercises in optimization may simply be allowing you to do the wrong things faster and more efficiently.

Jeff Hassemer does a great job focusing on the more tactical uses of Small Data to target customers in his excellent Advertising Age article.

Getting Small Data right is hard.

Although the questions above are straightforward, it is often difficult to get at the underlying information needed to even begin building intuition around them. Scale isn’t necessarily the biggest problem in integrating data and making it available to a user base. Small Data presents its own set of challenges that are further complicated when Big Data is added to the mix.

My experience has been that data analysis is often project-oriented, with a number of different people spanning lines of business, and IT might need to get involved to pull together the answer to what seems like a simple question. An analyst might spend days joining and aggregating data through Excel wizardry. This slows down the decision process and burns valuable time that could be used to take action rather than waiting for analysis.
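To make the point concrete, the kind of join-and-aggregate question that can consume days of lookup-and-pivot work in Excel often reduces to a single query against an ordinary relational database. This is only an illustrative sketch; the tables and columns here are hypothetical:

```python
import sqlite3

# Hypothetical sales data; in practice this would already live in a database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (sku TEXT PRIMARY KEY, category TEXT);
    CREATE TABLE orders (sku TEXT, quantity INTEGER, unit_price REAL);
    INSERT INTO products VALUES ('A1', 'widgets'), ('B2', 'gadgets');
    INSERT INTO orders VALUES ('A1', 10, 2.50), ('A1', 4, 2.50), ('B2', 1, 99.00);
""")

# One query replaces the manual joining and aggregating:
# link orders to products, then total revenue per category.
rows = conn.execute("""
    SELECT p.category, SUM(o.quantity * o.unit_price) AS revenue
    FROM orders o
    JOIN products p ON p.sku = o.sku
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

for category, revenue in rows:
    print(category, revenue)
```

Nothing here is exotic; the point is that a shared, queryable store turns a multi-day Excel exercise into a repeatable one-liner.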

So, what are some of the things preventing organizations from successfully getting value out of their small data? Based on my observations and informal discussions with peers and customers, there are a few recurring themes that are worth sharing:

Even small data sets present many of the same challenges as large data sets – The hard part tends to be getting the business meaning of the data right, linking it to reference data, and handling the exceptions. The scale of the data doesn’t have as much impact on how hard it is to integrate as one might think.
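The reference-data problem shows up even in trivially small feeds. A minimal sketch, with entirely hypothetical SKUs and amounts: before any aggregation is trustworthy, the records that fail to link to the reference data have to be surfaced and handled, not silently dropped.

```python
# Hypothetical master/reference data and a small transaction feed.
reference_skus = {"A1", "B2", "C3"}
transactions = [
    {"sku": "A1", "amount": 25.0},
    {"sku": "B2", "amount": 99.0},
    {"sku": "ZZ", "amount": 10.0},  # typo or retired SKU: won't link
]

# Partition the feed into clean rows and an exceptions queue for review.
clean = [t for t in transactions if t["sku"] in reference_skus]
exceptions = [t for t in transactions if t["sku"] not in reference_skus]

print(len(clean), "clean,", len(exceptions), "exceptions")
```

The two-line partition is easy; deciding what the business should do with the exceptions queue is where the real effort goes, regardless of data volume.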

It can be difficult for business users to explore data because of technical challenges – IT organizations may not have the right specialized expertise to wring every drop of performance out of their data-oriented systems, or they are so preoccupied with fire-fighting that they can’t focus on decision systems, which compete with core business applications for attention. As a result, performance and usability problems are often deferred.

The business has trouble accessing the data because of the way the data is organized – The semantics of data often aren’t clear, making it challenging for the business to use the information effectively. This article from HBR does a great job of addressing both the role of Small Data and the need for agreement on business rules (semantics).

The tools provided to the business are still too hard to use – Self-service analytics tools often still rely on technical skills, or knowledge of database structure, that business users aren’t going to have. This leads to the emergence of a new class of specialists charged with pulling data from the “self-service” tool and handing it off to others for analysis in Excel and presentation in PowerPoint.

What can we do about it?

There are a few key things that can help simplify getting more value out of Small Data and addressing some of the challenges identified above.

Think globally and create a vision – I am not suggesting that you should pursue the holy grail of an “Enterprise Data Architecture”, but there should be a cohesive sense of where data will reside, how it will get there, the technology that will support it, and the semantics associated with that data. This vision gives incremental efforts something to aim for and can be adjusted as the organization gets smarter about its data.

Start small and deliver value fast – Ensure that you are able to quickly deliver useful, impactful business insight. What does this mean? Within a few weeks of starting, have new capabilities in users’ hands. Do not try to build a complete enterprise data architecture and attendant processes before going to work with real data, solving real business problems.

Don’t focus on sunk costs – Transitioning away from a legacy environment can be daunting, but the tools and technology available have progressed rapidly. Don’t be afraid to walk away from past investments that are no longer working for the organization. Building a cloud-based infrastructure could be surprisingly low cost.

Don’t forget about organization and roles – Think about the separation of concerns across organizations. Don’t expect business analysts to also be data scientists. Don’t expect data scientists to be software engineers. Be realistic about the skills you need and about where and how those skills can be sourced.

From a more technical perspective, there are a number of established technologies that can help support making better use of small data. There are a few areas in particular that warrant attention:

Look to the cloud first – This is probably obvious at this point, but I would suggest that any organization that is looking to build new infrastructure in the next 12 months should look to the public cloud first. The economics have become very attractive at all but the very largest scales, and unless you have highly specialized regulatory or security requirements, there are few functional barriers.

Look for tools that work well at the scale you already have – Don’t create complexity in anticipation of requirements that may never materialize. The cost and complexity of tools that operate well at Small Data scales has been dropping precipitously. Big Data tools like those included in the Hadoop stack can be useful at smaller scales, but not necessarily enough to offset the added complexity until scale increases into the multi-terabyte range.
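As a rough illustration of why small-scale tooling is usually enough, a plain single-node SQL engine (SQLite here, standing in for the PostgreSQL class of tools mentioned earlier) handles hundreds of thousands of rows without any distributed machinery. The data below is synthetic:

```python
import sqlite3

# Load 200,000 synthetic rows into an in-memory single-node SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (region INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    ((i % 10, i) for i in range(200_000)),  # 10 hypothetical regions
)

# Counting and grouping at this scale needs no cluster at all.
count, = conn.execute("SELECT COUNT(*) FROM events").fetchone()
regions = conn.execute(
    "SELECT region, COUNT(*) FROM events GROUP BY region"
).fetchall()
print(count, "rows across", len(regions), "regions")
```

This runs comfortably on a laptop; a Hadoop-style stack would add operational complexity without adding capability at this scale.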

Partner for scarce skills – Most organizations will not be able to attract the specialized skills that they need to succeed in fully realizing the value of their data. Look to external providers to fill the gaps. Specialists should be favored over broad-based integrators/consultants who operate on a high-leverage model that could saddle you with expensive novices.

Look forward to scaling up when you need to – Don’t completely ignore the potential needs of the future. Identify small pilot projects that can help your organization build expertise and confidence without committing to an infrastructure that you just don’t need yet. The cloud makes it very cost effective to set up and tear down infrastructure for experimentation without a substantial, long-term capital outlay. Find opportunities to start leveraging a Big Data platform that can guide future investments.


While this post challenges the prevailing emphasis on Big Data and advanced analytics, both are important to today’s enterprise and are becoming more so.

However, starting with core enterprise information and delivering key analytics to the business, without over-engineering the process and infrastructure, is an important step on the journey to a more comprehensive approach that encompasses both “Big” and “Small” Data.


    Demetrios Kotsikopoulos is the founder of Silectis. You can find him on LinkedIn and Twitter.