Data Blueprint

Where, Oh Where Should My Data Go?

So, you’ve got data – lots of data!  Data from customers, vendors, and employees.  Data about products, services, and prospects.  And you want to be able to use that data – to have that data support and drive your business decisions.  But it can be quite daunting to try to get your head around all the different types and purposes of all the information in your data landscape.  Where do you put it all?  And what about when you are given new data sources?  Do you need to develop different solutions for each data source?  What if your current data exists in different platforms or locations?  How can you possibly bring all of these disparate threads together into a cohesive solution?  AAAHHHH!!

First of all, take a breath.  It is critical to remember that data is only... well... data.  Whether it is small data or Big Data, SQL or NoSQL, data is simply information.  Any structures, systems, and governance surrounding that data are intended to help you, not hinder you.  The entire point is to get that data to work for your organization, not to satisfy the dictates of one approach or another.  Of course, you may need to change your database architecture, processes, and policies in order to improve your overall data environment, but they should be changed to serve your data needs, not the other way around.  It can simplify things greatly if you never forget that, though it may come in many different variations, it’s still all just data.

But now that we’ve avoided freaking out at the prospect of having to handle all your different types of data, we return to our original query – WHERE do we put it?  This is an interesting question, because there are obviously a lot of options.  However, it all boils down to a simple truth – for virtually all organizations, no matter how you slice it, there are only two types of data:  Operational and Analytical.

Operational Data is active data, which needs to be readily accessible.  The business must be able to use it for regular activities, both quickly and easily.  It must be easy to update, and it is not greatly concerned with history or auditing.  Analytical Data is stored for reporting and historical purposes.  It needs to contain all present and past versions of data, so that reports can be generated over larger time scales.  It should also allow for auditing by tracking changes over time.
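To make the contrast concrete, here is a minimal sketch in Python.  The class names, the `customer-1` key, and the email values are all made up for illustration; the only point is that the operational side overwrites while the analytical side accumulates.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class OperationalStore:
    """Hypothetical store: holds only the current value per key."""
    current: dict = field(default_factory=dict)

    def upsert(self, key, value):
        # An update overwrites in place; the previous value is gone.
        self.current[key] = value

    def get(self, key):
        return self.current.get(key)


@dataclass
class AnalyticalStore:
    """Hypothetical store: keeps every version, with when it was recorded."""
    versions: list = field(default_factory=list)

    def record(self, key, value, recorded_at=None):
        recorded_at = recorded_at or datetime.now(timezone.utc)
        self.versions.append((key, value, recorded_at))

    def history(self, key):
        # Every past and present value for the key, oldest first.
        return [(v, t) for k, v, t in self.versions if k == key]


ops = OperationalStore()
dw = AnalyticalStore()
for email in ("pat@old.example", "pat@new.example"):
    ops.upsert("customer-1", email)
    dw.record("customer-1", email)

current_email = ops.get("customer-1")
all_emails = [value for value, _ in dw.history("customer-1")]
```

After the loop, `current_email` holds only the latest address, while `all_emails` still contains both, which is the property that makes reporting and auditing over time possible.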

That’s it.  That’s the list.  So, what do you do next?  In order to treat data with an Enterprise approach (and you really should treat your data with an Enterprise approach!), your Operational Data should go into some version of a Master Data Repository.  This will allow you to have a Single Version of the Truth – a central location for operational systems and active business users to access, insert, and update data.  Your Analytical Data, on the other hand, should be stored in some version of a Data Warehouse, giving you a Single Version of the Facts – a central location for all current AND historical data, one that tracks changes over time and supports reporting.  And of course, data from the Master Data Repository should be fed into the Data Warehouse, where it becomes the most recent version of the data.
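One way to picture that feed is the sketch below, which uses Python’s built-in `sqlite3` module.  Everything specific in it is an assumption made for illustration: the table names, the `valid_from` / `valid_to` columns, the `feed_warehouse` function, and the sample dates.  The versioning scheme is loosely in the style often called a Type 2 slowly changing dimension; it is one option among several, not a prescription.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Operational side: one row per customer, updated in place.
    CREATE TABLE master_customer (
        customer_id INTEGER PRIMARY KEY,
        email       TEXT NOT NULL
    );
    -- Analytical side: one row per version, with validity dates.
    CREATE TABLE dw_customer_history (
        customer_id INTEGER NOT NULL,
        email       TEXT NOT NULL,
        valid_from  TEXT NOT NULL,
        valid_to    TEXT            -- NULL marks the current version
    );
""")


def feed_warehouse(conn, load_date):
    """Copy new or changed master rows into the warehouse as new versions."""
    changed = conn.execute("""
        SELECT m.customer_id, m.email
        FROM master_customer AS m
        LEFT JOIN dw_customer_history AS h
               ON h.customer_id = m.customer_id AND h.valid_to IS NULL
        WHERE h.customer_id IS NULL OR h.email <> m.email
    """).fetchall()
    for customer_id, email in changed:
        # Close the version that was current until now (if there is one).
        conn.execute(
            "UPDATE dw_customer_history SET valid_to = ? "
            "WHERE customer_id = ? AND valid_to IS NULL",
            (load_date, customer_id),
        )
        # Add the master repository's value as the new current version.
        conn.execute(
            "INSERT INTO dw_customer_history VALUES (?, ?, ?, NULL)",
            (customer_id, email, load_date),
        )
    conn.commit()
    return len(changed)


conn.execute("INSERT INTO master_customer VALUES (1, 'pat@old.example')")
feed_warehouse(conn, "2024-01-01")
conn.execute(
    "UPDATE master_customer SET email = 'pat@new.example' WHERE customer_id = 1"
)
feed_warehouse(conn, "2024-02-01")
unchanged = feed_warehouse(conn, "2024-03-01")  # no change, so no new version

history = conn.execute(
    "SELECT email, valid_from, valid_to "
    "FROM dw_customer_history ORDER BY valid_from"
).fetchall()
```

The master table only ever holds one row for the customer, while `history` ends up with two: the old address closed off on the second load date, and the new address still open as the current version.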

Keep in mind that these two data environments can be entirely conceptual!  Of course, you may have one physical database containing Operational Data and another physical database containing Analytical Data, but this is in no way a requirement.  For instance, you could have several schemas or even databases that make up your Operational Data environment.  And your Analytical Data environment could include both SQL and NoSQL stores.  The key is creating those conceptual boundaries, so that you can properly identify the types and usage of your data and properly apply data governance.
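A conceptual boundary can be as lightweight as a catalog that tags each physical store with the environment it belongs to.  The sketch below is one hypothetical way to do that in Python; the store names, the `GOVERNANCE` rules, and the helper functions are invented for the example and would look different in any real organization.

```python
from dataclasses import dataclass
from enum import Enum


class Environment(Enum):
    OPERATIONAL = "operational"
    ANALYTICAL = "analytical"


@dataclass(frozen=True)
class PhysicalStore:
    name: str
    technology: str
    environment: Environment


# Hypothetical governance rules, one set per conceptual environment.
GOVERNANCE = {
    Environment.OPERATIONAL: {"in_place_updates": True, "keep_history": False},
    Environment.ANALYTICAL: {"in_place_updates": False, "keep_history": True},
}

# Several physical stores can share one conceptual environment.
CATALOG = [
    PhysicalStore("crm_schema", "SQL", Environment.OPERATIONAL),
    PhysicalStore("billing_db", "SQL", Environment.OPERATIONAL),
    PhysicalStore("warehouse", "SQL", Environment.ANALYTICAL),
    PhysicalStore("event_archive", "NoSQL", Environment.ANALYTICAL),
]


def stores_in(environment):
    """Names of all physical stores tagged with the given environment."""
    return [s.name for s in CATALOG if s.environment is environment]


def policy_for(store_name):
    """Governance rules follow the environment, not the technology."""
    store = next(s for s in CATALOG if s.name == store_name)
    return GOVERNANCE[store.environment]


operational_stores = stores_in(Environment.OPERATIONAL)
archive_policy = policy_for("event_archive")
```

Here two SQL stores make up the Operational environment, and the NoSQL `event_archive` inherits the same governance rules as the SQL warehouse simply because both are tagged Analytical.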

Therefore, as you are assessing your existing data sources, or if you are adding new data sources, the first and most important question is a simple one – is this Operational Data, Analytical Data, or possibly some of both?  Because once you determine this simple classification, you’ll know where to build out your data environment.  How you need to build that environment out is, of course, the next question, but “Where, Oh Where?” will have already been answered!