Wednesday, November 09, 2011

When Big Data is a Big Con

I'm seeing a lot of 'Big Data' washing going on in the market. Some companies are treating this volume explosion as a continuation of history: new technologies and new approaches, but evolution, not revolution. Yes, Map Reduce is cool, but it's technically much harder than SQL and database design, which means it is far from a business panacea. Yes, the link between structured and unstructured data is growing, and the processing power available to cut up things like video and audio has never been better. But seriously, let's step back.

Back in 2000 I worked at a place that spent literally MILLIONS on an EMC 5 TB disk set-up. Yes, it had geographical redundancy etc. etc., and back then 5 TB was seen as a stratospheric amount of data for most businesses. These days it's the sort of thing we'd look to put onto SSDs. It's a bit beyond what people would hold in straight RAM, but give it a few years and we'll be doing that anyway.

Here is the point about Big Data: 95%+ of it is just the ongoing exponential increase in data, which is matched, or at least tracked, by the increase in processing power and storage volumes. Things like Teradata and Exadata (nice gag there, Larry) are set up to handle this sort of volume out of the box, and Yahoo apparently modified Postgres to handle two petabytes, which by anyone's definition is 'big'. Yes, index tuning might be harder, and yes, you might shift stuff around onto SSDs, but seriously, this is just 'bigger'; it's not a fundamental shift.

Map Reduce is different because it's a different way of thinking about data, querying data and manipulating data. This makes it 'hard' for most IT estates, as they aren't good at thinking in new ways and don't have the people who can. Just as there aren't many people who can properly think multi-threaded, there aren't many people who can think Map Reduce. Before you leap up and say 'I get it', do two things: 1) work out how you would compare two disparate data sets in Map Reduce; 2) think how many people in your office could do it.
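To make that first exercise concrete, here is a minimal sketch of one standard answer, the reduce-side join, simulated in plain in-memory Python rather than on Hadoop. Everything in it is hypothetical: the two data sets (a CRM extract and a pipe-delimited social media feed), the field layouts and the function names `map_crm`, `map_social`, `shuffle`, `reduce_join` and `run_job` are illustrative assumptions, not any product's API. The shift in thinking is that the mappers tag every record with its source and emit it under the shared key, the framework groups by key, and only the reducer ever sees both data sets side by side.

```python
from collections import defaultdict

# Hypothetical data set 1: structured CRM rows (customer id, name, tier).
CRM_ROWS = [
    ("c1", "Alice", "Gold"),
    ("c2", "Bob", "Silver"),
    ("c3", "Carol", "Gold"),
]

# Hypothetical data set 2: loosely structured social mentions, "id|kind|text".
SOCIAL_ROWS = [
    "c1|complaint|slow delivery",
    "c3|praise|great support",
    "c1|praise|nice app",
    "c9|complaint|unknown customer",
]


def map_crm(row):
    """Map phase for the CRM source: emit (join key, tagged value)."""
    cust_id, name, tier = row
    yield cust_id, ("crm", (name, tier))


def map_social(line):
    """Map phase for the social source: parse the line, emit a tagged value."""
    cust_id, kind, text = line.split("|", 2)
    yield cust_id, ("social", (kind, text))


def shuffle(pairs):
    """Group all mapped values by key (a cluster does this across machines)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(key, values):
    """Reduce phase: split values by source tag, emit matching combinations."""
    crm = [v for tag, v in values if tag == "crm"]
    social = [v for tag, v in values if tag == "social"]
    # Inner join: keys seen in only one data set produce no output.
    for name, tier in crm:
        for kind, text in social:
            yield key, name, tier, kind, text


def run_job():
    """Run map, shuffle and reduce in sequence and collect the joined rows."""
    mapped = []
    for row in CRM_ROWS:
        mapped.extend(map_crm(row))
    for line in SOCIAL_ROWS:
        mapped.extend(map_social(line))
    groups = shuffle(mapped)
    results = []
    for key in sorted(groups):
        results.extend(reduce_join(key, groups[key]))
    return results


if __name__ == "__main__":
    for joined in run_job():
        print(joined)
```

Running it yields three joined rows: two for `c1` and one for `c3`. `c2` has no social mentions and `c9` has no CRM record, so neither appears. Even this toy case needs the tag-and-regroup trick that a single SQL `JOIN` hides, which is exactly the kind of thinking the paragraph above says is in short supply.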

So what do we see in the market? We see people using Big Data the same way they used SOA: slapping on a logo and saying things like 'Hadoop integration' or 'Social media integration' or... to put it another way... 'we've built a connector'. See how much less impressive the latter looks? It's just an old-school EAI connector to a new source, or a new ETL connector... WOW, hold the front page.

Real Big Data has issues of data gravity, process movement and lots of other very complex things. So to find out whether it's Big Data or Big Con, ask the following:

  1. Can you replace the phrase 'Big Data' with 'Big Database'? If you can, then it's just an upgrade.
  2. Do they have anything that means old-school DBAs et al. can handle Hadoop?
  3. Can the 'advance' be reduced to 'we've got an EAI connector'?
  4. Is it basically the same product as 2009 with a sticker on it?
  5. Is there anything that solves the data gravity problem?
  6. Is there anything that moves process to data rather than shifting the data?

Finally, do they describe Big Data the same way that The Hitchhiker's Guide to the Galaxy describes space?

"Space," it says, "is big. Really big. You just won't believe how vastly, hugely, mindbogglingly big it is. I mean, you may think it's a long way down the road to the chemist's, but that's just peanuts to space, listen..."

If they do, then you really know it's Big Con. Big Data is evolution, not revolution, and pretending otherwise doesn't help anyone.

