29 Jan

While preparing my inaugural speech I needed some numbers to illustrate the size of big data. During my search I came across a 2012 paper by McAfee & Brynjolfsson, in which they mention that Wal-Mart processes a whopping 2.5 petabytes of data on customer transactions every hour. That sounded like a great illustration of big data because the number is huge. But when you think about it, the number is implausibly large, even considering that Wal-Mart has 11,000 stores and serves 245 million customers each week (1). Assuming the stores are open 12 hours a day, 7 days a week, that comes to roughly 2.9 million customers per hour. But how can 2.9 million customers per hour generate 2.5 petabytes of data? 2.5 petabytes is roughly 2.6 million gigabytes (i.e. 2.5 x 1024 x 1024), so each customer would generate almost 1 gigabyte of data from their transactions. If customer transaction data is the list of items you bought plus some payment and customer loyalty details, then this number just cannot be true.
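The back-of-the-envelope check above can be reproduced in a few lines of Python. This is only a sketch of the arithmetic in this post: the constant and function names are my own, the opening hours (12 hours a day, 7 days a week) are the assumption made above, and a petabyte is taken as 1024 x 1024 gigabytes.

```python
# Figures quoted in the post
CUSTOMERS_PER_WEEK = 245_000_000   # customers served per week (1)
CLAIMED_PB_PER_HOUR = 2.5          # the claim being checked

# Assumptions made in the post, not official Wal-Mart figures
OPEN_HOURS_PER_DAY = 12
OPEN_DAYS_PER_WEEK = 7
GB_PER_PB = 1024 * 1024            # binary convention: 1 PB = 1024 x 1024 GB


def customers_per_hour() -> float:
    """Average number of customers served per opening hour."""
    open_hours_per_week = OPEN_HOURS_PER_DAY * OPEN_DAYS_PER_WEEK
    return CUSTOMERS_PER_WEEK / open_hours_per_week


def gb_per_customer() -> float:
    """Gigabytes each customer would have to generate if the claim were true."""
    claimed_gb_per_hour = CLAIMED_PB_PER_HOUR * GB_PER_PB
    return claimed_gb_per_hour / customers_per_hour()


if __name__ == "__main__":
    print(f"Customers per hour: {customers_per_hour():,.0f}")
    print(f"GB per customer:    {gb_per_customer():.2f}")
```

Changing the opening hours to 24 a day would halve the customers per hour and double the data per customer, so the conclusion does not depend on that assumption.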

So I went searching for the source. At first it seemed to be an article from The Economist in 2010 reporting the same number (2), but that article gives no source and does not explain how they got to that number. Then I found an article in ComputerWorld from 2008 reporting that the databases of Wal-Mart contain 2.5 petabytes of data (3). It is interesting that this number is exactly the same as the number from The Economist, but somewhere along the way the contents of the whole database turned into data generated per hour. Continuing my search I also found a blog post that mentions this same number and uses a report from SAS as its source (4)(5). Unfortunately the SAS report gives no reference as to where the number comes from. The ComputerWorld article, however, shows that its number comes from Teradata, who provided Wal-Mart with a solution for storing its data. Hence I consider this the most reliable source and do not consider it very likely that Wal-Mart handles 2.5 petabytes of data every hour from its customer transactions. Just to be sure I also asked @walmartlabs and will let you know about the outcome.

In the meantime I can still use a good example to illustrate the size of big data. So please let me know if you have one and do not forget to mention the source 😉

Update 30/1/15: On the Wal-Mart website I found a quote from 2014 by Wal-Mart’s CEO saying that they store around 30 petabytes of shopping information (6). As Wal-Mart itself is the source, I consider this one the most reliable. Note that in 2008 they ‘only’ stored 2.5 petabytes and were among the few Teradata customers, alongside for instance eBay and Bank of America, that had crossed the 1 petabyte mark (3).


2 Responses to “A little bit too much big data”

  1. Scott April 2, 2016 at 8:17 pm #

    I am so glad I found this post, as I was about to quote this utterly incomprehensible number in an important work (my degree capstone). The more I considered it, though, the more I just couldn’t believe it to hold true. It took some clicking to find another critical thinker out there, but here I finally arrived…

    • rwhelms April 2, 2016 at 8:30 pm #

      I am glad it was helpful, good luck on your project!

