As Big Data becomes more popular, the question often asked is ‘What is Big Data?’ There are many ways to answer, but one part of the answer inevitably involves quantity. Although there is no precise definition of how much data it takes to be considered Big, most agree that the amount keeps growing. To follow the Big Data discussion, it helps to understand the units of measure. Each unit is 1000 times the previous one. For example, a Terabyte is 1000 Gigabytes, three orders of magnitude more than the unit before it.
In the 1980s, Gigabyte databases were considered huge, and database analysts theorized about managing Terabytes of data. Perhaps the Greek origin of tera, teras or ‘monster’, reflects early Big Data experts’ fear of the growing amounts of data. Today, many users have a Terabyte of storage on their personal devices. Monthly Internet traffic is currently measured in Exabytes (about 21 Exabytes per month). According to International Data Corporation (IDC), the total amount of global data grew to 2.7 Zettabytes during 2012. With amounts this large already in use, even larger units have been defined. The next one up sounds as if it were inspired by a universal force: Yottabytes seem affectionately named after the Star Wars master, although the prefix actually derives from the word for eight (octo), because a Yottabyte is 1000 to the eighth power bytes. While not infinite, 1000 Zettabytes is huge.
Data units of measure*
*Each unit is 1000 times the previous unit.
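The footnote's rule, each unit being 1000 times the previous one, can be made concrete with a short conversion sketch. This is a minimal illustration assuming the decimal (SI) convention used in this post. The helper names `bytes_in` and `humanize` are hypothetical and are not part of Pentaho Data Integration, CTools or any other library. Some operating systems and tools use binary multiples of 1024 instead, which this sketch does not cover.

```python
# Decimal (SI) byte units, assumed as in this post: each step up the list
# is 1000 times the previous one. Binary units (1024-based) are not covered.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def bytes_in(unit: str) -> int:
    """Return the number of bytes in one of `unit` (1000 ** its position)."""
    return 1000 ** UNITS.index(unit)


def humanize(num_bytes: int) -> str:
    """Express an integer byte count in the largest unit that keeps it >= 1."""
    index = 0
    # Integer comparisons keep the thresholds exact (no float rounding).
    while index < len(UNITS) - 1 and num_bytes >= 1000 ** (index + 1):
        index += 1
    return f"{num_bytes / 1000 ** index:g} {UNITS[index]}"


if __name__ == "__main__":
    print(bytes_in("TB") // bytes_in("GB"))  # 1000
    print(humanize(27 * 10**20))             # 2.7 ZB (the 2012 IDC figure)
    print(humanize(21 * 10**18))             # 21 EB (monthly Internet traffic)
```

Working with exact integer byte counts is a deliberate choice here. Very large floating-point values such as 1e24 cannot be represented exactly, so a float-based loop could report a Yottabyte as "1000 ZB".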
Wherever one draws the line for Big Data, there is a lot of data to process, and new techniques and technologies come into play to make the best use of it. As the amount of data grows, dashboards, analytics and data transformation require specialized tools and techniques, so having the right tools and processes is essential. Our toolset is Pentaho Data Integration and CTools.