What does Facebook consider an average day’s worth of data?

Well, according to this article from gigaom.com, an average day looks something like this:

  • 2.5 billion content items shared per day (status updates + wall posts + photos + videos + comments)
  • 2.7 billion Likes per day
  • 300 million photos uploaded per day
  • 100+ petabytes of disk space in one of FB’s largest Hadoop (HDFS) clusters
  • 105 terabytes of data scanned via Hive, Facebook’s Hadoop query language, every 30 minutes
  • 70,000 queries executed on these databases per day
  • 500+ terabytes of new data ingested into the databases every day

I also love this quote from the VP of Infrastructure.

“If you aren’t taking advantage of big data, then you don’t have big data, you have just a pile of data,” said Jay Parikh, VP of infrastructure at Facebook on Wednesday. “Everything is interesting to us.”

Big Data for Small Business

I have said it before and I will say it again: you don’t have to be a Fortune 500 company to use Big Data. Big Data is more about understanding your data than it is about how big it is. It is about identifying all your different data sources and gathering them into one place, so that you can analyze and understand them better.

http://www.pcworld.com/article/2047486/how-small-businesses-can-mine-big-data.html

Wrangling Customer Usage Data with Hadoop

Here is our session from the Hadoop Summit 2013.

 

Title: Wrangling Customer Usage Data with Hadoop

Slides: http://www.slideshare.net/Hadoop_Summit/hall-johnson-june271100amroom211v2

Description:

At Clearwire we have a big data challenge: Processing millions of unique usage records comprising terabytes of data for millions of customers every week. Historically, massive purpose-built database solutions were used to process data, but they weren’t particularly fast, nor did they lend themselves to analysis. As mobile data volumes increase exponentially, we needed a scalable solution that could process usage data for billing, provide a data analysis platform, and inexpensively store the data indefinitely. The solution? A Hadoop-based platform allowed us to architect and deploy an end-to-end solution based on a combination of physical data nodes and virtual edge nodes in less than six months. This solution allowed us to turn off our legacy usage processing solution and reduce processing times from hours to as little as 15 minutes. This improvement has enabled Clearwire to deliver actionable usage data to partners faster and more predictably than ever before. Usage processing was just the beginning; we’re now turning to the raw data stored in Hadoop, adding new data sources, and starting to analyze the data. Clearwire is now able to put multiple data sources in the hands of our analysts for further discovery and actionable intelligence.
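To make the idea of usage processing concrete, here is a minimal sketch of the kind of per-customer aggregation described above. The record format (`customer_id,timestamp,bytes`) is a hypothetical one for illustration; Clearwire’s actual schema isn’t public, and in a real deployment this logic would run as a Hadoop job (e.g. via Hadoop Streaming) rather than as plain in-memory Python.

```python
from collections import defaultdict

def aggregate_usage(records):
    """Sum usage bytes per customer ID.

    Each record is a 'customer_id,timestamp,bytes' string
    (a made-up format for this sketch, not Clearwire's real schema).
    """
    totals = defaultdict(int)
    for line in records:
        customer_id, _timestamp, used_bytes = line.strip().split(",")
        totals[customer_id] += int(used_bytes)
    return dict(totals)

# A tiny sample standing in for millions of weekly usage records.
sample = [
    "c1001,2013-06-01T00:05,512",
    "c1002,2013-06-01T00:07,2048",
    "c1001,2013-06-01T00:09,1024",
]
print(aggregate_usage(sample))  # {'c1001': 1536, 'c1002': 2048}
```

In a MapReduce version, the mapper would emit `(customer_id, bytes)` pairs and the reducer would perform the same summation, which is what lets the work spread across many data nodes.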

 

Windows Azure HDInsight ( Hadoop on Windows )

Lately a lot of my co-workers have been asking me whether Hadoop runs on Windows. After going to the Hadoop Summit last month, I have been able to tell them about Azure HDInsight, which is basically Apache Hadoop running on Windows Azure.

It appears that Microsoft has been working with Hortonworks to bring Apache Hadoop to Windows, and here is the end product.

http://www.windowsazure.com/en-us/documentation/services/hdinsight/

So if you are interested in Hadoop on Windows, check it out.