
Roger Hosto

Good Talk


Databases Administration

Covering NoSQL, Relational Database, Data Visualization, and Reporting.

Elasticsearch

Posted on February 13, 2014 (updated September 20, 2025) by webgeek

Elasticsearch is a distributed, RESTful search and analytics tool built on top of Apache Lucene for high performance. Elasticsearch features include real-time data indexing, scalability, high availability, multi-tenancy, full-text search, and document orientation. The flow of data never stops, so the question is how quickly that data can become available. Elasticsearch indexes…

Read more

What does Facebook consider an average day's worth of data?

Posted on September 8, 2013 (updated September 20, 2025) by webgeek

According to this article from gigaom.com, the average day looks something like this: 2.5 billion content items shared per day (status updates + wall posts + photos + videos + comments); 2.7 billion Likes per day; 300 million photos uploaded per day; 100+ petabytes of disk space in one of FB’s largest Hadoop (HDFS)…

Read more

Big Data for Small Business

Posted on August 29, 2013 (updated September 20, 2025) by webgeek

I have said it before and will say it again: you don’t have to be a Fortune 500 company to use Big Data. Big Data is more about understanding your data than about how big it is, and about understanding all your different data sources and gathering them into one place, so that you can…

Read more

What's New with MongoDB Hadoop Integration.

Posted on August 9, 2013 (updated September 20, 2025) by webgeek

I attended this webinar yesterday and it was pretty good. I like how straightforward the interaction between MongoDB and Hadoop is. Check it out if you get a chance: http://www.10gen.com/presentations/webinar-whats-new-mongodb-hadoop-integration

Read more

Wrangling Customer Usage Data with Hadoop

Posted on July 22, 2013 by webgeek

Here is our session from the Hadoop Summit 2013.
Title: Wrangling Customer Usage Data with Hadoop
Slides: http://www.slideshare.net/Hadoop_Summit/hall-johnson-june271100amroom211v2
Description: At Clearwire we have a big data challenge: Processing millions of unique usage records comprising terabytes of data for millions of customers every week. Historically, massive purpose-built database solutions were used to process data, but weren’t particularly…

Read more

Windows Azure HDInsight (Hadoop on Windows)

Posted on June 12, 2013 by webgeek

Lately a lot of my co-workers have asked me whether Hadoop runs on Windows. After going to the Hadoop Summit last month, I have been able to tell them about Azure HDInsight, which is basically Apache Hadoop running on Windows Azure. It appears that Microsoft has been working with Hortonworks to bring Apache…

Read more

Hadoop to Hadoop Copy

Posted on March 1, 2013 by webgeek

Recently I needed to copy the contents of one Hadoop cluster to another for geo-redundancy. Thankfully, instead of having to write something to do it, Hadoop supplies a handy tool for the job: DistCp (distributed copy). DistCp is a tool used for large inter/intra-cluster copying. It uses Map/Reduce to effect its distribution,…

Read more

Hortonworks Road Show "Big Business Value from Big Data and Hadoop"

Posted on September 19, 2012 by webgeek

This morning I went to the Hortonworks Road Show. It wasn’t bad. I have to say, out of the Hadoop vendors I have talked to, I like Hortonworks’ business model the best. The fact that they are a large committer to the Apache Hadoop project, along with several other subprojects such as Apache Ambari…

Read more

Make way for Hadoop in the 'Big Data' craze

Posted on June 26, 2012 by webgeek

Interesting bit on Hadoop, though a little overhyped if you ask me. http://www.marketwatch.com/story/make-way-for-hadoop-in-the-big-data-craze-2012-06-26?link=MW_latest_news –Regards

Read more

Working with Hadoop Streaming

Posted on June 22, 2012 by webgeek

Hadoop streaming is a utility that comes with the Hadoop distribution. The utility allows you to create and run map/reduce jobs with any executable or script as the mapper and/or the reducer. For example:

shell> $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar -input myInputDirs -output myOutputDir -mapper /bin/cat -reducer /bin/wc

If you are using the tar package from Apache Hadoop, you can find the…

Read more
© 2026 Roger Hosto | Powered by Minimalist Blog WordPress Theme