Elasticsearch

Elasticsearch is a distributed, RESTful search and analytics engine built on top of Apache Lucene for high performance.

Elasticsearch features include:

Real-Time Data Indexing
Scalability
High Availability
Multi-Tenancy
Full Text Search
Document Orientation

The flow of data never stops, so the question is how quickly that data can become available. Elasticsearch indexes data in near real time, so new data becomes searchable almost as soon as it arrives and can feed real-time analytics.

Elasticsearch is horizontally scalable. An organization can simply add nodes to expand the cluster's capacity. Elasticsearch is also resilient: it detects and removes failed nodes and rebalances itself to keep an organization’s data safe and accessible.

Elasticsearch can also host multiple indices that can be queried independently or as a group. Elasticsearch structures its data as JSON documents. All fields are indexed by default, and all of the indices can be used in a single query.
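To make the "queried as a group" idea concrete, here is a minimal sketch of how a search across several indices can be expressed. It assumes the common convention of comma-separating index names in the `_search` path and a simple `match` query; the index names (`blog_posts`, `comments`) and the field name (`title`) are made up for illustration, and the code only builds the request path and JSON body without contacting a cluster.

```python
import json


def build_multi_index_search(indices, field, text):
    """Return (path, body) for a full-text match query across several indices."""
    # Several indices are addressed at once by joining their names with commas.
    path = "/" + ",".join(indices) + "/_search"
    # The query itself is a JSON document, just like the data Elasticsearch stores.
    body = json.dumps({"query": {"match": {field: text}}})
    return path, body


# Hypothetical index and field names, used only for illustration.
path, body = build_multi_index_search(["blog_posts", "comments"], "title", "hadoop")
print(path)
print(body)
```

Passing a single index name in the list gives the "queried independently" case with the same function.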

Clients can connect to Elasticsearch by using a standard HTTP REST library. This gives any programming language the ability to connect to Elasticsearch.
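As a rough sketch of what that looks like, the snippet below uses only Python's standard library to prepare a search request. The base URL `http://localhost:9200` (the usual default port on a local node), the index name `blog_posts`, and the helper names `make_search_request` and `send` are assumptions for illustration. The request is only constructed here; calling `send` requires a running Elasticsearch node.

```python
import json
import urllib.request


def make_search_request(base_url, index, query):
    """Build (but do not send) an HTTP search request for one index."""
    data = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response (needs a live cluster)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Assumed local node and hypothetical index name.
request = make_search_request("http://localhost:9200", "blog_posts", {"match_all": {}})
print(request.get_method(), request.full_url)
```

Because nothing here is specific to Python beyond the HTTP library, the same request can be issued from any language with an HTTP client.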

Elasticsearch has been used to query 24 billion records in 900 ms. It is currently used by companies such as GitHub, Foursquare, and Xing.

website: http://www.elasticsearch.org/

2014 AT&T Developer Summit

I will be attending the AT&T Developer Summit in Las Vegas. I will also be taking part in the Summit Hackathon.

“The AT&T Summit Hackathon is the premier hackathon of the year for the AT&T Developer Program. This year will be focused on wearable technologies and participants will be able to choose between a Wearables Track and an AT&T API Track. Finalists from each track will be featured in live fast pitches on stage with our executives during the keynote at the AT&T Developer Summit on January 6th. In addition, competitors will also have the ability to compete in accelerator challenges, details to be announced, which will offer prizes of up to $10,000 for eligible teams.”


Work Blog: Managing Your Linux Deployments with Spacewalk

I have been using Spacewalk for a while now and really like a lot of the built-in functionality. I have been using it to build out and manage a lot of my Red Hat and CentOS installations.

The latest thing I have been using it for is managing my Hadoop cluster build-out and configuration updates. I think it helps to be able to control as much as possible from one management system. I know there are applications like Ambari out there, but to be honest, who wants to add another tool if they don’t have to?

Here’s the link to my work blog about it.

http://gotomojo.com/managing-your-linux-deployments-with-spacewalk/

What does Facebook consider an average day’s worth of data?

Well, according to this article from gigaom.com, the average day looks something like this.

  • 2.5 billion content items shared per day (status updates + wall posts + photos + videos + comments)
  • 2.7 billion Likes per day
  • 300 million photos uploaded per day
  • 100+ petabytes of disk space in one of FB’s largest Hadoop (HDFS) clusters
  • 105 terabytes of data scanned via Hive, Facebook’s Hadoop query language, every 30 minutes
  • 70,000 queries executed on these databases per day
  • 500+ terabytes of new data ingested into the databases every day

I also love this quote from the VP of Infrastructure.

“If you aren’t taking advantage of big data, then you don’t have big data, you have just a pile of data,” said Jay Parikh, VP of infrastructure at Facebook on Wednesday. “Everything is interesting to us.”