Disable 70-persistent-net.rules generation on CentOS 6 VM

If you’re like me, you probably have an environment running on some virtual platform, and like everyone else you have built a template to spin up Linux systems. One of the things we have been running into lately is the 70-persistent-net.rules file, which associates MAC addresses with network interfaces.

The easiest way I have found to disable this is the following; it’s not pretty, but it works.

rm /etc/udev/rules.d/70-persistent-net.rules

echo "#" > /lib/udev/rules.d/75-persistent-net-generator.rules
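
If you bake this into your template prep, a minimal cleanup sketch looks like the following. The HWADDR removal from ifcfg-eth0 is an extra step I’m assuming here, not part of the fix above; clones get a new MAC, so a hard-coded one from the template only gets in the way.

# Run right before shutting the VM down to convert it into a template
rm -f /etc/udev/rules.d/70-persistent-net.rules
echo "#" > /lib/udev/rules.d/75-persistent-net-generator.rules

# Extra (assumed) step: strip the template's hard-coded MAC from the NIC config
sed -i '/^HWADDR/d' /etc/sysconfig/network-scripts/ifcfg-eth0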

Happy hacking.

Work Blog: Managing Your Linux Deployments with Spacewalk

I have been using Spacewalk for a while now and really like a lot of its built-in functionality. I have been using it to build out and manage a lot of my Red Hat and CentOS installations.

The latest thing I have been using it to manage is my Hadoop cluster build-out and configuration updates. It helps to be able to control as much of it as possible from one management system. I know there are applications like Ambari out there, but to be honest, who wants to add another tool if they don’t have to?

Here’s the link to my work blog about it.

http://gotomojo.com/managing-your-linux-deployments-with-spacewalk/

What does Facebook consider an average day’s worth of data?

Well, according to this article from gigaom.com, an average day looks something like this:

  • 2.5 billion content items shared per day (status updates + wall posts + photos + videos + comments)
  • 2.7 billion Likes per day
  • 300 million photos uploaded per day
  • 100+ petabytes of disk space in one of FB’s largest Hadoop (HDFS) clusters
  • 105 terabytes of data scanned via Hive, Facebook’s Hadoop query language, every 30 minutes
  • 70,000 queries executed on these databases per day
  • 500+ terabytes of new data ingested into the databases every day

I also love this quote from the VP of Infrastructure.

“If you aren’t taking advantage of big data, then you don’t have big data, you have just a pile of data,” said Jay Parikh, VP of infrastructure at Facebook on Wednesday. “Everything is interesting to us.”

CentOS 6.4 service virt-who won’t start – workaround

Here is the problem:

[root@bob ~]# service virt-who start
Starting virt-who: Traceback (most recent call last):
  File "/usr/share/virt-who/virt-who.py", line 33, in <module>
    from subscriptionmanager import SubscriptionManager, SubscriptionManagerError
  File "/usr/share/virt-who/subscriptionmanager.py", line 24, in <module>
    import rhsm.connection as rhsm_connection
ImportError: No module named rhsm.connection
[FAILED]

There is a simple workaround: install the Scientific Linux 6 python-rhsm package.

Name        : python-rhsm
Version     : 1.1.8
Release     : 1.el6
Vendor      : Scientific Linux
Date        : 2013-02-22 01:54:26
Group       : Development/Libraries
Source RPM  : python-rhsm-1.1.8-1.el6.src.rpm
Size        : 0.27 MB
Packager    : Scientific Linux
Summary     : A Python library to communicate with a Red Hat Unified Entitlement Platform
Description :
A small library for communicating with the REST interface of a Red Hat Unified
Entitlement Platform. This interface is used for the management of system
entitlements, certificates, and access to content.

First, install python-simplejson:

 

[root@bob ~]# yum install python-simplejson

 

Then pick a mirror from http://rpm.pbone.net/index.php3/stat/4/idpl/20813982/dir/scientific_linux_6/com/python-rhsm-1.1.8-1.el6.x86_64.rpm.html, download python-rhsm-1.1.8-1.el6.x86_64.rpm, and install it:

 

[root@bob ~]# rpm --install python-rhsm-1.1.8-1.el6.x86_64.rpm

 

Then start virt-who:

 

[root@bob ~]# service virt-who start
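
As a quick sanity check (these verification steps are my addition, not part of the original workaround), confirm the missing module now imports and make sure virt-who survives a reboot:

[root@bob ~]# python -c "import rhsm.connection" && echo "rhsm OK"
[root@bob ~]# chkconfig virt-who on
[root@bob ~]# service virt-who status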

Hadoop to Hadoop Copy

Recently I needed to copy the contents of one Hadoop cluster to another for geo-redundancy. Thankfully, instead of having to write something to do it, Hadoop supplies a handy tool for the job: DistCp (distributed copy).

 

DistCp is a tool used for large inter/intra-cluster copying. It uses Map/Reduce to effect its distribution, error handling and recovery, and reporting. It expands a list of files and directories into input to map tasks, each of which will copy a partition of the files specified in the source list. Its Map/Reduce pedigree has endowed it with some quirks in both its semantics and execution. The purpose of this document is to offer guidance for common tasks and to elucidate its model.

 

Here is the basic usage:

 

bash$ hadoop distcp hdfs://nn1:8020/foo/bar \
                    hdfs://nn2:8020/bar/foo

 

This will expand the namespace under /foo/bar on nn1 into a temporary file, partition its contents among a set of map tasks, and start a copy on each TaskTracker from nn1 to nn2. Note that DistCp expects absolute paths.
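
For my geo-redundancy case the copy gets re-run on a schedule, so the -update flag is handy: it skips files that already match on the destination (by size, and checksum on newer versions) instead of recopying everything, and -m caps the number of map tasks doing the work. The paths and map count below are just placeholders, not from my actual clusters:

bash$ hadoop distcp -update -m 20 \
                    hdfs://nn1:8020/foo/bar \
                    hdfs://nn2:8020/bar/foo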

 

Here is how you can handle multiple source directories on the command line:

 

bash$ hadoop distcp hdfs://nn1:8020/foo/a \
                    hdfs://nn1:8020/foo/b \
                    hdfs://nn2:8020/bar/foo
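
The DistCp docs also describe a -f option that takes the source list from a file on HDFS instead of the command line; the srclist file name below is just for illustration:

bash$ hadoop fs -cat hdfs://nn1:8020/srclist
hdfs://nn1:8020/foo/a
hdfs://nn1:8020/foo/b

bash$ hadoop distcp -f hdfs://nn1:8020/srclist \
                       hdfs://nn2:8020/bar/foo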