Querying Apache Hadoop Resource Manager with Python.

I was recently asked to write a script that would monitor the running applications on the Apache Hadoop Resource Manager.

I wandered over to the Apache Hadoop Cluster Application Statistics API. The API lets you query most of the information you see in the web UI: the status of the cluster, cluster metrics, scheduler information, information about the nodes in the cluster, and information about the applications on the cluster.
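Each of those categories is a separate resource under `/ws/v1/cluster` on the Resource Manager's web port (8088 by default). The sketch below gathers the paths for the categories listed above into one lookup table. `CLUSTER_ENDPOINTS` and `endpoint_url` are illustrative names of my own, not part of any Hadoop client library.

```python
# Illustrative lookup table: category -> Resource Manager REST path.
CLUSTER_ENDPOINTS = {
    "info": "/ws/v1/cluster/info",            # overall cluster status
    "metrics": "/ws/v1/cluster/metrics",      # cluster-wide metrics
    "scheduler": "/ws/v1/cluster/scheduler",  # scheduler and queue information
    "nodes": "/ws/v1/cluster/nodes",          # nodes in the cluster
    "apps": "/ws/v1/cluster/apps",            # applications on the cluster
}


def endpoint_url(resource_manager, resource):
    """Join the Resource Manager base URL with the path for one resource."""
    return resource_manager.rstrip("/") + CLUSTER_ENDPOINTS[resource]
```

For example, `endpoint_url("http://resourcemanager:8088", "nodes")` gives `http://resourcemanager:8088/ws/v1/cluster/nodes`. Stripping a trailing slash from the base URL avoids a double slash when it is configured as `http://resourcemanager:8088/`.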

I started by querying the cluster info.

import json
import urllib.request

resource_manager = 'http://resourcemanager:8088'

# The cluster info resource of the Resource Manager REST API
info_url = resource_manager + "/ws/v1/cluster/info"

request = urllib.request.Request(info_url)

# If you prefer to work with XML, ask for 'application/xml' instead
# (and parse the response as XML rather than JSON)
request.add_header('Accept', 'application/json')

with urllib.request.urlopen(request) as response:
    data = json.loads(response.read().decode('utf-8'))

print(json.dumps(data, sort_keys=True, indent=4, separators=(',', ': ')))

This returns the following:

{
    "clusterInfo": {
        "haState": "ACTIVE",
        "hadoopBuildVersion": "2.6.0-cdh5.7.0 from c00978c67b0d3fe9f3b896b5030741bd40bf541a by jenkins source checksum b2eabfa328e763c88cb14168f9b372",
        "hadoopVersion": "2.6.0-cdh5.7.0",
        "hadoopVersionBuiltOn": "2016-03-23T18:36Z",
        "id": 1478120586043,
        "resourceManagerBuildVersion": "2.6.0-cdh5.7.0 from c00978c67b0d3fe9f3b896b5030741bd40bf541a by jenkins source checksum deb0fdfede32bbbb9cfbda6aa7e380",
        "resourceManagerVersion": "2.6.0-cdh5.7.0",
        "resourceManagerVersionBuiltOn": "2016-03-23T18:43Z",
        "rmStateStoreName": "org.apache.hadoop.yarn.server.resourcemanager.recovery.NullRMStateStore",
        "startedOn": 1478120586043,
        "state": "STARTED"
    }
}
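The `startedOn` value is a timestamp in milliseconds since the Unix epoch. The following minimal sketch turns the response into a small health summary that a monitoring script could log. `summarize_cluster_info` and its "healthy" rule (state `STARTED` and, when reported, `haState` `ACTIVE`) are my own assumptions for illustration, not something the API defines.

```python
from datetime import datetime, timezone


def summarize_cluster_info(data):
    """Pull a few health-related fields out of a /ws/v1/cluster/info response."""
    info = data["clusterInfo"]
    # startedOn is milliseconds since the Unix epoch; convert to an aware UTC datetime
    started = datetime.fromtimestamp(info["startedOn"] / 1000, tz=timezone.utc)
    return {
        "state": info["state"],
        "haState": info.get("haState"),
        "version": info.get("hadoopVersion"),
        "startedOn": started,
        # Assumed health rule: cluster STARTED and, if HA state is present, ACTIVE
        "healthy": info["state"] == "STARTED"
        and info.get("haState", "ACTIVE") == "ACTIVE",
    }
```

Applied to the response above, this reports a healthy `2.6.0-cdh5.7.0` cluster that started on 2016-11-02 at 21:03:06 UTC.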

Now on to what I actually need to do: querying the Resource Manager about running applications. The Cluster Applications API lets you collect information on a collection of resources, each of which represents an application. Several query parameters can be specified to filter the data that comes back; for the full list, see the Cluster Applications API section of the Resource Manager REST API documentation.
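Because the filters are ordinary query-string parameters, the standard library can build the URL for you. This is a minimal sketch. `apps_url` is a hypothetical helper, and it passes parameter names through unchanged, so they must match the names in the Cluster Applications API documentation (for example `states`, `user`, `queue`, and `limit`).

```python
from urllib.parse import urlencode


def apps_url(resource_manager, **params):
    """Build a Cluster Applications API URL from optional query parameters.

    Parameter names are not validated; they must be names the
    Resource Manager documents for /ws/v1/cluster/apps.
    """
    url = resource_manager.rstrip("/") + "/ws/v1/cluster/apps"
    if params:
        # urlencode keeps the keyword-argument order and escapes values
        url += "?" + urlencode(params)
    return url
```

For example, `apps_url("http://resourcemanager:8088", states="RUNNING", queue="root.hdfs", limit=10)` produces `http://resourcemanager:8088/ws/v1/cluster/apps?states=RUNNING&queue=root.hdfs&limit=10`.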

However, I only need information on running applications, so my query looks something like this:

import json
import urllib.request

resource_manager = 'http://resourcemanager:8088'

# Only ask for applications that are currently in the RUNNING state
apps_url = resource_manager + "/ws/v1/cluster/apps?states=running"

request = urllib.request.Request(apps_url)

# If you prefer to work with XML, ask for 'application/xml' instead
# (and parse the response as XML rather than JSON)
request.add_header('Accept', 'application/json')

with urllib.request.urlopen(request) as response:
    data = json.loads(response.read().decode('utf-8'))

print(json.dumps(data, sort_keys=True, indent=4, separators=(',', ': ')))

This returns something like:

{
    "apps": {
        "app": [
            {
                "allocatedMB": 24576,
                "allocatedVCores": 3,
                "amContainerLogs": "http://resourcemanager:8042/node/containerlogs/container_1478120586043_15232_01_000001/hdfs",
                "amHostHttpAddress": "resourcemanager:8042",
                "applicationTags": "",
                "applicationType": "MAPREDUCE",
                "clusterId": 1478120586043,
                "diagnostics": "",
                "elapsedTime": 18009,
                "finalStatus": "UNDEFINED",
                "finishedTime": 0,
                "id": "application_1478120586043_15232",
                "logAggregationStatus": "NOT_START",
                "memorySeconds": 431865,
                "name": "SELECT 1 AS `number_of_records...TIMESTAMP))(Stage-1)",
                "numAMContainerPreempted": 0,
                "numNonAMContainerPreempted": 0,
                "preemptedResourceMB": 0,
                "preemptedResourceVCores": 0,
                "progress": 54.07485,
                "queue": "root.hdfs",
                "runningContainers": 3,
                "startedTime": 1479156085020,
                "state": "RUNNING",
                "trackingUI": "ApplicationMaster",
                "trackingUrl": "http://resourcemanager:8088/proxy/application_1478120586043_15232/",
                "user": "hdfs",
                "vcoreSeconds": 51
            }
        ]
    }
}
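The applications sit in a list under `apps` → `app`, and `elapsedTime` is in milliseconds. The following minimal sketch reduces each entry to the handful of fields a monitor would report. `running_apps` is a hypothetical helper. As a precaution, it treats a null or missing `apps` value as "no applications", in case that is what comes back when nothing matches the query.

```python
def running_apps(data):
    """Summarize each application in a /ws/v1/cluster/apps JSON response."""
    # Treat a null/missing "apps" or "app" value as an empty list
    apps = (data.get("apps") or {}).get("app") or []
    return [
        {
            "id": app["id"],
            "name": app["name"],
            "user": app["user"],
            "queue": app["queue"],
            "progress": round(app["progress"], 1),   # percent complete
            "elapsed_s": app["elapsedTime"] / 1000.0,  # elapsedTime is in ms
        }
        for app in apps
    ]
```

For the response above, this yields a single entry for `application_1478120586043_15232` owned by `hdfs` in `root.hdfs`, 54.1% complete after about 18 seconds.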

Straightforward and simple to use.
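Since the original goal was a monitoring script, here is a minimal sketch of how the running-applications query could be polled. It assumes that "needs attention" means "has been running longer than a threshold". That rule, the one-hour example threshold, and the names `get_json`, `long_running`, `check_once`, and `monitor` are all illustrative choices, not anything the Resource Manager defines. The fetch function is a parameter so the check can be exercised without a live cluster.

```python
import json
import time
import urllib.request


def get_json(url, timeout=10.0):
    """GET a Resource Manager REST URL and decode the JSON body."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def long_running(data, max_elapsed_ms):
    """Return (id, name, elapsedTime) for apps running longer than max_elapsed_ms."""
    apps = (data.get("apps") or {}).get("app") or []
    return [
        (app["id"], app["name"], app["elapsedTime"])
        for app in apps
        if app["elapsedTime"] > max_elapsed_ms
    ]


def check_once(resource_manager, max_elapsed_ms, fetch=get_json):
    """Query running apps once and return the ones over the threshold."""
    url = resource_manager.rstrip("/") + "/ws/v1/cluster/apps?states=running"
    return long_running(fetch(url), max_elapsed_ms)


def monitor(resource_manager, max_elapsed_ms, interval_s=60):
    """Poll forever (stop with Ctrl-C), printing apps over the threshold."""
    while True:
        for app_id, name, elapsed in check_once(resource_manager, max_elapsed_ms):
            print("%s (%s) has been running for %d s" % (app_id, name, elapsed // 1000))
        time.sleep(interval_s)
```

Calling `monitor("http://resourcemanager:8088", max_elapsed_ms=60 * 60 * 1000)` would print every running application older than one hour, once a minute. In a real deployment you would likely send an alert instead of printing, and catch `urllib.error.URLError` so a temporarily unreachable Resource Manager does not end the loop.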
