Roger Hosto

Good Talk

Querying Apache Hadoop Resource Manager with Python.

Posted on November 14, 2016 (updated September 20, 2025) by webgeek

I was recently asked to write a script that would monitor the running applications on the Apache Hadoop Resource Manager.

I wandered over to the Apache Hadoop Cluster Application Statistics API. The API lets you query most of the information that you see in the web UI: the status of the cluster, cluster metrics, scheduler information, information about the nodes in the cluster, and information about the applications on the cluster.
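Each of the areas listed above has its own REST path under /ws/v1/cluster. As a rough sketch, the paths can be collected in one place and fetched with a small helper. This sketch targets Python 3 (urllib.request), whereas the snippets in this post use Python 2's urllib2. The names CLUSTER_ENDPOINTS, endpoint_url, and fetch_json are hypothetical, and the paths should be checked against the documentation for your Hadoop version.

```python
import json
import urllib.request

# Paths as listed in the YARN ResourceManager REST documentation;
# verify them against the Hadoop version you are running.
CLUSTER_ENDPOINTS = {
    "info": "/ws/v1/cluster/info",
    "metrics": "/ws/v1/cluster/metrics",
    "scheduler": "/ws/v1/cluster/scheduler",
    "nodes": "/ws/v1/cluster/nodes",
    "apps": "/ws/v1/cluster/apps",
}


def endpoint_url(resource_manager, name):
    """Build the full URL for one of the named cluster endpoints."""
    return resource_manager.rstrip("/") + CLUSTER_ENDPOINTS[name]


def fetch_json(url, timeout=10):
    """GET a URL, asking for JSON, and return the parsed response."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, fetch_json(endpoint_url('http://resourcemanager:8088', 'metrics')) would request the cluster metrics, assuming a Resource Manager is reachable at that address.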

I started by querying the cluster info.

import urllib2
import json

resource_manager = 'http://resourcemanager:8088'

info_url = resource_manager+"/ws/v1/cluster/info"

request = urllib2.Request(info_url)

# If you prefer to work with XML, replace json below with xml.
request.add_header('Accept', 'application/json')

response = urllib2.urlopen(request)
data = json.loads(response.read())

print json.dumps(data, sort_keys=True, indent=4, separators=(',', ': '))

This returns the following:

{
    "clusterInfo": {
        "haState": "ACTIVE",
        "hadoopBuildVersion": "2.6.0-cdh5.7.0 from c00978c67b0d3fe9f3b896b5030741bd40bf541a by jenkins source checksum b2eabfa328e763c88cb14168f9b372",
        "hadoopVersion": "2.6.0-cdh5.7.0",
        "hadoopVersionBuiltOn": "2016-03-23T18:36Z",
        "id": 1478120586043,
        "resourceManagerBuildVersion": "2.6.0-cdh5.7.0 from c00978c67b0d3fe9f3b896b5030741bd40bf541a by jenkins source checksum deb0fdfede32bbbb9cfbda6aa7e380",
        "resourceManagerVersion": "2.6.0-cdh5.7.0",
        "resourceManagerVersionBuiltOn": "2016-03-23T18:43Z",
        "rmStateStoreName": "org.apache.hadoop.yarn.server.resourcemanager.recovery.NullRMStateStore",
        "startedOn": 1478120586043,
        "state": "STARTED"
    }
}
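
A monitoring script usually wants only a few of these fields. The following Python 3 sketch pulls out the HA state, the service state, and the version, and converts startedOn (milliseconds since the epoch) into a readable UTC timestamp. The function name summarize_cluster_info and the shape of the returned dictionary are hypothetical choices for illustration.

```python
from datetime import datetime, timezone


def summarize_cluster_info(data):
    """Reduce a parsed /ws/v1/cluster/info response to a few key fields."""
    info = data["clusterInfo"]
    # startedOn is in milliseconds since the epoch; drop the milliseconds.
    started = datetime.fromtimestamp(info["startedOn"] // 1000, tz=timezone.utc)
    return {
        "active": info.get("haState") == "ACTIVE",
        "state": info.get("state"),
        "version": info.get("hadoopVersion"),
        "started_utc": started.strftime("%Y-%m-%d %H:%M:%S"),
    }
```

Applied to the response above, the startedOn value 1478120586043 corresponds to 2016-11-02 21:03:06 UTC.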

Now on to what I actually needed to do: querying the Resource Manager about running applications. The Cluster Applications API lets you retrieve a collection of resources, each of which represents an application. There are multiple parameters that can be specified to filter the data. For the list of parameters, see the Cluster Applications API documentation.
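
As an illustration of how such parameters can be combined, here is a minimal Python 3 sketch that builds an apps query URL. The helper name apps_url is hypothetical. The parameter names used in the example (states, user, limit) are ones the Cluster Applications API documentation lists, but check them against the documentation for your Hadoop version.

```python
from urllib.parse import urlencode


def apps_url(resource_manager, **params):
    """Build a /ws/v1/cluster/apps URL from optional query parameters."""
    base = resource_manager.rstrip("/") + "/ws/v1/cluster/apps"
    # Skip parameters left as None so only explicit filters are sent.
    query = {key: value for key, value in params.items() if value is not None}
    if not query:
        return base
    # Sort the parameters so the generated URL is stable between runs.
    return base + "?" + urlencode(sorted(query.items()))


print(apps_url("http://resourcemanager:8088", states="RUNNING", user="hdfs", limit=10))
```

Using urlencode rather than string concatenation means values with spaces or other special characters are escaped correctly.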

I, however, just needed the information on running applications, which looks something like this:

import urllib2
import json

resource_manager = 'http://resourcemanager:8088'

info_url = resource_manager+"/ws/v1/cluster/apps?states=running"

request = urllib2.Request(info_url)

# If you prefer to work with XML, replace json below with xml.
request.add_header('Accept', 'application/json')

response = urllib2.urlopen(request)
data = json.loads(response.read())

print json.dumps(data, sort_keys=True, indent=4, separators=(',', ': '))

This returns something like:

{
    "apps": {
        "app": [
            {
                "allocatedMB": 24576,
                "allocatedVCores": 3,
                "amContainerLogs": "http://resourcemanager:8042/node/containerlogs/container_1478120586043_15232_01_000001/hdfs",
                "amHostHttpAddress": "resourcemanager:8042",
                "applicationTags": "",
                "applicationType": "MAPREDUCE",
                "clusterId": 1478120586043,
                "diagnostics": "",
                "elapsedTime": 18009,
                "finalStatus": "UNDEFINED",
                "finishedTime": 0,
                "id": "application_1478120586043_15232",
                "logAggregationStatus": "NOT_START",
                "memorySeconds": 431865,
                "name": "SELECT 1 AS `number_of_records...TIMESTAMP))(Stage-1)",
                "numAMContainerPreempted": 0,
                "numNonAMContainerPreempted": 0,
                "preemptedResourceMB": 0,
                "preemptedResourceVCores": 0,
                "progress": 54.07485,
                "queue": "root.hdfs",
                "runningContainers": 3,
                "startedTime": 1479156085020,
                "state": "RUNNING",
                "trackingUI": "ApplicationMaster",
                "trackingUrl": "http://resourcemanager:8088/proxy/application_1478120586043_15232/",
                "user": "hdfs",
                "vcoreSeconds": 51
            }
        ]
    }
}

Straightforward and simple to use.
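
To turn the apps response into something a monitoring script can act on, one possible approach is to reduce each application to a few fields and flag the ones that have been running longer than a threshold. This is a Python 3 sketch under stated assumptions: the function name running_app_summaries, the max_elapsed_ms threshold, and the long_running flag are all hypothetical, and the code assumes the "apps" value may be null when nothing matches, which should be confirmed against your cluster.

```python
def running_app_summaries(data, max_elapsed_ms=None):
    """Summarize running applications from a parsed /ws/v1/cluster/apps response."""
    # Assumption: "apps" can be null (None) when no application matches.
    apps = (data.get("apps") or {}).get("app") or []
    summaries = []
    for app in apps:
        if app.get("state") != "RUNNING":
            continue
        elapsed = app.get("elapsedTime", 0)  # milliseconds
        summaries.append({
            "id": app.get("id"),
            "user": app.get("user"),
            "queue": app.get("queue"),
            "progress": app.get("progress"),
            "elapsed_ms": elapsed,
            # Flag the application only when a threshold was supplied.
            "long_running": max_elapsed_ms is not None and elapsed > max_elapsed_ms,
        })
    return summaries
```

With the sample response above and max_elapsed_ms=10000, the single application (elapsedTime 18009) would be flagged as long running.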

Category: Databases Administration, Development