microsoft / pai

Resource scheduling and cluster management for AI

Home Page: https://openpai.readthedocs.io

License: MIT License

Languages: Python 18.75%, Batchfile 0.13%, Shell 7.53%, Java 22.99%, JavaScript 44.71%, Dockerfile 1.06%, TypeScript 1.20%, Go 1.16%, Ruby 0.50%, SCSS 0.26%, Lua 0.33%, EJS 1.05%, Jinja 0.34%
Topics: kubernetes, resource-management, scheduling, machine-learning, tensorflow, cluster-manager, gpu, model-training, ai, artificial-intelligence

pai's Introduction

Open Platform for AI (OpenPAI)

Join the chat at https://gitter.im/Microsoft/pai

After the release of v1.8.1, OpenPAI entered stable mode with no major feature releases planned. To save maintenance effort, we changed the repo to read-only mode. For collaboration, please contact the repo admins directly.

With the release of v1.0, OpenPAI is switching to a more robust, more powerful, and lightweight architecture. OpenPAI is also becoming increasingly modular, so the platform can be easily customized and expanded to suit new needs. OpenPAI also provides many user-friendly AI features, making it easier for end users and administrators to complete daily AI tasks.

                                                                                                                                                                                     
[Architecture overview, top to bottom]
  Marketplace | Web Portal | VS Code | SDK
  API
  Services: User Authentication | User/Group Management | Storage Management |
            Cluster/Job Monitoring | Job Orchestration | Job Scheduling |
            Job Runtime | Job Error Analysis
  Kubernetes Cluster Management
  CPU / GPU / FPGA / InfiniBand


When to consider OpenPAI

  1. When your organization needs to share powerful AI computing resources (e.g., a GPU/FPGA farm) among teams.
  2. When your organization needs to share and reuse common AI assets such as models, data, and environments.
  3. When your organization needs an easy IT ops platform for AI.
  4. When you want to run a complete training pipeline in one place.

Why choose OpenPAI

The platform incorporates a mature design with a proven track record in Microsoft's large-scale production environments.

Support on-premises and easy to deploy

OpenPAI is a full-stack solution. It supports not only on-premises, hybrid, and public cloud deployment, but also single-box deployment for trial users.

Support popular AI frameworks and heterogeneous hardware

Pre-built Docker images are provided for popular AI frameworks. Heterogeneous hardware is easy to include, and distributed training, such as distributed TensorFlow, is supported.

Most complete solution and easy to extend

OpenPAI is a complete solution for deep learning: it supports virtual clusters, is compatible with the Kubernetes ecosystem, provides a complete training pipeline in one cluster, and more. OpenPAI is architected in a modular way: different modules can be plugged in as appropriate. The architecture overview above highlights the technical innovations of the platform.

Get started

OpenPAI manages computing resources and is optimized for deep learning. Through Docker technology, computing hardware is decoupled from software, so it is easy to run distributed jobs, switch between deep learning frameworks, or run other kinds of jobs in consistent environments.

As OpenPAI is a platform, there are typically two different roles:

  • Cluster users are the consumers of the cluster's computing resources. Depending on the deployment scenario, cluster users could be machine learning and deep learning researchers, data scientists, lab teachers, students, and so on.
  • Cluster administrators are the owners and maintainers of computing resources. The administrators are responsible for the deployment and availability of the cluster.

OpenPAI provides end-to-end manuals for both cluster users and administrators.

For cluster administrators

The admin manual is a comprehensive guide for cluster administrators. It covers (but is not limited to) the following topics:

  • Installation and upgrade. The installation is based on Kubespray, and the system requirements are listed here. OpenPAI provides an installation guide to facilitate the installation.

    If you are considering an upgrade from an older version to the latest v1.0.0, please refer to the table below for a brief comparison between v0.14.0 and v1.0.0. More details about upgrade considerations can be found in the upgrade guide.

                         v0.14.0                     v1.0.0
    Architecture         Kubernetes + Hadoop YARN    Kubernetes
    Scheduler            YARN Scheduler              HiveD / K8S default
    Job Orchestrating    YARN Framework Launcher     Framework Controller
    RESTful API          v1 + v2                     pure v2
    Storage              Team-wise storage plugin    PV/PVC storage sharing
    Marketplace          Marketplace v2              openpaimarketplace
    SDK                  Python                      JavaScript / TypeScript

    If there are any questions during deployment, please check the installation FAQs and troubleshooting first. If your question is not covered yet, refer to here to ask a question or submit an issue.

  • Basic cluster management. Through the web portal and a command-line tool, paictl, administrators can complete cluster management tasks, such as adding (or removing) nodes, monitoring nodes and services, and setting up storage and permission control.

  • Users and groups management. Administrators can manage users and groups easily.

  • Alerts management. Administrators can customize alert rules and actions.

  • Customization. Administrators can customize the cluster with plugins. Administrators can also upgrade (or downgrade) a single component (e.g. the rest server) to address customized application demands.

For cluster users

The user manual is a guide for cluster users, who can train and serve deep learning (and other) workloads on OpenPAI.

  • Job submission and monitoring. The quick start tutorial is a good starting point for learning how to train models on OpenPAI. More examples and support for multiple mainstream frameworks (out-of-the-box Docker images) can be found here. OpenPAI also provides support for good debuggability and advanced job functionalities.

  • Data management. Users can use cluster-provisioned storage and custom storage in their jobs. The cluster-provisioned storage is well integrated and easy to configure in a job (refer to here).

  • Collaboration and sharing. OpenPAI provides facilities for collaboration within teams and organizations. The cluster-provisioned storage is organized by teams (groups), and users can easily share their work (e.g. jobs) in the marketplace, where others can discover and reproduce (clone) it with one click.

Besides the web portal, OpenPAI provides a VS Code extension and a command-line tool (preview). The VS Code extension is a friendly, GUI-based client tool for OpenPAI, and it is highly recommended. As an extension of Visual Studio Code, it can submit jobs, simulate jobs locally, manage multiple OpenPAI environments, and so on.
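
Jobs can also be submitted directly through the REST server, which is convenient for scripting. The snippet below is a minimal sketch: the host address and token are placeholders, and the v2 submission endpoint (POST /api/v2/jobs with a YAML body) should be verified against your cluster's REST API documentation. The job.yaml it reads is in the OpenPAI job protocol format (a minimal protocol example appears under Standalone Components below).

```python
# Minimal sketch: submit a job config to an OpenPAI cluster via the REST API.
# The host, token, and endpoint path are assumptions -- check them against
# your cluster's rest-server documentation before relying on this.
import requests

PAI_HOST = "https://pai.example.com/rest-server"  # placeholder cluster address
TOKEN = "<api-token-from-webportal>"              # create one on your profile page

with open("job.yaml") as f:                       # OpenPAI job protocol config
    job_yaml = f.read()

resp = requests.post(
    f"{PAI_HOST}/api/v2/jobs",
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "text/yaml",
    },
    data=job_yaml,
)
resp.raise_for_status()
print("submitted, HTTP", resp.status_code)
```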

Standalone Components

With the v1.0.0 release, OpenPAI adopts a more modularized component design and reorganizes the code structure into one main repo together with seven standalone key component repos. pai is the main repo, and the seven component repos are:

  • hivedscheduler is a Kubernetes Scheduler Extender for multi-tenant GPU clusters, which provides various advantages over the standard k8s scheduler.
  • frameworkcontroller is built to orchestrate all kinds of applications on Kubernetes with a single controller.
  • openpai-protocol is the specification of the OpenPAI job protocol (a minimal example follows this list).
  • openpai-runtime provides the runtime support necessary for the OpenPAI protocol.
  • openpaisdk is a JavaScript SDK designed to help OpenPAI developers offer a more user-friendly experience.
  • openpaimarketplace is a service that stores examples and job templates. Users can use it from the webportal plugin to share their jobs or run and learn from others' shared jobs.
  • openpaivscode is a VSCode extension that makes it easy for users to connect to OpenPAI clusters, submit AI jobs, simulate jobs locally, and manage files in VSCode.
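
As a concrete illustration of openpai-protocol, here is a minimal protocolVersion-2 job config, built as a Python dict and serialized to YAML. The field names follow the published protocol specification; the image URI, resource numbers, and command are illustrative placeholders.

```python
# A minimal OpenPAI job protocol (protocolVersion 2) serialized to YAML.
# Image URI and resource figures are placeholders for illustration.
import yaml  # pip install pyyaml

job = {
    "protocolVersion": 2,
    "name": "hello_openpai",
    "type": "job",
    "prerequisites": [{
        "type": "dockerimage",
        "name": "default_image",
        "uri": "python:3.9",  # placeholder image
    }],
    "taskRoles": {
        "taskrole": {
            "instances": 1,
            "dockerImage": "default_image",
            "resourcePerInstance": {"cpu": 4, "memoryMB": 8192, "gpu": 1},
            "commands": ["python -c \"print('hello from OpenPAI')\""],
        },
    },
}

with open("job.yaml", "w") as f:
    yaml.safe_dump(job, f, sort_keys=False)
```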

Reference

Related Projects

Targeting openness and advancing state-of-the-art technology, Microsoft Research (MSR) and Microsoft Software Technology Center Asia (STCA) have also released a few other open source projects.

  • NNI: An open source AutoML toolkit for neural architecture search and hyper-parameter tuning. We encourage researchers and students to leverage these projects to accelerate AI development and research.
  • MMdnn: A comprehensive, cross-framework solution to convert, visualize, and diagnose deep neural network models. The "MM" in MMdnn stands for model management, and "dnn" is an acronym for deep neural network.
  • NeuronBlocks: An NLP deep learning modeling toolkit that helps engineers build DNN models as easily as playing with Lego. The main goal of this toolkit is to minimize the development cost of NLP deep neural network models, covering both the training and inference stages.
  • SPTAG: Space Partition Tree And Graph (SPTAG) is an open source library for large-scale approximate nearest neighbor search of vectors.

Get involved

How to contribute

Contributor License Agreement

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Call for contribution

We are working on a set of major feature improvements and refactorings; anyone who is familiar with these features is encouraged to join the design review and discussion in the corresponding issue tickets.

Who should consider contributing to OpenPAI

  • Folks who want to add support for other ML and DL frameworks
  • Folks who want to make OpenPAI a richer AI platform (e.g. support for more ML pipelines, hyperparameter tuning)
  • Folks who want to write tutorials/blog posts showing how to use OpenPAI to solve AI problems

Contributors

One key purpose of OpenPAI is to support the highly diversified requirements from academia and industry. OpenPAI is completely open: it is under the MIT license. This makes OpenPAI particularly attractive for evaluating various research ideas, including but not limited to its components.

OpenPAI operates in an open model. It was initially designed and developed by the Microsoft Research (MSR) and Microsoft Software Technology Center Asia (STCA) platform team. We are glad to have Peking University, Xi'an Jiaotong University, Zhejiang University, University of Science and Technology of China, and SHANGHAI INESA AI INNOVATION CENTER (SHAIIC) join us to develop the platform jointly. Contributions from academia and industry are all highly welcome.

pai's People

Contributors

abuccts, asakuri, binyang2014, debuggy, dependabot[bot], dongzhaoyu, fanyangcs, gerhut, hao1939, hrwhisper, hwuu, hzy46, mslichao, mzmssg, qinchen123, qyyy, scarlett2018, shishaochen, suiguoxin, sunqinzheng, wangcan0329, wangdian, xudifsd, yanjiegao, yanli2017, ydye, yitongfeng, yitongfeng-git, yiyione, yqwang-ms


pai's Issues

Etcd pod fails to restart

Once the etcd pod fails, it cannot restart.

Log from the pod on host .179:


2018-01-03 05:40:58.290096 I | etcdmain: etcd Version: 2.2.5
2018-01-03 05:40:58.290141 I | etcdmain: Git SHA: bc9ddf2
2018-01-03 05:40:58.290144 I | etcdmain: Go Version: go1.5.3
2018-01-03 05:40:58.290152 I | etcdmain: Go OS/Arch: linux/amd64
2018-01-03 05:40:58.290157 I | etcdmain: setting maximum number of CPUs to 6, total number of available CPUs is 6
2018-01-03 05:40:58.290272 I | etcdmain: listening for peers on http://10.xxx.xxx.179:2380
2018-01-03 05:40:58.290323 I | etcdmain: listening for client requests on http://10.xxx.xxx.179:4001
2018-01-03 05:40:58.291744 I | etcdmain: stopping listening for client requests on http://10.xxx.xxx.179:4001
2018-01-03 05:40:58.291763 I | etcdmain: stopping listening for peers on http://10.xxx.xxx.179:2380
2018-01-03 05:40:58.291769 C | etcdmain: member ea44bdd6a3978db0 has already been bootstrapped

Log from another pod, on host .180:

2018-01-03 05:30:02.791426 I | raft: raft.node: 23b7e242ba479253 elected leader 39fdf5236256c15f at term 257
2018-01-03 05:30:04.571334 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream MsgApp v2 (dial tcp 10.xxx.xxx.179:2380: i/o timeout)
2018-01-03 05:30:04.671816 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream Message (dial tcp 10.xxx.xxx.179:2380: i/o timeout)
2018-01-03 05:30:04.671957 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream MsgApp v2 (dial tcp 10.xxx.xxx.179:2380: getsockopt: connection refused)
2018-01-03 05:30:04.772428 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream Message (dial tcp 10.xxx.xxx.179:2380: getsockopt: connection refused)
2018-01-03 05:30:13.431006 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream Message (dial tcp 10.xxx.xxx.179:2380: i/o timeout)
2018-01-03 05:30:13.531899 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream Message (dial tcp 10.xxx.xxx.179:2380: getsockopt: connection refused)
2018-01-03 05:30:13.717462 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream MsgApp v2 (dial tcp 10.xxx.xxx.179:2380: i/o timeout)
2018-01-03 05:30:13.818156 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream MsgApp v2 (dial tcp 10.xxx.xxx.179:2380: getsockopt: connection refused)
2018-01-03 05:30:25.657743 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream Message (dial tcp 10.xxx.xxx.179:2380: i/o timeout)
2018-01-03 05:30:25.758471 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream Message (dial tcp 10.xxx.xxx.179:2380: getsockopt: connection refused)
2018-01-03 05:30:27.986369 W | rafthttp: the connection to peer ea44bdd6a3978db0 is unhealthy
2018-01-03 05:30:35.639031 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream MsgApp v2 (dial tcp 10.xxx.xxx.179:2380: i/o timeout)
2018-01-03 05:30:35.739738 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream Message (dial tcp 10.xxx.xxx.179:2380: i/o timeout)
2018-01-03 05:30:35.740829 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream MsgApp v2 (dial tcp 10.xxx.xxx.179:2380: getsockopt: connection refused)
2018-01-03 05:30:36.790189 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream Message (dial tcp 10.xxx.xxx.179:2380: getsockopt: connection refused)
2018-01-03 05:30:49.107142 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream MsgApp v2 (dial tcp 10.xxx.xxx.179:2380: i/o timeout)
2018-01-03 05:30:49.207806 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream MsgApp v2 (dial tcp 10.xxx.xxx.179:2380: getsockopt: connection refused)
2018-01-03 05:30:57.986633 W | rafthttp: the connection to peer ea44bdd6a3978db0 is unhealthy
2018-01-03 05:30:58.370051 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream Message (dial tcp 10.xxx.xxx.179:2380: i/o timeout)
2018-01-03 05:30:59.535251 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream Message (dial tcp 10.xxx.xxx.179:2380: getsockopt: connection refused)
2018-01-03 05:31:10.329848 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream MsgApp v2 (dial tcp 10.xxx.xxx.179:2380: i/o timeout)
2018-01-03 05:31:10.430787 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream MsgApp v2 (dial tcp 10.xxx.xxx.179:2380: getsockopt: connection refused)
2018-01-03 05:31:27.986816 W | rafthttp: the connection to peer ea44bdd6a3978db0 is unhealthy
2018-01-03 05:31:50.757424 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream MsgApp v2 (dial tcp 10.xxx.xxx.179:2380: i/o timeout)
2018-01-03 05:31:50.847790 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream Message (dial tcp 10.xxx.xxx.179:2380: i/o timeout)
2018-01-03 05:31:50.858000 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream MsgApp v2 (dial tcp 10.xxx.xxx.179:2380: getsockopt: connection refused)
2018-01-03 05:31:50.948343 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream Message (dial tcp 10.xxx.xxx.179:2380: getsockopt: connection refused)
2018-01-03 05:31:57.986936 W | rafthttp: the connection to peer ea44bdd6a3978db0 is unhealthy
2018-01-03 05:32:04.053460 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream Message (dial tcp 10.xxx.xxx.179:2380: i/o timeout)
2018-01-03 05:32:04.603290 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream Message (dial tcp 10.xxx.xxx.179:2380: getsockopt: connection refused)
2018-01-03 05:32:27.987055 W | rafthttp: the connection to peer ea44bdd6a3978db0 is unhealthy
2018-01-03 05:32:45.655897 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream Message (dial tcp 10.xxx.xxx.179:2380: i/o timeout)
2018-01-03 05:32:45.756450 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream Message (dial tcp 10.xxx.xxx.179:2380: getsockopt: connection refused)
2018-01-03 05:32:48.082272 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream MsgApp v2 (dial tcp 10.xxx.xxx.179:2380: i/o timeout)
2018-01-03 05:32:48.182966 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream MsgApp v2 (dial tcp 10.xxx.xxx.179:2380: getsockopt: connection refused)
2018-01-03 05:32:48.566593 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream Message (dial tcp 10.xxx.xxx.179:2380: i/o timeout)
2018-01-03 05:32:48.667225 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream Message (dial tcp 10.xxx.xxx.179:2380: getsockopt: connection refused)
2018-01-03 05:32:57.987190 W | rafthttp: the connection to peer ea44bdd6a3978db0 is unhealthy
2018-01-03 05:33:00.512185 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream Message (dial tcp 10.xxx.xxx.179:2380: i/o timeout)
2018-01-03 05:33:00.612883 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream Message (dial tcp 10.xxx.xxx.179:2380: getsockopt: connection refused)
2018-01-03 05:33:15.628634 E | rafthttp: failed to dial ea44bdd6a3978db0 on stream MsgApp v2 (dial tcp 10.xxx.xxx.179:2380: i/o timeout)
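
The fatal line in the first log is the key: the member's identity from a previous bootstrap is still recorded in its data directory, so restarting with an initial cluster state of "new" is rejected, and the peers on .180 keep failing to reach it. A common etcd v2 recovery (a sketch, not an official OpenPAI procedure) is to remove the stale member, wipe the failed node's data directory, and re-add the member so it rejoins cleanly. All addresses and paths below are placeholders.

```python
# Sketch of the usual etcd v2 recovery for "member has already been
# bootstrapped": remove the stale member, wipe its data dir on the failed
# node, then re-add it. Addresses and the data-dir path are placeholders;
# --peers is the etcd v2 era flag (newer etcdctl uses --endpoints).
import subprocess

HEALTHY = "http://10.0.0.180:4001"      # any healthy member (placeholder)
STALE_ID = "ea44bdd6a3978db0"           # ID from the log / `member list`
PEER_URL = "http://10.0.0.179:2380"     # peer URL of the failed member

def etcdctl(*args: str) -> str:
    out = subprocess.run(["etcdctl", "--peers", HEALTHY, *args],
                         check=True, capture_output=True, text=True)
    return out.stdout

print(etcdctl("member", "list"))                 # confirm the stale ID
etcdctl("member", "remove", STALE_ID)            # drop the old identity
etcdctl("member", "add", "etcd-179", PEER_URL)   # re-register the node
# On the failed node: delete the old etcd data directory (placeholder path,
# e.g. /var/etcd/data) before restarting the pod so it bootstraps fresh.
```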

Launcher cannot work with Java 9

Launcher cannot work with Java 9 due to Hibernate Validator's getJavaRelease.

Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
    at org.hibernate.validator.internal.util.Version.getJavaRelease(Version.java:36)
    at org.hibernate.validator.internal.engine.ConfigurationImpl.<init>(ConfigurationImpl.java:119)
    at org.hibernate.validator.internal.engine.ConfigurationImpl.<init>(ConfigurationImpl.java:95)
    at org.hibernate.validator.HibernateValidator.createGenericConfiguration(HibernateValidator.java:31)
    at javax.validation.Validation$GenericBootstrapImpl.configure(Validation.java:296)
    at javax.validation.Validation.buildDefaultValidatorFactory(Validation.java:103)
    at com.microsoft.frameworklauncher.common.ModelValidation.<clinit>(ModelValidation.java:32)
    at com.microsoft.frameworklauncher.common.model.FrameworkDescriptor.setTaskRoles(FrameworkDescriptor.java:103)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.base/java.lang.reflect.Method.invoke(Unknown Source)
    at com.fasterxml.jackson.databind.deser.impl.MethodProperty.set(MethodProperty.java:115)
    ... 50 more
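
The root cause is Hibernate Validator's parsing of the Java version string, which assumes the pre-Java-9 "1.x" scheme. A rough Python re-creation of the failing logic (illustrative only, not the actual Java source) makes the ArrayIndexOutOfBoundsException: 1 obvious:

```python
# Rough re-creation of the failing parse (illustrative, not the actual
# hibernate-validator source). Up to Java 8, java.version looks like
# "1.8.0_151" and the release number sits in the second dot-separated
# field; from Java 9 the string can be just "9", so that index is gone.
def get_java_release(java_version: str) -> int:
    parts = java_version.split(".")
    return int(parts[1])  # "1.8.0_151" -> 8; "9" -> IndexError

for v in ("1.8.0_151", "9"):
    try:
        print(v, "->", get_java_release(v))
    except IndexError:
        print(v, "-> IndexError, mirroring ArrayIndexOutOfBoundsException: 1")
```

Upgrading hibernate-validator to a release that understands the Java 9 version scheme, or pinning the launcher to Java 8, avoids the crash.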

Kubernetes may occupy PAI's service port

An issue we came across:

core@xxxxxxxxx:/datastorage/hdfs$ sudo lsof -i:50070
COMMAND     PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
kube-apis 85075 root   23u  IPv4 4573704      0t0  TCP 10.xxx.xxx.xxx:50070->10.xxx.xxx.xxx:4001 (ESTABLISHED)
etcd      85309 root   19u  IPv4 4599281      0t0  TCP 10.xxx.xxx.xxx:4001->10.xxx.xxx.xxx:50070 (ESTABLISHED)

Kubernetes may occupy our service ports. See the example above: 50070 is HDFS's namenode port. If this port is occupied, the namenode will fail to start up.
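
The collision is possible because 50070 falls inside the kernel's default ephemeral source-port range (32768-60999 on typical Linux systems), so an outbound connection, here kube-apiserver dialing etcd on 4001, can be handed 50070 as its source port before the namenode binds it. The sketch below (the port list is illustrative) flags service ports at risk; the usual mitigation is to reserve them with the net.ipv4.ip_local_reserved_ports sysctl.

```python
# Sketch: flag service ports that sit inside the kernel's ephemeral source
# port range and can therefore be grabbed by outgoing connections. The port
# list is illustrative; 50070 is the HDFS namenode port from the issue.
with open("/proc/sys/net/ipv4/ip_local_port_range") as f:
    low, high = map(int, f.read().split())

SERVICE_PORTS = {50070: "hdfs namenode", 50075: "hdfs datanode"}  # examples
for port, name in SERVICE_PORTS.items():
    if low <= port <= high:
        print(f"{name} port {port} is inside ephemeral range {low}-{high}; "
              "reserve it via net.ipv4.ip_local_reserved_ports")
```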

Too many application instances in Hadoop for the same job

When there are not enough resources for a job, the job fails to schedule and retries. On each retry, roughly every 30 minutes, Framework Launcher creates a new application instance in Hadoop, so the Hadoop application list fills up with duplicated failed-job information.
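
One common mitigation, sketched below (illustrative only, not the launcher's actual retry logic), is to cap the retry count and back off exponentially, so an unschedulable job stops creating a fresh YARN application every ~30 minutes:

```python
# Sketch of capped, exponential-backoff retries for job submission
# (illustrative; submit() is a hypothetical callable that raises on a
# scheduling failure). This bounds the number of YARN applications a
# single stuck job can create.
import time

def submit_with_backoff(submit, max_retries=5, base_delay_s=60):
    for attempt in range(max_retries):
        try:
            return submit()
        except RuntimeError:
            time.sleep(base_delay_s * 2 ** attempt)  # 60s, 120s, 240s, ...
    raise RuntimeError("giving up: cluster lacks resources for this job")
```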

Webportal / hardware page: Refine implementation

According to Yifan's suggestion, it is better to separate the table content into its own .ejs file, apart from the rest of the HTML page. This enhancement can make the code easier to maintain.

org.apache.hadoop.ipc.RemoteException in RM pod

RM cannot communicate with the datanode. Cleaning up HDFS and re-deploying doesn't work.
Here's the log of the RM pod (I replaced the host IP with xxx.xxx.xxx.xxx):

17/12/24 12:07:32 INFO resourcemanager.ResourceManager: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting ResourceManager
STARTUP_MSG:   host = master/xxx.xxx.xxx.xxx
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 2.7.2
STARTUP_MSG:   classpath = /usr/local/hadoop/etc/hadoop/:/usr/local/hadoop/etc/hadoop/:/usr/local/hadoop/etc/hadoop/:/usr/local/hadoop/share/hadoop/common/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/usr/local/hadoop/share/hadoop/common/lib/java-xmlbuilder-0.4.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-configuration-1.6.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-net-3.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jersey-server-1.9.jar:/usr/local/hadoop/share/hadoop/common/lib/jsr305-3.0.0.jar:/usr/local/hadoop/share/hadoop/common/lib/xz-1.0.jar:/usr/local/hadoop/share/hadoop/common/lib/guava-11.0.2.jar:/usr/local/hadoop/share/hadoop/common/lib/hamcrest-core-1.3.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-6.1.26.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-logging-1.1.3.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-util-6.1.26.jar:/usr/local/hadoop/share/hadoop/common/lib/xmlenc-0.52.jar:/usr/local/hadoop/share/hadoop/common/lib/api-asn1-api-1.0.0-M20.jar:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar:/usr/local/hadoop/share/hadoop/common/lib/hadoop-annotations-2.7.2.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/usr/local/hadoop/share/hadoop/common/lib/curator-recipes-2.7.1.jar:/usr/local/hadoop/share/hadoop/common/lib/stax-api-1.0-2.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-httpclient-3.1.jar:/usr/local/hadoop/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop/share/hadoop/common/lib/curator-framework-2.7.1.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-digester-1.8.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-codec-1.4.jar:/usr/local/hadoop/share/hadoop/common/lib/jsch-0.1.42.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-compress-1.4.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jets3t-0.9.0.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/jersey-core-1.9.jar:/usr/local/hadoop/share/hadoop/common/lib/servlet-api-2.5.jar:/usr/local/hadoop/share/hadoop/common/lib/hadoop-auth-2.7.2.jar:/usr/local/hadoop/share/hadoop/common/lib/apacheds-i18n-2.0.0-M15.jar:/usr/local/hadoop/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-math3-3.1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/httpcore-4.2.5.jar:/usr/local/hadoop/share/hadoop/common/lib/jettison-1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-xc-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/usr/local/hadoop/share/hadoop/common/lib/jersey-json-1.9.jar:/usr/local/hadoop/share/hadoop/common/lib/netty-3.6.2.Final.jar:/usr/local/hadoop/share/hadoop/common/lib/paranamer-2.3.jar:/usr/local/hadoop/share/hadoop/common/lib/activation-1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-lang-2.6.jar:/usr/local/hadoop/share/hadoop/common/lib/mockito-all-1.8.5.jar:/usr/local/hadoop/share/hadoop/common/lib/gson-2.2.4.jar:/usr/local/hadoop/share/hadoop/common/lib/zookeeper-3.4.6.jar:/usr/local/hadoop/share/hadoop/common/lib/avro-1.7.4.jar:/usr/local/hadoop/share/hadoop/common/lib/httpclient-4.2.5.jar:/usr/local/hadoop/share/hadoop/common/lib/slf4j-api-1.7.10.jar:/usr/local/hadoop/share/hadoop/common/lib/junit-4.11.jar:/usr/local/hadoop/share/hadoop/common/li
b/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-collections-3.2.2.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/curator-client-2.7.1.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-io-2.4.jar:/usr/local/hadoop/share/hadoop/common/lib/asm-3.2.jar:/usr/local/hadoop/share/hadoop/common/lib/jsp-api-2.1.jar:/usr/local/hadoop/share/hadoop/common/lib/api-util-1.0.0-M20.jar:/usr/local/hadoop/share/hadoop/common/lib/snappy-java-1.0.4.1.jar:/usr/local/hadoop/share/hadoop/common/lib/htrace-core-3.1.0-incubating.jar:/usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.2.jar:/usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.2-tests.jar:/usr/local/hadoop/share/hadoop/common/hadoop-nfs-2.7.2.jar:/usr/local/hadoop/share/hadoop/hdfs:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/netty-all-4.0.23.Final.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jersey-server-1.9.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jsr305-3.0.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/guava-11.0.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/leveldbjni-all-1.8.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-6.1.26.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-logging-1.1.3.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-util-6.1.26.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/xmlenc-0.52.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-codec-1.4.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jersey-core-1.9.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/servlet-api-2.5.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/xercesImpl-2.9.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/netty-3.6.2.Final.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-lang-2.6.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/xml-apis-1.3.04.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-io-2.4.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/asm-3.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/htrace-core-3.1.0-incubating.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-2.7.2-tests.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-2.7.2.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-nfs-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-server-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jsr305-3.0.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/xz-1.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/guava-11.0.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/leveldbjni-all-1.8.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jetty-6.1.26.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-jaxrs-1.9.13.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-logging-1.1.3.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jetty-util-6.1.26.jar:/usr/local/hadoop/share/hadoop/yarn/lib/stax-api-1.0-2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-cli-1.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-codec-1.4.jar:/usr/local/hadoop/share/hadoop/yarn/lib/zookeeper-3.4.6-tests.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-guice-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-compress-1.4.1.jar:/usr/local/hadoop/
share/hadoop/yarn/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/yarn/lib/javax.inject-1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-core-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/servlet-api-2.5.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jaxb-api-2.2.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/guice-servlet-3.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jettison-1.1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-xc-1.9.13.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jaxb-impl-2.2.3-1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-json-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/netty-3.6.2.Final.jar:/usr/local/hadoop/share/hadoop/yarn/lib/activation-1.1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-lang-2.6.jar:/usr/local/hadoop/share/hadoop/yarn/lib/guice-3.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/zookeeper-3.4.6.jar:/usr/local/hadoop/share/hadoop/yarn/lib/aopalliance-1.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-collections-3.2.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-client-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-io-2.4.jar:/usr/local/hadoop/share/hadoop/yarn/lib/asm-3.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-tests-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-sharedcachemanager-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-client-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-api-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-web-proxy-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-registry-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-common-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-common-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.7.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/jersey-server-1.9.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/xz-1.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/leveldbjni-all-1.8.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/hamcrest-core-1.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/hadoop-annotations-2.7.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/jersey-guice-1.9.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/commons-compress-1.4.1.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/javax.inject-1.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/jersey-core-1.9.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/guice-servlet-3.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/netty-3.6.2.Final.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/paranamer-2.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/guice-3.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/avro-1.7.4.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/aopalliance-1.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/junit-4.11.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/log4j-1.2.17.jar:/
usr/local/hadoop/share/hadoop/mapreduce/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/commons-io-2.4.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/asm-3.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-plugins-2.7.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.7.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.7.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-2.7.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.2-tests.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.7.2.jar:/usr/local/hadoop/contrib/capacity-scheduler/*.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-tests-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-sharedcachemanager-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-client-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-api-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-web-proxy-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-registry-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-common-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-common-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-server-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jsr305-3.0.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/xz-1.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/guava-11.0.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/leveldbjni-all-1.8.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jetty-6.1.26.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-jaxrs-1.9.13.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-logging-1.1.3.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jetty-util-6.1.26.jar:/usr/local/hadoop/share/hadoop/yarn/lib/stax-api-1.0-2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-cli-1.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-codec-1.4.jar:/usr/local/hadoop/share/hadoop/yarn/lib/zookeeper-3.4.6-tests.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-guice-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-compress-1.4.1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/yarn/lib/javax.inject-1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-core-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/servlet-api-2.5.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jaxb-api-2.2.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/guice-servlet-3.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jettison-1.1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-xc-1.9.13.jar:/usr/local/hadoop/share/hadoop/yarn/lib/ja
xb-impl-2.2.3-1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-json-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/netty-3.6.2.Final.jar:/usr/local/hadoop/share/hadoop/yarn/lib/activation-1.1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-lang-2.6.jar:/usr/local/hadoop/share/hadoop/yarn/lib/guice-3.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/zookeeper-3.4.6.jar:/usr/local/hadoop/share/hadoop/yarn/lib/aopalliance-1.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-collections-3.2.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-client-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-io-2.4.jar:/usr/local/hadoop/share/hadoop/yarn/lib/asm-3.2.jar:/usr/local/hadoop/etc/hadoop//rm-config/log4j.properties
STARTUP_MSG:   build = https://github.com/apache/hadoop.git -r b165c4fe8a74265c792ce23f546c64604acf0e41; compiled by 'root' on 2017-12-23T14:11Z
STARTUP_MSG:   java = 1.8.0_151
************************************************************/
17/12/24 12:07:32 INFO resourcemanager.ResourceManager: registered UNIX signal handlers for [TERM, HUP, INT]
17/12/24 12:07:32 INFO conf.Configuration: found resource core-site.xml at file:/usr/local/hadoop-2.7.2/etc/hadoop/core-site.xml
17/12/24 12:07:32 INFO security.Groups: clearing userToGroupsMap cache
17/12/24 12:07:32 INFO conf.Configuration: found resource yarn-site.xml at file:/usr/local/hadoop-2.7.2/etc/hadoop/yarn-site.xml
17/12/24 12:07:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.RMFatalEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMFatalEventDispatcher
17/12/24 12:07:32 INFO security.NMTokenSecretManagerInRM: NMTokenKeyRollingInterval: 86400000ms and NMTokenKeyActivationDelay: 900000ms
17/12/24 12:07:32 INFO security.RMContainerTokenSecretManager: ContainerTokenKeyRollingInterval: 86400000ms and ContainerTokenKeyActivationDelay: 900000ms
17/12/24 12:07:32 INFO security.AMRMTokenSecretManager: AMRMTokenKeyRollingInterval: 86400000ms and AMRMTokenKeyActivationDelay: 900000 ms
17/12/24 12:07:32 INFO recovery.RMStateStoreFactory: Using RMStateStore implementation - class org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
17/12/24 12:07:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStoreEventType for class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler
17/12/24 12:07:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.NodesListManagerEventType for class org.apache.hadoop.yarn.server.resourcemanager.NodesListManager
17/12/24 12:07:32 INFO resourcemanager.ResourceManager: Using Scheduler: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
17/12/24 12:07:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher
17/12/24 12:07:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher
17/12/24 12:07:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher
17/12/24 12:07:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher
17/12/24 12:07:32 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
17/12/24 12:07:32 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
17/12/24 12:07:32 INFO impl.MetricsSystemImpl: ResourceManager metrics system started
17/12/24 12:07:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.RMAppManagerEventType for class org.apache.hadoop.yarn.server.resourcemanager.RMAppManager
17/12/24 12:07:32 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncherEventType for class org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher
17/12/24 12:07:32 INFO resourcemanager.RMNMInfo: Registered RMNMInfo MBean
17/12/24 12:07:32 INFO security.YarnAuthorizationProvider: org.apache.hadoop.yarn.security.ConfiguredYarnAuthorizer is instiantiated.
17/12/24 12:07:32 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
17/12/24 12:07:32 INFO conf.Configuration: found resource capacity-scheduler.xml at file:/usr/local/hadoop-2.7.2/etc/hadoop/capacity-scheduler.xml
17/12/24 12:07:32 INFO capacity.CapacitySchedulerConfiguration: max alloc mb per queue for root is undefined
17/12/24 12:07:32 INFO capacity.CapacitySchedulerConfiguration: max alloc vcore per queue for root is undefined
17/12/24 12:07:32 INFO capacity.CapacitySchedulerConfiguration: max alloc GPU per queue for root is undefined
17/12/24 12:07:32 INFO capacity.ParentQueue: root, capacity=1.0, asboluteCapacity=1.0, maxCapacity=1.0, asboluteMaxCapacity=1.0, state=RUNNING, acls=ADMINISTER_QUEUE:*SUBMIT_APP:*, labels=*,
, reservationsContinueLooking=true
17/12/24 12:07:32 INFO capacity.ParentQueue: Initialized parent-queue root name=root, fullname=root
17/12/24 12:07:32 INFO capacity.CapacitySchedulerConfiguration: max alloc mb per queue for root.default is undefined
17/12/24 12:07:32 INFO capacity.CapacitySchedulerConfiguration: max alloc vcore per queue for root.default is undefined
17/12/24 12:07:32 INFO capacity.CapacitySchedulerConfiguration: max alloc GPU per queue for root.default is undefined
17/12/24 12:07:32 INFO capacity.LeafQueue: Initializing default
capacity = 0.9 [= (float) configuredCapacity / 100 ]
asboluteCapacity = 0.9 [= parentAbsoluteCapacity * capacity ]
maxCapacity = 1.0 [= configuredMaxCapacity ]
absoluteMaxCapacity = 1.0 [= 1.0 maximumCapacity undefined, (parentAbsoluteMaxCapacity * maximumCapacity) / 100 otherwise ]
userLimit = 100 [= configuredUserLimit ]
userLimitFactor = 1.0 [= configuredUserLimitFactor ]
maxApplications = 9000 [= configuredMaximumSystemApplicationsPerQueue or (int)(configuredMaximumSystemApplications * absoluteCapacity)]
maxApplicationsPerUser = 9000 [= (int)(maxApplications * (userLimit / 100.0f) * userLimitFactor) ]
usedCapacity = 0.0 [= usedResourcesMemory / (clusterResourceMemory * absoluteCapacity)]
absoluteUsedCapacity = 0.0 [= usedResourcesMemory / clusterResourceMemory]
maxAMResourcePerQueuePercent = 0.9 [= configuredMaximumAMResourcePercent ]
minimumAllocationFactor = 1.0 [= (float)(maximumAllocationMemory - minimumAllocationMemory) / maximumAllocationMemory ]
maximumAllocation = <memory:32768, vCores:32, GPUs:8, GPUAttribute:0> [= configuredMaxAllocation ]
numContainers = 0 [= currentNumContainers ]
state = RUNNING [= configuredState ]
acls = ADMINISTER_QUEUE:*SUBMIT_APP:* [= configuredAcls ]
nodeLocalityDelay = 40
labels=persistent,
nodeLocalityDelay = 40
reservationsContinueLooking = true
preemptionDisabled = true

17/12/24 12:07:32 INFO capacity.CapacityScheduler: Initialized queue: default: capacity=0.9, absoluteCapacity=0.9, usedResources=<memory:0, vCores:0, GPUs:0, GPUAttribute:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0
17/12/24 12:07:32 INFO capacity.CapacitySchedulerConfiguration: max alloc mb per queue for root.dev is undefined
17/12/24 12:07:32 INFO capacity.CapacitySchedulerConfiguration: max alloc vcore per queue for root.dev is undefined
17/12/24 12:07:32 INFO capacity.CapacitySchedulerConfiguration: max alloc GPU per queue for root.dev is undefined
17/12/24 12:07:32 INFO capacity.LeafQueue: Initializing dev
capacity = 0.1 [= (float) configuredCapacity / 100 ]
asboluteCapacity = 0.1 [= parentAbsoluteCapacity * capacity ]
maxCapacity = 1.0 [= configuredMaxCapacity ]
absoluteMaxCapacity = 1.0 [= 1.0 maximumCapacity undefined, (parentAbsoluteMaxCapacity * maximumCapacity) / 100 otherwise ]
userLimit = 100 [= configuredUserLimit ]
userLimitFactor = 1.0 [= configuredUserLimitFactor ]
maxApplications = 1000 [= configuredMaximumSystemApplicationsPerQueue or (int)(configuredMaximumSystemApplications * absoluteCapacity)]
maxApplicationsPerUser = 1000 [= (int)(maxApplications * (userLimit / 100.0f) * userLimitFactor) ]
usedCapacity = 0.0 [= usedResourcesMemory / (clusterResourceMemory * absoluteCapacity)]
absoluteUsedCapacity = 0.0 [= usedResourcesMemory / clusterResourceMemory]
maxAMResourcePerQueuePercent = 1.0 [= configuredMaximumAMResourcePercent ]
minimumAllocationFactor = 1.0 [= (float)(maximumAllocationMemory - minimumAllocationMemory) / maximumAllocationMemory ]
maximumAllocation = <memory:32768, vCores:32, GPUs:8, GPUAttribute:0> [= configuredMaxAllocation ]
numContainers = 0 [= currentNumContainers ]
state = RUNNING [= configuredState ]
acls = ADMINISTER_QUEUE: SUBMIT_APP:  [= configuredAcls ]
nodeLocalityDelay = 40
labels=persistent,
nodeLocalityDelay = 40
reservationsContinueLooking = true
preemptionDisabled = true

17/12/24 12:07:32 INFO capacity.CapacityScheduler: Initialized queue: dev: capacity=0.1, absoluteCapacity=0.1, usedResources=<memory:0, vCores:0, GPUs:0, GPUAttribute:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0
17/12/24 12:07:32 INFO capacity.CapacityScheduler: Initialized queue: root: numChildQueue= 2, capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0, GPUs:0, GPUAttribute:0>usedCapacity=0.0, numApps=0, numContainers=0
17/12/24 12:07:32 INFO capacity.CapacityScheduler: Initialized root queue root: numChildQueue= 2, capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0, GPUs:0, GPUAttribute:0>usedCapacity=0.0, numApps=0, numContainers=0
17/12/24 12:07:32 INFO capacity.CapacityScheduler: Initialized queue mappings, override: false
17/12/24 12:07:32 INFO capacity.CapacityScheduler: Initialized CapacityScheduler with calculator=class org.apache.hadoop.yarn.util.resource.DominantResourceCalculator, minimumAllocation=<<memory:1024, vCores:1, GPUs:0, GPUAttribute:0>>, maximumAllocation=<<memory:32768, vCores:32, GPUs:8, GPUAttribute:0>>, asynchronousScheduling=false, asyncScheduleInterval=5ms
17/12/24 12:07:32 INFO metrics.SystemMetricsPublisher: YARN system metrics publishing service is not enabled
17/12/24 12:07:32 INFO resourcemanager.ResourceManager: Transitioning to active state
17/12/24 12:07:32 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
17/12/24 12:07:32 INFO zookeeper.ZooKeeper: Client environment:host.name=master
17/12/24 12:07:32 INFO zookeeper.ZooKeeper: Client environment:java.version=1.8.0_151
17/12/24 12:07:32 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
17/12/24 12:07:32 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre
17/12/24 12:07:32 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/usr/local/hadoop/etc/hadoop/:/usr/local/hadoop/etc/hadoop/:/usr/local/hadoop/etc/hadoop/:/usr/local/hadoop/share/hadoop/common/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/usr/local/hadoop/share/hadoop/common/lib/java-xmlbuilder-0.4.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-configuration-1.6.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-net-3.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jersey-server-1.9.jar:/usr/local/hadoop/share/hadoop/common/lib/jsr305-3.0.0.jar:/usr/local/hadoop/share/hadoop/common/lib/xz-1.0.jar:/usr/local/hadoop/share/hadoop/common/lib/guava-11.0.2.jar:/usr/local/hadoop/share/hadoop/common/lib/hamcrest-core-1.3.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-6.1.26.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-logging-1.1.3.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-util-6.1.26.jar:/usr/local/hadoop/share/hadoop/common/lib/xmlenc-0.52.jar:/usr/local/hadoop/share/hadoop/common/lib/api-asn1-api-1.0.0-M20.jar:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar:/usr/local/hadoop/share/hadoop/common/lib/hadoop-annotations-2.7.2.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/usr/local/hadoop/share/hadoop/common/lib/curator-recipes-2.7.1.jar:/usr/local/hadoop/share/hadoop/common/lib/stax-api-1.0-2.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-httpclient-3.1.jar:/usr/local/hadoop/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop/share/hadoop/common/lib/curator-framework-2.7.1.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-digester-1.8.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-codec-1.4.jar:/usr/local/hadoop/share/hadoop/common/lib/jsch-0.1.42.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-compress-1.4.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jets3t-0.9.0.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/jersey-core-1.9.jar:/usr/local/hadoop/share/hadoop/common/lib/servlet-api-2.5.jar:/usr/local/hadoop/share/hadoop/common/lib/hadoop-auth-2.7.2.jar:/usr/local/hadoop/share/hadoop/common/lib/apacheds-i18n-2.0.0-M15.jar:/usr/local/hadoop/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-math3-3.1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/httpcore-4.2.5.jar:/usr/local/hadoop/share/hadoop/common/lib/jettison-1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-xc-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/usr/local/hadoop/share/hadoop/common/lib/jersey-json-1.9.jar:/usr/local/hadoop/share/hadoop/common/lib/netty-3.6.2.Final.jar:/usr/local/hadoop/share/hadoop/common/lib/paranamer-2.3.jar:/usr/local/hadoop/share/hadoop/common/lib/activation-1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-lang-2.6.jar:/usr/local/hadoop/share/hadoop/common/lib/mockito-all-1.8.5.jar:/usr/local/hadoop/share/hadoop/common/lib/gson-2.2.4.jar:/usr/local/hadoop/share/hadoop/common/lib/zookeeper-3.4.6.jar:/usr/local/hadoop/share/hadoop/common/lib/avro-1.7.4.jar:/usr/local/hadoop/share/hadoop/common/lib/httpclient-4.2.5.jar:/usr/local/hadoop/share/hadoop/common/lib/slf4j-api-1.7.10.jar:/usr/local/hadoop/share/hadoop/common/lib/jun
it-4.11.jar:/usr/local/hadoop/share/hadoop/common/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-collections-3.2.2.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/curator-client-2.7.1.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-io-2.4.jar:/usr/local/hadoop/share/hadoop/common/lib/asm-3.2.jar:/usr/local/hadoop/share/hadoop/common/lib/jsp-api-2.1.jar:/usr/local/hadoop/share/hadoop/common/lib/api-util-1.0.0-M20.jar:/usr/local/hadoop/share/hadoop/common/lib/snappy-java-1.0.4.1.jar:/usr/local/hadoop/share/hadoop/common/lib/htrace-core-3.1.0-incubating.jar:/usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.2.jar:/usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.2-tests.jar:/usr/local/hadoop/share/hadoop/common/hadoop-nfs-2.7.2.jar:/usr/local/hadoop/share/hadoop/hdfs:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/netty-all-4.0.23.Final.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jersey-server-1.9.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jsr305-3.0.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/guava-11.0.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/leveldbjni-all-1.8.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-6.1.26.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-logging-1.1.3.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-util-6.1.26.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/xmlenc-0.52.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-codec-1.4.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jersey-core-1.9.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/servlet-api-2.5.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/xercesImpl-2.9.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/netty-3.6.2.Final.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-lang-2.6.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/xml-apis-1.3.04.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-io-2.4.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/asm-3.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/htrace-core-3.1.0-incubating.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-2.7.2-tests.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-2.7.2.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-nfs-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-server-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jsr305-3.0.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/xz-1.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/guava-11.0.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/leveldbjni-all-1.8.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jetty-6.1.26.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-jaxrs-1.9.13.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-logging-1.1.3.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jetty-util-6.1.26.jar:/usr/local/hadoop/share/hadoop/yarn/lib/stax-api-1.0-2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-cli-1.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-codec-1.4.jar:/usr/local/hadoop/share/hadoop/yarn/lib/zookeeper-3.4.6-tests.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-guice-1.9.jar:/usr/local/hadoop/share/hadoop/ya
rn/lib/commons-compress-1.4.1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/yarn/lib/javax.inject-1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-core-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/servlet-api-2.5.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jaxb-api-2.2.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/guice-servlet-3.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jettison-1.1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-xc-1.9.13.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jaxb-impl-2.2.3-1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-json-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/netty-3.6.2.Final.jar:/usr/local/hadoop/share/hadoop/yarn/lib/activation-1.1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-lang-2.6.jar:/usr/local/hadoop/share/hadoop/yarn/lib/guice-3.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/zookeeper-3.4.6.jar:/usr/local/hadoop/share/hadoop/yarn/lib/aopalliance-1.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-collections-3.2.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-client-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-io-2.4.jar:/usr/local/hadoop/share/hadoop/yarn/lib/asm-3.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-tests-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-sharedcachemanager-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-client-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-api-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-web-proxy-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-registry-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-common-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-common-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.7.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/jersey-server-1.9.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/xz-1.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/leveldbjni-all-1.8.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/hamcrest-core-1.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/hadoop-annotations-2.7.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/jersey-guice-1.9.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/commons-compress-1.4.1.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/javax.inject-1.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/jersey-core-1.9.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/guice-servlet-3.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/netty-3.6.2.Final.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/paranamer-2.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/guice-3.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/avro-1.7.4.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/aopalliance-1.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/junit-4.11.jar:/usr/local/
hadoop/share/hadoop/mapreduce/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/commons-io-2.4.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/asm-3.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-plugins-2.7.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.7.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.7.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-2.7.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.2-tests.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.7.2.jar:/usr/local/hadoop/contrib/capacity-scheduler/*.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-tests-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-sharedcachemanager-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-client-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-api-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-web-proxy-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-registry-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-common-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-common-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.7.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-server-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jsr305-3.0.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/xz-1.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/guava-11.0.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/leveldbjni-all-1.8.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jetty-6.1.26.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-jaxrs-1.9.13.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-logging-1.1.3.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jetty-util-6.1.26.jar:/usr/local/hadoop/share/hadoop/yarn/lib/stax-api-1.0-2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-cli-1.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-codec-1.4.jar:/usr/local/hadoop/share/hadoop/yarn/lib/zookeeper-3.4.6-tests.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-guice-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-compress-1.4.1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/yarn/lib/javax.inject-1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-core-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/servlet-api-2.5.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jaxb-api-2.2.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/guice-servlet-3.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jettison-1.1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-xc-1
.9.13.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jaxb-impl-2.2.3-1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-json-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/netty-3.6.2.Final.jar:/usr/local/hadoop/share/hadoop/yarn/lib/activation-1.1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-lang-2.6.jar:/usr/local/hadoop/share/hadoop/yarn/lib/guice-3.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/zookeeper-3.4.6.jar:/usr/local/hadoop/share/hadoop/yarn/lib/aopalliance-1.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-collections-3.2.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-client-1.9.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-io-2.4.jar:/usr/local/hadoop/share/hadoop/yarn/lib/asm-3.2.jar:/usr/local/hadoop/etc/hadoop//rm-config/log4j.properties
17/12/24 12:07:32 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/local/hadoop/lib/native
17/12/24 12:07:32 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
17/12/24 12:07:32 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
17/12/24 12:07:32 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
17/12/24 12:07:32 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
17/12/24 12:07:32 INFO zookeeper.ZooKeeper: Client environment:os.version=4.4.0-104-generic
17/12/24 12:07:32 INFO zookeeper.ZooKeeper: Client environment:user.name=root
17/12/24 12:07:32 INFO zookeeper.ZooKeeper: Client environment:user.home=/root
17/12/24 12:07:32 INFO zookeeper.ZooKeeper: Client environment:user.dir=/
17/12/24 12:07:32 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=xxx.xxx.xxx.xxx:2181 sessionTimeout=10000 watcher=null
17/12/24 12:07:32 INFO zookeeper.ClientCnxn: Opening socket connection to server xxx.xxx.xxx.xxx/xxx.xxx.xxx.xxx:2181. Will not attempt to authenticate using SASL (unknown error)
17/12/24 12:07:32 INFO recovery.ZKRMStateStore: Created new ZK connection
17/12/24 12:07:32 INFO zookeeper.ClientCnxn: Socket connection established to xxx.xxx.xxx.xxx/xxx.xxx.xxx.xxx:2181, initiating session
17/12/24 12:07:32 INFO zookeeper.ClientCnxn: Session establishment complete on server xxx.xxx.xxx.xxx/xxx.xxx.xxx.xxx:2181, sessionid = 0x160886b7bd20001, negotiated timeout = 10000
17/12/24 12:07:32 INFO recovery.ZKRMStateStore: Fencing node /rmstore/ZKRMStateRoot/RM_ZK_FENCING_LOCK doesn't exist to delete
17/12/24 12:07:33 INFO resourcemanager.ResourceManager: Recovery started
17/12/24 12:07:33 INFO recovery.ZKRMStateStore: Watcher event type: None with state:SyncConnected for path:null for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
17/12/24 12:07:33 INFO recovery.ZKRMStateStore: ZKRMStateStore Session connected
17/12/24 12:07:33 INFO recovery.ZKRMStateStore: ZooKeeper sync operation succeeded. path: /rmstore/ZKRMStateRoot
17/12/24 12:07:33 INFO recovery.RMStateStore: Loaded RM state version info 1.2
17/12/24 12:07:33 INFO security.RMDelegationTokenSecretManager: recovering RMDelegationTokenSecretManager.
17/12/24 12:07:33 INFO resourcemanager.RMAppManager: Recovering 0 applications
17/12/24 12:07:33 INFO resourcemanager.ResourceManager: Recovery ended
17/12/24 12:07:33 INFO security.RMContainerTokenSecretManager: Rolling master-key for container-tokens
17/12/24 12:07:33 INFO security.NMTokenSecretManagerInRM: Rolling master-key for nm-tokens
17/12/24 12:07:33 INFO delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
17/12/24 12:07:33 INFO security.RMDelegationTokenSecretManager: storing master key with keyID 3
17/12/24 12:07:33 INFO recovery.RMStateStore: Storing RMDTMasterKey.
17/12/24 12:07:33 INFO delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
17/12/24 12:07:33 INFO delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
17/12/24 12:07:33 INFO security.RMDelegationTokenSecretManager: storing master key with keyID 4
17/12/24 12:07:33 INFO recovery.RMStateStore: Storing RMDTMasterKey.
17/12/24 12:07:33 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /yarn/node-labels/nodelabel.mirror.writing could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and no node(s) are excluded in this operation.
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1552)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)

	at org.apache.hadoop.ipc.Client.call(Client.java:1475)
	at org.apache.hadoop.ipc.Client.call(Client.java:1412)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
	at com.sun.proxy.$Proxy12.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy13.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1459)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1255)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
17/12/24 12:07:33 INFO service.AbstractService: Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in state STARTED; cause: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /yarn/node-labels/nodelabel.mirror.writing could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and no node(s) are excluded in this operation.
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1552)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)

org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /yarn/node-labels/nodelabel.mirror.writing could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and no node(s) are excluded in this operation.
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1552)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)

	at org.apache.hadoop.ipc.Client.call(Client.java:1475)
	at org.apache.hadoop.ipc.Client.call(Client.java:1412)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
	at com.sun.proxy.$Proxy12.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy13.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1459)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1255)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
17/12/24 12:07:33 WARN service.AbstractService: When stopping the service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager : java.lang.NullPointerException
java.lang.NullPointerException
	at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.stopDispatcher(CommonNodeLabelsManager.java:251)
	at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:257)
	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
	at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
	at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1041)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1178)
17/12/24 12:07:33 INFO service.AbstractService: Service RMActiveServices failed in state STARTED; cause: org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /yarn/node-labels/nodelabel.mirror.writing could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and no node(s) are excluded in this operation.
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1552)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)

org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /yarn/node-labels/nodelabel.mirror.writing could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and no node(s) are excluded in this operation.
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1552)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)

	at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1041)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1178)
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /yarn/node-labels/nodelabel.mirror.writing could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and no node(s) are excluded in this operation.
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1552)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)

	at org.apache.hadoop.ipc.Client.call(Client.java:1475)
	at org.apache.hadoop.ipc.Client.call(Client.java:1412)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
	at com.sun.proxy.$Proxy12.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy13.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1459)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1255)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
17/12/24 12:07:33 INFO util.AbstractLivelinessMonitor: AMLivelinessMonitor thread interrupted
17/12/24 12:07:33 INFO util.AbstractLivelinessMonitor: AMLivelinessMonitor thread interrupted
17/12/24 12:07:33 INFO util.AbstractLivelinessMonitor: org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.ContainerAllocationExpirer thread interrupted
17/12/24 12:07:33 ERROR delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
17/12/24 12:07:33 INFO impl.MetricsSystemImpl: Stopping ResourceManager metrics system...
17/12/24 12:07:33 INFO impl.MetricsSystemImpl: ResourceManager metrics system stopped.
17/12/24 12:07:33 INFO impl.MetricsSystemImpl: ResourceManager metrics system shutdown complete.
17/12/24 12:07:33 INFO event.AsyncDispatcher: AsyncDispatcher is draining to stop, igonring any new events.
17/12/24 12:07:33 INFO zookeeper.ZooKeeper: Session: 0x160886b7bd20001 closed
17/12/24 12:07:33 INFO zookeeper.ClientCnxn: EventThread shut down
17/12/24 12:07:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.RMFatalEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMFatalEventDispatcher
17/12/24 12:07:33 INFO security.NMTokenSecretManagerInRM: NMTokenKeyRollingInterval: 86400000ms and NMTokenKeyActivationDelay: 900000ms
17/12/24 12:07:33 INFO security.RMContainerTokenSecretManager: ContainerTokenKeyRollingInterval: 86400000ms and ContainerTokenKeyActivationDelay: 900000ms
17/12/24 12:07:33 INFO security.AMRMTokenSecretManager: AMRMTokenKeyRollingInterval: 86400000ms and AMRMTokenKeyActivationDelay: 900000 ms
17/12/24 12:07:33 INFO recovery.RMStateStoreFactory: Using RMStateStore implementation - class org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
17/12/24 12:07:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStoreEventType for class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler
17/12/24 12:07:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.NodesListManagerEventType for class org.apache.hadoop.yarn.server.resourcemanager.NodesListManager
17/12/24 12:07:33 INFO resourcemanager.ResourceManager: Using Scheduler: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
17/12/24 12:07:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher
17/12/24 12:07:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher
17/12/24 12:07:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher
17/12/24 12:07:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType for class org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher
17/12/24 12:07:33 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
17/12/24 12:07:33 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
17/12/24 12:07:33 INFO impl.MetricsSystemImpl: ResourceManager metrics system started
17/12/24 12:07:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.RMAppManagerEventType for class org.apache.hadoop.yarn.server.resourcemanager.RMAppManager
17/12/24 12:07:33 INFO event.AsyncDispatcher: Registering class org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncherEventType for class org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher
17/12/24 12:07:33 WARN util.MBeans: Failed to register MBean "Hadoop:service=ResourceManager,name=RMNMInfo": Instance already exists.
17/12/24 12:07:33 INFO resourcemanager.RMNMInfo: Registered RMNMInfo MBean
17/12/24 12:07:33 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
17/12/24 12:07:33 INFO conf.Configuration: found resource capacity-scheduler.xml at file:/usr/local/hadoop-2.7.2/etc/hadoop/capacity-scheduler.xml
17/12/24 12:07:33 INFO capacity.CapacitySchedulerConfiguration: max alloc mb per queue for root is undefined
17/12/24 12:07:33 INFO capacity.CapacitySchedulerConfiguration: max alloc vcore per queue for root is undefined
17/12/24 12:07:33 INFO capacity.CapacitySchedulerConfiguration: max alloc GPU per queue for root is undefined
17/12/24 12:07:33 INFO capacity.ParentQueue: root, capacity=1.0, asboluteCapacity=1.0, maxCapacity=1.0, asboluteMaxCapacity=1.0, state=RUNNING, acls=ADMINISTER_QUEUE:*SUBMIT_APP:*, labels=*,
, reservationsContinueLooking=true
17/12/24 12:07:33 INFO capacity.ParentQueue: Initialized parent-queue root name=root, fullname=root
17/12/24 12:07:33 INFO capacity.CapacitySchedulerConfiguration: max alloc mb per queue for root.default is undefined
17/12/24 12:07:33 INFO capacity.CapacitySchedulerConfiguration: max alloc vcore per queue for root.default is undefined
17/12/24 12:07:33 INFO capacity.CapacitySchedulerConfiguration: max alloc GPU per queue for root.default is undefined
17/12/24 12:07:33 INFO capacity.LeafQueue: Initializing default
capacity = 0.9 [= (float) configuredCapacity / 100 ]
asboluteCapacity = 0.9 [= parentAbsoluteCapacity * capacity ]
maxCapacity = 1.0 [= configuredMaxCapacity ]
absoluteMaxCapacity = 1.0 [= 1.0 maximumCapacity undefined, (parentAbsoluteMaxCapacity * maximumCapacity) / 100 otherwise ]
userLimit = 100 [= configuredUserLimit ]
userLimitFactor = 1.0 [= configuredUserLimitFactor ]
maxApplications = 9000 [= configuredMaximumSystemApplicationsPerQueue or (int)(configuredMaximumSystemApplications * absoluteCapacity)]
maxApplicationsPerUser = 9000 [= (int)(maxApplications * (userLimit / 100.0f) * userLimitFactor) ]
usedCapacity = 0.0 [= usedResourcesMemory / (clusterResourceMemory * absoluteCapacity)]
absoluteUsedCapacity = 0.0 [= usedResourcesMemory / clusterResourceMemory]
maxAMResourcePerQueuePercent = 0.9 [= configuredMaximumAMResourcePercent ]
minimumAllocationFactor = 1.0 [= (float)(maximumAllocationMemory - minimumAllocationMemory) / maximumAllocationMemory ]
maximumAllocation = <memory:32768, vCores:32, GPUs:8, GPUAttribute:0> [= configuredMaxAllocation ]
numContainers = 0 [= currentNumContainers ]
state = RUNNING [= configuredState ]
acls = ADMINISTER_QUEUE:*SUBMIT_APP:* [= configuredAcls ]
nodeLocalityDelay = 40
labels=persistent,
nodeLocalityDelay = 40
reservationsContinueLooking = true
preemptionDisabled = true

17/12/24 12:07:33 INFO capacity.CapacityScheduler: Initialized queue: default: capacity=0.9, absoluteCapacity=0.9, usedResources=<memory:0, vCores:0, GPUs:0, GPUAttribute:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0
17/12/24 12:07:33 INFO capacity.CapacitySchedulerConfiguration: max alloc mb per queue for root.dev is undefined
17/12/24 12:07:33 INFO capacity.CapacitySchedulerConfiguration: max alloc vcore per queue for root.dev is undefined
17/12/24 12:07:33 INFO capacity.CapacitySchedulerConfiguration: max alloc GPU per queue for root.dev is undefined
17/12/24 12:07:33 INFO capacity.LeafQueue: Initializing dev
capacity = 0.1 [= (float) configuredCapacity / 100 ]
asboluteCapacity = 0.1 [= parentAbsoluteCapacity * capacity ]
maxCapacity = 1.0 [= configuredMaxCapacity ]
absoluteMaxCapacity = 1.0 [= 1.0 maximumCapacity undefined, (parentAbsoluteMaxCapacity * maximumCapacity) / 100 otherwise ]
userLimit = 100 [= configuredUserLimit ]
userLimitFactor = 1.0 [= configuredUserLimitFactor ]
maxApplications = 1000 [= configuredMaximumSystemApplicationsPerQueue or (int)(configuredMaximumSystemApplications * absoluteCapacity)]
maxApplicationsPerUser = 1000 [= (int)(maxApplications * (userLimit / 100.0f) * userLimitFactor) ]
usedCapacity = 0.0 [= usedResourcesMemory / (clusterResourceMemory * absoluteCapacity)]
absoluteUsedCapacity = 0.0 [= usedResourcesMemory / clusterResourceMemory]
maxAMResourcePerQueuePercent = 1.0 [= configuredMaximumAMResourcePercent ]
minimumAllocationFactor = 1.0 [= (float)(maximumAllocationMemory - minimumAllocationMemory) / maximumAllocationMemory ]
maximumAllocation = <memory:32768, vCores:32, GPUs:8, GPUAttribute:0> [= configuredMaxAllocation ]
numContainers = 0 [= currentNumContainers ]
state = RUNNING [= configuredState ]
acls = ADMINISTER_QUEUE: SUBMIT_APP:  [= configuredAcls ]
nodeLocalityDelay = 40
labels=persistent,
nodeLocalityDelay = 40
reservationsContinueLooking = true
preemptionDisabled = true

17/12/24 12:07:33 INFO capacity.CapacityScheduler: Initialized queue: dev: capacity=0.1, absoluteCapacity=0.1, usedResources=<memory:0, vCores:0, GPUs:0, GPUAttribute:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0
17/12/24 12:07:33 INFO capacity.CapacityScheduler: Initialized queue: root: numChildQueue= 2, capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0, GPUs:0, GPUAttribute:0>usedCapacity=0.0, numApps=0, numContainers=0
17/12/24 12:07:33 INFO capacity.CapacityScheduler: Initialized root queue root: numChildQueue= 2, capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0, GPUs:0, GPUAttribute:0>usedCapacity=0.0, numApps=0, numContainers=0
17/12/24 12:07:33 INFO capacity.CapacityScheduler: Initialized queue mappings, override: false
17/12/24 12:07:33 INFO capacity.CapacityScheduler: Initialized CapacityScheduler with calculator=class org.apache.hadoop.yarn.util.resource.DominantResourceCalculator, minimumAllocation=<<memory:1024, vCores:1, GPUs:0, GPUAttribute:0>>, maximumAllocation=<<memory:32768, vCores:32, GPUs:8, GPUAttribute:0>>, asynchronousScheduling=false, asyncScheduleInterval=5ms
17/12/24 12:07:33 INFO service.AbstractService: Service ResourceManager failed in state STARTED; cause: org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /yarn/node-labels/nodelabel.mirror.writing could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and no node(s) are excluded in this operation.
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1552)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)

org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /yarn/node-labels/nodelabel.mirror.writing could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and no node(s) are excluded in this operation.
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1552)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)

	at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1041)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1178)
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /yarn/node-labels/nodelabel.mirror.writing could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and no node(s) are excluded in this operation.
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1552)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)

	at org.apache.hadoop.ipc.Client.call(Client.java:1475)
	at org.apache.hadoop.ipc.Client.call(Client.java:1412)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
	at com.sun.proxy.$Proxy12.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy13.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1459)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1255)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
17/12/24 12:07:33 INFO resourcemanager.ResourceManager: Transitioning to standby state
17/12/24 12:07:33 INFO resourcemanager.ResourceManager: Transitioned to standby state
17/12/24 12:07:33 FATAL resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /yarn/node-labels/nodelabel.mirror.writing could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and no node(s) are excluded in this operation.
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1552)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)

	at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1041)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1178)
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /yarn/node-labels/nodelabel.mirror.writing could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and no node(s) are excluded in this operation.
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1552)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)

	at org.apache.hadoop.ipc.Client.call(Client.java:1475)
	at org.apache.hadoop.ipc.Client.call(Client.java:1412)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
	at com.sun.proxy.$Proxy12.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy13.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1459)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1255)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
17/12/24 12:07:33 ERROR hdfs.DFSClient: Failed to close inode 16395
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /yarn/node-labels/nodelabel.mirror.writing could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and no node(s) are excluded in this operation.
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1552)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:724)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)

	at org.apache.hadoop.ipc.Client.call(Client.java:1475)
	at org.apache.hadoop.ipc.Client.call(Client.java:1412)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
	at com.sun.proxy.$Proxy12.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy13.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1459)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1255)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
17/12/24 12:07:33 INFO resourcemanager.ResourceManager: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down ResourceManager at master/xxx.xxx.xxx.xxx
************************************************************/
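
The failure above originates in HDFS rather than YARN itself: the ResourceManager tries to write its node-label mirror to /yarn/node-labels, and the NameNode cannot place the block on any datanode even though one datanode is registered. This usually means the single datanode is out of disk space, still registering, or unreachable from the NameNode. Before restarting the ResourceManager it is worth confirming datanode health; the sketch below (not part of pai) queries the NameNode's JMX endpoint, assuming the default Hadoop 2.7 web port 50070 and a placeholder NameNode hostname.

import json
import urllib2  # the deployment scripts in this document run on python 2.7

# Ask the NameNode how many live datanodes it sees -- the usual culprit
# behind "could only be replicated to 0 nodes" errors.
url = ('http://namenode-host:50070/jmx'
       '?qry=Hadoop:service=NameNode,name=FSNamesystemState')
state = json.load(urllib2.urlopen(url))['beans'][0]
print('live datanodes: %d' % state['NumLiveDataNodes'])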

DNS service is required.

Not only does hadoop need DNS to resolve node hostnames in YARN and HDFS, but kubernetes node names must also be resolvable. Currently we hard-code the IP address into the hostname (in both kubernetes and hadoop), which is too fragile a workaround. A DNS service should be included in the system in the future.
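
As a pre-deployment sanity check, the following sketch (not part of pai; the hostname list is a placeholder) verifies that every node name resolves before Hadoop and kubernetes are started:

import socket

# Both YARN/HDFS and kubernetes node names depend on working name
# resolution; fail fast if any node hostname does not resolve.
for hostname in ['master', 'worker-1', 'worker-2']:  # placeholder names
    try:
        print('%s -> %s' % (hostname, socket.gethostbyname(hostname)))
    except socket.gaierror as err:
        print('%s does not resolve: %s' % (hostname, err))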

ETCD configuration problem

 # etcdctl cluster-health
cluster may be unhealthy: failed to list members
Error:  client: etcd cluster is unavailable or misconfigured
error #0: dial tcp 127.0.0.1:2379: getsockopt: connection refused
error #1: dial tcp 127.0.0.1:4001: getsockopt: connection refused
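
The "connection refused" errors on 127.0.0.1:2379 and 127.0.0.1:4001 show that etcdctl is probing its default local endpoints rather than the address etcd actually listens on. A direct probe of the configured client endpoint tells whether etcd itself is healthy; a healthy etcd answers {"health": "true"} on its /health endpoint. A minimal sketch, with 'etcd-host' as a placeholder:

import urllib2

# Probe the client endpoint etcd is actually configured to listen on
# ('etcd-host' stands for the address in etcd's --listen-client-urls)
# instead of relying on etcdctl's 127.0.0.1 defaults.
print(urllib2.urlopen('http://etcd-host:2379/health').read())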

pai-fs cp bug

If the destination file already exists on HDFS, pai-fs -cp appends the copied data to the existing file instead of overwriting it.

Jobs cannot be scheduled after 90% of GPUs are used.

The PAI cluster configures two queues: default and dev.
The default queue capacity is 0.9 while the dev queue capacity is 0.1.
We need to set "maximum-capacity" (yarn.scheduler.capacity.root.default.maximum-capacity) so that the default queue can borrow the dev queue's resources when the dev queue is idle.

We also need to enable the preemption feature (yarn.resourcemanager.scheduler.monitor.enable) so that the dev queue can reclaim its capacity by preempting the default queue's extra containers.

Redundant configuration confuses users when they configure their cluster-configuration

https://github.com/Microsoft/pai/blob/4fe556bf6714230dde79e2eaa819226e34f58edd/service-deployment/clusterconfig-example.yaml#L104

Each module should not need to configure this IP address itself; the information can be derived from the cluster configuration with the following template:

{%- for host in cluster_config if 'prometheus' in cluster_config[ host ] -%}
    {{cluster_config[ host ]['ip']}}{% if not loop.last %},{% endif %}
{%- endfor -%}

Many other places carry the same redundant configuration and should be cleaned up as well.

paramiko throws an exception when the ssh option "password" is composed only of digits

paramiko log

DEB [20180119-22:25:31.520] thr=2   paramiko.transport: Adding ssh-ed25519 host key for xxx
DEB [20180119-22:25:31.548] thr=5   paramiko.transport: userauth is OK
ERR [20180119-22:25:31.549] thr=5   paramiko.transport: Unknown exception: object of type 'int' has no len()
ERR [20180119-22:25:31.551] thr=5   paramiko.transport: Traceback (most recent call last):
ERR [20180119-22:25:31.551] thr=5   paramiko.transport:   File "/usr/local/lib/python2.7/dist-packages/paramiko/transport.py", line 1908, in run
ERR [20180119-22:25:31.551] thr=5   paramiko.transport:     handler(self.auth_handler, m)
ERR [20180119-22:25:31.551] thr=5   paramiko.transport:   File "/usr/local/lib/python2.7/dist-packages/paramiko/auth_handler.py", line 260, in _parse_service_accept
ERR [20180119-22:25:31.551] thr=5   paramiko.transport:     m.add_string(password)
ERR [20180119-22:25:31.551] thr=5   paramiko.transport:   File "/usr/local/lib/python2.7/dist-packages/paramiko/message.py", line 274, in add_string
ERR [20180119-22:25:31.552] thr=5   paramiko.transport:     self.add_int(len(s))
ERR [20180119-22:25:31.552] thr=5   paramiko.transport: TypeError: object of type 'int' has no len()
ERR [20180119-22:25:31.552] thr=5   paramiko.transport: 

bootstrap.py output

daemonset "kube-proxy" created
src/
src/start.sh
src/cleanup.sh
src/kubelet.sh
Traceback (most recent call last):
  File "./bootstrap.py", line 356, in <module>
    main()
  File "./bootstrap.py", line 342, in main
    remoteBootstrap(cluster_config['clusterinfo'], machine_list[hostname])
  File "./bootstrap.py", line 202, in remoteBootstrap
    sftp_paramiko(src_local, dst_remote, srcipt_package, host_config)
  File "./bootstrap.py", line 104, in sftp_paramiko
    ssh.connect(hostname=hostip, port=port, username=username, password=password)
  File "/usr/lib/python2.7/dist-packages/paramiko/client.py", line 367, in connect
    look_for_keys, gss_auth, gss_kex, gss_deleg_creds, gss_host)
  File "/usr/lib/python2.7/dist-packages/paramiko/client.py", line 571, in _auth
    self._transport.auth_password(username, password)
  File "/usr/lib/python2.7/dist-packages/paramiko/transport.py", line 1262, in auth_password
    return self.auth_handler.wait_for_response(my_event)
  File "/usr/lib/python2.7/dist-packages/paramiko/auth_handler.py", line 197, in wait_for_response
    raise e
Exception: Unknown   ##type

Solution: quote the password value in the configuration file so that YAML parses it as a string rather than an integer.
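
A defensive fix on the code side is also possible. The sketch below is a hypothetical helper, not the actual bootstrap.py code; it coerces the password to a string before handing it to paramiko, so an unquoted numeric value in the YAML configuration cannot crash the transport layer. The host_config key names follow the tracebacks above and are otherwise assumptions.

import paramiko

def connect(host_config):
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(
        hostname=host_config['hostip'],
        port=int(host_config.get('sshport', 22)),
        username=host_config['username'],
        # YAML parses an unquoted all-digit password as int; paramiko
        # requires a string.
        password=str(host_config['password']),
    )
    return ssh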

Issue in bootstrap

https://github.com/Microsoft/pai/blob/cd432f909a1b3d664ab14c62222287f057965839/kubernetes-deployment/bootstrap.py#L99


Traceback (most recent call last):
  File "./bootstrap.py", line 404, in <module>
    main()
  File "./bootstrap.py", line 369, in main
    remoteCleanUp(cluster_config['clusterinfo'], machine_list[hostname])
  File "./bootstrap.py", line 217, in remoteCleanUp
    sftp_paramiko(src_local, dst_remote, srcipt, host_config)
  File "./bootstrap.py", line 99, in sftp_paramiko
    if (host_config['sshport']):
KeyError: 'sshport'

Fix: check for the key before indexing, or fall back to the default SSH port when the key is absent:

if 'sshport' in host_config:
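
A minimal sketch of the fallback (a hypothetical helper, not the actual bootstrap.py code):

def resolve_ssh_port(host_config):
    # Fall back to the default SSH port when the cluster configuration
    # omits 'sshport' instead of raising KeyError.
    return int(host_config.get('sshport', 22))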

Docker Auth Credentials Issue

  • If we use auth credentials for the docker registry, they currently exist on the head node only. Worker nodes also need the credentials to pull images from the registry when a job uses an image for the first time.
  • Make the auth credentials optional for users whose registry requires no auth.
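
Until the platform distributes credentials itself, one workaround is to push the head node's registry credentials to every worker. The sketch below is not part of pai and reuses the paramiko pattern already seen in bootstrap.py; the worker list and login details are placeholders.

import os
import paramiko

def push_docker_credentials(workers, username, password):
    # Copy ~/.docker/config.json (written by `docker login` on the head
    # node) into each worker's home directory so workers can pull images
    # from an authenticated registry.
    src = os.path.expanduser('~/.docker/config.json')
    for host in workers:
        ssh = paramiko.SSHClient()
        ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        ssh.connect(hostname=host, username=username, password=password)
        sftp = ssh.open_sftp()
        try:
            sftp.mkdir('.docker')
        except IOError:
            pass  # directory already exists
        sftp.put(src, '.docker/config.json')
        sftp.close()
        ssh.close()

push_docker_credentials(['worker-1', 'worker-2'], 'root', 'example-password')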
