carlessanagustin
12/1/2016 - 4:34 PM

HADOOP: DataNode not connecting to NameNode


The cluster was installed with Ambari.

Problem

  • View the DataNode log:

tail -f /var/log/hadoop/hdfs/hadoop-hdfs-datanode-MYDATANODE.log

  • The log:
************************************************************/
2016-12-01 16:20:42,033 INFO  datanode.DataNode (LogAdapter.java:info(47)) - registered UNIX signal handlers for [TERM, HUP, INT]
2016-12-01 16:20:42,810 INFO  impl.MetricsConfig (MetricsConfig.java:loadFirst(112)) - loaded properties from hadoop-metrics2.properties
2016-12-01 16:20:42,834 INFO  timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:init(82)) - Initializing Timeline metrics sink.
2016-12-01 16:20:42,834 INFO  timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:init(100)) - Identified hostname = , serviceName = datanode
2016-12-01 16:20:42,835 INFO  timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:init(118)) - Collector Uri: http://:6188/ws/v1/timeline/metrics
2016-12-01 16:20:42,844 INFO  impl.MetricsSinkAdapter (MetricsSinkAdapter.java:start(206)) - Sink timeline started
2016-12-01 16:20:42,899 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:startTimer(376)) - Scheduled snapshot period at 10 second(s).
2016-12-01 16:20:42,899 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:start(192)) - DataNode metrics system started
2016-12-01 16:20:42,904 INFO  datanode.BlockScanner (BlockScanner.java:<init>(172)) - Initialized block scanner with targetBytesPerSec 1048576
2016-12-01 16:20:42,906 INFO  datanode.DataNode (DataNode.java:<init>(437)) - File descriptor passing is enabled.
2016-12-01 16:20:42,906 INFO  datanode.DataNode (DataNode.java:<init>(448)) - Configured hostname is MYDATANODE
2016-12-01 16:20:42,911 INFO  datanode.DataNode (DataNode.java:startDataNode(1211)) - Starting DataNode with maxLockedMemory = 0
2016-12-01 16:20:42,934 INFO  datanode.DataNode (DataNode.java:initDataXceiver(1004)) - Opened streaming server at /0.0.0.0:50010
2016-12-01 16:20:42,936 INFO  datanode.DataNode (DataXceiverServer.java:<init>(78)) - Balancing bandwith is 6250000 bytes/s
2016-12-01 16:20:42,936 INFO  datanode.DataNode (DataXceiverServer.java:<init>(79)) - Number threads for balancing is 5
2016-12-01 16:20:42,939 INFO  datanode.DataNode (DataXceiverServer.java:<init>(78)) - Balancing bandwith is 6250000 bytes/s
2016-12-01 16:20:42,939 INFO  datanode.DataNode (DataXceiverServer.java:<init>(79)) - Number threads for balancing is 5
2016-12-01 16:20:42,939 INFO  datanode.DataNode (DataNode.java:initDataXceiver(1019)) - Listening on UNIX domain socket: /var/lib/hadoop-hdfs/dn_socket
2016-12-01 16:20:43,012 INFO  mortbay.log (Slf4jLog.java:info(67)) - Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2016-12-01 16:20:43,019 INFO  server.AuthenticationFilter (AuthenticationFilter.java:constructSecretProvider(293)) - Unable to initialize FileSignerSecretProvider, falling back to use random secrets.
2016-12-01 16:20:43,025 INFO  http.HttpRequestLog (HttpRequestLog.java:getRequestLog(80)) - Http request log for http.requests.datanode is not defined
2016-12-01 16:20:43,030 INFO  http.HttpServer2 (HttpServer2.java:addGlobalFilter(754)) - Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2016-12-01 16:20:43,032 INFO  http.HttpServer2 (HttpServer2.java:addFilter(729)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context datanode
2016-12-01 16:20:43,033 INFO  http.HttpServer2 (HttpServer2.java:addFilter(737)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2016-12-01 16:20:43,033 INFO  http.HttpServer2 (HttpServer2.java:addFilter(737)) - Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2016-12-01 16:20:43,033 INFO  security.HttpCrossOriginFilterInitializer (HttpCrossOriginFilterInitializer.java:initFilter(49)) - CORS filter not enabled. Please set hadoop.http.cross-origin.enabled to 'true' to enable it
2016-12-01 16:20:43,047 INFO  http.HttpServer2 (HttpServer2.java:openListeners(959)) - Jetty bound to port 57212
2016-12-01 16:20:43,047 INFO  mortbay.log (Slf4jLog.java:info(67)) - jetty-6.1.26.hwx
2016-12-01 16:20:43,234 INFO  mortbay.log (Slf4jLog.java:info(67)) - Started HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:57212
2016-12-01 16:20:43,421 INFO  web.DatanodeHttpServer (DatanodeHttpServer.java:start(233)) - Listening HTTP traffic on /0.0.0.0:50075
2016-12-01 16:20:43,423 INFO  util.JvmPauseMonitor (JvmPauseMonitor.java:run(179)) - Starting JVM pause monitor
2016-12-01 16:20:43,573 INFO  datanode.DataNode (DataNode.java:startDataNode(1228)) - dnUserName = hdfs
2016-12-01 16:20:43,574 INFO  datanode.DataNode (DataNode.java:startDataNode(1229)) - supergroup = hdfs
2016-12-01 16:20:43,623 INFO  ipc.CallQueueManager (CallQueueManager.java:<init>(75)) - Using callQueue: class java.util.concurrent.LinkedBlockingQueue scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
2016-12-01 16:20:43,643 INFO  ipc.Server (Server.java:run(811)) - Starting Socket Reader #1 for port 8010
2016-12-01 16:20:43,676 INFO  datanode.DataNode (DataNode.java:initIpcServer(917)) - Opened IPC server at /0.0.0.0:8010
2016-12-01 16:20:43,689 INFO  datanode.DataNode (BlockPoolManager.java:refreshNamenodes(152)) - Refresh request received for nameservices: null
2016-12-01 16:20:43,707 INFO  datanode.DataNode (BlockPoolManager.java:doRefreshNamenodes(201)) - Starting BPOfferServices for nameservices: <default>
2016-12-01 16:20:43,718 INFO  datanode.DataNode (BPServiceActor.java:run(733)) - Block pool <registering> (Datanode Uuid unassigned) service to :8020 starting to offer service
2016-12-01 16:20:43,723 INFO  ipc.Server (Server.java:run(1045)) - IPC Server Responder: starting
2016-12-01 16:20:43,724 INFO  ipc.Server (Server.java:run(881)) - IPC Server listener on 8010: starting
2016-12-01 16:20:43,933 INFO  common.Storage (DataStorage.java:getParallelVolumeLoadThreadsNum(384)) - Using 3 threads to upgrade data directories (dfs.datanode.parallel.volumes.load.threads.num=3, dataDirs=3)
2016-12-01 16:20:43,941 INFO  common.Storage (Storage.java:tryLock(774)) - Lock on /grid/0/hadoop/hdfs/data/in_use.lock acquired by nodename 18164@MYDATANODE
2016-12-01 16:20:43,943 WARN  common.Storage (DataStorage.java:loadDataStorage(449)) - Failed to add storage directory [DISK]file:/grid/0/hadoop/hdfs/data/
java.io.IOException: Incompatible clusterIDs in /grid/0/hadoop/hdfs/data: namenode clusterID = CID-6c775e94-4cfb-4366-9a8b-c0f359a2b5d3; datanode clusterID = CID-abfdf47f-98af-4924-a376-33b001206f7b
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:799)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadStorageDirectory(DataStorage.java:322)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadDataStorage(DataStorage.java:438)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:417)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:595)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1483)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1448)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:319)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:267)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:740)
    at java.lang.Thread.run(Thread.java:745)
2016-12-01 16:20:43,946 INFO  common.Storage (Storage.java:tryLock(774)) - Lock on /grid/1/hadoop/hdfs/data/in_use.lock acquired by nodename 18164@MYDATANODE
2016-12-01 16:20:43,946 WARN  common.Storage (DataStorage.java:loadDataStorage(449)) - Failed to add storage directory [DISK]file:/grid/1/hadoop/hdfs/data/
java.io.IOException: Incompatible clusterIDs in /grid/1/hadoop/hdfs/data: namenode clusterID = CID-6c775e94-4cfb-4366-9a8b-c0f359a2b5d3; datanode clusterID = CID-abfdf47f-98af-4924-a376-33b001206f7b
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:799)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadStorageDirectory(DataStorage.java:322)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadDataStorage(DataStorage.java:438)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:417)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:595)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1483)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1448)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:319)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:267)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:740)
    at java.lang.Thread.run(Thread.java:745)
2016-12-01 16:20:43,947 INFO  common.Storage (Storage.java:tryLock(774)) - Lock on /grid/2/hadoop/hdfs/data/in_use.lock acquired by nodename 18164@MYDATANODE
2016-12-01 16:20:43,948 WARN  common.Storage (DataStorage.java:loadDataStorage(449)) - Failed to add storage directory [DISK]file:/grid/2/hadoop/hdfs/data/
java.io.IOException: Incompatible clusterIDs in /grid/2/hadoop/hdfs/data: namenode clusterID = CID-6c775e94-4cfb-4366-9a8b-c0f359a2b5d3; datanode clusterID = CID-abfdf47f-98af-4924-a376-33b001206f7b
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:799)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadStorageDirectory(DataStorage.java:322)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadDataStorage(DataStorage.java:438)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:417)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:595)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1483)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1448)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:319)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:267)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:740)
    at java.lang.Thread.run(Thread.java:745)
2016-12-01 16:20:43,950 ERROR datanode.DataNode (BPServiceActor.java:run(752)) - Initialization failed for Block pool <registering> (Datanode Uuid 2141d0ea-f190-4883-bbee-a22d3adc1652) service to :8020. Exiting.
java.io.IOException: All specified directories are failed to load.
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:596)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1483)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1448)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:319)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:267)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:740)
    at java.lang.Thread.run(Thread.java:745)
2016-12-01 16:20:43,950 WARN  datanode.DataNode (BPServiceActor.java:run(776)) - Ending block pool service for: Block pool <registering> (Datanode Uuid 2141d0ea-f190-4883-bbee-a22d3adc1652) service to :8020
2016-12-01 16:20:44,053 INFO  datanode.DataNode (BlockPoolManager.java:remove(103)) - Removed Block pool <registering> (Datanode Uuid 2141d0ea-f190-4883-bbee-a22d3adc1652)
2016-12-01 16:20:46,053 WARN  datanode.DataNode (DataNode.java:secureMain(2637)) - Exiting Datanode
2016-12-01 16:20:46,054 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 0
2016-12-01 16:20:46,056 INFO  datanode.DataNode (LogAdapter.java:info(47)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at MYDATANODE/IP
************************************************************/
  • The first warning in the log tells us:

WARN  common.Storage (DataStorage.java:loadDataStorage(449)) - Failed to add storage directory [DISK]file:/grid/0/hadoop/hdfs/data/
java.io.IOException: Incompatible clusterIDs in /grid/0/hadoop/hdfs/data: namenode clusterID = CID-6c775e94-4cfb-4366-9a8b-c0f359a2b5d3; datanode clusterID = CID-abfdf47f-98af-4924-a376-33b001206f7b
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:799)

  • There is a clusterID mismatch between the NameNode and the DataNode's data directories.
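To confirm this quickly, compare the IDs on disk (a minimal sketch, assuming bash, the NameNode directory used below, and the data directories shown in the log):

# On the NameNode host:
grep '^clusterID=' /hadoop/hdfs/namenode/current/VERSION

# On the DataNode host:
grep '^clusterID=' /grid/{0,1,2}/hadoop/hdfs/data/current/VERSION

If the commands print different CIDs, the DataNode's directories were formatted for a different cluster (e.g. before a NameNode reformat) and the DataNode will refuse to register.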

Solution

We must check the clusterID on the NameNode and compare it with the one stored in each data directory (each grid folder), as shown next.

1. @ namenode

more /hadoop/hdfs/namenode/current/VERSION

Copy this value: clusterID=CID-6c775e94-4cfb-4366-9a8b-c0f359a2b5d3
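For reference, a NameNode VERSION file looks roughly like this; only the clusterID matters here, and the other values are illustrative placeholders:

#Thu Dec 01 16:20:42 CET 2016
namespaceID=...
clusterID=CID-6c775e94-4cfb-4366-9a8b-c0f359a2b5d3
cTime=0
storageType=NAME_NODE
blockpoolID=BP-...
layoutVersion=...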

2. @ datanode

In each of the following files, replace the existing clusterID line with the value copied from the NameNode (paste: clusterID=CID-6c775e94-4cfb-4366-9a8b-c0f359a2b5d3):

vim /grid/0/hadoop/hdfs/data/current/VERSION
vim /grid/1/hadoop/hdfs/data/current/VERSION
vim /grid/2/hadoop/hdfs/data/current/VERSION
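If you prefer not to edit the three files by hand, here is a minimal bash sketch that makes the same replacement (it assumes the data directory layout above; consider backing up the VERSION files first):

NN_CID="CID-6c775e94-4cfb-4366-9a8b-c0f359a2b5d3"   # value copied from the NameNode
for d in 0 1 2; do
  # Overwrite the old clusterID line in place
  sed -i "s/^clusterID=.*/clusterID=${NN_CID}/" /grid/${d}/hadoop/hdfs/data/current/VERSION
done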

3. @ datanode

Restart the DataNode service (in my case, via the Ambari web UI).
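Without Ambari, a hedged alternative for Hadoop 2.x (the script's location and the service user may vary by distribution):

su - hdfs -c "hadoop-daemon.sh --config /etc/hadoop/conf stop datanode"
su - hdfs -c "hadoop-daemon.sh --config /etc/hadoop/conf start datanode"

After the restart, tail the same DataNode log again and verify that the Incompatible clusterIDs error is gone and the block pool service registers with the NameNode.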