bugcy013
9/27/2014 - 9:49 PM
FT.md
Fault Tolerance in MRv2 over Hadoop/YARN
What is YARN, and why use it?
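A generic resource manager: cluster resource management is separated from MapReduce, so other processing frameworks can share the same cluster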
http://www.slideshare.net/ozax86/ntt-meets-hadoop-and-cloudera-world-tokyo#17
Architecture Overview
ResourceManager (one per cluster) / NodeManager (one per node)
ApplicationMaster (one master per job)
http://www.slideshare.net/ozax86/ntt-meets-hadoop-and-cloudera-world-tokyo#21
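The division of labour above can be modelled in a few lines. This toy sketch shows the ResourceManager granting containers on NodeManagers, with exactly one ApplicationMaster per job that itself occupies a container. The classes and methods are hypothetical stand-ins, not the Hadoop YARN API.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Hypothetical stand-in for a NodeManager: one host with a fixed number of container slots."""
    host: str
    free_slots: int


@dataclass
class ResourceManager:
    """Hypothetical stand-in for the ResourceManager: grants containers on NodeManagers."""
    nodes: list
    app_masters: dict = field(default_factory=dict)  # job id -> host running its AM

    def allocate(self, n):
        """Grant up to n containers, filling nodes in order; returns the granted hosts."""
        granted = []
        for node in self.nodes:
            while node.free_slots > 0 and len(granted) < n:
                node.free_slots -= 1
                granted.append(node.host)
        return granted

    def submit(self, job_id):
        """Start one ApplicationMaster per job; the AM itself occupies a container."""
        hosts = self.allocate(1)
        if not hosts:
            raise RuntimeError("no free container for the ApplicationMaster")
        self.app_masters[job_id] = hosts[0]
        return hosts[0]


rm = ResourceManager([NodeManager("nm1", 2), NodeManager("nm2", 2)])
print(rm.submit("job_1"))  # nm1: the job's AM takes the first free container
print(rm.allocate(3))      # ['nm1', 'nm2', 'nm2']: task containers requested by the AM
```

The real ResourceManager schedules by memory and vcores rather than slot counts. The point here is only who owns what: the RM owns the cluster, and each job's AM owns that job.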
Application-related failures in MapReduce
ApplicationMaster Failure
The ResourceManager relaunches a failed AM as a new attempt, which runs initAndStartAppMaster() -> serviceStart() -> processRecovery()
Job history events are written into a FileSystem (e.g. HDFS, S3), where they outlive the failed AM and are later served by the JobHistoryServer
Generic JobHistoryServer (application history for any YARN application, not only MapReduce)
https://issues.apache.org/jira/browse/YARN-321
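The recovery idea can be sketched as follows. A new AM attempt replays the events its predecessor persisted and re-runs only the tasks that never finished. This is a minimal sketch that assumes a hypothetical newline-delimited JSON event log. The real processRecovery() works against the previous attempt's job history file, whose format and event types differ.

```python
import json


def write_event(history, event_type, task_id):
    """Append one history event as a JSON line (hypothetical format, not the real .jhist schema)."""
    history.append(json.dumps({"type": event_type, "task": task_id}))


def recover(history, all_tasks):
    """Replay the previous attempt's history; return the tasks the new attempt must still run."""
    finished = set()
    for line in history:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            # The old AM may have died mid-write; ignore the torn record and stop replaying.
            break
        if event["type"] == "TASK_FINISHED":
            finished.add(event["task"])
    return [t for t in all_tasks if t not in finished]


tasks = ["m_0", "m_1", "m_2", "r_0"]
history = []  # stands in for the previous attempt's history file on HDFS
write_event(history, "TASK_STARTED", "m_0")
write_event(history, "TASK_FINISHED", "m_0")
write_event(history, "TASK_STARTED", "m_1")
write_event(history, "TASK_FINISHED", "m_1")
history.append('{"type": "TASK_FIN')  # torn write when the AM crashed
print(recover(history, tasks))  # ['m_2', 'r_0']
```

Persisting events to a shared FileSystem rather than AM-local disk is what allows a relaunched attempt on a different node to find them.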
MapTask/ReduceTask failures
MapReduce-style fault recovery
MRAppMaster detects failed task attempts and recovers by launching new attempts of the same task
http://www.slideshare.net/ozax86/ntt-meets-hadoop-and-cloudera-world-tokyo#25
MRAppMaster.java
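The per-task retry loop the MRAppMaster applies can be sketched like this. The default of 4 attempts mirrors mapreduce.map.maxattempts and mapreduce.reduce.maxattempts (both 4 by default). The execute hook and the simulated failures are hypothetical, and the real AM drives attempts through an event-driven state machine, not a blocking loop.

```python
def run_task(task_id, execute, max_attempts=4):
    """Run attempts of one task until one succeeds.

    execute(task_id, attempt) -> bool is a hypothetical hook that launches one
    attempt in a fresh container and reports success. Returns the index of the
    successful attempt; raises once max_attempts attempts have failed.
    """
    for attempt in range(max_attempts):
        if execute(task_id, attempt):
            return attempt
    raise RuntimeError(f"{task_id}: all {max_attempts} attempts failed, failing the job")


def flaky(task_id, attempt):
    # Simulated failures: m_1 crashes on its first two attempts, r_0 always crashes.
    if task_id == "m_1":
        return attempt >= 2
    return task_id != "r_0"


print(run_task("m_0", flaky))  # 0
print(run_task("m_1", flaky))  # 2
```

In this model, run_task("r_0", flaky) exhausts its attempts and raises, which corresponds to the job failing once a task runs out of attempts.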
Fault Tolerance in YARN
YARN-related failures
NodeManager Failure
NodeStatusUpdater reports NodeManager health via heartbeat
Faults are detected by the ResourceManager when a NodeManager's heartbeats stop arriving
ResourceTrackerService#nodeHeartbeat
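Heartbeat-based detection reduces to "remember the last heartbeat, expire nodes that go quiet". In the ResourceManager this bookkeeping lives in NMLivelinessMonitor, with the timeout set by yarn.nm.liveness-monitor.expiry-interval-ms (600000 ms, i.e. 10 minutes, by default). The class below is a hypothetical simplification, not that code.

```python
class LivenessMonitor:
    """Toy model of the RM-side check: a NodeManager that has not heartbeated
    for longer than expiry_ms is declared LOST."""

    def __init__(self, expiry_ms=600_000):
        self.expiry_ms = expiry_ms
        self.last_seen = {}  # node id -> time of last heartbeat (ms)

    def heartbeat(self, node_id, now_ms):
        # Called for every heartbeat a NodeManager's NodeStatusUpdater sends.
        self.last_seen[node_id] = now_ms

    def expire(self, now_ms):
        """Remove and return nodes whose last heartbeat is older than expiry_ms."""
        lost = [n for n, t in self.last_seen.items() if now_ms - t > self.expiry_ms]
        for n in lost:
            del self.last_seen[n]
        return lost


monitor = LivenessMonitor()
monitor.heartbeat("nm1", now_ms=0)
monitor.heartbeat("nm2", now_ms=0)
monitor.heartbeat("nm2", now_ms=500_000)  # nm2 keeps reporting, nm1 has gone silent
print(monitor.expire(now_ms=700_000))     # ['nm1']
```

Once a node is declared lost, the ApplicationMasters with containers on it can reschedule that work elsewhere.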
ResourceManager Failure
What happens?
http://www.slideshare.net/ozax86/ntt-meets-hadoop-and-cloudera-world-tokyo#25
Overview
http://www.slideshare.net/ozax86/ntt-meets-hadoop-and-cloudera-world-tokyo#26
Configuration
https://gist.github.com/oza/7055279
Operation
yarn rmadmin -transitionToActive (manual failover) /
ZKFC (automatic failover)
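Automatic active/standby failover rests on a simple election. Each RM tries to take a lock node in ZooKeeper, and the lock disappears when the holder's session dies. This in-memory model is hypothetical and only illustrates that election. It omits fencing, which a real deployment needs so that a stale former active cannot keep acting as active.

```python
class LockService:
    """In-memory stand-in for a ZooKeeper ephemeral lock node (hypothetical)."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, rm_id):
        # Creating the ephemeral node succeeds only if nobody holds it.
        if self.holder is None:
            self.holder = rm_id
        return self.holder == rm_id

    def session_expired(self, rm_id):
        # ZooKeeper deletes an ephemeral node when its owner's session dies.
        if self.holder == rm_id:
            self.holder = None


class ResourceManagerHA:
    def __init__(self, rm_id, locks):
        self.rm_id = rm_id
        self.locks = locks
        self.state = "STANDBY"

    def run_election(self):
        """Become ACTIVE if the lock can be taken, otherwise stay STANDBY."""
        self.state = "ACTIVE" if self.locks.try_acquire(self.rm_id) else "STANDBY"
        return self.state


locks = LockService()
rm1, rm2 = ResourceManagerHA("rm1", locks), ResourceManagerHA("rm2", locks)
print(rm1.run_election(), rm2.run_election())  # ACTIVE STANDBY
locks.session_expired("rm1")                    # rm1 crashes
print(rm2.run_election())                       # ACTIVE
```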