GuoLi
11/16/2017 - 2:17 AM

Installing Cloudera 5.7.6 on CentOS 7.2

Useful Cloudera links

Cloudera installation documentation: PDF, HTML

Cloudera administration documentation: PDF, HTML

I. CentOS 7.2 system settings (required on every host in the cluster)

1. Disable SELinux

Use the getenforce command to check whether SELinux is already disabled

$ getenforce
Disabled

Edit the SELinux configuration file

$ sudo vim /etc/selinux/config
SELINUX=disabled
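
The manual edit above can also be scripted. The following is only a sketch: it assumes the config file contains the stock `SELINUX=<mode>` line, and the function name `set_selinux_disabled` is made up for this example.

```shell
# Hypothetical helper: rewrite the SELINUX= line of the given config file.
# The path is a parameter so the function can be tried on a copy first.
set_selinux_disabled() {
    local config_file="$1"
    # Matches only the SELINUX= line; SELINUXTYPE= is left untouched
    # because the pattern requires "=" right after "SELINUX".
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$config_file"
}
```

On a real host, run it as root against /etc/selinux/config; the change only takes effect after the reboot in step 9.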

2. Disable the firewall

$ sudo systemctl stop firewalld
$ sudo systemctl disable firewalld

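3. Edit the hosts file and the hostname file

The hosts file must be identical on every host in the cluster. Configure it on the master host, then scp it to the slave hosts.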

$ sudo vim /etc/hosts
192.168.31.160   master
192.168.31.161   slave1
192.168.31.162   slave2
$ sudo scp /etc/hosts slave1:/etc/hosts
$ sudo scp /etc/hosts slave2:/etc/hosts

# Make sure the name reported by the hostname command matches this host's entry in the hosts file
$ sudo vim /etc/hostname
master

$ hostnamectl
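
To confirm that a host name really resolves through the hosts file, a small lookup helper can be used. This is a sketch only; `lookup_host_ip` is a hypothetical name, and it assumes the usual "IP name [aliases...]" line layout.

```shell
# Hypothetical helper: print the IP address that a hosts file maps a name to.
# Prints nothing if the name is not present.
lookup_host_ip() {
    local hosts_file="$1" name="$2"
    # Skip comment lines, then compare the name with every column after the IP.
    awk -v n="$name" '
        $1 !~ /^#/ {
            for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit }
        }
    ' "$hosts_file"
}
```

For example, `lookup_host_ip /etc/hosts "$(hostname)"` should print this host's static IP on every cluster member; empty output means the hostname and the hosts file disagree.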

4. Set a static IP

$ sudo vim /etc/sysconfig/network-scripts/ifcfg-eno
BOOTPROTO="static"
ONBOOT="yes"
IPADDR=192.168.31.160
GATEWAY=192.168.31.1
DNS1=192.168.31.1

5. Set up time synchronization

$ sudo yum install -y ntp
$ sudo systemctl enable ntpd
$ sudo systemctl enable ntpdate
$ sudo vim /etc/ntp.conf
server time1.aliyun.com

$ sudo ntpdate time1.aliyun.com
$ timedatectl

6. Install an Oracle JDK supported by CDH

Remove the OpenJDK packages that ship with the system

$ rpm -qa | grep --color openjdk
$ sudo yum remove -y java-1.7.0-openjdk-headless.x86_64 java-1.7.0-openjdk.x86_64 java-1.8.0-openjdk-headless.x86_64 java-1.8.0-openjdk.x86_64

Download the JDK from Oracle and install it

# Install Oracle JDK 1.8
$ sudo yum install -y jdk-8u144-linux-x64.rpm

7. Tune kernel parameters

$ sudo sysctl vm.swappiness=0
$ sudo vim /etc/sysctl.conf
vm.swappiness=0

# Apply the setting
$ sudo sysctl -p

# On CentOS 7.2 the files under /usr/lib/tuned must also be changed; otherwise vm.swappiness is adjusted dynamically at boot.
$ cd /usr/lib/tuned
$ grep -R 'vm.swappiness' *
latency-performance/tuned.conf:vm.swappiness=10
throughput-performance/tuned.conf:vm.swappiness=10
virtual-guest/tuned.conf:vm.swappiness = 30

# Change the parameter in virtual-guest/tuned.conf
$ sudo vim /usr/lib/tuned/virtual-guest/tuned.conf
vm.swappiness=0
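
Editing the same key in several files by hand is easy to get wrong, so the change can be made idempotent. A minimal sketch, assuming "key = value" style files; `set_sysctl_key` is a hypothetical name, not part of any tool.

```shell
# Hypothetical helper: set KEY to VALUE in a "key = value" style file.
# An existing line is rewritten in place; otherwise the key is appended.
# Note: the key is used as a regular expression, so "." matches any character.
set_sysctl_key() {
    local file="$1" key="$2" value="$3"
    if grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
        sed -i -E "s|^[[:space:]]*${key}[[:space:]]*=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

Run as root, for example `set_sysctl_key /etc/sysctl.conf vm.swappiness 0` and `set_sysctl_key /usr/lib/tuned/virtual-guest/tuned.conf vm.swappiness 0`. In the tuned file the key already exists, so only the in-place branch is exercised there.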

8. Disable transparent huge pages

$ sudo sh -c "echo never > /sys/kernel/mm/transparent_hugepage/defrag"
$ sudo sh -c "echo never > /sys/kernel/mm/transparent_hugepage/enabled"
$ sudo vim /etc/rc.local
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled
# /etc/rc.local is a symlink to /etc/rc.d/rc.local; make rc.local executable
$ sudo chmod +x /etc/rc.d/rc.local
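
After the reboot it is worth checking that the setting survived. The kernel marks the active mode with square brackets (for example "always madvise [never]"), so it can be extracted as below. This is a sketch; `thp_active_mode` is a hypothetical name.

```shell
# Hypothetical helper: print the active mode from a transparent_hugepage file,
# i.e. the word the kernel encloses in square brackets.
thp_active_mode() {
    local file="$1"
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$file"
}
```

Both `thp_active_mode /sys/kernel/mm/transparent_hugepage/enabled` and the same call on `.../defrag` should print `never` once the rc.local lines have run.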

9. Reboot the machine

$ sudo reboot

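II. Settings on the host that runs Cloudera Manager Server

0. Download the RPM files and parcel files needed to install CM

# Rename the downloaded .sha1 file to .sha
$ mv CDH-5.7.6-1.cdh5.7.6.p0.6-el7.parcel.sha{1,}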

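1. Add the cloudera-manager.repo file to the yum sources

Download cloudera-manager.repo from the CM Archive, change its baseurl to match the version you are installing (5.7.6 here), and change gpgcheck=1 to gpgcheck=0. If the file is left unmodified, cloudera-manager-installer.bin upgrades the already-installed Cloudera RPM packages online to the latest version. The gpgkey line can be deleted.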

$ vim cloudera-manager.repo
[cloudera-manager]
# Packages for Cloudera Manager, Version 5, on RedHat or CentOS 7 x86_64
name = Cloudera Manager
baseurl = http://archive.cloudera.com/cm5/redhat/7/x86_64/cm/5.7.6/
gpgcheck = 0
$ sudo cp cloudera-manager.repo /etc/yum.repos.d

Check that the Cloudera packages can be found through the yum sources

$ yum list | grep cloudera
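
When several hosts need the same repo file, generating it avoids copy-and-paste mistakes in the baseurl. A minimal sketch that reproduces the file shown above for a given version; `write_cm_repo` is a hypothetical name, and the URL layout is taken from the 5.7.6 example rather than verified for other releases.

```shell
# Hypothetical helper: write a cloudera-manager.repo pinned to one CM version.
# Assumes other versions follow the same archive URL layout as 5.7.6.
write_cm_repo() {
    local version="$1" out_file="$2"
    cat > "$out_file" <<EOF
[cloudera-manager]
name = Cloudera Manager
baseurl = http://archive.cloudera.com/cm5/redhat/7/x86_64/cm/${version}/
gpgcheck = 0
EOF
}
```

For example, `write_cm_repo 5.7.6 cloudera-manager.repo` followed by the `sudo cp` into /etc/yum.repos.d.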

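2. Put the parcel files into /opt/cloudera/parcel-repo

Move the downloaded CDH files (parcel, parcel.sha, manifest.json) into /opt/cloudera/parcel-repo. If this step is skipped, Cloudera Manager downloads the parcel from the internet during cluster installation; the file is about 1.4 GB.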

$ sudo mkdir -p /opt/cloudera
$ sudo mv ~/cdh /opt/cloudera/parcel-repo
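
A corrupted download typically only shows up later, when Cloudera Manager rejects the parcel, so checking the hash up front can save time. A sketch under one assumption: the .sha file holds the parcel's hex SHA-1 digest as its first field. `verify_parcel` is a hypothetical name.

```shell
# Hypothetical helper: compare a parcel's SHA-1 with its companion .sha file.
# Assumption: the first field of <parcel>.sha is the hex SHA-1 digest.
verify_parcel() {
    local parcel="$1"
    local expected actual
    expected=$(awk '{ print $1; exit }' "${parcel}.sha")
    actual=$(sha1sum "$parcel" | awk '{ print $1 }')
    # The function's exit status is the result of this comparison.
    [ "$expected" = "$actual" ]
}
```

For example: `verify_parcel /opt/cloudera/parcel-repo/CDH-5.7.6-1.cdh5.7.6.p0.6-el7.parcel && echo OK`.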

3. Install all the Cloudera Manager RPMs

Extract the downloaded CM 5.7.6 archive

$ tar xvzf cm5.7.6-centos7.x86_64

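Change into the extracted cm directory, locate the rpm files, and install them with yum, which pulls in the required dependencies automatically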

$ cd cm/5/RPMS/x86_64
$ sudo yum localinstall --nogpgcheck -y cloudera-manager-agent-*.rpm cloudera-manager-server-*.rpm cloudera-manager-daemons-*.rpm

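Note: if you are not using the embedded PostgreSQL database, you do not need to install the cloudera-manager-server-db RPM package.

4. Delete the db.properties file

The embedded database is not used here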
$ sudo rm -f /etc/cloudera-scm-server/db.properties

5. Run the installer.bin installer

If all the RPM packages above are already installed and cloudera-manager.repo is configured correctly, this step finishes quickly (about 1 minute)
$ sudo ./cloudera-manager-installer.bin

6. Check the status of the Cloudera Manager services

$ sudo service --status-all

7. If a Cloudera service did not start, restart it

Skip this command if you are not using the embedded database
$ sudo systemctl restart cloudera-scm-server-db

$ sudo systemctl restart cloudera-scm-server
$ sudo systemctl restart cloudera-scm-agent

8. Check that port 7180 is open

Cloudera Manager Server listens on port 7180. After a service restart it can take a few minutes (sometimes about 5) before port 7180 shows up.

$ watch sudo netstat -tulpn

Open ip:7180 of the Master server in a browser to reach the Cloudera Manager web configuration UI
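
Instead of watching the netstat output by eye, the wait for port 7180 can be scripted. A sketch only: both function names are hypothetical, and it assumes `netstat -tln` prints local addresses in the usual "address:port" form.

```shell
# Hypothetical helper: read `netstat -tln` (or `ss -tln`) output on stdin and
# succeed if any column ends in ":PORT".
port_in_listing() {
    local port="$1"
    awk -v suffix=":${port}" '
        { for (i = 1; i <= NF; i++) if ($i ~ (suffix "$")) found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Poll until the port is listening; give up after max_tries attempts,
# sleeping delay seconds (default 10) between attempts.
wait_for_port() {
    local port="$1" max_tries="$2" delay="${3:-10}"
    local attempt=0
    while [ "$attempt" -lt "$max_tries" ]; do
        if netstat -tln 2>/dev/null | port_in_listing "$port"; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, `wait_for_port 7180 30 10 && echo "CM is up"` polls for up to about five minutes, which matches the startup delay mentioned above.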

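III. Install Cloudera Manager Agent on the other hosts in the cluster

  1. Add the Cloudera repo file to the yum sources, with the same content as on the Master host
  2. Install only the cloudera-manager-agent and cloudera-manager-daemons RPM packages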
$ sudo yum localinstall --nogpgcheck -y cloudera-manager-{agent,daemons}-*.rpm

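IV. Host role assignment

  • Master hosts: run the main Hadoop processes, such as the HDFS NameNode and the YARN ResourceManager.
  • Utility hosts: run the cluster's non-master processes, such as Cloudera Manager and the Hive Metastore.
  • Edge hosts: usually act as client access points to the cluster, used to launch jobs.
  • Worker hosts: mainly run DataNodes and other distributed processes, such as Impalad.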

Role layout for a small cluster:

Master hosts
  • NameNode
  • YARN ResourceManager
  • JobHistory Server
  • ZooKeeper
  • Impala StateStore
  • Kudu Master
Utility hosts and Edge hosts
  • Secondary NameNode
  • Cloudera Manager
  • Cloudera Manager Management Service
  • Hive Metastore
  • HiveServer2
  • Impala Catalog
  • Hue
  • Oozie
  • Flume
  • Gateway configuration
Worker hosts
  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server

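V. Database configuration

    Official database setup documentation

    1. Install the MariaDB database

    • Check which MariaDB versions your CDH release supports (version 10.2 is used here)
    • Configure MariaDB
    # Move the old InnoDB log files out of the way
    $ sudo service mariadb stop
    $ sudo mv /var/lib/mysql/ib_logfile{0,1} /tmp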
    $ sudo vim /etc/my.cnf.d/server.cnf
    
    [mysqld]
    sql_mode=STRICT_ALL_TABLES
    
    transaction-isolation = READ-COMMITTED
    # Disabling symbolic-links is recommended to prevent assorted security risks;
    # to do so, uncomment this line:
    # symbolic-links = 0
    
    key_buffer = 16M
    key_buffer_size = 32M
    max_allowed_packet = 32M
    thread_stack = 256K
    thread_cache_size = 64
    query_cache_limit = 8M
    query_cache_size = 64M
    query_cache_type = 1
    
    max_connections = 550
    #expire_logs_days = 10
    #max_binlog_size = 100M
    
    #log_bin should be on a disk with enough free space. Replace '/var/lib/mysql/mysql_binary_log' with an appropriate path for your system
    #and chown the specified folder to the mysql user.
    log_bin=/var/lib/mysql/mysql_binary_log
    
    binlog_format = mixed
    
    read_buffer_size = 2M
    read_rnd_buffer_size = 16M
    sort_buffer_size = 8M
    join_buffer_size = 8M
    
    # InnoDB settings
    innodb_file_per_table = 1
    innodb_flush_log_at_trx_commit  = 2
    innodb_log_buffer_size = 64M
    innodb_buffer_pool_size = 4G
    innodb_thread_concurrency = 8
    innodb_flush_method = O_DIRECT
    innodb_log_file_size = 512M
    
    [mysqld_safe]
    log-error=/var/log/mariadb/mariadb.log
    pid-file=/var/run/mariadb/mariadb.pid
    

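    Download the MySQL JDBC driver from the official MySQL site and copy it to /usr/share/java/mysql-connector-java.jar on every host that needs to connect to MariaDB

    2. Services that require a database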

    Service: Description
    Cloudera Manager: Contains all the information about services you have configured and their role assignments, all configuration history, commands, users, and running processes. This relatively small database (< 100 MB) is the most important to back up.
    Oozie Server: Contains Oozie workflow, coordinator, and bundle data. Can grow very large.
    Sqoop Server: Contains entities such as the connector, driver, links and jobs. Relatively small.
    Activity Monitor: Contains information about past activities. In large clusters, this database can grow large. Configuring an Activity Monitor database is only necessary if a MapReduce service is deployed.
    Reports Manager: Tracks disk utilization and processing activities over time. Medium-sized.
    Hive Metastore Server: Contains Hive metadata. Relatively small.
    Hue Server: Contains user account information, job submissions, and Hive queries. Relatively small.
    Sentry Server: Contains authorization metadata. Relatively small.
    Cloudera Navigator Audit Server: Contains auditing information. In large clusters, this database can grow large.
    Cloudera Navigator Metadata Server: Contains authorization, policies, and audit report metadata. Relatively small.

    3. Create the Cloudera Manager database

    $ sudo /usr/share/cmf/schema/scm_prepare_database.sh mysql -h <mysql-server> -u root -p[password] --scm-host <cm-server> scm scm scm
    

    4. Create the following databases as needed

    Role                                    Database   User    Password
    Activity Monitor (if using MapReduce)   amon       amon    amon
    Reports Manager                         rman       rman    rman
    Hive Metastore Server                   metastore  hive    hive
    Sentry Server                           sentry     sentry  sentry
    Cloudera Navigator Audit Server         nav        nav     nav
    Cloudera Navigator Metadata Server      navms      navms   navms
    # Connect to MySQL
    $ mysql -u root -p
    
    -- Create the amon database
    create database amon default character set utf8;
    grant all on amon.* to 'amon'@'%' identified by 'amon';
    
    -- Create the rman database
    create database rman default character set utf8;
    grant all on rman.* to 'rman'@'%' identified by 'rman';
    
    -- Create the Hive Metastore database (metastore)
    create database metastore default character set utf8;
    grant all on metastore.* to 'hive'@'%' identified by 'hive';
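
The remaining roles in the table (Sentry, Navigator Audit, Navigator Metadata) follow the same two-statement pattern, so the SQL can be generated rather than typed. A minimal sketch; `gen_db_sql` is a hypothetical name, and the statements simply mirror the ones written out above.

```shell
# Hypothetical helper: print the create/grant statements for one database,
# following the same pattern as the amon, rman and metastore statements.
gen_db_sql() {
    local db="$1" user="$2" password="$3"
    cat <<EOF
create database ${db} default character set utf8;
grant all on ${db}.* to '${user}'@'%' identified by '${password}';
EOF
}
```

For example, `gen_db_sql sentry sentry sentry | mysql -u root -p` creates the Sentry database; inspect the output first by running the function without the pipe.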
    

    5. Create the Oozie database

    create database oozie default character set utf8;
    grant all on oozie.* to 'oozie'@'localhost' identified by 'oozie';
    grant all on oozie.* to 'oozie'@'%' identified by 'oozie';
    

    Copy the MySQL JDBC driver file to /opt/cloudera/parcels/CDH/lib/oozie/lib

    6. Create the Hue database

    create database hue default character set utf8 default collate utf8_general_ci;
    grant all on hue.* to 'hue'@'%' identified by 'hue';
    select * from information_schema.schemata;