Installing Cloudera 5.7.6 on CentOS 7.2
Use the getenforce command to check whether SELinux is disabled
$ getenforce
Disabled
Edit the SELinux configuration file (the change takes effect after a reboot)
$ sudo vim /etc/selinux/config
SELINUX=disabled
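The same edit can be scripted so it is easy to repeat on every host. This is a minimal sketch: disable_selinux_config is a hypothetical function name, and GNU sed (as shipped with CentOS 7) is assumed for in-place editing. Until the reboot, `setenforce 0` switches the running system to permissive mode.

```bash
#!/usr/bin/env bash
# Hypothetical helper: force SELINUX=disabled in an SELinux config file.
# Assumes GNU sed (-i for in-place editing).
disable_selinux_config() {
  local conf="${1:-/etc/selinux/config}"
  if grep -q '^SELINUX=' "$conf"; then
    # Rewrite the existing SELINUX= line whatever its current value;
    # SELINUXTYPE= is not matched because "=" must follow "SELINUX".
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$conf"
  else
    # No SELINUX= line yet: append one.
    echo 'SELINUX=disabled' >> "$conf"
  fi
}
```

Call it from a root shell (for example after `sudo -i`) as `disable_selinux_config`, then confirm with getenforce after rebooting.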
$ sudo systemctl stop firewalld
$ sudo systemctl disable firewalld
This file must be identical on every host in the cluster. Configure it on the master host, then scp it to the slave hosts.
$ sudo vim /etc/hosts
192.168.31.160 master
192.168.31.161 slave1
192.168.31.162 slave2
$ sudo scp /etc/hosts slave1:/etc/hosts
$ sudo scp /etc/hosts slave2:/etc/hosts
# Make sure the name printed by the hostname command matches this host's name in /etc/hosts
$ sudo vim /etc/hostname
master
$ hostnamectl
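To confirm this step on each host, a check like the following can be run after editing /etc/hostname. It is a sketch: hostname_in_hosts is a hypothetical function name, and it only looks at the names and aliases on non-comment lines of the hosts file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given host name appears as a name or
# alias on a non-comment line of a hosts file (default /etc/hosts).
hostname_in_hosts() {
  local name="$1" hosts="${2:-/etc/hosts}"
  awk -v n="$name" '
    /^[[:space:]]*#/ { next }          # skip whole-line comments
    { for (i = 2; i <= NF; i++) {      # field 1 is the IP address
        if ($i ~ /^#/) break           # stop at a trailing comment
        if ($i == n) found = 1
      } }
    END { exit found ? 0 : 1 }
  ' "$hosts"
}
```

Usage: `hostname_in_hosts "$(hostname)" && echo OK || echo "$(hostname) is missing from /etc/hosts"`.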
$ sudo vim /etc/sysconfig/network-scripts/ifcfg-eno
BOOTPROTO="static"
ONBOOT="yes"
IPADDR=192.168.31.160
GATEWAY=192.168.31.1
DNS1=192.168.31.1
$ sudo yum install -y ntp
$ sudo systemctl enable ntpd
$ sudo systemctl enable ntpdate
$ sudo vim /etc/ntp.conf
server time1.aliyun.com
$ sudo ntpdate time1.aliyun.com
$ timedatectl
Remove the OpenJDK packages that ship with the system
$ rpm -qa | grep --color openjdk
$ sudo yum remove -y java-1.7.0-openjdk-headless.x86_64 java-1.7.0-openjdk.x86_64 java-1.8.0-openjdk-headless.x86_64 java-1.8.0-openjdk.x86_64
Download the JDK from Oracle and install it
# Install Oracle JDK 1.8
$ sudo yum install -y jdk-8u144-linux-x64.rpm
$ sudo sysctl vm.swappiness=0
$ sudo vim /etc/sysctl.conf
vm.swappiness=0
# Apply the settings
$ sudo sysctl -p
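Appending the setting by hand can leave duplicate vm.swappiness lines if the step is repeated. This sketch sets a key idempotently instead; set_sysctl_conf is a hypothetical name, GNU sed is assumed, and the value must not contain `/` or `&` because it is used in a sed replacement.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set key=value in a sysctl-style file idempotently,
# rewriting an existing line for the key instead of appending a duplicate.
set_sysctl_conf() {
  local key="$1" value="$2" conf="${3:-/etc/sysctl.conf}"
  local key_re
  # Escape dots so "vm.swappiness" only matches literally.
  key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
  if grep -Eq "^[[:space:]]*${key_re}[[:space:]]*=" "$conf"; then
    sed -i -E "s/^[[:space:]]*${key_re}[[:space:]]*=.*/${key}=${value}/" "$conf"
  else
    echo "${key}=${value}" >> "$conf"
  fi
}
```

From a root shell: `set_sysctl_conf vm.swappiness 0 && sysctl -p`.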
# On CentOS 7.2 the files under /usr/lib/tuned must also be changed; otherwise tuned re-adjusts vm.swappiness at boot.
$ cd /usr/lib/tuned
$ grep -R 'vm.swappiness' *
latency-performance/tuned.conf:vm.swappiness=10
throughput-performance/tuned.conf:vm.swappiness=10
virtual-guest/tuned.conf:vm.swappiness = 30
# Change the value in virtual-guest/tuned.conf (edit whichever profile is active; `tuned-adm active` shows it)
$ sudo vim /usr/lib/tuned/virtual-guest/tuned.conf
vm.swappiness=0
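The steps above change only virtual-guest/tuned.conf. As an alternative, this sketch rewrites the vm.swappiness line in every tuned.conf under /usr/lib/tuned, so the result does not depend on which profile is active. zero_tuned_swappiness is a hypothetical name and GNU grep/sed are assumed. Files under /usr/lib/tuned belong to the tuned package, so an update of that package may replace them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set vm.swappiness to 0 in every tuned.conf under a
# tuned profile directory (default /usr/lib/tuned). Matches both the
# "vm.swappiness=10" and "vm.swappiness = 30" spellings.
zero_tuned_swappiness() {
  local dir="${1:-/usr/lib/tuned}" f
  # -r recurses, -l lists matching file names, --include limits to tuned.conf.
  grep -rl --include=tuned.conf '^[[:space:]]*vm\.swappiness' "$dir" |
  while IFS= read -r f; do
    sed -i -E 's/^[[:space:]]*vm\.swappiness[[:space:]]*=.*/vm.swappiness=0/' "$f"
  done
}
```

Run it from a root shell, then repeat the grep above to check that every profile now reads vm.swappiness=0.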
$ sudo sh -c "echo never > /sys/kernel/mm/transparent_hugepage/defrag"
$ sudo sh -c "echo never > /sys/kernel/mm/transparent_hugepage/enabled"
$ sudo vim /etc/rc.local
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled
# /etc/rc.local is a symlink to /etc/rc.d/rc.local; make rc.local executable
$ sudo chmod +x /etc/rc.d/rc.local
$ sudo reboot
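After the reboot, the active transparent hugepage mode can be read back from sysfs to confirm that rc.local did its job. The kernel lists every option in these files and brackets the selected one (for example `always madvise [never]`); the sketch below extracts the bracketed value. thp_setting and thp_disabled are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the selected value of a THP setting file,
# i.e. the option shown in brackets, e.g. "always madvise [never]" -> never.
thp_setting() {
  local file="${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
  sed -n 's/.*\[\([^]]*\)].*/\1/p' "$file"
}

# Hypothetical check: succeed only if both "enabled" and "defrag" are "never".
thp_disabled() {
  local base="${1:-/sys/kernel/mm/transparent_hugepage}"
  [ "$(thp_setting "$base/enabled")" = never ] &&
  [ "$(thp_setting "$base/defrag")" = never ]
}
```

Usage: `thp_disabled && echo "THP disabled" || echo "THP still active"`.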
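# In the directory holding the downloaded CDH files, rename the .sha1 checksum file to .sha, the extension Cloudera Manager expects
$ mv CDH-5.7.6-1.cdh5.7.6.p0.6-el7.parcel.sha{1,}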
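Since the .sha file is what Cloudera Manager compares the parcel against, it may be worth checking the digest by hand before moving the files. A sketch, assuming the .sha file holds only the hex SHA-1 digest and that sha1sum from coreutils is available; verify_parcel_sha is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare a parcel's SHA-1 with the digest stored in its
# .sha file (default: "<parcel>.sha"). Whitespace in the .sha file is ignored
# and the digest is lower-cased to match sha1sum's output.
verify_parcel_sha() {
  local parcel="$1" sha_file="${2:-$1.sha}"
  local expected actual
  expected=$(tr -d '[:space:]' < "$sha_file" | tr '[:upper:]' '[:lower:]')
  actual=$(sha1sum "$parcel" | awk '{print $1}')
  if [ "$expected" = "$actual" ]; then
    echo "OK: $parcel"
  else
    echo "MISMATCH: $parcel (expected $expected, got $actual)" >&2
    return 1
  fi
}
```

Usage: `verify_parcel_sha CDH-5.7.6-1.cdh5.7.6.p0.6-el7.parcel`.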
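Download the cloudera-manager.repo file from the CM archive and change its baseurl to the version you are installing (5.7.6 here). Also change gpgcheck=1 to gpgcheck=0; the gpgkey line can be deleted. If you skip these changes, cloudera-manager-installer.bin will upgrade the Cloudera RPMs you have already installed to the latest version online.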
$ vim cloudera-manager.repo
$ sudo cp cloudera-manager.repo /etc/yum.repos.d
[cloudera-manager]
# Packages for Cloudera Manager, Version 5, on RedHat or CentOS 7 x86_64
name = Cloudera Manager
baseurl = http://archive.cloudera.com/cm5/redhat/7/x86_64/cm/5.7.6/
gpgcheck = 0
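Instead of hand-editing, the repo file can be generated. This minimal sketch writes the same content shown above for a given version; write_cm_repo is a hypothetical name, and it assumes other CM 5.x versions use the same baseurl pattern as 5.7.6.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a cloudera-manager.repo pinned to one CM version,
# with gpgcheck disabled and no gpgkey line.
write_cm_repo() {
  local version="$1" out="${2:-cloudera-manager.repo}"
  cat > "$out" <<EOF
[cloudera-manager]
# Packages for Cloudera Manager, Version 5, on RedHat or CentOS 7 x86_64
name = Cloudera Manager
baseurl = http://archive.cloudera.com/cm5/redhat/7/x86_64/cm/${version}/
gpgcheck = 0
EOF
}
```

Usage: `write_cm_repo 5.7.6 && sudo cp cloudera-manager.repo /etc/yum.repos.d`.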
Check that yum can find the Cloudera packages
$ yum list | grep cloudera
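Move the downloaded CDH files (the .parcel, the .parcel.sha, and manifest.json) into /opt/cloudera/parcel-repo. If you skip this step, Cloudera Manager downloads the parcel file (about 1.4 GB) from the internet during cluster installation.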
$ sudo mkdir -p /opt/cloudera
$ sudo mv ~/cdh /opt/cloudera/parcel-repo
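A quick check that parcel-repo really contains all three files can save the 1.4 GB download. check_parcel_repo is a hypothetical name; the sketch only checks that the files exist, not their contents.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify that a parcel-repo directory holds at least one
# .parcel, a matching .parcel.sha for each parcel, and manifest.json.
check_parcel_repo() {
  local dir="${1:-/opt/cloudera/parcel-repo}" p missing=0
  [ -f "$dir/manifest.json" ] || { echo "missing: manifest.json" >&2; missing=1; }
  # If the glob matches nothing it stays literal, so -e fails below.
  set -- "$dir"/*.parcel
  if [ ! -e "$1" ]; then
    echo "missing: *.parcel" >&2
    return 1
  fi
  for p in "$@"; do
    [ -f "$p.sha" ] || { echo "missing: $(basename "$p").sha" >&2; missing=1; }
  done
  return "$missing"
}
```

Usage: `check_parcel_repo && echo "parcel-repo looks complete"`.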
Extract the downloaded CM 5.7.6 tarball
$ tar xvzf cm5.7.6-centos7.x86_64
Go into the extracted cm directory, find the RPM files, and install them with yum, which pulls in the required dependencies automatically
$ cd cm/5/RPMS/x86_64
$ sudo yum localinstall --nogpgcheck -y cloudera-manager-agent-*.rpm cloudera-manager-server-*.rpm cloudera-manager-daemons-*.rpm
Note: if you do not use the embedded PostgreSQL database, you do not need to install the cloudera-manager-server-db RPM.
This setup does not use the embedded database.
$ sudo rm -f /etc/cloudera-scm-server/db.properties
If the RPMs above are already installed and cloudera-manager.repo is configured correctly, this step finishes quickly (about one minute)
$ sudo ./cloudera-manager-installer.bin
$ sudo service --status-all
Skip the next command if you are not using the embedded database
$ sudo systemctl restart cloudera-scm-server-db
$ sudo systemctl restart cloudera-scm-server
$ sudo systemctl restart cloudera-scm-agent
Cloudera Manager Server listens on port 7180. After restarting the services it can take a few minutes (sometimes about 5) before port 7180 shows up.
$ watch sudo netstat -tulpn
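Rather than watching the netstat output, a loop can wait until port 7180 accepts connections. A sketch assuming bash, since it uses bash's /dev/tcp pseudo-device; wait_for_port and its default 300-second timeout and 5-second poll interval are hypothetical choices.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll until host:port accepts TCP connections or the
# timeout (seconds) runs out. Requires bash for /dev/tcp.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-300}" interval="${4:-5}"
  local start=$SECONDS
  while (( SECONDS - start < timeout )); do
    # Opening the pseudo-device succeeds only if the connection is accepted;
    # the subshell closes the descriptor again right away.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      echo "${host}:${port} is accepting connections"
      return 0
    fi
    sleep "$interval"
  done
  echo "timed out waiting for ${host}:${port}" >&2
  return 1
}
```

Usage on the master host: `wait_for_port 127.0.0.1 7180 600`.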
Open the master server's IP on port 7180 (http://<master-ip>:7180) in a browser to reach the Cloudera Manager web setup UI
On the other hosts, install only the agent and daemons packages:
$ sudo yum localinstall --nogpgcheck -y cloudera-manager-{agent,daemons}-*.rpm
Cluster size | Master hosts | Utility hosts | Edge hosts | Worker hosts |
---|---|---|---|---|
Small |
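# Remove the old InnoDB log files (needed on the MariaDB 5.5 shipped with CentOS 7, because innodb_log_file_size is changed below)
$ sudo service mariadb stop
$ sudo mv /var/lib/mysql/ib_logfile{0,1} /tmp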
$ sudo vim /etc/my.cnf.d/server.cnf
[mysqld]
sql_mode=STRICT_ALL_TABLES
transaction-isolation = READ-COMMITTED
# Disabling symbolic-links is recommended to prevent assorted security risks;
# to do so, uncomment this line:
# symbolic-links = 0
key_buffer = 16M
key_buffer_size = 32M
max_allowed_packet = 32M
thread_stack = 256K
thread_cache_size = 64
query_cache_limit = 8M
query_cache_size = 64M
query_cache_type = 1
max_connections = 550
#expire_logs_days = 10
#max_binlog_size = 100M
#log_bin should be on a disk with enough free space. Replace '/var/lib/mysql/mysql_binary_log' with an appropriate path for your system
#and chown the specified folder to the mysql user.
log_bin=/var/lib/mysql/mysql_binary_log
binlog_format = mixed
read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M
# InnoDB settings
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 64M
innodb_buffer_pool_size = 4G
innodb_thread_concurrency = 8
innodb_flush_method = O_DIRECT
innodb_log_file_size = 512M
[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid
Download the MySQL JDBC driver from the official MySQL site and copy it to /usr/share/java/mysql-connector-java.jar on every host that needs to connect to MariaDB
Service | Description |
---|---|
Cloudera Manager | Contains all the information about services you have configured and their role assignments, all configuration history, commands, users, and running processes. This relatively small database (< 100 MB) is the most important to back up. |
Oozie Server | Contains Oozie workflow, coordinator, and bundle data. Can grow very large. |
Sqoop Server | Contains entities such as the connector, driver, links and jobs. Relatively small. |
Activity Monitor | Contains information about past activities. In large clusters, this database can grow large. Configuring an Activity Monitor database is only necessary if a MapReduce service is deployed. |
Reports Manager | Tracks disk utilization and processing activities over time. Medium-sized. |
Hive Metastore Server | Contains Hive metadata. Relatively small. |
Hue Server | Contains user account information, job submissions, and Hive queries. Relatively small. |
Sentry Server | Contains authorization metadata. Relatively small. |
Cloudera Navigator Audit Server | Contains auditing information. In large clusters, this database can grow large. |
Cloudera Navigator Metadata Server | Contains authorization, policies, and audit report metadata. Relatively small. |
$ sudo /usr/share/cmf/schema/scm_prepare_database.sh mysql -h <mysql-server> -u root -p[password] --scm-host <cm-server> scm scm scm
Role | Database name | User name | Password |
---|---|---|---|
Activity Monitor (only if a MapReduce service is used) | amon | amon | amon |
Reports Manager | rman | rman | rman |
Hive Metastore Server | metastore | hive | hive |
Sentry Server | sentry | sentry | sentry |
Cloudera Navigator Audit Server | nav | nav | nav |
Cloudera Navigator Metadata Server | navms | navms | navms |
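Each row of the table maps to the same CREATE DATABASE / GRANT pair, so the SQL can be generated from the rows instead of typed by hand. A sketch; gen_cdh_db_sql is a hypothetical name, and the `identified by` grant syntax is the MariaDB / MySQL 5.x form used in this guide (MySQL 8 no longer accepts it).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print CREATE DATABASE and GRANT statements for each
# "database:user:password" argument.
gen_cdh_db_sql() {
  local entry db user pass
  for entry in "$@"; do
    IFS=: read -r db user pass <<< "$entry"
    printf "create database %s default character set utf8;\n" "$db"
    printf "grant all on %s.* to '%s'@'%%' identified by '%s';\n" "$db" "$user" "$pass"
  done
}
```

Usage: `gen_cdh_db_sql amon:amon:amon rman:rman:rman metastore:hive:hive | mysql -u root -p`.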
# Connect to MySQL
$ mysql -u root -p
-- Create the amon database
create database amon default character set utf8;
grant all on amon.* to 'amon'@'%' identified by 'amon';
-- Create the rman database
create database rman default character set utf8;
grant all on rman.* to 'rman'@'%' identified by 'rman';
-- Create the Hive metastore database
create database metastore default character set utf8;
grant all on metastore.* to 'hive'@'%' identified by 'hive';
-- Create the oozie database
create database oozie default character set utf8;
grant all on oozie.* to 'oozie'@'localhost' identified by 'oozie';
grant all on oozie.* to 'oozie'@'%' identified by 'oozie';
Copy the MySQL JDBC jar to /opt/cloudera/parcels/CDH/lib/oozie/lib
-- Create the hue database
create database hue default character set utf8 default collate utf8_general_ci;
grant all on hue.* to 'hue'@'%' identified by 'hue';
-- List all databases to verify
select * from information_schema.schemata;