It’s been a while since my last post, but this one should be useful for anyone facing the task of installing a CDH cluster from scratch.
There are a few prerequisites that need to be configured before starting the installation, otherwise the installation can (and probably will) crash. To address some of these I wrote a few small Ansible scripts that automate the adjustment of these parameters and the copying of files. The scripts are not finished, and I plan to improve them by adding a few parts that I still haven’t automated, like the creation of the yum repositories for installing Cloudera Manager and a few other small tasks.
For this guide we assume that Cloudera Manager has already been installed, along with the database that holds the Cloudera Manager data repository and the data of some of the other tools.
The script is written in Ansible and consists of three playbooks and an inventory file. The inventory file is in YAML format. I grouped the nodes into master(s), workers (datanodes), the Cloudera Manager server and the gateway. The inventory file is defined as follows, and you can save it as inventory.yaml:
all:
  children:
    cm:
      hosts:
        cm_host:
    master:
      hosts:
        master_host:
    workers:
      hosts:
        worker1_host:
        worker2_host:
        worker3_host:
    gateway:
      hosts:
        gateway_host:
Replace each xxx_host with the fully qualified domain name of the corresponding server.
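Before running anything heavier, it can be worth checking that Ansible reaches every host in the inventory. The play below is only a minimal sketch: the file name check_connectivity.yaml is my own choice, and it assumes the same youruser account used in the rest of this post.

```yaml
---
# check_connectivity.yaml (hypothetical file name)
- hosts: all
  remote_user: youruser
  gather_facts: no
  tasks:
    # Confirms that Ansible can log in and run Python on the host
    - name: Ping every host in the inventory
      ping:

    # group_names is a built-in variable listing the groups of the current host
    - name: Show the inventory groups each host belongs to
      debug:
        var: group_names
```

If a host is missing from the output or shows up in the wrong group, the inventory file is the first place to look.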
Then we have the prerequisites playbook, which has more substance. Apart from the prerequisites listed on Cloudera’s website, I added some tweaks and actions such as replacing the MySQL JDBC driver, since the one available in yum is outdated and makes the creation of the databases crash in the wizard. You can save this one as cloudera_prerequisites.yaml:
---
- hosts: all
  connection: ssh
  remote_user: youruser
  become: yes
  become_method: sudo
  become_user: root
  tasks:
    # Firewall, SELinux and IPv6 are disabled on every node
    - service: name=firewalld state=stopped enabled=False
    - selinux: state=disabled
    - sysctl: name=net.ipv6.conf.all.disable_ipv6 value=1 state=present
    - sysctl: name=net.ipv6.conf.default.disable_ipv6 value=1 state=present
    # Low swappiness, persisted and applied to the running kernel
    - sysctl: name=vm.swappiness value=1 state=present
    - shell: sysctl -w vm.swappiness=1
    # Distributes the /etc/hosts of the control machine to every node
    - copy: src=/etc/hosts dest=/etc/hosts owner=root group=root mode=0644
    - yum: name=java-1.8.0-openjdk-devel state=latest
    # tuned has to be running for tuned-adm to work, then it is disabled
    - systemd: name=tuned state=started
    - shell: tuned-adm off
    - systemd: name=tuned state=stopped enabled=False
    - name: Disable THP support scripts added to rc.local
      lineinfile:
        path: /etc/rc.local
        line: |
          echo never > /sys/kernel/mm/transparent_hugepage/enabled
          echo never > /sys/kernel/mm/transparent_hugepage/defrag
    - name: Change permissions of /etc/rc.local to make it run on boot
      shell: chmod +x /etc/rc.d/rc.local
      become_method: sudo
    - service: name=ntpd state=started
    - name: Allow 'youruser' user to have passwordless sudo
      lineinfile:
        path: /etc/sudoers
        state: present
        regexp: '^youruser'
        line: 'youruser ALL =(ALL) NOPASSWD: ALL'
    # - name: Install mysql jdbc driver
    #   yum:
    #     name: mysql-connector-java
    #     state: latest
    - name: download newer jdbc for mysql to avoid crash
      get_url: url=https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.46.tar.gz dest=/tmp/mysql-connector-java-5.1.46.tar.gz
    - name: Check if /tmp/mysql-connector-java-5.1.46.tar.gz exists
      stat:
        path: /tmp/mysql-connector-java-5.1.46.tar.gz
      register: stat_result
    # The archive was downloaded on the remote host, so unarchive and copy
    # need remote_src to look for their source there and not on the control machine
    - block:
        - name: Extract downloaded jdbc
          unarchive:
            src: /tmp/mysql-connector-java-5.1.46.tar.gz
            dest: /tmp/
            remote_src: yes
        - name: Creates directory for the java driver if it does not exist
          file:
            path: /usr/share/java
            state: directory
            mode: '0755'
            recurse: yes
        - name: Copies the file
          copy:
            src: /tmp/mysql-connector-java-5.1.46/mysql-connector-java-5.1.46-bin.jar
            dest: /usr/share/java/mysql-connector-java.jar
            remote_src: yes
            owner: root
            group: root
            mode: '0644'
        - name: Copies the file to sqoop as well
          copy:
            src: /tmp/mysql-connector-java-5.1.46/mysql-connector-java-5.1.46-bin.jar
            dest: /var/lib/sqoop/mysql-connector-java.jar
            remote_src: yes
            owner: sqoop
            group: sqoop
            mode: '0644'
      when: stat_result.stat.exists == True
The playbook can be run with the following command:
ansible-playbook -i inventory.yaml cloudera_prerequisites.yaml --ask-pass --ask-become-pass
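Once the prerequisites playbook has run and the hosts have been rebooted (the THP lines in rc.local only take effect at boot), something like the following could be used to check that two of the settings actually stuck. This is only a sketch: the file name check_prerequisites.yaml is my own invention, and it assumes the kernel marks the active THP mode with square brackets, as the RHEL/CentOS kernels do.

```yaml
---
# check_prerequisites.yaml (hypothetical file name)
- hosts: all
  remote_user: youruser
  become: yes
  tasks:
    - name: Read the current swappiness value
      command: sysctl -n vm.swappiness
      register: swappiness
      changed_when: false   # read-only, never report a change

    - name: Read the current THP mode
      command: cat /sys/kernel/mm/transparent_hugepage/enabled
      register: thp
      changed_when: false

    # Assumption: the active mode is shown in brackets, e.g. "always madvise [never]"
    - name: Fail early if a prerequisite is not in place
      assert:
        that:
          - swappiness.stdout | int == 1
          - "'[never]' in thp.stdout"
```

It does not cover every prerequisite, but it catches the two that are easiest to lose after a reboot.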
I’ve also created two additional playbooks. The first one creates the folders on the mount points that store the HDFS, YARN and Impala data:
---
# Data directories on the worker nodes
- hosts: workers
  connection: ssh
  remote_user: youruser
  become: yes
  become_method: sudo
  become_user: root
  tasks:
    - name: Creates directory datanodes
      file:
        path: /home/data/dfs/dn
        state: directory
        owner: hdfs
        group: hdfs
        mode: '0700'
        recurse: yes
    - name: Creates directory nodemanagers
      file:
        path: /home/data/yarn/nm
        state: directory
        owner: yarn
        group: yarn
        mode: '0700'
        recurse: yes
    - name: Creates directory impalad
      file:
        path: /home/data/impala/impalad
        state: directory
        owner: impala
        group: impala
        mode: '0700'
        recurse: yes

# Metadata directories on the master node(s)
- hosts: master
  connection: ssh
  remote_user: youruser
  become: yes
  become_method: sudo
  become_user: root
  tasks:
    - name: Creates directory namenode
      file:
        path: /home/data/dfs/nn
        state: directory
        owner: hdfs
        group: hdfs
        mode: '0700'
        recurse: yes
    - name: Creates directory secondary namenode
      file:
        path: /home/data/dfs/snn
        state: directory
        owner: hdfs
        group: hdfs
        mode: '0700'
        recurse: yes
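The tasks in this playbook only differ in path and owner, so the workers play could also be written with a loop. This is just a sketch of the idea, not a replacement I have battle-tested; it assumes Ansible 2.5 or later (where the loop keyword is available) and uses the same paths and users as the playbook above.

```yaml
---
- hosts: workers
  connection: ssh
  remote_user: youruser
  become: yes
  become_method: sudo
  become_user: root
  tasks:
    - name: Creates the data directories on each worker
      file:
        path: "{{ item.path }}"
        state: directory
        owner: "{{ item.owner }}"
        group: "{{ item.owner }}"   # group has the same name as the user here
        mode: '0700'
        recurse: yes
      loop:
        - { path: /home/data/dfs/dn, owner: hdfs }
        - { path: /home/data/yarn/nm, owner: yarn }
        - { path: /home/data/impala/impalad, owner: impala }
```

Adding a new directory then means adding one line to the list and leaving the task itself alone.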
The second one downloads a JDBC driver (SQL Server in this case) and distributes it to all machines of the cluster; it can be adapted to any downloadable JDBC driver:
---
- hosts: all
  connection: ssh
  remote_user: youruser
  become: yes
  become_method: sudo
  become_user: root
  tasks:
    - name: download sqlserver jdbc driver
      get_url: url=https://download.microsoft.com/download/4/D/C/4DCD85FA-0041-4D2E-8DD9-833C1873978C/sqljdbc_7.2.2.0_enu.tar.gz dest=/tmp/sqljdbc_7.2.2.0_enu.tar.gz
    - name: Check if /tmp/sqljdbc_7.2.2.0_enu.tar.gz exists
      stat:
        path: /tmp/sqljdbc_7.2.2.0_enu.tar.gz
      register: stat_result
    # The archive lives on the remote host, so unarchive and copy need remote_src
    - block:
        - name: Extract downloaded jdbc
          unarchive:
            src: /tmp/sqljdbc_7.2.2.0_enu.tar.gz
            dest: /tmp/
            remote_src: yes
        - name: Copies the file into the sqoop folder
          copy:
            src: /tmp/sqljdbc_7.2/enu/mssql-jdbc-7.2.2.jre8.jar
            dest: /var/lib/sqoop/mssql-jdbc-7.2.2.jre8.jar
            remote_src: yes
            owner: sqoop
            group: sqoop
            mode: '0644'
      when: stat_result.stat.exists == True
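To adapt this playbook to another driver, one option is to move the driver-specific parts into variables so that only the vars section changes. The following is a sketch under that assumption: the variable names and the example.com URL are placeholders I made up, and the path of the jar inside the archive has to be checked for each driver.

```yaml
---
- hosts: all
  connection: ssh
  remote_user: youruser
  become: yes
  become_method: sudo
  become_user: root
  vars:
    # Hypothetical values, replace them with the ones of your driver
    driver_url: https://example.com/drivers/some-jdbc-driver.tar.gz
    driver_archive: /tmp/some-jdbc-driver.tar.gz
    driver_jar: some-jdbc-driver/some-jdbc-driver.jar   # path inside the archive
  tasks:
    - name: Download the driver archive on the remote host
      get_url:
        url: "{{ driver_url }}"
        dest: "{{ driver_archive }}"

    - name: Extract the archive on the remote host
      unarchive:
        src: "{{ driver_archive }}"
        dest: /tmp/
        remote_src: yes

    - name: Copy the jar into the sqoop folder
      copy:
        src: "/tmp/{{ driver_jar }}"
        dest: "/var/lib/sqoop/{{ driver_jar | basename }}"
        remote_src: yes
        owner: sqoop
        group: sqoop
        mode: '0644'
```

The stat check is left out here because get_url already fails the play for that host when the download does not succeed.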
Happy Installation 🙂