Cloudera Hadoop 5 Configuration

This guide discusses the process of configuring Cloudera Hadoop 5 on a physical or virtual cluster.

Install Cloudera Manager

Download and install Cloudera Manager before continuing. For instructions, see the Cloudera website (http://www.cloudera.com/content/support/en/downloads/cloudera_manager/cm-5-1-0.html).

Proxy and Parcel Info in Cloudera Manager

  1. In a web browser, go to the Cloudera Manager.
  2. Click the Cloudera Manager hyperlink graphic on the top left portion of the window
  3. Click the Administration drop-down along the top of the window, then select Settings
  4. Select the Network button along the menu pane to the left
  5. In the Proxy Server field, enter the fully qualified name of the proxy, for example, proxy.my.company.com
  6. In the Proxy Port field, enter the proxy port number
  7. Select the Parcels button along the menu pane to the left
    1. Overwrite the field that says http://archive.cloudera.com/cdh5/parcels/latest/ with http://archive.cloudera.com/cdh5/parcels/5.3.1/
  8. Hit the Save Changes button to the top right of the active menu
  9. Hit the admin drop-down menu at the top right corner of the window and log out
  10. Log back in using the same admin username and password
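
The parcel URL in step 7 is easy to mistype, so it can help to build and check it before pasting it into the field. The following is a minimal sketch, not part of Cloudera Manager: the function names are hypothetical, and the base address is taken from the default field value shown above.

```python
from urllib.parse import urlparse

# Base location taken from the default value of the parcel field in step 7.
PARCEL_BASE = "http://archive.cloudera.com/cdh5/parcels/"


def pinned_parcel_url(version: str) -> str:
    """Return the repository URL for one specific CDH 5 release."""
    return f"{PARCEL_BASE}{version}/"


def is_well_formed(url: str) -> bool:
    """True when the URL has an http(s) scheme and a host name."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


url = pinned_parcel_url("5.3.1")
print(url, is_well_formed(url))
```

A URL that is missing the colon after http has no recognizable scheme or host, so is_well_formed rejects it.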

Submit License File

  1. Acquire the Cloudera license file.
  2. Under the Cloudera Enterprise column, click on the empty text field to the left of the Upload button
  3. Select the license file
  4. Hit the Upload button
  5. Hit Continue on the bottom right of the window

Specifying Hosts

This step connects the master node to the rest of the cluster. Enter the hostnames exactly as they appear in the /etc/hosts file or as they resolve through DNS lookup.

Hit Continue through the “Thank you for choosing Cloudera Manager and CDH” window. In the text field presented, enter the hostnames of each node in the following syntax:

master.clustername.cluster
node[01-03].clustername.cluster

Here, clustername is the name of the cluster, and [01-03] is the range of slave nodes in the cluster (for example, [01-07] for an 8-node cluster or [01-15] for a 16-node cluster).
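
To see which hostnames a range pattern stands for, the bracket syntax can be expanded by hand. The sketch below illustrates the pattern as described above; it is not Cloudera Manager's own parser, and expand_hosts is a hypothetical helper that handles a single [start-end] range.

```python
import re
from typing import List

# Matches one bracketed numeric range such as [01-03].
RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_hosts(pattern: str) -> List[str]:
    """Expand a single [start-end] range, keeping the zero padding."""
    match = RANGE.search(pattern)
    if match is None:
        return [pattern]  # plain hostname, nothing to expand
    start, end = match.group(1), match.group(2)
    width = len(start)  # "01" means two-digit, zero-padded numbers
    prefix = pattern[:match.start()]
    suffix = pattern[match.end():]
    return [
        prefix + str(n).zfill(width) + suffix
        for n in range(int(start), int(end) + 1)
    ]


hosts = expand_hosts("master.clustername.cluster")
hosts += expand_hosts("node[01-03].clustername.cluster")
print(len(hosts), hosts)  # one master plus three nodes
```

The length of the expanded list is the number of hosts the search should report.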

Hit Search, and make sure that as many hosts are detected as there are nodes in the cluster. See Fig. 11.1 for an example. If all the correct hosts are selected, hit Continue. Otherwise, click New Search.

R_images/ad_inst_cloudera_specify_host.png

Fig. 11.1 Specify hosts for your CDH cluster installation.
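
If the search finds fewer hosts than expected, one possible cause is a hostname that does not resolve from the master node. The sketch below lists such names. The helper unresolved_hosts is hypothetical, and the resolver is passed in as a parameter so the check can be tried without network access; by default it uses the standard socket.gethostbyname call.

```python
import socket
from typing import Callable, Iterable, List


def unresolved_hosts(
    hostnames: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> List[str]:
    """Return the hostnames that fail to resolve to an address."""
    failed = []
    for name in hostnames:
        try:
            resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(name)
    return failed
```

On the master node, calling unresolved_hosts(["node01.clustername.cluster", "node02.clustername.cluster"]) should return an empty list when both names are present in /etc/hosts or DNS.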

CDH Parcel Repository

The repository and proxy information should populate the parcel list within a minute. If not, click the More Options field to reconfigure. Make sure CDH-5.3.1-1.cdh5.3.1.p0.3 is selected under Remote Parcel Repository (see Fig. 11.2), then hit Continue.

R_images/ad_inst_cloudera_select_repo.png

Fig. 11.2 Select Repository

Java Encryption

Java encryption is not currently supported.

SSH Login Credentials

Fill in the login information for the CDH administrator user.

Cluster Installation

The next couple of windows show only progress bars. If any of them fails and turns red, hitting Retry is sometimes enough to fix the problem nodes. See Fig. 11.3.

Hit the Continue button when it becomes active after the progress bar fills. More progress bars follow. Wait for them to finish, then hit Continue again.

R_images/ad_inst_cloudera_cluster_installation.png

Fig. 11.3 Cluster Installation

Host Configuration

When the cluster installation finishes, look for any critical errors. Take note of anything that doesn’t have a green check mark next to it and resolve the issue. See Fig. 11.4.

Click Finish.

R_images/ad_inst_cloudera_validations.png

Fig. 11.4 Host Configuration

CDH Services to Install

Choose the CDH 5 services to install on your cluster. The following windows will show the process of installing services and roles on each node in the cluster. This is the Trusted Analytics Platform default setup.

In the “Choose a combination of services to install” dialogue, select the “Custom Services” button. In the drop-down menu, mark the following boxes:

  • HBase
  • HDFS
  • Spark
  • YARN (MR2 Included)
  • ZooKeeper

See Fig. 11.5. Click Continue.

R_images/ad_inst_cloudera_cdh_services.png

Fig. 11.5 Custom CDH Services

Customize Role Assignments

This page designates which roles each node in the cluster takes on. In a default setup, almost all of these fields keep their default values, but four need to be changed.

  1. Under the HBase section, click on the HBase Thrift Server dialogue and select the “master” node of the cluster
  2. Under the HDFS section, click on the Secondary Name Node dialogue and select “node01” of the cluster
  3. Under the YARN section, click on the Job History Server dialogue and select “node01” of the cluster
  4. Under the ZooKeeper section, click on the Server dialogue and select “node01”, “node02” and “node03” of the cluster

Leave all other fields at their default values and click Continue.

See Fig. 11.6 for changes to make near the top:

R_images/ad_inst_cloudera_hbase.png

Fig. 11.6 HBase

See Fig. 11.7 for changes to make near the bottom:

R_images/ad_inst_cloudera_yarn.png

Fig. 11.7 YARN
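
The four non-default role assignments can also be written down as data, which makes them easier to review against the list of hosts found earlier. This is only a sketch: the dictionary layout and the missing_hosts helper are hypothetical, and the short names "master" and "node01" stand in for the full hostnames.

```python
from typing import Dict, Iterable, List

# The four role assignments that differ from the defaults.
ROLE_OVERRIDES: Dict[str, Dict[str, List[str]]] = {
    "HBase": {"HBase Thrift Server": ["master"]},
    "HDFS": {"Secondary Name Node": ["node01"]},
    "YARN": {"Job History Server": ["node01"]},
    "ZooKeeper": {"Server": ["node01", "node02", "node03"]},
}


def missing_hosts(
    overrides: Dict[str, Dict[str, List[str]]],
    cluster_hosts: Iterable[str],
) -> List[str]:
    """Return hosts named in the overrides that are not in the cluster."""
    wanted = {
        host
        for roles in overrides.values()
        for hosts in roles.values()
        for host in hosts
    }
    return sorted(wanted - set(cluster_hosts))


print(missing_hosts(ROLE_OVERRIDES, ["master", "node01", "node02", "node03"]))
```

An empty result means every host named in the assignments exists in the cluster.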

Database Setup

The “Database Host Name” field should auto-populate with the hostname of the system on which Cloudera Manager is installed. If it does not, enter the hostname manually.

Click Test Connection. See Fig. 11.8. If the test is successful, click Continue.

R_images/ad_inst_cloudera_database_setup.png

Fig. 11.8 Database Setup

Review Changes

In the “Review Changes” window, all fields should remain at their default values.

Click Continue.

Finishing Up In Cloudera Manager

The next page requires no interaction; it shows only more loading bars.

  1. Wait for all services to start up, then hit Continue.
  2. In the Congratulations! window, click Finish.
  3. Some of the health indicators may be orange or red in the first few moments of the cluster’s life. Wait a minute for them to all turn green.
  4. In the Cloudera Manager page, change the name of the cluster by hitting the drop-down arrow to the right of the Cluster 1 heading, then clicking Rename Cluster. See Fig. 11.9.
  5. In the Cloudera Manager, hit the admin drop-down at the top right corner of the screen and select Change Password. Change the password as desired.
  6. Select the Spark service from the home screen.
    1. Select Configuration along the top Spark menu.
    2. Select Worker Default Group along the left side menu pane.
    3. Select the Work Directory field and change the value to a directory with the capacity to store lots of temporary data (the /mnt directory for virtual clusters).

R_images/ad_inst_cloudera_finishing.png

Fig. 11.9 Finishing Up In Cloudera Manager
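
Before pointing the Spark Work Directory at a new location, it may be worth confirming that the location has enough free space. The sketch below uses the standard shutil.disk_usage call. The helper names are hypothetical, and the amount of space actually needed depends on the workload; the guide does not give a figure.

```python
import shutil


def free_gib(path: str) -> float:
    """Free space on the filesystem that holds `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def has_room(path: str, needed_gib: float) -> bool:
    """True when `path` has at least `needed_gib` GiB free."""
    return free_gib(path) >= needed_gib
```

For example, has_room("/mnt", 100) checks a candidate directory on a worker node against a hypothetical 100 GiB requirement.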

Final Settings and Tests

Test the functionality of HDFS.
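
One way to do this is a simple round trip: create a directory, upload a file, read it back, and clean up. The sketch below is an assumption about what such a test could cover, since the guide does not prescribe one. It uses the standard hdfs dfs subcommands, while the helper names and paths are hypothetical.

```python
import os
import posixpath
import subprocess
from typing import Callable, List, Sequence


def smoke_test_commands(local_file: str, hdfs_dir: str) -> List[List[str]]:
    """Build the hdfs dfs commands for a write/read/delete round trip."""
    remote = posixpath.join(hdfs_dir, os.path.basename(local_file))
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],          # create directory
        ["hdfs", "dfs", "-put", "-f", local_file, remote],  # upload the file
        ["hdfs", "dfs", "-cat", remote],                    # read it back
        ["hdfs", "dfs", "-rm", "-r", hdfs_dir],             # clean up
    ]


def run_smoke_test(
    local_file: str,
    hdfs_dir: str,
    run: Callable[[Sequence[str]], object] = subprocess.check_call,
) -> None:
    """Run each command in order; check_call raises on the first failure."""
    for command in smoke_test_commands(local_file, hdfs_dir):
        run(command)
```

Run on a cluster node as a user with write access to the target directory, run_smoke_test("/tmp/hello.txt", "/tmp/smoke_test") finishes without an exception when all four commands succeed.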