Virtual Machines

Introduction

This guide walks through downloading and importing the Trusted Analytics Platform beta as a virtual machine (VM). Currently, the Trusted Analytics Platform VM supports only VirtualBox. These instructions do not cover the installation of VirtualBox, which supports many platforms and can be downloaded for free; its installation documentation is also available online.

Requirements

  • 12 GB of memory allocated to the VM
  • 45 GB of free hard drive space
  • A working VirtualBox 4.3 installation
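
The disk-space requirement can be checked before starting the download. The following is a minimal sketch, not part of the Trusted Analytics Platform tooling: the function name is hypothetical, and it assumes that "45 GB" means 45 × 1024³ bytes and that the VM will be stored on the filesystem holding the current directory.

```python
import shutil

# Assumption: "45 GB" in the requirements is read as 45 * 1024**3 bytes.
REQUIRED_FREE_BYTES = 45 * 1024**3


def has_enough_space(path=".", required=REQUIRED_FREE_BYTES):
    """Return True if the filesystem holding `path` has at least `required` bytes free."""
    return shutil.disk_usage(path).free >= required


if __name__ == "__main__":
    if has_enough_space():
        print("Enough free space for the VM image.")
    else:
        print("Less than 45 GB free; free up space before downloading.")
```

Pass a different `path` if the VM will be stored on another drive.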

Download VM Image

Open a Linux shell (or, for Windows users, a command prompt) to run the commands below.

The VM image is downloaded from AWS. The download requires the AWS Command Line Interface (CLI) client. Instructions for downloading and installing the CLI can be found in the Amazon CLI documentation.

After installing the interface, verify the installation by running:

$ aws --version

The result should be similar to this:

aws-cli/1.2.9 Python/2.7.6 Linux/3.8.0-35-generic

Take note of the aws-cli version: it must be greater than or equal to 1.2.9. Older versions of the aws-cli client do not work with the restricted permissions.
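
The version comparison can also be done programmatically. The sketch below is illustrative only: the function names are hypothetical, and it assumes the version line has the `aws-cli/X.Y.Z ...` shape shown above. It compares versions as integer tuples so that, for example, 1.10.0 is correctly treated as newer than 1.2.9.

```python
import re

# Minimum aws-cli version stated in this guide.
MIN_VERSION = (1, 2, 9)


def parse_aws_cli_version(output):
    """Extract (major, minor, patch) from a line such as 'aws-cli/1.2.9 Python/2.7.6 ...'."""
    match = re.search(r"aws-cli/(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("unrecognized version string: %r" % output)
    return tuple(int(part) for part in match.groups())


def is_supported(output, minimum=MIN_VERSION):
    """Return True if the reported aws-cli version is at least `minimum`."""
    return parse_aws_cli_version(output) >= minimum
```

Pass the line printed by `aws --version` to `is_supported`; a result of `False` means the client should be updated before continuing.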

The aws-cli client can be updated with pip or, on Windows, by downloading the new Windows MSI (see Windows GUI Client).

$ sudo pip install -U awscli

After installing aws-cli, run:

$ aws configure

The program prompts for the access and secret tokens given at registration. When prompted for the “Default region name”, use “us-west-2”. When prompted for the “Default output format”, use “json”.

AWS Access Key ID [None]: <my access key>
AWS Secret Access Key [None]: <my secret key>
Default region name [None]: us-west-2
Default output format [None]: json

List the files in the directory:

$ aws s3 ls s3://trustedanalytics-repo/release/latest/vm/
2014-08-19 12:57:03           0
2014-11-25 16:22:57          70 TrustedAnalytics-VM.md5
2014-11-25 16:22:57 14656025025 TrustedAnalytics-VM.tar.gz

Download the tar.gz file. In this case, it’s ‘TrustedAnalytics-VM.tar.gz’:

$ aws s3 cp s3://trustedanalytics-repo/release/latest/vm/TrustedAnalytics-VM.tar.gz ./
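
The listing above also shows a ‘TrustedAnalytics-VM.md5’ file, which presumably holds the checksum of the archive and can be fetched with the same `aws s3 cp` command. This guide does not describe the layout of that file, so the sketch below only computes the MD5 digest of the downloaded archive for comparison by eye. The function name is hypothetical; the file is read in chunks because the archive is roughly 14 GB.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `print(md5_of_file("TrustedAnalytics-VM.tar.gz"))` prints a 32-character digest that should appear in the contents of the md5 file if the download is intact.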

If the file was downloaded with the AWS CLI, skip the Windows GUI Client section and continue with Extract Archive.

Windows GUI Client

If you are on a Windows machine and prefer a GUI client, use S3Browser to download the VM.

  1. Download the Windows MSI from http://s3browser.com/download.php.

  2. Install and open the S3Browser application.

  3. Add the keys provided.

    1. Navigate to:

      1. Accounts
      2. Add new account

      or press Ctrl + Shift + A.

      See Fig. 15.1.

      R_images/ad_inst_vm_add_new_acct.png

      Fig. 15.1 Add New Account

    2. In the account creation window:

      1. Add your access and secret keys
      2. Give the account a name

      See Fig. 15.2.

      R_images/ad_inst_vm_new_acct_info.png

      Fig. 15.2 New Account Information

  4. Navigate to:

    1. Buckets
    2. Add External Bucket

    or press Ctrl + E.

    See Fig. 15.3.

    R_images/ad_inst_vm_add_bucket.png

    Fig. 15.3 Add External Bucket

  5. Add the bucket URL “trustedanalytics-repo/release”, then click Add External bucket. See Fig. 15.4.

    R_images/ad_inst_vm_bucket_name.png

    Fig. 15.4 Give Bucket Name

  6. After adding the bucket, a list of folders shows up on the right. See Fig. 15.5.

    R_images/ad_inst_vm_check_folder_list.png

    Fig. 15.5 Check Folder List

  7. Select the appropriate version, navigate to the VM folder, then right-click the “tar.gz” file and download it. See Fig. 15.6.

    R_images/ad_inst_vm_download_file.png

    Fig. 15.6 Download File

Extract Archive

Extracting On Windows

Extracting on Windows is straightforward: use 7-Zip (or an equivalent tool) to extract the archive.

Extracting On Linux

After acquiring the VM, extract the archive:

$ tar -xvf TrustedAnalytics-VM.tar.gz

After extraction, there should be two files: one with the extension ‘vmdk’ and another with the extension ‘ovf’.
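
As a platform-independent alternative, the extraction and the check for the two expected files can be sketched with Python's standard tarfile module. This is an illustration rather than part of the official procedure: the function name is hypothetical, and the archive should only be extracted this way if it came from the trusted repository above.

```python
import os
import tarfile


def extract_and_find_images(archive_path, dest_dir="."):
    """Extract the archive into `dest_dir` and return a dict that maps
    'ovf' and 'vmdk' to the extracted file paths that were found."""
    # Mode "r:*" lets tarfile detect the compression (gzip here) automatically.
    with tarfile.open(archive_path, "r:*") as archive:
        archive.extractall(dest_dir)
        names = archive.getnames()
    found = {}
    for name in names:
        extension = os.path.splitext(name)[1].lower().lstrip(".")
        if extension in ("ovf", "vmdk"):
            found[extension] = os.path.join(dest_dir, name)
    return found
```

If the returned dictionary lacks either the ‘ovf’ or the ‘vmdk’ key, the archive is likely incomplete and should be downloaded again.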

Import Image

To import the VM image, follow these steps in VirtualBox.

  1. Go to the File menu, then Import Appliance. See Fig. 15.7.

    R_images/ad_inst_vm_file_import_app.png

    Fig. 15.7 File -> Import Appliance

  2. Select the file with the extension ‘ovf’, which was extracted earlier from the VM image. See Fig. 15.8.

    R_images/ad_inst_vm_app_to_import.png

    Fig. 15.8 Appliance to Import

  3. Import the Trusted Analytics Platform VM. See Fig. 15.9.

    R_images/ad_inst_vm_app_settings.png

    Fig. 15.9 Appliance Settings

  4. After clicking Import, wait for the VM to be imported. See Fig. 15.10.

    R_images/ad_inst_vm_watch_import.png

    Fig. 15.10 Watching Appliance Import

  5. Once the VM is imported, boot the VM by selecting the VM and clicking Start. See Fig. 15.11.

    R_images/ad_inst_vm_boot_vm.png

    Fig. 15.11 Boot the VM

Running Trusted Analytics Platform VM Image

Before starting

After every reboot of the VM, the Trusted Analytics Platform server must also be restarted.

$ sudo service trustedanalytics restart

Upon restart, if the service wasn’t running before it was told to stop, the system reports:

initctl: Unknown instance:

This message can be safely ignored.

Sample Scripts

The VM is pre-configured and installed with Trusted Analytics Platform. Several examples and datasets are included to help users become familiar with the coding and behavior of Trusted Analytics Platform.

The examples are located in ‘/home/cloudera/examples’.

drwxr-xr-x 2 cloudera cloudera 4096 Aug  1 00:53 datasets
-rw-r--r-- 1 cloudera cloudera 1100 Aug  1 10:15 lbp.py
-rw-r--r-- 1 cloudera cloudera  707 Aug  1 00:53 lda.py
-rw-r--r-- 1 cloudera cloudera  930 Aug  1 00:53 lp.py

The datasets are located in ‘/home/cloudera/examples/datasets’ and ‘hdfs://user/trustedanalytics/datasets/’.

-rw-r--r--   1 atkuser atkuser        122 2014-08-01 /user/trustedanalytics/datasets/README
-rw-r--r--   1 atkuser atkuser     617816 2014-08-01 /user/trustedanalytics/datasets/apl.csv
-rw-r--r--   1 atkuser atkuser    8162836 2014-08-01 /user/trustedanalytics/datasets/lbp_edge.csv
-rw-r--r--   1 atkuser atkuser     188470 2014-08-01 /user/trustedanalytics/datasets/lp_edge.csv
-rw-r--r--   1 atkuser atkuser  311641390 2014-08-01 /user/trustedanalytics/datasets/test_lda.csv

The datasets in ‘/home/cloudera/examples/datasets’ are for reference. The data actually used by the Python examples and the Trusted Analytics Platform server is in ‘hdfs://user/trustedanalytics/datasets’.

To run any of the Python example scripts, change to the examples directory and run Python with the script name:

$ python <SCRIPT_NAME>.py

where <SCRIPT_NAME> is any of the scripts in ‘/home/cloudera/examples’.

Example:

$ cd /home/cloudera/examples
$ python lp.py

Eclipse/PyDev

The VM comes with Eclipse and PyDev installed and ready for use. Importing the example scripts is easy.

  1. Go to the desktop, and double-click on the Eclipse icon.

  2. Go to the File menu, select New, and then Other.

    See Fig. 15.12.

    R_images/ad_inst_vm_start_eclipse.png

    Fig. 15.12 Starting Eclipse

  3. After selecting File -> New -> Other, expand the PyDev folder, select PyDev Project, and click Next. See Fig. 15.13.

    R_images/ad_inst_vm_new_pydev.png

    Fig. 15.13 New PyDev Project

  4. The only field you have to change is the ‘Project Contents’ default directory. Uncheck ‘Use default’ and enter the directory ‘/home/cloudera/examples’. Everything else can be left at its default value. Click Next when you are done. See Fig. 15.14.

    R_images/ad_inst_vm_working_path.png

    Fig. 15.14 Enter Working Path

  5. You should now be able to see all the example scripts in the left-hand pane. See Fig. 15.15.

    R_images/ad_inst_vm_example_scripts.png

    Fig. 15.15 Examining Example Scripts

Logs

To debug changes to the scripts (or to peek behind the curtain), check the log file ‘/var/log/trustedanalytics/rest-server/output.log’. To show the log as it is generated, run tail -f:

$ sudo tail -f /var/log/trustedanalytics/rest-server/output.log

Updating

Upon receipt of the access and secret tokens, edit ‘/etc/yum.repos.d/ta.repo’ and replace myKey and mySecret with them. Afterwards, run the yum commands below to check for and perform updates.

$ sudo vi /etc/yum.repos.d/ta.repo

[Trusted Analytics repo]
name=Trusted Analytics yum repo
baseurl=https://s3-us-west-2.amazonaws.com/trustedanalytics-repo/release/latest/yum/dists/rhel/6
gpgcheck=0
priority=1
#enabled=0
s3_enabled=0
key_id=myKey
secret_key=mySecret
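
For scripted setups, the same substitution can be sketched in Python instead of a text editor. This is a hypothetical helper, not something shipped with the VM: it assumes the file has exactly the ‘key_id=’ and ‘secret_key=’ lines shown above and works on the file contents as a string, so reading and writing the file (which requires root privileges) is left to the caller.

```python
import re


def fill_credentials(repo_text, access_key, secret_key):
    """Return `repo_text` with the key_id and secret_key values replaced."""
    # A function is used as the replacement so that characters in the keys
    # (for example backslashes) are never interpreted as regex escapes.
    repo_text = re.sub(r"(?m)^key_id=.*$",
                       lambda match: "key_id=" + access_key, repo_text)
    repo_text = re.sub(r"(?m)^secret_key=.*$",
                       lambda match: "secret_key=" + secret_key, repo_text)
    return repo_text
```

All other lines of the file, including the baseurl and priority settings, are returned unchanged.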

To check for new updates and see the difference between the new and installed version:

$ sudo yum info trustedanalytics-rest-server

To update:

$ sudo yum update trustedanalytics-rest-server

Common VM problems

  • The VM doesn’t have enough memory allocated.
  • The TA REST server wasn’t restarted after the VM was rebooted or booted.