Hadoop 101 – DAPLAB – Data Analysis and Processing Lab
http://daplab.ch
Reduces the entry barrier for companies to find value out of their data and ultimately turn into a data-driven company
Wed, 29 Jun 2016 07:11:51 +0000

Spark Hello World
http://daplab.ch/2015/09/21/spark-hello-world/
Mon, 21 Sep 2015 12:54:03 +0000

A new tutorial is available on docs.daplab.ch. It will guide you through the basics of Apache Spark and its Scala interpreter (spark-shell). Enjoy!

Pig Hello World
http://daplab.ch/2015/09/21/pig-hello-world/
Mon, 21 Sep 2015 12:51:17 +0000

A new tutorial is available on docs.daplab.ch. It will guide you through the basics of Apache Pig. Enjoy!

Hive Hello World
http://daplab.ch/2015/09/21/hive-hello-world/
Mon, 21 Sep 2015 12:45:04 +0000

A new tutorial is available on docs.daplab.ch. It will guide you through the basics of Apache Hive. Enjoy!

HDFS Hello World
http://daplab.ch/2015/09/04/hdfs-hello-world/
Fri, 04 Sep 2015 09:28:06 +0000

This page is a copy-paste style tutorial to familiarize yourself with HDFS commands. It mainly focuses on user commands (uploading data to HDFS and downloading data from it).

Requirements

  • SSH (for Windows, use PuTTY and see how to create a key with PuTTY)
  • An account in the DAPLAB (send your SSH public key to Benoit)
  • A browser. If you can access this page, you already meet this requirement 🙂

Resources

While the source of truth for HDFS commands is the source code, the documentation page describing the hdfs dfs commands is really useful.

Basic Manipulations

Listing a folder

Your home folder

$ hdfs dfs -ls
Found 28 items
...
-rw-r--r--   3 bperroud daplab_user    6398990 2015-03-13 11:01 data.csv
...
^^^^^^^^^^   ^ ^^^^^^^^ ^^^^^^^^^^^    ^^^^^^^ ^^^^^^^^^^ ^^^^^ ^^^^^^^^
         1   2        3           4          5          6     7        8
The columns, as numbered above, represent:
  1. Permissions, in Unix-style syntax
  2. Replication factor (RF for short), 3 by default for a file. Directories are not replicated themselves (internally their RF is 0), and the listing shows - in this column for them.
  3. Owner
  4. Group owning the file
  5. Size of the file, in bytes. Note that to compute the physical space used, this number should be multiplied by the RF.
  6. Modification date. As HDFS is mostly a write-once-read-many filesystem, this is often also the creation date.
  7. Modification time. The same remark applies as for the date.
  8. Filename, within the listed folder
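The remark on column 5 (physical space = size × RF) can be sketched with a one-liner. This is only an illustration: it reuses the sample line from the listing above as a literal string instead of calling hdfs, and the variable names are made up for the example.

```shell
# Sample line copied from the listing above (no HDFS access needed).
line='-rw-r--r--   3 bperroud daplab_user    6398990 2015-03-13 11:01 data.csv'

# Column 2 is the replication factor, column 5 the file size in bytes.
physical_bytes=$(echo "$line" | awk '{ print $2 * $5 }')

echo "$physical_bytes"   # prints 19196970 (3 x 6398990)
```

On a real cluster, the same awk expression could be applied to the output of hdfs dfs -ls, skipping the "Found N items" header line and the directory entries (whose RF column is not a number).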

Listing the /tmp folder

$ hdfs dfs -ls /tmp

Uploading a file

In /tmp

$ hdfs dfs -copyFromLocal localfile.txt /tmp/
The arguments after -copyFromLocal are local files or folders, except the last one, which is the destination in HDFS: a directory, or a file name if there is a single source file.
Note: hdfs dfs -put does about the same thing, but -copyFromLocal makes it explicit that you are uploading a local file and is therefore preferred.

Downloading a file

From /tmp

$ hdfs dfs -copyToLocal /tmp/remotefile.txt .
The arguments after -copyToLocal are files or folders in HDFS, except the last one, which is the local destination: a directory, or a file name if there is a single source file.
Note: hdfs dfs -get does about the same thing, but -copyToLocal makes it explicit that you are downloading a file and is therefore preferred.
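The upload and download commands can be combined into a quick round-trip check. The following is a minimal sketch, not part of the original tutorial: the roundtrip function name is hypothetical, it assumes you may write to /tmp in HDFS, and it assumes no file with the same name already exists there (-copyFromLocal refuses to overwrite by default).

```shell
# Upload a local file to /tmp in HDFS, download it again into a
# temporary local directory, and compare both copies byte by byte.
roundtrip() {
  src=$1
  name=$(basename "$src")
  hdfs dfs -copyFromLocal "$src" /tmp/ || return 1
  workdir=$(mktemp -d) || return 1
  hdfs dfs -copyToLocal "/tmp/$name" "$workdir/" || return 1
  cmp -s "$src" "$workdir/$name"
}

# Only run the check where an hdfs client is installed.
if command -v hdfs >/dev/null 2>&1; then
  roundtrip localfile.txt && echo "round trip OK"
else
  echo "hdfs client not found, skipping"
fi
```

The function returns a non-zero status as soon as one step fails, so it can be used in a larger script with && or if.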

Creating a folder

In your home folder

$ hdfs dfs -mkdir dummy-folder

In /tmp

$ hdfs dfs -mkdir /tmp/dummy-folder
Note that relative paths are resolved against your home folder in HDFS, /user/bperroud for instance.
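The rule for relative paths can be illustrated without touching the cluster. The helper below is purely hypothetical (resolve_hdfs_path is not an hdfs command) and assumes the home folder lives under /user/<username>, as in the /user/bperroud example.

```shell
# Mimic how a path given to hdfs dfs is interpreted:
# absolute paths are kept as-is, relative ones land in the home folder.
# $1 = path as typed, $2 = user name (assumed home: /user/<name>)
resolve_hdfs_path() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;
    *)  printf '/user/%s/%s\n' "$2" "$1" ;;
  esac
}

resolve_hdfs_path dummy-folder bperroud        # prints /user/bperroud/dummy-folder
resolve_hdfs_path /tmp/dummy-folder bperroud   # prints /tmp/dummy-folder
```

This matches the two -mkdir examples: the first creates the folder in your home folder, the second in /tmp.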
