This page is a copy-paste-style tutorial for getting familiar with the basic HDFS commands. It focuses mainly on user commands, i.e. uploading data to and downloading data from HDFS.
Prerequisites
- SSH (for Windows, use PuTTY and see how to create a key with PuTTY)
- An account in the DAPLAB: send your SSH public key to Benoit.
- A browser — well, if you can access this page, you should have met this requirement 🙂
While the ultimate source of truth for HDFS commands is the source code, the documentation page describing the hdfs dfs commands is really useful.
Listing a folder
Your home folder
$ hdfs dfs -ls
Found 28 items
...
-rw-r--r--   3 bperroud daplab_user    6398990 2015-03-13 11:01 data.csv
...
^^^^^^^^^^   ^ ^^^^^^^^ ^^^^^^^^^^^    ^^^^^^^ ^^^^^^^^^^ ^^^^^ ^^^^^^^^
     1       2     3         4            5         6       7      8
- Permissions, in unix-style syntax
- Replication factor (RF in short), default being 3 for a file. Directories have a RF of 0.
- User owning the file
- Group owning the file
- Size of the file, in bytes. Note that to compute the physical space used, this number should be multiplied by the RF.
- Modification date. As HDFS is mostly a write-once-read-many filesystem, this is often the creation date.
- Modification time. Same as date.
- Filename, within the listed folder
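To illustrate the size-times-RF computation mentioned above, here is a small sketch that parses a captured -ls line with awk; the sample line is taken from the listing above, so no live cluster is needed:

```shell
# Compute the physical space a file occupies (logical size x replication
# factor) from a captured `hdfs dfs -ls` output line. Column 2 is the RF
# and column 5 the size in bytes.
line='-rw-r--r--   3 bperroud daplab_user    6398990 2015-03-13 11:01 data.csv'
echo "$line" | awk '{ printf "%s: %d bytes (physical)\n", $8, $2 * $5 }'
# -> data.csv: 19196970 bytes (physical)
```

On the cluster you could pipe a real listing through the same awk program, e.g. `hdfs dfs -ls | awk ...`.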
Listing the /tmp folder
$ hdfs dfs -ls /tmp
Uploading a file
$ hdfs dfs -copyFromLocal localfile.txt /tmp/
All but the last argument of -copyFromLocal point to local files or folders, while the last argument is the destination in HDFS: a file (if only one source file is listed) or a directory. hdfs dfs -put does roughly the same thing, but -copyFromLocal is more explicit when you're uploading a local file and is thus preferred.
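The single-source vs. multi-source rule is the same as for plain cp: with several sources, the last argument must be a directory. A local sketch of that rule (file and folder names are hypothetical; the hdfs command is shown as a comment since it needs cluster access):

```shell
# Two hypothetical source files and a destination directory.
printf 'one\n' > a.txt
printf 'two\n' > b.txt
mkdir -p dest
# On the cluster, the equivalent upload would be:
#   hdfs dfs -copyFromLocal a.txt b.txt /tmp/dest/
cp a.txt b.txt dest/
ls dest
# -> a.txt
# -> b.txt
```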
Downloading a file
$ hdfs dfs -copyToLocal /tmp/remotefile.txt .
All but the last argument of -copyToLocal point to files or folders in HDFS, while the last argument is the local destination: a file (if only one source file is listed) or a directory. hdfs dfs -get does roughly the same thing, but -copyToLocal is more explicit when you're downloading a file and is thus preferred.
Creating a folder
In your home folder
$ hdfs dfs -mkdir dummy-folder
In the /tmp folder
$ hdfs dfs -mkdir /tmp/dummy-folder