Spark MLlib Hello World

This page is a copy-paste tutorial for running your first Spark MLlib script. Requirements: SSH (on Windows, use PuTTY and see how to create a key with PuTTY). An account in the DAPLAB; send your SSH public key to Benoit. A browser; well, if you can access this page, you should […]

Spark Hello World

A new tutorial is available on docs.daplab.ch. It will guide you through the basics of Apache Spark and its Scala interpreter (spark-shell). Enjoy!

A new framework to simplify interaction with YARN: Apache Twill

YARN, aka NextGen MapReduce, is awesome for building fault-tolerant distributed applications. But writing a plain YARN application is far from trivial and might even be a show-stopper for many engineers. The good news is that a framework to simplify interaction with YARN has emerged and joined the Apache Software Foundation: Apache Twill. While still in the incubation phase, the project looks […]

HDFS Hello World

This page is a copy-paste tutorial to help you get familiar with HDFS commands. It focuses mainly on user commands (uploading data to and downloading data from HDFS). Requirements: SSH (on Windows, use PuTTY and see how to create a key with PuTTY). An account in the DAPLAB; send your SSH public key to Benoit. A browser; well, […]