Spark MLlib Hello World

This page aims to provide a “copy-paste”-style tutorial for running your first Spark MLlib script. Requirements: SSH (for Windows, use PuTTY and see how to create a key with PuTTY); an account in the DAPLAB, for which you send your SSH public key to Benoit; a browser, though if you can access this page, you should […]

A new framework to simplify interaction with YARN: Apache Twill

YARN, aka NextGen MapReduce, is awesome for building fault-tolerant distributed applications. But writing a plain YARN application is far from trivial and might even be a show-stopper for lots of engineers. The good news is that a framework to simplify interaction with YARN has emerged and joined the Apache foundation: Apache Twill. While still in the incubation phase, the project looks […]

Available dataset: homogeneous meteorological data

We give access to homogeneous monthly values of temperature and precipitation for 14 stations from 1864 until today. Yearly values are averaged over the whole of Switzerland since 1864 and are now on the DAPLAB! Data set explanation: the file is a .txt and contains a four-row header. MeteoSchweiz / MeteoSuisse / MeteoSvizzera / MeteoSwiss […]