I am looking at running some code on multiple machines so it will scale. Although I have never done this kind of thing before, I figured, "Surely I can Google it..."
Here's one article I found: https://www.ibm.com/developerworks/data/library/techarticle/dm-1209hadoopbigdata/
The title is (click the link if you don't believe me):
Open Source Big Data for the Impatient, Part 1: Hadoop tutorial: Hello World with Java, Pig, Hive, Flume, Fuse, Oozie, and Sqoop with Informix, DB2, and MySQL.
I thought this was a joke. But it's not.
One: Feel free to make fun of the title of the article or explain to me why this isn't a laughing matter.
Two: Has anyone actually used a distributed file system and a job scheduler of some kind that worked well in practice? If so, what did you use?
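
For context on what I'm contemplating: from skimming that article, the "Hello World" part appears to be the standard Hadoop word-count MapReduce job. Below is my rough, untested sketch of what that seems to involve, cribbed from the Apache Hadoop documentation, so treat the exact class and method names as my assumptions rather than something I've run:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Mapper: emits (word, 1) for every token in its input split.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
          }
        }
      }

      // Reducer: sums the counts emitted for each word.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // args[0] = input directory in HDFS, args[1] = output directory (must not exist yet).
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

If that is representative, the code itself doesn't look terrible. What I can't tell is whether the machinery around it (HDFS, the job scheduler, and the rest of the tools in that title) is something people actually get working, which is really what question Two is asking.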