What is wrong with "Big Data"?

Started by February 04, 2014 08:52 PM
18 comments, last by Glass_Knife 10 years, 7 months ago

I am looking at running some code on multiple machines so it will scale. Although I have never done this kind of thing before, I think "Surely I can Google it..."

Here's one article I found: https://www.ibm.com/developerworks/data/library/techarticle/dm-1209hadoopbigdata/

The title is (click the link if you don't believe me):

Open Source Big Data for the Impatient, Part 1: Hadoop tutorial: Hello World with Java, Pig, Hive, Flume, Fuse, Oozie, and Sqoop with Informix, DB2, and MySQL.

I thought this was a joke. But it's not.

One: Feel free to make fun of the title of the article or explain to me why this isn't a laughing matter.

Two: Has anyone used a distributed file system and job scheduler of some kind that actually worked?

I think, therefore I am. I think? - "George Carlin"
My Website: Indie Game Programming

My Twitter: https://twitter.com/indieprogram

My Book: http://amzn.com/1305076532

As far as 'it just works' is concerned (not something cloud computing is known for), by far the best thing out there that I know of is picloud. But I'm not an expert; I just dabble.

"Big data" is a shitty term.

It's popular as a fad name for a nebulous concept that nobody agrees on. No two people seem to share a definition for what it means. The common elements seem to be overengineering, hot new platforms, and hero worship.

It's basically just "enterprise software" all over again.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

As far as 'it just works' is concerned (not something cloud computing is known for), by far the best thing out there that I know of is picloud. But I'm not an expert; I just dabble.

I did a terrible job of explaining what we're trying to do. We don't want to use cloud resources; we want to set up our own cluster on our internal network with our own servers. I had hoped that http://hadoop.apache.org/ would work, but like ApochPiQ suggested, it seems like an over-engineered nightmare.

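For the "own servers on an internal network" case, a job queue doesn't have to be a huge framework. Here is a minimal sketch using only Python's standard library (`multiprocessing.managers` can expose a shared queue over TCP); the host name, port, and auth key are placeholders for illustration:

```python
# Minimal do-it-yourself job queue for machines on one internal network,
# using only the Python standard library. Port and authkey are placeholders.
from multiprocessing.managers import BaseManager
from queue import Queue

job_queue = Queue()

class QueueManager(BaseManager):
    pass

# Coordinator side: expose the queue to other machines on the LAN.
QueueManager.register("get_jobs", callable=lambda: job_queue)
manager = QueueManager(address=("", 50000), authkey=b"change-me")
# manager.get_server().serve_forever()  # run this on the coordinator box

# Worker side (on any other machine on the network):
# QueueManager.register("get_jobs")
# m = QueueManager(address=("coordinator-host", 50000), authkey=b"change-me")
# m.connect()
# jobs = m.get_jobs()
# while True:
#     work = jobs.get()   # blocks until a job arrives
#     process(work)       # your own code goes here
```

This obviously has none of Hadoop's fault tolerance or data locality, but for a small in-house cluster it may be all the scheduling you need.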

The company I previously worked at used Hadoop to handle large quantities of application statistics, ranging from response times to user activity events. If your application has 10M+ users, you can expect at least 100+ incoming events per second from your app. Where do you want to store all this data so you can analyze it further? SQL?
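At rates like that, the usual failure mode is one database write per event. A hedged sketch of the standard fix, buffering events and flushing them in batches, which is the basic trick collectors like Flume rely on (all names here are illustrative, not any real API):

```python
# Illustrative sketch: buffer incoming events in memory and flush them
# in bulk, either when the buffer fills or after a time interval.
import time

class EventBuffer:
    def __init__(self, flush_size=500, flush_interval=5.0, sink=print):
        self.flush_size = flush_size          # flush after this many events...
        self.flush_interval = flush_interval  # ...or after this many seconds
        self.sink = sink                      # where batches go (DB, file, ...)
        self.events = []
        self.last_flush = time.monotonic()

    def add(self, event):
        self.events.append(event)
        due = time.monotonic() - self.last_flush >= self.flush_interval
        if len(self.events) >= self.flush_size or due:
            self.flush()

    def flush(self):
        if self.events:
            self.sink(self.events)  # one bulk write instead of N small ones
            self.events = []
        self.last_flush = time.monotonic()
```

One bulk insert of 500 rows is dramatically cheaper than 500 single-row inserts, which is often the difference between plain SQL coping and not.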

If I had 10M users at the same time ... I'd hire some C programmers to design the most efficient system possible to handle my unique situation, and make it scalable.

I personally have a hard time trusting databases I do not design myself ... I had SQL corruption a few years ago that wiped everything off the servers.

I cannot remember the books I've read any more than the meals I have eaten; even so, they have made me.

~ Ralph Waldo Emerson

This might not be relevant to the question asked by the OP, but to reply to Shippou: what do you mean by SQL corruption? If you have a large database, surely you would have scheduled backups?


If I had 10M users at the same time ... I'd hire some C programmers to design the most efficient system possible to handle my unique situation, and make it scalable.



I personally have a hard time trusting databases I do not design myself ... I had SQL corruption a few years ago that wiped everything off the servers.

Unfortunately, for finance and banking this is pretty much out of the question. The reason is that the code has to be gone over with a fine-tooth comb by all kinds of governmental bodies to ensure that it is secure. They seem to think that standard C and C++ library features are insecure, and they force you to rewrite layer upon layer of security wrapping around arrays and STL containers until they perform so slowly that they are virtually unusable. They also follow language specifications from 15 years ago.
If you use Java or a Java-based language, they just nod it through.

This might not be relevant to the question asked by the OP, but to reply to Shippou: what do you mean by SQL corruption? If you have a large database, surely you would have scheduled backups?

I'm not an expert, but if I had a server handling a lot of requests, even seconds to minutes of downtime isn't what you want. Corruption can also cause unexpected errors depending on the situation. And after corruption you still have to restore from the backup, which costs time and money. Another problem is that you can hardly make a backup every second. ;)
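On the "you can hardly make a backup every second" point: the usual answer is a write-ahead log rather than more frequent full backups. Every change is appended to a log file, and recovery means restoring the last full backup and replaying the log from there. A rough sketch (the function names and record format are made up for illustration):

```python
# Illustrative write-ahead log: append each change as one JSON line,
# then recover by replaying the log over a restored backup state.
import json

def log_change(logfile, change):
    # Append one change record per line; flush so it survives a process crash.
    logfile.write(json.dumps(change) + "\n")
    logfile.flush()  # a real system would also fsync here

def replay(logpath, state):
    # Recovery: `state` starts as the last full backup, then each logged
    # change is re-applied in order.
    with open(logpath) as f:
        for line in f:
            key, value = json.loads(line)
            state[key] = value
    return state
```

This is essentially what real databases do internally (e.g. MySQL's binary log), which is why a crash doesn't normally mean losing everything since the last nightly dump.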

The company I previously worked at used Hadoop to handle large quantities of application statistics, ranging from response times to user activity events. If your application has 10M+ users, you can expect at least 100+ incoming events per second from your app. Where do you want to store all this data so you can analyze it further? SQL?

I understand what you're saying. For a situation where you've got servers all over the country aggregating tons of data, and you need to run searches on that data in a fast, parallel, scalable way, the "Big Data" concept is the way to go. I just don't understand why everything I find seems broken, an alpha version, or so complicated that no one at work can get it working. I felt the same way about the Java EE stuff years ago: a simple "Hello World" service takes dozens of files and hours to set up.

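For what it's worth, the programming model that Hadoop wraps in all that machinery, MapReduce, is itself quite small. A toy word count (the canonical Hadoop "Hello World") in plain Python, assuming nothing beyond the standard library:

```python
# The MapReduce model in miniature: a mapper emits (key, value) pairs,
# a shuffle groups values by key, and a reducer combines each group.
from collections import defaultdict

def map_phase(lines):
    # Mapper: emit (word, 1) for every word in the input.
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_phase(pairs):
    # Shuffle: group values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce: combine each group into one result.
    return {key: sum(values) for key, values in groups.items()}

lines = ["hello world", "hello big data"]
print(reduce_phase(map_phase(lines)))
```

On a cluster, the map calls run on many machines and the framework handles the shuffle and fault tolerance; the programmer writes only those two functions. The complexity complained about in this thread lives almost entirely in the surrounding machinery, not in the model.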

This topic is closed to new replies.
