Styles of distributed computing (push, pull)
Hi,

OK, this is not really game related, just a CS project at the local university. It's not direct homework help either, but rather a discussion about principles.

Basically, we need to implement a kind of distributed computation. A centralized system seems best: one machine is the master, which the other machines (the slaves) serve. There are then two basic choices for how the system could work:

1) push - the master connects to the slaves (e.g. via ssh) and runs processes on them
2) pull - the slaves connect to the master, asking it for jobs

Now, I really don't know much about these things, but I can make some rough guesses about their respective pros and cons:

Push strategy. Pros: easy to implement, e.g. via shell scripting. Cons: the master needs to know all the slaves beforehand *)

Pull strategy. Pros: the master doesn't need to know the slaves. Cons: hard to implement - a real programming language must be used to implement the interchange of jobs/results.

But perhaps you guys have some more experience on the subject, "common wisdom" of a sort. Could you share any?

Thanks,
-- Mikko

EDIT: *) actually, not necessarily: there could still be a separate "notify-of-existence" program through which slaves could inform the master of their existence.
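To make the pull option concrete, here is a minimal sketch of a pull-style worker. The master address, port, and the GET/PUT line protocol are all invented for illustration; a real system would need error handling and authentication.

```python
# Minimal pull-style worker: connects to the master, asks for a job,
# runs it, reports the result, and repeats until the master says "DONE".
# The host/port and the line-based protocol here are made up for the sketch.
import socket

MASTER = ("master.example.org", 9000)  # hypothetical master address

def run_job(job_id):
    # Placeholder for the actual computation.
    return f"result-of-{job_id}"

def worker():
    while True:
        with socket.create_connection(MASTER) as s:
            s.sendall(b"GET\n")
            job_id = s.recv(1024).decode().strip()
        if job_id == "DONE":
            break
        result = run_job(job_id)
        with socket.create_connection(MASTER) as s:
            s.sendall(f"PUT {job_id} {result}\n".encode())

if __name__ == "__main__":
    worker()
```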
Hey,
Just use a library to do the distribution, since it handles most of the mundane and commonplace tasks of distributed computing. Of course, this requires you to do "real programming". My pick is MPICH, and it looks like there is a version 2 out now.
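As a sketch of what that looks like in practice, here is a small scatter/gather job using mpi4py (Python bindings that can sit on top of an MPI implementation such as MPICH). The mpi4py choice and the toy workload are my additions, not something from the post above.

```python
# Sketch of master/worker-style work division with MPI (e.g. MPICH via mpi4py).
# Run with something like: mpiexec -n 4 python this_script.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    # "Master": split the work into one chunk per process.
    chunks = [list(range(i, 100, size)) for i in range(size)]
else:
    chunks = None

# Each process (master included) receives its chunk, works on it,
# and the partial results are gathered back on rank 0.
my_chunk = comm.scatter(chunks, root=0)
partial = sum(x * x for x in my_chunk)
totals = comm.gather(partial, root=0)

if rank == 0:
    print("total:", sum(totals))
```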
Turing machines are better than C++ any day ^_~
The MMO library I'm working on operates on a registration principle:
A single master server is responsible for administration of the cluster.
A new slave server is provided a master server address as a startup argument.
The slave connects to the master, where it is authenticated and registered. The master then assigns tasks to the slave, informing it and the other slaves of each other where relevant.
A client connecting to the cluster initially (after authentication) is assigned to the master server, which passes it on to slaves as the client's point of presence shifts around the relevance graph. Slaves can also pass the client directly to each other.
Event propagation follows the relevance graph of the system - this is covered in detail in another thread.
The overall cluster size is therefore not fixed, and can vary from minimal (single box, master process, db process, optional login process) to large (as previous, but many slaves) to widely distributed (since a slave is not required to be on the same local network), though for security and performance the entire cluster should be on a fast private network, with a large pipe and a capable router.
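A minimal sketch of the registration step described above, from the slave's side: the master address arrives as a startup argument, and the slave announces itself and waits for an assignment. The REGISTER/TASK message format and the shared secret are invented here purely for illustration; the real library presumably does authentication and task assignment in far more detail.

```python
# Sketch of the slave side of the registration handshake described above.
# The line-based protocol is hypothetical.
import socket
import sys

def main():
    master_host, master_port = sys.argv[1], int(sys.argv[2])
    with socket.create_connection((master_host, master_port)) as s:
        # Announce ourselves; a real system would authenticate properly here.
        s.sendall(b"REGISTER slave-host-1 some-shared-secret\n")
        reply = s.recv(4096).decode().strip()
    if reply.startswith("TASK"):
        print("master assigned:", reply)
    else:
        print("registration refused:", reply)

if __name__ == "__main__":
    main()
```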
However, cluster operations like this require either kernel modifications to the OS if you're looking at load-sharing job distribution (an interesting project would be to modify a Linux kernel to distribute tasks this way), or the system being distributed to be custom-written to operate this way. I'm fairly certain we had a 3D package at Climax that ran a render farm on this principle... can't remember which one it was though.
Generally, since what ends up getting distributed is a specific task that requires a lot of resources, the task software is often designed from the outset to operate in a distributed environment.
Quote:
hard to implement - a real programming language must be used to implement the interchange of jobs/results
Not necessarily. You could still use shell connections to retrieve the data to be worked on. Have the clients SSH to the server and "check out" a chunk of work -- no real language needed. The only thing you need to worry about is two clients connecting to check out at the same time, so the "check out" piece needs to be synchronized.
Clients then ssh back in to "check in" the pieces they complete. The master would likely predict how long it "should" take a client to complete a piece of work, and if the client is way behind, it'd give that piece to someone else as well, and accept the first one to complete; this adds robustness in the face of client disconnection.
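The synchronization of the checkout step can be as simple as a lock around it. Here is a sketch using an advisory file lock on the server side (fcntl.flock, Unix only); the queue file name and its one-chunk-per-line layout are made up for the example, and this simplified version just pops the chunk rather than tracking per-chunk client counts.

```python
# Sketch: serialize "check out" on the server with an advisory file lock,
# so two clients arriving at the same time can't grab the same chunk.
import fcntl

QUEUE_FILE = "chunks.todo"   # hypothetical: one chunk id per line

def checkout_chunk():
    with open(QUEUE_FILE, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)      # only one checkout at a time
        lines = f.read().splitlines()
        if not lines:
            return None                    # nothing left to hand out
        chunk, rest = lines[0], lines[1:]
        f.seek(0)
        f.truncate()
        f.write("\n".join(rest) + ("\n" if rest else ""))
        return chunk                       # lock released when the file closes
```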
In fact, I think this would not only be more scalable (each client only needs to know about the one server), but actually easier to implement. The components are:
job setup: Run on the server, once; delimits the "chunks" of work and marks them all as started by 0 clients so far. Put all chunks in a queue.
chunk checkout: Serialized. A client connects, and the server takes the next unfinished chunk in the queue and hands it to the client, incrementing the "client count" of the chunk by one.
chunk checkin: Serialized. A client connects, and hands results back to the server. If the chunk is currently not complete, the data is accepted, and the chunk is removed from the incomplete queue. If all chunks are complete, you declare success. If the chunk was already complete, just ignore the check-in.
status: Used on the server to see what chunks are being processed, what chunks are done, and what chunks are left.
client: Whatever perl or shell script that runs on the client, connects to the server, and retrieves a chunk to work on. It completes work on the chunk, checks the results back in, and attempts to get another chunk, until there are no more chunks.
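Putting those components together, a rough in-memory version of the server-side bookkeeping might look like the sketch below. The class and method names, and the fixed deadline heuristic for reissuing overdue chunks, are mine and purely illustrative.

```python
# Sketch of the server-side chunk bookkeeping described above: a queue of
# incomplete chunks, a per-chunk client count, and reissue of chunks whose
# expected completion time has passed. All names and timings are illustrative.
import time
from collections import deque

EXPECTED_SECONDS = 600          # guess at how long one chunk "should" take

class ChunkTracker:
    def __init__(self, chunk_ids):
        self.incomplete = deque(chunk_ids)   # job setup: all chunks queued
        self.handed_out = {}                 # chunk -> (client_count, last_issue_time)
        self.results = {}                    # chunk -> result data

    def checkout(self):
        """Serialized by the caller. Hand out the next unfinished chunk."""
        now = time.time()
        # Prefer chunks never handed out; otherwise reissue one that is overdue.
        for chunk in list(self.incomplete):
            count, issued = self.handed_out.get(chunk, (0, 0.0))
            if count == 0 or now - issued > EXPECTED_SECONDS:
                self.handed_out[chunk] = (count + 1, now)
                return chunk
        return None                          # everything is in flight or done

    def checkin(self, chunk, result):
        """Serialized by the caller. Accept the first result, ignore repeats."""
        if chunk in self.results:
            return                           # already complete: ignore
        self.results[chunk] = result
        if chunk in self.incomplete:
            self.incomplete.remove(chunk)

    def status(self):
        return {"done": len(self.results), "remaining": len(self.incomplete)}
```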