I've worked on several projects like this in the past (partly at university and later during my career; I can share some details).
In one of them, models were exported in a specific format and a batch file sent the scene to the server, where an automated job picked the scenes up in sequence and pre-processed them (mainly building a Split-BVH). It wasn't a cloud, just a dedicated server doing GPU rendering. Originally we considered using MaxScript, but in the end decided against it: an interchange format like Collada or FBX is far better, simply because some artists in the group preferred different tools. Once the scene was processed, users could send requests with a camera definition and quality settings, and received an image back.
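To make the "request with a camera definition" idea concrete, here is a minimal sketch of what such a request could look like. Every field name below is hypothetical and chosen for illustration only (the original project's format is not shown here), and JSON is just one convenient encoding.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class RenderRequest:
    # All field names are assumptions made for this sketch.
    scene_id: str                 # scene already pre-processed on the server
    camera_position: tuple        # (x, y, z) in world space
    camera_target: tuple          # point the camera looks at
    fov_degrees: float = 45.0
    width: int = 1280
    height: int = 720
    samples_per_pixel: int = 256  # stands in for the "quality setting"


def encode_request(req: RenderRequest) -> bytes:
    """Serialize a request to UTF-8 JSON, ready for any transport."""
    return json.dumps(asdict(req)).encode("utf-8")


def decode_request(payload: bytes) -> RenderRequest:
    """Rebuild the request on the server side."""
    data = json.loads(payload.decode("utf-8"))
    # JSON has no tuple type, so lists are converted back to tuples.
    data["camera_position"] = tuple(data["camera_position"])
    data["camera_target"] = tuple(data["camera_target"])
    return RenderRequest(**data)
```

Because the scene is referenced by an id rather than sent along, a request like this stays tiny, which is what makes the "prepare once, render many views" setup cheap per image.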
Another one was just a post-script in MaxScript that performed a render with specific parameters. The resulting image was then processed (from within the MaxScript) by special software that performed advanced de-noising, and the results were stored and presented to the user in a new window. This wasn't done on a separate server, but as a new process spawned from within the MaxScript. Side note: MaxScript is quite tricky and can be a bit slow. I personally don't like it, but at the client's request we had to use it to interact directly with 3ds Max.
And another one was for simple product image generation: the model and textures were uploaded by a script from a specific folder to the server, which returned a sequence of images (from viewpoints around the object). This ran as a service on a dedicated server that we communicated with. Again, using GPU-based rendering and sending the sequence straight back was an acceptable solution.
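As an illustration of "viewpoints around the object", the camera positions for such a sequence can be generated by stepping around a circle. This is only a sketch: the function name is made up, and it assumes a Y-up coordinate system (3ds Max itself is Z-up, so the axes would need swapping there).

```python
import math


def turntable_viewpoints(center, radius, height, count):
    """Return `count` camera positions evenly spaced on a circle around `center`.

    The circle lies in the horizontal plane (Y up is assumed) and is raised
    by `height` above the center; each camera is meant to look at `center`.
    """
    cx, cy, cz = center
    positions = []
    for i in range(count):
        angle = 2.0 * math.pi * i / count
        positions.append((
            cx + radius * math.cos(angle),
            cy + height,
            cz + radius * math.sin(angle),
        ))
    return positions
```

Each position would then go into one render request for the same scene, and the server returns the images in the same order.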
...
In short: if you have a server processing requests like this, you need (at least):
- A way to send requests. Most of the time we used custom software (custom scripts are an option too) that connected to the server and sent all the necessary data, e.g. the scene (with all textures) and the command, or only the command when the scene was already prepared there. The software also waited for the response with the results, which were then presented to the user.
- A way to process requests on the server side: job-like software that keeps a queue of received requests, spawns a process for each one, and sends back the results of the finished ones. Technically this is just a "queue" and a "dispatcher". However, you have to think about how to get the results back to the requester.
- Rendering software (spawned as a process by the server-side job). In our case this was a custom GPU-accelerated path tracer, which took most of my time and was the piece of software doing the actual rendering.
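The "queue" and "dispatcher" part can be sketched in a few lines. This is not the software we actually used, just a minimal illustration: the function names are hypothetical, and the renderer is replaced by a stand-in command that only prints a line, so the example runs anywhere Python does.

```python
import queue
import subprocess
import sys
import threading


def dispatcher(jobs, results):
    """Take jobs off the queue one at a time and run each in its own process."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work, stop the worker
            break
        job_id, command = job
        proc = subprocess.run(command, capture_output=True, text=True)
        # Keep the job id with the output so it can be routed back
        # to whoever sent the request.
        results[job_id] = (proc.returncode, proc.stdout.strip())


def run_jobs(commands):
    """Queue every command, process them in order, return results by job id."""
    jobs = queue.Queue()
    results = {}
    worker = threading.Thread(target=dispatcher, args=(jobs, results))
    worker.start()
    for job_id, command in enumerate(commands):
        jobs.put((job_id, command))
    jobs.put(None)
    worker.join()
    return results


def fake_render_command(scene_name):
    """Stand-in for the real renderer executable (an assumption of this sketch)."""
    return [sys.executable, "-c", f"print('rendered {scene_name}')"]
```

Calling `run_jobs([fake_render_command("chair"), fake_render_command("table")])` runs the two stand-in renders in sequence. In a real setup the results dictionary would be replaced by whatever channel returns images to the requester, which is exactly the part that needs thought.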
It is possible to fuse the rendering software into the server-side job, or to change the design in other ways. I did it this way for two of the cases above, as it was good enough for what was required in the end (which is by far the most important thing: that your solution solves the problem). Bear in mind, though, that each of these pieces may take quite a lot of time to design and implement, and that there are of course multiple ways to improve it further (if and when necessary).
In the end, one of the most challenging tasks was making it easy enough for the actual artists to use (they won't write command-line arguments or the like): they need it to be simple to set up, with settings they understand!
...
If you have any questions on details, feel free to ask.