
Considering an RPC or message oriented server architecture, confused

Started by January 12, 2016 04:50 PM
7 comments, last by Krohm 8 years, 10 months ago
Hello network experts! I somehow managed to get an idea for a promising application and I'm slowly executing it. This time networking is required and since I am moving to a whole different platform (Android) I decided to take a few days to figure out something which is state of the art. In theory. In practice I'm fine with something industry standard at least.
While looking for serialization solutions I couldn't avoid stumbling on google protocol buffers. I'm 100% on them.
When it comes to service definition however I'm a bit more confused.
My system is very easygoing. It models a fully sequential process so basically I have a list of peers connected to a server and the server decides who gets to act when. The active device gets its chance at modifying the shared data and then signals itself inactive (the server can also timeout the user). So it's all about manipulating this shared state, no surprises.
The server 'owns' the data in the sense it initializes gathering of the clients and keeps persistent data on it but in practice the clients are allowed to manipulate the data and they have power even on other people's data within the limits of the protocol.
In my head I modeled it as message passing / events: when you get a Active(ID) and ID == yours it's your turn to do stuff. Now you can send Shuffle or ChangeState messages. Most of those messages are not confirmed. I just expect the server to push a different, updated state.
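A rough sketch of the message-passing model described above, to make the flow concrete. The message names (Active, Shuffle, ChangeState) come from the post. The `Message` layout, the `StateUpdate` message, `TurnClient`, and the integer payload are assumptions made for illustration; a real version would carry protobuf bodies.

```cpp
#include <cstdint>

// Hypothetical wire messages; names follow the post, fields are assumed.
enum class MsgType { Active, Shuffle, ChangeState, StateUpdate };

struct Message {
    MsgType type;
    std::uint32_t peerId;  // who is active (Active) or who sent it (others)
    int payload;           // stand-in for the real protobuf body
};

class TurnClient {
  public:
    explicit TurnClient(std::uint32_t myId) : myId_(myId) {}

    // React to whatever the server pushes; nothing here waits for a reply.
    void onMessage(const Message &m) {
        switch (m.type) {
        case MsgType::Active:      myTurn_ = (m.peerId == myId_); break;
        case MsgType::StateUpdate: sharedState_ = m.payload;      break;
        default: break;  // assumed: Shuffle/ChangeState only go client -> server
        }
    }

    // Fills 'out' with a ChangeState request, but only while it is our turn.
    bool makeChangeState(int newValue, Message &out) const {
        if (!myTurn_) return false;
        out = Message{MsgType::ChangeState, myId_, newValue};
        return true;
    }

    bool myTurn() const { return myTurn_; }
    int sharedState() const { return sharedState_; }

  private:
    std::uint32_t myId_;
    bool myTurn_ = false;
    int sharedState_ = 0;
};
```

Note that the request is never confirmed directly: the client learns the outcome only when the server pushes the next state update.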
This is where the thing starts to be odd. Of course I wanted to streamline even more by using protobuf service definitions and this is where stuff starts to go awry. Sure, there's Empty but I think the underlying problem would still hold.
Most of the examples I've seen look like this:


service EasyGoing {
    rpc JustCallMe(SomeRequest) returns (SomeReply);
}
I get the idea that most RPC calls are about getting stuff back and not really about manipulations 'returning' void.
Plus, the client has it easy but the server also needs to call stuff on the client. Granted, I could have a ClientService. I'm considering it, but I would rather do full message passing at that point...
Furthermore services can stream back results so for the case of my Active message/event I could do something like

service CentralArbitrator {
    rpc SendMeActiveEvents(Empty) returns (stream ActiveMessage);
}
I suspect I haven't fully grasped what RPC is. I always assumed it was just a layer on top of networking to have a somewhat unified interface (talking simple blocking stuff here), but I get the vibe we're mostly talking about fetching resources, with most requests being stateless and a heavy client-server architecture.
By using something like the above, I would have each client register to the various events. I could still get the 'RPC' sticker (evrybody luvs biz talk) but I have the impression this isn't what an RPC-oriented system is supposed to do. Would that make sense to you?
I feel like I'm missing something fundamental about the theory.
I think I can put it short as follows:
What would you suggest to consider when evaluating an RPC-oriented architecture vs an explicit message passing one?

Previously "Krohm"

I haven't really looked into google protocol buffers so I cannot comment on that. However, I don't really agree with your view of RPCs. An RPC is a Remote Procedure Call, and nothing more. In other words it is a way for a program to call a function on another computer (or motorcycle or gerbil or whatever happens to be on the other side of the network cable) without having to go into details about how the communication is handled. That's it. They can be synchronous or asynchronous. They can return values or not. They are still RPCs.

In my engine the network communication is all handled through RPCs (except for some connect/disconnect events and such). However, the application layer programmer does not necessarily need to concern themselves with writing RPC code, as any data exchange system can be modeled on top of the RPCs. Remember, RPCs are just about calling functions (it makes sense to have some channeling information and calling context info at hand, eg. it might be nice to know who called an RPC function on the server).

For example, I have a concept called "propagated values" which are wrappers around plain old data members. They have a network identity and a concept of ownership. So you can do "PropagatedValue<Vec3> position = newPosition;" and if position is owned by the current node its value is propagated to all other nodes. (In the actual code PVs are usually class members and they need to be registered on the network before they can be used, but you get the idea.) At this level no-one is concerned with RPCs anymore even though the system is driven by doing RPC calls.
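As a rough illustration of the idea only, and not the engine's actual code: a wrapper that forwards local writes to a transport hook when the current node owns the value. Only the name `PropagatedValue` comes from the post. The constructor arguments, `SendFn`, `applyRemote`, and `get` are assumptions, as is the choice that a write on a non-owning node changes just the local copy.

```cpp
#include <functional>
#include <utility>

// Hypothetical sketch: a value wrapper that reports local writes to a send hook.
template <typename T>
class PropagatedValue {
  public:
    using SendFn = std::function<void(const T &)>;

    PropagatedValue(T initial, bool ownedLocally, SendFn send)
        : value_(std::move(initial)), owned_(ownedLocally), send_(std::move(send)) {}

    // Local assignment: store the value and, if this node owns it, propagate it.
    PropagatedValue &operator=(const T &v) {
        value_ = v;
        if (owned_ && send_) send_(value_);
        return *this;
    }

    // Called by the network layer when the owning node pushed a new value.
    void applyRemote(const T &v) { value_ = v; }

    const T &get() const { return value_; }

  private:
    T value_;
    bool owned_;
    SendFn send_;
};
```

In a real system the send hook would presumably issue the RPC that carries the new value to the other nodes, which is how the RPC layer stays hidden from application code.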

My point is, having a good RPC system does not solve (m)any problems but it is a good base for modelling higher level logic on top of it, be it messaging systems, service layers or streaming protocols. If you go down that route, my advice is to make it asynchronous (the network will make you wait anyway, no point in blocking the app) and add return value support (which means having some sort of callback mechanism in place because of the asynchronicity). Also add a way to define some metadata about the RPC (who is allowed to call who, which calls are reliable etc).
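One possible shape for the per-call metadata mentioned above, sketched under assumptions: `RpcInfo`, `RpcRegistry`, and the `Caller` enum are hypothetical names, and the two properties shown (who may call, whether delivery is reliable) are just the examples given in the post.

```cpp
#include <map>
#include <string>

// Hypothetical per-call metadata; the fields mirror the examples in the post.
enum class Caller { Client, Server };

struct RpcInfo {
    bool clientMayCall;
    bool serverMayCall;
    bool reliable;  // must be delivered vs. fire-and-forget
};

class RpcRegistry {
  public:
    void declare(const std::string &name, const RpcInfo &info) { table_[name] = info; }

    // Reject unknown calls and calls from a side that is not allowed to make them.
    bool mayCall(const std::string &name, Caller who) const {
        auto it = table_.find(name);
        if (it == table_.end()) return false;
        return who == Caller::Client ? it->second.clientMayCall
                                     : it->second.serverMayCall;
    }

    bool isReliable(const std::string &name) const {
        auto it = table_.find(name);
        return it != table_.end() && it->second.reliable;
    }

  private:
    std::map<std::string, RpcInfo> table_;
};
```

The receiving side would consult the registry before dispatching an incoming call, so permission checks live in one place instead of inside every handler.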

Hope that helps.

RPCs that "return" void are not that uncommon -- telling the server "something happened" without necessarily needing a response is a fine use case.
RPC in the other direction (server telling client to do things) is also not that uncommon. I would prefer that setup to one where the client has to keep polling for events.

That being said, you might want to think in terms of asynchronous messaging, rather than RPC. It tends to make the asynchronicity and unpredictability of networking more clear, and tends to introduce fewer bugs related to RPC calls not being instant.
A second-best option is to make your RPC library not support a blocking operating mode. So, to call some RPC, the API would look like:

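class Operation;             // handle for an outstanding call; used to cancel it
struct ArgsForFunctionA;     // request arguments, defined elsewhere
struct FunctionAReturnArgs;  // reply values, defined elsewhere

class FunctionACompleted {
  public:
    virtual ~FunctionACompleted() = default;
    // Invoked by the RPC system once Function A has actually completed.
    virtual void OnComplete(Operation &op, FunctionAReturnArgs const &ret) = 0;
};
class MyRPC {
  public:
    virtual ~MyRPC() = default;
    // Returns immediately; 'fac' is called later with the result.
    virtual Operation *CallFunctionA(ArgsForFunctionA const &args, FunctionACompleted *fac) = 0;
};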
The idea being that you provide a FunctionACompleted implementation to the RPC system and it will call it when Function A has actually completed. In the meanwhile, you have some Operation that you can use to cancel the RPC if you really have to. (Note: Canceling the RPC typically means "do not call the completion function anymore" but it doesn't mean "don't execute whatever the function is on the server" -- that's too late, and probably already happened!)
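A minimal sketch of the client-side bookkeeping this implies, assuming calls are matched to replies by an integer id and replies arrive as strings. `PendingCalls` and its methods are hypothetical names, not part of any particular RPC library.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical client-side bookkeeping for outstanding calls.
class PendingCalls {
  public:
    using Completion = std::function<void(const std::string &reply)>;

    // Remember the completion for a call that was just sent; returns its id.
    int add(Completion done) {
        int id = nextId_++;
        pending_[id] = std::move(done);
        return id;
    }

    // Cancel only forgets the callback; the server may still run the call.
    bool cancel(int id) { return pending_.erase(id) > 0; }

    // Called by the network layer when the reply for 'id' arrives.
    // Returns false if the call was cancelled or is unknown.
    bool complete(int id, const std::string &reply) {
        auto it = pending_.find(id);
        if (it == pending_.end()) return false;
        Completion done = std::move(it->second);
        pending_.erase(it);  // erase first so the callback may safely add new calls
        done(reply);
        return true;
    }

    std::size_t outstanding() const { return pending_.size(); }

  private:
    int nextId_ = 1;
    std::map<int, Completion> pending_;
};
```

A reply that arrives after `cancel` is simply dropped, which matches the "do not call the completion function anymore" meaning of cancellation.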

This necessity to keep state around for "outstanding" RPCs makes RPC architectures more fragile and bug prone.
I much prefer asynchronous messaging that is just loosely coupled.
A server can send a client "here's a list of connected clients" message.
Perhaps the server will always do that when a client connects, but it might also do that at other times.
The client is then implemented to refresh its list of connected clients whenever this message comes in, reactively, rather than having to proactively ask the server to send it the list.
In the end, this "reactive," loose model ends up being more robust in my experience.
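A small sketch of that reactive style, assuming a hypothetical `ClientListMessage` and a `Lobby` object that holds the client-side view. The handler does not care why the message arrived.

```cpp
#include <string>
#include <vector>

// Hypothetical message: the server's current view of connected clients.
struct ClientListMessage {
    std::vector<std::string> names;
};

// Client-side state that is only ever updated by incoming messages.
class Lobby {
  public:
    // Replace the local list whenever the server sends one, solicited or not.
    void onClientList(const ClientListMessage &msg) {
        names_ = msg.names;
        ++refreshCount_;
    }

    const std::vector<std::string> &names() const { return names_; }
    int refreshCount() const { return refreshCount_; }

  private:
    std::vector<std::string> names_;
    int refreshCount_ = 0;
};
```

Because there is no request to match against, there is also no outstanding-call state to keep around or clean up.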
enum Bool { True, False, FileNotFound };

Cancellation is another thing I haven't fully considered. In my system, all the messages are critical but as I send them through a TCP pipe I just take for granted they will complete at a certain point.

What is a completion callback for you? Back when I played a bit with JSON-RPC I noticed some messages had to be confirmed while others didn't. Is the callback result just the 'function return value', or does it also include the 'call executed' status?

The main problem I see is that in an RPC-oriented client-server system the only viable way to send events is in fact to generate the various return 'result streams' and keep them around. That's a whole world better than having a ServerService and a ClientService but still a far cry from just parsing a unique source of data.

I honestly never got into this RPC thing, especially considering that with some abstractions all systems would be RPC due to Single Responsibility.

Nonetheless, I will be doing an old-school message system, mostly because I have spent enough time thinking about this already and I don't think this whole RPC mindset applies at all. Perhaps it was a big thing back when serialization and marshalling were a big thing.

Previously "Krohm"


My system is very easygoing. It models a fully sequential process so basically I have a list of peers connected to a server and the server decides who gets to act when. The active device gets its chance at modifying the shared data and then signals itself inactive (the server can also timeout the user). So it's all about manipulating this shared state, no surprises.

Keep in mind that regardless of the calling model, this kind of shared-state concept is a headache to implement, especially if you need to consider scenarios such as "the client got access to (effectively a lock on) the shared state, partially modified it, leaving it inconsistent, was going to modify it further to make it consistent again, and then the connection got lost; what should the server do now?". If that cannot realistically happen (like "all the clients are actually on the same physical box communicating over shared memory"), it is one thing (even then it is a rather bold assumption BTW), but over the Internet you do need to consider this kind of stuff.

Usually, it is much better to have game-specific atomic operations on the server side (like "Dear Server, please move this guy 5 meters to the left"), as opposed to the client itself modifying shared state to the same effect. I think it is better to understand your client-server interaction (which looks quite unusual) before going into the async-RPC-vs-messages question (the two can be made pretty much the same, leaving syntax aside, though it depends on what exactly you have on the receiving side).
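To illustrate the server-side atomic operation idea, here is a minimal sketch. `Arbiter`, `moveLeft`, and the `kMaxStep` limit are hypothetical and stand in for whatever game-specific operations and protocol limits actually apply.

```cpp
#include <cstdint>

// Hypothetical server-side arbiter for a strictly sequential session.
class Arbiter {
  public:
    void setActive(std::uint32_t peerId) { active_ = peerId; }

    // Apply one whole operation or nothing. The request is validated first and
    // the state changes in a single step, so a dropped connection cannot leave
    // the shared state half-modified.
    bool moveLeft(std::uint32_t peerId, int meters) {
        if (peerId != active_) return false;                 // not this peer's turn
        if (meters <= 0 || meters > kMaxStep) return false;  // outside protocol limits
        position_ -= meters;
        return true;  // the caller would now broadcast the new state
    }

    int position() const { return position_; }

  private:
    static constexpr int kMaxStep = 10;  // assumed limit, for illustration only
    std::uint32_t active_ = 0;
    int position_ = 0;
};
```

Whether the request arrives as an RPC or as a message makes no difference to this part; only the server ever writes the state.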

This is exactly what I meant by


Most of those messages are not confirmed. I just expect the server to push a different, updated state.

I honestly don't see much difference. Whether the client sends updated state or a request to update it, there are indeed more problems besides atomicity.

I have plans to allow client reconnect in the future, but for the first iteration the only effort spent on robustness is in a proper protocol design.

Previously "Krohm"


Phew :-) . I was afraid that you were doing some kind of locking to get access to that shared state (however crazy it is, I've seen that too).

If it is already lock-less, then I (give or take) agree with hplus0603's post above (disclaimer: my reasoning is a bit different, but gets to pretty much the same point for your case). Most importantly: whatever you're doing, stay away from blocking RPC calls :-).

What is a completion callback for you?


It is a specific mechanism that lets you retrieve information about the completion of an asynchronous operation, implemented as "continuation passing."


In synchronous code, you'd write something like this (in glorious pseudo-code):

onUserSelectedShowPlayers():
  allPlayersWindow.show()
  players = rpc.call("getAllPlayers()")
  allPlayersWindow.playerList.setPlayers(players)

In continuation/callback code, you'd write something like this:

onUserSelectedShowPlayers():
  allPlayersWindow.show()
  rpc.call("getAllPlayers()", onGetAllPlayersComplete)

onGetAllPlayersComplete(players):
  allPlayersWindow.playerList.setPlayers(players)

The synchronous/blocking code will make your GUI lock up while it's waiting for the network. The asynchronous/callback/continuation code will let the user keep interacting with the GUI while the request is outstanding.
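One common way to get the callback back onto the GUI thread, sketched under assumptions: a network thread posts completions into a queue and the GUI loop drains it once per frame. `CompletionQueue`, `post`, and `pump` are hypothetical names, not from any specific framework.

```cpp
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical hand-off point between a network thread and the GUI thread.
class CompletionQueue {
  public:
    // Called from the network thread when a reply has arrived.
    void post(std::function<void()> fn) {
        std::lock_guard<std::mutex> lock(mutex_);
        ready_.push(std::move(fn));
    }

    // Called once per frame from the GUI loop; never waits on the network.
    // Returns how many completions were run.
    int pump() {
        std::queue<std::function<void()>> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            std::swap(batch, ready_);  // take everything, release the lock quickly
        }
        int ran = 0;
        while (!batch.empty()) {
            batch.front()();
            batch.pop();
            ++ran;
        }
        return ran;
    }

  private:
    std::mutex mutex_;
    std::queue<std::function<void()>> ready_;
};
```

In terms of the pseudo-code above, the network layer would post a closure that calls onGetAllPlayersComplete(players), and the GUI keeps rendering until `pump` eventually runs it.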

You'd be surprised at the number of apps and games that make blocking network calls from the main loop, locking up the GUI at times. (Even something as simple as a DNS query can take several seconds.) I hate it, because it makes for such slow, jerky, unpolished user interfaces.
There are other reasons to give your app a more fine-grained asynchronous task structure, too, such as easier testability.
enum Bool { True, False, FileNotFound };

I see it's a whole lot more than I had thought.

I assume you will be pleased to know Android will kill the app if you attempt some network operations from the GUI thread. I don't remember the exact name of the exception but I believe it's NetworkOnMainThreadException. Super ugly but hey, I support.

I had some hints about continuations when I considered some improvements to std::future in some cppcon presentations. I don't see any way to make it work for me but perhaps I haven't even tried.

Today I mashed together some type-safe 'funneling' pumper. Hopefully I'll be done soon!

Thank you very much!

Previously "Krohm"

This topic is closed to new replies.
