Also, a variety of multi-entity architectures have been tried. The most famous failure is probably Sun's Project Darkstar, which ended up supporting fewer entities in cluster mode than in single-server mode :-) It used tuple spaces, which ends up being an instance of "shard by ID." The other main approach is "shard by geography."
A "massively multiplayer" kind of server ends up, in the worst case, with every entity wanting to interact with every other entity. For example, everyone tries to pile into the same auction area or GM quest or whatever. (Or all the mages gather in one place and all try to manabolt each other / all soldiers try to grenade each other / etc.)

N-squared, as we know, puts an upper limit on the number of entities a single server can host. Designing your game to avoid this helps not just the servers, but also the gameplay. When there's a single auction area that EVERYBODY wants to be in, it's not actually a great auction experience (too spammy), so there's something to be said for spreading the design out. (Same thing for instanced dungeons/quests, etc.)
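To put rough numbers on it (a back-of-the-envelope sketch; the entity counts and the 60 Hz tick rate are just illustrative assumptions, not measurements from any particular game):

    # Why all-pairs interaction caps single-server entity counts.

    def pair_tests_per_second(num_entities: int, tick_hz: int) -> int:
        """Each tick, a naive simulation tests every unordered pair of entities."""
        pairs_per_tick = num_entities * (num_entities - 1) // 2
        return pairs_per_tick * tick_hz

    print(pair_tests_per_second(1_000, 60))   # ~30 million pair tests/sec
    print(pair_tests_per_second(10_000, 60))  # ~3 billion/sec -- 100x the work for 10x the entities

Growing the population 10x costs you 100x the pair tests, which is the wall the rest of this discussion is about getting around.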
Anyway, once you need more than one server, you can allocate different servers to different parts of the world (using level files, or quad trees, or Voronoi diagrams, or some other spatial index). To support people interacting across borders, you need to duplicate an entity across the border for as far as the "perception range" extends. This, in turn, means that you really want the minimum size of the geographic areas to be larger than the perception range, so you don't need to duplicate a single entity across very many servers. If by default you have a chessboard distribution, and the view range spans two squares, you have to duplicate every entity across 9 servers (the full 3x3 neighborhood) all the time. That means you need 10 servers just to get back up to the capacity of a single non-sharded server, because every entity is being simulated or mirrored nine times!

The drawback is that you end up with a maximum density per area and a minimum area size per server, which means your world has to spread out somewhat evenly. Because the server-to-server communication is "local" (neighbors only), you can easily scale this to as large an area as you want, as long as player density stays under the designated maximum. Many games have used methods along these lines (There.com, Asheron's Call, and several others).
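As a concrete sketch of the grid version (the cell size, view range, and function names are all made up for illustration; the point is just the owner-versus-ghost split):

    # Grid-based geographic sharding with border duplication ("ghosting").
    # Keep VIEW_RANGE well below CELL_SIZE, or the ghost set balloons as
    # described above.

    CELL_SIZE = 256.0   # world units per server cell (assumed)
    VIEW_RANGE = 64.0   # perception range (assumed)

    def owning_cell(x: float, y: float) -> tuple[int, int]:
        """The one cell (server) that authoritatively simulates this position."""
        return (int(x // CELL_SIZE), int(y // CELL_SIZE))

    def ghost_cells(x: float, y: float) -> set[tuple[int, int]]:
        """Every other cell that needs a read-only copy of an entity at (x, y),
        i.e. every cell overlapped by the entity's perception box."""
        cx0, cy0 = owning_cell(x - VIEW_RANGE, y - VIEW_RANGE)
        cx1, cy1 = owning_cell(x + VIEW_RANGE, y + VIEW_RANGE)
        cells = {(cx, cy) for cx in range(cx0, cx1 + 1)
                          for cy in range(cy0, cy1 + 1)}
        cells.discard(owning_cell(x, y))
        return cells

With VIEW_RANGE below CELL_SIZE, an entity sitting right on a corner ghosts to at most 3 neighbors; push VIEW_RANGE past CELL_SIZE and the set grows to the full 3x3 neighborhood and beyond, which is exactly the 9-server case above.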
The other option is to allocate by ID, or just randomly by load at entity instantiation. Each server simulates the entities allocated to it. You have to load the entire static world into each server, which may be expensive if your world is really large, but on modern hardware that's usually not a problem. Then, to support interaction across servers, each server broadcasts the state of its entities using something like UDP broadcast, and every other server decodes the packets and keeps only the entities that would "interact with" entities in its own memory. This obviously lets you add servers in linear proportion to the number of players; instead, you are limited by the speed at which servers can process the incoming UDP broadcast traffic to filter for interactions with their own entities, and by the available bandwidth on the network. 100 Gbps Ethernet starts looking really exciting if you want to run simulation at 60 Hz for hundreds of thousands of entities across a number of servers! (In reality, you might not even get there, depending on a number of factors -- Amdahl's Law ends up being a real opponent.)
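A minimal sketch of that broadcast-and-filter loop (the port, the packet format, the interaction radius, and the function names are all assumptions; a real implementation would batch many entities per datagram rather than sending one packet each):

    # "Shard by ID": every server owns some entities, blasts their state to
    # the LAN each tick, and filters incoming state down to what matters locally.

    import socket
    import struct

    PORT = 9999            # assumed port shared by all simulation servers
    INTERACT_RANGE = 64.0  # assumed interaction radius
    FMT = "!Qff"           # entity id (u64), x, y -- deliberately tiny

    def make_socket() -> socket.socket:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.setblocking(False)
        sock.bind(("", PORT))
        return sock

    def broadcast_states(sock, local_entities):
        """Send the state of every locally-owned entity to all other servers."""
        for eid, (x, y) in local_entities.items():
            sock.sendto(struct.pack(FMT, eid, x, y), ("255.255.255.255", PORT))

    def relevant_updates(sock, local_entities):
        """Drain pending packets, keeping only remote entities near one of
        ours. This filtering step is where all the CPU goes at scale."""
        keep = []
        while True:
            try:
                data, _addr = sock.recvfrom(struct.calcsize(FMT))
            except BlockingIOError:
                break
            eid, x, y = struct.unpack(FMT, data)
            if eid in local_entities:
                continue  # our own broadcast, echoed back to us
            if any((x - lx) ** 2 + (y - ly) ** 2 <= INTERACT_RANGE ** 2
                   for lx, ly in local_entities.values()):
                keep.append((eid, x, y))
        return keep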
None of this is new. The military did it in the '80s on top of DIS. And then again in the late '90s / early '00s on top of HLA. It's just that their scale stops at how many airplanes and boats and tanks they own, and they also end up accepting that they have to buy one computer per ten simulated entities or whatever the salespeople come up with. There's only so many billion-dollar airplanes in the air at one time, anyway.
For games, the challenge is much more about designing your game really tightly around the challenges and opportunities of whatever technology you choose, and then optimizing the constant factor so that you can run a real-size game world on reasonable hardware. (For more on single-machine versus scale-out performance, see for example http://www.frankmcsherry.org/assets/COST.pdf )