
Committing persistent server data

Started by October 19, 2016 10:04 AM
15 comments, last by Acar 8 years, 1 month ago

If I were to keep the data in a file on disk and the data is unsorted, wouldn't it take too long to query what I need from that file?


If you don't save/load everything, but keep everything in an unstructured file on disk, then yes, that is kind-of the slowest of both worlds :-)

If you save/load pieces, then you typically put each piece in some kind of database. This can be something heavyweight like MySQL or DB2, or something simple like Berkeley DB or just a file with a table of contents index at the front, or something super-simple like a file per user/entity (with some recursive directory hierarchy to avoid 100,000 files in one directory.)
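
For the super-simple file-per-user option, here's a minimal sketch in C (the hash constant, two-level directory split, and file naming are illustrative assumptions, not something from this thread):

#include <stdio.h>
#ifdef _WIN32
#include <direct.h>
#define MAKE_DIR(p) _mkdir(p)
#else
#include <sys/stat.h>
#define MAKE_DIR(p) mkdir(p, 0755)
#endif

/* Hypothetical helper: spread users over a two-level directory tree so no
   single directory ends up holding 100,000 files, e.g. "data/4f/2a/user_123.dat". */
static void build_user_path(unsigned int user_id, char *out, size_t out_size)
{
    unsigned int h = user_id * 2654435761u;   /* cheap multiplicative hash */
    char dir1[32], dir2[64];
    snprintf(dir1, sizeof dir1, "data/%02x", (h >> 8) & 0xffu);
    snprintf(dir2, sizeof dir2, "%s/%02x", dir1, h & 0xffu);
    MAKE_DIR("data");
    MAKE_DIR(dir1);
    MAKE_DIR(dir2);   /* ignore errors; the directories may already exist */
    snprintf(out, out_size, "%s/user_%u.dat", dir2, user_id);
}

/* Save one user's serialized state into its own file. */
static int save_user(unsigned int user_id, const void *data, size_t size)
{
    char path[260];
    build_user_path(user_id, path, sizeof path);
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t written = fwrite(data, 1, size, f);
    fclose(f);
    return written == size ? 0 : -1;
}

Loading is the mirror image with fopen(path, "rb") and fread(), and the file system effectively acts as your index.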

Also if the memory usage isn't a concern, is it still a bad idea to keep all data in memory?


If the performance is fine (maybe you can save the data asynchronously?) and you have no better use for the RAM, then that's fine.

Another option you may want to consider is memory mapping the file, which makes the file be automatically saved all the time without you having to worry about it.
mmap() on Linux; CreateFileMapping() on Windows.
enum Bool { True, False, FileNotFound };
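
For illustration, a rough sketch of the Windows variant (a sketch only, assuming a fixed maximum file size and skipping growth and error recovery; the file name is made up):

#include <windows.h>

int main(void)
{
    const DWORD size = 1 << 20;   /* assume a fixed 1 MB data file for the sketch */

    /* Open (or create) the backing file. */
    HANDLE file = CreateFileA("world.dat", GENERIC_READ | GENERIC_WRITE,
                              0, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE) return 1;

    /* Create a mapping object; with PAGE_READWRITE this also grows the file to 'size'. */
    HANDLE mapping = CreateFileMappingA(file, NULL, PAGE_READWRITE, 0, size, NULL);
    if (!mapping) { CloseHandle(file); return 1; }

    /* Map it into the address space; writes through this pointer end up in the file. */
    unsigned char *data = MapViewOfFile(mapping, FILE_MAP_ALL_ACCESS, 0, 0, size);
    if (!data) { CloseHandle(mapping); CloseHandle(file); return 1; }

    data[0] = 42;                   /* mutate in memory as if it were a plain array */
    FlushViewOfFile(data, size);    /* optional: ask the OS to write dirty pages now */

    UnmapViewOfFile(data);
    CloseHandle(mapping);
    CloseHandle(file);
    return 0;
}

The Linux equivalent is open() plus mmap() with MAP_SHARED, and msync() in place of FlushViewOfFile().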

Also if the memory usage isn't a concern, is it still a bad idea to keep all data in memory?


Eventually you'll run out. You'll have almost a thousand times more disk space than memory space, so you may as well use it where it makes sense. If you're using files anyway, there's no good reason to keep data in memory that you're not using, and may never use again (e.g. if a player never comes back).


.. or just a file with a table of contents index at the front ..

How would I go about implementing this? I couldn't find a way to insert or remove data at a given offset in a file.

Also, hopefully not going off-topic: if I were to keep the data in a file in a sorted format and only load/commit the active data, how would I go about implementing it, especially the sorting part? Is there any source code I could look into?

Eventually you'll run out. You'll have almost a thousand times more disk space than memory space, so you may as well use it where it makes sense. If you're using files anyway, there's no good reason to keep data in memory that you're not using, and may never use again (e.g. if a player never comes back).

Are there any implementations you could recommend that load from/save to a sorted file? I'm using C and Windows, but anything that would help me grasp the concept would be useful.

I don't think a single sorted file is the way to go, because it's not very practical. Hplus0603 already gave one good alternative - a file per user/entity/player. That way, the file system is your sorting mechanism.

If you really want it to be all in one file, with quick access to arbitrary parts of the file, then there's software that will do that for you, and it's called a database. Again, Hplus0603 mentioned some names above.
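
To make "use a database" concrete, here is a rough sketch using SQLite, an embedded option in the same spirit as Berkeley DB (SQLite isn't named in this thread, and the table layout is just an assumption), storing one blob of serialized state per player id:

#include <sqlite3.h>

/* Insert or replace one player's serialized state under its id. */
static int save_player(sqlite3 *db, int player_id, const void *blob, int size)
{
    sqlite3_stmt *stmt = NULL;
    int rc = sqlite3_prepare_v2(db,
        "INSERT OR REPLACE INTO players(id, state) VALUES(?, ?)", -1, &stmt, NULL);
    if (rc != SQLITE_OK) return rc;
    sqlite3_bind_int(stmt, 1, player_id);
    sqlite3_bind_blob(stmt, 2, blob, size, SQLITE_TRANSIENT);
    rc = sqlite3_step(stmt);                  /* SQLITE_DONE on success */
    sqlite3_finalize(stmt);
    return rc == SQLITE_DONE ? SQLITE_OK : rc;
}

int main(void)
{
    sqlite3 *db = NULL;
    if (sqlite3_open("players.db", &db) != SQLITE_OK) return 1;
    sqlite3_exec(db,
        "CREATE TABLE IF NOT EXISTS players(id INTEGER PRIMARY KEY, state BLOB)",
        NULL, NULL, NULL);

    const char demo[] = "serialized player state goes here";
    save_player(db, 1, demo, (int)sizeof demo);

    sqlite3_close(db);
    return 0;
}

The database handles the on-disk layout, lookup by id, and safe in-place updates, which is exactly the part that's hard to get right by hand.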

I don't think a single sorted file is the way to go, because it's not very practical. Hplus0603 already gave one good alternative - a file per user/entity/player. That way, the file system is your sorting mechanism.

If you really want it to be all in one file, with quick access to arbitrary parts of the file, then there's software that will do that for you, and it's called a database. Again, Hplus0603 mentioned some names above.

Thanks for the reply. I'll look into that alternative.

there's no good reason to keep data in memory that you're not using, and may never use again (e.g. if a player never comes back).


Shh! Don't tell the Redis folks!

(FWIW: We run the biggest single Redis instance I know of, with 768 GB of RAM. This turned out to be a mistake, because the entire kernel locks up for 10 seconds each time it forks to checkpoint the data.)

How would I go about implementing


At that point, you're building the lowest-level component of a database (or, for that matter, a file system), which is "indirect block allocation and management."
A very simple way to do that is to split your file into chunks, say 1 MB each, and have each chunk link to the next chunk when it gets full. To read all the data, you follow the chain of links and concatenate all the data.
A slightly more sophisticated way is to make the first chunk an array of chunk offsets, and each time you need another 1 MB chunk, add a new offset to the table. When the table runs out, you either say "file is full," or you make the table chunks themselves a linked list, or you add a second layer of indirection.
(Chunk size varies by application -- 1 MB may be way too big or not big enough, depending on what you're doing.)
An even more sophisticated way of doing this is to structure your data in an ordered index -- at this point, you'll want to read up on B-trees, B*-trees, and other such structures, because you're well on your way to building your own database!

Simple math example:
Let's assume 1 MB chunks. Let's assume 64 bit file offset.
1 MB can fit 128K of file offset pointers. Each pointer references a 1 MB chunk of file data.
Maximum size of data stored in file: 128K * 1M == 128 GB of data.
enum Bool { True, False, FileNotFound };
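
To make the offset-table idea concrete, here is a rough sketch in C (smaller 64 KB chunks and a single fixed-size index, purely for illustration; none of the names come from the thread). It assumes the file was created once with an all-zero first chunk, so an offset of 0 means "slot unused":

#include <stdio.h>
#include <stdint.h>

#define CHUNK_SIZE (64 * 1024)
#define MAX_CHUNKS (CHUNK_SIZE / sizeof(uint64_t))   /* chunk 0 holds this many offsets */

/* Append one new data chunk at the end of the file and record its offset in the
   index stored in chunk 0. Returns the slot used, or -1 when the table is full. */
static long allocate_chunk(FILE *f)
{
    uint64_t table[MAX_CHUNKS];

    /* Read the index from chunk 0. */
    fseek(f, 0, SEEK_SET);
    if (fread(table, sizeof(uint64_t), MAX_CHUNKS, f) != MAX_CHUNKS) return -1;

    long slot = -1;
    for (size_t i = 0; i < MAX_CHUNKS; ++i)
        if (table[i] == 0) { slot = (long)i; break; }
    if (slot < 0) return -1;                          /* "file is full" */

    /* The new chunk lives at the current end of the file. */
    fseek(f, 0, SEEK_END);
    uint64_t offset = (uint64_t)ftell(f);
    char zeros[CHUNK_SIZE] = {0};
    fwrite(zeros, 1, CHUNK_SIZE, f);

    /* Record it in the index and write the index back. */
    table[slot] = offset;
    fseek(f, 0, SEEK_SET);
    fwrite(table, sizeof(uint64_t), MAX_CHUNKS, f);
    return slot;
}

Reading everything back then means walking the non-zero offsets in the table and fseek()-ing to each chunk; replacing the fixed table with linked table chunks or a second level of indirection removes the "file is full" limit, as described above.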

A very simple way to do that is to split your file into chunks, say 1 MB each, and have each chunk link to the next chunk when it gets full. To read all the data, you follow the chain of links and concatenate all the data.

A slightly more sophisticated way is to make the first chunk an array of chunk offsets, and each time you need another 1 MB chunk, add a new offset to the table. When the table runs out, you either say "file is full," or you make the table chunks themselves a linked list, or you add a second layer of indirection.
(Chunk size varies by application -- 1 MB may be way too big or not big enough, depending on what you're doing.)
An even more sophisticated way of doing this is to structure your data in an ordered index -- at this point, you'll want to read up on B-trees, B*-trees, and other such structures, because you're well on your way to building your own database!

Simple math example:
Let's assume 1 MB chunks. Let's assume 64 bit file offset.
1 MB can fit 128K of file offset pointers. Each pointer references a 1 MB chunk of file data.
Maximum size of data stored in file: 128K * 1M == 128 GB of data.

Much appreciated the detailed answer. I'll read up on these and invest further time into building something similar.

This topic is closed to new replies.
