Rewaz said:
I don't know if there is a better way to load those files. I wanted to write the code from scratch (the game uses D3D9), and I didn't know whether I should use FILE or std::ifstream, or which techniques I could use to improve the speed of loading those assets.
In my experience, it is harder to write simple code than optimized code. I kept my bitmap loader quite simple (I also support PNG, JPEG and TGA files) and it runs in good time depending on the size of the image.
bool Asset::BmpReadHeader(IDataReader& stream, BmpHeader& header)
{
    byte tmp[4] = { 0 };
    // "BM" magic bytes
    if (stream.Get() != 'B' || stream.Get() != 'M')
        return false;
    stream.Read((byte*)&header.Size, 4);
    stream.Read(tmp, 2); // reserved
    stream.Read(tmp, 2); // reserved
    stream.Read((byte*)&header.OffBits, 4);
    stream.Read(tmp, 4); // info header size, 40 = BITMAPINFOHEADER
    if (*(uint32*)tmp != 40)
        return false;
    stream.Read(tmp, 4); header.DataHeader.Width = *(uint32*)tmp;
    stream.Read(tmp, 4); header.DataHeader.Height = *(uint32*)tmp;
    stream.Read(tmp, 2); header.DataHeader.Planes = *(uint16*)tmp;
    stream.Read(tmp, 2); header.DataHeader.BitCount = *(uint16*)tmp;
    stream.Read(tmp, 4); header.DataHeader.Compression = *(uint32*)tmp;
    stream.Read(tmp, 4); header.DataHeader.SizeImage = *(uint32*)tmp;
    stream.Read(tmp, 4); header.DataHeader.X_pels_per_meter = *(uint32*)tmp;
    stream.Read(tmp, 4); header.DataHeader.Y_pels_per_meter = *(uint32*)tmp;
    stream.Read(tmp, 4); header.DataHeader.Clr_used = *(uint32*)tmp;
    stream.Read(tmp, 4); header.DataHeader.Clr_important = *(uint32*)tmp;

    byte BitsPerPixel = (header.DataHeader.BitCount == 32) ? 32 : 24;
    header.Buffer = header.DataHeader.Width * (BitsPerPixel / 8) * header.DataHeader.Height;
    return true;
}
bool Asset::BmpRead(IDataReader& stream, BmpHeader const& header, byte* data)
{
    uint32 BytesPerPixel = header.DataHeader.BitCount / 8;
    uint32 BytesPerRow = BytesPerPixel * header.DataHeader.Width;
    uint32 BytePaddingPerRow = (4 - BytesPerRow % 4) % 4; // BMP rows are padded to 4-byte boundaries

    if (header.DataHeader.BitCount <= 8) // palettized, using a color table
    {
        uint32 colors = (1 << header.DataHeader.BitCount) * 4; // BGRA palette entries
        byte* colorMap = MainAllocator::Allocator().Allocate<byte>((size_t)colors);
        stream.Read(&colorMap[0], colors);
        for (uint32 y = header.DataHeader.Height; y > 0; y--) // rows are stored bottom-up
        {
            for (uint32 x = 0; x < header.DataHeader.Width; x++)
            {
                uint32 clIdx = stream.Get() * 4;
                for (uint32 i = 0; i < 3; i++)
                    data[(((y - 1) * header.DataHeader.Width) + x) * 3 + i] = colorMap[clIdx + i];
            }
            for (uint32 i = 0; i < BytePaddingPerRow; i++)
                stream.Get();
        }
        MainAllocator::Allocator().Release(colorMap);
    }
    else if (header.DataHeader.BitCount == 16)
    {
        for (uint32 y = header.DataHeader.Height; y > 0; y--)
        {
            for (uint32 x = 0; x < header.DataHeader.Width; x++) // copy the 16-bit pixel raw
                stream.Read(&data[(((y - 1) * header.DataHeader.Width) + x) * BytesPerPixel], BytesPerPixel);
            for (uint32 i = 0; i < BytePaddingPerRow; i++)
                stream.Get();
        }
    }
    else // 24-bit RGB and 32-bit RGBA images
    {
        for (uint32 y = header.DataHeader.Height; y > 0; y--)
        {
            for (uint32 x = 0; x < header.DataHeader.Width; x++)
                for (uint32 i = 0; i < BytesPerPixel; i++)
                    data[(((y - 1) * header.DataHeader.Width) + x) * BytesPerPixel + i] = stream.Get();
            for (uint32 i = 0; i < BytePaddingPerRow; i++)
                stream.Get();
        }
    }
    return true;
}
Everything in my engine code works through a streaming interface. I don't load everything at once into one big heap-allocated binary block, because that would take more time and waste resources. My streams are custom classes that inherit from a common interface (I'm only showing the reader side here; the writer side is not relevant).
/**
    Interface abstracting the I/O read operation on some underlying data
*/
interface IDataReader
{
public:
    /**
        Class destructor
    */
    virtual ~IDataReader() {}
    /**
        Gets the current position of the reader
    */
    virtual int64 Position() const = 0;
    /**
        Sets the current position of the reader
    */
    virtual void Position(int64 pos) = 0;
    /**
        Gets the underlying data size in bytes
    */
    virtual int64 Size() const = 0;
    /**
        Returns whether the data pointer has reached the end
    */
    virtual bool Eof() const = 0;
    /**
        Reads the next byte without advancing the data pointer
    */
    virtual byte Peek() = 0;
    /**
        Reads the next byte of data
    */
    inline byte Get()
    {
        byte bt;
        Read(&bt, 1);
        return bt;
    }
    /**
        Reads size bytes into the given memory. Returns 0 when anything
        failed inside the file handle, otherwise the number of bytes read.
    */
    virtual size_t Read(byte* buffer, size_t size) = 0;
    /**
        Reads size bytes into another stream. Returns 0 when anything
        failed inside the file handle, otherwise the number of bytes read.
    */
    api size_t Copy(IDataWriter& stream, size_t size);
};
My InFileStream implements this interface and is built on two fundamentals: the OS-specific API to open file handles and read the data, and a circular buffer to simulate a continuous read. Every read operation first checks whether the buffer still has data to serve, and otherwise fills the next N bytes of the file into the buffer. 256 bytes turned out to be a good buffer size: it caches enough data from disk without wasting too much memory on smaller files. The time-consuming operation here is fetching the data from disk!
However, a few milliseconds are acceptable when loading a file from disk.
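To make the idea concrete, here is a minimal sketch of such a read-through buffer on top of plain stdio. The class name BufferedFileReader and the fixed 256-byte window are illustrative only, not my actual InFileStream:

```cpp
#include <cstdio>
#include <cstring>
#include <cstdint>
#include <cstddef>

// Illustrative buffered reader: refills a small window from the file
// only when the caller has consumed everything cached so far.
class BufferedFileReader {
public:
    explicit BufferedFileReader(const char* path)
        : file_(std::fopen(path, "rb")), pos_(0), filled_(0) {}
    ~BufferedFileReader() { if (file_) std::fclose(file_); }

    bool IsOpen() const { return file_ != nullptr; }

    // Read `size` bytes, serving from the cached window first and
    // refilling it from disk only when it runs dry.
    size_t Read(uint8_t* out, size_t size) {
        if (!file_) return 0;
        size_t total = 0;
        while (total < size) {
            if (pos_ == filled_) {               // window exhausted: refill
                filled_ = std::fread(window_, 1, sizeof(window_), file_);
                pos_ = 0;
                if (filled_ == 0) break;         // real end of file
            }
            size_t n = filled_ - pos_;
            if (n > size - total) n = size - total;
            std::memcpy(out + total, window_ + pos_, n);
            pos_ += n;
            total += n;
        }
        return total;
    }

    uint8_t Get() { uint8_t b = 0; Read(&b, 1); return b; }

private:
    FILE*   file_;
    uint8_t window_[256];  // small cache: batches disk reads, cheap for tiny files
    size_t  pos_;          // next unread byte inside the window
    size_t  filled_;       // valid bytes currently in the window
};
```

With this, a loader that calls Get() byte by byte still only touches the disk once per 256 bytes.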
As I wrote above, if you need more speed, you have to use memory-mapped file I/O, which maps your data to virtual memory pages managed by the OS and offers access to them via pointer. I also implemented a MemoryStream class that wraps such a pointer and a given size, to make static memory available as a stream as well (remember, everything works on streams?).
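A rough sketch of such a mapping, using the POSIX API for brevity (on Windows the equivalents are CreateFile, CreateFileMapping and MapViewOfFile); the MappedFile name is illustrative, not an actual engine class:

```cpp
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>
#include <cstdint>

// Illustrative memory-mapped view: the OS pages the file in on demand,
// and the data is accessed through a plain pointer.
struct MappedFile {
    const uint8_t* data = nullptr;
    size_t         size = 0;

    bool Open(const char* path) {
        int fd = ::open(path, O_RDONLY);
        if (fd < 0) return false;
        struct stat st;
        if (::fstat(fd, &st) != 0) { ::close(fd); return false; }
        size = static_cast<size_t>(st.st_size);
        void* p = ::mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
        ::close(fd);                       // the mapping keeps the file alive
        if (p == MAP_FAILED) { size = 0; return false; }
        data = static_cast<const uint8_t*>(p);
        return true;
    }

    void Close() {
        if (data) ::munmap(const_cast<uint8_t*>(data), size);
        data = nullptr;
        size = 0;
    }
};
```

A MemoryStream-style wrapper then only needs this pointer and size to serve the usual stream interface.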
Rewaz said:
like a static map that always keeps texture X loaded, since it's used like 80% of the time; when I need it, I can get it instantly. Have some "refCount" so that when two objects use the same texture, I can use the same loaded pointer for both.
This is a matter of optimizing memory, not load times, and you should ALWAYS use references for your models rather than give each model its own texture instance. You should also share the same mesh, shader etc. between similar models.
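A minimal sketch of that sharing, using std::shared_ptr/std::weak_ptr for the refcounting (the Texture struct is a placeholder; in a real loader you would fill it with the D3D9 resource instead):

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Placeholder texture record; a real one would hold the GPU resource,
// e.g. an IDirect3DTexture9* under D3D9.
struct Texture {
    std::string name;
};

// Illustrative texture cache: every model holds a shared handle, so a
// texture used by many objects is loaded exactly once and freed when
// the last user releases it.
class TextureCache {
public:
    std::shared_ptr<Texture> Acquire(const std::string& name) {
        auto it = cache_.find(name);
        if (it != cache_.end())
            if (auto alive = it->second.lock())   // still referenced somewhere
                return alive;
        // Not cached (or already released): load it once.
        auto tex = std::make_shared<Texture>(Texture{name});
        cache_[name] = tex;                       // weak entry: cache owns nothing
        return tex;
    }

private:
    std::unordered_map<std::string, std::weak_ptr<Texture>> cache_;
};
```

Because the cache only stores weak references, it never keeps a texture alive on its own; the refcount lives entirely in the handles the models hold.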
Rewaz said:
The textures are packed into .pak files (for example, I have 500 MB in the texture folder, so I do 500 / 100 and get 5 files of 100 MB each with all the textures inside). There is also an encrypted file which contains all the metadata (fileName, offset, endPos), like a DB of those assets. I read that header first, so if I want "PLAYER.MODEL.02" I know in which of those 5 files my texture is, at which offset it starts and at which position it ends. That way I end up with a buffer like char buffer[end - offset] holding the raw data of the .DDS.
This solution sounds similar to what I use; however, our .sep archives also use a digital signature algorithm and hashes to ensure no one has modified them. A major difference, also in performance, between our format and yours is that we use 64k chunks and pad our data to them. As I wrote above, this has the advantage of filling an entire disk cache line, which prevents jumping around on the hard drive. 64k is also a multiple of 4k, the usual OS memory page size.
Once, in a job, I also wrote a database from scratch with the same techniques. Thanks to memory-mapped I/O, it was able to handle up to 5 TB of data while access was even faster than reading from disk through a stream. This is how I did it:
- Handle everything in chunks of 64k
- Always load a full chunk, no matter if you need 1 or 1000 bytes
- Don't try to optimize storage, some trash bytes in between chunks are ok
Our packaging tool also tries to puzzle the data into chunks in the most optimized way and writes the header chunks afterwards. A file may take several chunks on its own, share some chunks with other data, or be so small that multiple files fit into one chunk. It doesn't matter where the data is; the primary goal is to have it stored for linear access on page/chunk boundaries.
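As a rough sketch of what such a chunk-based index lookup could look like (the names and fields are illustrative, not our actual .sep format):

```cpp
#include <cstdint>
#include <string>

constexpr uint64_t kChunkSize = 64 * 1024;  // one disk cache line, 16 OS pages

// Illustrative archive index entry, in the spirit of the
// (fileName, offset, endPos) records described above.
struct IndexEntry {
    std::string name;     // e.g. "PLAYER.MODEL.02"
    uint64_t    offset;   // byte offset of the payload inside the archive
    uint64_t    size;     // real payload size, padding excluded; must be > 0
};

// Which contiguous chunk range must be fetched to read this entry?
// Entries may share chunks or span several of them; either way the
// loader always reads whole 64k chunks for linear access.
inline void ChunkRange(const IndexEntry& e, uint64_t& first, uint64_t& count) {
    first = e.offset / kChunkSize;
    uint64_t last = (e.offset + e.size - 1) / kChunkSize;
    count = last - first + 1;
}
```

The loader then maps or reads count chunks starting at first * kChunkSize and slices the payload out of them.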
Rewaz said:
Sorry, didn't quite understand, how do I fill the HDD cache?
Your data needs to be as large as the disk cache line, which is usually 64k bytes. If your data is smaller, fill the remaining space with trash bytes to reach that size; if it is larger, pad it to the next multiple of that size.
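Assuming the 64k figure, the padding is one line of arithmetic (an illustrative helper, not from any particular codebase):

```cpp
#include <cstdint>

constexpr uint64_t kDiskCacheLine = 64 * 1024;

// Round a payload size up to the next multiple of the disk cache line.
// kDiskCacheLine is a power of two, so the bitmask form is valid.
inline uint64_t PadToCacheLine(uint64_t size) {
    return (size + kDiskCacheLine - 1) & ~(kDiskCacheLine - 1);
}
```

So a 1-byte file still occupies 64k on disk, and a 100k file is padded out to 128k; the difference is filled with trash bytes by the packer.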
Rewaz said:
I don't want something that complicated, honestly; I'd prefer something like a memory pool with automatic alignment. That's why I'd prefer some library, because I know there should be a good one out there, not the best, but useful.
I doubt there is a library out there that you will want to use and be happy with for the rest of your life. As I wrote, memory allocation is always a matter of personal taste and differs from one engine/developer to another.
Do you want memory buckets, garbage collection, or something not even thought of yet? Do you prefer memory reallocation, and how do you handle pointers then? What about different platforms and CPU memory alignment? Those questions allow millions of possible designs, and so will a memory allocator.
I really suggest reading the blog first and then thinking about what you want; if you really "wanted to do the code from 0", you'll get this done too. It is, however, important to write your own std::shared_ptr, so that whatever memory manager you intend to use gets the chance to reallocate memory for better use. This is the main secret of memory management; otherwise you could just fire-and-forget malloc/free and not care about anything. The real magic happens when you optimize the memory by filling gaps and moving data around, and handling all of this in a multithreaded environment is a lot of fun (and failure).
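To make the reallocation point concrete, here is a bare-bones sketch of the usual trick: user code holds a stable handle, and when the allocator compacts memory it only patches one table entry instead of chasing every raw pointer. The names are illustrative, and a real engine version would add refcounts, generation counters and locking:

```cpp
#include <cstdint>
#include <vector>

// Illustrative indirection table for relocatable memory: handles stay
// valid across moves because user code never stores the raw pointer.
class HandleTable {
public:
    using Handle = uint32_t;

    // Register a block and hand out its stable slot index.
    Handle Register(void* block) {
        slots_.push_back(block);
        return static_cast<Handle>(slots_.size() - 1);
    }

    // Look the current address up on every access.
    void* Resolve(Handle h) const { return slots_[h]; }

    // Called by the allocator after moving a block during compaction:
    // every outstanding handle stays valid, only the entry changes.
    void Relocate(Handle h, void* newBlock) { slots_[h] = newBlock; }

private:
    std::vector<void*> slots_;
};
```

A custom shared_ptr built on top of this would store the Handle plus a refcount and call Resolve() on dereference, which is exactly what gives the memory manager the freedom to move data around.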