Here's the basic pipeline for each pixel:
a) retrieve the pixel color.
b) find the closest mapping in the destination color space (16-bit, 8-bit paletted, etc.).
c) add the error components from previous pixels into each appropriate color component.
d) take the difference of each color component between the original color value plus error and the remapped color value, in the source color space.
e) convert the error into the destination color space.
f) distribute the error to adjacent pixels in any manner you prefer.
g) repeat for the next pixel.
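If it helps to see the whole thing at once, here's a minimal sketch of that loop in C++ for one scanline, going from 8 bits per channel down to 5 with all of the error pushed into the next pixel. The function name, the 5-5-5 packing, and keeping the carried error as floats are just choices I made for the sketch (a real 16-bit mode is often 5-6-5), and I add the carried error before picking the closest value so it can actually influence the choice. The numbers in the example below map straight onto this.

#include <algorithm>
#include <cstdint>

// One scanline of 24-bit RGB (8 bits per channel) dithered down to
// 5 bits per channel, packed 5-5-5 into a 16-bit word.
void dither_scanline(const uint8_t* src, uint16_t* dst, int width)
{
    const float convert = 255.0f / 31.0f;   // destination -> source scale, ~8.2258
    float err[3] = { 0.0f, 0.0f, 0.0f };    // carried error, in destination units

    for (int x = 0; x < width; ++x)
    {
        uint16_t packed = 0;
        for (int c = 0; c < 3; ++c)
        {
            // (a) retrieve the pixel color, (c) add the error carried from
            //     the previous pixel (stored in destination units, so scale it up)
            float value = src[x * 3 + c] + err[c] * convert;

            // (b) closest mapping in the 5-bit destination space
            int remapped = (int)(value / convert + 0.5f);
            remapped = std::min(31, std::max(0, remapped));

            // (d) difference, in source space, between what we wanted and
            //     what the destination can actually represent
            float delta = value - remapped * convert;

            // (e) convert the error into destination space,
            // (f) all of it goes to the next pixel
            err[c] = delta / convert;

            packed = (uint16_t)((packed << 5) | (uint16_t)remapped);
        }
        dst[x] = packed;   // (g) repeat for the next pixel
    }
}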
So for example, let's do a 24-bit -> 16-bit conversion.
source: 24-bit, 8 bits per channel
destination: 16-bit, 5 bits per channel
(step A)
source pixel : 255, 0, 0 (RGB format)
(step B)
remapped into the destination we get:
destination pixel : 31, 0, 0 (RGB format)
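One note on that remap: the cheap way to go from 8 bits to 5 is a right-shift by 3, but that truncates, so it doesn't always pick the truly closest value; scaling by 31/255 with rounding does. A small sketch (the function names are just placeholders):

#include <cstdint>

// Truncating version: 5 maps to level 0, even though level 1 (which
// converts back to about 8.2) is actually closer.
uint8_t remap8to5_shift(uint8_t c) { return (uint8_t)(c >> 3); }

// "Closest mapping" version: scale by 31/255 and round to the nearest level.
uint8_t remap8to5_round(uint8_t c) { return (uint8_t)((c * 31 + 127) / 255); }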
(step C)
Since this is the first pixel, there are no error values to add in yet.
(step D)
To take the difference between the original color value and the remapped color value in the source color space, we have to convert the remapped color value back into the source color space. Let's do this for each component:
remapped pixel color : 31, 0, 0
delta_R = original_R - remapped_R*convert;
delta_G = original_G - remapped_G*convert;
delta_B = original_B - remapped_B*convert;
convert = a scaling factor that remaps the destination color space back into the source; in this case it's 255/31, roughly 8.2258
delta_R = 255 - 31*8.2258;
delta_G = 0 - 0*8.2258;
delta_B = 0 - 0*8.2258;
delta_R = 0.0002;
delta_G = 0;
delta_B = 0;
So we have a tiny red error component, entirely due to rounding the scaling factor to 8.2258 (with the exact 255/31 it would be zero for this particular pixel); for most input values the error is larger.
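If you want to sanity-check that, here's a tiny standalone program for the red channel of this pixel. It keeps convert at full precision, which is why it prints (essentially) zero instead of the 0.0002 we got from the rounded 8.2258:

#include <cstdio>

int main()
{
    const float convert = 255.0f / 31.0f;            // full-precision scale factor
    float original = 255.0f;                         // red channel of the source pixel
    int   remapped = 31;                             // red channel after step B
    float delta    = original - remapped * convert;  // step D
    std::printf("delta_R = %f\n", delta);            // prints 0.000000 here
    return 0;
}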
(step E)
remapped_error = error / convert;
Do this for each of the color components.
(step F)
Let's do a simple error diffusion where all the error just goes to the next pixel. So add the errors into the next pixel's error buffer. Since we don't need to keep track of any pixel other than the next one, we can use a preallocated, reusable fixed buffer; no need to allocate one for each pixel.
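Concretely, for this all-to-the-next-pixel scheme the buffer is nothing more than three values that you zero at the start of each scanline and overwrite at every pixel (begin_scanline is just a placeholder name):

// Carried error for the next pixel, one entry per channel (R, G, B).
// Allocated once and reused for every pixel on every scanline.
static float next_error[3];

void begin_scanline()
{
    // the first pixel of a line has nothing carried into it yet
    next_error[0] = next_error[1] = next_error[2] = 0.0f;
}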
(step G)
Ya, now you do the next pixel, and so on.
Hope this helps. Good luck!
-ddn