My game has reached the point where I need voice or text objectives to tell the player what to do next. The HUD provides visual pointers to the next objective, but a spoken or text representation is badly needed, and it would also let me layer in a storyline and add some depth.
I think my favored approach is speaking avatars in the HUD, like in StarCraft, but I'm not sure how to achieve this. I'm open to buying lip-synced 2D sprites if there is such a thing, but it may be difficult to find sprites that match my visual style. Another idea I've considered is recording video sequences of actual people (either wholesale or by splicing frames/clips). Has anybody tried either of these? Any advice? Is it as hard as it seems?
Doing a voice-over without avatars strikes me as a more practical alternative, in conjunction with a text representation of objectives (an overlay shown when objectives change or when the game is paused). I fear this approach would be less immersive, but is it good enough for a single-player indie game? Are there examples of this that you think worked well?
Here's a video of the gameplay to give you some context: