I don't understand this: He has a specific starting configuration and we are suggesting that he run A* once to reach a goal configuration. What's wrong with that?
Agreed; nothing's wrong with that. My comment was about computing an entire policy offline; for that, running a brushfire-type algorithm once beats running A* O(n^2) times (which would do a lot of redundant work). It's not a super-profound observation, but I thought it might be worth mentioning since A* tends to get treated as a black box.