I've now got this as optimized as I can using the binary heap and doing as little FP as I can get away with but it's just not getting the performance I need. What are your thoughts on these techniques from the
gamasutra article hereIn particular, this:
A Faster Implementation of the Standard A*
Before proceeding with turning and smoothing modifications to the A* algorithm, let's start with some basic optimizations that can speed up the standard A* algorithm by a factor of 40 or more. To start with, the standard A* algorithm uses a sorted linked list (or two linked lists) to track nodes that are checked. Instead, we'll use a 60x60 fixed matrix. When starting a search from point a to point b, we find the midpoint between those two and place it at point [30,30] on our matrix. Each point on the matrix stores:
*
The cost to get to the point
*
The total cost through that point to the goal
*
The [x,y] location of its "parent" tile (the tile before it on the path)
*
A Boolean stating whether or not it is on the "Open" list of actively pursued nodes, and
*
The [x,y] locations of the Previous and Next nodes in the Open list.
We also keep a separate array of 1-bit Booleans, which store whether or not each node in our matrix has been touched yet during this search. That way, we can very rapidly initialize at the beginning of the search without needing to clear the entire matrix.
Whereas the original algorithm maintains a separate sorted Open list (actually a Priority Queue), we instead maintain basic list functionality simply by using Previous and Next pointers within the fixed array. Note that we do have the memory requirement for our 60x60 matrix, but our compacted data structure requires only 16 bytes per node, for a total of 57K. (Even expanding the matrix to 120x120 will only require 230K of memory.)
Note additionally that the "list" can be implemented as a binary tree (by having two Next node pointers at each element), but we've actually found it to be substantially faster to have a simple (non-priority) list. While this does result in time O(n) for the search for the lowest cost node at the top of the A* loop (rather than O(log n) for a priority queue), it excels in that all insertions and deletions, of which there are many, are only O(1). Best of all, it eliminates the inner loop search that checks if neighboring nodes yet exist on the Open or Closed lists, which otherwise would take O(n) (or maybe a bit better if a hash table is used).
Overall, by avoiding all memory allocations and list insertions, this method turns out to be dramatically faster. I have profiled it to be as much as 40 times faster than standard A* implementations.
Note that for the Directional search described later in this article, eight times the number of nodes are necessary, so the memory requirement will all increase by a factor of eight.