Advertisement

Multi-threaded shortest path

Started by October 05, 2007 04:41 AM
8 comments, last by bubbleez 17 years, 1 month ago
Hi, I am seeking for good algorithm to solve 'single shortest path problem' (big rectangular grid 10000x10000). Grid is VERY big and searching shortest path on single CPU is slow. I have quad core dual CPU server, so need algorithm that can utilize it. I already tried Bellman algorithm, but it uses dynamic programming and not effective for parallel computing. Thanks. Sorry for bad English.
I would cut the grid into a large amount of identical square blocks, and generate enough worker threads to fill all cores. Each block is written to by zero or one worker threads, and may be read from by any worker thread. The only synchronization here is that two worker threads must not catch the same block at the same time, reads and writes are not synchronized.

A worker thread basically gets an unused block which is tagged as 'to be examined', loads it and its neighbours from RAM, and performs a single-threaded relaxation algorithm on it (using Bellman, for instance) by reading the edge distances in the neighbouring blocks. When the relaxation algorithm terminates, the worker thread flushes the cache to RAM, releases the block and (if any modification happened on the edges of the block) tags all neighbouring blocks as 'to be examined'. It then repeats the work.

When the 'to be examined' list is empty and all blocks are unused, the algorithm is done. The block with the starting point is initially 'to be examined' while no other is.

[Edited by - ToohrVyk on October 5, 2007 6:24:19 AM]
Advertisement
I would try to solve this problem using better AI rather than more processing power. Try looking into Hierarchical A* solutions...

The only way I've seen pathfinders multi-threaded is to have two different queries running in parallel. You could try having sharing the A* search lists (with semaphores around them) and then have threads do read-only calculations before adding the next candidate, but I'm not sure that would gain you much performance... I expect it'll be quite a bottleneck.

I hope that helps.

Join us in Vienna for the nucl.ai Conference 2015, on July 20-22... Don't miss it!

[ToohrVyk]
I tried to implement your idea, works correct on single CPU:

#include <limits.h>#include <stdio.h>#include <stdlib.h>#include <windows.h>#define INFINITY ((1 << 14)-1)typedef struct {	int source; // source vertex id	int dest; // dest vertex id	int weight;} Edge;// 5x5 grid edgesEdge edges[] = {	{0,1,6},{0,5,10},{1,2,11},{1,6,7},{2,3,14},{2,7,11},{3,4,20},{3,8,10},{4,9,12},	{5,6,11},{5,10,7},{6,7,13},{6,11,8},{7,8,17},{7,12,12},{8,9,18},{8,13,9},{9,14,10},	{10,11,9},{10,15,8},{11,12,10},{11,16,9},{12,13,14},{12,17,13},{13,14,16},{13,18,8},{14,19,8},	{15,16,8},{15,20,9},{16,17,11},{16,21,10},{17,18,12},{17,22,14},{18,19,15},{18,23,10},{19,24,9},	{20,21,7},{21,22,8},{22,23,9},{23,24,16}	};int nvertx = 5;int nverty = 5;int nvertices = nvertx * nverty;int nedges = sizeof(edges) / sizeof(Edge);// 4 regions, 12 edges per region, 3x3 vertices per regionint regions[] = { // array of edge id's	0,1,2,3,9,10,11,12,18,20,14,5,	4,5,6,7,13,14,15,16,22,24,17,8,	18,19,20,21,27,28,29,30,36,37,32,23,	22,23,24,25,31,32,33,34,38,39,35,26	};int nregx = 2;int nregy = 2;int nregions = nregx * nregy;int nregedges = 12;int *distances;int *predcessor;int *tasks;int source = 0;int nthreads = 4;bool run = true;CRITICAL_SECTION critsect;// single threadedvoid BellmanFord(int source){	distances = 0;	for (int i=0; i < nvertices; i++) {		for (int j=0; j < nedges; j++) {			if (distances[edges[j].source] != INFINITY) {				int new_distance = distances[edges[j].source] + edges[j].weight;				if (new_distance < distances[edges[j].dest]) {					distances[edges[j].dest] = new_distance;					predcessor[edges[j].dest] = edges[j].source;				}			}		}	}	for (int i=0; i < nvertices; i++) {		printf("The shortest distance between nodes %d and %d is %d\n",			source, i, distances);	}	for (int i=0; i < nvertices; i++) {		printf("predcessor[%d]=%d\n",			i, predcessor);	}}bool process_region(int rid){	printf("process_region %d\n", rid);	int first = rid * nregedges;	int last = first + nregedges;	bool flag = false;	for (int k=0; k < nvertices; k++)	{		for (int i = first; i < last; i++)		{			int j = regions;			//printf("edge %d srcdist=%d dstdist=%d\n", 				//j, distances[edges[j].source], distances[edges[j].dest]);			if (distances[edges[j].source] != INFINITY) {				int new_distance = distances[edges[j].source] + edges[j].weight;				if (new_distance < distances[edges[j].dest]) {					distances[edges[j].dest] = new_distance;					predcessor[edges[j].dest] = edges[j].source;					flag = true;				}			}		}	}	return flag;}DWORD thread_proc(LPVOID arg){	while(run)	{		int rid=-1;		EnterCriticalSection(&critsect);		for(int i=0; i < nregions; i++)		{			if(tasks) {				tasks=0;				rid=i;				break;			}		}		LeaveCriticalSection(&critsect);		if(rid<0)		{			Sleep(10);			continue;		}		if(process_region(rid))		{			EnterCriticalSection(&critsect);			int px = rid - 1;			int nx = rid + 1;			int py = rid - nregx;			int ny = rid + nregx;			if(px>=0)				tasks[px]=1;			if(nx<nregions)				tasks[nx]=1;			if(py>=0)				tasks[py]=1;			if(ny<nregions)				tasks[ny]=1;			LeaveCriticalSection(&critsect);		}	}	return 0;}bool can_exit(){	for(int i = 0; i<nregions; i++)	{		if(tasks)			return false;	}	return true;}void reset_data(){	for (int i=0; i < nvertices; i++)	{		distances = INFINITY;		predcessor = -1;	}	memset(tasks, 0, sizeof(int)*nregions);	// source	distances = 0;	tasks[0] = 1;}void multi_threaded(){	HANDLE *h = new HANDLE[nthreads];	for(int i=0; i<nthreads; i++)		h=CreateThread(0, 0, (LPTHREAD_START_ROUTINE)thread_proc, 0, 0, 0);	while(run)	{		EnterCriticalSection(&critsect);		if(can_exit())			run=false;				LeaveCriticalSection(&critsect);	}	for(int i=0; i<nthreads; i++)	{		WaitForSingleObject(h, 1000);		CloseHandle(h);	}	for (int i=0; i < nvertices; i++) {		printf("The shortest distance between nodes %d and %d is %d\n", source, i, distances);	}	for (int i=0; i < nvertices; i++) {		printf("predcessor[%d]=%d\n", i, predcessor);	}	delete [] h;}int main(void){	InitializeCriticalSection(&critsect);	distances = new int[nvertices];	predcessor = new int[nvertices];	tasks = new int[nregions];	printf("single threaded:\n");	reset_data();	BellmanFord(source);	printf("\nmulti threaded:\n");	reset_data();	multi_threaded();		delete [] distances;	delete [] predcessor;	delete [] tasks;	DeleteCriticalSection(&critsect);	//getchar();	return 0;}
Well there's your problem...Bellman-Ford and Dijkstra are much slower than A*.
alexjc
drakostar

A* is good but I have no idea how to make it parallel. Also I want to run algorithm on different computers in network (maybe with MPI), so frequent synchronization of threads will kill performance.
Advertisement
Quote: Original post by baga
Also I want to run algorithm on different computers in network (maybe with MPI)


I've still got no idea what you're trying to do... If I were you I'd try to solve the problem more "intelligently" by dividing up the problem better before using traditional pathfinders.

If citeseer has no papers on this, it's probably not trivial to achieve.

What's wrong with HPA* (hierarchical A*)?

Join us in Vienna for the nucl.ai Conference 2015, on July 20-22... Don't miss it!

If your search space is a regular grid you could do the following:

You might be able to make a A* variant with 2 processes running each on a different CPUs one working from Startpoint to Target and the other from Target back towards the Startpoint. You would have to write to a common data area (grid) marking positions (nodes) reached and the cost. Each A* would be checking if new nodes had prviously been reached by the other thread and add that cost to their own (and thus not having to do the other half of the path).

Without a regular grid structure to do a quick lookup of the a positions 'visited' status the added processing to do a hash (or other) might be too costly to be much of an improvement over just a single thread A*.
--------------------------------------------[size="1"]Ratings are Opinion, not Fact
Iterative deepening A* (IDA*) is possible to parallelize, but I've no idea how well it works. I found a reference which suggests it works quite well though:

http://books.google.com/books?id=EcctUN9T8gMC&pg=PA327&lpg=PA327&dq=multithreaded+ida+-disassemble&source=web&ots=AbcXPefTir&sig=l1TrTfqJpwv58PBEaO6A7EsijxM
I have implemented multi-threaded Bellman-Ford block based algorithm, it works ~10 times faster than generic Bellman-Ford algorithm. Later I will try A*.
Thanks for help!

This topic is closed to new replies.

Advertisement