Simple way to create static sorted array?
I''ve had this kind of thing pop up a couple of times, and I never really have a good solution for it.
I have a function that needs to take some action because of a string. There''s a list of actions to take, and I have to pick one. For example, I have an equation parser that uses CStrings as tokens:
void arithmetic_op (const CString& left, const CString& right, const CString& op, CString& result)
{
if (op == "+")
result.Format ("%d", atoi (left) + atoi (right));
else if (op == "-")
result.Format ("%d", atoi (left) - atoi (right));
// etc, etc
}
Basically, I''m just doing a linear search through my list of possible op strings. I do the same kind of thing for many different functions. My problem is efficiency--it''s a linear search every time, and sometimes my list of strings (14 operators, 60 keywords, etc. etc.) gets very long.
The BEST way to do this would be a binary search. STL has one built-in, but the problem is that you need to create a SORTED array. Creating a static array is a piece of cake, but if you change your array (add a keyword or op), you''d have to make sure you physically put it in the right place in the array. This is error-prone for a keyword list, and downright difficult for a list of operators (is "+" < "-" ??).
So I''d like a way to automatically create a sorted array, i.e. programatically sort it. Obviously there are sorting functions, but what''s the best way to make them only run once at program start?
I know pluggable factories is one solution for this, but that seems like a real over-design because:
1) I don''t want to make a class for every operator--that''s just silly.
2) Since it''s just simple strings, and the list can''t change at runtime, the overhead of using a map structure is unwarranted.
Can anybody think of some good way to do this? I''d like to keep everything function-local if possible.
CString::Compare is probably what you want:
CString::Compare
int Compare( LPCTSTR lpsz ) const;
Return Value
Zero if the strings are identical, < 0 if this CString object is less than lpsz, or > 0 if this CString object is greater than lpsz.
Parameters
lpsz
The other string used for comparison.
Remarks
Compares this CString object with another string using the generic-text function _tcscmp. The generic-text function _tcscmp, which is defined in TCHAR.H, maps to either strcmp, wcscmp, or _mbscmp depending on the character set that is defined at compile time. Each of these functions performs a case-sensitive comparison of the strings, and is not affected by locale. For more information, seestrcmp, wcscmp, _mbscmp in the Run-Time Library Reference.
Example
The following example demonstrates the use of CString::Compare.
// example for CString::Compare
CString s1( "abc" );
CString s2( "abd" );
ASSERT( s1.Compare( s2 ) == -1 ); // Compare with another CString.
ASSERT( s1.Compare( "abe" ) == -1 ); // Compare with LPTSTR string.
CString::Compare
int Compare( LPCTSTR lpsz ) const;
Return Value
Zero if the strings are identical, < 0 if this CString object is less than lpsz, or > 0 if this CString object is greater than lpsz.
Parameters
lpsz
The other string used for comparison.
Remarks
Compares this CString object with another string using the generic-text function _tcscmp. The generic-text function _tcscmp, which is defined in TCHAR.H, maps to either strcmp, wcscmp, or _mbscmp depending on the character set that is defined at compile time. Each of these functions performs a case-sensitive comparison of the strings, and is not affected by locale. For more information, seestrcmp, wcscmp, _mbscmp in the Run-Time Library Reference.
Example
The following example demonstrates the use of CString::Compare.
// example for CString::Compare
CString s1( "abc" );
CString s2( "abd" );
ASSERT( s1.Compare( s2 ) == -1 ); // Compare with another CString.
ASSERT( s1.Compare( "abe" ) == -1 ); // Compare with LPTSTR string.
This might sound a bit weird, but if you''re doing what I think you''re doing, why not use sscanf? That will save you the trouble of converting half the stuff I see you converting in your tokenized arithmetic operation up there. (I know it doesn''t answer your overall array question but thought I''d make the suggestion anyway)
Now, this is going to sound weird too, but... why not use function pointers? If you use GetProcAddress you can pass it your "operator" string (and the library module of course) and pick up the function without having to do massive if''s on it, then you just have to have your "operator" functions in a .dll and use LoadLibrary to pick it up.
Now, this is going to sound weird too, but... why not use function pointers? If you use GetProcAddress you can pass it your "operator" string (and the library module of course) and pick up the function without having to do massive if''s on it, then you just have to have your "operator" functions in a .dll and use LoadLibrary to pick it up.
~ The opinions stated by this individual are the opinions of this individual and not the opinions of her company, any organization she might be part of, her parrot, or anyone else. ~
JonStelly:
I know how to sort strings. The problem is when to sort them. FYI, you don''t even need compare; CString overloads operator <, so you can use std::sort directly.
fel:
I''m probably not doing what you think I''m doing. This is for a script-interpretting program. Things need to be tokenized, scoped, etc, etc. An example script snippet would be:
loops = 10;
while (loops != 0)
{
loops = loops - 1;
}
My arithmetic function would be called when evaluating the loops != 0 and loops - 1 portions.
I think I must not have state my problem clearly. What I really want to do is switch (string). Obviously this is impossible, but there has to be something better than a linear search through each condition.
I thought about using function pointers (and still might), but that doesn''t seem much better than writing a class for each operation.
I know how to sort strings. The problem is when to sort them. FYI, you don''t even need compare; CString overloads operator <, so you can use std::sort directly.
fel:
I''m probably not doing what you think I''m doing. This is for a script-interpretting program. Things need to be tokenized, scoped, etc, etc. An example script snippet would be:
loops = 10;
while (loops != 0)
{
loops = loops - 1;
}
My arithmetic function would be called when evaluating the loops != 0 and loops - 1 portions.
I think I must not have state my problem clearly. What I really want to do is switch (string). Obviously this is impossible, but there has to be something better than a linear search through each condition.
I thought about using function pointers (and still might), but that doesn''t seem much better than writing a class for each operation.
To be honest I'm a little confused as to why you would want to go to all the trouble of using the STL binary search when you could just make a tree of basic_string, CString, or char* keywords. Making a tree out of that is pretty easy, especially since traversal will just depend on strcmp or the equivalent.
Can't you use one of the dynamic array containers? (I haven't worked with the STL binary search stuff so I don't know)
Actually since you're just making it at the beginning and forgetting about it as far as I can tell, can't you just be sure to put the keyword in the array yourself?
const char *keywords[][] = {"bob", "joe", "tom"};
and if you wanted to add "sam" just put him in order anyway. Then again you haven't specified whether or not changes in the language will be dynamic or read from a file that somebody with little knowledge of the alphabet might play with... and if that's the case I would suggest forming it in a linked list and spitting the sorted list into the array (or just using a dynamic array).
I don't suppose there's an opensource compiler or interpreter out there someplace that you could look at...
-fel
Edit: my grammar sucks today.
Edited by - felisandria on October 12, 2000 5:44:39 PM
Can't you use one of the dynamic array containers? (I haven't worked with the STL binary search stuff so I don't know)
Actually since you're just making it at the beginning and forgetting about it as far as I can tell, can't you just be sure to put the keyword in the array yourself?
const char *keywords[][] = {"bob", "joe", "tom"};
and if you wanted to add "sam" just put him in order anyway. Then again you haven't specified whether or not changes in the language will be dynamic or read from a file that somebody with little knowledge of the alphabet might play with... and if that's the case I would suggest forming it in a linked list and spitting the sorted list into the array (or just using a dynamic array).
I don't suppose there's an opensource compiler or interpreter out there someplace that you could look at...
-fel
Edit: my grammar sucks today.
Edited by - felisandria on October 12, 2000 5:44:39 PM
~ The opinions stated by this individual are the opinions of this individual and not the opinions of her company, any organization she might be part of, her parrot, or anyone else. ~
I somewhat assume you are reffereing to overhead involved with interpreted scripts and languages. In this case, sorting will do nothing for you. There is no way around this overhead which requires you to have a condition for each possible keyword/token/etc. Using a function table of course will just give you function call overhead. You can speed up the string comparisons though by using a hash. This way you can convert a string into a keyword index or whatever fairly quickly and compare integers instead of strings.
cmaker
- its not the principle. its the money.
cmaker
- its not the principle. its the money.
cmaker- I do not make clones.
Stoffel
Let me try to clarify your question:
You have an MFC app, and you need to know where to put initialization code (the array sort) so that it only runs once?
I haven''t looked at MFC in years, but WINMAIN or the object''s constsructor seem like decent places to me. If that''s not possible (beats me as to why not), use a flag...
Of course, if I''ve got your question completely wrong, ignore this whole post.
Micah
Let me try to clarify your question:
You have an MFC app, and you need to know where to put initialization code (the array sort) so that it only runs once?
I haven''t looked at MFC in years, but WINMAIN or the object''s constsructor seem like decent places to me. If that''s not possible (beats me as to why not), use a flag...
bool ArrayHasBeenSorted = false;//...Lotsa code... if(ArrayHasBeenSorted == false) { //Do Sort ArrayHasBeenSorted = true; }
Of course, if I''ve got your question completely wrong, ignore this whole post.
Micah
Hm, good answers, but I think they''re answers to questions I''m not asking. =)
I''m not talking MFC-specific, I''m talking just a general "how do you switch on a string in the fastest way possible"?
Considering the discussion, I guess the best thing to do would be something like this:
Now, this works. Drawbacks:
1) I didn''t realize you couldn''t use STL on function-local types. Oops. I really hate stuff being in the global namespace, so I may have to re-think my objection to making this a class object.
2) You have to check whether it''s sorted each time. Blecha.
3) There''s actually two searches: one to search for the OpPair, then the next to switch on the OpCode. Blecha.
I hope from this example everybody can finally see what I''m trying to do. The question is: what''s the best way to do this?
I''m not talking MFC-specific, I''m talking just a general "how do you switch on a string in the fastest way possible"?
Considering the discussion, I guess the best thing to do would be something like this:
#include <algorithm>#include <string>#include <iostream>using namespace std;enum OpCode { OP_INVALID, OP_PLUS, OP_MINUS, OP_LESS_EQUAL, OP_LOGICAL_OR };struct OpPair{ const char* name; OpCode code; bool operator < (const OpPair& rhs) const { return strcmp (name, rhs.name) < 0; }};void arithmetic (const string& left, const string& op, const string& right, string& result){ static OpPair op_array[] = { {"+", OP_PLUS}, {"-", OP_MINUS}, {"<=", OP_LESS_EQUAL}, {"||", OP_LOGICAL_OR} }; static const unsigned int op_array_end = sizeof op_array / sizeof OpPair; static bool sorted = false; // code for function actually starts here // this flag needs to be checked each time if (!sorted) { sort (op_array, op_array + op_array_end); sorted = true; } OpPair searchOp = {op.c_str (), OP_INVALID}; OpPair* pFoundOp = lower_bound (op_array, op_array + op_array_end, searchOp); if (pFoundOp == op_array + op_array_end || strcmp (pFoundOp->name, searchOp.name)) { throw runtime_error ("operation not found"); } switch (pFoundOp->code) { case OP_LESS_EQUAL: result = atoi (left.c_str ()) <= atoi (right.c_str ()) ? "1" : "0"; break; // etc, etc }}int main (int argc, char **argv){ string result; arithmetic ("4", "<=", "6", result); cout << result; return 0;};
Now, this works. Drawbacks:
1) I didn''t realize you couldn''t use STL on function-local types. Oops. I really hate stuff being in the global namespace, so I may have to re-think my objection to making this a class object.
2) You have to check whether it''s sorted each time. Blecha.
3) There''s actually two searches: one to search for the OpPair, then the next to switch on the OpCode. Blecha.
I hope from this example everybody can finally see what I''m trying to do. The question is: what''s the best way to do this?
October 12, 2000 06:51 PM
I didn''t know about point 1) either. Sometimes the limitations and gotchas of STL really make wonder if it''s worth it.
As far as point 2) goes, the easiest way to ensure that the array is always sorted (and hence eliminate the need to check if it''s sorted or not) is to add some debug only code that checks it for you. This of course requires you to run debug builds as part of the regular development process. These checks and their overhead then disappear for release builds. Other than that I think you have to just bite the bullet and put your OpPair arrays in a larger scope and use an initialization function to do the sorting. You can use a namespace to restrict the scope at least a bit.
For point 3), one solution would be to encode each operation as a seperate function and then put a function pointer into your OpPair. You pay the price of a function call but don''t have to do the search.
As far as the whole approach goes, sure binary search is faster when you lots of elements but I wonder if it''s faster if you only have 4. Even 14 might not be enough faster to be worth the effort especially if the linear version has items sorted by how common they are.
Interesting problem.
-Mike
As far as point 2) goes, the easiest way to ensure that the array is always sorted (and hence eliminate the need to check if it''s sorted or not) is to add some debug only code that checks it for you. This of course requires you to run debug builds as part of the regular development process. These checks and their overhead then disappear for release builds. Other than that I think you have to just bite the bullet and put your OpPair arrays in a larger scope and use an initialization function to do the sorting. You can use a namespace to restrict the scope at least a bit.
For point 3), one solution would be to encode each operation as a seperate function and then put a function pointer into your OpPair. You pay the price of a function call but don''t have to do the search.
As far as the whole approach goes, sure binary search is faster when you lots of elements but I wonder if it''s faster if you only have 4. Even 14 might not be enough faster to be worth the effort especially if the linear version has items sorted by how common they are.
Interesting problem.
-Mike
Stoffel
I have done parsing like this before at work, only I optimized it for ease of coding not speed. You could eliminate the second search by creating a base class, and deriving from that for each discrete operation (replaces struct OpPair). The actual processing function is a member of the object, and thus once you''ve found the operator, you''ve also found the function to perform. Unfortunately this adds the overhead of a function call, which makes it overkill for this situation.
Given this incarnation, you''re probably not going to get much more out of it. If you need more efficiency I''d suggest going back to the drawing board and starting over. A global array or an object would eliminate the flag check at the beginning of the function. You could combine the method I mention above with a tree structure. Also, if your application model supports it, you could parse first, then execute later. This doesn''t speed up your parser, but may speed up your execution times if it repeated several times.
In the end, sometimes the clumsiest methods are the only methods.
Micah.
I have done parsing like this before at work, only I optimized it for ease of coding not speed. You could eliminate the second search by creating a base class, and deriving from that for each discrete operation (replaces struct OpPair). The actual processing function is a member of the object, and thus once you''ve found the operator, you''ve also found the function to perform. Unfortunately this adds the overhead of a function call, which makes it overkill for this situation.
Given this incarnation, you''re probably not going to get much more out of it. If you need more efficiency I''d suggest going back to the drawing board and starting over. A global array or an object would eliminate the flag check at the beginning of the function. You could combine the method I mention above with a tree structure. Also, if your application model supports it, you could parse first, then execute later. This doesn''t speed up your parser, but may speed up your execution times if it repeated several times.
In the end, sometimes the clumsiest methods are the only methods.
Micah.
This topic is closed to new replies.
Advertisement
Popular Topics
Advertisement
Recommended Tutorials
Advertisement