I made a pattern matching tool, please help!

Started by April 02, 2015 05:09 PM
6 comments, last by tufflax 9 years, 6 months ago
Hi!
I have made a lib/tool as part of a school project, and I need to evaluate its usefulness. It's like a regex tool, but it works on sequences of records/hash-maps instead of on text.
Say you had a sequence of hash-maps in memory, and the hash-maps represent events of some kind, say a server log. You want to see what happened before a `{event-type: error}` occurred. So you can write the pattern "...{event-type: error}" where `.` means any hash-map, just like it means any character in ordinary text regexes. If you then run `matches(the-pattern, the-sequence)` it would return all instances of the pattern, so you can inspect what happened before the errors. You can also use the `*`, `+`, `[ ]`, `[^ ]`, `?` and `|` metacharacters. So you could look for streaks of errors with "{event-type: error}+". You can also use parentheses both for grouping and for extracting submatches from the matches.
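
To make the two example patterns concrete without relying on the library's actual API, here is a minimal sketch in plain Clojure of roughly what they would compute. The names `events`, `error?`, `before-errors` and `error-streaks` are made up for illustration. Unlike a typical regex engine, this sketch returns overlapping windows.

```clojure
;; Hypothetical sample data: a tiny server log as a vector of hash-maps.
(def events
  [{:event-type :start}
   {:event-type :request}
   {:event-type :request}
   {:event-type :error}
   {:event-type :error}
   {:event-type :request}])

(defn error? [m] (= :error (:event-type m)))

;; Rough equivalent of "...{event-type: error}":
;; every window of three arbitrary maps followed by an error map.
(defn before-errors [evs]
  (filter #(error? (last %)) (partition 4 1 evs)))

;; Rough equivalent of "{event-type: error}+":
;; every maximal run of consecutive error maps.
(defn error-streaks [evs]
  (filter #(error? (first %)) (partition-by error? evs)))
```

Each new pattern needs its own hand-written function in this style, which is the repetition a pattern language is meant to remove.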
Does anyone have a problem that could be solved by something like this? If so, please let me know.
I have a GitHub page for it, https://github.com/oskarkv/map-regexps, but it is in Clojure, so I didn't think most of you would understand it, because Clojure is not very popular among game developers (or just developers :P). But you could use it with any JVM language, and you can help me by just telling me about your problems, even if you have already solved them! I just need to know about some example problems that could be solved with it. But they should be real-world problems, not made-up ones.
EDIT: I should also add that I am very eager to get some problems for my evaluation. So if you have a good problem that could be solved by my lib, I could, for example, implement extra features and help get it to work in Java, just for you.
Would it be able to operate on recursive maps, such as a nested JSON-like data structure? It seems like there might be some interesting cases where you'd want to find/replace something deep in a JSON object that could be done conveniently with this.

Can regular expressions be applied *within* the map pattern, such as "...{event-type: error|warning}", or would you have to write it as "...{event-type: error}|{event-type: warning}"?

As is, I think it could definitely be useful as a way to search through server logs, though there are often more complex queries you'd want to perform there. For example: "Get all records where the UserID is 12345, then search them for a span of errors starting at (some regex pattern) and ending at (some regex pattern)". If the original sequence contains several UserIDs all at once, it would be difficult to write a single regular expression to do this. You'd need to pipe the results of one search into another.

@Nypyren (is this doing something? :p)

Right now it cannot be used recursively, but I think it would be pretty easy to implement, if I could just decide on a good syntax for it. Same with the error|warning. But in that case, since Clojure has pretty convenient function literals (for example #(> % 3) is a function that checks whether the input is larger than 3), something like {event-type: #(contains? #{error warning} %)} could be used if I made functions as values be applied as predicates. (Note the set literal #{ }: `contains?` on a vector checks indices, not elements.) I'm not up to date with Java, but it's getting lambdas too, right?
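
The "functions as values are applied as predicates" idea can be sketched in a few lines. This is a hypothetical matcher, not the library's implementation. `matches-map?` and `error-or-warning` are names invented for the example.

```clojure
;; Hypothetical matcher: a pattern is a map whose values are either
;; literals (compared with =) or functions (applied as predicates to
;; the corresponding value in the record).
(defn matches-map? [pattern record]
  (every? (fn [[k expected]]
            (let [actual (get record k)]
              (if (fn? expected)
                (boolean (expected actual))
                (= expected actual))))
          pattern))

;; A set is used for membership; contains? on a vector would test indices.
(def error-or-warning
  {:event-type #(contains? #{:error :warning} %)})
```

With this, `(matches-map? error-or-warning {:event-type :warning})` is true, and an empty pattern `{}` matches any map, much like `.` in the pattern syntax. A predicate such as `#(> % 3)` would throw on a missing key, so a real implementation would need to decide how to treat absent values.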

In your last example, yes, piping the results to a next step seems easiest, and something that seems perfectly acceptable.
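
As a rough sketch of that piping in plain Clojure (again not using the library itself; `log`, `error?` and `error-streaks-for` are made-up names), the first step narrows the sequence to one user and the second searches what is left:

```clojure
;; Hypothetical log: records from several users interleaved.
(def log
  [{:user-id 12345 :event-type :login}
   {:user-id 777   :event-type :error}
   {:user-id 12345 :event-type :error}
   {:user-id 12345 :event-type :error}
   {:user-id 777   :event-type :logout}
   {:user-id 12345 :event-type :logout}])

(defn error? [m] (= :error (:event-type m)))

;; Step 1 keeps one user's records; step 2 searches the result
;; for runs of consecutive errors.
(defn error-streaks-for [user-id entries]
  (->> entries
       (filter #(= user-id (:user-id %)))
       (partition-by error?)
       (filter #(error? (first %)))))
```

Because the other user's records are filtered out first, the two errors for user 12345 form a single streak even though they would not be checked in isolation otherwise.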

I usually use C#, and I don't use Java or Clojure, so I won't be able to actually try out your lib. It seems cool though - I like the idea of applying regular expression-style pattern matching to sequences and structured data instead of just text.

In C# normally we have a set of features called "LINQ" which can be used for processing sequences of data. They work more like what you'd see in a functional language:


// filters all log entries for one user and skips everything until the first error record.
var results = logEntries.Where(x => x.UserID == 12345).SkipWhile(x => x.EntryType != EntryType.Error);
The built-in LINQ functions provided in .NET don't have the power of regular expressions (as far as I know...), but it would be possible to implement a LINQ-compatible library that does, like yours. I think that would be pretty awesome.
C# also has LINQ-to-XML. The way this typically works is you can access the XML hierarchy as an object tree, and also get the "descendants" of any node. The Descendants function generates an enumerable representation that you can pass to a sequence-based processing algorithm.
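
For Clojure readers, a rough analogue of Descendants is the core function `tree-seq`, which flattens a nested structure into a lazy sequence that any sequence-based matcher could then consume. The sketch below assumes a JSON-like value made of maps and vectors. `doc` and `descendant-maps` are names invented for the example.

```clojure
;; Hypothetical nested, JSON-like value.
(def doc
  {:name "root"
   :children [{:name "a" :event-type :error}
              {:name "b" :children [{:name "c" :event-type :error}]}]})

;; tree-seq walks the structure depth-first and returns a lazy sequence
;; of every node; keeping only the maps gives a flat sequence of the
;; root map and all maps nested inside it.
(defn descendant-maps [x]
  (filter map?
          (tree-seq coll? #(if (map? %) (vals %) (seq %)) x)))
```

Note that the root map itself is included in the result, which differs from a strict "descendants only" reading.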

I don't know Clojure, correct me if I'm wrong, but it seems that you can only search through hashmaps in memory.

Server logs are usually on disks, zipped, or in the cloud. If you can have this library run on top of a Hadoop cluster, Redis, or some document-based database, then it can be more useful.

First of all, Clojure has lazy sequences, so a log could pretty easily be read bit by bit from disk and processed as it comes in. Same with zips, I guess. It can also fairly easily be used together with Cascalog http://cascalog.org/ which is a declarative language for use on Hadoop. I have done it before but don't have it installed right now. I could also extend it to data types other than hashmaps. Just need to come up with a good syntax for everything.
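
A minimal sketch of the lazy-reading part, assuming a hypothetical log format with one EDN map per line (the function name `first-errors` and the format are assumptions for the example):

```clojure
(require '[clojure.java.io :as io]
         '[clojure.edn :as edn])

;; Assumes one EDN map per line. line-seq is lazy, so only the lines
;; actually consumed by `take` are read and parsed.
(defn first-errors [path n]
  (with-open [rdr (io/reader path)]
    (->> (line-seq rdr)
         (map edn/read-string)
         (filter #(= :error (:event-type %)))
         (take n)
         doall)))   ; realize the result before the reader is closed
```

For a gzipped file, the reader could be built as `(io/reader (java.util.zip.GZIPInputStream. (io/input-stream path)))` and the rest would stay the same.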

There is an implementation of Clojure for .NET too, so it should be fairly easy to port it. If anyone is really interested then let me know. And those LINQ operators look like `filter` and `drop-while` in Clojure. Btw I highly recommend Clojure over Java and C#, it is really, really good. :P


;; Keep only user 12345's entries, then drop everything before the first error.
(drop-while (fn [m] (not= (:entry-type m) :error))
            (filter (fn [m] (= (:user-id m) 12345)) log-entries))

This topic is closed to new replies.
