20090415

CL-UNIFICATION does regexps (thanks to Edi Weitz)

I have been working for some time on the CL-UNIFICATION library. "Unification" and "parsing" are related, but not quite the same thing (I bet the literature on the topic is huge). In an afternoon of the Easter vacation, I decided to take the easy way out of extending CL-UNIFICATION with some "parsing" functionality: I just added Edi Weitz's wonderful CL-PPCRE library. Edi's library has a very simple and intuitive interface, which made the integration a SMOP! As a result, you can now write the following (assuming all the packages are USEd):

(unify #T(regexp "a(b+)c(d+)" (?bs ?ds)) "abbbbcddd")
The call will produce an environment where ?BS is bound to "bbbb" and ?DS is bound to "ddd". Here is a full transcript.
CL-USER 3 > (in-package "UNIFY")
#<The CL.EXT.DACF.UNIFICATION package, 326/512 internal, 20/64 external>

UNIFY 4 > (unify #T(regexp "a(b+)c(d+)" (?bs ?ds)) "abbbbcddd")
#<UNIFY ENVIRONMENT: 1 frame 200BC147>

UNIFY 5 > (v? '?bs *)
"bbbb"
T

UNIFY 6 > (v? '?ds **)
"ddd"
T
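For comparison, here is a minimal sketch of the same extraction done directly with CL-PPCRE, using its REGISTER-GROUPS-BIND macro. The helper name PLAIN-PPCRE-GROUPS is made up for illustration, and loading the library through ASDF is an assumption about your setup; this is not how CL-UNIFICATION itself is implemented.

```lisp
;; Assumption: CL-PPCRE is installed where ASDF can find it.
(asdf:load-system "cl-ppcre")

;; PLAIN-PPCRE-GROUPS is a hypothetical helper, not part of either library.
(defun plain-ppcre-groups (string)
  "Return the (b+) and (d+) groups of STRING as a list, or NIL if there is no match."
  ;; REGISTER-GROUPS-BIND binds BS and DS to the first and second
  ;; register of the regexp, by position, and runs the body on a match.
  (cl-ppcre:register-groups-bind (bs ds)
      ("a(b+)c(d+)" string)
    (list bs ds)))

(plain-ppcre-groups "abbbbcddd") ; => ("bbbb" "ddd")
```

The difference is mostly one of style: here the groups are lexical variables visible only inside the macro body, while the UNIFY call returns an environment object that you can query later.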
Of course, the other CL-UNIFICATION matching operations work as expected.
UNIFY 9 > (match-case ("abbbbcdd")
                      (#T(regexp "a(b+)c(d+)" (?bs ?ds)) (concatenate 'string ?ds ?bs))
                      (t "It did not work!"))
"ddbbbb"
I.e., the template variables provide an interface to the actual regexp groups, beyond the interfaces available directly in CL-PPCRE. Maybe the syntax of the regexp unification templates could be made even more CL-PPCRE-like by exploiting "named registers", but, for the time being, the above is the best way to use it.
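To give an idea of what "named registers" look like on the CL-PPCRE side, here is a small sketch. It only shows CL-PPCRE's own behaviour: with *ALLOW-NAMED-REGISTERS* bound to true, the (?<name>...) syntax is accepted and CREATE-SCANNER returns the register names as a second value. The function NAMED-REGISTER-NAMES is made up for illustration, and how a template syntax might use the names is left open.

```lisp
;; Assumption: CL-PPCRE is installed where ASDF can find it.
(asdf:load-system "cl-ppcre")

;; NAMED-REGISTER-NAMES is a hypothetical helper, not part of either library.
(defun named-register-names (regex-string)
  "Return the register names CL-PPCRE reports for REGEX-STRING."
  ;; Named registers are off by default, so enable them only
  ;; around the call that parses the regexp.
  (let ((cl-ppcre:*allow-named-registers* t))
    ;; The second value of CREATE-SCANNER is the list of register names.
    (nth-value 1 (cl-ppcre:create-scanner regex-string))))

(named-register-names "a(?<bs>b+)c(?<ds>d+)")
```

With names available from the regexp itself, the separate variable list in the template would in principle become redundant, which is the kind of simplification hinted at above.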
(cheers)
