[FoRK] Data Structures Question

Stephen D. Williams <sdw at lig.net> on Mon Apr 2 18:48:37 PDT 2007

You can do this with a simple tree where each node is either an AND or 
OR node and has a list of variables and subexpression node links.  You 
probably want to do a breadth first evaluation of variables at each 
level to determine if you can short circuit. 

If you don't care about guaranteeing that variables and subexpressions 
are not evaluated unless needed (C/C++/Java does this, Pascal did not), 
you could convert all variables to a bitmap and condense the search into 
bitwise anding with popcount to determine when you have matched an 
expression path.  I wrote a nifty algorithm recently that found common 
temporal subgraphs and their "parents" using bitmaps in a variety of ways.

So then the problem becomes one of searching many different patterns in 
parallel.  This seems to overlap perfect hashes (misses are cheap, hits 
take time to determine whether they are real), automatically constructed 
state machines (such a lex parser which searches for tokens in 
parallel), and Rete from expert systems.

It seems like for each set bit, you could traverse a tree of bits to 
check next to get to a terminal (yes or fail).  The nifty thing is that 
when a later bit depends on an earlier bit, the tree for the later bit 
doesn't have to have anything for the earlier bit.  If it was set, it 
would have already checked, if it wasn't set, then that AND already 
failed and is free.  If you then memoize all of the shared subtrees from 
earlier bit expressions into new bits or something for later checking, 
then you should be able to minimize total steps.  Unless you also have 
statistics, at which point you could do a query optimization by 
rearranging bits (through a translation table) so that 
more-frequent-determiner bits are checked first.


Reza B'Far wrote:
> Hi Folks:
> So, here is a question that I don't seem to be able to find the answer to
> through Google!
> Given an expression similar to:
> [(e1 && e2) || (e3 || e4)] && e5
> I want to get sub expressions
> e1 && e2 && 65
> e3 && e5
> e4 && e5
> As "sub-expressions".
> Definitions -
> - Subexpressions only have and's
> - If any of the sub expressions are satisfied, then the expression is
> satisfied (satisfied meaning resulting in true)
> - Only valid operators are {&&,||} ('and' and 'or')
> - There are no limits to the number of variables in the expression
> - Variables evaluate to either true or false, but the value is unknown until
> time when we evaluate the subexpressions.
> - Variables can only be used once in any expression (variables in the
> expression are orthogonal and mutually exclusive)
> It seems to me that there must be a type of tree that satisfies the
> conditions above so that a set of nodes with particular properties (for
> example all the end nodes) navigating to the root will give me my sub
> expressions.
> An optimal implementation must exist somewhere.  Anyone know the answer to
> this?
> Thanks,
> Reza
> _______________________________________________
> FoRK mailing list
> http://xent.com/mailman/listinfo/fork

More information about the FoRK mailing list