[FoRK] Q re: ConceptNet (also FluidDB)
J. Andrew Rogers
andrew at ceruleansystems.com
Wed Oct 21 10:03:40 PDT 2009
On Oct 21, 2009, at 7:43 AM, Jeff Bone wrote:
> JAR asks:
>> Are we talking about ConceptNet/OpenMind specifically or semantic
>> web technologies generally? And how do you define "interesting"?
> ConceptNet specifically. It's got some interesting attributes and
> applications that most of the "strong" ontological systems don't
> have (at least in my own uses of them.) "Interesting" in this case
> means just that --- useful beyond what e.g. Cyc, various rdf /
> semweb technologies, and so on have been in my experience.
> Particularly when dealing with large and diverse natural language
> corpuses, CN2/3 (with supplemental data) have proven more useful at
> various extraction, classification, and extrapolation tasks than
> other methods including statistical ones, naive bayes, etc. (For
> me. YMMV.)
In my experience, you are entirely correct that most semantic web
technologies are over-structured to the point of not being very
useful. Most of the interesting R&D is currently on very general
graph analytic systems that subsume the rigid classical models but do
not require them. Rigid models are more susceptible to the numerous
NP-hard land mines that litter this theoretical landscape.
> The actual CN tools themselves aren't as useful (you're correct,
> toys so far) as the knowledge base per se and its data model ---
> which can be easily embedded into a slightly more robust model that
> more easily and effectively handles meta-information such as
> provenance, etc. --- and does so defeasibly if necessary, important
> for real-world use. IMHO, those are the major conceptual problems
> w/ the rdf-like approaches; representing defeasible information as-
> such and handling reification and self-referentiality. Do-able, but
> not in a satisfying or particularly practical way.
Yes, which is why a lot of the current R&D is focused on generalized
graph-like computation, not the narrow case of RDF-like systems. You
can do it with RDF-like systems in principle, but not efficiently, and
efficiency is very important for most real work at the scales
required. Current popular tools and models are badly designed for the
actual markets for this kind of technology.
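As a rough illustration of the reification overhead mentioned above,
here is a minimal sketch. It assumes standard RDF-style reification
(a statement node described by rdf:type, rdf:subject, rdf:predicate
and rdf:object) and compares it with a generic attributed edge. The
names reify, Edge, stmt1 and contributor_42 are hypothetical; nothing
here is ConceptNet's API or any real RDF library.

```python
from dataclasses import dataclass, field


def reify(stmt_id, s, p, o, meta):
    """Describe one statement RDF-style so metadata can be attached to it.

    The statement becomes a node described by four triples, and each
    piece of metadata (provenance, confidence, ...) costs one more triple.
    """
    triples = [
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", s),
        (stmt_id, "rdf:predicate", p),
        (stmt_id, "rdf:object", o),
    ]
    triples += [(stmt_id, key, value) for key, value in meta.items()]
    return triples


@dataclass
class Edge:
    """A generic graph edge that carries its own attributes (hypothetical)."""
    src: str
    rel: str
    dst: str
    attrs: dict = field(default_factory=dict)


# Illustrative metadata only; the values are made up for the example.
meta = {"source": "contributor_42", "confidence": 0.8}

triples = reify("stmt1", "dog", "IsA", "pet", meta)
edge = Edge("dog", "IsA", "pet", dict(meta))

print(len(triples), "triples versus 1 attributed edge")
```

The point is only the shape of the data: in the triple form, reading
back one assertion with its provenance means joining several rows,
while the attributed edge keeps everything in one record. Marking an
assertion as retracted (the defeasible case) would be one more triple
in the first form and one more attribute in the second.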
The primary real (and "interesting") use case for these types of
models in commercial and other systems is induction and prediction in
highly dynamic data environments at large scales. There are
organizations starting to build more generalized graph systems for
this purpose at very large scales, but it definitely isn't open source
(or even shrink-wrap) technology at this point. It will probably be a
few years before this creeps into the web at large.