Gk: a short tutorial

Contents:

Introduction

Gk is an automated common-sense reasoner, based on full classical logic which is extended and softened with the following mechanisms:

  • answering questions, i.e. finding suitable substitutions to variables in a question
  • ignoring inconsistencies in the knowledge base
  • calculating probability-like confidences of answers
  • handling default rules, i.e. rules with exceptions
  • handling similarities

The first three mechanisms, in particular, calculating confidences, are covered in a paper "Confidences for Commonsense Reasoning" https://link.springer.com/chapter/10.1007/978-3-030-79876-5_29 .

Gk currently runs only on Linux. Call gk like this:

./gk problemfilename

or like this for reading several files

./gk filename1 ... filenameN

or like this to get an explained list of command line keys and input/output options:

./gk --help    

The problem file has to be given using the JSON syntax described in the JSON-LD-LOGIC. The gk output is also in JSON.

The problem file has to contain a question clause like

{"@question": ["-flies","?:X"]}

Further notes about the input format are given in a later section. It is recommended to first read and try out these examples in the demo folder:

  • trivial.js: trivial question answering
  • rulemult.js: basic multiplication of confidences
  • cumulate.js: basic cumulating of confidences
  • socialsmoking2.js: confidences and rules
  • socialsmoking.js: confidences and rules
  • near.json: a chain of confidences with transitivity
  • penguin.js: default rules and taxonomies
  • nixon.js: contradictory default rules
  • classify.js: confidences and default rules for simple classification
  • people_room.js: describing a situation-event chain with default rules

Gk is built upon a high-performance automated reasoner gkc for classical logic available at https://github.com/tammet/gkc ; see also a web version at https://logictools.org/ and a paper "GKC: A Reasoning System for Large Knowledge Bases" https://link.springer.com/chapter/10.1007%2F978-3-030-29436-6_32

Answers given by gk

For questions not containing variables, gk tries to prove the question, and if it fails, it tries to prove the negation of the question.

If the question clause contains variables, like in {"@question": ["-flies","?:X"]}, gk tries to find suitable answers, i.e. substitutions to these variables such that the question is proved to be true. In this case, if no suitable answers are found, gk will not try to find answers for the negated query.

By default gk uses 10 seconds as a strict overall time limit. Change this with the -seconds N key like this:

./gk trivial.js -seconds 1

Successful answers: positive and negative

For all the successful cases gk outputs a json structure

{"result": "answer found",
 "answers": [...] } 

with a list of answers with additional details under the "answers"key.

Consider the example file trivial.js:

[
["p","a"],
["p","b"],
["-p","c"],
{"@question": ["p","a"]}
]

Gk will answer

{"result": "answer found",

"answers": [
{
"answer": true,
"confidence": 1,
"positive proof":
[
[1, ["in", "frm_4", "goal", 1], [["-p","a"]]],
[2, ["in", "frm_1", "axiom", 1], [["p","a"]]],
[3, ["simp", 1, 2, "fromgoal", 1], false]
]}
]}

The proofs are a sequence of input and derived clauses ending with the derivation of contradiction or an answer clause. Observe that the goal is a negated question.

Each element in a proof is a triple: clause number, derivation method and the clause itself. The last element of the derivation method indicates confidence. Further details of the proof will be explained in a later chapter.

If you change the line {"@question": ["p","a"]} to {"@question": ["-p","a"]} gk will first try and fail to prove the question, and then it will try and succeed to prove the negated question, as a consequence of which you will get (observe false, negative proof and the $auto_negated_question):

{"result": "answer found",

"answers": [
{
"answer": false,
"confidence": 1,
"negative proof":
[
[1, ["in", "$auto_negated_question", "goal", 1], [["-p","a"]]],
[2, ["in", "frm_1", "axiom", 1], [["p","a"]]],
[3, ["simp", 1, 2, "fromgoal", 1], false]
]}
]}

Changing the question to {"@question": ["p","?:X"]} forces gk to look for suitable substitutions for ?:X, giving a limited number (by default, not more than 10: see the "max_answers" key in the strategy chapter for changing the limit) different answers:

{"result": "answer found",

"answers": [
{
"answer": [["$ans","a"]],
"confidence": 1,
"positive proof":
[
[1, ["in", "frm_1", "axiom", 1], [["p","a"]]],
[2, ["in", "frm_4", "goal", 1], [["-p","?:X"], ["$ans","?:X"]]],
[3, ["mp", 1, 2, "fromgoal", 1], [["$ans","a"]]]
]},
{
"answer": [["$ans","b"]],
"confidence": 1,
"positive proof":
[
[1, ["in", "frm_2", "axiom", 1], [["p","b"]]],
[2, ["in", "frm_4", "goal", 1], [["-p","?:X"], ["$ans","?:X"]]],
[3, ["mp", 1, 2, "fromgoal", 1], [["$ans","b"]]]
]}
]}

Gk will spend all the allocated time (unless the search space is limited) to look for different new answers until the prescribed number of answers is found.

Answers given are in the form of clauses (disjunctions of elements) where the values for variables in the question are wrapped into the special ["$ans",.... ] atom. Although most of our examples give a single-element answer clause, the following input

[
[["p","a"],"|",["p","b"]],
{"@question":  ["p","?:X"]}
]

would give this value:

"answer": [["$ans","a"], ["$ans","b"]]

Let us now add this line to trivial.js indicating that there is separate evidence with a low confidence 0.2 that ["-p","a"]:

{"@logic": ["-p","a"], "@confidence": 0.2},

Gk will answer to {"@question": ["p","a"]} with the following proof, indicating separately the positive and negative evidence for the answer found for ["p","a"]:

{"result": "answer found",

"answers": [
{
"answer": true,
"confidence": 0.8,
"positive proof":
[
[1, ["in", "frm_5", "goal", 1], [["-p","a"]]],
[2, ["in", "frm_1", "axiom", 1], [["p","a"]]],
[3, ["simp", 1, 2, "fromgoal", 1], false]
],
"negative proof":
[
[1, ["in", "frm_4", "axiom", 0.2], [["-p","a"]]],
[2, ["in", "$auto_negated_question", "goal", 1], [["p","a"]]],
[3, ["simp", 1, 2, "fromgoal", 0.2], false]
]}
]}

The combined confidence in an answer is an absolute value of the difference between positive and negative confidences. By default gk will discard and not show answers with a combined confidence below 0.1. You can change this 0.1 threshold with a key to gk like this:

./gk problemfilename -confidence N

where N is either an integer 2..100 (percentage) or float 0..1.

Let us now add this line to trivial.js indicating a default closed-world rule that every object for which we do not know that "p" holds, "p" does not hold:

[["-p","?:X"],["$block",1,["p","?:X"]]]

Asking a question about an unknown object m with {"@question": ["-p","m"]} will give

{"result": "answer found",

"answers": [
{
"answer": true,
"blockers": [["$block",1,["p","m"]]],
"confidence": 1,
"positive proof":
[
[1, ["in", "frm_4", "axiom", 1], [["$block",1,["p","?:X"]], ["-p","?:X"]]],
[2, ["in", "frm_5", "goal", 1], [["p","m"]]],
[3, ["mp", [1,1], 2, "fromgoal", 1], [["$block",1,["p","m"]]]]
]}
]}

where the "blockers": [["$block",1,["p","m"]]] line indicates that the proof assumes that ["p","m"] cannot be proven, and gk has tried to prove that and failed. This failure is generally no guarantee that a proof does not exist after all: maybe it would just take much more time than allocated.

Non-successful answers

For all the non-successful non-error cases gk outputs a json structure

{"result": "explanatory string"}

with several possible explanatory strings described below.

Consider the following example:

[
["p","a"],
{"@logic": ["-p","a"], "@confidence": 0.99},
{"@question": ["p","a"]}
]

Gk will detect that the confidence of both the positive and negative version of the question is under the default limit 0.1 and will answer

{"result": "evidence below limit"}    

As said above, you can make gk to show a successful answer for this example by lowering the threshold like this:

 ./gk problemfilename -confidence 0

Gk will behave analogously for the question with variables {"@question": ["p","?:X"]}.

There are several other variations of the explanatory string for the cases where some initial candidate answers were found, but all filtered out later:

  • "generic assumptions contradicted" : there was a single initial answer candidate, but it was blocked.
  • "evidence for candidate answers below limit" : there were several initial answer candidates, but they were either blocked or had a confidence under threshold.

The exact nomenclature and meaning of these explanatory strings is expected to change in the later versions of gk.

Change the question above to one where no proofs can be found whatsoever: {"@question": ["m","k"]}. In this case gk will answer

{"result": "no information"}

Gk will give a slightly different answer for the question containing variables, for which no substitution leading to a proof can be found: {"@question": ["m","?:X"]}:

{"result": "no answers found"}

In case of a timeout, gk gives:

{"result": "time limit, proof not found"}

Such a timeout may indicate that some parts of the proof search tree were unfinished or that some specific subsearch was unfinished. It may also occur due to smaller bugs in the time management of gk. It should not be interpreted as saying that no useful information whatsoever was found. The timeout is reliably enforced by an alarm/signal mechanism terminating any ongoing activities.

Errors

In case gk encounters an error it can handle, it gives a JSON output

{"error": "explanatory string"}

with a large number of possible error explanations like "json syntax error ...", "no question given" etc. Please read the explanation and try to amend the input or keys.

Gk produces non-json output only if

  • -print N key is used for search tracing with N > 11
  • a hard unhandled error occurred or
  • an unexpected bug in code produces output where none should happen.

If gk does not produce output at all, then it has encountered a hard unhandled error. A common reason for such errors is erroneous input which gk may sometimes fail to handle.

Basic input

As said earlier, gk expects input files to contain logic in the JSON-LD-LOGIC format.

Gk implements all the builtins defined in this specification, including equalities, arithmetic, lists and distinct symbols.

Gk allows json to contain C-style comments: both line comments // and block comments /* .... */.

There is also an option to give input directly from a command line with the -jstext '.. input json ...' key.

However, JSON-LD-LOGIC does not address the encoding of questions, confidences or default rules. We will cover these here.

Ordinary fact / rule clauses

An example of two clauses with a simple form and no additional information, following the JSON-LD-LOGIC syntax:

["bird","b1"],
[["bird","?:X"],"=>",["canfly","?:X"]]

Notice that the second clause is equivalent to the disjunctive list form used in output proofs

[["-bird","?:X"],["canfly","?:X"]]

Both - or any other equivalent form - can be used for encoding clauses.

By default simple clauses in input are assumed to have a confidence 1 (totally certain) and no exceptions.

Questions and assumptions

First, gk without the -readkb key requires that one of the input files contains a question clause with the form containing a question key "@question" like this:

{"@question": ["canfly","?:X"]}

There is also an optional equivalent version of this encoding using roles:

{"@logic": ["canfly","?:X"], "@role": "question"}

A question clause must follow a fairly limited form. It should be either a positive or negative atom or a conjunction of such atoms, like

{"@question":  [["p","?:X"],"&",["r","?:X"]]}

You can encode more complex questions by defining a predicate which is equivalent to the complex question, and then just writing a question clause using this predicate.

The variables in the question clause are interpreted as free variables for which we need to find suitable substitutions making the question true.

A question clause does not have to contain variables: in that case gk will attempt to determine whether it is true or false and with which confidence. While in traditional resolution-based systems the questions are negated, gk assumes the questions are not negated.

Conceptually, questions often have a form where we assume some information A and then ask whether a question Q can be derived from A and a knowledge base. For example: suppose b1 is a bird; can b1 fly? Such questions could be encoded simply as

["bird","b1"],
{"@question": ["canfly","b1"]}

but it is better to encode them by using a special assumption role like this:

{"@logic": ["bird","b1"], "@role": "assumption"},
{"@question": ["canfly","b1"]}

The main point of using the assumption role is that gk can take advantage of this information during proof search by focusing more on the known assumptions than arbitrary axioms. Additionally, gk will put the information about the roles of clauses into proofs found for better readability.

Confidences Exceptions Similarities

The main features of gk for making logic "soft" are confidences, exceptions and similarities.

Confidences

By "confidences" we mean inexact subjective probabilities which can be assigned to clauses. Gk calculates confidences of derived clauses and final answers by

  • using classical rules for multiplying and cumulating probabilities,
  • estimating independences of input and derived clauses,
  • estimating the summary confidence of answers as a difference of positive and negative evidence.

To give a confidence, say, 0.8, to a clause, write it as the following json object with the "@logic" and "@confidence" keys:

{"@logic": [["bird","?:X"],"=>",["canfly","?:X"]], "@confidence": 0.8}

Optionally you can give integer "@confidence" values in the range 2..100: these are interpreted as percentages and translated to the range 0.02...1.

By default all clauses have a confidence 1.

Question clauses should not have an explicit confidence attached: they are assumed to have "confidence" 1.

The details of how gk calculates confidences is described in the paper "Confidences for Commonsense Reasoning" "Confidences for Commonsense Reasoning"

Check out these examples from the demo folder:

  • rulemult.js: basic multiplication of confidences
  • cumulate.js: basic cumulation of confidences
  • socialsmoking.js: confidences and rules
  • socialsmoking2.js: confidences and rules
  • near.json: a chain of confidences with transitivity

Exceptions: default rules encoded with blockers

Gk handles rules with exceptions with roughly the same interpretation as default logic.

Rules with exceptions are encoded as ordinary clauses with a special blocker literal with the predicate "$block" added, taking two arguments:

  • a strength of the rule,
  • a literal which blocks the rule in case the literal can be proved.

Consider a default rule "birds typically fly". We encode this as

[["-bird","?:X"],["canfly","?:X"],["$block",0,["$not",["canfly","?:X"]]]]

or equivalently

[["bird","?:X"],"=>",[["canfly","?:X"],"|",["$block",0,["$not",["canfly","?:X"]]]]]

Negations in blocker literals must be encoded with the special "$not" function, and not like "-canfly". A blocker literal must not contain any more complex logic than just a single positive or a "$not"-prefixed negated atom. A blocker literal itself must not be negated.

A clause may contain several blocker literals: in that case it is blocked if some of the blocker literals can be proved.

The following example encodes a closed world assumption for a predicate "p", i.e. "p" is false for any object except these for which we can prove that "p" is true:

[["-p","?:X"],["$block",0,["p","?:X"]]]]

Next we will describe the function and effects of the first, strength argument of blocker literals. In the examples above this argument was 0.

The idea of the strength of a blocker is that a weaker blocker is not allowed to participate in a proof of a stronger blocker. Typically we assume that blockers involving concepts lower in a taxonomy (i.e. more specific objects) should be stronger than blockers involving concepts higher in a taxonomy. A simple example saying that while arbitrary objects typically do not fly, birds do typically fly:

[["-bird","?:X"], ["flies","?:X"],  ["$block", 3, ["$not", ["flies", "?:X"]]]],
[["-object","?:X"],["-flies","?:X"],["$block", 2, ["flies", "?:X"]]]

A blocker with a larger strength number is stronger than one with a smaller number: proofs of blockers with a strength N are not allowed to contain blockers with a strength smaller than N.

An exception to this principle is strength 0: this strength is assumed to be not comparable to other strengths, i.e. no blocker is stronger or weaker than a blocker with a strength 0.

Blockers with equal first arguments do not block each other mutually. This can easily create potentially infinite blocker checking chains like for the classic Nixon triangle example:

[["-quaker","?:X"],["pacifist","?:X"],["$block",1, ["$not", ["pacifist", "?:X"]]]],
[["-republican","?:X"],["-pacifist","?:X"],["$block",1, ["pacifist", "?:X"]]]

Such chains are solved by gk using both (a) diminishing time resources for deeper checks (b) explicit limit on recursive blocker checks (c) explicit checks for cycles in recursive blocker checks. Note: the case (c) of explicit checks is in active development and the exact behaviour may change.

We will now describe additional mechanisms implemented in gk for encoding and comparing blocker strengths:

  • taxonomy class numbers
  • concept names

In case gk is run with the -default key, or a knowledge base used was read in with the -default key (do not do both!), gk will read and use two files with the special names:

  • gk_name_number.txt
  • gk_taxonomy_packed.txt

The first of these files associates taxonomy class numbers to predicates (an example row: bird.n.01,61598) and the second encodes a topologically sorted taxonomy of these class numbers. The files in the demo folder encode these for Wordnet synsets: a word followed by noun/verb/etc indicator character plus a number, like in bird.n.01,61598. Loading such a file will additionally add a class number to "pure" words like bird: the class number of the first occurrence of a "pure" word in a synset form is assigned to the pure word.

These class numbers can be used instead of numeric strengths with a special wrapping ["$",taxonomy_class] in a blocker like this:

["$block",["$",61598], ...]

or directly as predicate names like this:

["$block",["$","bird"], ...]

Additionally, for giving different strengths to blockers with the same associated predicate, it is possible to write ["$",taxonomy_class_or_predicate,integer]. For example, the first of the following two blocker strengths is stronger than the second:

["$block",["$","bird",3], ...]
["$block",["$","bird",2], ...]

To summarize, the first blocker argument can have following alternative forms:

  • integer: if 0, no blockers are stronger or weaker than this. if >0, larger number is considered stronger/more specific

  • ["$",integer] where integer is a class number in the taxonomy

  • ["$",string] where string is converted to a class number in the taxonomy

  • ["$",integer/string,integer2] where

    integer/string is like in the previous case, class number or word; integer2 encodes strength directly like in the first case

    When comparing this form with the first single integer form above, the integer2 is used for comparison and integer/string is ignored.

    When comparing two blockers of this/last form, the integer2 of both is used for comparison in case the integer/string2 is mutually incomparable (no one blocks another)

Also recall the taxonomy class / concept name mechanism assumes that either proof search uses a knowledge base which was read in with the additional -defaults flag or the proof search command uses the same -defaults flag itself, and the files gk_name_number.txt and gk_taxonomy_packed.txt were available. An attempt to use the -defaults flag if these files are not available will terminate with an error message.

Using the strengths like ["$",integer] or ["$",string] without the -default key is allowed and does not generate an error, but the strengths will be incomparable.

Try out a deeper version of the penguin example (with some uncommon flying penguins) first with the direct strength numbers:

./gk penguin2.js

and the same version with strengths given using the ["$",string] strength to be used along with the taxonomy information:

./gk penguin3.js -defaults

Notice the use of ["$","penguin",3] strength for uncommon flying penguins and ["$","penguin",2]`, for normal penguins: the point being that the taxonomy used does not contain a notion of "flying penguins" and we need to use the additional numbers (3 and 2 above) to essentially extend the known taxonomy.

There is a useful utility option for comparing the generality of predicates according to their taxonomy number:

./gk -task 'taxcompare,bird,object' -defaults

will give the taxonomy class numbers and the comparison result for "bird" and "object", while

./gk -task 'taxcompare,61598,6811' -defaults

compares the taxonomy class numbers.

Check out these examples from the demo folder:

  • penguin.js: default rules and taxonomies
  • penguin2.js: deeper branches with default rules and taxonomies
  • penguin3.js: penguin2.js version to be called with the -defaults key
  • nixon.js: contradictory default rules
  • people_room.js: describing a situation-event chain with default rules

Exceptions combined with confidences

First, clauses encoding rules with exceptions can be given a lower confidence than the default 1, just like any other clauses.

Second, the combination of blockers and confidences (from whichever sources) leads to the following important question: suppose we are proving a blocker like

    ["$block", 2, ["flies", "bird_1"]]]

to determine whether a proof containing this blocker should be discarded. Suppose we can prove this with a low confidence 0.2. Is this low confidence a sufficient reason for discarding a proof?

The approach currently implemented in gk is to consider blocker proofs with a confidence 0.5 or higher as valid, and blocker proofs with a confidence below 0.5 as not being sufficient grounds for discarding a checked proof. This principle could be modified in later versions of gk.

Check out this example from the demo folder:

  • classify.js: confidences and default rules for simple classification

Similarities

Gk contains an initial experimental implementation of reasoning by similarity. This implementation is not yet well suited for larger knowledge bases and is currently being worked on.

A small demo of the similarities-based reasoning is the file kinqueen.js in the demo folder: without similarities gk derives that the queen q1 is female, allowing similarities makes gk also derive that the queen q1 is rich like king, but without deriving that it is also male, like king:

../gk kingqueen.js -similarities

The -similarities key reads in a file (has to be with this exact name!)
gk_similarity.txt present in the demo folder. The file contains rows indicating similar words along with the strength of similarity and exception words which should not occur in a proof using this similarity:

queen,12,king,0.9,male$female
king,12,queen,0.9,male$female
penguin,12,ostrich,0.9,,emu,0.9,
ostrich,12,penguin,0.9,,emu,0.9,
flies,12,inair,0.8,
inair,12,flies,0.8,

Only the first two rows are relevant to the kingqueen.js file, the rest are examples for simpler cases. Each row contains:

  • the main word
  • placeholder (here 12) for future use
  • triples of the form: similar word, similarity number, exceptions separated by dollar $

Example from the demo folder:

  • kingqueen.js : the example which was described above

Reading a knowledge base to memory

For answering several questions over large knowledge bases (and only then) it is advisable to split the input into several files, using the same JSON-LD-LOGIC syntax:

  • a knowledge base file
  • a question file

and then first read the knowledge base into persistant shared memory like this:

./gk -readkb knowledgebasefile -seconds 100

where the -seconds 100 key is optional, but is recommended to avoid timeouts for very large files. You may also consider the -mbsize 10000 or -mbsize 1000 keys or similar if the file is too large or memory limited (default is 5 gigabytes).

After successfully reading the knowledge base gk will terminate and by default output

{"result": "db ready in shared memory, name 1000, size 5000000000 bytes"}

The name 1000 and size 5000000000 bytes are informative and reflect the actual name and size: run ./gk -help to check out the keys -mbnr and -mbsize which determine the memory database number and size. The default memory base number (name) is 1000.

The memory database thus read can then be used during question answering with a -usekb key (and optionally also -mbnr mbnumber) like this:

./gk -usekb questionfile 

In case the memory database size is not sufficient while reading a large file, gk will give an error like this:

{"error": "db memory allocation error:  ...."}

Some useful notes: (a) you can give gk several input files, not just one, (b) reading a knowledge base with the -readkb again will delete the existing memory database and build a new one, (c) you can have many different memory databases simultaneously with different -mbnr value keys; an actual proof search can only use one at a time, chosen with the -mbnr value key.

Gk contains special machinery for efficiently comparing the priorities of default rules using taxonomies. For this you need to have two special files in your folder, with exactly these names (present in the demo folder):

  • gk_name_number.txt
  • gk_taxonomy_packed.txt

and then run gk with the -defaults key to read them in and later use:

./gk -readkb knowledgebasefile -defaults -seconds 100

You can also use a key -datafolder foldername to indicate a concrete folder where these two special files (and other potential special files) are located.

The keys described here are useful for exploration and experimenting.

We will first give a brief overview of the structure of the search gk conducts: a more detailed overview will be given in a later section.

Gk splits the whole search process into four parts, with roughly equal time resources (ca 1/4 of the overall time limit) for each:

  • Find a list of candidate answers to a query: a list of different proofs.
  • Eliminate these candidates which contain blockers which can be proved.
  • Find negative-answer proofs (negative evidence) for surviving proofs.
  • Eliminate these negative proofs which contain blockers which can be proved.

A negative evidence search is additionally conducted during searches for proving blockers. Blocker searches are, in general, recursive, with diminishing time limits.

By default, a list of candidate answers is limited to 10 elements.

The following command line keys switch off parts of this search structure:

  • -firstanswer : do not attempt to find multiple derivations and answers. Thus whenever one answer (proof) is found in any of the sub-searches, this search is successfully terminated and the whole process continues immediately.

  • -nocheck : do not check blockers.
    Search parts two and four listed above will be skipped. The end result may contain answers which actually should be blocked.

  • -nonegative : do not search for negative evidence. The third and fourth search parts listed above will be skipped, along with any negative evidence search during the checking of blockers.

  • -nocumulate : do not cumulate confidences. This key does not switch off any large search parts, but disables cumulation of evidence normally performed after each elementary sub-search has been finished. No ["cumul",...] steps should appear in the proofs.

In order to quickly check whether gk finds anything useful for a query, it is advisable to run it with all these keys (and the forthcoming -confidence) like this:

./gk problemfile -firstanswer -nocheck -nonegative -nocumulate -confidence 0 

As a consequence, if at least one candidate answer is found in the first part of the search structure, gk will terminate and output this candidate answer.

NB! These keys do not influence time management by gk. In effect, when gk has, say, a 10 second time limit, and you use all the keys as shown above, gk will allocate just ca 2.4 seconds to search.

Additional keys for confidence limits with a global effect for search:

  • -confidence <nr> : lowest allowed confidence for results: integer 2..100 or float 0..1; default 0.1 As mentioned above, by default gk will discard and not show answers with a combined confidence below 0.1. This key changes that limit.

  • -keepconfidence <nr> lowest allowed confidence for derived clauses: integer 2..100 or float 0..1; default 0. Setting this limit will discard all clauses derived during proof search whose confidence is below that limit.

Keys for additional output

The amount of output given by gk is determined by the -print N key with an integer N between 0...100. The default value is 10.

There is an additional -derived key which forces gk to print out all the derived and kept clauses during resolution sub-searches, regardless of the -print N key value. This key is useful only in fairly specific cases.

A larger N in -print N gives, in general, more output. Not all increments give additional information. The main useful values are:

  • 1: just says {"result": "answer found"} or a reason why not

  • 10: (default) gives JSON answers and proofs as shown in previous sections.

  • 11: adds dependency lists used for cumulation to proofs. An example result for the trivial.js file would be:

      {"result": "answer found",
      "answers": [
      {
      "answer": true,
      "confidence": 1,
      "positive proof":
      [
      [1,      4, ["in", "frm_4", "goal", 1, []], [["-p","a"]]],
      [2,      1, ["in", "frm_1", "axiom", 1, []], [["p","a"]]],
      [3,      6, ["simp", 1, 2, "fromgoal", 1, [1]], false]
      ]}
      ]}
    

    Here [] and [1] in ["in", "frm_4", "goal", 1, []] and ["simp", 1, 2, "fromgoal", 1, [1]] indicate which clauses the derived clause depends upon. The id-s in these lists are in the second column of clauses in the proofs: this column is not present at the default proof output level. Observe that they are different from the clause numbers (first column) in the proof referred to in proof steps like ["simp", 1, 2, ...]. The id reflects the order of the actual derivation of clauses in the search.

  • 12: prints out a trace of the whole search along with the proofs found in sub-searches.

    NB! Level 12 and higher do not give JSON output and will change with the future versions of gk.

  • 13: shows an automatically created search strategy (if none explicitly given) used for individual searches and more details about the whole search.

  • 14, 15: even more details about the whole search.

  • 16: adds given clauses picked during each resolution sub-search.

    Level 16 and above will give additional information about the details of sub-searches and are mostly useful for debugging and exploring possibilities for optimizations.

  • 21: adds given clause candidates along with resolvability information and actually derived clauses during each resolution sub-search.

  • 50: adds active clauses attempted for clause derivation together with a given clause, plus additional information about derivation steps.

  • 50: adds more information about early phases of each derivation step.

  • 100: adds information about selections / substeps for clause derivation.

Search strategies for individual sub-searches

By default gk automatically selects a simple generally OK search strategy for individual sub-searches. The selection algorithm and hence the automatic strategy is likely to be changed in the future versions of gk.

Differently from gkc, the current version of gk uses a single, non-parallel search for each individual sub-search. This is also likely to be changed in the future versions of gk.

You can change the search strategy by using the key -strategy filename where the filename should contain a strategy definition in JSON.

For large knowledge bases it is recommended to explicitly give gk the following strategy (strat_large.js in the demo folder):

{"strategy":["query_focus"], "query_preference":1, "max_answers":10}

For small knowledge bases it is recommended to explicitly give gk the following strategy (strat_small.js in the demo folder):

{"strategy":["negative_pref"], "query_preference":1, "max_answers":10}

The strategy file keys used by gk mostly, but not totally overlaps the strategy keys for gkc described in the gkc README : details (mostly the same) will be given in the next section.

In particular, "print", "print_level" and "max_seconds" used for gkc are not used by gk.

An important key for gk is "max_answers", by default 10. Changing this number tells gk the number of different initial answers it should try to collect during each concrete sub-search. Observe that the final number of answers output depends on several factors and may be smaller than the indicated number.

A key specific to gk is "independence", default 100, which has to be given as an integer percentage 0...100. This number is an estimate of the overall independence of facts and rules and influences cumulation of evidence. High independence increases the cumulative value, low independence decreases. Setting independence to 0 prohibits any cumulation, analogously to the -nocumulate command line key.

Nomenclature and effect of strategy keys

  • "max_answers" : see above
  • "independence" : see above
  • "equality" : in case "equality" is not set to 0, gk uses reflexivity, paramodulation and demodulation with knuth-bendix ordering for handling equality.

Limits on derived clauses (clauses over limit are discarded), with the default value 0, indicating that no limit is set:

  • "max_size": total atom/subterm count
  • "max_depth": maximal term depth
  • "max_length": number of literals

The list "strategy": [...] contains the main search strategy indicators, default off:

  • "query_focus" : use a goal-oriented set-of-support strategy with binary resolution
  • "negative_pref" : use binary resolution with negative literals preferred
  • "positive_pref" : use binary resolution with positive literals preferred
  • "hardness_pref" : use binary resolution with "hardest" (similar to weight) literals preferred
  • "knuthbendix_pref" : use binary resolution with knuth-bendix ordering of literals
  • "hyper" : use hyperresolution, with negative literals preferred
  • "posunitpara": perform paramodulation from units only
  • "prohibit_nested_para": disallow paramodulation if either parent is derived by paramodulation
  • "max_ground_weight": use the weight of the heaviest literal as the base weight of a clause
  • "unit", "double" or "triple" : use binary unit resolution or its generalization: (one of the arguments must be unit, a two-literal or three-literal clause, respectively. These may be added to the list in addition to the previous strategy indicators, for example, like ["query_focus","unit"].

Other useful parameters:

  • "max_size", "max_length", "max_depth", "max_weight" indicate limits on kept clauses, defaults are 0.
  • "equality" : 1 or 0, with 1 being default and 0 prohibiting equality handling.
  • "rewrite" : 1 or 0, with 1 being default and 0 prohibiting using equations for rewriting.
  • "weight_select_ratio": N indicating the ratio of picking by order derived / clause weight, default is 5.
  • "reverse_clauselist": N either default 0 or 1, where 1 sets a non-standard order for input clauses.
  • "depth_penalty": additional penalty for clause depth, default 1
  • "length_penalty": additional penalty for clause length, default 1
  • "var_weight": weight of a variable, default 5
  • "var_weight": weight of a repeated variable, default 7
  • "query_preference": N being 0, 1, 2 or 3 indicates which parts of the problem are treated as goals, assumptions or axioms:
    • 0 stands for no goal/assumption preference.
    • 1 stands for input preference (the assumption and conjecture formulas of fof)
    • 2 stands for making non-included formulas assumptions
    • 3 stands for considering only the negative clauses from conjecture to be goals

For "max_seconds"<2 gkc will automatically use immediate check for contradiction when a clause is derived. For problems with less than 1000 input clauses, gkc will use sine for input clause ordering.

Experiments with Quasimodo

The demo folder contains a relatively small subset multisource.js of the knowledge base built by Priit Järv from the Quasimodo database, see https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/commonsense/quasimodo

There are several simple example queries for this knowledge base to try out and experiment with. Not all of them give answers based on multisource.js : you may want to use a larger subset, change keys (try out -confidence 0) etc.

Run these examples first either as

./gk multisource.js quasi_N.js

or load the knowledge base first into memory (highly recommended for larger Quasimodo subsets) and then run the questions like:

./gk -readkb multisource.js -seconds 100
./gk -usekb quasi_N.js

It is also recommended to add the key -seconds 1 to the command line to avoid long waits for additional potential answers.

  • quasi_yawning.js
  • quasi_babyhair.js
  • quasi_birdfly.js
  • quasi_penguinfly.js