The Darl Language

'Darl' stands for Doctor Andy's Rule Language. It is pronounced like the name 'Daryl'.

Introduction

The DARL language is intended to permit both software tools and humans to record and re-use knowledge in the form of "if..then" rules. The language is simple and intuitive. Given the appropriate tooling, non-programmers should be able to use it.

Primitive elements

The following primitive elements are defined in DARL.

Identifiers

Identifiers are names used to identify inputs, outputs, constants, strings, rulesets, mapinputs and mapoutputs. They follow the convention of C#, namely:

  • They cannot start with a number
  • They cannot contain spaces
  • They are case sensitive
  • They can only contain letters, numbers and the "_" character.
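The identifier rules above can be checked with one regular expression. The following is a minimal Python sketch that assumes plain ASCII letters. C# itself also accepts Unicode letters and a leading "@", which the hypothetical is_valid_identifier helper deliberately ignores.

```python
import re

# Assumed ASCII-only form of the rules above: a letter or "_" first,
# then any mix of letters, digits and "_" (so no spaces are possible).
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_identifier(name: str) -> bool:
    """Return True if name follows the DARL identifier rules (hypothetical helper)."""
    return IDENTIFIER.fullmatch(name) is not None


# Identifiers are case sensitive, so "Fred" and "fred" are two different names.
for candidate in ["fred", "Age_related_income_limit", "2fast", "two words", "a-b"]:
    print(candidate, is_valid_identifier(candidate))
```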

Numeric literals

These follow C#, so they can be integers or floating point numbers with an optional exponent.

Infinity

Unbounded sets are denoted by positive or negative infinity as the first or last value. DARL recognises Infinity, -Infinity, ∞ and -∞.
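A set-definition value is therefore either a numeric literal or one of the infinity spellings. The Python sketch below shows one way a tool might read such values. The parse_bound name is hypothetical, and Python's float() only stands in for C# numeric parsing: it handles integers and exponents, but it is also a little more permissive (for example it accepts "inf"), so NaN is rejected explicitly.

```python
import math

# Infinity spellings listed above; the sign gives the direction of the unbounded end.
INFINITY_TOKENS = {
    "Infinity": math.inf,
    "∞": math.inf,
    "-Infinity": -math.inf,
    "-∞": -math.inf,
}


def parse_bound(token: str) -> float:
    """Convert a set-definition value (number or infinity token) to a float."""
    token = token.strip()
    if token in INFINITY_TOKENS:
        return INFINITY_TOKENS[token]
    value = float(token)  # stands in for C# numeric parsing; accepts exponents
    if math.isnan(value):
        raise ValueError(f"not a DARL numeric literal: {token!r}")
    return value


print([parse_bound(t) for t in ["-∞", "1", "2.5e3", "Infinity"]])
```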

String literals

These are pieces of text delimited by the " or ' characters.

Terminators

These are characters that delineate different sections of DARL. They are:

  • ; - marking the end of a rule or definition
  • { } - marking the beginning and end of a block - used in set and ruleset definitions
  • ( ) - marking the beginning and end of a list in functions
  • , - separating elements of a list

Keywords

These are pieces of text that have specific meaning in the DARL language. They are:

"{", "}", "(", ")", ",", ";", ".", "if", "then", "will", "be", "confidence", "input", "output", "numeric", "categorical","arity","presence","string","constant", "or", "and","not","is","*","/","-","+","%","^",";" ">","<",">=","<=","anything","textual", "maximum","minimum","sum","product","fuzzytuple","sigmoid","normprob","round","ruleset","wire","mapinput","mapoutput","pattern"

Keywords are always lower case.

Composite identifiers

When connecting to inputs or outputs of a ruleset it is necessary to use the name of the ruleset and the input or output name. This is achieved by concatenating the ruleset name and the i/o name with a "." separator.

rulesetname.inputname

Comments

Comments are freely permitted in code. They follow the C#/Java form:

//single line comment
/*bounded  comment */

Top level elements

These elements are concerned with inputs and outputs at the highest level, and with the definition of rulesets. DARL permits the creation of multiple rulesets within a file, like functional blocks in a schematic diagram. These can be wired to "connector" elements that act as the inputs and outputs of the aggregate document, called a map. If a DARL source file contains only a single ruleset, the mapinput, mapoutput and wire definitions can be left out, and the runtime will generate them automatically.

Mapinputs

These define the inputs at the top, or "map", level. They accept external values during the machine learning or inference process. A mapinput takes two parameters: the first, which is mandatory, is an identifier naming the mapinput; the second, which is optional, is a string containing navigation text for machine learning.

mapinput fred "/fred/text()";

Mapoutputs

These define the outputs at the top, or "map", level. They generate values during the machine learning or inference process. A mapoutput takes two parameters: the first, which is mandatory, is an identifier naming the mapoutput; the second, which is optional, is a string containing navigation text for machine learning.

mapoutput bill "/bill/text()";

Wires

These connect up the various elements at the top level. A wire takes two parameters: the first is an identifier naming the source and the second is an identifier naming the destination. Both may be composite identifiers. Sources can be mapinputs or ruleset outputs; destinations can be mapoutputs or ruleset inputs.

wire input1 ruleset1.input1;
wire ruleset2.output1 ruleset3.input2;

The DARL parser checks for impossible combinations, such as connecting two ruleset outputs. The online DARL designer makes these checks dynamically. A further constraint is that connections between rulesets are only possible where the data types match.
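The wiring constraints above can be expressed as a small check. The Python sketch below is illustrative only, not the DARL parser's code: the endpoint and check_wire names and the dictionary symbol table are hypothetical, and it assumes that only ruleset inputs and outputs carry data types (mapinput and mapoutput definitions above declare none). It also shows how a composite identifier splits into a ruleset name and an i/o name.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical symbol table: ruleset name -> (input types, output types).
RulesetIO = Dict[str, Tuple[Dict[str, str], Dict[str, str]]]


def endpoint(name: str, mapinputs: Set[str], mapoutputs: Set[str],
             rulesets: RulesetIO) -> Tuple[str, Optional[str]]:
    """Classify a wire end as (kind, data type); map-level connectors carry no type."""
    if "." in name:  # composite identifier: rulesetname.ioname
        ruleset, io = name.split(".", 1)
        inputs, outputs = rulesets[ruleset]
        if io in inputs:
            return "ruleset input", inputs[io]
        if io in outputs:
            return "ruleset output", outputs[io]
        raise KeyError(name)
    if name in mapinputs:
        return "mapinput", None
    if name in mapoutputs:
        return "mapoutput", None
    raise KeyError(name)


def check_wire(source: str, destination: str, mapinputs: Set[str],
               mapoutputs: Set[str], rulesets: RulesetIO) -> None:
    """Raise ValueError for the impossible combinations described above."""
    src_kind, src_type = endpoint(source, mapinputs, mapoutputs, rulesets)
    dst_kind, dst_type = endpoint(destination, mapinputs, mapoutputs, rulesets)
    if src_kind not in ("mapinput", "ruleset output"):
        raise ValueError(f"{source} ({src_kind}) cannot be a wire source")
    if dst_kind not in ("mapoutput", "ruleset input"):
        raise ValueError(f"{destination} ({dst_kind}) cannot be a wire destination")
    if src_type is not None and dst_type is not None and src_type != dst_type:
        raise ValueError(f"type mismatch: {src_type} -> {dst_type}")


rulesets = {
    "ruleset1": ({"input1": "numeric"}, {"output1": "numeric"}),
    "ruleset2": ({"input2": "categorical"}, {"output1": "numeric"}),
}
check_wire("input1", "ruleset1.input1", {"input1"}, {"bill"}, rulesets)  # accepted
```

Connecting two ruleset outputs, or a numeric output to a categorical input, raises ValueError in this sketch.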

Rulesets

These define the block of code corresponding to a ruleset. A ruleset is a unit of processing, analogous to a class in conventional programming. A ruleset takes two parameters: the first, which is mandatory, is an identifier naming the ruleset; the second, which is optional, is a keyword stating whether the ruleset will be subject to machine learning. The choices are:

  • manual - The ruleset contents will be created by hand. This is the default, so the keyword is not required.
  • supervised - The ruleset will be created by supervised machine learning
  • unsupervised - The ruleset will be created by unsupervised learning
  • reinforcement - The ruleset will be created by reinforcement learning
ruleset ruleset1 supervised 
{ 
//ruleset content 
}

The pattern

When the contents of one or more rulesets are to be set by machine learning, a data source must be connected to the DARL file. Machine learning is based on learning associations between or within inputs and/or outputs. Generally, these data values are presented as a series of patterns. The optional pattern element lets you define how to find those patterns, in the language appropriate to the data source: this might be XPath for an XML source, or SQL for a database. There should be only one pattern element in a DARL source file. The pattern element takes one parameter, the string containing the navigation text that identifies a pattern.

pattern "//pattern";
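To make the roles concrete, the Python sketch below turns an XML source into one row of values per pattern. It rests on assumptions not stated above: each mapinput or mapoutput navigation string is taken to be evaluated relative to each pattern node, and because Python's xml.etree.ElementTree supports only a subset of XPath, "//pattern" is approximated by iter("pattern") and "/fred/text()" by findtext("fred"). The extract_patterns function and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

DATA = """<data>
  <pattern><fred>1.5</fred><bill>high</bill></pattern>
  <pattern><fred>2.0</fred><bill>low</bill></pattern>
</data>"""


def extract_patterns(xml_text: str, pattern_tag: str,
                     paths: Dict[str, str]) -> List[Dict[str, Optional[str]]]:
    """Return one row per pattern element, reading each map i/o relative to it."""
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.iter(pattern_tag):  # plays the role of pattern "//pattern"
        # findtext(path) plays the role of a navigation string such as "/fred/text()"
        rows.append({name: node.findtext(path) for name, path in paths.items()})
    return rows


print(extract_patterns(DATA, "pattern", {"fred": "fred", "bill": "bill"}))
```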

Ruleset level elements

These elements are concerned with inputs and outputs within a ruleset and the contents of the rules.

Inputs

These define the inputs at the ruleset level. They accept external values during the machine learning or inference process. An input takes several parameters: the first, which is mandatory, is the data type of the input; the second, also mandatory, is an identifier naming the input. The data type choices are:

  • numeric - the input value will be a number.
  • categorical - the input value will be text falling into one of a set of categories.
  • textual - the input value will be a string.
input numeric fred; 

Numeric inputs can optionally contain a series of fuzzy set definitions, while categorical inputs can have a series of categories.

Fuzzy set definitions

Fuzzy sets require an identifier to name the set and between 1 and 4 numeric literals, in ascending order, to define the set.

input numeric fred {{small, 1, 2, 3},{medium, 2, 3, 4},{large, 3, 4, 5}};

Inputs can have any number of fuzzy sets, though each set name must be unique. Ideally the ruleset writer will choose meaningful names and ensure that the set ranges match the names. So, as above, the medium set ought to be greater than the small set. This is an element of style that the DARL parser cannot enforce. If machine learning of a ruleset is employed, fuzzy sets will be generated automatically from the data.
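The section does not say what shape a set takes for each number of values. A common reading, assumed here, is that three values define a triangle (start, peak, end) and four define a trapezoid (start, start of plateau, end of plateau, end), with Infinity at either end giving an open "shoulder". The Python sketch below covers only the 3- and 4-value cases; membership is a hypothetical helper, applied to the fred sets from the example above.

```python
import math
from typing import Sequence


def membership(x: float, points: Sequence[float]) -> float:
    """Degree of truth of x in a 3-value (triangle) or 4-value (trapezoid) set."""
    if len(points) == 3:
        a, b, d = points
        c = b  # a triangle is a trapezoid with a single-point top
    elif len(points) == 4:
        a, b, c, d = points
    else:
        raise ValueError("this sketch only handles 3- or 4-value sets")
    if list(points) != sorted(points):
        raise ValueError("set values must be in ascending order")
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:  # rising edge; -Infinity as the first value gives a left shoulder
        return 1.0 if math.isinf(a) else (x - a) / (b - a)
    # falling edge; Infinity as the last value gives a right shoulder
    return 1.0 if math.isinf(d) else (d - x) / (d - c)


fred = {"small": (1, 2, 3), "medium": (2, 3, 4), "large": (3, 4, 5)}
print({name: membership(2.5, pts) for name, pts in fred.items()})
```

With these sets a value of 2.5 is half "small" and half "medium", which is why overlapping ranges matter when choosing set names.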

Category definitions

Categorical variables are very common. They are multi-choice variables, like male/female or single/married/separated/divorced. A category definition is a sequence of categories, that can be either identifiers or string literals.

input categorical fred {true,false};

If machine learning of a ruleset is employed, categories will be generated automatically from the data.

Outputs

These define the outputs at the ruleset level. They generate values during the machine learning or inference process. An output takes several parameters: the first, which is mandatory, is the data type of the output; the second, also mandatory, is an identifier naming the output. The data type choices are:

  • numeric - the output value will be a number.
  • categorical - the output value will be text falling into one of a set of categories.
output numeric fred; 

Numeric outputs can optionally contain a series of fuzzy set definitions, while categorical outputs can have a series of categories.

Fuzzy set definitions

Fuzzy sets require an identifier to name the set and between 1 and 4 numeric literals, in ascending order, to define the set.

output numeric fred {{small, 1, 2, 3},{medium, 2, 3, 4},{large, 3, 4, 5}};

Outputs can have any number of fuzzy sets, though each set name must be unique. Ideally the ruleset writer will choose meaningful names and ensure that the set ranges match the names. So, as above, the medium set ought to be greater than the small set. This is an element of style that the DARL parser cannot enforce. If machine learning of a ruleset is employed, fuzzy sets will be generated automatically from the data.

Category definitions

Categorical variables are very common. They are multi-choice variables, like male/female or single/married/separated/divorced. A category definition is a sequence of categories, that can be either identifiers or string literals.

output categorical fred {true,false};

If machine learning of a ruleset is employed, categories will be generated automatically from the data.

Constants

This permits the definition of numeric constants. Since constants are frequently re-used and it is better practice to keep them in one place, DARL prefers to avoid the use of numeric literals inside rules. Instead, you should define a numeric constant and use the name of that constant. A constant has two parameters: the identifier naming the constant and a numeric literal.

constant Age_related_income_limit 24000;

String Constants

This permits the definition of string constants. Since constants are frequently re-used and it is better practice to keep them in one place, DARL does not allow the use of string literals inside rules. Instead you should define a string constant and use the name of that constant. A string constant has two parameters, the identifier naming the constant and a string literal.

string regex1 "*A|B";

Sequence Constants

This permits the definition of sequence constants. Since constants are frequently re-used and it is better practice to keep them in one place, DARL does not allow the use of sequence literals inside rules. Instead you should define a sequence constant and use the name of that constant. A sequence constant has two parameters, the identifier naming the constant and a sequence literal. The latter defines a sequence of string literals and/or lists of string literals.

sequence seq1 {"fred",{"jane","samantha"},"bill"};

The above is interpreted as the sequence "fred", followed by either "jane" or "samantha", followed by "bill". Sequences are an extension to DARL used in DAPL.
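The section does not say how DAPL applies a sequence to text, so the Python sketch below assumes the simplest reading: a list of words matches when it has the same length as the sequence and each word equals the string at that position, or is one of the alternatives at that position. Matching is exact and case sensitive. The matches function and the list-based representation of seq1 are hypothetical.

```python
from typing import List, Sequence, Union

# A sequence element is one string or a list of alternative strings.
Element = Union[str, List[str]]


def matches(sequence: Sequence[Element], words: Sequence[str]) -> bool:
    """True if words match the sequence position by position (exact, case sensitive)."""
    if len(sequence) != len(words):
        return False
    return all(
        word == element if isinstance(element, str) else word in element
        for element, word in zip(sequence, words)
    )


seq1 = ["fred", ["jane", "samantha"], "bill"]
print(matches(seq1, ["fred", "samantha", "bill"]))
```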

Rule level elements

Part of the flexibility of DARL is the ability to represent many kinds of relationships with one structure. The fundamental structure of a rule is as follows:

if < conditional expression > then < output identifier > will be < RHS expression > < optional confidence value >;

Since outputs can be numeric or categorical, the RHS expression syntax depends on the type of the output. Choices are:

  • Categorical output: The RHS can only be a category defined in the list of categories for that output
  • Numerical output: The RHS can be a fuzzy set defined in the list of sets for that output
  • Numerical output: The RHS can be a numeric expression that is evaluated dynamically.

Conditional Expressions

This is a fuzzy or Boolean logic expression. The degree of truth it evaluates to is used to determine the rule's precedence against other rules with the same output identifier. Several logical operators may be used at the top level:

  • anything: If used this must be the only operator in the top-level conditional expression and always evaluates to truth 1.0.
if anything then a will be b;
  • and: gives the fuzzy logic "and" of the operands either side, implemented as the minimum of their degrees of truth.
if a is b and c is d then f will be q;
  • or: gives the fuzzy logic "or" of the operands either side, implemented as the maximum of their degrees of truth.
if a is b or c is d then f will be q;
  • not: gives the fuzzy logic "not" of the single operand following, implemented as 1 minus its degree of truth.
if not a is b then f will be q;
  • is: evaluates the input's or output's fuzzy value on the left hand side against the expression on the right hand side.
if a is b then f will be q;
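The operator definitions above translate directly into code. This Python sketch evaluates a condition of the form used in the "not" example, combined with "and". It assumes that "is" against a category is crisp (1.0 or 0.0); for a numeric input the truth would instead come from the input's membership in a fuzzy set. All function names are hypothetical.

```python
def truth_is_category(value: str, category: str) -> float:
    """'is' on a categorical input: assumed crisp, so 1.0 or 0.0."""
    return 1.0 if value == category else 0.0


def f_and(*truths: float) -> float:
    return min(truths)  # fuzzy "and": the minimum degree of truth


def f_or(*truths: float) -> float:
    return max(truths)  # fuzzy "or": the maximum degree of truth


def f_not(truth: float) -> float:
    return 1.0 - truth  # fuzzy "not": 1 minus the degree of truth


ANYTHING = 1.0  # "anything" always evaluates to full truth

# if a is b and not c is d then ...
a_is_b = 0.7  # e.g. a numeric input's membership in set b
c_is_d = truth_is_category("single", "married")  # 0.0
rule_truth = f_and(a_is_b, f_not(c_is_d))  # min(0.7, 1.0)
print(rule_truth)
```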

Since the input or output could be numeric, categorical or textual (inputs only) there are several possible combinations.

  • The input or output is categorical, the RHS can only be a category defined for that input or output.
if a is false then b will be true;
  • The input or output is numeric, the RHS can be a set defined for that input or output.
if a is large then b will be true;
  • The input or output is numeric, the RHS can be a comparison operation followed by a numeric expression.
if a is < c + d then b will be true;
  • Comparison operators can be >, <, >=, <=, =, !=, interpreted as greater than, less than, greater than or equal to, less than or equal to, equal to and not equal to respectively.
  • The input is textual, the RHS can only be a textual comparison operation.
if a is match(string) then b will be true;

Default behaviour

A default rule for a given output, one that fires only if no other rule associated with that output fires, can be created by putting otherwise before that rule.

if a is match(string) then b will be true;
otherwise if anything then b will be false;
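The following Python sketch shows one way the otherwise rule could be resolved for a categorical output. It assumes that "triggers" means a degree of truth above 0, that several rules concluding the same category are combined with fuzzy "or" (maximum), and that the category with the highest truth wins. The resolve function is hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


def resolve(fired: Iterable[Tuple[float, str]],
            otherwise: Optional[str] = None) -> Optional[str]:
    """Pick a categorical output value from (truth, category) pairs of ordinary rules.

    The otherwise rule is consulted only when no ordinary rule has a truth above 0.
    """
    degrees: Dict[str, float] = {}
    for truth, category in fired:
        if truth > 0.0:
            # rules concluding the same category are combined with fuzzy "or"
            degrees[category] = max(truth, degrees.get(category, 0.0))
    if not degrees:
        return otherwise
    return max(degrees, key=degrees.get)


# b will be true if a matches; otherwise b will be false
print(resolve([(0.0, "true")], otherwise="false"))
print(resolve([(0.6, "true")], otherwise="false"))
```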

Numeric Expressions

These are algebraic expressions that follow the rules of fuzzy arithmetic. See Introduction to Fuzzy Arithmetic.

The operands can only be numeric inputs or outputs, numeric constants, built-in functions or other numeric expressions. The operators available are:

  • +: Addition.
if a is = b + c then d will be true;
  • -: Subtraction.
if a is = b - c then d will be true;
  • *: Multiplication.
if a is = b * c then d will be true;
  • /: Division.
if a is = b / c then d will be true;
  • ^: Power.
if a is = b ^ c then d will be true;
  • %: Modulo.
if a is = b % c then d will be true;
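A standard way to implement fuzzy arithmetic is with α-cuts: each fuzzy number is sliced at a truth level α into an interval, and interval arithmetic is applied slice by slice. The section does not say how DARL implements it, so the Python sketch below uses this standard technique on triangular fuzzy numbers for +, - and * only. Interval and alpha_cut are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Interval:
    """One alpha-cut of a fuzzy number: the range of values with truth >= alpha."""
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other: "Interval") -> "Interval":
        # the smallest result pairs our low end with their high end, and vice versa
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other: "Interval") -> "Interval":
        ends = [self.lo * other.lo, self.lo * other.hi,
                self.hi * other.lo, self.hi * other.hi]
        return Interval(min(ends), max(ends))


def alpha_cut(tri: Tuple[float, float, float], alpha: float) -> Interval:
    """Alpha-cut of a triangular fuzzy number (a, b, c) with its peak at b."""
    a, b, c = tri
    return Interval(a + alpha * (b - a), c - alpha * (c - b))


b, c = (1.0, 2.0, 3.0), (2.0, 3.0, 4.0)
for alpha in (0.0, 0.5, 1.0):
    print(alpha, alpha_cut(b, alpha) + alpha_cut(c, alpha))
```

Under this scheme b - b is not a crisp zero: at α = 0 it spans -2 to 2. This widening is a known property of fuzzy arithmetic rather than a bug.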

Built in functions

For built in functions, the parameters are separated by commas and can each be numeric expressions. Built in functions are:

  • sum: The sum of a set of values.
if a is = sum(b,c,d) then p will be q;
  • product: The product of a set of values.
if a is = product(b,c,d) then p will be q;
  • maximum: The greatest of a set of values.
if a is = maximum(b,c,d) then p will be q;
  • minimum: The smallest of a set of values.
if a is = minimum(b,c,d) then p will be q;
  • normprob: The Gaussian probability of a normalized value (average = 0, SD = 1).
if a is = normprob(b) then p will be q;

Note that normprob takes a single operand.

  • round: the first operand is rounded to the accuracy set by the second. This is a non-fuzzy operation.
if a is = round(b,c) then p will be q;

Note that round takes only two operands.

  • randomtext: Randomly selects one item from an array of text items.
if anything then p will be randomtext("one","two","three");
  • document: Substitutes data items into a text document. The first parameter is a string literal or textual i/o containing the text template; the second parameter is a list of the inputs and outputs used in the template.

The template language permits two kinds of modification: Simple substitution is performed by placing a tag like %% data_type %% in the template text. In this case the value of an input or output called data_type will replace the text. Sections of text can be retained or excluded based on categorical i/o.

%% { interEU.false %% text if false %% interEU.false } %%
%% { interEU.true %% text if true %% interEU.true } %%

In the above case interEU is a categorical input or output with the categories true,false. If interEU is true the text "text if true" is included in the output, otherwise if false "text if false" will be included.

Text replacement tags can be nested to any depth, and simple substitutions can occur inside text selections.

input textual fred;
input textual a;
input numeric b;
output textual p;

if anything then p will be document(fred,{a,b});

In the case above, the output p will contain the template text held in fred, with the values of a and b substituted into it or used to select sections of it.
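The two template rules can be sketched with regular expressions in Python. This is an illustration of the described behaviour, not DARL's implementation: the render function is hypothetical, whitespace inside a kept section is preserved as written, unknown tag names are left untouched, and a section's closing tag is assumed to repeat its opening name and category exactly. Kept sections are rendered recursively, so nested sections and substitutions inside sections both work.

```python
import re
from typing import Dict

# %% { name.category %% body %% name.category } %%  (closing tag repeats the opening one)
SECTION = re.compile(r"%%\s*\{\s*(\w+)\.(\w+)\s*%%(.*?)%%\s*\1\.\2\s*\}\s*%%", re.DOTALL)
# %% name %%
FIELD = re.compile(r"%%\s*(\w+)\s*%%")


def render(template: str, values: Dict[str, object]) -> str:
    """Expand conditional sections, then substitute simple %% name %% tags."""
    def section(m):
        name, category, body = m.groups()
        # keep the body (rendered recursively) only if the categorical value matches
        return render(body, values) if str(values.get(name)) == category else ""

    text = SECTION.sub(section, template)
    return FIELD.sub(lambda m: str(values.get(m.group(1), m.group(0))), text)


template = ("Dear %% name %%, "
            "%% { interEU.true %%your order ships within the EU.%% interEU.true } %%"
            "%% { interEU.false %%your order ships outside the EU.%% interEU.false } %%")
print(render(template, {"name": "Andy", "interEU": "true"}))
```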

Optional Confidence Value

Rules can have an optional confidence value in the range 0.0 to 1.0. If no confidence is specified, the default is 1.0. The confidence value is the maximum degree of truth associated with that rule. For rulesets created via machine learning, it reflects the support the data gives to the rule. For rules created by hand, you can create default behaviour, i.e. a rule that fires if others don't, by giving that rule a low confidence.

if anything then b will be false confidence 0.5;
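The Python sketch below shows why a low-confidence rule acts as a default. It assumes the confidence caps a rule's truth (the minimum of the condition's truth and the confidence) and that the rule with the highest truth wins. Both function names are hypothetical. Unlike an otherwise rule, this kind of default still competes with other rules, so it also wins against rules that fire only weakly.

```python
from typing import List, Tuple


def capped_truth(condition_truth: float, confidence: float = 1.0) -> float:
    """A rule's truth can never exceed its confidence (default 1.0)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    return min(condition_truth, confidence)


def strongest(candidates: List[Tuple[float, str]]) -> str:
    """Return the category of the rule with the highest truth."""
    return max(candidates, key=lambda pair: pair[0])[1]


# if anything then b will be false confidence 0.5;
default = (capped_truth(1.0, confidence=0.5), "false")
print(strongest([(capped_truth(0.8), "true"), default]))  # a strong rule beats the default
print(strongest([(capped_truth(0.3), "true"), default]))  # a weak rule loses to it
```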