Imperative RegExp Notation

Regular Expressions For All (REFA)

The basic idea


There are many systems for finding substrings that match a given mask. Unfortunately, they lose their power as soon as many factors have to be taken into account. The constructs become cumbersome, incomprehensible, and hard to pick apart.
That is why I tried to create an alternative: REFA, regular expressions for everyone.
The idea is as follows: as soon as a regular expression stops being obvious, split it in two. Where possible, the optimizer will still merge the pieces back into one, so nothing is lost in speed, but the code becomes clearer.

For easier reading: ge.tt/9snPkzG/v/0 (.odt format)

Examples


Searching for C++ functions

Find the implementations of all methods of the class dummy.

The input is assumed to be one big string containing all the code of the project. You could read it from a file instead, but that would make the example harder to follow.
PROGRAM "FindMethods"
  ^name^ = ~\w?[\w|\d]*~
  BLOCK "FindClass" // class declaration
    PUSH
    BLOCKVAR $regexp = "class "+%classname%+"\s*\{.*\}.*;"
    MATCH $regexp
    CATCH MATCH_FAIL
      RETURN array() AS $list;
      RETURN array() AS $result;
    FINISH
    BLOCKVAR $class_code = MATCHED
    INCOMING = $class_code
    BLOCKVAR $method = ^name^+~\w*~+^name^+~\([\^name\^\w*\^name\^\w*\,?]*\)\w*~
    BLOCKVAR $declarations = array();
    BLOCKVAR $realisations = array();
    TRY
      WHILE 1
        MATCH PASS LIMIT 1 $method
        IF select(0,1) INCOMING != ";"
          CALL "SearchEndOfFunction" REMAINED
          $realisations ADD (MATCHED + RESULT $body)
        ELSE
          $declarations ADD MATCHED
        ENDIF
      END
    ON MATCH_FAIL OR END_OF_STRING
      RETURN $declarations AS $list
      RETURN $realisations AS $result
    FINISH
    POP
  ENDBLOCK

  BLOCK "SearchEndOfFunction"
    BLOCKVAR UINT $level = 0
    MATCH ~[\{|\}]~
    FOREACH ALL_MATCHED AS $t
      IF $t == "{"
        $level++;
      ELSE
        $level--;
      ENDIF
      IF $level == 0
        BLOCKVAR STRING $ret = select(ALL_MATCHED[0], ALL_MATCHED[ITERATION]) INCOMING_BLOCK
        RETURN $ret AS $body
      ENDIF
    END
  ENDBLOCK

  BLOCK "AddClassName"
    MATCH PASS LIMIT 1 ^name^+"\w*"
    BLOCKVAR $ret = MATCHED
    $ret += "[\^name\^\w*::\w*]*"+%classname%+"\w*::\w*"
    $ret += REMAINED
    RETURN $ret
  ENDBLOCK

  BLOCK "SearchDeclaredFunctions"
    BLOCKVAR $dec = %declared%
    IMPLODE ($dec, "|") $string
    $string = "["+$string+"]"
    MATCH $string
    BLOCKVAR $realisations = array()
    FOREACH ALL_TILES AS $tile
      IF ITERATION % 2 == 1
        IF select(0,1) INCOMING != ";"
          CALL "SearchEndOfFunction" ALL_TILES[ITERATION + 1]
          $realisations ADD (ALL_TILES[ITERATION] + RESULT $body)
        ENDIF
      ENDIF
    END
    RETURN $realisations AS $result
  ENDBLOCK

  // program code
  BLOCKVAR $classname = $arg1
  CALL "FindClass"
  BLOCKVAR $ret = RESULT $result
  BLOCKVAR $declared = RESULT $list
  CALL "SearchDeclaredFunctions"
  $ret ADD RESULT $result
  RETURN $ret
ENDPROGRAM


The program turned out not to be very small, but it is at least more or less clear. Writing a single regular expression that does the same thing is... not recommended.
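The brace counting done by the SearchEndOfFunction block can be sketched in Python. This is only an illustration, not part of REFA: the function name find_body_end is hypothetical, and like the block it imitates, it does not account for braces inside string literals or comments.

```python
import re


def find_body_end(text: str) -> str:
    """Return text from the first '{' through its matching '}'.

    Hypothetical helper mirroring the SearchEndOfFunction block:
    every '{' raises the nesting level, every '}' lowers it, and
    the body ends when the level returns to zero.
    """
    level = 0
    start = None
    for m in re.finditer(r"[{}]", text):
        if m.group() == "{":
            if start is None:
                start = m.start()  # remember where the body opens
            level += 1
        else:
            level -= 1
        if level < 0:
            raise ValueError("closing brace before any opening brace")
        if level == 0:
            return text[start:m.end()]
    raise ValueError("no balanced brace block found")
```

Called on the text that follows a method signature, it returns the method body, including nested blocks, and leaves the rest of the input untouched.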

Documentation


Data types

INT

The default type. Integer. Range -2^31 to +2^31-1. The default value is 0.
LONG

Integer. Range -2^63 to +2^63-1. The default value is 0.
UINT

ULONG

STRING

String. The maximum length is the largest value of UINT. Private fields: START and COUNT.
There is no default value; attempting to use one throws an exception.
TILE

Part of a string. Private fields: START, END, COUNT, PARENT_STRING.
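One way to picture a TILE is as a lightweight view into its parent string. The Python sketch below is only a model under stated assumptions: the class name Tile, the exclusive END, and COUNT being END - START are all guesses, since the text does not define these conventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    """A view into part of a string (hypothetical model of TILE).

    Assumption: END is exclusive and COUNT is the length, END - START.
    """
    parent_string: str
    start: int
    end: int

    def __post_init__(self):
        # Loosely corresponds to the WRONG_STRING_INDEXES error.
        if not 0 <= self.start <= self.end <= len(self.parent_string):
            raise IndexError("tile indexes fall outside the parent string")

    @property
    def count(self) -> int:
        return self.end - self.start

    def text(self) -> str:
        """Materialize the characters the tile points at."""
        return self.parent_string[self.start:self.end]
```

Keeping only indexes and a reference to the parent means a tile can be created without copying any characters.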
Predefined variables

INCOMING

The string to process. It is used whenever no variable is specified.
INCOMING is a synonym for INCOMING_CURRENT.
  • INCOMING_PROGRAM – the string passed to the program
  • INCOMING_BLOCK – the string passed to the block
  • INCOMING_CURRENT – the current string
  • INCOMING_LAST – the string as it was before the last change

MATCHED

The first match found by the last MATCH.
ALL_MATCHED

An array with all matches of the last expression.
REMAINED

The rest of the string, starting from the first character after MATCHED.
ALL_REMAINED

The same for each element of ALL_MATCHED.
ALL_TILES

All odd elements are ALL_MATCHED. The rest are the unmatched pieces of the string between them, in the order needed to reassemble the string.
ITERATION

The number of the current iteration of the innermost loop. To get the iteration number of an outer loop, save it in a separate variable.
CALLSTACK

Call stack with parameters
QUERY_LOG

A log of the commands that affected the string in one way or another. It must keep the state of the string (before and after processing). The incoming data is stored as a single copy.
EXCEPTION_STRING

A string explaining the error: where it occurred, the incoming parameters, and the result.
Minimum set

Required for basic use of the system
MATCH [IGNORE {ignore_count|FIRST}] [PASS] [LIMIT {limit_count}] reg_exp [processing_string]
Matches reg_exp and, by default, moves the START of processing_string to MATCHED.
IGNORE – skips the first ignore_count matches. The default is IGNORE 0.
PASS – moves START to the last element of ALL_REMAINED.
LIMIT – the maximum number of matches, after which matching stops. The default is LIMIT 0, which means no limit: matching continues to the end of the string.
reg_exp – either a regular expression between ~ characters or a variable.
processing_string – the string to process. Defaults to INCOMING.
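The interplay of IGNORE and LIMIT can be sketched in Python. The function match and the exception MatchFail are hypothetical stand-ins, the Python regex dialect is used in place of REFA's, and the PASS flag and the FIRST keyword are not modelled.

```python
import re
from itertools import islice


class MatchFail(Exception):
    """Stands in for the MATCH_FAIL error."""


def match(reg_exp: str, processing_string: str, ignore: int = 0, limit: int = 0):
    """Collect matches, skipping the first `ignore` and keeping at most `limit`.

    limit == 0 means "no limit", as in the description above.
    """
    found = re.finditer(reg_exp, processing_string)
    stop = ignore + limit if limit > 0 else None
    matches = [m.group() for m in islice(found, ignore, stop)]
    if not matches:
        raise MatchFail(f"no match for {reg_exp!r}")
    return matches
```

With ignore=1 and limit=2, for instance, the first match is dropped and only the second and third are returned.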
ECHO

ECHO string
Writes the string to the result.
A simple example of regular expression replacement:
MATCH PASS ~some_regexp~
FOREACH ALL_TILES AS $tile
  IF ITERATION % 2
    // replace every matched piece with the new string
    ECHO "REPLACED"
  ELSE
    // output the chunks between the matches unchanged
    ECHO $tile
  ENDIF
END
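For comparison, the same replacement loop can be sketched in Python. The helper names all_tiles and replace_all are hypothetical, and the sketch assumes 0-based indexing, so that matches land on odd indexes as the IF ITERATION % 2 test implies.

```python
import re


def all_tiles(reg_exp: str, text: str) -> list:
    """Split text into alternating pieces: unmatched, matched, unmatched, ...

    Assumption: indexing is 0-based, so matches sit on odd indexes and
    the first and last elements are the (possibly empty) unmatched ends.
    """
    tiles = []
    pos = 0
    for m in re.finditer(reg_exp, text):
        tiles.append(text[pos:m.start()])  # chunk before this match
        tiles.append(m.group())            # the match itself
        pos = m.end()
    tiles.append(text[pos:])               # chunk after the last match
    return tiles


def replace_all(reg_exp: str, text: str, replacement: str) -> str:
    """Rebuild the string, swapping every odd (matched) tile."""
    out = []
    for iteration, tile in enumerate(all_tiles(reg_exp, text)):
        out.append(replacement if iteration % 2 else tile)
    return "".join(out)
```

Joining the tiles without any substitution gives back the original string, which is what makes this layout convenient for replacement.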
IF ELSE ENDIF

IF expr then_statements [ELSE else_statements] ENDIF
If expr is not equal to zero, then_statements are executed; otherwise else_statements are executed.
Extended set

PROGRAM

A program is an atomic set of executable commands that performs a useful task. Only programs can have settings other than “default”.
Generally speaking, a program can be a separate process (or thread) running in parallel. There is no way to jump from one program into another, but you can use the blocks of neighboring programs if they are declared. Programs can call other programs.
A program is the scope for all of its blocks.
By default, all commands belong to a program with an empty name (it cannot be called from other programs).
PROGRAM name arg0 [arg1 arg2 ... ] code ENDPROGRAM
name – the name of the program
arg0 – a string for processing. Becomes INCOMING_PROGRAM
code – the code of the program, including declarations.
Blocks are accessed as program_name::block_name.
BLOCK

BLOCK name [string]
A dependent piece of code. Calling it is identical to two goto jumps. If you specify a string, INCOMING is changed to it before the block runs and restored after it returns.
PUSH POP

PUSH [var1 var2]
Saves the state of the system variables. You can also list local variables to save along with them, and explicitly exclude a system variable with the !var construct.
POP – restores the state as it was just before PUSH.
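The save and restore behaviour can be modelled with a stack of snapshots. The Python sketch below is a simplification under assumptions: the class State is hypothetical, system and local variables share one dictionary, and only the !var exclusion is modelled. Variables created after push are left in place by pop, which the text does not specify either way.

```python
class State:
    """Hypothetical interpreter state with a save/restore stack."""

    def __init__(self, **variables):
        self.variables = dict(variables)
        self._stack = []

    def push(self, *exclude):
        """Save a snapshot of every variable except those in `exclude`."""
        snapshot = {k: v for k, v in self.variables.items() if k not in exclude}
        self._stack.append(snapshot)

    def pop(self):
        """Restore the saved variables; excluded ones keep their current value."""
        self.variables.update(self._stack.pop())
```

A block that changes INCOMING between push and pop therefore leaves the caller's INCOMING as it found it.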
BLOCKVAR

A temporary variable is only available in the current scope, and is destroyed upon exit.
RETURN RESULT

Used to return the values of a block's or program's temporary variables.
RETURN name
To access the variable from the calling code, use RESULT name.
The value remains valid until the next block is called.
Error handling

During script execution, various exceptional situations can arise that should not disrupt the run. That is what the exception system is for.
exceptions: exception_name [OR exception_name ... ]
CATCH FINISH

CATCH exceptions code [CATCH exceptions code ...] FINISH
Traps an error that occurred on the line immediately before the first CATCH.
Use it when an exceptional situation is expected at that point and needs to be handled.
TRY ON FINISH

TRY code ON exceptions code [ON exceptions code ...] FINISH
THROW

THROW exception
Raises an error manually.
Types of errors

  • MATCH_FAIL – failed to find any occurrence of the regexp
  • END_OF_STRING – the end of the string was reached before a match was found (implies MATCH_FAIL)
  • WRONG_REGEXP – failed to compile the regular expression
  • VARIABLE_OVERFLOW – variable overflow
  • UNSIGNED_NEGATIVE – assigning a negative value to an unsigned integer
  • WRONG_STRING_INDEXES – attempt to access a string index outside the bounds of the string
  • OUT_OF_ARRAY – attempt to access a nonexistent array element (out of bounds)
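The note that END_OF_STRING implies MATCH_FAIL maps naturally onto an exception hierarchy. In the Python sketch below all class and function names are hypothetical, and the rule for when END_OF_STRING is raised instead of plain MATCH_FAIL is an assumption made for the example.

```python
import re


class RefaError(Exception):
    """Hypothetical common base for the error types listed above."""


class MatchFail(RefaError):
    """MATCH_FAIL: no occurrence of the regexp was found."""


class EndOfString(MatchFail):
    """END_OF_STRING: subclassing models "implies MATCH_FAIL"."""


class WrongRegexp(RefaError):
    """WRONG_REGEXP: the regular expression failed to compile."""


def first_match(reg_exp: str, text: str, start: int = 0) -> str:
    try:
        pattern = re.compile(reg_exp)
    except re.error as exc:
        raise WrongRegexp(str(exc)) from exc
    if start >= len(text):
        # Assumed rule: nothing left to search counts as END_OF_STRING.
        raise EndOfString("nothing left to search")
    m = pattern.search(text, start)
    if m is None:
        raise MatchFail(f"no match for {reg_exp!r}")
    return m.group()
```

A handler written for MatchFail then also catches EndOfString, much as ON MATCH_FAIL OR END_OF_STRING lists both explicitly.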


Special constructs

~regexp~

The content is a regular expression.
%name%

At execution time, a copy of the value of the variable $name is substituted (a short stack).
#name#

An analogue of define.
^name^

A reference to a regular expression. Inside ~ ~ it is written with escaped carets, as \^name\^:
~\^hello\^ world~
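Expanding such references before compilation can be sketched in Python. The function expand and the dictionary of named pieces are hypothetical, and wrapping each piece in a non-capturing group is a design choice of this sketch, not something the text prescribes.

```python
import re


def expand(pattern: str, named: dict) -> str:
    """Replace every \\^name\\^ reference with its stored sub-pattern.

    Each sub-pattern is wrapped in a non-capturing group so that a
    quantifier following the reference applies to the whole piece.
    """
    def substitute(m):
        # A function is used so the inserted text is taken literally.
        return "(?:" + named[m.group(1)] + ")"

    return re.sub(r"\\\^(\w+)\\\^", substitute, pattern)
```

With a piece named hello defined as [Hh]ello, the pattern from the example above expands to (?:[Hh]ello) world, which can then be compiled as one ordinary regular expression.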

Working with strings

array{tile} SPLIT(delimiter) [string]
tile SELECT(start, end) [string]
PASS(count) [&string]
CUT(count) [&string]
CUT_AFTER(index) [&string]
IMPLODE (array[, delimiter]) &string
Article based on information from habrahabr.ru
