Imperative RegExp. Notation
Regular Expressions For All (REFA)
Article based on information from habrahabr.ru
Basic idea
Many systems exist for finding substrings that match a given pattern. Unfortunately, they lose their power as soon as many factors have to be taken into account: the constructs become bulky, incomprehensible, and hard to untangle.
That is why I tried to create an alternative: REFA, regular expressions for everyone.
The idea is as follows: as soon as a regular expression stops being obvious, split it in two. Where possible, the optimizer will merge the parts back into one, so no speed is lost, but the code becomes clearer.
For easier reading: ge.tt/9snPkzG/v/0 (.odt format)
Examples
Searching for C++ functions
Find the implementations of all methods of the class dummy.
The input is assumed to be one big string containing all of the project's code. Reading from a file is possible, but it would make the example harder to follow.
PROGRAM "FindMethods"
    ^name^ = ~\w?[\w|\d]*~

    BLOCK "FindClass" // find the class declaration
        PUSH
        BLOCKVAR $regexp = "class " + %classname% + "\s*\{.*\}.*;"
        MATCH $regexp
        CATCH MATCH_FAIL
            // no such class: return empty lists
            RETURN array() AS $list
            RETURN array() AS $result
        FINISH
        BLOCKVAR $class_code = MATCHED
        INCOMING = $class_code
        BLOCKVAR $method = ^name^+~\w*~+^name^+~\([\^name\^\w*\^name\^\w*\,?]*\)\w*~
        BLOCKVAR $declarations = array()
        BLOCKVAR $realisations = array()
        TRY
            WHILE 1
                MATCH PASS LIMIT 1 $method
                IF SELECT(0,1) INCOMING != ";"
                    // a body follows, so this is an implementation
                    CALL "SearchEndOfFunction" REMAINED
                    $realisations ADD (MATCHED + RESULT $body)
                ELSE
                    // a bare declaration
                    $declarations ADD MATCHED
                ENDIF
            END
        ON MATCH_FAIL OR END_OF_STRING
            RETURN $declarations AS $list
            RETURN $realisations AS $result
        FINISH
        POP
    ENDBLOCK

    BLOCK "SearchEndOfFunction"
        BLOCKVAR UINT $level = 0
        MATCH ~[\{|\}]~
        FOREACH ALL_MATCHED AS $t
            // track the brace nesting depth
            IF $t == "{"
                $level++
            ELSE
                $level--
            ENDIF
            IF $level == 0
                // the brace that closes the function body
                BLOCKVAR STRING $ret = SELECT(ALL_MATCHED[0], ALL_MATCHED[ITERATION]) INCOMING_BLOCK
                RETURN $ret AS $body
            ENDIF
        END
    ENDBLOCK

    BLOCK "AddClassName"
        MATCH PASS LIMIT 1 ^name^+"\w*"
        BLOCKVAR $ret = MATCHED
        $ret += "[\^name\^\w*::\w*]*"+%classname%+"\w*::\w*"
        $ret += REMAINED
        RETURN $ret
    ENDBLOCK

    BLOCK "SearchDeclaredFunctions"
        BLOCKVAR $dec = %declared%
        IMPLODE ($dec, "|") $string
        $string = "["+$string+"]"
        MATCH $string
        BLOCKVAR $realisations = array()
        FOREACH ALL_TILES AS $tile
            IF ITERATION % 2 == 1
                IF SELECT(0,1) INCOMING != ";"
                    CALL "SearchEndOfFunction" ALL_TILES[ITERATION + 1]
                    $realisations ADD (ALL_TILES[ITERATION] + RESULT $body)
                ENDIF
            ENDIF
        END
        RETURN $realisations AS $result
    ENDBLOCK

    // program code
    BLOCKVAR $classname = $arg1
    CALL "FindClass"
    BLOCKVAR $ret = RESULT $result
    BLOCKVAR $declared = RESULT $list
    CALL "SearchDeclaredFunctions"
    $ret ADD RESULT $result
    RETURN $ret
ENDPROGRAM
The program turned out not very small, but at least it is more or less clear. Writing a single regular expression that does the same thing is... not recommended.
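To make the brace counting in the "SearchEndOfFunction" block easier to follow, here is a rough Python sketch of the same idea. It is not part of REFA: the function name find_body_end is made up for illustration, and the sketch ignores braces inside string literals and comments, just as the REFA block does.

```python
def find_body_end(text):
    """Return the index just past the brace that closes the first '{' in text.

    Hypothetical helper: mirrors the $level counter of the REFA block.
    Raises ValueError when no balanced pair of braces is found.
    """
    level = 0
    for i, ch in enumerate(text):
        if ch == "{":
            level += 1
        elif ch == "}":
            level -= 1
            if level == 0:
                return i + 1      # position right after the closing brace
            if level < 0:
                break             # a '}' came before any '{'
    raise ValueError("no balanced braces found")


# Everything that follows a matched method signature (REMAINED in REFA terms).
remainder = " { if (x) { y(); } return; } void next() {}"
body = remainder[:find_body_end(remainder)]
print(body)
```

Only the first balanced group is returned, so the text of the following function is left untouched.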
Documentation
Data types
INT
The default type. Integer. Range -2^31 to +2^31-1. The default value is 0.
LONG
Integer. Range -2^63 to +2^63-1. The default value is 0.
UINT
ULONG
STRING
String. The maximum length is the maximum value of a UINT. Private fields: START and COUNT.
There is no default value; accessing an unset string throws an exception.
TILE
Part of a string. Private fields: START, END, COUNT, PARENT_STRING.
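A TILE can be pictured as a view into its parent string rather than a copy. The Python sketch below is only an illustration of that reading: the class and field names follow the private fields listed above, and treating END as exclusive is my assumption, since the description does not say.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    """A view into parent_string; no characters are copied until str() is called."""
    parent_string: str
    start: int
    end: int  # assumed to be exclusive

    @property
    def count(self):
        # COUNT is derived from START and END in this sketch
        return self.end - self.start

    def __str__(self):
        return self.parent_string[self.start:self.end]


source = "class dummy { };"
tile = Tile(source, 6, 11)
print(str(tile), tile.count)
```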
Predefined variables
INCOMING
The string to process. Used when no variable is specified.
INCOMING is a synonym for INCOMING_CURRENT.
- INCOMING_PROGRAM – the string passed to the program
- INCOMING_BLOCK – the string passed to the block
- INCOMING_CURRENT – the current string
- INCOMING_LAST – the string as it was before the last change
MATCHED
The first match found by the last MATCH.
ALL_MATCHED
An array with all matches of the last expression.
REMAINED
The remainder of the string, starting at the first character after MATCHED.
ALL_REMAINED
The same for every element of ALL_MATCHED: the remainder starting at the first character after each match.
ALL_TILES
The whole string cut into pieces in their original order. The odd elements are the ALL_MATCHED entries; the rest are the pieces between the matches.
ITERATION
The iteration number of the current loop. To get the iteration number of an outer loop, save it in a separate variable.
CALLSTACK
Call stack with parameters
QUERY_LOG
A log of the commands that produced, or otherwise affected, the string. The line references must be kept (the lines of the processing script, that is). The incoming data is stored in a single copy.
EXCEPTION_STRING
A string describing the error: where it occurred, the incoming parameters, and the result.
Minimum set
The commands required for basic use of the system.
MATCH [IGNORE {ignore_count|FIRST}] [PASS] [LIMIT {limit_count}] reg_exp [processing_string]
Applies reg_exp and, by default, moves the START of processing_string to MATCHED.
IGNORE – skip the first ignore_count matches. The default is IGNORE 0.
PASS – move START to the last element of ALL_REMAINED.
LIMIT – the maximum number of matches, after which the search stops. The default is LIMIT 0, which means no limit: the search runs to the end of the input.
reg_exp – either a regular expression between ~ delimiters or a variable.
processing_string – the string to process. By default, INCOMING.
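The interplay of IGNORE and LIMIT is perhaps easiest to see in ordinary code. Below is a small Python sketch built on the standard re module. The function name match and its keyword arguments are mine, not part of REFA, and raising LookupError is only a stand-in for the MATCH_FAIL exception.

```python
import re


def match(pattern, text, ignore=0, limit=0):
    """Return the matched substrings, honouring IGNORE and LIMIT.

    ignore -- number of leading matches to skip (IGNORE, default 0)
    limit  -- maximum number of matches to keep; 0 means no limit (LIMIT)
    """
    found = list(re.finditer(pattern, text))[ignore:]
    if limit > 0:
        found = found[:limit]
    if not found:
        # stand-in for MATCH_FAIL
        raise LookupError("MATCH_FAIL: " + pattern)
    return [m.group(0) for m in found]


# Roughly: MATCH IGNORE 1 LIMIT 2 ~\d+~
print(match(r"\d+", "a1 b22 c333 d4444", ignore=1, limit=2))
```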
ECHO
ECHO string
Writes the string to the result.
A simple example of regular expression replacement:
MATCH PASS ~some_regexp~
FOREACH ALL_TILES AS $tile
    IF ITERATION % 2
        // replace every matched piece with the new string
        ECHO "REPLACED"
    ELSE
        // output the pieces between the matches unchanged
        ECHO $tile
    ENDIF
END
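For comparison, the same replacement loop can be sketched in Python. The helper names all_tiles and replace_all are hypothetical. The sketch assumes that ITERATION starts at 0 and that ALL_TILES always begins with the (possibly empty) piece before the first match, which is what makes the odd elements the matches.

```python
import re


def all_tiles(pattern, text):
    """Cut text into alternating pieces: gap, match, gap, match, ..., gap."""
    tiles = []
    pos = 0
    for m in re.finditer(pattern, text):
        tiles.append(text[pos:m.start()])  # even index: text before the match
        tiles.append(m.group(0))           # odd index: the match itself
        pos = m.end()
    tiles.append(text[pos:])               # trailing piece after the last match
    return tiles


def replace_all(pattern, text, replacement):
    """Rebuild the string, swapping every odd tile for the replacement."""
    out = []
    for iteration, tile in enumerate(all_tiles(pattern, text)):
        out.append(replacement if iteration % 2 else tile)
    return "".join(out)


print(all_tiles(r"\d+", "a1b22c"))
print(replace_all(r"\d+", "a1b22c", "#"))
```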
IF ELSE ENDIF
IF expr then_statements [ELSE else_statements] ENDIF
If expr is not equal to zero, then_statements are executed; otherwise else_statements are executed.
Extended set
PROGRAM
A program is an atomic set of executable commands that performs a useful task. Only programs can have settings other than the defaults.
Generally speaking, a program can be a separate process (or thread) running in parallel. Control cannot jump from one program into another, but a program can use the methods of neighboring programs (if they are declared). Programs can call other programs.
The program is the scope for all of its blocks.
By default, all commands belong to a program with an empty name (it cannot be called from other programs).
PROGRAM name arg0 [arg1 arg2 ... ] code ENDPROGRAM
name – the name of the program
arg0 – the string to process. Becomes INCOMING_PROGRAM
code – the code of the program, including declarations.
Blocks are accessed as program_name::block_name.
BLOCK
BLOCK name [string]
A dependent piece of code. Equivalent to two goto jumps (into the block and back). If a string is specified, the corresponding INCOMING is changed before the block runs and restored after it returns.
PUSH POP
PUSH [var1 var2]
Saves the state of the system variables. Local variables can also be saved by listing them, and individual system variables can be explicitly excluded with the !var construct.
POP – restores the state as it was at the moment of PUSH.
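PUSH and POP behave like a stack of snapshots. The Python sketch below shows one possible reading. The State class, the three-name list of system variables, and the exclude argument (standing in for !var) are all assumptions made for illustration.

```python
class State:
    """Holds variables plus a stack of saved snapshots."""

    # hypothetical subset of the system variables
    SYSTEM = ("INCOMING", "MATCHED", "REMAINED")

    def __init__(self, **variables):
        self.vars = dict(variables)
        self._stack = []

    def push(self, *extra, exclude=()):
        """PUSH [var1 var2]: snapshot system variables plus the listed locals."""
        names = [n for n in self.SYSTEM + extra if n not in exclude]
        self._stack.append({n: self.vars.get(n) for n in names})

    def pop(self):
        """POP: restore the most recent snapshot."""
        self.vars.update(self._stack.pop())


s = State(INCOMING="abc", MATCHED="a", REMAINED="bc", level=1)
s.push("level", exclude=("MATCHED",))      # roughly: PUSH level !MATCHED
s.vars.update(INCOMING="x", MATCHED="y", level=5)
s.pop()
print(s.vars)
```

After the pop, INCOMING and level are back to their saved values, while the excluded MATCHED keeps whatever was assigned in between.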
BLOCKVAR
A temporary variable that is available only in the current scope and is destroyed on exit.
RETURN RESULT
Used to return the value of a temporary variable from a block or program.
RETURN name
To access the variable from the calling code, use RESULT name.
The value remains valid until the next block is called.
Error handling
While a script runs, various exceptional situations can arise that should not disrupt its execution. That is what the exception system is for.
exceptions: exception_name [OR exception_name ... ]
CATCH FINISH
CATCH exceptions code [CATCH exceptions code ...] FINISH
Traps an error raised on the line immediately before the first CATCH block.
Use it when the exceptional situation is expected at that point and can be handled.
TRY ON FINISH
TRY code ON exceptions code [ON exceptions code ...] FINISH
THROW
THROW exception
Raises an error manually.
Types of errors
- MATCH_FAIL – no occurrence of the regexp was found
- END_OF_STRING – the end of the string was reached before a match was found (implies MATCH_FAIL)
- WRONG_REGEXP – the regular expression failed to compile
- VARIABLE_OVERFLOW – a variable overflowed
- UNSIGNED_NEGATIVE – a negative value was assigned to an unsigned integer
- WRONG_STRING_INDEXES – an attempt to access a string index that is out of bounds
- OUT_OF_ARRAY – an attempt to access a nonexistent array element (out of range)
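The note that END_OF_STRING implies MATCH_FAIL maps naturally onto an exception hierarchy. The Python sketch below is one way to picture it, with TRY ... ON ... FINISH rendered as try/except. The class and function names are invented for the example, and treating an empty match as a failure is my own guard against an endless loop.

```python
import re


class MatchFail(Exception):
    """Stand-in for MATCH_FAIL."""


class EndOfString(MatchFail):
    """Stand-in for END_OF_STRING; a subclass because it implies MATCH_FAIL."""


def match_once(pattern, text, start):
    """Find one match at or after start; return (matched_text, new_start)."""
    if start >= len(text):
        raise EndOfString(pattern)
    m = re.compile(pattern).search(text, start)
    if m is None or m.end() == m.start():
        # no match, or an empty one (assumption: treated as a failure)
        raise MatchFail(pattern)
    return m.group(0), m.end()


def collect(pattern, text):
    """Roughly: TRY WHILE 1 MATCH PASS LIMIT 1 ... ON MATCH_FAIL ... FINISH."""
    found, pos = [], 0
    try:
        while True:
            piece, pos = match_once(pattern, text, pos)
            found.append(piece)
    except MatchFail:      # also catches EndOfString
        return found


print(collect(r"\w+", "a bb ccc"))
```

A handler written for MATCH_FAIL alone therefore also covers running off the end of the string, which is how the loop in the example program terminates.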
Special constructs
~regexp~
The content is a regular expression.
%name%
At run time, evaluates to a copy of the value of the variable $name (a short stack).
#name#
An analogue of #define in C.
^name^
A reference to a named regular expression. Inside ~ ~ it is written with escaped carets, as \^name\^:
~\^hello\^ world~
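One way to picture what happens to such a reference is plain textual substitution before the expression is compiled. The Python sketch below does exactly that. The function expand and the dictionary of named patterns are hypothetical, wrapping each substituted pattern in a non-capturing group is my choice, and nested references are not handled.

```python
import re


def expand(pattern, named):
    """Replace every \\^name\\^ reference with the named sub-pattern."""
    def substitute(m):
        # group the sub-pattern so that following quantifiers apply to all of it
        return "(?:" + named[m.group(1)] + ")"
    return re.sub(r"\\\^(\w+)\\\^", substitute, pattern)


named = {"hello": r"[Hh]ello"}
full = expand(r"\^hello\^ world", named)
print(full)
print(bool(re.fullmatch(full, "Hello world")))
```

This is the "split it in two" idea in miniature: the pieces stay readable on their own, and the engine still sees a single expression.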
Working with strings
array{tile} SPLIT(delimiter) [string]
tile SELECT(start, end) [string]
PASS(count) [&string]
CUT(count) [&string]
CUT_AFTER(index) [&string]
IMPLODE (array[, delimiter]) &string