LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								/*
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Copyright  ( c )  2020 ,  Emanuel  Sprung  < emanuel . sprung @ gmail . com > 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  All  rights  reserved . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  Redistribution  and  use  in  source  and  binary  forms ,  with  or  without 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  modification ,  are  permitted  provided  that  the  following  conditions  are  met : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  1.  Redistributions  of  source  code  must  retain  the  above  copyright  notice ,  this 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *     list  of  conditions  and  the  following  disclaimer . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  2.  Redistributions  in  binary  form  must  reproduce  the  above  copyright  notice , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *     this  list  of  conditions  and  the  following  disclaimer  in  the  documentation 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *     and / or  other  materials  provided  with  the  distribution . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 * 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  THIS  SOFTWARE  IS  PROVIDED  BY  THE  COPYRIGHT  HOLDERS  AND  CONTRIBUTORS  " AS IS " 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  AND  ANY  EXPRESS  OR  IMPLIED  WARRANTIES ,  INCLUDING ,  BUT  NOT  LIMITED  TO ,  THE 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  IMPLIED  WARRANTIES  OF  MERCHANTABILITY  AND  FITNESS  FOR  A  PARTICULAR  PURPOSE  ARE 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  DISCLAIMED .  IN  NO  EVENT  SHALL  THE  COPYRIGHT  HOLDER  OR  CONTRIBUTORS  BE  LIABLE 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  FOR  ANY  DIRECT ,  INDIRECT ,  INCIDENTAL ,  SPECIAL ,  EXEMPLARY ,  OR  CONSEQUENTIAL 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  DAMAGES  ( INCLUDING ,  BUT  NOT  LIMITED  TO ,  PROCUREMENT  OF  SUBSTITUTE  GOODS  OR 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  SERVICES ;  LOSS  OF  USE ,  DATA ,  OR  PROFITS ;  OR  BUSINESS  INTERRUPTION )  HOWEVER 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  CAUSED  AND  ON  ANY  THEORY  OF  LIABILITY ,  WHETHER  IN  CONTRACT ,  STRICT  LIABILITY , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  OR  TORT  ( INCLUDING  NEGLIGENCE  OR  OTHERWISE )  ARISING  IN  ANY  WAY  OUT  OF  THE  USE 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 *  OF  THIS  SOFTWARE ,  EVEN  IF  ADVISED  OF  THE  POSSIBILITY  OF  SUCH  DAMAGE . 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								 */ 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# include  "RegexByteCode.h" 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# include  "AK/StringBuilder.h" 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# include  "RegexDebug.h" 
  
						 
					
						
							
								
									
										
										
										
											2021-01-17 16:06:30 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								# include  <AK/Debug.h> 
  
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# include  <ctype.h> 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								namespace  regex  {  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								const  char *  OpCode : : name ( OpCodeId  opcode_id )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    switch  ( opcode_id )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# define __ENUMERATE_OPCODE(x) \ 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    case  OpCodeId : : x :          \
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  # x ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        ENUMERATE_OPCODES 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# undef __ENUMERATE_OPCODE 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    default : 
							 
						 
					
						
							
								
									
										
										
										
											2021-02-23 20:42:32 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        VERIFY_NOT_REACHED ( ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								        return  " <Unknown> " ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								const  char *  OpCode : : name ( )  const  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return  name ( opcode_id ( ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								const  char *  execution_result_name ( ExecutionResult  result )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    switch  ( result )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# define __ENUMERATE_EXECUTION_RESULT(x) \ 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    case  ExecutionResult : : x :             \
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  # x ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        ENUMERATE_EXECUTION_RESULTS 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# undef __ENUMERATE_EXECUTION_RESULT 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    default : 
							 
						 
					
						
							
								
									
										
										
										
											2021-02-23 20:42:32 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        VERIFY_NOT_REACHED ( ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								        return  " <Unknown> " ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								const  char *  boundary_check_type_name ( BoundaryCheckType  ty )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    switch  ( ty )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# define __ENUMERATE_BOUNDARY_CHECK_TYPE(x) \ 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    case  BoundaryCheckType : : x :              \
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  # x ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        ENUMERATE_BOUNDARY_CHECK_TYPES 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# undef __ENUMERATE_BOUNDARY_CHECK_TYPE 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    default : 
							 
						 
					
						
							
								
									
										
										
										
											2021-02-23 20:42:32 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        VERIFY_NOT_REACHED ( ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        return  " <Unknown> " ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								const  char *  character_compare_type_name ( CharacterCompareType  ch_compare_type )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    switch  ( ch_compare_type )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# define __ENUMERATE_CHARACTER_COMPARE_TYPE(x) \ 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    case  CharacterCompareType : : x :              \
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  # x ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        ENUMERATE_CHARACTER_COMPARE_TYPES 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# undef __ENUMERATE_CHARACTER_COMPARE_TYPE 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    default : 
							 
						 
					
						
							
								
									
										
										
										
											2021-02-23 20:42:32 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        VERIFY_NOT_REACHED ( ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								        return  " <Unknown> " ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								static  const  char *  character_class_name ( CharClass  ch_class )  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    switch  ( ch_class )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# define __ENUMERATE_CHARACTER_CLASS(x) \ 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    case  CharClass : : x :                  \
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  # x ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        ENUMERATE_CHARACTER_CLASSES 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								# undef __ENUMERATE_CHARACTER_CLASS 
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    default : 
							 
						 
					
						
							
								
									
										
										
										
											2021-02-23 20:42:32 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        VERIFY_NOT_REACHED ( ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								        return  " <Unknown> " ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								HashMap < u32 ,  OwnPtr < OpCode > >  ByteCode : : s_opcodes  { } ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								ALWAYS_INLINE  OpCode *  ByteCode : : get_opcode_by_id ( OpCodeId  id )  const  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( ! s_opcodes . size ( ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        for  ( u32  i  =  ( u32 ) OpCodeId : : First ;  i  < =  ( u32 ) OpCodeId : : Last ;  + + i )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            switch  ( ( OpCodeId ) i )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            case  OpCodeId : : Exit : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                s_opcodes . set ( i ,  make < OpCode_Exit > ( * const_cast < ByteCode * > ( this ) ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            case  OpCodeId : : Jump : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                s_opcodes . set ( i ,  make < OpCode_Jump > ( * const_cast < ByteCode * > ( this ) ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            case  OpCodeId : : Compare : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                s_opcodes . set ( i ,  make < OpCode_Compare > ( * const_cast < ByteCode * > ( this ) ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            case  OpCodeId : : CheckEnd : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                s_opcodes . set ( i ,  make < OpCode_CheckEnd > ( * const_cast < ByteCode * > ( this ) ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                break ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            case  OpCodeId : : CheckBoundary : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                s_opcodes . set ( i ,  make < OpCode_CheckBoundary > ( * const_cast < ByteCode * > ( this ) ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                break ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								            case  OpCodeId : : ForkJump : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                s_opcodes . set ( i ,  make < OpCode_ForkJump > ( * const_cast < ByteCode * > ( this ) ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            case  OpCodeId : : ForkStay : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                s_opcodes . set ( i ,  make < OpCode_ForkStay > ( * const_cast < ByteCode * > ( this ) ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                break ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            case  OpCodeId : : FailForks : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                s_opcodes . set ( i ,  make < OpCode_FailForks > ( * const_cast < ByteCode * > ( this ) ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            case  OpCodeId : : Save : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                s_opcodes . set ( i ,  make < OpCode_Save > ( * const_cast < ByteCode * > ( this ) ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            case  OpCodeId : : Restore : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                s_opcodes . set ( i ,  make < OpCode_Restore > ( * const_cast < ByteCode * > ( this ) ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            case  OpCodeId : : GoBack : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                s_opcodes . set ( i ,  make < OpCode_GoBack > ( * const_cast < ByteCode * > ( this ) ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                break ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								            case  OpCodeId : : CheckBegin : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                s_opcodes . set ( i ,  make < OpCode_CheckBegin > ( * const_cast < ByteCode * > ( this ) ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            case  OpCodeId : : SaveLeftCaptureGroup : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                s_opcodes . set ( i ,  make < OpCode_SaveLeftCaptureGroup > ( * const_cast < ByteCode * > ( this ) ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            case  OpCodeId : : SaveRightCaptureGroup : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                s_opcodes . set ( i ,  make < OpCode_SaveRightCaptureGroup > ( * const_cast < ByteCode * > ( this ) ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            case  OpCodeId : : SaveLeftNamedCaptureGroup : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                s_opcodes . set ( i ,  make < OpCode_SaveLeftNamedCaptureGroup > ( * const_cast < ByteCode * > ( this ) ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            case  OpCodeId : : SaveRightNamedCaptureGroup : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                s_opcodes . set ( i ,  make < OpCode_SaveRightNamedCaptureGroup > ( * const_cast < ByteCode * > ( this ) ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( id  >  OpCodeId : : Last ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  nullptr ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return  const_cast < OpCode * > ( s_opcodes . get ( ( u32 ) id ) . value ( ) ) - > set_bytecode ( * const_cast < ByteCode * > ( this ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								OpCode *  ByteCode : : get_opcode ( MatchState &  state )  const  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    OpCode *  op_code ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( state . instruction_position  > =  size ( ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        op_code  =  get_opcode_by_id ( OpCodeId : : Exit ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    }  else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        op_code  =  get_opcode_by_id ( ( OpCodeId ) at ( state . instruction_position ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( op_code ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        op_code - > set_state ( state ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return  op_code ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								ALWAYS_INLINE  ExecutionResult  OpCode_Exit : : execute ( const  MatchInput &  input ,  MatchState &  state ,  MatchOutput & )  const  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( state . string_position  >  input . view . length ( )  | |  state . instruction_position  > =  m_bytecode - > size ( ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  ExecutionResult : : Succeeded ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return  ExecutionResult : : Failed ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								ALWAYS_INLINE  ExecutionResult  OpCode_Save : : execute ( const  MatchInput &  input ,  MatchState &  state ,  MatchOutput & )  const  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    input . saved_positions . append ( state . string_position ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return  ExecutionResult : : Continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								ALWAYS_INLINE  ExecutionResult  OpCode_Restore : : execute ( const  MatchInput &  input ,  MatchState &  state ,  MatchOutput & )  const  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( input . saved_positions . is_empty ( ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  ExecutionResult : : Failed ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    state . string_position  =  input . saved_positions . take_last ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return  ExecutionResult : : Continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								ALWAYS_INLINE  ExecutionResult  OpCode_GoBack : : execute ( const  MatchInput & ,  MatchState &  state ,  MatchOutput & )  const  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( count ( )  >  state . string_position ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  ExecutionResult : : Failed_ExecuteLowPrioForks ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    state . string_position  - =  count ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return  ExecutionResult : : Continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								ALWAYS_INLINE  ExecutionResult  OpCode_FailForks : : execute ( const  MatchInput &  input ,  MatchState & ,  MatchOutput & )  const  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2021-02-23 20:42:32 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    VERIFY ( count ( )  >  0 ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    input . fail_counter  + =  count ( )  -  1 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return  ExecutionResult : : Failed_ExecuteLowPrioForks ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								ALWAYS_INLINE  ExecutionResult  OpCode_Jump : : execute ( const  MatchInput & ,  MatchState &  state ,  MatchOutput & )  const  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    state . instruction_position  + =  offset ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return  ExecutionResult : : Continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								ALWAYS_INLINE  ExecutionResult  OpCode_ForkJump : : execute ( const  MatchInput & ,  MatchState &  state ,  MatchOutput & )  const  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    state . fork_at_position  =  state . instruction_position  +  size ( )  +  offset ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return  ExecutionResult : : Fork_PrioHigh ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								ALWAYS_INLINE  ExecutionResult  OpCode_ForkStay : : execute ( const  MatchInput & ,  MatchState &  state ,  MatchOutput & )  const  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    state . fork_at_position  =  state . instruction_position  +  size ( )  +  offset ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return  ExecutionResult : : Fork_PrioLow ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								ALWAYS_INLINE  ExecutionResult  OpCode_CheckBegin : : execute ( const  MatchInput &  input ,  MatchState &  state ,  MatchOutput & )  const  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( 0  = =  state . string_position  & &  ( input . regex_options  &  AllFlags : : MatchNotBeginOfLine ) ) 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-19 18:57:39 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        return  ExecutionResult : : Failed_ExecuteLowPrioForks ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( ( 0  = =  state . string_position  & &  ! ( input . regex_options  &  AllFlags : : MatchNotBeginOfLine ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        | |  ( 0  ! =  state . string_position  & &  ( input . regex_options  &  AllFlags : : MatchNotBeginOfLine ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        | |  ( 0  = =  state . string_position  & &  ( input . regex_options  &  AllFlags : : Global ) ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  ExecutionResult : : Continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-19 18:57:39 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    return  ExecutionResult : : Failed_ExecuteLowPrioForks ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								ALWAYS_INLINE  ExecutionResult  OpCode_CheckBoundary : : execute ( const  MatchInput &  input ,  MatchState &  state ,  MatchOutput & )  const  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    auto  isword  =  [ ] ( auto  ch )  {  return  isalnum ( ch )  | |  ch  = =  ' _ ' ;  } ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    auto  is_word_boundary  =  [ & ]  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if  ( state . string_position  = =  input . view . length ( ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if  ( state . string_position  >  0  & &  isword ( input . view [ state . string_position  -  1 ] ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if  ( state . string_position  = =  0 )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if  ( isword ( input . view [ 0 ] ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  ! ! ( isword ( input . view [ state . string_position ] )  ^  isword ( input . view [ state . string_position  -  1 ] ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    switch  ( type ( ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    case  BoundaryCheckType : : Word :  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if  ( is_word_boundary ( ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            return  ExecutionResult : : Continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  ExecutionResult : : Failed_ExecuteLowPrioForks ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    case  BoundaryCheckType : : NonWord :  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if  ( ! is_word_boundary ( ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            return  ExecutionResult : : Continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  ExecutionResult : : Failed_ExecuteLowPrioForks ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
									
										
										
										
											2021-02-23 20:42:32 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    VERIFY_NOT_REACHED ( ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								ALWAYS_INLINE  ExecutionResult  OpCode_CheckEnd : : execute ( const  MatchInput &  input ,  MatchState &  state ,  MatchOutput & )  const  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( state . string_position  = =  input . view . length ( )  & &  ( input . regex_options  &  AllFlags : : MatchNotEndOfLine ) ) 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-19 18:57:39 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        return  ExecutionResult : : Failed_ExecuteLowPrioForks ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( ( state . string_position  = =  input . view . length ( )  & &  ! ( input . regex_options  &  AllFlags : : MatchNotEndOfLine ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        | |  ( state . string_position  ! =  input . view . length ( )  & &  ( input . regex_options  &  AllFlags : : MatchNotEndOfLine  | |  input . regex_options  &  AllFlags : : MatchNotBeginOfLine ) ) ) 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-19 18:57:39 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        return  ExecutionResult : : Continue ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-19 18:57:39 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    return  ExecutionResult : : Failed_ExecuteLowPrioForks ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								ALWAYS_INLINE  ExecutionResult  OpCode_SaveLeftCaptureGroup : : execute ( const  MatchInput &  input ,  MatchState &  state ,  MatchOutput &  output )  const  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( input . match_index  > =  output . capture_group_matches . size ( ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        output . capture_group_matches . ensure_capacity ( input . match_index ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        auto  capacity  =  output . capture_group_matches . capacity ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        for  ( size_t  i  =  output . capture_group_matches . size ( ) ;  i  < =  capacity ;  + + i ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            output . capture_group_matches . empend ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( id ( )  > =  output . capture_group_matches . at ( input . match_index ) . size ( ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        output . capture_group_matches . at ( input . match_index ) . ensure_capacity ( id ( ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        auto  capacity  =  output . capture_group_matches . at ( input . match_index ) . capacity ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        for  ( size_t  i  =  output . capture_group_matches . at ( input . match_index ) . size ( ) ;  i  < =  capacity ;  + + i ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            output . capture_group_matches . at ( input . match_index ) . empend ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    output . capture_group_matches . at ( input . match_index ) . at ( id ( ) ) . left_column  =  state . string_position ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return  ExecutionResult : : Continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								ALWAYS_INLINE  ExecutionResult  OpCode_SaveRightCaptureGroup : : execute ( const  MatchInput &  input ,  MatchState &  state ,  MatchOutput &  output )  const  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    auto &  match  =  output . capture_group_matches . at ( input . match_index ) . at ( id ( ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    auto  start_position  =  match . left_column ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    auto  length  =  state . string_position  -  start_position ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( start_position  <  match . column ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  ExecutionResult : : Continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-02-23 20:42:32 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    VERIFY ( start_position  +  length  < =  input . view . length ( ) ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    auto  view  =  input . view . substring_view ( start_position ,  length ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( input . regex_options  &  AllFlags : : StringCopyMatches )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        match  =  {  view . to_string ( ) ,  input . line ,  start_position ,  input . global_offset  +  start_position  } ;  // create a copy of the original string
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    }  else  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        match  =  {  view ,  input . line ,  start_position ,  input . global_offset  +  start_position  } ;  // take view to original string
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return  ExecutionResult : : Continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								ALWAYS_INLINE  ExecutionResult  OpCode_SaveLeftNamedCaptureGroup : : execute ( const  MatchInput &  input ,  MatchState &  state ,  MatchOutput &  output )  const  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( input . match_index  > =  output . named_capture_group_matches . size ( ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        output . named_capture_group_matches . ensure_capacity ( input . match_index ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        auto  capacity  =  output . named_capture_group_matches . capacity ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        for  ( size_t  i  =  output . named_capture_group_matches . size ( ) ;  i  < =  capacity ;  + + i ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            output . named_capture_group_matches . empend ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    output . named_capture_group_matches . at ( input . match_index ) . ensure ( name ( ) ) . column  =  state . string_position ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return  ExecutionResult : : Continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								ALWAYS_INLINE  ExecutionResult  OpCode_SaveRightNamedCaptureGroup : : execute ( const  MatchInput &  input ,  MatchState &  state ,  MatchOutput &  output )  const  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    StringView  capture_group_name  =  name ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( output . named_capture_group_matches . at ( input . match_index ) . contains ( capture_group_name ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        auto  start_position  =  output . named_capture_group_matches . at ( input . match_index ) . ensure ( capture_group_name ) . column ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        auto  length  =  state . string_position  -  start_position ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        auto &  map  =  output . named_capture_group_matches . at ( input . match_index ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-01-23 23:59:27 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if  constexpr  ( REGEX_DEBUG )  { 
							 
						 
					
						
							
								
									
										
										
										
											2021-02-23 20:42:32 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            VERIFY ( start_position  +  length  < =  input . view . length ( ) ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2021-01-17 16:06:30 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            dbgln ( " Save named capture group with name={} and content='{}' " ,  capture_group_name ,  input . view . substring_view ( start_position ,  length ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-02-23 20:42:32 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        VERIFY ( start_position  +  length  < =  input . view . length ( ) ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								        auto  view  =  input . view . substring_view ( start_position ,  length ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if  ( input . regex_options  &  AllFlags : : StringCopyMatches )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            map . set ( capture_group_name ,  {  view . to_string ( ) ,  input . line ,  start_position ,  input . global_offset  +  start_position  } ) ;  // create a copy of the original string
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        }  else  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            map . set ( capture_group_name ,  {  view ,  input . line ,  start_position ,  input . global_offset  +  start_position  } ) ;  // take view to original string
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    }  else  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        fprintf ( stderr ,  " Didn't find corresponding capture group match for name=%s, match_index=%lu \n " ,  capture_group_name . to_string ( ) . characters ( ) ,  input . match_index ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return  ExecutionResult : : Continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								ALWAYS_INLINE  ExecutionResult  OpCode_Compare : : execute ( const  MatchInput &  input ,  MatchState &  state ,  MatchOutput &  output )  const  
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    bool  inverse  {  false  } ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    bool  temporary_inverse  {  false  } ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    bool  reset_temp_inverse  {  false  } ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    auto  current_inversion_state  =  [ & ] ( )  - >  bool  {  return  temporary_inverse  ^  inverse ;  } ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    size_t  string_position  =  state . string_position ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    bool  inverse_matched  {  false  } ; 
							 
						 
					
						
							
								
									
										
										
										
											2021-04-01 18:30:47 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    bool  had_zero_length_match  {  false  } ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    size_t  offset  {  state . instruction_position  +  3  } ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    for  ( size_t  i  =  0 ;  i  <  arguments_count ( ) ;  + + i )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if  ( state . string_position  >  string_position ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if  ( reset_temp_inverse )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            reset_temp_inverse  =  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            temporary_inverse  =  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        }  else  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            reset_temp_inverse  =  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								        auto  compare_type  =  ( CharacterCompareType ) m_bytecode - > at ( offset + + ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if  ( compare_type  = =  CharacterCompareType : : Inverse ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            inverse  =  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        else  if  ( compare_type  = =  CharacterCompareType : : TemporaryInverse )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            // If "TemporaryInverse" is given, negate the current inversion state only for the next opcode.
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            // it follows that this cannot be the last compare element.
 
							 
						 
					
						
							
								
									
										
										
										
											2021-02-23 20:42:32 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            VERIFY ( i  ! =  arguments_count ( )  -  1 ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            temporary_inverse  =  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            reset_temp_inverse  =  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        }  else  if  ( compare_type  = =  CharacterCompareType : : Char )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            u32  ch  =  m_bytecode - > at ( offset + + ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            // We want to compare a string that is longer or equal in length to the available string
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if  ( input . view . length ( )  -  state . string_position  <  1 ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                return  ExecutionResult : : Failed_ExecuteLowPrioForks ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            compare_char ( input ,  state ,  ch ,  current_inversion_state ( ) ,  inverse_matched ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        }  else  if  ( compare_type  = =  CharacterCompareType : : AnyChar )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            // We want to compare a string that is definitely longer than the available string
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if  ( input . view . length ( )  -  state . string_position  <  1 ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                return  ExecutionResult : : Failed_ExecuteLowPrioForks ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-02-23 20:42:32 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            VERIFY ( ! current_inversion_state ( ) ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								            + + state . string_position ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        }  else  if  ( compare_type  = =  CharacterCompareType : : String )  { 
							 
						 
					
						
							
								
									
										
										
										
											2021-02-23 20:42:32 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            VERIFY ( ! current_inversion_state ( ) ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-12-06 17:02:03 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            const  auto &  length  =  m_bytecode - > at ( offset + + ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            StringBuilder  str_builder ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            for  ( size_t  i  =  0 ;  i  <  length ;  + + i ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                str_builder . append ( m_bytecode - > at ( offset + + ) ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            // We want to compare a string that is definitely longer than the available string
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if  ( input . view . length ( )  -  state . string_position  <  length ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                return  ExecutionResult : : Failed_ExecuteLowPrioForks ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-04-01 18:30:47 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            if  ( ! compare_string ( input ,  state ,  str_builder . string_view ( ) . characters_without_null_termination ( ) ,  length ,  had_zero_length_match ) ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								                return  ExecutionResult : : Failed_ExecuteLowPrioForks ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        }  else  if  ( compare_type  = =  CharacterCompareType : : CharClass )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if  ( input . view . length ( )  -  state . string_position  <  1 ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                return  ExecutionResult : : Failed_ExecuteLowPrioForks ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            auto  character_class  =  ( CharClass ) m_bytecode - > at ( offset + + ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-06-09 00:15:09 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            auto  ch  =  input . view [ state . string_position ] ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            compare_character_class ( input ,  state ,  character_class ,  ch ,  current_inversion_state ( ) ,  inverse_matched ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        }  else  if  ( compare_type  = =  CharacterCompareType : : CharRange )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            auto  value  =  ( CharRange ) m_bytecode - > at ( offset + + ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            auto  from  =  value . from ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            auto  to  =  value . to ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            auto  ch  =  input . view [ state . string_position ] ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            compare_character_range ( input ,  state ,  from ,  to ,  ch ,  current_inversion_state ( ) ,  inverse_matched ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        }  else  if  ( compare_type  = =  CharacterCompareType : : Reference )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            auto  reference_number  =  ( size_t ) m_bytecode - > at ( offset + + ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            auto &  groups  =  output . capture_group_matches . at ( input . match_index ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if  ( groups . size ( )  < =  reference_number ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                return  ExecutionResult : : Failed_ExecuteLowPrioForks ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            auto  str  =  groups . at ( reference_number ) . view ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            // We want to compare a string that is definitely longer than the available string
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if  ( input . view . length ( )  -  state . string_position  <  str . length ( ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                return  ExecutionResult : : Failed_ExecuteLowPrioForks ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-04-01 18:30:47 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            if  ( ! compare_string ( input ,  state ,  str . characters_without_null_termination ( ) ,  str . length ( ) ,  had_zero_length_match ) ) 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                return  ExecutionResult : : Failed_ExecuteLowPrioForks ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        }  else  if  ( compare_type  = =  CharacterCompareType : : NamedReference )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            auto  ptr  =  ( const  char * ) m_bytecode - > at ( offset + + ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            auto  length  =  ( size_t ) m_bytecode - > at ( offset + + ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            StringView  name  {  ptr ,  length  } ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            auto  group  =  output . named_capture_group_matches . at ( input . match_index ) . get ( name ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if  ( ! group . has_value ( ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                return  ExecutionResult : : Failed_ExecuteLowPrioForks ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            auto  str  =  group . value ( ) . view ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            // We want to compare a string that is definitely longer than the available string
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if  ( input . view . length ( )  -  state . string_position  <  str . length ( ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                return  ExecutionResult : : Failed_ExecuteLowPrioForks ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-04-01 18:30:47 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            if  ( ! compare_string ( input ,  state ,  str . characters_without_null_termination ( ) ,  str . length ( ) ,  had_zero_length_match ) ) 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                return  ExecutionResult : : Failed_ExecuteLowPrioForks ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        }  else  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            fprintf ( stderr ,  " Undefined comparison: %i \n " ,  ( int ) compare_type ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2021-02-23 20:42:32 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            VERIFY_NOT_REACHED ( ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								            break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if  ( current_inversion_state ( )  & &  ! inverse_matched ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								        + + state . string_position ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-04-01 18:30:47 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if  ( ( ! had_zero_length_match  & &  string_position  = =  state . string_position )  | |  state . string_position  >  input . view . length ( ) ) 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								        return  ExecutionResult : : Failed_ExecuteLowPrioForks ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return  ExecutionResult : : Continue ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-06-09 00:15:09 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								ALWAYS_INLINE  void  OpCode_Compare : : compare_char ( const  MatchInput &  input ,  MatchState &  state ,  u32  ch1 ,  bool  inverse ,  bool &  inverse_matched )  
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2020-06-09 00:15:09 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    u32  ch2  =  input . view [ state . string_position ] ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( input . regex_options  &  AllFlags : : Insensitive )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        ch1  =  tolower ( ch1 ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        ch2  =  tolower ( ch2 ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( ch1  = =  ch2 )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if  ( inverse ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            inverse_matched  =  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            + + state . string_position ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-04-01 18:30:47 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								ALWAYS_INLINE  bool  OpCode_Compare : : compare_string ( const  MatchInput &  input ,  MatchState &  state ,  const  char *  str ,  size_t  length ,  bool &  had_zero_length_match )  
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2020-06-09 00:15:09 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if  ( input . view . is_u8_view ( ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        auto  str_view1  =  StringView ( str ,  length ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        auto  str_view2  =  StringView ( & input . view . u8view ( ) [ state . string_position ] ,  length ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        String  str1 ,  str2 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if  ( input . regex_options  &  AllFlags : : Insensitive )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            str1  =  str_view1 . to_string ( ) . to_lowercase ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            str2  =  str_view2 . to_string ( ) . to_lowercase ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            str_view1  =  str1 . view ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            str_view2  =  str2 . view ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if  ( str_view1  = =  str_view2 )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            state . string_position  + =  length ; 
							 
						 
					
						
							
								
									
										
										
										
											2021-04-01 18:30:47 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            if  ( length  = =  0 ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                had_zero_length_match  =  true ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-06-09 00:15:09 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-06-09 00:15:09 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    return  false ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-19 18:45:36 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								ALWAYS_INLINE  void  OpCode_Compare : : compare_character_class ( const  MatchInput &  input ,  MatchState &  state ,  CharClass  character_class ,  u32  ch ,  bool  inverse ,  bool &  inverse_matched )  
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    switch  ( character_class )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    case  CharClass : : Alnum : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if  ( isalnum ( ch ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if  ( inverse ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                inverse_matched  =  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                + + state . string_position ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    case  CharClass : : Alpha : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if  ( isalpha ( ch ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            + + state . string_position ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    case  CharClass : : Blank : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if  ( ch  = =  '   '  | |  ch  = =  ' \t ' )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if  ( inverse ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                inverse_matched  =  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                + + state . string_position ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    case  CharClass : : Cntrl : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if  ( iscntrl ( ch ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if  ( inverse ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                inverse_matched  =  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                + + state . string_position ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    case  CharClass : : Digit : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if  ( isdigit ( ch ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if  ( inverse ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                inverse_matched  =  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                + + state . string_position ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    case  CharClass : : Graph : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if  ( isgraph ( ch ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if  ( inverse ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                inverse_matched  =  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                + + state . string_position ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    case  CharClass : : Lower : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if  ( islower ( ch )  | |  ( ( input . regex_options  &  AllFlags : : Insensitive )  & &  isupper ( ch ) ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if  ( inverse ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                inverse_matched  =  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                + + state . string_position ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    case  CharClass : : Print : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if  ( isprint ( ch ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if  ( inverse ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                inverse_matched  =  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                + + state . string_position ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    case  CharClass : : Punct : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if  ( ispunct ( ch ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if  ( inverse ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                inverse_matched  =  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                + + state . string_position ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    case  CharClass : : Space : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if  ( isspace ( ch ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if  ( inverse ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                inverse_matched  =  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                + + state . string_position ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    case  CharClass : : Upper : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if  ( isupper ( ch )  | |  ( ( input . regex_options  &  AllFlags : : Insensitive )  & &  islower ( ch ) ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if  ( inverse ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                inverse_matched  =  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                + + state . string_position ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        break ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    case  CharClass : : Word : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if  ( isalnum ( ch )  | |  ch  = =  ' _ ' )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if  ( inverse ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                inverse_matched  =  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                + + state . string_position ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        break ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								    case  CharClass : : Xdigit : 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if  ( isxdigit ( ch ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if  ( inverse ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                inverse_matched  =  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                + + state . string_position ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-19 18:45:36 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								ALWAYS_INLINE  void  OpCode_Compare : : compare_character_range ( const  MatchInput &  input ,  MatchState &  state ,  u32  from ,  u32  to ,  u32  ch ,  bool  inverse ,  bool &  inverse_matched )  
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( input . regex_options  &  AllFlags : : Insensitive )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        from  =  tolower ( from ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        to  =  tolower ( to ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        ch  =  tolower ( ch ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( ch  > =  from  & &  ch  < =  to )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if  ( inverse ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            inverse_matched  =  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            + + state . string_position ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								const  String  OpCode_Compare : : arguments_string ( )  const  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return  String : : format ( " argc=%lu, args=%lu  " ,  arguments_count ( ) ,  arguments_size ( ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								const  Vector < String >  OpCode_Compare : : variable_arguments_to_string ( Optional < MatchInput >  input )  const  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    Vector < String >  result ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    size_t  offset  {  state ( ) . instruction_position  +  3  } ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-06-09 00:15:09 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    RegexStringView  view  =  ( ( input . has_value ( ) )  ?  input . value ( ) . view  :  nullptr ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    for  ( size_t  i  =  0 ;  i  <  arguments_count ( ) ;  + + i )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        auto  compare_type  =  ( CharacterCompareType ) m_bytecode - > at ( offset + + ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        result . empend ( String : : formatted ( " type={} [{}] " ,  ( size_t ) compare_type ,  character_compare_type_name ( compare_type ) ) ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 23:33:45 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        auto  compared_against_string_start_offset  =  state ( ) . string_position  >  0  ?  state ( ) . string_position  -  1  :  state ( ) . string_position ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								        if  ( compare_type  = =  CharacterCompareType : : Char )  { 
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            auto  ch  =  m_bytecode - > at ( offset + + ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            auto  is_ascii  =  isascii ( ch )  & &  isprint ( ch ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if  ( is_ascii ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                result . empend ( String : : formatted ( " value='{:c}' " ,  static_cast < char > ( ch ) ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                result . empend ( String : : formatted ( " value={:x} " ,  ch ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if  ( ! view . is_null ( )  & &  view . length ( )  >  state ( ) . string_position )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                if  ( is_ascii )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                    result . empend ( String : : formatted ( 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                        " compare against: '{}' " , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                        view . substring_view ( compared_against_string_start_offset ,  state ( ) . string_position  >  view . length ( )  ?  0  :  1 ) . to_string ( ) ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                }  else  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                    auto  str  =  view . substring_view ( compared_against_string_start_offset ,  state ( ) . string_position  >  view . length ( )  ?  0  :  1 ) . to_string ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                    u8  buf [ 8 ]  {  0  } ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                    __builtin_memcpy ( buf ,  str . characters ( ) ,  min ( str . length ( ) ,  sizeof ( buf ) ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                    result . empend ( String : : formatted ( " compare against: {:x},{:x},{:x},{:x},{:x},{:x},{:x},{:x} " , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                        buf [ 0 ] ,  buf [ 1 ] ,  buf [ 2 ] ,  buf [ 3 ] ,  buf [ 4 ] ,  buf [ 5 ] ,  buf [ 6 ] ,  buf [ 7 ] ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            } 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        }  else  if  ( compare_type  = =  CharacterCompareType : : NamedReference )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            auto  ptr  =  ( const  char * ) m_bytecode - > at ( offset + + ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            auto  length  =  m_bytecode - > at ( offset + + ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            result . empend ( String : : formatted ( " name='{}' " ,  StringView  {  ptr ,  ( size_t ) length  } ) ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        }  else  if  ( compare_type  = =  CharacterCompareType : : Reference )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            auto  ref  =  m_bytecode - > at ( offset + + ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2021-01-12 01:01:33 +11:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            result . empend ( String : : formatted ( " number={} " ,  ref ) ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								        }  else  if  ( compare_type  = =  CharacterCompareType : : String )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            auto &  length  =  m_bytecode - > at ( offset + + ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-12-06 17:02:03 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            StringBuilder  str_builder ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            for  ( size_t  i  =  0 ;  i  <  length ;  + + i ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                str_builder . append ( m_bytecode - > at ( offset + + ) ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            result . empend ( String : : formatted ( " value= \" {} \" " ,  str_builder . string_view ( ) . substring_view ( 0 ,  length ) ) ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 23:33:45 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            if  ( ! view . is_null ( )  & &  view . length ( )  >  state ( ) . string_position ) 
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                result . empend ( String : : formatted ( 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                    " compare against:  \" {} \" " , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                    input . value ( ) . view . substring_view ( compared_against_string_start_offset ,  compared_against_string_start_offset  +  length  >  view . length ( )  ?  0  :  length ) . to_string ( ) ) ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								        }  else  if  ( compare_type  = =  CharacterCompareType : : CharClass )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            auto  character_class  =  ( CharClass ) m_bytecode - > at ( offset + + ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            result . empend ( String : : formatted ( " ch_class={} [{}] " ,  ( size_t ) character_class ,  character_class_name ( character_class ) ) ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 23:33:45 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            if  ( ! view . is_null ( )  & &  view . length ( )  >  state ( ) . string_position ) 
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                result . empend ( String : : formatted ( 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                    " compare against: '{}' " , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                    input . value ( ) . view . substring_view ( compared_against_string_start_offset ,  state ( ) . string_position  >  view . length ( )  ?  0  :  1 ) . to_string ( ) ) ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								        }  else  if  ( compare_type  = =  CharacterCompareType : : CharRange )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            auto  value  =  ( CharRange ) m_bytecode - > at ( offset + + ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            result . empend ( String : : formatted ( " ch_range='{:c}'-'{:c}' " ,  value . from ,  value . to ) ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 23:33:45 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            if  ( ! view . is_null ( )  & &  view . length ( )  >  state ( ) . string_position ) 
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                result . empend ( String : : formatted ( 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                    " compare against: '{}' " , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                    input . value ( ) . view . substring_view ( compared_against_string_start_offset ,  state ( ) . string_position  >  view . length ( )  ?  0  :  1 ) . to_string ( ) ) ) ; 
							 
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return  result ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}