/*
LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds an abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds a regex matcher based on the principles of the T-REX VM.
The bytecode produced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchset reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Easier to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
2020-04-26 14:45:10 +02:00
 * Copyright (c) 2020, Emanuel Sprung <emanuel.sprung@gmail.com>
 * Copyright (c) 2020-2021, the SerenityOS developers.
 *
 * SPDX-License-Identifier: BSD-2-Clause
 */
							
#include "RegexParser.h"
#include "RegexDebug.h"
#include <AK/AnyOf.h>
#include <AK/ByteString.h>
#include <AK/CharacterTypes.h>
#include <AK/Debug.h>
#include <AK/GenericLexer.h>
#include <AK/ScopeGuard.h>
#include <AK/StringBuilder.h>
#include <AK/StringUtils.h>
#include <AK/TemporaryChange.h>
#include <AK/Utf16View.h>
#include <LibUnicode/CharacterTypes.h>
							
namespace regex {
							
static constexpr size_t s_maximum_repetition_count = 1024 * 1024;
static constexpr u64 s_ecma262_maximum_repetition_count = (1ull << 53) - 1;
static constexpr auto s_alphabetic_characters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"sv;
static constexpr auto s_decimal_characters = "0123456789"sv;
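// A minimal sketch of how the two repetition limits above might be applied.
// It is not taken from the parser: the names in namespace `sketch` and the
// helper `repetition_count_in_range` are hypothetical, and the inclusive
// comparison is an assumption. The only hard fact used is that (1ull << 53) - 1
// equals 9007199254740991, the largest integer for which it and every smaller
// non-negative integer are exactly representable in an IEEE 754 double.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {

// Same values as the constants above, restated with standard types.
constexpr size_t maximum_repetition_count = 1024 * 1024;
constexpr uint64_t ecma262_maximum_repetition_count = (1ull << 53) - 1;

// Compile-time check of the arithmetic: 2^53 - 1.
static_assert(ecma262_maximum_repetition_count == 9007199254740991ull);

// Hypothetical check: accept a quantifier bound only if it does not exceed
// the limit for the selected flavour (assumed to be an inclusive limit).
constexpr bool repetition_count_in_range(uint64_t count, bool ecma262)
{
    return count <= (ecma262 ? ecma262_maximum_repetition_count : maximum_repetition_count);
}

}
```

// Under these assumptions a bound such as `a{1048577}` would be rejected by
// the generic limit but accepted under the ECMA-262 limit.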
						 
					
						
							
								
									
										
										
										
							
static constexpr StringView identity_escape_characters(bool unicode, bool browser_extended)
{
    if (unicode)
        return "^$\\.*+?()[]{}|/"sv;
    if (browser_extended)
        return "^$\\.*+?()[|"sv;
    return "^$\\.*+?()[]{}|"sv;
}
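// A small standalone sketch of how the three escape sets above differ.
// It restates the function with std::string_view in place of AK::StringView;
// the helper `is_identity_escape` is hypothetical, and treating the returned
// string as a plain membership set is an assumption about how callers use it.

```cpp
#include <cassert>
#include <string_view>

namespace sketch {

using namespace std::string_view_literals;

// Same three sets as above; `unicode` takes priority over `browser_extended`.
constexpr std::string_view identity_escape_characters(bool unicode, bool browser_extended)
{
    if (unicode)
        return "^$\\.*+?()[]{}|/"sv;
    if (browser_extended)
        return "^$\\.*+?()[|"sv;
    return "^$\\.*+?()[]{}|"sv;
}

// Hypothetical helper: true if `ch` is a member of the selected set.
constexpr bool is_identity_escape(char ch, bool unicode, bool browser_extended)
{
    return identity_escape_characters(unicode, browser_extended).find(ch) != std::string_view::npos;
}

}
```

// Only the unicode set contains '/', and the browser-extended set omits
// ']', '{' and '}', so the same character can fall in or out of the set
// depending on the two flags.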
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
ALWAYS_INLINE bool Parser::set_error(Error error)
{
    if (m_parser_state.error == Error::NoError) {
        m_parser_state.error = error;
        m_parser_state.error_token = m_parser_state.current_token;
    }
    return false; // always return false, that eases the API usage (return set_error(...)) :^)
}
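// A sketch of the `return set_error(...)` idiom that the comment above refers
// to. `MiniParser`, its error values and its parenthesis check are hypothetical
// stand-ins written for this example, not part of the regex parser. It shows
// two properties of the pattern: only the first error is recorded, and the
// constant `false` lets a failing parse step report and bail out in one line.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

namespace sketch {

enum class Error {
    NoError,
    MismatchingParen,
    EmptySubExpression,
};

class MiniParser {
public:
    explicit MiniParser(std::string_view input)
        : m_input(input)
    {
    }

    // Records the first error only and always returns false.
    bool set_error(Error error)
    {
        if (m_error == Error::NoError) {
            m_error = error;
            m_error_position = m_position;
        }
        return false;
    }

    // Checks that parentheses are balanced and that no group is empty.
    bool parse()
    {
        size_t depth = 0;
        for (m_position = 0; m_position < m_input.size(); ++m_position) {
            char ch = m_input[m_position];
            if (ch == '(') {
                if (m_position + 1 < m_input.size() && m_input[m_position + 1] == ')')
                    return set_error(Error::EmptySubExpression);
                ++depth;
            } else if (ch == ')') {
                if (depth == 0)
                    return set_error(Error::MismatchingParen);
                --depth;
            }
        }
        if (depth != 0)
            return set_error(Error::MismatchingParen);
        return true;
    }

    Error error() const { return m_error; }
    size_t error_position() const { return m_error_position; }

private:
    std::string_view m_input;
    size_t m_position { 0 };
    size_t m_error_position { 0 };
    Error m_error { Error::NoError };
};

}
```

// Because later calls do not overwrite an earlier error, the caller sees the
// position of the first failure even if outer parse steps also report one.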
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
ALWAYS_INLINE bool Parser::done() const
{
    return match(TokenType::Eof);
}
							
ALWAYS_INLINE bool Parser::match(TokenType type) const
{
    return m_parser_state.current_token.type() == type;
}
							
ALWAYS_INLINE bool Parser::match(char ch) const
{
    return m_parser_state.current_token.type() == TokenType::Char && m_parser_state.current_token.value().length() == 1 && m_parser_state.current_token.value()[0] == ch;
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
ALWAYS_INLINE Token Parser::consume()
{
    auto old_token = m_parser_state.current_token;
    m_parser_state.current_token = m_parser_state.lexer.next();
    return old_token;
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
ALWAYS_INLINE Token Parser::consume(TokenType type, Error error)
{
    if (m_parser_state.current_token.type() != type) {
        set_error(error);
							 
						 
					
						
							
								
									
										
										
										
											2021-06-02 18:31:43 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
        dbgln_if(REGEX_DEBUG, "[PARSER] Error: Unexpected token {}. Expected: {}", m_parser_state.current_token.name(), Token::name(type));
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
    }
    return consume();
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2023-12-16 17:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
ALWAYS_INLINE bool Parser::consume(ByteString const& str)
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
{
    size_t potentially_go_back { 1 };
    for (auto ch : str) {
        if (match(TokenType::Char)) {
            if (m_parser_state.current_token.value()[0] != ch) {
                m_parser_state.lexer.back(potentially_go_back);
                m_parser_state.current_token = m_parser_state.lexer.next();
                return false;
            }
        } else {
            m_parser_state.lexer.back(potentially_go_back);
            m_parser_state.current_token = m_parser_state.lexer.next();
            return false;
        }
        consume(TokenType::Char, Error::NoError);
        ++potentially_go_back;
    }
    return true;
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-08-18 14:43:11 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
ALWAYS_INLINE Optional<u32> Parser::consume_escaped_code_point(bool unicode)
{
    if (match(TokenType::LeftCurly) && !unicode) {
        // In non-Unicode mode, this should be parsed as a repetition symbol (repeating the 'u').
        return static_cast<u32>('u');
    }

    m_parser_state.lexer.retreat(2 + !done()); // Go back to just before '\u' (+1 char, because we will have consumed an extra character)

    if (auto code_point_or_error = m_parser_state.lexer.consume_escaped_code_point(unicode); !code_point_or_error.is_error()) {
        m_parser_state.current_token = m_parser_state.lexer.next();
        return code_point_or_error.value();
    }

    if (!unicode) {
        // '\u' is allowed in non-unicode mode, just matches 'u'.
        return static_cast<u32>('u');
    }

    set_error(Error::InvalidPattern);
    return {};
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
ALWAYS_INLINE bool Parser::try_skip(StringView str)
{
    if (str.starts_with(m_parser_state.current_token.value()))
        str = str.substring_view(m_parser_state.current_token.value().length(), str.length() - m_parser_state.current_token.value().length());
    else
        return false;

    size_t potentially_go_back { 0 };
    for (auto ch : str) {
							 
						 
					
						
							
								
									
										
										
										
											2021-08-18 14:10:08 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
        if (!m_parser_state.lexer.consume_specific(ch)) {
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            m_parser_state.lexer.back(potentially_go_back);
            return false;
        }
        ++potentially_go_back;
    }

    m_parser_state.current_token = m_parser_state.lexer.next();
    return true;
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
ALWAYS_INLINE bool Parser::lookahead_any(StringView str)
{
						 
					
						
							
								
									
										
										
										
											2021-12-21 18:11:00 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    return AK::any_of(str, [this](auto ch) { return match(ch); });
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
											
												LibRegex: Treat pattern string characters as unsigned
For example, consider the following pattern:
    new RegExp('\ud834\udf06', 'u')
With this pattern, the regex parser should insert the UTF-8 encoded
bytes 0xf0, 0x9d, 0x8c, and 0x86. However, because these characters are
currently treated as normal char types, they have a negative value since
they are all > 0x7f. Then, due to sign extension, when these characters
are cast to u64, the sign bit is preserved. The result is that these
bytes are inserted as 0xfffffffffffffff0, 0xffffffffffffff9d, etc.
Fortunately, there are only a few places where we insert bytecode with
the raw characters. In these places, be sure to treat the bytes as u8
before they are cast to u64.
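
The sign-extension effect described above can be reproduced in isolation. The following is a minimal sketch, not code from LibRegex: the helper names `widen_as_signed` and `widen_as_unsigned` are hypothetical, and `signed char` stands in for a plain `char` on a platform where `char` is signed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; these names do not exist in LibRegex.

// Widening a signed byte directly to 64 bits sign-extends it, so any byte
// above 0x7f (a negative value) gains leading 0xff bytes.
constexpr std::uint64_t widen_as_signed(signed char byte)
{
    return static_cast<std::uint64_t>(byte);
}

// Reinterpreting the byte as unsigned first keeps only its low 8 bits,
// which is the approach the commit message describes ("treat the bytes as u8").
constexpr std::uint64_t widen_as_unsigned(signed char byte)
{
    return static_cast<std::uint64_t>(static_cast<std::uint8_t>(byte));
}
```

Assuming the usual two's-complement representation, passing the byte 0xf0 to `widen_as_signed` yields 0xfffffffffffffff0, matching the broken value quoted in the message, while `widen_as_unsigned` yields 0xf0.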
											 
										 
										
											2021-08-20 10:22:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
ALWAYS_INLINE unsigned char Parser::skip()
						 
					
						
							
								
									
										
										
										
											2020-11-28 12:27:59 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
											
											 
										 
										
											2021-08-20 10:22:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    unsigned char ch;
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 12:27:59 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    if (m_parser_state.current_token.value().length() == 1) {
        ch = m_parser_state.current_token.value()[0];
    } else {
        m_parser_state.lexer.back(m_parser_state.current_token.value().length());
							 
						 
					
						
							
								
									
										
										
										
											2021-08-18 14:10:08 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
        ch = m_parser_state.lexer.consume();
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 12:27:59 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    }

    m_parser_state.current_token = m_parser_state.lexer.next();
    return ch;
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
ALWAYS_INLINE void Parser::back(size_t count)
{
    m_parser_state.lexer.back(count);
    m_parser_state.current_token = m_parser_state.lexer.next();
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
ALWAYS_INLINE void Parser::reset()
{
    m_parser_state.bytecode.clear();
    m_parser_state.lexer.reset();
    m_parser_state.current_token = m_parser_state.lexer.next();
    m_parser_state.error = Error::NoError;
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 20:18:40 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    m_parser_state.error_token = { TokenType::Eof, 0, {} };
							 
						 
					
						
							
								
									
										
										
										
											2021-04-05 06:25:36 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    m_parser_state.capture_group_minimum_lengths.clear();
    m_parser_state.capture_groups_count = 0;
    m_parser_state.named_capture_groups_count = 0;
    m_parser_state.named_capture_groups.clear();
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
Parser::Result Parser::parse(Optional<AllOptions> regex_options)
{
						 
					
						
							
								
									
										
										
										
											2023-07-14 09:15:35 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    ByteCode::reset_checkpoint_serial_id();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
    reset();
    if (regex_options.has_value())
        m_parser_state.regex_options = regex_options.value();
    if (parse_internal(m_parser_state.bytecode, m_parser_state.match_length_minimum))
        consume(TokenType::Eof, Error::InvalidPattern);
    else
        set_error(Error::InvalidPattern);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-05-31 15:08:22 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    dbgln_if(REGEX_DEBUG, "[PARSER] Produced bytecode with {} entries (opcodes + arguments)", m_parser_state.bytecode.size());
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
    return {
        move(m_parser_state.bytecode),
        move(m_parser_state.capture_groups_count),
        move(m_parser_state.named_capture_groups_count),
        move(m_parser_state.match_length_minimum),
        move(m_parser_state.error),
							 
						 
					
						
							
								
									
										
										
										
											2021-08-18 17:17:18 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
        move(m_parser_state.error_token),
							 
						 
					
						
							
								
									
										
										
										
											2022-01-25 13:30:27 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
        m_parser_state.named_capture_groups.keys(),
        m_parser_state.regex_options,
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
    };
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-29 19:10:46 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
ALWAYS_INLINE bool Parser::match_ordinary_characters()
{
    // NOTE: This method must not be called during bracket and repetition parsing!
    // FIXME: Add assertion for that?
    auto type = m_parser_state.current_token.type();
							 
						 
					
						
							
								
									
										
										
										
											2023-06-12 20:00:19 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    return ((type == TokenType::Char && m_parser_state.current_token.value() != "\\"sv) // NOTE: Backslash will only be matched as 'char' if it does not form a valid escape.
							 
						 
					
						
							
								
									
										
										
										
											2020-11-29 19:10:46 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
        || type == TokenType::Comma
        || type == TokenType::Slash
        || type == TokenType::EqualSign
        || type == TokenType::HyphenMinus
        || type == TokenType::Colon);
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								// =============================
  
						 
					
						
							
								
									
										
										
										
											2021-07-10 13:18:10 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								// Abstract Posix Parser
  
						 
					
						
							
								
									
										
											 
										
											
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								// =============================
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-07-10 13:18:10 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								ALWAYS_INLINE  bool  AbstractPosixParser : : parse_bracket_expression ( Vector < CompareTypeAndValuePair > &  values ,  size_t &  match_length_minimum )  
						 
					
						
							
								
									
										
											 
										
											
												LibRegex: Add a regular expression library
This commit is a mix of several commits, squashed into one because the
commits before 'Move regex to own Library and fix all the broken stuff'
were not fixable in any elegant way.
The commits are listed below for "historical" purposes:
- AK: Add options/flags and Errors for regular expressions
Flags can be provided for any possible flavour by adding a new scoped enum.
Handling of flags is done by templated Options class and the overloaded
'|' and '&' operators.
- AK: Add Lexer for regular expressions
The lexer parses the input and extracts tokens needed to parse a regular
expression.
- AK: Add regex Parser and PosixExtendedParser
This patchset adds a abstract parser class that can be derived to implement
different parsers. A parser produces bytecode to be executed within the
regex matcher.
- AK: Add regex matcher
This patchset adds an regex matcher based on the principles of the T-REX VM.
The bytecode pruduced by the respective Parser is put into the matcher and
the VM will recursively execute the bytecode according to the available OpCodes.
Possible improvement: the recursion could be replaced by multi threading capabilities.
To match a Regular expression, e.g. for the Posix standard regular expression matcher
use the following API:
```
Pattern<PosixExtendedParser> pattern("^.*$");
auto result = pattern.match("Well, hello friends!\nHello World!"); // Match whole needle
EXPECT(result.count == 1);
EXPECT(result.matches.at(0).view.starts_with("Well"));
EXPECT(result.matches.at(0).view.end() == "!");
result = pattern.match("Well, hello friends!\nHello World!", PosixFlags::Multiline); // Match line by line
EXPECT(result.count == 2);
EXPECT(result.matches.at(0).view == "Well, hello friends!");
EXPECT(result.matches.at(1).view == "Hello World!");
EXPECT(pattern.has_match("Well,....")); // Just check if match without a result, which saves some resources.
```
- AK: Rework regex to work with opcodes objects
This patchsets reworks the matcher to work on a more structured base.
For that an abstract OpCode class and derived classes for the specific
OpCodes have been added. The respective opcode logic is contained in
each respective execute() method.
- AK: Add benchmark for regex
- AK: Some optimization in regex for runtime and memory
- LibRegex: Move regex to own Library and fix all the broken stuff
Now regex works again and grep utility is also in place for testing.
This commit also fixes the use of regex.h in C by making `regex_t`
an opaque (-ish) type, which makes its behaviour consistent between
C and C++ compilers.
Previously, <regex.h> would've blown C compilers up, and even if it
didn't, would've caused a leak in C code, and not in C++ code (due to
the existence of `OwnPtr` inside the struct).
To make this whole ordeal easier to deal with (for now), this pulls the
definitions of `reg*()` into LibRegex.
pros:
- The circular dependency between LibC and LibRegex is broken
- Eaiser to test (without accidentally pulling in the host's libc!)
cons:
- Using any of the regex.h functions will require the user to link -lregex
- The symbols will be missing from libc, which will be a big surprise
  down the line (especially with shared libs).
Co-Authored-By: Ali Mohammad Pur <ali.mpfard@gmail.com>
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
{
    for (; !done();) {
        if (match(TokenType::HyphenMinus)) {
            consume();

            if (values.is_empty() || (values.size() == 1 && values.last().type == CharacterCompareType::Inverse)) {
                // first in the bracket expression
                values.append({ CharacterCompareType::Char, (ByteCodeValueType)'-' });
            } else if (match(TokenType::RightBracket)) {
                // Last in the bracket expression
                values.append({ CharacterCompareType::Char, (ByteCodeValueType)'-' });
            } else if (values.last().type == CharacterCompareType::Char) {
                values.append({ CharacterCompareType::RangeExpressionDummy, 0 });

                if (done())
                    return set_error(Error::MismatchingBracket);

                if (match(TokenType::HyphenMinus)) {
                    consume();
                    // Valid range, add ordinary character
                    values.append({ CharacterCompareType::Char, (ByteCodeValueType)'-' });
                }
            } else {
                return set_error(Error::InvalidRange);
            }

        } else if (match(TokenType::Circumflex)) {
            auto t = consume();

            if (values.is_empty())
                values.append({ CharacterCompareType::Inverse, 0 });
            else
                values.append({ CharacterCompareType::Char, (ByteCodeValueType)*t.value().characters_without_null_termination() });

        } else if (match(TokenType::LeftBracket)) {
            consume();

            if (match(TokenType::Period)) {
                consume();

                // FIXME: Parse collating element, this is needed when we have locale support
                //        This could have impact on length parameter, I guess.
                set_error(Error::InvalidCollationElement);

                consume(TokenType::Period, Error::InvalidCollationElement);
                consume(TokenType::RightBracket, Error::MismatchingBracket);

            } else if (match(TokenType::EqualSign)) {
                consume();
                // FIXME: Parse collating element, this is needed when we have locale support
                //        This could have impact on length parameter, I guess.
                set_error(Error::InvalidCollationElement);

                consume(TokenType::EqualSign, Error::InvalidCollationElement);
                consume(TokenType::RightBracket, Error::MismatchingBracket);

            } else if (match(TokenType::Colon)) {
                consume();

                CharClass ch_class;
                // parse character class
                if (match(TokenType::Char)) {
                    if (consume("alnum"))
                        ch_class = CharClass::Alnum;
                    else if (consume("alpha"))
                        ch_class = CharClass::Alpha;
                    else if (consume("blank"))
                        ch_class = CharClass::Blank;
                    else if (consume("cntrl"))
                        ch_class = CharClass::Cntrl;
                    else if (consume("digit"))
                        ch_class = CharClass::Digit;
                    else if (consume("graph"))
                        ch_class = CharClass::Graph;
                    else if (consume("lower"))
                        ch_class = CharClass::Lower;
                    else if (consume("print"))
                        ch_class = CharClass::Print;
                    else if (consume("punct"))
                        ch_class = CharClass::Punct;
                    else if (consume("space"))
                        ch_class = CharClass::Space;
                    else if (consume("upper"))
                        ch_class = CharClass::Upper;
                    else if (consume("xdigit"))
                        ch_class = CharClass::Xdigit;
                    else
                        return set_error(Error::InvalidCharacterClass);

                    values.append({ CharacterCompareType::CharClass, (ByteCodeValueType)ch_class });

                } else
                    return set_error(Error::InvalidCharacterClass);

                // FIXME: we do not support locale specific character classes until locales are implemented

                consume(TokenType::Colon, Error::InvalidCharacterClass);
                consume(TokenType::RightBracket, Error::MismatchingBracket);
            } else {
                return set_error(Error::MismatchingBracket);
            }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
        } else if (match(TokenType::RightBracket)) {

            if (values.is_empty() || (values.size() == 1 && values.last().type == CharacterCompareType::Inverse)) {
                // handle bracket as ordinary character
                values.append({ CharacterCompareType::Char, (ByteCodeValueType)*consume().value().characters_without_null_termination() });
            } else {
                // closing bracket expression
                break;
            }
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 21:06:40 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
        } else {
            values.append({ CharacterCompareType::Char, (ByteCodeValueType)skip() });
        }
							 
						 
					
						
							
								
									
										
											 
										
											
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
        // check if range expression has to be completed...
        if (values.size() >= 3 && values.at(values.size() - 2).type == CharacterCompareType::RangeExpressionDummy) {
            if (values.last().type != CharacterCompareType::Char)
                return set_error(Error::InvalidRange);

            auto value2 = values.take_last();
            values.take_last(); // RangeExpressionDummy
            auto value1 = values.take_last();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-19 18:45:36 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            values.append({ CharacterCompareType::CharRange, static_cast<ByteCodeValueType>(CharRange { (u32)value1.value, (u32)value2.value }) });
							 
						 
					
						
							
								
									
										
											 
										
											
										
									 
								 
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-07-13 02:14:09 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    if (!values.is_empty()) {
							 
						 
					
						
							
								
									
										
											 
										
											
										
									 
								 
							 
							
								
							 
							
								 
							
							
        match_length_minimum = 1;
							 
						 
					
						
							
								
									
										
										
										
											2021-07-13 02:14:09 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
        if (values.first().type == CharacterCompareType::Inverse)
            match_length_minimum = 0;
    }
							 
						 
					
						
							
								
									
										
											 
										
											
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-07-10 13:18:10 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    return true;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
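
The range completion near the end of the bracket-expression parser above can be tried in isolation. The following is a minimal sketch, not the LibRegex implementation: `CompareType`, `CompareValue` and `complete_range` are hypothetical stand-ins built on `std::vector`, assuming only what the code above shows (a placeholder entry sits between the two endpoints, and the right-hand endpoint must be a plain character).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; these are not the LibRegex types.
enum class CompareType { Char, RangeDummy, CharRange };

struct CompareValue {
    CompareType type;
    std::uint32_t from; // the character, or the lower bound of a range
    std::uint32_t to;   // upper bound; only meaningful for CharRange
};

// If the last three entries are <value, RangeDummy, Char>, fold them into a
// single CharRange entry. Returns false when the right-hand side is not a
// plain character, which mirrors the invalid-range error path.
bool complete_range(std::vector<CompareValue>& values)
{
    if (values.size() < 3 || values[values.size() - 2].type != CompareType::RangeDummy)
        return true; // nothing to complete
    if (values.back().type != CompareType::Char)
        return false;

    CompareValue upper = values.back();
    values.pop_back();
    values.pop_back(); // drop the placeholder
    assert(!values.empty()); // guaranteed by the size check above
    CompareValue lower = values.back();
    values.pop_back();

    values.push_back({ CompareType::CharRange, lower.from, upper.from });
    return true;
}
```

Calling `complete_range` on the three entries for `a`, the placeholder and `z` leaves one `CharRange` entry spanning `a` to `z`; any other tail is left untouched.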
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								// =============================
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								// PosixBasic Parser
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								// =============================
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
bool PosixBasicParser::parse_internal(ByteCode& stack, size_t& match_length_minimum)
{
    return parse_root(stack, match_length_minimum);
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
bool PosixBasicParser::parse_root(ByteCode& bytecode, size_t& match_length_minimum)
{
    // basic_reg_exp : L_ANCHOR? RE_expression R_ANCHOR?
    if (match(TokenType::Circumflex)) {
        consume();
        bytecode.empend((ByteCodeValueType)OpCodeId::CheckBegin);
    }

    if (!parse_re_expression(bytecode, match_length_minimum))
        return false;

    if (match(TokenType::Dollar)) {
        consume();
        bytecode.empend((ByteCodeValueType)OpCodeId::CheckEnd);
    }

    return !has_error();
}
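
The grammar rule `basic_reg_exp : L_ANCHOR? RE_expression R_ANCHOR?` handled by `parse_root` can be illustrated without the lexer. This is a rough sketch that works on a raw string rather than a token stream, so it is a different technique from the parser above; `AnchoredPattern` and `split_anchors` are hypothetical names, and the backslash counting is an assumption about how an escaped `\$` should be treated.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical result type for illustration only.
struct AnchoredPattern {
    bool anchored_at_start;
    bool anchored_at_end;
    std::string body; // the RE_expression part, anchors stripped
};

// Split an optional leading '^' and an optional trailing, unescaped '$'
// off a basic regular expression.
AnchoredPattern split_anchors(std::string_view pattern)
{
    AnchoredPattern result { false, false, {} };

    if (!pattern.empty() && pattern.front() == '^') {
        result.anchored_at_start = true;
        pattern.remove_prefix(1);
    }

    if (!pattern.empty() && pattern.back() == '$') {
        // Count the backslashes directly before the '$';
        // an odd count means the '$' itself is escaped.
        std::size_t backslashes = 0;
        for (std::size_t i = pattern.size() - 1; i > 0 && pattern[i - 1] == '\\'; --i)
            ++backslashes;
        if (backslashes % 2 == 0) {
            result.anchored_at_end = true;
            pattern.remove_suffix(1);
        }
    }

    result.body = std::string(pattern);
    return result;
}
```

For example, `split_anchors("^abc$")` reports both anchors with the body `abc`, while a pattern ending in `\$` keeps the dollar sign as part of the body.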
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								bool PosixBasicParser::parse_re_expression(ByteCode& bytecode, size_t& match_length_minimum)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    // RE_expression : RE_expression? simple_RE
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    while (!done()) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if (!parse_simple_re(bytecode, match_length_minimum))
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            break;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return !has_error();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								bool PosixBasicParser::parse_simple_re(ByteCode& bytecode, size_t& match_length_minimum)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    // simple_RE : nondupl_RE RE_dupl_symbol?
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    ByteCode simple_re_bytecode;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    size_t re_match_length_minimum = 0;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if (!parse_nonduplicating_re(simple_re_bytecode, re_match_length_minimum))
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return false;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    // RE_dupl_symbol : '*' | Back_open_brace DUP_COUNT (',' DUP_COUNT?)? Back_close_brace
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if (match(TokenType::Asterisk)) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        consume();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        ByteCode::transform_bytecode_repetition_any(simple_re_bytecode, true);
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    } else if (try_skip("\\{"sv)) {
							 
						 
					
						
							
								
									
										
										
										
											2021-07-10 13:18:10 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        auto read_number = [&]() -> Optional<size_t> {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if (!match(TokenType::Char))
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                return {};
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            size_t value = 0;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            while (match(TokenType::Char)) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                auto c = m_parser_state.current_token.value().substring_view(0, 1);
							 
						 
					
						
							
								
									
										
										
										
											2023-12-23 15:59:14 +13:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                auto c_value = c.to_number<unsigned>();
							 
						 
					
						
							
								
									
										
										
										
											2021-07-10 13:18:10 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                if (!c_value.has_value())
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                    break;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                value *= 10;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                value += *c_value;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                consume();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            return value;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        };
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        size_t min_limit;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        Optional<size_t> max_limit;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if (auto limit = read_number(); !limit.has_value())
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            return set_error(Error::InvalidRepetitionMarker);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            min_limit = *limit;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if (match(TokenType::Comma)) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            consume();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            max_limit = read_number();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if (!try_skip("\\}"sv))
							 
						 
					
						
							
								
									
										
										
										
											2021-07-10 13:18:10 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            return set_error(Error::MismatchingBrace);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-07-13 02:13:17 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if (max_limit.value_or(min_limit) < min_limit)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            return set_error(Error::InvalidBraceContent);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if (min_limit > s_maximum_repetition_count || (max_limit.has_value() && *max_limit > s_maximum_repetition_count))
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            return set_error(Error::InvalidBraceContent);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-08-31 22:27:08 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        auto min_repetition_mark_id = m_parser_state.repetition_mark_count++;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        auto max_repetition_mark_id = m_parser_state.repetition_mark_count++;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        ByteCode::transform_bytecode_repetition_min_max(simple_re_bytecode, min_limit, max_limit, min_repetition_mark_id, max_repetition_mark_id, true);
							 
						 
					
						
							
								
									
										
										
										
											2021-07-10 13:18:10 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        match_length_minimum += re_match_length_minimum * min_limit;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } else {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        match_length_minimum += re_match_length_minimum;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    bytecode.extend(move(simple_re_bytecode));
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return true;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								bool PosixBasicParser::parse_nonduplicating_re(ByteCode& bytecode, size_t& match_length_minimum)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    // nondupl_RE : one_char_or_coll_elem_RE | Back_open_paren RE_expression Back_close_paren | BACKREF
 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if (try_skip("\\("sv)) {
							 
						 
					
						
							
								
									
										
										
										
											2021-08-30 23:24:46 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        TemporaryChange change { m_current_capture_group_depth, m_current_capture_group_depth + 1 };
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        // Max number of addressable capture groups is 10, let's just be lenient
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        // and accept 20; anything past that is probably a silly pattern anyway.
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if (m_current_capture_group_depth > 20)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            return set_error(Error::InvalidPattern);
							 
						 
					
						
							
								
									
										
										
										
											2021-07-10 13:18:10 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        ByteCode capture_bytecode;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        size_t capture_length_minimum = 0;
							 
						 
					
						
							
								
									
										
										
										
											2021-07-10 23:07:22 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        auto capture_group_index = ++m_parser_state.capture_groups_count;
							 
						 
					
						
							
								
									
										
										
										
											2021-07-10 13:18:10 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if (!parse_re_expression(capture_bytecode, capture_length_minimum))
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            return false;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if (!try_skip("\\)"sv))
							 
						 
					
						
							
								
									
										
										
										
											2021-07-10 13:18:10 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            return set_error(Error::MismatchingParen);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        match_length_minimum += capture_length_minimum;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if (capture_group_index <= number_of_addressable_capture_groups) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            m_capture_group_minimum_lengths[capture_group_index - 1] = capture_length_minimum;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            m_capture_group_seen[capture_group_index - 1] = true;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            bytecode.insert_bytecode_group_capture_left(capture_group_index);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        bytecode.extend(capture_bytecode);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if (capture_group_index <= number_of_addressable_capture_groups)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            bytecode.insert_bytecode_group_capture_right(capture_group_index);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return true;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    for (size_t i = 1; i < 10; ++i) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        char backref_name[2] { '\\', '0' };
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        backref_name[1] += i;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if (try_skip({ backref_name, 2 })) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if (!m_capture_group_seen[i - 1])
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                return set_error(Error::InvalidNumber);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            match_length_minimum += m_capture_group_minimum_lengths[i - 1];
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            bytecode.insert_bytecode_compare_values({ { CharacterCompareType::Reference, (ByteCodeValueType)i } });
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            return true;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return parse_one_char_or_collation_element(bytecode, match_length_minimum);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								bool PosixBasicParser::parse_one_char_or_collation_element(ByteCode& bytecode, size_t& match_length_minimum)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    // one_char_or_coll_elem_RE : ORD_CHAR | QUOTED_CHAR | '.' | bracket_expression
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if (match(TokenType::Period)) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        consume();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        bytecode.insert_bytecode_compare_values({ { CharacterCompareType::AnyChar, 0 } });
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        match_length_minimum += 1;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return true;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-11-13 03:18:40 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    // Dollars are special if at the end of a pattern.
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if (match(TokenType::Dollar)) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        consume();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        // If we are at the end of a pattern, emit an end check instruction.
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if (match(TokenType::Eof)) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            bytecode.empend((ByteCodeValueType)OpCodeId::CheckEnd);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            return true;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        // We are not at the end of the pattern, so we should roll back and continue as normal.
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        back(2);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2023-06-12 20:00:19 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if (match(TokenType::Char)) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        auto ch = consume().value()[0];
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if (ch == '\\') {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if (m_parser_state.regex_options.has_flag_set(AllFlags::Extra))
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                return set_error(Error::InvalidPattern);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            // This was \<ORD_CHAR>, the spec does not define any behaviour for this but glibc regex ignores it - and so do we.
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            return true;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        bytecode.insert_bytecode_compare_values({ { CharacterCompareType::Char, (ByteCodeValueType)ch } });
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        match_length_minimum += 1;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return true;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-07-10 13:18:10 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    // None of these are special in BRE.
 
							 
						 
					
						
							
								
									
										
										
										
											2023-06-12 20:00:19 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if (match(TokenType::Questionmark) || match(TokenType::RightParen) || match(TokenType::HyphenMinus)
							 
						 
					
						
							
								
									
										
										
										
											2021-07-10 13:18:10 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        || match(TokenType::Circumflex) || match(TokenType::RightCurly) || match(TokenType::Comma) || match(TokenType::Colon)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        || match(TokenType::Dollar) || match(TokenType::EqualSign) || match(TokenType::LeftCurly) || match(TokenType::LeftParen)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        || match(TokenType::Pipe) || match(TokenType::Slash) || match(TokenType::RightBracket)) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        auto ch = consume().value()[0];
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        bytecode.insert_bytecode_compare_values({ { CharacterCompareType::Char, (ByteCodeValueType)ch } });
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        match_length_minimum += 1;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
    if (match(TokenType::EscapeSequence)) {
        if (m_parser_state.current_token.value().is_one_of("\\)"sv, "\\}"sv, "\\("sv, "\\{"sv))
            return false;
        auto ch = consume().value()[1];
        bytecode.insert_bytecode_compare_values({ { CharacterCompareType::Char, (ByteCodeValueType)ch } });
        match_length_minimum += 1;
        return true;
    }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
    Vector<CompareTypeAndValuePair> values;
    size_t bracket_minimum_length = 0;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-07-10 22:02:18 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    if (match(TokenType::LeftBracket)) {
        consume();
        if (!AbstractPosixParser::parse_bracket_expression(values, bracket_minimum_length))
            return false;

        consume(TokenType::RightBracket, Error::MismatchingBracket);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-07-13 02:16:19 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
        if (!has_error())
            bytecode.insert_bytecode_compare_values(move(values));
							 
						 
					
						
							
								
									
										
										
										
											2021-07-10 22:02:18 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
        match_length_minimum += bracket_minimum_length;
        return !has_error();
    }

    return set_error(Error::InvalidPattern);
							 
						 
					
						
							
								
									
										
										
										
											2021-07-10 13:18:10 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
// =============================
// PosixExtended Parser
// =============================
  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
bool PosixExtendedParser::parse_internal(ByteCode& stack, size_t& match_length_minimum)
{
    return parse_root(stack, match_length_minimum);
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
ALWAYS_INLINE bool PosixExtendedParser::match_repetition_symbol()
{
    auto type = m_parser_state.current_token.type();
    return (type == TokenType::Asterisk
        || type == TokenType::Plus
        || type == TokenType::Questionmark
        || type == TokenType::LeftCurly);
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
ALWAYS_INLINE bool PosixExtendedParser::parse_repetition_symbol(ByteCode& bytecode_to_repeat, size_t& match_length_minimum)
{
    if (match(TokenType::LeftCurly)) {
        consume();

        StringBuilder number_builder;

        while (match(TokenType::Char)) {
            number_builder.append(consume().value());
        }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2023-12-23 15:59:14 +13:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
        auto maybe_minimum = number_builder.to_byte_string().to_number<unsigned>();
							 
						 
					
						
							
								
									
										
										
										
											2021-07-10 13:18:10 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
        if (!maybe_minimum.has_value())
            return set_error(Error::InvalidBraceContent);

        auto minimum = maybe_minimum.value();
        match_length_minimum *= minimum;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-07-13 02:13:17 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
        if (minimum > s_maximum_repetition_count)
            return set_error(Error::InvalidBraceContent);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-07-10 13:18:10 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
        if (match(TokenType::Comma)) {
            consume();
        } else {
							 
						 
					
						
							
								
									
										
										
										
											2021-08-12 11:02:46 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            auto repetition_mark_id = m_parser_state.repetition_mark_count++;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-07-10 13:18:10 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            ByteCode bytecode;
							 
						 
					
						
							
								
									
										
										
										
											2021-08-12 11:02:46 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            bytecode.insert_bytecode_repetition_n(bytecode_to_repeat, minimum, repetition_mark_id);
							 
						 
					
						
							
								
									
										
										
										
											2021-07-10 13:18:10 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            bytecode_to_repeat = move(bytecode);

            consume(TokenType::RightCurly, Error::MismatchingBrace);
            return !has_error();
        }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-08-12 11:02:46 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
        Optional<u32> maybe_maximum {};
							 
						 
					
						
							
								
									
										
										
										
											2021-07-10 13:18:10 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
        number_builder.clear();
        while (match(TokenType::Char)) {
            number_builder.append(consume().value());
        }
        if (!number_builder.is_empty()) {
							 
						 
					
						
							
								
									
										
										
										
											2023-12-23 15:59:14 +13:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            auto value = number_builder.to_byte_string().to_number<unsigned>();
							 
						 
					
						
							
								
									
										
										
										
											2021-07-13 02:13:17 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            if (!value.has_value() || minimum > value.value() || *value > s_maximum_repetition_count)
							 
						 
					
						
							
								
									
										
										
										
											2021-07-10 13:18:10 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
                return set_error(Error::InvalidBraceContent);

            maybe_maximum = value.value();
        }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-08-31 22:27:08 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
        auto min_repetition_mark_id = m_parser_state.repetition_mark_count++;
        auto max_repetition_mark_id = m_parser_state.repetition_mark_count++;
        ByteCode::transform_bytecode_repetition_min_max(bytecode_to_repeat, minimum, maybe_maximum, min_repetition_mark_id, max_repetition_mark_id);
							 
						 
					
						
							
								
									
										
										
										
											2021-07-10 13:18:10 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
        consume(TokenType::RightCurly, Error::MismatchingBrace);
        return !has_error();
							 
						 
					
						
							
								
									
										
										
										
											2021-12-21 17:54:45 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    }
    if (match(TokenType::Plus)) {
							 
						 
					
						
							
								
									
										
										
										
											2021-07-10 13:18:10 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
        consume();

        bool nongreedy = match(TokenType::Questionmark);
        if (nongreedy)
            consume();

        // Note: don't touch match_length_minimum, it's already correct
        ByteCode::transform_bytecode_repetition_min_one(bytecode_to_repeat, !nongreedy);
        return !has_error();
							 
						 
					
						
							
								
									
										
										
										
											2021-12-21 17:54:45 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    }
    if (match(TokenType::Asterisk)) {
							 
						 
					
						
							
								
									
										
										
										
											2021-07-10 13:18:10 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
        consume();
        match_length_minimum = 0;

        bool nongreedy = match(TokenType::Questionmark);
        if (nongreedy)
            consume();

        ByteCode::transform_bytecode_repetition_any(bytecode_to_repeat, !nongreedy);

        return !has_error();
							 
						 
					
						
							
								
									
										
										
										
											2021-12-21 17:54:45 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    }
    if (match(TokenType::Questionmark)) {
							 
						 
					
						
							
								
									
										
										
										
											2021-07-10 13:18:10 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
        consume();
        match_length_minimum = 0;

        bool nongreedy = match(TokenType::Questionmark);
        if (nongreedy)
            consume();

        ByteCode::transform_bytecode_repetition_zero_or_one(bytecode_to_repeat, !nongreedy);
        return !has_error();
    }

    return false;
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
ALWAYS_INLINE bool PosixExtendedParser::parse_bracket_expression(ByteCode& stack, size_t& match_length_minimum)
{
    Vector<CompareTypeAndValuePair> values;
    if (!AbstractPosixParser::parse_bracket_expression(values, match_length_minimum))
        return false;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-07-13 02:16:19 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    if (!has_error())
        stack.insert_bytecode_compare_values(move(values));
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
    return !has_error();
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
ALWAYS_INLINE bool PosixExtendedParser::parse_sub_expression(ByteCode& stack, size_t& match_length_minimum)
{
    ByteCode bytecode;
    size_t length = 0;
    bool should_parse_repetition_symbol { false };

    for (;;) {
        if (match_ordinary_characters()) {
            Token start_token = m_parser_state.current_token;
            Token last_token = m_parser_state.current_token;
            for (;;) {
                if (!match_ordinary_characters())
                    break;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                + + length ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                last_token  =  consume ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if  ( length  >  1 )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                // last character is inserted into 'bytecode' for duplication symbol handling
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                auto  new_length  =  length  -  ( ( match_repetition_symbol ( )  & &  length  >  1 )  ?  1  :  0 ) ; 
							 
						 
					
						
							
								
									
										
										
										
                stack.insert_bytecode_compare_string({ start_token.value().characters_without_null_termination(), new_length });
							 
						 
					
						
							
								
									
										
											 
										
											
            }

            if ((match_repetition_symbol() && length > 1) || length == 1) // Create own compare opcode for last character before duplication symbol
                bytecode.insert_bytecode_compare_values({ { CharacterCompareType::Char, (ByteCodeValueType)last_token.value().characters_without_null_termination()[0] } });

            should_parse_repetition_symbol = true;
            break;
        }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
        if (m_parser_state.current_token.value() == "\\"sv) {
            if (m_parser_state.regex_options.has_flag_set(AllFlags::Extra))
                return set_error(Error::InvalidPattern);

            consume();
            continue;
        }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
											 
										
											
        if (match_repetition_symbol())
            return set_error(Error::InvalidRepetitionMarker);

        if (match(TokenType::Period)) {
            length = 1;
            consume();
            bytecode.insert_bytecode_compare_values({ { CharacterCompareType::AnyChar, 0 } });
            should_parse_repetition_symbol = true;
            break;
        }

        if (match(TokenType::EscapeSequence)) {
            length = 1;
            Token t = consume();
							 
						 
					
						
							
								
									
										
										
										
            dbgln_if(REGEX_DEBUG, "[PARSER] EscapeSequence with substring {}", t.value());
							 
						 
					
						
							
								
									
										
											 
										
											
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
            bytecode.insert_bytecode_compare_values({ { CharacterCompareType::Char, (u32)t.value().characters_without_null_termination()[1] } });
            should_parse_repetition_symbol = true;
            break;
        }

        if (match(TokenType::LeftBracket)) {
            consume();

            ByteCode sub_ops;
            if (!parse_bracket_expression(sub_ops, length) || !sub_ops.size())
                return set_error(Error::InvalidBracketContent);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
            bytecode.extend(move(sub_ops));
							 
						 
					
						
							
								
									
										
											 
										
											
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
            consume(TokenType::RightBracket, Error::MismatchingBracket);
            should_parse_repetition_symbol = true;
            break;
        }

        if (match(TokenType::RightBracket)) {
            return set_error(Error::MismatchingBracket);
        }

        if (match(TokenType::RightCurly)) {
            return set_error(Error::MismatchingBrace);
        }

        if (match(TokenType::Circumflex)) {
            consume();
            bytecode.empend((ByteCodeValueType)OpCodeId::CheckBegin);
            break;
        }

        if (match(TokenType::Dollar)) {
            consume();
            bytecode.empend((ByteCodeValueType)OpCodeId::CheckEnd);
            break;
        }

        if (match(TokenType::RightParen))
            return false;

        if (match(TokenType::LeftParen)) {
							 
						 
					
						
							
								
									
										
										
										
            enum GroupMode {
                Normal,
                Lookahead,
                NegativeLookahead,
                Lookbehind,
                NegativeLookbehind,
            } group_mode { Normal };
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
            consume();
            Optional<StringView> capture_group_name;
            bool prevent_capture_group = false;
            if (match(TokenType::Questionmark)) {
                consume();

                if (match(TokenType::Colon)) {
                    consume();
                    prevent_capture_group = true;
                } else if (consume("<")) { // named capturing group

                    Token start_token = m_parser_state.current_token;
                    Token last_token = m_parser_state.current_token;
                    size_t capture_group_name_length = 0;
                    for (;;) {
                        if (!match_ordinary_characters())
                            return set_error(Error::InvalidNameForCaptureGroup);
                        if (match(TokenType::Char) && m_parser_state.current_token.value()[0] == '>') {
                            consume();
                            break;
                        }
                        ++capture_group_name_length;
                        last_token = consume();
                    }
                    capture_group_name = StringView(start_token.value().characters_without_null_termination(), capture_group_name_length);
							 
						 
					
						
							
								
									
										
										
										
											2021-09-07 14:33:06 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
                    ++m_parser_state.named_capture_groups_count;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
                } else if (match(TokenType::EqualSign)) { // positive lookahead
                    consume();
							 
						 
					
						
							
								
									
										
										
										
											2021-07-13 02:16:09 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
                    group_mode = Lookahead;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
                } else if (consume("!")) { // negative lookahead
							 
						 
					
						
							
								
									
										
										
										
											2021-07-13 02:16:09 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
                    group_mode = NegativeLookahead;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
                } else if (consume("<")) {
                    if (match(TokenType::EqualSign)) { // positive lookbehind
                        consume();
							 
						 
					
						
							
								
									
										
										
										
											2021-07-13 02:16:09 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
                        group_mode = Lookbehind;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
                    }
                    if (consume("!")) // negative lookbehind
							 
						 
					
						
							
								
									
										
										
										
											2021-07-13 02:16:09 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
                        group_mode = NegativeLookbehind;
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
                } else {
                    return set_error(Error::InvalidRepetitionMarker);
                }
            }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-09-07 14:33:06 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            auto current_capture_group = m_parser_state.capture_groups_count;
            if (!(m_parser_state.regex_options & AllFlags::SkipSubExprResults || prevent_capture_group)) {
                bytecode.insert_bytecode_group_capture_left(current_capture_group);
                m_parser_state.capture_groups_count++;
            }
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
            ByteCode capture_group_bytecode;

            if (!parse_root(capture_group_bytecode, length))
                return set_error(Error::InvalidPattern);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-07-13 02:16:09 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            switch (group_mode) {
            case Normal:
                bytecode.extend(move(capture_group_bytecode));
                break;
            case Lookahead:
                bytecode.insert_bytecode_lookaround(move(capture_group_bytecode), ByteCode::LookAroundType::LookAhead, length);
                break;
            case NegativeLookahead:
                bytecode.insert_bytecode_lookaround(move(capture_group_bytecode), ByteCode::LookAroundType::NegatedLookAhead, length);
                break;
            case Lookbehind:
                bytecode.insert_bytecode_lookaround(move(capture_group_bytecode), ByteCode::LookAroundType::LookBehind, length);
                break;
            case NegativeLookbehind:
                bytecode.insert_bytecode_lookaround(move(capture_group_bytecode), ByteCode::LookAroundType::NegatedLookBehind, length);
                break;
            }
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
            consume(TokenType::RightParen, Error::MismatchingParen);

            if (!(m_parser_state.regex_options & AllFlags::SkipSubExprResults || prevent_capture_group)) {
							 
						 
					
						
							
								
									
										
										
										
											2021-09-07 14:33:06 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
                if (capture_group_name.has_value())
                    bytecode.insert_bytecode_group_capture_right(current_capture_group, capture_group_name.value());
                else
                    bytecode.insert_bytecode_group_capture_right(current_capture_group);
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
            }
            should_parse_repetition_symbol = true;
            break;
        }

        return false;
    }

    if (match_repetition_symbol()) {
        if (should_parse_repetition_symbol)
            parse_repetition_symbol(bytecode, length);
        else
            return set_error(Error::InvalidRepetitionMarker);
    }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-06-12 13:24:45 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    stack.extend(move(bytecode));
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
    match_length_minimum += length;

    return true;
}

bool PosixExtendedParser::parse_root(ByteCode& stack, size_t& match_length_minimum)
{
    ByteCode bytecode_left;
    size_t match_length_minimum_left { 0 };

    if (match_repetition_symbol())
        return set_error(Error::InvalidRepetitionMarker);

    for (;;) {
        if (!parse_sub_expression(bytecode_left, match_length_minimum_left))
            break;

        if (match(TokenType::Pipe)) {
            consume();

            ByteCode bytecode_right;
            size_t match_length_minimum_right { 0 };

            if (!parse_root(bytecode_right, match_length_minimum_right) || bytecode_right.is_empty())
                return set_error(Error::InvalidPattern);

            ByteCode new_bytecode;
            new_bytecode.insert_bytecode_alternation(move(bytecode_left), move(bytecode_right));
            bytecode_left = move(new_bytecode);
            match_length_minimum_left = min(match_length_minimum_right, match_length_minimum_left);
        }
    }

    if (bytecode_left.is_empty())
        set_error(Error::EmptySubExpression);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-06-12 13:24:45 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    stack.extend(move(bytecode_left));
							 
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
    match_length_minimum = match_length_minimum_left;
    return !has_error();
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
// =============================
// ECMA262 Parser
// =============================

bool ECMA262Parser::parse_internal(ByteCode& stack, size_t& match_length_minimum)
{
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    auto unicode = m_parser_state.regex_options.has_flag_set(AllFlags::Unicode);
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:22:07 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    auto unicode_sets = m_parser_state.regex_options.has_flag_set(AllFlags::UnicodeSets);
    if (unicode || unicode_sets) {
        return parse_pattern(stack, match_length_minimum, { .unicode = true, .named = true, .unicode_sets = unicode_sets });
							 
						 
					
						
							
								
									
										
										
										
											2021-12-21 17:54:45 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-12-21 17:54:45 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    ByteCode  new_stack ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    size_t  new_match_length  =  0 ; 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:22:07 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    auto  res  =  parse_pattern ( new_stack ,  new_match_length ,  {  . unicode  =  false ,  . named  =  false ,  . unicode_sets  =  false  } ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2021-12-21 17:54:45 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if  ( m_parser_state . named_capture_groups_count  >  0 )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        reset ( ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:22:07 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        return  parse_pattern ( stack ,  match_length_minimum ,  {  . unicode  =  false ,  . named  =  true ,  . unicode_sets  =  false  } ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
									
										
										
										
											2021-12-21 17:54:45 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( ! res ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    stack . extend ( new_stack ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    match_length_minimum  =  new_match_length ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return  res ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
bool ECMA262Parser::parse_pattern(ByteCode& stack, size_t& match_length_minimum, ParseFlags flags)
{
    return parse_disjunction(stack, match_length_minimum, flags);
}

bool ECMA262Parser::parse_disjunction(ByteCode& stack, size_t& match_length_minimum, ParseFlags flags)
{
    // The minimum match length of a disjunction is the smallest minimum among its alternatives.
    size_t total_match_length_minimum = NumericLimits<size_t>::max();
    Vector<ByteCode> alternatives;
    while (true) {
        ByteCode alternative_stack;
        size_t alternative_minimum_length = 0;
        auto alt_ok = parse_alternative(alternative_stack, alternative_minimum_length, flags);
        if (!alt_ok)
            return false;

        alternatives.append(move(alternative_stack));
        total_match_length_minimum = min(alternative_minimum_length, total_match_length_minimum);

        // Keep collecting alternatives for as long as they are separated by '|'.
        if (!match(TokenType::Pipe))
            break;
        consume();
    }

    Optimizer::append_alternation(stack, alternatives.span());
    match_length_minimum = total_match_length_minimum;
    return true;
}

bool ECMA262Parser::parse_alternative(ByteCode& stack, size_t& match_length_minimum, ParseFlags flags)
{
    for (;;) {
        if (match(TokenType::Eof))
            return true;

        if (parse_term(stack, match_length_minimum, flags))
            continue;

        return !has_error();
    }
}

bool ECMA262Parser::parse_term(ByteCode& stack, size_t& match_length_minimum, ParseFlags flags)
{
    if (parse_assertion(stack, match_length_minimum, flags))
        return true;

    ByteCode atom_stack;
    size_t minimum_atom_length = 0;
    auto parse_with_quantifier = [&] {
        bool did_parse_one = false;
        if (m_should_use_browser_extended_grammar)
            did_parse_one = parse_extended_atom(atom_stack, minimum_atom_length, flags);

        if (!did_parse_one)
            did_parse_one = parse_atom(atom_stack, minimum_atom_length, flags);

        if (!did_parse_one)
            return false;

        VERIFY(did_parse_one);
        return parse_quantifier(atom_stack, minimum_atom_length, flags);
    };

    if (!parse_with_quantifier())
        return false;

    stack.extend(move(atom_stack));
    match_length_minimum += minimum_atom_length;
    return true;
}

bool ECMA262Parser::parse_assertion(ByteCode& stack, [[maybe_unused]] size_t& match_length_minimum, ParseFlags flags)
{
    if (match(TokenType::Circumflex)) {
        consume();
        stack.empend((ByteCodeValueType)OpCodeId::CheckBegin);
        return true;
    }

    if (match(TokenType::Dollar)) {
        consume();
        stack.empend((ByteCodeValueType)OpCodeId::CheckEnd);
        return true;
    }

    if (try_skip("\\b"sv)) {
        stack.insert_bytecode_check_boundary(BoundaryCheckType::Word);
        return true;
    }

    if (try_skip("\\B"sv)) {
        stack.insert_bytecode_check_boundary(BoundaryCheckType::NonWord);
        return true;
    }

								    if  ( match ( TokenType : : LeftParen ) )  { 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if  ( ! try_skip ( " (? " sv ) ) 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-12-28 09:07:17 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if  ( done ( ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            set_error ( Error : : InvalidCaptureGroup ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        ByteCode  assertion_stack ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        size_t  length_dummy  =  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        bool  should_parse_forward_assertion  =  ! m_should_use_browser_extended_grammar  | |  flags . unicode ; 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if  ( should_parse_forward_assertion  & &  try_skip ( " = " sv ) )  { 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            if  ( ! parse_inner_disjunction ( assertion_stack ,  length_dummy ,  flags ) ) 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            stack . insert_bytecode_lookaround ( move ( assertion_stack ) ,  ByteCode : : LookAroundType : : LookAhead ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if  ( should_parse_forward_assertion  & &  try_skip ( " ! " sv ) )  { 
							 
						 
					
						
							
								
									
										
										
										
											2022-01-21 14:30:47 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            enter_capture_group_scope ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            ScopeGuard  quit_scope  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                [ this ]  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                    exit_capture_group_scope ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            } ; 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            if (!parse_inner_disjunction(assertion_stack, length_dummy, flags))
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            stack.insert_bytecode_lookaround(move(assertion_stack), ByteCode::LookAroundType::NegatedLookAhead);
							 
						 
					
						
							
								
									
										
										
										
											2022-01-21 14:30:47 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            clear_all_capture_groups_in_scope ( stack ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if  ( m_should_use_browser_extended_grammar )  { 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            if (!flags.unicode) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                if (parse_quantifiable_assertion(assertion_stack, match_length_minimum, flags)) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                    if (!parse_quantifier(assertion_stack, match_length_minimum, flags))
							 
						 
					
						
							
								
									
										
										
										
											2022-02-19 17:18:23 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                        return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-06-12 13:24:45 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                    stack.extend(move(assertion_stack));
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                    return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if (try_skip("<="sv)) {
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            if (!parse_inner_disjunction(assertion_stack, length_dummy, flags))
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            // FIXME: Somehow ensure that this assertion regexp has a fixed length.
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            stack.insert_bytecode_lookaround(move(assertion_stack), ByteCode::LookAroundType::LookBehind, length_dummy);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if (try_skip("<!"sv)) {
							 
						 
					
						
							
								
									
										
										
										
											2022-01-21 14:30:47 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            enter_capture_group_scope ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            ScopeGuard  quit_scope  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                [ this ]  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                    exit_capture_group_scope ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            } ; 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            if (!parse_inner_disjunction(assertion_stack, length_dummy, flags))
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            stack.insert_bytecode_lookaround(move(assertion_stack), ByteCode::LookAroundType::NegatedLookBehind, length_dummy);
							 
						 
					
						
							
								
									
										
										
										
											2022-01-21 14:30:47 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            clear_all_capture_groups_in_scope ( stack ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        // If none of these matched, put the '(?' back.
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        m_parser_state.lexer.back(3);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        m_parser_state.current_token = m_parser_state.lexer.next();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								bool ECMA262Parser::parse_inner_disjunction(ByteCode& bytecode_stack, size_t& length, ParseFlags flags)
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    auto disjunction_ok = parse_disjunction(bytecode_stack, length, flags);
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if  ( ! disjunction_ok ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    consume(TokenType::RightParen, Error::MismatchingParen);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								bool ECMA262Parser::parse_quantifiable_assertion(ByteCode& stack, size_t&, ParseFlags flags)
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    VERIFY ( m_should_use_browser_extended_grammar ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    ByteCode  assertion_stack ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    size_t  match_length_minimum  =  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if (try_skip("="sv)) {
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:22:07 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if (!parse_inner_disjunction(assertion_stack, match_length_minimum, { .unicode = false, .named = flags.named, .unicode_sets = false }))
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        stack.insert_bytecode_lookaround(move(assertion_stack), ByteCode::LookAroundType::LookAhead);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if (try_skip("!"sv)) {
							 
						 
					
						
							
								
									
										
										
										
											2022-01-21 14:30:47 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        enter_capture_group_scope ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        ScopeGuard  quit_scope  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            [ this ]  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                exit_capture_group_scope ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } ; 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:22:07 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if (!parse_inner_disjunction(assertion_stack, match_length_minimum, { .unicode = false, .named = flags.named, .unicode_sets = false }))
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        stack.insert_bytecode_lookaround(move(assertion_stack), ByteCode::LookAroundType::NegatedLookAhead);
							 
						 
					
						
							
								
									
										
										
										
											2022-01-21 14:30:47 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        clear_all_capture_groups_in_scope ( stack ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-08-11 09:58:08 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								StringView ECMA262Parser::read_digits_as_string(ReadDigitsInitialZeroState initial_zero, bool hex, int max_count, int min_count)
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if (!match(TokenType::Char))
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  { } ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-04-09 19:30:23 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if (initial_zero == ReadDigitsInitialZeroState::Disallow && m_parser_state.current_token.value() == "0")
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  { } ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    int  count  =  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    size_t  offset  =  0 ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-12-06 17:03:29 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    auto start_token = m_parser_state.current_token;
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    while (match(TokenType::Char)) {
							 
						 
					
						
							
								
									
										
										
										
											2021-11-11 00:55:02 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        auto const c = m_parser_state.current_token.value();
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if (max_count > 0 && count >= max_count)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-04-10 09:10:44 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if (hex && !AK::StringUtils::convert_to_uint_from_hex(c).has_value())
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            break ; 
							 
						 
					
						
							
								
									
										
										
										
											2023-12-23 15:59:14 +13:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if (!hex && !c.to_number<unsigned>().has_value())
							 
						 
					
						
							
								
									
										
										
										
											2021-04-10 09:10:44 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        offset += consume().value().length();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        ++count;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-08-11 09:58:08 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if  ( count  <  min_count ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  { } ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    return StringView { start_token.value().characters_without_null_termination(), offset };
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-08-11 09:58:08 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								Optional<unsigned> ECMA262Parser::read_digits(ECMA262Parser::ReadDigitsInitialZeroState initial_zero, bool hex, int max_count, int min_count)
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2021-08-11 09:58:08 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    auto str = read_digits_as_string(initial_zero, hex, max_count, min_count);
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if (str.is_empty())
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  { } ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if  ( hex ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return AK::StringUtils::convert_to_uint_from_hex(str);
							 
						 
					
						
							
								
									
										
										
										
											2023-12-23 15:59:14 +13:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    return str.to_number<unsigned>();
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								bool ECMA262Parser::parse_quantifier(ByteCode& stack, size_t& match_length_minimum, ParseFlags flags)
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    enum  class  Repetition  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        OneOrMore , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        ZeroOrMore , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        Optional , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        Explicit , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        None , 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } repetition_mark { Repetition::None };
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    bool  ungreedy  =  false ; 
							 
						 
					
						
							
								
									
										
										
										
											2021-08-12 11:02:46 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    Optional<u64> repeat_min, repeat_max;
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if (match(TokenType::Asterisk)) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        consume ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        repetition_mark  =  Repetition : : ZeroOrMore ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    }  else  if  ( match ( TokenType : : Plus ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        consume ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        repetition_mark  =  Repetition : : OneOrMore ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    }  else  if  ( match ( TokenType : : Questionmark ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        consume ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        repetition_mark  =  Repetition : : Optional ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    }  else  if  ( match ( TokenType : : LeftCurly ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        repetition_mark  =  Repetition : : Explicit ; 
							 
						 
					
						
							
								
									
										
										
										
											2021-08-10 16:35:45 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if  ( ! parse_interval_quantifier ( repeat_min ,  repeat_max ) )  { 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            if  ( flags . unicode )  { 
							 
						 
					
						
							
								
									
										
										
										
											2021-09-30 20:03:41 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                // Invalid interval quantifiers are disallowed in Unicode mode - they must be escaped with '\{'.
 
							 
						 
					
						
							
								
									
										
										
										
											2021-08-10 16:35:45 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                set_error ( Error : : InvalidPattern ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2021-04-10 09:10:44 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            } 
							 
						 
					
						
							
								
									
										
										
										
											2021-08-10 16:35:45 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            return  ! has_error ( ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    }  else  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( match ( TokenType : : Questionmark ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        consume ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        ungreedy  =  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    switch  ( repetition_mark )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    case  Repetition : : OneOrMore : 
							 
						 
					
						
							
								
									
										
										
										
											2021-07-10 14:18:04 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        ByteCode : : transform_bytecode_repetition_min_one ( stack ,  ! ungreedy ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    case  Repetition : : ZeroOrMore : 
							 
						 
					
						
							
								
									
										
										
										
											2021-07-10 14:18:04 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        ByteCode : : transform_bytecode_repetition_any ( stack ,  ! ungreedy ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        match_length_minimum  =  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        break ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    case  Repetition : : Optional : 
							 
						 
					
						
							
								
									
										
										
										
											2021-07-10 14:18:04 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        ByteCode : : transform_bytecode_repetition_zero_or_one ( stack ,  ! ungreedy ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        match_length_minimum  =  0 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        break ; 
							 
						 
					
						
							
								
									
										
										
										
											2021-08-12 11:02:46 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    case  Repetition : : Explicit :  { 
							 
						 
					
						
							
								
									
										
										
										
											2021-08-31 22:27:08 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        auto  min_repetition_mark_id  =  m_parser_state . repetition_mark_count + + ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        auto  max_repetition_mark_id  =  m_parser_state . repetition_mark_count + + ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        ByteCode : : transform_bytecode_repetition_min_max ( stack ,  repeat_min . value ( ) ,  repeat_max ,  min_repetition_mark_id ,  max_repetition_mark_id ,  ! ungreedy ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        match_length_minimum  * =  repeat_min . value ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        break ; 
							 
						 
					
						
							
								
									
										
										
										
											2021-08-12 11:02:46 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    case  Repetition : : None : 
							 
						 
					
						
							
								
									
										
										
										
											2021-02-23 20:42:32 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        VERIFY_NOT_REACHED ( ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-08-12 11:02:46 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								bool  ECMA262Parser : : parse_interval_quantifier ( Optional < u64 > &  repeat_min ,  Optional < u64 > &  repeat_max )  
						 
					
						
							
								
									
										
										
										
											2021-08-10 16:35:45 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    VERIFY ( match ( TokenType : : LeftCurly ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    consume ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    auto  chars_consumed  =  1 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    auto  low_bound_string  =  read_digits_as_string ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    chars_consumed  + =  low_bound_string . length ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2023-12-23 15:59:14 +13:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    auto  low_bound  =  low_bound_string . to_number < u64 > ( ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2021-08-10 16:35:45 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( ! low_bound . has_value ( ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if  ( ! m_should_use_browser_extended_grammar  & &  done ( ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            return  set_error ( Error : : MismatchingBrace ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        back ( chars_consumed  +  ! done ( ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    repeat_min  =  low_bound . value ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( match ( TokenType : : Comma ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        consume ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        + + chars_consumed ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        auto  high_bound_string  =  read_digits_as_string ( ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2023-12-23 15:59:14 +13:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        auto  high_bound  =  high_bound_string . to_number < u64 > ( ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2021-08-10 16:35:45 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if  ( high_bound . has_value ( ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            repeat_max  =  high_bound . value ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            chars_consumed  + =  high_bound_string . length ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    }  else  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        repeat_max  =  repeat_min ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( ! match ( TokenType : : RightCurly ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if  ( ! m_should_use_browser_extended_grammar  & &  done ( ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            return  set_error ( Error : : MismatchingBrace ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        back ( chars_consumed  +  ! done ( ) ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    consume ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    + + chars_consumed ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( repeat_max . has_value ( ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if  ( repeat_min . value ( )  >  repeat_max . value ( ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            set_error ( Error : : InvalidBraceContent ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-08-12 11:02:46 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if  ( ( * repeat_min  >  s_ecma262_maximum_repetition_count )  | |  ( repeat_max . has_value ( )  & &  ( * repeat_max  >  s_ecma262_maximum_repetition_count ) ) ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  set_error ( Error : : InvalidBraceContent ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-08-10 16:35:45 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
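
The interval-quantifier logic above (read a lower bound, an optional comma and upper bound, then a closing brace, rewinding on failure) can be sketched outside the parser as follows. This is a minimal standalone illustration, not LibRegex code: `Interval`, `read_number` and `parse_interval` are hypothetical names, it works on a `std::string_view` instead of the lexer's token stream, and it simply backtracks where the real parser would report `InvalidBraceContent` or `MismatchingBrace`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical result type; not part of LibRegex.
struct Interval {
    std::uint64_t min;
    std::optional<std::uint64_t> max; // empty means unbounded, as in "{2,}"
};

// Reads a run of decimal digits starting at pos.
// Returns nullopt if there are no digits or the value overflows 64 bits.
static std::optional<std::uint64_t> read_number(std::string_view s, std::size_t& pos)
{
    std::size_t const start = pos;
    std::uint64_t value = 0;
    while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
        std::uint64_t const digit = static_cast<std::uint64_t>(s[pos] - '0');
        if (value > (UINT64_MAX - digit) / 10)
            return std::nullopt;
        value = value * 10 + digit;
        ++pos;
    }
    if (pos == start)
        return std::nullopt;
    return value;
}

// Parses "{n}", "{n,}" or "{n,m}" at pos. On failure pos is restored,
// so the caller can treat the '{' as a literal character instead.
static std::optional<Interval> parse_interval(std::string_view s, std::size_t& pos)
{
    std::size_t const saved = pos;
    auto fail = [&]() -> std::optional<Interval> {
        pos = saved;
        return std::nullopt;
    };

    if (pos >= s.size() || s[pos] != '{')
        return fail();
    ++pos;

    auto low = read_number(s, pos);
    if (!low)
        return fail();

    Interval result { *low, *low }; // "{n}" means exactly n repetitions
    if (pos < s.size() && s[pos] == ',') {
        ++pos;
        result.max = read_number(s, pos); // stays empty for "{n,}"
    }

    if (pos >= s.size() || s[pos] != '}')
        return fail();
    ++pos;

    // Simplification: the real parser sets an error here rather than backtracking.
    if (result.max && result.min > *result.max)
        return fail();
    return result;
}
```

Restoring `pos` on failure plays the role of the `back(chars_consumed + !done())` calls above: the caller sees the input as if the brace had never been examined.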
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								bool  ECMA262Parser : : parse_atom ( ByteCode &  stack ,  size_t &  match_length_minimum ,  ParseFlags  flags )  
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2020-11-28 10:46:30 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if  ( match ( TokenType : : EscapeSequence ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        // Also part of AtomEscape.
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        auto  token  =  consume ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        match_length_minimum  + =  1 ; 
							 
						 
					
						
							
								
									
										
										
											
												LibRegex: Treat pattern string characters as unsigned
For example, consider the following pattern:
    new RegExp('\ud834\udf06', 'u')
With this pattern, the regex parser should insert the UTF-8 encoded
bytes 0xf0, 0x9d, 0x8c, and 0x86. However, because these characters are
currently treated as normal char types, they have a negative value since
they are all > 0x7f. Then, due to sign extension, when these characters
are cast to u64, the sign bit is preserved. The result is that these
bytes are inserted as 0xfffffffffffffff0, 0xffffffffffffff9d, etc.
Fortunately, there are only a few places where we insert bytecode with
the raw characters. In these places, be sure to treat the bytes as u8
before they are cast to u64.
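
The sign-extension problem described in this commit message can be reproduced in isolation. The sketch below is an illustration only: `widen_as_signed` and `widen_as_unsigned` are hypothetical helpers, and `signed char` is used explicitly so the behaviour does not depend on whether plain `char` is signed on a given platform.

```cpp
#include <cassert>
#include <cstdint>

// Widens a pattern byte directly to 64 bits. For bytes above 0x7f the
// value is negative, so the conversion fills the upper bits with ones.
static std::uint64_t widen_as_signed(signed char byte)
{
    return static_cast<std::uint64_t>(byte);
}

// Goes through an unsigned 8-bit type first, which is what the
// (u8) cast in the parser does before the value is stored as u64.
static std::uint64_t widen_as_unsigned(signed char byte)
{
    return static_cast<std::uint64_t>(static_cast<std::uint8_t>(byte));
}
```

With the byte 0xf0 from the commit message, the first helper yields 0xfffffffffffffff0 and the second yields 0xf0.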
											 
										 
										
											2021-08-20 10:22:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        stack . insert_bytecode_compare_values ( {  {  CharacterCompareType : : Char ,  ( u8 ) token . value ( ) [ 1 ]  }  } ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 10:46:30 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if  ( try_skip ( " \\ " sv ) )  { 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        // AtomEscape.
 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        return  parse_atom_escape ( stack ,  match_length_minimum ,  flags ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( match ( TokenType : : LeftBracket ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        // Character class.
 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        return  parse_character_class ( stack ,  match_length_minimum ,  flags ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( match ( TokenType : : LeftParen ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        // Non-capturing group, or a capture group.
 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        return  parse_capture_group ( stack ,  match_length_minimum ,  flags ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( match ( TokenType : : Period ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        consume ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        match_length_minimum  + =  1 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        stack . insert_bytecode_compare_values ( {  {  CharacterCompareType : : AnyChar ,  0  }  } ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if  ( match ( TokenType : : Circumflex )  | |  match ( TokenType : : Dollar )  | |  match ( TokenType : : RightParen ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        | |  match ( TokenType : : Pipe )  | |  match ( TokenType : : Plus )  | |  match ( TokenType : : Asterisk ) 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        | |  match ( TokenType : : Questionmark ) )  { 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if  ( match ( TokenType : : RightBracket )  | |  match ( TokenType : : RightCurly )  | |  match ( TokenType : : LeftCurly ) )  { 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if  ( flags . unicode ) 
							 
						 
					
						
							
								
									
										
										
										
											2021-08-11 18:06:33 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            return  set_error ( Error : : InvalidPattern ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if  ( m_should_use_browser_extended_grammar )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            auto  token  =  consume ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            match_length_minimum += 1;
							 
						 
					
						
							
								
									
										
										
											
												LibRegex: Treat pattern string characters as unsigned
For example, consider the following pattern:
    new RegExp('\ud834\udf06', 'u')
With this pattern, the regex parser should insert the UTF-8 encoded
bytes 0xf0, 0x9d, 0x8c, and 0x86. However, because these characters are
currently treated as normal char types, they have a negative value since
they are all > 0x7f. Then, due to sign extension, when these characters
are cast to u64, the sign bit is preserved. The result is that these
bytes are inserted as 0xfffffffffffffff0, 0xffffffffffffff9d, etc.
Fortunately, there are only a few places where we insert bytecode with
the raw characters. In these places, be sure to treat the bytes as u8
before they are cast to u64.
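
The sign-extension problem described above can be reproduced in isolation. The following is a minimal sketch, not the LibRegex code itself: the helper names `widen_sign_extended` and `widen_as_unsigned` are hypothetical, and `signed char` stands in for plain `char` (whose signedness is implementation-defined) so that the result is the same on every platform.

```cpp
#include <cstdint>

// Hypothetical helper: widens a byte directly to 64 bits, as the buggy code did.
// A negative value is converted modulo 2^64, so the upper bits are all set.
inline std::uint64_t widen_sign_extended(signed char byte)
{
    return static_cast<std::uint64_t>(byte);
}

// Hypothetical helper: reinterprets the byte as unsigned 8-bit first,
// which is the fix the commit message describes (cast to u8, then to u64).
inline std::uint64_t widen_as_unsigned(signed char byte)
{
    return static_cast<std::uint64_t>(static_cast<std::uint8_t>(byte));
}
```

With the byte 0xf0 stored as a signed value (-16), the first helper yields 0xfffffffffffffff0 and the second yields 0xf0, matching the values quoted in the commit message.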
											 
										 
										
											2021-08-20 10:22:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            stack.insert_bytecode_compare_values({ { CharacterCompareType::Char, (u8)token.value()[0] } });
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
									
										
										
										
											2021-12-21 17:54:45 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        return  false ; 
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-29 19:10:46 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if  ( match_ordinary_characters ( ) )  { 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        auto  token  =  consume ( ) . value ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        match_length_minimum += 1;
							 
						 
					
						
							
								
									
										
										
											
											 
										 
										
											2021-08-20 10:22:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        stack.insert_bytecode_compare_values({ { CharacterCompareType::Char, (u8)token[0] } });
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    set_error(Error::InvalidPattern);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								bool ECMA262Parser::parse_extended_atom(ByteCode&, size_t&, ParseFlags)
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    // Note: This includes only rules *not* present in parse_atom()
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    VERIFY ( m_should_use_browser_extended_grammar ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-12-21 17:57:04 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    return parse_invalid_braced_quantifier(); // FAIL FAIL FAIL
 
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								bool ECMA262Parser::parse_invalid_braced_quantifier()
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if (!match(TokenType::LeftCurly))
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    consume ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    size_t  chars_consumed  =  1 ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    auto  low_bound  =  read_digits_as_string ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    StringView  high_bound ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( low_bound . is_empty ( ) )  { 
							 
						 
					
						
							
								
									
										
										
										
											2021-07-24 13:00:14 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        back ( chars_consumed  +  ! done ( ) ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    chars_consumed += low_bound.length();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if (match(TokenType::Comma)) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        consume ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        ++chars_consumed;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        high_bound  =  read_digits_as_string ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        chars_consumed += high_bound.length();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if (!match(TokenType::RightCurly)) {
							 
						 
					
						
							
								
									
										
										
										
											2021-07-24 13:00:14 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        back ( chars_consumed  +  ! done ( ) ) ; 
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    consume ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    set_error(Error::InvalidPattern);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								bool ECMA262Parser::parse_character_escape(Vector<CompareTypeAndValuePair>& compares, size_t& match_length_minimum, ParseFlags flags)
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    // CharacterEscape > ControlEscape
 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if (try_skip("f"sv)) {
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        match_length_minimum += 1;
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:19:43 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        compares.append({ CharacterCompareType::Char, (ByteCodeValueType)'\f' });
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if (try_skip("n"sv)) {
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        match_length_minimum += 1;
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:19:43 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        compares.append({ CharacterCompareType::Char, (ByteCodeValueType)'\n' });
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if (try_skip("r"sv)) {
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        match_length_minimum += 1;
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:19:43 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        compares.append({ CharacterCompareType::Char, (ByteCodeValueType)'\r' });
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if (try_skip("t"sv)) {
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        match_length_minimum += 1;
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:19:43 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        compares.append({ CharacterCompareType::Char, (ByteCodeValueType)'\t' });
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if (try_skip("v"sv)) {
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        match_length_minimum += 1;
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:19:43 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        compares.append({ CharacterCompareType::Char, (ByteCodeValueType)'\v' });
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    // CharacterEscape > ControlLetter
 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if (try_skip("c"sv)) {
							 
						 
					
						
							
								
									
										
										
										
											2021-08-11 09:39:10 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        for  ( auto  c  :  s_alphabetic_characters )  { 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            if  ( try_skip ( {  & c ,  1  } ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                match_length_minimum += 1;
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:19:43 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                compares.append({ CharacterCompareType::Char, (ByteCodeValueType)(c % 32) });
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 12:27:59 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if  ( flags . unicode )  { 
							 
						 
					
						
							
								
									
										
										
										
											2021-08-11 09:39:10 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            set_error(Error::InvalidPattern);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            return  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if  ( m_should_use_browser_extended_grammar )  { 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:19:43 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            back(1 + (done() ? 0 : 1));
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            compares.append({ CharacterCompareType::Char, (ByteCodeValueType)'\\' });
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            match_length_minimum += 1;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 12:27:59 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        // Allow '\c' in non-unicode mode, just matches 'c'.
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        match_length_minimum += 1;
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:19:43 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        compares.append({ CharacterCompareType::Char, (ByteCodeValueType)'c' });
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 12:27:59 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        return  true ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-08-11 16:41:57 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    // '\0'
 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if (try_skip("0"sv)) {
							 
						 
					
						
							
								
									
										
										
										
											2021-08-11 16:41:57 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if (!lookahead_any(s_decimal_characters)) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            match_length_minimum += 1;
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:19:43 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            compares.append({ CharacterCompareType::Char, (ByteCodeValueType)0 });
							 
						 
					
						
							
								
									
										
										
										
											2021-08-11 16:41:57 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            return  true ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        back ( ) ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    // LegacyOctalEscapeSequence
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if  ( m_should_use_browser_extended_grammar )  { 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if  ( ! flags . unicode )  { 
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            if (auto escape = parse_legacy_octal_escape(); escape.has_value()) {
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:19:43 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                compares.append({ CharacterCompareType::Char, (ByteCodeValueType)escape.value() });
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                match_length_minimum += 1;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                return true;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    // HexEscape
 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if (try_skip("x"sv)) {
							 
						 
					
						
							
								
									
										
										
										
											2021-08-11 11:18:57 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if (auto hex_escape = read_digits(ReadDigitsInitialZeroState::Allow, true, 2, 2); hex_escape.has_value()) {
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            match_length_minimum += 1;
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:19:43 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            compares.append({ CharacterCompareType::Char, (ByteCodeValueType)hex_escape.value() });
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            return true;
							 
						 
					
						
							
								
									
										
										
										
											2021-12-21 17:54:45 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if (!flags.unicode) {
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 12:27:59 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            // '\x' is allowed in non-unicode mode, just matches 'x'.
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            match_length_minimum += 1;
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:19:43 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            compares.append({ CharacterCompareType::Char, (ByteCodeValueType)'x' });
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 12:27:59 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            return true;
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
									
										
										
										
											2021-12-21 17:54:45 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        set_error(Error::InvalidPattern);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return false;
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if (try_skip("u"sv)) {
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if (auto code_point = consume_escaped_code_point(flags.unicode); code_point.has_value()) {
							 
						 
					
						
							
								
									
										
										
										
											2020-12-06 17:04:28 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            match_length_minimum += 1;
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:19:43 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            compares.append({ CharacterCompareType::Char, (ByteCodeValueType)code_point.value() });
							 
						 
					
						
							
								
									
										
										
										
											2020-12-06 17:04:28 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            return true;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
									
										
										
										
											2021-08-18 14:43:11 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return false;
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    // IdentityEscape
 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    for (auto ch : identity_escape_characters(flags.unicode, m_should_use_browser_extended_grammar)) {
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 10:46:30 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if (try_skip({ &ch, 1 })) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            match_length_minimum += 1;
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:19:43 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            compares.append({ CharacterCompareType::Char, (ByteCodeValueType)ch });
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 10:46:30 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            return true;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if (flags.unicode) {
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if (try_skip("/"sv)) {
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 10:46:30 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            match_length_minimum += 1;
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:19:43 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            compares.append({ CharacterCompareType::Char, (ByteCodeValueType)'/' });
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 10:46:30 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            return true;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:19:43 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    return false;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								bool ECMA262Parser::parse_atom_escape(ByteCode& stack, size_t& match_length_minimum, ParseFlags flags)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if (auto escape_str = read_digits_as_string(ReadDigitsInitialZeroState::Disallow); !escape_str.is_empty()) {
							 
						 
					
						
							
								
									
										
										
										
											2023-12-23 15:59:14 +13:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if (auto escape = escape_str.to_number<unsigned>(); escape.has_value()) {
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:19:43 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            // See if this is a "back"-reference (we've already parsed the group it refers to)
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            auto maybe_length = m_parser_state.capture_group_minimum_lengths.get(escape.value());
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if (maybe_length.has_value()) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                match_length_minimum += maybe_length.value();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                stack.insert_bytecode_compare_values({ { CharacterCompareType::Reference, (ByteCodeValueType)escape.value() } });
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                return true;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            // It's not a pattern seen before, so we have to see if it's a valid reference to a future group.
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if (escape.value() <= ensure_total_number_of_capturing_parenthesis()) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                // This refers to a future group, and it will _always_ be matching an empty string
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                // So just match nothing and move on.
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                return true;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if (!m_should_use_browser_extended_grammar) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                set_error(Error::InvalidNumber);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                return false;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        // If not, put the characters back.
 
							 
						 
					
						
							
								
									
										
										
										
											2022-09-06 23:55:09 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        back(escape_str.length() + (done() ? 0 : 1));
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:19:43 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    Vector<CompareTypeAndValuePair> escape_compares;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if (parse_character_escape(escape_compares, match_length_minimum, flags)) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        stack.insert_bytecode_compare_values(move(escape_compares));
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return true;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if (flags.named && try_skip("k"sv)) {
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        auto name = read_capture_group_specifier(true);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if (name.is_empty()) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            set_error(Error::InvalidNameForCaptureGroup);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            return false;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
									
										
										
										
											2021-08-14 16:28:54 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        auto maybe_capture_group = m_parser_state.named_capture_groups.get(name);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if (!maybe_capture_group.has_value()) {
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            set_error(Error::InvalidNameForCaptureGroup);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            return false;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
									
										
										
										
											2021-08-14 16:28:54 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        match_length_minimum += maybe_capture_group->minimum_length;
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-08-14 16:28:54 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        stack.insert_bytecode_compare_values({ { CharacterCompareType::Reference, (ByteCodeValueType)maybe_capture_group->group_index } });
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        return true;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if (flags.unicode) {
							 
						 
					
						
							
								
									
										
										
										
											2021-09-19 23:00:45 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        PropertyEscape property {};
							 
						 
					
						
							
								
									
										
										
										
											2021-07-29 14:18:51 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        bool negated = false;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if (parse_unicode_property_escape(property, negated)) {
							 
						 
					
						
							
								
									
										
										
										
											2021-08-02 06:57:10 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            Vector<CompareTypeAndValuePair> compares;
							 
						 
					
						
							
								
									
										
										
										
											2021-07-29 14:18:51 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            if (negated)
							 
						 
					
						
							
								
									
										
										
										
											2021-08-02 06:57:10 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                compares.empend(CompareTypeAndValuePair { CharacterCompareType::Inverse, 0 });
							 
						 
					
						
							
								
									
										
										
										
											2021-07-31 17:46:05 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            property.visit(
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                [&](Unicode::Property property) {
							 
						 
					
						
							
								
									
										
										
										
											2021-08-02 06:57:10 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                    compares.empend(CompareTypeAndValuePair { CharacterCompareType::Property, (ByteCodeValueType)property });
							 
						 
					
						
							
								
									
										
										
										
											2021-07-31 17:46:05 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                },
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                [&](Unicode::GeneralCategory general_category) {
							 
						 
					
						
							
								
									
										
										
										
											2021-08-02 06:57:10 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                    compares.empend(CompareTypeAndValuePair { CharacterCompareType::GeneralCategory, (ByteCodeValueType)general_category });
							 
						 
					
						
							
								
									
										
										
										
											2021-08-04 06:35:48 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                },
							 
						 
					
						
							
								
									
										
										
										
											2021-08-04 07:26:25 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                [&](Script script) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                    if (script.is_extension)
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                        compares.empend(CompareTypeAndValuePair { CharacterCompareType::ScriptExtension, (ByteCodeValueType)script.script });
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                    else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                        compares.empend(CompareTypeAndValuePair { CharacterCompareType::Script, (ByteCodeValueType)script.script });
							 
						 
					
						
							
								
									
										
										
										
											2021-08-13 17:31:39 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                },
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                [](Empty&) { VERIFY_NOT_REACHED(); });
							 
						 
					
						
							
								
									
										
										
										
											2021-08-02 06:57:10 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            stack.insert_bytecode_compare_values(move(compares));
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            match_length_minimum += 1;
							 
						 
					
						
							
								
									
										
										
										
											2021-07-29 14:18:51 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            return true;
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-30 17:46:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if (done())
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return set_error(Error::InvalidTrailingEscape);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    bool negate = false;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    auto ch = parse_character_class_escape(negate);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if (!ch.has_value()) {
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if (!flags.unicode) {
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 12:27:59 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            // Allow all SourceCharacter's as escapes here.
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            auto token = consume();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            match_length_minimum += 1;
							 
						 
					
						
							
								
									
										
										
											
												LibRegex: Treat pattern string characters as unsigned
For example, consider the following pattern:
    new RegExp('\ud834\udf06', 'u')
With this pattern, the regex parser should insert the UTF-8 encoded
bytes 0xf0, 0x9d, 0x8c, and 0x86. However, because these characters are
currently treated as normal char types, they have a negative value since
they are all > 0x7f. Then, due to sign extension, when these characters
are cast to u64, the sign bit is preserved. The result is that these
bytes are inserted as 0xfffffffffffffff0, 0xffffffffffffff9d, etc.
Fortunately, there are only a few places where we insert bytecode with
the raw characters. In these places, be sure to treat the bytes as u8
before they are cast to u64.
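The sign-extension effect described in this commit message can be reproduced outside the parser. The following is a minimal sketch, not LibRegex code: `widen_as_signed`, `widen_as_unsigned`, and `run_examples` are hypothetical names, and `signed char` is used explicitly because whether plain `char` is signed depends on the platform.

```cpp
#include <cassert>
#include <cstdint>

// Widen a byte the way a signed char is widened: the sign bit is extended
// into the upper 56 bits. (Hypothetical helper, for illustration only.)
constexpr std::uint64_t widen_as_signed(unsigned char byte)
{
    return static_cast<std::uint64_t>(static_cast<signed char>(byte));
}

// Widen the same byte after treating it as unsigned, which corresponds to
// the (u8) cast described in the commit message.
constexpr std::uint64_t widen_as_unsigned(unsigned char byte)
{
    return static_cast<std::uint64_t>(static_cast<std::uint8_t>(byte));
}

// Checks using the first two UTF-8 bytes quoted in the commit message.
inline void run_examples()
{
    assert(widen_as_signed(0xf0) == 0xfffffffffffffff0ULL);
    assert(widen_as_signed(0x9d) == 0xffffffffffffff9dULL);
    assert(widen_as_unsigned(0xf0) == 0xf0);
    assert(widen_as_unsigned(0x9d) == 0x9d);
}
```

Bytes at or below 0x7f are unaffected by either conversion, which is presumably why the problem only surfaced for non-ASCII pattern characters.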
											 
										 
										
											2021-08-20 10:22:23 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            stack.insert_bytecode_compare_values({ { CharacterCompareType::Char, (u8)token.value()[0] } });
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 12:27:59 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            return true;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        set_error(Error::InvalidCharacterClass);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return false;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    Vector<CompareTypeAndValuePair> compares;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if (negate)
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 17:47:12 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        compares.empend(CompareTypeAndValuePair { CharacterCompareType::Inverse, 0 });
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    compares.empend(CompareTypeAndValuePair { CharacterCompareType::CharClass, (ByteCodeValueType)ch.value() });
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    match_length_minimum += 1;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    stack.insert_bytecode_compare_values(move(compares));
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return true;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								Optional<u8> ECMA262Parser::parse_legacy_octal_escape()
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    constexpr auto all_octal_digits = "01234567"sv;
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    auto read_octal_digit = [&](auto start, auto end, bool should_ensure_no_following_octal_digit) -> Optional<u8> {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        for (char c = '0' + start; c <= '0' + end; ++c) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if (try_skip({ &c, 1 })) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                if (!should_ensure_no_following_octal_digit || !lookahead_any(all_octal_digits))
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                    return c - '0';
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                back(2);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                return {};
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return {};
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    };
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    // OctalDigit(1)
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if (auto digit = read_octal_digit(0, 7, true); digit.has_value()) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return digit.value();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    // OctalDigit(2)
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if (auto left_digit = read_octal_digit(0, 3, false); left_digit.has_value()) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if (auto right_digit = read_octal_digit(0, 7, true); right_digit.has_value()) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            return left_digit.value() * 8 + right_digit.value();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        back(2);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    // OctalDigit(2)
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if (auto left_digit = read_octal_digit(4, 7, false); left_digit.has_value()) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if (auto right_digit = read_octal_digit(0, 7, false); right_digit.has_value()) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            return left_digit.value() * 8 + right_digit.value();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        back(2);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    // OctalDigit(3)
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if (auto left_digit = read_octal_digit(0, 3, false); left_digit.has_value()) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        size_t chars_consumed = 1;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if (auto mid_digit = read_octal_digit(0, 7, false); mid_digit.has_value()) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            ++chars_consumed;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if (auto right_digit = read_octal_digit(0, 7, false); right_digit.has_value()) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                return left_digit.value() * 64 + mid_digit.value() * 8 + right_digit.value();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        back(chars_consumed);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return {};
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
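The digit ranges used by `parse_legacy_octal_escape` above (a leading `0`-`3` allows up to three digits, a leading `4`-`7` allows up to two) keep the decoded value within one byte. The sketch below shows the same arithmetic on a plain string instead of the parser's token stream. It is a simplified greedy variant under stated assumptions, not the parser's implementation: `OctalEscape`, `decode_legacy_octal`, and `run_examples` are hypothetical names, and the input is assumed to be the text that follows the backslash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

struct OctalEscape {
    std::uint8_t value;  // decoded value, at most 0377 (255)
    std::size_t length;  // number of digits consumed
};

// Hypothetical helper: decode a legacy octal escape at the start of `input`.
inline std::optional<OctalEscape> decode_legacy_octal(std::string_view input)
{
    auto is_octal = [](char c) { return c >= '0' && c <= '7'; };
    if (input.empty() || !is_octal(input[0]))
        return std::nullopt;

    // A leading 0-3 leaves room for two more digits (maximum 0377 = 255);
    // a leading 4-7 leaves room for only one more (maximum 077 = 63).
    std::size_t const max_digits = input[0] <= '3' ? 3 : 2;

    unsigned value = 0;
    std::size_t length = 0;
    while (length < max_digits && length < input.size() && is_octal(input[length])) {
        value = value * 8 + static_cast<unsigned>(input[length] - '0');
        ++length;
    }
    return OctalEscape { static_cast<std::uint8_t>(value), length };
}

inline void run_examples()
{
    auto a = decode_legacy_octal("101"); // 1*64 + 0*8 + 1 = 65
    assert(a.has_value() && a->value == 65 && a->length == 3);

    auto b = decode_legacy_octal("477"); // stops after two digits: 4*8 + 7 = 39
    assert(b.has_value() && b->value == 39 && b->length == 2);

    assert(!decode_legacy_octal("8").has_value()); // '8' is not an octal digit
}
```

Consuming digits greedily up to the cap is intended to pick the same longest match that the parser finds by trying each alternative in turn and backing up on failure.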
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								Optional<CharClass> ECMA262Parser::parse_character_class_escape(bool& negate, bool expect_backslash)
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if (expect_backslash && !try_skip("\\"sv))
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        return {};
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    // CharacterClassEscape
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    CharClass ch_class;
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if (try_skip("d"sv)) {
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        ch_class = CharClass::Digit;
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    } else if (try_skip("D"sv)) {
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        ch_class = CharClass::Digit;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        negate = true;
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    } else if (try_skip("s"sv)) {
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        ch_class = CharClass::Space;
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    } else if (try_skip("S"sv)) {
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        ch_class = CharClass::Space;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        negate = true;
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    } else if (try_skip("w"sv)) {
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        ch_class = CharClass::Word;
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    } else if (try_skip("W"sv)) {
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        ch_class = CharClass::Word;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        negate = true;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } else {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return {};
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return ch_class;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								bool ECMA262Parser::parse_character_class(ByteCode& stack, size_t& match_length_minimum, ParseFlags flags)
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    consume(TokenType::LeftBracket, Error::InvalidPattern);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    Vector<CompareTypeAndValuePair> compares;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2024-01-11 12:51:13 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    auto uses_explicit_or_semantics = false;
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if (match(TokenType::Circumflex)) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        // Negated charclass
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        consume();
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 17:47:12 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        compares.empend(CompareTypeAndValuePair { CharacterCompareType::Inverse, 0 });
							 
						 
					
						
							
								
									
										
										
										
											2024-01-11 12:51:13 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        uses_explicit_or_semantics = true;
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:22:07 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    // ClassContents :: [empty]
 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if (match(TokenType::RightBracket)) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        consume();
							 
						 
					
						
							
								
									
										
										
										
											2021-04-12 10:58:59 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        // Should only have at most an 'Inverse'
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        VERIFY(compares.size() <= 1);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        stack.insert_bytecode_compare_values(move(compares));
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        return true;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:22:07 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    // ClassContents :: [~UnicodeSetsMode] NonemptyClassRanges[?UnicodeMode]
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if (!flags.unicode_sets && !parse_nonempty_class_ranges(compares, flags))
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return false;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    // ClassContents :: [+UnicodeSetsMode] ClassSetExpression
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    if (flags.unicode_sets && !parse_class_set_expression(compares))
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        return false;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2024-01-11 12:51:13 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if (uses_explicit_or_semantics && compares.size() > 2) {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        compares.insert(1, CompareTypeAndValuePair { CharacterCompareType::Or, 0 });
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        compares.empend(CompareTypeAndValuePair { CharacterCompareType::EndAndOr, 0 });
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    match_length_minimum += 1;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    stack.insert_bytecode_compare_values(move(compares));
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return true;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								struct CharClassRangeElement {
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    union {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        CharClass character_class;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        u32 code_point { 0 };
							 
						 
					
						
							
								
									
										
										
										
											2021-07-29 14:18:51 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        Unicode::Property property;
							 
						 
					
						
							
								
									
										
										
										
											2021-07-31 17:46:05 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        Unicode : : GeneralCategory  general_category ; 
							 
						 
					
						
							
								
									
										
										
										
											2021-08-04 06:35:48 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        Unicode : : Script  script ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    } ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    bool  is_negated  {  false  } ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    bool  is_character_class  {  false  } ; 
							 
						 
					
						
							
								
									
										
										
										
											2021-07-31 17:46:05 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    bool  is_property  {  false  } ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    bool  is_general_category  {  false  } ; 
							 
						 
					
						
							
								
									
										
										
										
											2021-08-04 06:35:48 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    bool  is_script  {  false  } ; 
							 
						 
					
						
							
								
									
										
										
										
											2021-08-04 07:26:25 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    bool  is_script_extension  {  false  } ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								} ;  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
bool ECMA262Parser::parse_nonempty_class_ranges(Vector<CompareTypeAndValuePair>& ranges, ParseFlags flags)
{
    auto read_class_atom_no_dash = [&]() -> Optional<CharClassRangeElement> {
        if (match(TokenType::EscapeSequence)) {
            auto token = consume().value();
            return { CharClassRangeElement { .code_point = (u32)token[1], .is_character_class = false } };
        }

        if (try_skip("\\"sv)) {
            if (done()) {
                set_error(Error::InvalidTrailingEscape);
                return {};
            }

            if (try_skip("f"sv))
                return { CharClassRangeElement { .code_point = '\f', .is_character_class = false } };
            if (try_skip("n"sv))
                return { CharClassRangeElement { .code_point = '\n', .is_character_class = false } };
            if (try_skip("r"sv))
                return { CharClassRangeElement { .code_point = '\r', .is_character_class = false } };
            if (try_skip("t"sv))
                return { CharClassRangeElement { .code_point = '\t', .is_character_class = false } };
            if (try_skip("v"sv))
                return { CharClassRangeElement { .code_point = '\v', .is_character_class = false } };
            if (try_skip("b"sv))
                return { CharClassRangeElement { .code_point = '\b', .is_character_class = false } };
            if (try_skip("/"sv))
                return { CharClassRangeElement { .code_point = '/', .is_character_class = false } };

            // CharacterEscape > ControlLetter
            if (try_skip("c"sv)) {
                for (auto c : s_alphabetic_characters) {
                    if (try_skip({ &c, 1 })) {
                        return { CharClassRangeElement { .code_point = (u32)(c % 32), .is_character_class = false } };
                    }
                }

                if (flags.unicode) {
                    set_error(Error::InvalidPattern);
                    return {};
                }

                if (m_should_use_browser_extended_grammar) {
                    for (auto c = '0'; c <= '9'; ++c) {
                        if (try_skip({ &c, 1 }))
                            return { CharClassRangeElement { .code_point = (u32)(c % 32), .is_character_class = false } };
                    }
                    if (try_skip("_"sv))
                        return { CharClassRangeElement { .code_point = (u32)('_' % 32), .is_character_class = false } };

                    back(1 + !done());
                    return { CharClassRangeElement { .code_point = '\\', .is_character_class = false } };
                }
            }

            // '\0'
            if (try_skip("0"sv)) {
                if (!lookahead_any(s_decimal_characters))
                    return { CharClassRangeElement { .code_point = 0, .is_character_class = false } };
                back();
            }

            // LegacyOctalEscapeSequence
            if (m_should_use_browser_extended_grammar && !flags.unicode) {
                if (auto escape = parse_legacy_octal_escape(); escape.has_value())
                    return { CharClassRangeElement { .code_point = escape.value(), .is_character_class = false } };
            }

            // HexEscape
            if (try_skip("x"sv)) {
                if (auto hex_escape = read_digits(ReadDigitsInitialZeroState::Allow, true, 2, 2); hex_escape.has_value()) {
                    return { CharClassRangeElement { .code_point = hex_escape.value(), .is_character_class = false } };
                } else if (!flags.unicode) {
                    // '\x' is allowed in non-unicode mode, just matches 'x'.
                    return { CharClassRangeElement { .code_point = 'x', .is_character_class = false } };
                } else {
                    set_error(Error::InvalidPattern);
                    return {};
                }
            }

            if (try_skip("u"sv)) {
                if (auto code_point = consume_escaped_code_point(flags.unicode); code_point.has_value()) {
                    // FIXME: While code point ranges are supported, code point matches as "Char" are not!
                    return { CharClassRangeElement { .code_point = code_point.value(), .is_character_class = false } };
                }
                return {};
            }

            // IdentityEscape
            for (auto ch : identity_escape_characters(flags.unicode, m_should_use_browser_extended_grammar)) {
                if (try_skip({ &ch, 1 }))
                    return { CharClassRangeElement { .code_point = (u32)ch, .is_character_class = false } };
            }

								            if  ( flags . unicode )  { 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                if  ( try_skip ( " - " sv ) ) 
							 
						 
					
						
							
								
									
										
										
										
											2021-06-06 08:29:17 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                    return  {  CharClassRangeElement  {  . code_point  =  ' - ' ,  . is_character_class  =  false  }  } ; 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-09-19 23:00:45 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                PropertyEscape  property  { } ; 
							 
						 
					
						
							
								
									
										
										
										
											2021-07-29 14:18:51 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                bool  negated  =  false ; 
							 
						 
					
						
							
								
									
										
										
										
											2021-07-31 17:46:05 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                if (parse_unicode_property_escape(property, negated)) {
								                    return property.visit(
								                        [&](Unicode::Property property) {
								                            return CharClassRangeElement { .property = property, .is_negated = negated, .is_character_class = true, .is_property = true };
								                        },
								                        [&](Unicode::GeneralCategory general_category) {
								                            return CharClassRangeElement { .general_category = general_category, .is_negated = negated, .is_character_class = true, .is_general_category = true };
							 
						 
					
						
							
								
									
										
										
										
											2021-08-04 06:35:48 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                        },
							 
						 
					
						
							
								
									
										
										
										
											2021-08-04 07:26:25 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                        [&](Script script) {
								                            if (script.is_extension)
								                                return CharClassRangeElement { .script = script.script, .is_negated = negated, .is_character_class = true, .is_script_extension = true };
							 
						 
					
						
							
								
									
										
										
										
											2021-12-21 17:54:45 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                            return CharClassRangeElement { .script = script.script, .is_negated = negated, .is_character_class = true, .is_script = true };
							 
						 
					
						
							
								
									
										
										
										
											2021-08-13 17:31:39 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                        },
								                        [](Empty&) -> CharClassRangeElement { VERIFY_NOT_REACHED(); });
							 
						 
					
						
							
								
									
										
										
										
											2021-07-31 17:46:05 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                } 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            if (try_skip("d"sv))
							 
						 
					
						
							
								
									
										
										
										
											2021-06-06 08:29:17 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                return { CharClassRangeElement { .character_class = CharClass::Digit, .is_character_class = true } };
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            if (try_skip("s"sv))
							 
						 
					
						
							
								
									
										
										
										
											2021-06-06 08:29:17 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                return { CharClassRangeElement { .character_class = CharClass::Space, .is_character_class = true } };
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            if (try_skip("w"sv))
							 
						 
					
						
							
								
									
										
										
										
											2021-06-06 08:29:17 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                return { CharClassRangeElement { .character_class = CharClass::Word, .is_character_class = true } };
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            if (try_skip("D"sv))
							 
						 
					
						
							
								
									
										
										
										
											2021-06-06 08:29:17 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                return { CharClassRangeElement { .character_class = CharClass::Digit, .is_negated = true, .is_character_class = true } };
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            if (try_skip("S"sv))
							 
						 
					
						
							
								
									
										
										
										
											2021-06-06 08:29:17 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                return { CharClassRangeElement { .character_class = CharClass::Space, .is_negated = true, .is_character_class = true } };
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            if (try_skip("W"sv))
							 
						 
					
						
							
								
									
										
										
										
											2021-06-06 08:29:17 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                return { CharClassRangeElement { .character_class = CharClass::Word, .is_negated = true, .is_character_class = true } };
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 12:27:59 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            if (!flags.unicode) {
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 12:27:59 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                // Any unrecognised escape is allowed in non-unicode mode.
 
							 
						 
					
						
							
								
									
										
										
										
											2021-06-06 08:29:17 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                return { CharClassRangeElement { .code_point = (u32)skip(), .is_character_class = false } };
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 12:27:59 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            } 
							 
						 
					
						
							
								
									
										
										
										
											2021-08-11 09:39:10 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            set_error(Error::InvalidPattern);
								            return {};
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2023-12-31 17:44:44 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if (match(TokenType::Eof)) {
								            set_error(Error::MismatchingBracket);
								            return {};
								        }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if (match(TokenType::RightBracket) || match(TokenType::HyphenMinus))
								            return {};
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 12:27:59 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        // Allow any (other) SourceCharacter.
 
							 
						 
					
						
							
								
									
										
										
										
											2021-06-06 08:29:17 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        return { CharClassRangeElement { .code_point = (u32)skip(), .is_character_class = false } };
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    };
								    auto read_class_atom = [&]() -> Optional<CharClassRangeElement> {
								        if (match(TokenType::HyphenMinus)) {
								            consume();
							 
						 
					
						
							
								
									
										
										
										
											2021-06-06 08:29:17 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            return { CharClassRangeElement { .code_point = '-', .is_character_class = false } };
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return read_class_atom_no_dash();
								    };
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-07-29 14:18:51 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    auto empend_atom = [&](auto& atom) {
								        if (atom.is_character_class) {
								            if (atom.is_negated)
								                ranges.empend(CompareTypeAndValuePair { CharacterCompareType::TemporaryInverse, 0 });
							 
						 
					
						
							
								
									
										
										
										
											2021-07-31 17:46:05 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if (atom.is_property)
							 
						 
					
						
							
								
									
										
										
										
											2021-07-29 14:18:51 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                ranges.empend(CompareTypeAndValuePair { CharacterCompareType::Property, (ByteCodeValueType)(atom.property) });
							 
						 
					
						
							
								
									
										
										
										
											2021-07-31 17:46:05 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            else if (atom.is_general_category)
								                ranges.empend(CompareTypeAndValuePair { CharacterCompareType::GeneralCategory, (ByteCodeValueType)(atom.general_category) });
							 
						 
					
						
							
								
									
										
										
										
											2021-08-04 06:35:48 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            else if (atom.is_script)
								                ranges.empend(CompareTypeAndValuePair { CharacterCompareType::Script, (ByteCodeValueType)(atom.script) });
							 
						 
					
						
							
								
									
										
										
										
											2021-08-04 07:26:25 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            else if (atom.is_script_extension)
								                ranges.empend(CompareTypeAndValuePair { CharacterCompareType::ScriptExtension, (ByteCodeValueType)(atom.script) });
							 
						 
					
						
							
								
									
										
										
										
											2021-07-29 14:18:51 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            else 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                ranges.empend(CompareTypeAndValuePair { CharacterCompareType::CharClass, (ByteCodeValueType)atom.character_class });
								        } else {
								            VERIFY(!atom.is_negated);
								            ranges.empend(CompareTypeAndValuePair { CharacterCompareType::Char, atom.code_point });
								        }
								    };
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    while (!match(TokenType::RightBracket)) {
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 09:27:59 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if (match(TokenType::Eof)) {
								            set_error(Error::MismatchingBracket);
								            return false;
								        }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        auto first_atom = read_class_atom();
								        if (!first_atom.has_value())
								            return false;

								        if (match(TokenType::HyphenMinus)) {
								            consume();
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 12:28:36 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            if (match(TokenType::RightBracket)) {
								                // Allow '-' as the last element in a charclass, even after an atom.
								                m_parser_state.lexer.back(2); // -]
								                m_parser_state.current_token = m_parser_state.lexer.next();
								                goto read_as_single_atom;
								            }
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            auto second_atom = read_class_atom();
								            if (!second_atom.has_value())
								                return false;

								            if (first_atom.value().is_character_class || second_atom.value().is_character_class) {
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                if (m_should_use_browser_extended_grammar) {
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                    if (flags.unicode) {
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                        set_error(Error::InvalidRange);
								                        return false;
								                    }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-07-29 14:18:51 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                    // CharacterRangeOrUnion > !Unicode > CharClass
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                    empend_atom(*first_atom);
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                    ranges.empend(CompareTypeAndValuePair { CharacterCompareType::Char, (ByteCodeValueType)'-' });
							 
						 
					
						
							
								
									
										
										
										
											2021-07-29 14:18:51 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                    empend_atom(*second_atom);
							 
						 
					
						
							
								
									
										
										
										
											2021-02-26 22:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                    continue;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                } 
							 
						 
					
						
							
								
									
										
										
										
											2021-12-21 17:54:45 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                set_error(Error::InvalidRange);
								                return false;
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            if (first_atom.value().code_point > second_atom.value().code_point) {
								                set_error(Error::InvalidRange);
								                return false;
								            }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-02-23 20:42:32 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            VERIFY(!first_atom.value().is_negated);
								            VERIFY(!second_atom.value().is_negated);
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 17:47:12 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            ranges.empend(CompareTypeAndValuePair { CharacterCompareType::CharRange, CharRange { first_atom.value().code_point, second_atom.value().code_point } });
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            continue;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-28 12:28:36 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    read_as_single_atom:;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        auto atom = first_atom.value();
							 
						 
					
						
							
								
									
										
										
										
											2021-07-29 14:18:51 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        empend_atom(atom);
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    }

    consume(TokenType::RightBracket, Error::MismatchingBracket);

    return true;
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:22:07 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
bool ECMA262Parser::parse_class_set_expression(Vector<CompareTypeAndValuePair>& compares)
{
    auto start_position = tell();

    // ClassSetExpression :: ClassUnion | ClassIntersection | ClassSubtraction
    if (parse_class_subtraction(compares)) {
        consume(TokenType::RightBracket, Error::MismatchingBracket);
        return true;
    }
    if (has_error())
        return false;

    back(tell() - start_position + 1);
    if (parse_class_intersection(compares)) {
        consume(TokenType::RightBracket, Error::MismatchingBracket);
        return true;
    }
    if (has_error())
        return false;

    back(tell() - start_position + 1);
    if (parse_class_union(compares)) {
        consume(TokenType::RightBracket, Error::MismatchingBracket);
        return true;
    }

    return false;
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
bool ECMA262Parser::parse_class_union(Vector<regex::CompareTypeAndValuePair>& compares)
{
    auto start_position = tell();
    ArmedScopeGuard restore_position { [&] { back(tell() - start_position + 1); } };

    auto first = true;

    // ClassUnion :: ClassSetRange ClassUnion[opt] | ClassSetOperand ClassUnion[opt]
    for (;;) {
        if (!parse_class_set_range(compares)) {
            if (has_error() || match(TokenType::RightBracket))
                break;

            if (!parse_class_set_operand(compares)) {
                if (first || has_error())
                    return false;
                break;
            }
        }
        first = false;
    }

    restore_position.disarm();
    return !has_error();
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
bool ECMA262Parser::parse_class_intersection(Vector<CompareTypeAndValuePair>& compares)
{
    // ClassIntersection :: ClassSetOperand "&&" [lookahead != "&"] ClassSetOperand
    //                    | ClassIntersection "&&" [lookahead != "&"] ClassSetOperand
    Vector<CompareTypeAndValuePair> lhs;
    Vector<CompareTypeAndValuePair> rhs;

    auto start_position = tell();
    ArmedScopeGuard restore_position { [&] { back(tell() - start_position + 1); } };

    if (!parse_class_set_operand(lhs))
        return false;

    if (!try_skip("&&"sv))
        return false;

    compares.append({ CharacterCompareType::And, 0 });
    compares.extend(move(lhs));

    do {
        rhs.clear_with_capacity();
        if (!parse_class_set_operand(rhs))
            return false;

        compares.extend(rhs);

        if (try_skip("&&&"sv))
            return false;
    } while (!has_error() && try_skip("&&"sv));

    compares.append({ CharacterCompareType::EndAndOr, 0 });

    restore_position.disarm();
    return true;
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
bool ECMA262Parser::parse_class_subtraction(Vector<CompareTypeAndValuePair>& compares)
{
    // ClassSubtraction :: ClassSetOperand "--" ClassSetOperand | ClassSubtraction "--" ClassSetOperand
    Vector<CompareTypeAndValuePair> lhs;
    Vector<CompareTypeAndValuePair> rhs;

    auto start_position = tell();
    ArmedScopeGuard restore_position { [&] { back(tell() - start_position + 1); } };

    if (!parse_class_set_operand(lhs))
        return false;

    if (!try_skip("--"sv))
        return false;

    compares.append({ CharacterCompareType::And, 0 });
    compares.extend(move(lhs));

    do {
        rhs.clear_with_capacity();
        if (!parse_class_set_operand(rhs))
            return false;

        compares.append({ CharacterCompareType::TemporaryInverse, 0 });
        compares.extend(rhs);
    } while (!has_error() && try_skip("--"sv));

    compares.append({ CharacterCompareType::EndAndOr, 0 });

    restore_position.disarm();
    return true;
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
bool ECMA262Parser::parse_class_set_range(Vector<CompareTypeAndValuePair>& compares)
{
    // ClassSetRange :: ClassSetCharacter "-" ClassSetCharacter
    auto start_position = tell();
    ArmedScopeGuard restore_position { [&] { back(tell() - start_position + 1); } };

    auto lhs = parse_class_set_character();
    if (!lhs.has_value())
        return false;

    if (!match(TokenType::HyphenMinus))
        return false;
    consume();

    auto rhs = parse_class_set_character();
    if (!rhs.has_value())
        return false;

    compares.append({
        CharacterCompareType::CharRange,
        CharRange { lhs.value(), rhs.value() },
    });
    restore_position.disarm();
    return true;
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
Optional<u32> ECMA262Parser::parse_class_set_character()
{
    // ClassSetCharacter :: [lookahead ∉ ClassSetReservedDoublePunctuator] SourceCharacter but not ClassSetSyntaxCharacter
    //                    | "\" CharacterEscape[+UnicodeMode]
    //                    | "\" ClassSetReservedPunctuator
    //                    | "\" b
    // ClassSetReservedDoublePunctuator :: one of "&&" "!!" "##" "$$" "%%" "**" "++" ",," ".." "::" ";;" "<<" "==" ">>" "??" "@@" "^^" "``" "~~"
    // ClassSetSyntaxCharacter :: one of "(" ")" "{" "}" "[" "]" "/" "-" "\" "|"
    // ClassSetReservedPunctuator :: one of "&" "-" "!" "#" "%" "," ":" ";" "<" "=" ">" "@" "`" "~"

    constexpr auto class_set_reserved_double_punctuator = Array {
        "&&"sv, "!!"sv, "##"sv, "$$"sv, "%%"sv, "**"sv, "++"sv, ",,"sv, ".."sv, "::"sv, ";;"sv, "<<"sv, "=="sv, ">>"sv, "??"sv, "@@"sv, "^^"sv, "``"sv, "~~"sv
    };
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2023-06-23 11:02:11 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    if (done()) {
        set_error(Error::InvalidPattern);
        return {};
    }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:22:07 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    auto start_position = tell();
    ArmedScopeGuard restore { [&] { back(tell() - start_position + 1); } };

    if (try_skip("\\"sv)) {
        if (done()) {
            set_error(Error::InvalidTrailingEscape);
            return {};
        }

        // "\" ClassSetReservedPunctuator
        for (auto const& reserved : class_set_reserved_double_punctuator) {
            if (try_skip(reserved)) {
                // "\" ClassSetReservedPunctuator (ClassSetReservedPunctuator)
                back();

                restore.disarm();
                return reserved[0];
            }
        }
        // "\" b
        if (try_skip("b"sv)) {
            restore.disarm();
            return '\b';
        }

        // "\" CharacterEscape[+UnicodeMode]
        Vector<CompareTypeAndValuePair> compares;
        size_t minimum_length = 0;
        if (parse_character_escape(compares, minimum_length, { .unicode = true })) {
            VERIFY(compares.size() == 1);
            auto& compare = compares.first();
            VERIFY(compare.type == CharacterCompareType::Char);
            restore.disarm();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            return  compare . value ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        return  { } ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    // [lookahead ∉ ClassSetReservedDoublePunctuator] SourceCharacter but not ClassSetSyntaxCharacter
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
    auto lookahead_matches = any_of(class_set_reserved_double_punctuator, [this](auto& reserved) {
        return try_skip(reserved);
    });

    if (lookahead_matches)
        return {};

    for (auto character : { "("sv, ")"sv, "{"sv, "}"sv, "["sv, "]"sv, "/"sv, "-"sv, "\\"sv, "|"sv }) {
        if (try_skip(character))
            return {};
    }

    restore.disarm();
    return skip();
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
bool ECMA262Parser::parse_class_set_operand(Vector<regex::CompareTypeAndValuePair>& compares)
{
    auto start_position = tell();

    // ClassSetOperand :: ClassSetCharacter | ClassStringDisjunction | NestedClass
    if (auto character = parse_class_set_character(); character.has_value()) {
        compares.append({ CharacterCompareType::Char, character.value() });
        return true;
    }

    // NestedClass :: "[" [lookahead != "^"] ClassContents[+UnicodeMode +UnicodeSetsMode] "]"
    //              | "[" "^" ClassContents[+UnicodeMode +UnicodeSetsMode] "]"
    //              | "\" CharacterClassEscape[+UnicodeMode]
    if (parse_nested_class(compares))
        return true;

    if (has_error())
        return false;

    auto negated = false;
    if (auto ch = parse_character_class_escape(negated, true); ch.has_value()) {
        if (negated)
            compares.append({ CharacterCompareType::TemporaryInverse, 1 });
        compares.append({ CharacterCompareType::CharClass, (ByteCodeValueType)ch.value() });
        return true;
    }

    PropertyEscape property {};
    if (parse_unicode_property_escape(property, negated)) {
        if (negated)
            compares.empend(CompareTypeAndValuePair { CharacterCompareType::Inverse, 0 });
        property.visit(
            [&](Unicode::Property property) {
                compares.empend(CompareTypeAndValuePair { CharacterCompareType::Property, (ByteCodeValueType)property });
            },
            [&](Unicode::GeneralCategory general_category) {
                compares.empend(CompareTypeAndValuePair { CharacterCompareType::GeneralCategory, (ByteCodeValueType)general_category });
            },
            [&](Script script) {
                if (script.is_extension)
                    compares.empend(CompareTypeAndValuePair { CharacterCompareType::ScriptExtension, (ByteCodeValueType)script.script });
                else
                    compares.empend(CompareTypeAndValuePair { CharacterCompareType::Script, (ByteCodeValueType)script.script });
            },
            [](Empty&) { VERIFY_NOT_REACHED(); });
        return true;
    }

    if (has_error())
        return false;

    // ClassStringDisjunction :: "\q{" ClassStringDisjunctionContents "}"
    // ClassStringDisjunctionContents :: ClassString | ClassString "|" ClassStringDisjunctionContents
    // ClassString :: [empty] | NonEmptyClassString
    // NonEmptyClassString :: ClassCharacter NonEmptyClassString[opt]
    if (try_skip("\\q{"sv)) {
        // FIXME: Implement this :P
        return set_error(Error::InvalidCharacterClass);
    }

    back(tell() - start_position + 1);
    return false;
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
bool ECMA262Parser::parse_nested_class(Vector<regex::CompareTypeAndValuePair>& compares)
{
    auto start_position = tell();

    // NestedClass :: "[" [lookahead ≠ ^ ] ClassContents [+UnicodeMode, +UnicodeSetsMode] "]"
    //              | "[" "^" ClassContents[+UnicodeMode, +UnicodeSetsMode] "]"
    //              | "\" CharacterClassEscape[+UnicodeMode]

    if (match(TokenType::LeftBracket)) {
        consume();

        compares.append(CompareTypeAndValuePair { CharacterCompareType::Or, 0 });

        if (match(TokenType::Circumflex)) {
            // Negated charclass
            consume();
            compares.empend(CompareTypeAndValuePair { CharacterCompareType::Inverse, 0 });
        }

        // ClassContents :: [empty]
        if (match(TokenType::RightBracket)) {
            consume();
            // Should only have at most an 'Inverse' (after an 'Or')
            VERIFY(compares.size() <= 2);
            compares.append(CompareTypeAndValuePair { CharacterCompareType::EndAndOr, 0 });
            return true;
        }

        // ClassContents :: [+UnicodeSetsMode] ClassSetExpression
        if (!parse_class_set_expression(compares))
            return false;

        compares.append(CompareTypeAndValuePair { CharacterCompareType::EndAndOr, 0 });
        return true;
    }

    if (try_skip("\\"sv)) {
        auto negated = false;
        if (auto char_class = parse_character_class_escape(negated); char_class.has_value()) {
            if (negated)
                compares.append({ CharacterCompareType::TemporaryInverse, 1 });
            compares.append({ CharacterCompareType::CharClass, (ByteCodeValueType)char_class.value() });
            return true;
        }

        PropertyEscape property {};
        if (parse_unicode_property_escape(property, negated)) {
            if (negated)
                compares.empend(CompareTypeAndValuePair { CharacterCompareType::Inverse, 0 });
            property.visit(
                [&](Unicode::Property property) {
                    compares.empend(CompareTypeAndValuePair { CharacterCompareType::Property, (ByteCodeValueType)property });
                },
                [&](Unicode::GeneralCategory general_category) {
                    compares.empend(CompareTypeAndValuePair { CharacterCompareType::GeneralCategory, (ByteCodeValueType)general_category });
                },
                [&](Script script) {
                    if (script.is_extension)
                        compares.empend(CompareTypeAndValuePair { CharacterCompareType::ScriptExtension, (ByteCodeValueType)script.script });
                    else
                        compares.empend(CompareTypeAndValuePair { CharacterCompareType::Script, (ByteCodeValueType)script.script });
                },
                [](Empty&) { VERIFY_NOT_REACHED(); });
            return true;
        }

        if (has_error())
            return false;
    }

    back(tell() - start_position + 1);
    return false;
}
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								bool  ECMA262Parser : : parse_unicode_property_escape ( PropertyEscape &  property ,  bool &  negated )  
						 
					
						
							
								
									
										
										
										
											2021-07-29 14:18:51 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    negated  =  false ; 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if  ( try_skip ( " p " sv ) ) 
							 
						 
					
						
							
								
									
										
										
										
											2021-07-29 14:18:51 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        negated  =  false ; 
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    else  if  ( try_skip ( " P " sv ) ) 
							 
						 
					
						
							
								
									
										
										
										
											2021-07-29 14:18:51 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        negated = true;
								    else
								        return false;

								    auto parsed_property = read_unicode_property_escape();
								    if (!parsed_property.has_value()) {
								        set_error(Error::InvalidNameForProperty);
								        return false;
								    }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-07-31 17:46:05 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    property = move(*parsed_property);
							 
						 
					
						
							
								
									
										
										
										
											2021-07-29 14:18:51 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-07-31 17:46:05 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    return property.visit(
								        [this](Unicode::Property property) {
								            if (!Unicode::is_ecma262_property(property)) {
								                set_error(Error::InvalidNameForProperty);
								                return false;
								            }
								            return true;
								        },
							 
						 
					
						
							
								
									
										
										
										
											2021-08-04 06:35:48 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        [](Unicode::GeneralCategory) { return true; },
							 
						 
					
						
							
								
									
										
										
										
											2021-08-13 17:31:39 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        [](Script) { return true; },
								        [](Empty&) -> bool { VERIFY_NOT_REACHED(); });
							 
						 
					
						
							
								
									
										
										
										
											2021-07-29 14:18:51 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2023-01-08 19:23:00 -05:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								DeprecatedFlyString ECMA262Parser::read_capture_group_specifier(bool take_starting_angle_bracket)
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
									
										
										
										
											2021-12-19 17:03:08 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    static auto id_start_category = Unicode::property_from_string("ID_Start"sv);
								    static auto id_continue_category = Unicode::property_from_string("ID_Continue"sv);
							 
						 
					
						
							
								
									
										
										
										
											2024-01-11 12:51:13 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    static constexpr u32 const REPLACEMENT_CHARACTER = 0xFFFD;
								    constexpr u32 const ZERO_WIDTH_NON_JOINER { 0x200C };
								    constexpr u32 const ZERO_WIDTH_JOINER { 0x200D };
							 
						 
					
						
							
								
									
										
										
										
											2021-12-19 17:03:08 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if (take_starting_angle_bracket && !consume("<"))
								        return {};
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-08-18 17:17:18 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    StringBuilder builder;
							 
						 
					
						
							
								
									
										
										
										
											2021-12-19 17:03:08 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    auto consume_code_point = [&] {
								        Utf8View utf_8_view { m_parser_state.lexer.source().substring_view(m_parser_state.lexer.tell() - 1) };
								        if (utf_8_view.is_empty())
								            return REPLACEMENT_CHARACTER;
								        u32 code_point = *utf_8_view.begin();
								        auto characters = utf_8_view.byte_offset_of(1);

								        while (characters-- > 0)
								            consume();

								        return code_point;
								    };
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    {
								        // The first character is limited to: https://tc39.es/ecma262/#prod-RegExpIdentifierStart
								        //  RegExpIdentifierStart[UnicodeMode] ::
								        //      IdentifierStartChar
								        //      \ RegExpUnicodeEscapeSequence[+UnicodeMode]
								        //      [~UnicodeMode] UnicodeLeadSurrogate UnicodeTrailSurrogate

								        auto code_point = consume_code_point();

								        if (code_point == '\\' && match('u')) {
								            consume();

								            if (auto maybe_code_point = consume_escaped_code_point(true); maybe_code_point.has_value()) {
								                code_point = *maybe_code_point;
								            } else {
								                set_error(Error::InvalidNameForCaptureGroup);
								                return {};
								            }
								        }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if (is_ascii(code_point)) {
								            // In ASCII, IdentifierStartChar allows only the letters (the ASCII part of ID_Start), '$', and '_'.
								            if (!is_ascii_alpha(code_point) && code_point != '$' && code_point != '_') {
								                set_error(Error::InvalidNameForCaptureGroup);
								                return {};
								            }
								        } else if (id_start_category.has_value() && !Unicode::code_point_has_property(code_point, *id_start_category)) {
								            set_error(Error::InvalidNameForCaptureGroup);
								            return {};
								        }
								        builder.append_code_point(code_point);
								    }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    bool hit_end = false;

								    // Any following characters are limited to:
								    //  RegExpIdentifierPart[UnicodeMode] ::
								    //      IdentifierPartChar
								    //      \ RegExpUnicodeEscapeSequence[+UnicodeMode]
								    //      [~UnicodeMode] UnicodeLeadSurrogate UnicodeTrailSurrogate
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-08-18 17:17:18 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    while (match(TokenType::Char) || match(TokenType::Dollar) || match(TokenType::LeftCurly) || match(TokenType::RightCurly)) {
							 
						 
					
						
							
								
									
										
										
										
											2021-12-19 17:03:08 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        auto code_point = consume_code_point();

								        if (code_point == '>') {
								            hit_end = true;
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            break;
							 
						 
					
						
							
								
									
										
										
										
											2021-12-19 17:03:08 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
									
										
										
										
											2021-08-18 17:17:18 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-12-19 17:03:08 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        if (code_point == '\\') {
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            if (!try_skip("u"sv)) {
							 
						 
					
						
							
								
									
										
										
										
											2021-12-19 17:03:08 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                set_error(Error::InvalidNameForCaptureGroup);
								                return {};
								            }
								            if (auto maybe_code_point = consume_escaped_code_point(true); maybe_code_point.has_value()) {
								                code_point = *maybe_code_point;
							 
						 
					
						
							
								
									
										
										
										
											2021-08-18 17:17:18 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            } else {
								                set_error(Error::InvalidNameForCaptureGroup);
								                return {};
								            }
								        }
							 
						 
					
						
							
								
									
										
										
										
											2021-12-19 17:03:08 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								        if (is_ascii(code_point)) {
								            // In ASCII, IdentifierPartChar allows only letters, digits, '$', and '_'.
								            if (!is_ascii_alphanumeric(code_point) && code_point != '$' && code_point != '_') {
								                set_error(Error::InvalidNameForCaptureGroup);
								                return {};
								            }
								        } else if (code_point != ZERO_WIDTH_JOINER && code_point != ZERO_WIDTH_NON_JOINER) {
								            if (id_continue_category.has_value() && !Unicode::code_point_has_property(code_point, *id_continue_category)) {
								                set_error(Error::InvalidNameForCaptureGroup);
								                return {};
								            }
								        }
								        builder.append_code_point(code_point);
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2023-12-16 17:49:34 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    DeprecatedFlyString name = builder.to_byte_string();
							 
						 
					
						
							
								
									
										
										
										
											2021-12-19 17:03:08 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if (!hit_end || name.is_empty())
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        set_error(Error::InvalidNameForCaptureGroup);

								    return name;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-07-31 17:46:05 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								Optional<ECMA262Parser::PropertyEscape> ECMA262Parser::read_unicode_property_escape()
						 
					
						
							
								
									
										
										
										
											2021-07-29 14:18:51 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								{  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    consume(TokenType::LeftCurly, Error::InvalidPattern);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-07-31 18:06:53 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    auto read_until = [&]<typename... Ts>(Ts&&... terminators) {
								        auto start_token = m_parser_state.current_token;
								        size_t offset = 0;

								        while (match(TokenType::Char)) {
								            if (m_parser_state.current_token.value().is_one_of(forward<Ts>(terminators)...))
								                break;
								            offset += consume().value().length();
								        }

								        return StringView { start_token.value().characters_without_null_termination(), offset };
								    };
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    StringView property_type;
								    StringView property_name = read_until("="sv, "}"sv);

								    if (try_skip("="sv)) {
								        if (property_name.is_empty())
								            return {};
								        property_type = property_name;
								        property_name = read_until("}"sv);
							 
						 
					
						
							
								
									
										
										
										
											2021-07-29 14:18:51 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    consume(TokenType::RightCurly, Error::InvalidPattern);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-07-31 18:06:53 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    if (property_type.is_empty()) {
								        if (auto property = Unicode::property_from_string(property_name); property.has_value())
								            return { *property };
								        if (auto general_category = Unicode::general_category_from_string(property_name); general_category.has_value())
								            return { *general_category };
								    } else if ((property_type == "General_Category"sv) || (property_type == "gc"sv)) {
								        if (auto general_category = Unicode::general_category_from_string(property_name); general_category.has_value())
								            return { *general_category };
							 
						 
					
						
							
								
									
										
										
										
											2021-08-04 06:35:48 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    } else if ((property_type == "Script"sv) || (property_type == "sc"sv)) {
        if (auto script = Unicode::script_from_string(property_name); script.has_value())
							 
						 
					
						
							
								
									
										
										
										
											2021-08-04 07:26:25 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            return Script { *script, false };
    } else if ((property_type == "Script_Extensions"sv) || (property_type == "scx"sv)) {
        if (auto script = Unicode::script_from_string(property_name); script.has_value())
            return Script { *script, true };
							 
						 
					
						
							
								
									
										
										
										
											2021-07-31 18:06:53 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-07-31 17:46:05 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    return {};
							 
						 
					
						
							
								
									
										
										
										
											2021-07-29 14:18:51 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
bool ECMA262Parser::parse_capture_group(ByteCode& stack, size_t& match_length_minimum, ParseFlags flags)
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
{
    consume(TokenType::LeftParen, Error::InvalidPattern);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-07-23 19:37:18 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    auto register_capture_group_in_current_scope = [&](auto identifier) {
        m_capture_groups_in_scope.last().empend(identifier);
    };
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    if (match(TokenType::Questionmark)) {
        // Non-capturing group or group with specifier.
        consume();

        if (match(TokenType::Colon)) {
            consume();
            ByteCode noncapture_group_bytecode;
            size_t length = 0;
							 
						 
					
						
							
								
									
										
										
										
											2021-07-23 19:37:18 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
            enter_capture_group_scope();
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            if (!parse_disjunction(noncapture_group_bytecode, length, flags))
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
                return set_error(Error::InvalidPattern);
							 
						 
					
						
							
								
									
										
										
										
											2022-01-21 14:30:47 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            clear_all_capture_groups_in_scope(stack);
							 
						 
					
						
							
								
									
										
										
										
											2021-07-23 19:37:18 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            exit_capture_group_scope();
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
            consume(TokenType::RightParen, Error::MismatchingParen);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-06-12 13:24:45 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            stack.extend(move(noncapture_group_bytecode));
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            match_length_minimum += length;
            return true;
        }

        if (consume("<")) {
            ++m_parser_state.named_capture_groups_count;
							 
						 
					
						
							
								
									
										
										
										
											2021-04-05 06:25:57 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            auto group_index = ++m_parser_state.capture_groups_count; // Named capture groups count as normal capture groups too.
 
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            auto name = read_capture_group_specifier();

            if (name.is_empty()) {
                set_error(Error::InvalidNameForCaptureGroup);
                return false;
            }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-12-19 02:31:21 +01:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            if (m_parser_state.named_capture_groups.contains(name)) {
                set_error(Error::DuplicateNamedCapture);
                return false;
            }
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            ByteCode capture_group_bytecode;
            size_t length = 0;
							 
						 
					
						
							
								
									
										
										
										
											2021-07-23 19:37:18 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            enter_capture_group_scope();
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            if (!parse_disjunction(capture_group_bytecode, length, flags))
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
                return set_error(Error::InvalidPattern);
							 
						 
					
						
							
								
									
										
										
										
											2022-01-21 14:30:47 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            clear_all_capture_groups_in_scope(stack);
							 
						 
					
						
							
								
									
										
										
										
											2021-07-23 19:37:18 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            exit_capture_group_scope();

            register_capture_group_in_current_scope(group_index);
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
            consume(TokenType::RightParen, Error::MismatchingParen);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-04-05 06:25:57 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            stack.insert_bytecode_group_capture_left(group_index);
							 
						 
					
						
							
								
									
										
										
										
											2021-06-12 13:24:45 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            stack.extend(move(capture_group_bytecode));
							 
						 
					
						
							
								
									
										
										
										
											2021-08-18 17:17:18 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            stack.insert_bytecode_group_capture_right(group_index, name.view());
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
            match_length_minimum += length;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2021-04-05 06:25:57 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            m_parser_state.capture_group_minimum_lengths.set(group_index, length);
							 
						 
					
						
							
								
									
										
										
										
											2021-08-14 16:28:54 -04:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            m_parser_state.named_capture_groups.set(name, { group_index, length });
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            return true;
        }

        set_error(Error::InvalidCaptureGroup);
        return false;
    }

    auto group_index = ++m_parser_state.capture_groups_count;
							 
						 
					
						
							
								
									
										
										
										
											2021-07-23 19:37:18 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    enter_capture_group_scope();
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
    ByteCode capture_group_bytecode;
    size_t length = 0;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-20 23:16:53 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    if (!parse_disjunction(capture_group_bytecode, length, flags))
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
        return set_error(Error::InvalidPattern);
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-01-21 14:30:47 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    clear_all_capture_groups_in_scope(stack);
							 
						 
					
						
							
								
									
										
										
										
											2021-07-23 19:37:18 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    exit_capture_group_scope();

    register_capture_group_in_current_scope(group_index);

    stack.insert_bytecode_group_capture_left(group_index);
							 
						 
					
						
							
								
									
										
										
										
											2021-06-12 13:24:45 +02:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
    stack.extend(move(capture_group_bytecode));
							 
						 
					
						
							
								
									
										
										
										
											2020-11-27 19:33:53 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
    m_parser_state.capture_group_minimum_lengths.set(group_index, length);

    consume(TokenType::RightParen, Error::MismatchingParen);

    stack.insert_bytecode_group_capture_right(group_index);

    match_length_minimum += length;

    return true;
}
						 
					
						
							
								
									
										
										
										
											2021-04-01 18:30:47 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
size_t ECMA262Parser::ensure_total_number_of_capturing_parenthesis()
{
    if (m_total_number_of_capturing_parenthesis.has_value())
        return m_total_number_of_capturing_parenthesis.value();

    GenericLexer lexer { m_parser_state.lexer.source() };
    size_t count = 0;
    while (!lexer.is_eof()) {
        switch (lexer.peek()) {
        case '\\':
							 
						 
					
						
							
								
									
										
										
										
											2023-04-13 21:12:59 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            lexer.consume(min(lexer.tell_remaining(), 2));
							 
						 
					
						
							
								
									
										
										
										
											2021-04-01 18:30:47 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            continue;
        case '[':
            while (!lexer.is_eof()) {
							 
						 
					
						
							
								
									
										
										
										
											2022-09-06 23:56:12 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
                if (lexer.consume_specific('\\')) {
							 
						 
					
						
							
								
									
										
										
										
											2023-04-13 21:12:59 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
                    if (lexer.is_eof())
                        break;
							 
						 
					
						
							
								
									
										
										
										
											2021-04-01 18:30:47 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
                    lexer.consume();
							 
						 
					
						
							
								
									
										
										
										
											2022-09-06 23:56:12 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
                    continue;
                }
                if (lexer.consume_specific(']')) {
							 
						 
					
						
							
								
									
										
										
										
											2021-04-01 18:30:47 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
                    break;
							 
						 
					
						
							
								
									
										
										
										
											2022-09-06 23:56:12 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                } 
							 
						 
					
						
							
								
									
										
										
										
											2023-04-13 21:12:59 +03:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
                if (lexer.is_eof())
                    break;
							 
						 
					
						
							
								
									
										
										
										
											2021-04-01 18:30:47 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
                lexer.consume();
            }
            break;
        case '(':
							 
						 
					
						
							
								
									
										
										
										
											2021-09-14 19:44:38 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
            lexer.consume();
							 
						 
					
						
							
								
									
										
										
										
											2021-04-01 18:30:47 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								            if  ( lexer . consume_specific ( ' ? ' ) )  { 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                // non-capturing group '(?:', lookaround '(?<='/'(?<!', or named capture '(?<'
 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                if (!lexer.consume_specific('<'))
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                    break;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
									
										
										
										
											2022-07-11 17:32:29 +00:00 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                if (lexer.next_is(is_any_of("=!"sv)))
							 
						 
					
						
							
								
									
										
										
										
											2021-04-01 18:30:47 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								                    break;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                ++count;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            } else {
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								                ++count;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            break;
							 
						 
					
						
							
								
									
										
										
										
											2021-09-14 19:44:38 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        default:
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            lexer.consume();
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								            break;
							 
						 
					
						
							
								
									
										
										
										
											2021-04-01 18:30:47 +04:30 
										
									 
								 
							 
							
								
									
										 
								
							 
							
								 
							
							
								        } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    } 
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    m_total_number_of_capturing_parenthesis = count;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								    return count;
							 
						 
					
						
							
								
							 
							
								
							 
							
								 
							
							
								}  
						 
					
						
							
								
									
										
											 
										
											
											 
										 
										
											2020-04-26 14:45:10 +02:00 
										
									 
								 
							 
							
								
							 
							
								 
							
							
								}