PHP PREG_JIT_STACKLIMIT_ERROR - inefficient regex

coder 2024-04-12 Original post

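I'm getting a PREG_JIT_STACKLIMIT_ERROR in preg_replace_callback() when working with a somewhat longer string. Above 2000 characters it stops working (that is, more than 2000 characters matched by the regex, not a subject string of 2000 characters).
I've read that this is caused by an inefficient regex, but I can't make my regex any simpler. This is my regex: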

/\{@([a-z0-9_]+)-((%?[a-z0-9_]+(:[a-z0-9_]+)*)+)\|(((?R)|.)*)@\}/Us

It should match strings like these:

1) {@if-statement|echo this|echo otherwise@}

2) {@if-statement:sub|echo this|echo otherwise@}

3) {@if-statement%statament2:sub|echo this@}

and also nested ones like this:

4) {@if-statement|echo this| {@if-statement2|echo this|echo otherwise@} @}

I tried to simplify it to:

/\{@([a-z0-9_]+)-([a-z0-9_]+)\|(((?R)|.)*)@\}/Us

But it looks like the error is caused by the (((?R)|.)*) part. Any suggestions?

Test code:

$string = '{@if-is_not_logged_homepage|
<header id="header_home">
    <div class="in">
        <div class="top">
            <h1 class="logo"><a href="/"><img src="/img/logo-home.png" alt=""></a></h1>
            <div class="login_outer_wrapper">
                <button id="login"><div class="a"><i class="stripe"><i></i></i>Log in</div></button>
                <div id="login_wrapper">
                    <form method="post" action="{^login^}" id="form_login_global">
                        <div class="form_field no_description">
                            <label>{!auth:login_email!}</label>
                            <div class="input"><input type="text" name="form[login]"></div>
                        </div>
                        <div class="form_field no_description password">
                            <label>{!auth:password!}</label>
                            <div class="input"><input type="password" name="form[password]"></div>
                        </div>
                        <div class="remember">
                            <input type="checkbox" name="remember" id="remember_me_check" checked>
                            <label for="remember_me_check"><i class="fa fa-check" aria-hidden="true"></i>Remember</label>
                        </div>
                        <div class="submit_box">
                            <button class="btn btn_check">Log in</button>
                        </div>
                    </form>
                </div>
            </div>
        </div>
        <div class="content clr">
            <div class="main_menu">
                <a href="">
                    <i class="ico a"><i class="fa fa-lightbulb-o" aria-hidden="true"></i></i>
                    <span>Idea</span>
                    <div>&nbsp;</div>
                </a>
                <a href="">
                    <i class="ico b"><i class="fa fa-user" aria-hidden="true"></i></i>
                    <span>FFa</span>
                </a>
                <a href="">
                    <i class="ico c"><i class="fa fa-briefcase" aria-hidden="true"></i></i>
                    <span>Buss</span>
                </a>
            </div>
            <div class="text_wrapper">

                <div>
                    <div class="register_wrapper">
                        <a id="main_register" class="btn register">Załóż konto</a>
                        <form method="post" action="{^login^}" id="form_register_home">
                            <div class="form_field no_description">
                                <label>{!auth:email!}</label>
                                <div class="input"><input type="text" name="form2[email]"></div>
                            </div>
                            <div class="form_field no_description password">
                                <label>{!auth:password!}</label>
                                <div class="input tooltip"><input type="password" name="form2[password]"><i class="fa fa-info-circle tooltip_open" aria-hidden="true" title="{!auth:password_format!}"></i></div>

                            </div>
                            <div class="form_field terms no_description">
                                <div class="input">
                                    <input type="checkbox" name="form2[terms]" id="terms_check">
                                    <label for="terms_check"><i class="fa fa-check" aria-hidden="true"></i>Agree</label>
                                </div>
                            </div>
                            <div class="form_field no_description">
                                <div class="input captcha_wrapper">
                                    <div class="g-recaptcha" data-sitekey="{%captcha_public_key%}"></div>
                                </div>
                            </div>
                            <div class="submit_box">
                                <button class="btn btn_check">{!auth:register_btn!}</button>
                            </div>
                        </form>
                    </div>
                </div>
            </div>
        </div>
    </div>
</header>
@}';

$if_counter = 0;

$parsed_view = preg_replace_callback( '/\{@([a-z0-9_]+)-((%?[a-z0-9_]+(:[a-z0-9_]+)*)+)\|(((?R)|.)*)@\}/Us',
        function( $match ) use( &$if_counter ){
            return '<-{'. ( $if_counter ++ ) .'}->';
        }, $string );


var_dump($parsed_view); // NULL

Best answer

What is PCRE JIT?

Just-in-time compiling is a heavyweight optimization that can greatly speed up pattern matching. However, it comes at the cost of extra processing before the match is performed. Therefore, it is of most benefit when the same pattern is going to be matched many times.

How does it basically work?

PCRE (and JIT) is a recursive, depth-first engine, so it needs a stack where the local data of the current node is pushed before checking its child nodes... When the compiled JIT code runs, it needs a block of memory to use as a stack. By default, it uses 32K on the machine stack. However, some large or complicated patterns need more than this. The error PCRE_ERROR_JIT_STACKLIMIT is given when there is not enough stack.

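From the first quote you can see that JIT is an optional feature, and PHP 7.* enables it by default in PCRE. You can therefore turn it off easily with pcre.jit = 0 (although this is not recommended).

However, when a preg_* function reports error code 6 (PREG_JIT_STACKLIMIT_ERROR), it means the JIT probably hit its stack size limit.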
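A minimal sketch of how the two points above could look in code: JIT is switched off for the current request before any pattern is compiled, and a failed preg_* call is reported by constant name instead of a bare number. The helper describe_preg_error() and the sample pattern /\d+/ are hypothetical and only serve as illustration; disabling JIT trades matching speed for avoiding the JIT stack limit.

```php
<?php
// Hypothetical helper (not part of the original answer): map a preg error
// code, as returned by preg_last_error(), to the name of its constant.
function describe_preg_error(int $code): string
{
    $names = [
        PREG_NO_ERROR              => 'PREG_NO_ERROR',
        PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
        PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
        PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
        PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
        PREG_BAD_UTF8_OFFSET_ERROR => 'PREG_BAD_UTF8_OFFSET_ERROR',
        PREG_JIT_STACKLIMIT_ERROR  => 'PREG_JIT_STACKLIMIT_ERROR', // code 6
    ];

    return $names[$code] ?? "unknown error $code";
}

// Runtime equivalent of "pcre.jit = 0" in php.ini, set before the first preg_* call.
ini_set('pcre.jit', '0');

// Placeholder pattern and subject; substitute your own.
$result = preg_replace_callback(
    '/\d+/',
    function (array $m): string {
        return '#';
    },
    'a1b22'
);

if ($result === null) {
    // preg_replace_callback() returns NULL on error, as in the question.
    echo 'preg error: ', describe_preg_error(preg_last_error()), "\n";
} else {
    echo $result, "\n";
}
```

Checking for NULL and then calling preg_last_error() is what distinguishes "no match" from "the engine gave up", which var_dump() alone does not show.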

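Capturing groups consume more memory than non-capturing groups (and, depending on the type of quantifier applied to the group, even more memory is needed):

Capturing group OP_CBRA (pcre_jit_compile.c:#1138) (the real memory use is more than this):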

case OP_CBRA:
case OP_SCBRA:
bracketlen = 1 + LINK_SIZE + IMM2_SIZE;
break;

Non-capturing group OP_BRA (pcre_jit_compile.c:#1134) (the real memory use is more than this):

case OP_BRA:
bracketlen = 1 + LINK_SIZE;
break;

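Therefore, changing the capturing groups in your own regex to non-capturing groups can make it produce the proper output (I don't know exactly how much memory this saves).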
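As a sketch of that change, here is the pattern from the question next to a variant in which the helper groups are rewritten as (?:...), so that only the name, the condition and the body still capture. The variable names are made up for the example, and the short subject only shows that both variants match the same text; it does not demonstrate that the stack limit is avoided on long input.

```php
<?php
$subject = '{@if-statement|echo this|echo otherwise@}';

// Pattern from the question: six capturing groups.
$capturing = '/\{@([a-z0-9_]+)-((%?[a-z0-9_]+(:[a-z0-9_]+)*)+)\|(((?R)|.)*)@\}/Us';

// Same pattern with the helper groups made non-capturing: three groups remain.
$nonCapturing = '/\{@([a-z0-9_]+)-((?:%?[a-z0-9_]+(?::[a-z0-9_]+)*)+)\|((?:(?R)|.)*)@\}/Us';

preg_match($capturing, $subject, $old);
preg_match($nonCapturing, $subject, $new);

// Name, condition and body are still available, now as groups 1, 2 and 3.
echo $new[1], ' / ', $new[2], ' / ', $new[3], "\n";

// The body was group 5 in the original pattern and is group 3 in the rewrite.
var_dump($old[5] === $new[3]);
```

Any callback that reads $match[5] would need to read $match[3] after this change.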

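But it seems you do need the capturing groups, and they are necessary. In that case you should rewrite your regex for performance. Backtracking is almost the only thing that needs to be considered in a regex.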

Update #1

Solution:

(?(DEFINE)
  (?<recurs>
    (?! {@|@} ) [^|] [^{@|\\]* ( \\.[^{@|\\]* )* | (?R)
  )
)
{@
(?<If> \w+)-
(?<Condition> (%?\w++ (:\w+)*)* )
(?<True> [|] [^{@|]*+ (?&recurs)* )
(?<False> [|] (?&recurs)* )?
\s*@}


PHP code (watch the backslash escaping):

preg_match_all('/(?(DEFINE)
  (?<recurs>
    (?! {@|@} ) [^|] [^{@|\\\\]* ( \\\\.[^{@|\\\\]* )* | (?R)
  )
)
{@
(?<If> \w+ )-
(?<Condition> (%?\w++ (:\w+)*)* )
(?<True> [|] [^{@|]*+ (?&recurs)* )
(?<False> [|] (?&recurs)* )?
\s*@}/x', $string, $matches);

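This is your own regex, optimized to take fewer backtracking steps. So anything that was supposed to be matched by yours is matched by this one as well.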
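To see what the named groups contain, the pattern can be run on a small nested input. This is only a sketch: the subject string is invented for the example, and the pattern is written as a nowdoc (an assumption of convenience, not the answer's notation) so that each literal backslash needs two backslashes instead of four.

```php
<?php
// The answer's pattern inside a nowdoc: backslashes reach PCRE unchanged.
$pattern = <<<'REGEX'
/(?(DEFINE)
  (?<recurs>
    (?! {@|@} ) [^|] [^{@|\\]* ( \\.[^{@|\\]* )* | (?R)
  )
)
{@
(?<If> \w+ )-
(?<Condition> (%?\w++ (:\w+)*)* )
(?<True> [|] [^{@|]*+ (?&recurs)* )
(?<False> [|] (?&recurs)* )?
\s*@}/x
REGEX;

// Invented sample: an if block whose "false" branch holds a nested if block.
$subject = '{@if-a|x|{@if-b|y|z@}@}';

if (preg_match($pattern, $subject, $m) === 1) {
    echo 'If:        ', $m['If'], "\n";
    echo 'Condition: ', $m['Condition'], "\n";
    echo 'True:      ', $m['True'], "\n";
    echo 'False:     ', $m['False'], "\n";
}
```

The nested block is consumed by the (?R) call inside the outer match, so it shows up as part of the False group rather than as a separate match; the True and False groups also include their leading | character.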

A regex that does not follow nested if blocks:

{@
(?<If> \w+)-
(?<Condition> (%?\w++ (:\w+)*)* )
(?<True> [|] [^|\\]* (?: \\.[^|\\]* )* )
(?<False> [|] \X*)?
@}


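Where possible, quantifiers are written possessively by appending a + to them (which avoids backtracking), for example \w++ and [^{@|]*+.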
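The difference a possessive quantifier makes can be shown with a toy pattern (the pattern and subject are made up for illustration and are unrelated to the template syntax above):

```php
<?php
// Greedy: \d+ first takes "123", then gives one digit back so the final \d can match.
$greedy = preg_match('/^\d+\d$/', '123');

// Possessive: \d++ takes "123" and never gives anything back, so the final \d fails.
$possessive = preg_match('/^\d++\d$/', '123');

var_dump($greedy, $possessive); // int(1) and int(0)
```

A possessive quantifier is therefore only safe where nothing after it needs characters it has already consumed, which holds for \w++ followed by : or | in the solution.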

Source: a similar question on Stack Overflow about PHP PREG_JIT_STACKLIMIT_ERROR - inefficient regex: https://stackoverflow.com/questions/39685883/
