xml - XPath pulling more than one match

coder 2024-06-23 Original post

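The (BaseX) error

I am running queries on a large dataset in BaseX, but one XQuery crashes my program with the error [XPTY0004] Item expected, sequence found: (attribute begin {"6"}, ...).

In my query I try to make sure that one element comes before another by comparing the begin attribute, which is present in the XML, using number(). But whenever I try even the most basic XQueries (returning the matched nodes) on my dataset (for example with this online tool), I get an error similar to the one I ran into before: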

[Error] SaxonCE.XSLT20Processor 14:08:39.692 SEVERE: XPathException in invokeTransform: A sequence of more than one item is not allowed as the first argument of number() ("6", "10")

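So my guess is that the problem lies with the node's siblings: there is more than one of them, and it is unclear which one should be compared. Examples follow.

Why does order matter?

The XPath is used in a query engine for treebanks, that is, linguistically annotated corpora. In some cases we want the nodes to match in a given order; in other cases the order is irrelevant. A simple example: sometimes we want to match something specific, such as "the concerned man", where the order article, adjective, noun matters. In other queries the order is irrelevant, and we also want to match phrases such as "the time available", where the article, adjective, and noun may appear in any order.

In other words, in the first case the order of the elements should be respected, and in the second case it should not. Below is a possible XPath representation of such a structure containing an article, an adjective, and a noun.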

node[@cat="np" and node[@pt="art"] and node[@pt="adj"] and node[@pt="n"]]

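By default, XPath does not care about the order of these elements and searches greedily, i.e. it also matches items such as "the time available" (art, n, adj). However, I want to rewrite the XPath above so that the order of the nodes is respected: a construction such as "the time available" (art, n, adj) should not match, whereas "the concerned man" (art, adj, n) should.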

# Possible representation of *the time available*
<node id="0" begin="1" cat="np">
    <node id="1" begin="1" pt="art" text="the" />        
    <node id="2" begin="2" pt="n" text="time" />
    <node id="3" begin="3" pt="adj" text="available" />
</node>

# Possible representation of *the concerned man*
<node id="0" begin="1" cat="np">
    <node id="1" begin="1" pt="art" text="the" />        
    <node id="2" begin="2" pt="adj" text="concerned" />
    <node id="3" begin="3" pt="n" text="man" />
</node>

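One approach is a numeric comparison of the begin attribute, which is available throughout the corpus. Its values are ascending, so to guarantee the order in the XPath, one can require that the begin value of each child of @cat="np", converted with number(), is smaller than that of the next child. But as shown above, this leads to an error, one that does not occur with the simple example code I just showed.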
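The intended ordering rule can be sketched outside XPath. The following is a minimal Python illustration using the standard library's ElementTree, not the query engine's actual implementation. The helper name `pt_in_order` is hypothetical, and the sketch assumes each `pt` label occurs at most once per phrase, as in the two examples above.

```python
import xml.etree.ElementTree as ET

TIME_AVAILABLE = """
<node id="0" begin="1" cat="np">
    <node id="1" begin="1" pt="art" text="the" />
    <node id="2" begin="2" pt="n" text="time" />
    <node id="3" begin="3" pt="adj" text="available" />
</node>
"""

CONCERNED_MAN = """
<node id="0" begin="1" cat="np">
    <node id="1" begin="1" pt="art" text="the" />
    <node id="2" begin="2" pt="adj" text="concerned" />
    <node id="3" begin="3" pt="n" text="man" />
</node>
"""


def pt_in_order(np, wanted):
    """Return True if np has a child for every pt label in `wanted`
    and their begin values are strictly ascending in that order."""
    begins = []
    for pt in wanted:
        child = np.find(f"node[@pt='{pt}']")  # first direct child with this label
        if child is None:
            return False
        begins.append(int(child.get("begin")))
    # Compare each value with the one that follows it, one pair at a time.
    return all(a < b for a, b in zip(begins, begins[1:]))


wanted = ["art", "adj", "n"]
print(pt_in_order(ET.fromstring(TIME_AVAILABLE), wanted))  # False
print(pt_in_order(ET.fromstring(CONCERNED_MAN), wanted))   # True
```

Each comparison here involves exactly two single values, which is the property the XPath version has to preserve.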

Another example.

<node id="0" begin="2">
    <node id="1" begin="2">
        <node id="2" begin="2"/>
        <node id="3" begin="3"/>
    </node>
    <node id="4" begin="5">
        <node id="5" begin="5"/>
    </node>
    <node id="6" begin="6"/>
</node>

This XPath should match:

/node/node[number(@begin) < number(../node/@begin)]

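But when it is run through an XQuery processor, you get the error mentioned above: A sequence of more than one item is not allowed as the first argument of number() ("2", "5", ...).

I tried the solution provided by @Michael Kay, but the same problem seems to occur.

Query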

for $node in node[every $n in node[position() lt last()] satisfies (number($n/@begin) lt number($n/following-sibling::node/@begin))]
return $node

Data

<node id="0" begin="2">
    <node id="1" begin="2">
        <node id="2" begin="2"/>
        <node id="3" begin="3"/>
    </node>
    <node id="4" begin="5">
        <node id="5" begin="5"/>
    </node>
    <node id="6" begin="6"/>
</node>

Error

SaxonCE.XSLT20Processor 14:48:49.809 SEVERE: XPathException in invokeTransform: A sequence of more than one item is not allowed as the first argument of number() ("5", "6")

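Update, 19 April 2017

Today I ran into some unexpected behaviour, which means the solution provided by @har07 no longer suffices. I had wrongly assumed that the not() clause only affects the nodes named in the XPath (rather than all nodes in the XML). In other words, when the not() clause is added to the topmost node of the XPath, all of that node's children in the XML must have a fixed, sorted word order. (Now that I read it this way, it seems perfectly normal.) However, what I actually want is to impose word order only on the nodes specified in the XPath, and not on any other nodes in the matched XML. Hopefully an example will make this clearer.

Say I want to match the following XPath: a cat="np" that contains a rel="det" pt="vnw" lemma="die" and at least two rel="mod" pt="adj".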

//node[@cat="np" and node[@rel="det" and @pt="vnw" and @lemma="die"] and count(node[@rel="mod" and @pt="adj"]) > 1]

But with the added requirement that the order of this XPath is respected, i.e.

//node[
    @cat="np" and 
    not(node[
        position() < last()
    ][number(@begin) > following-sibling::node/number(@begin)]) and 
    node[
        @rel="det" and 
        @pt="vnw" and 
        @lemma="die"
    ] and 
    count(node[
        @rel="mod" and 
        @pt="adj"
    ]) > 1
]

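So rel="det" must occur before the two rel="mod" nodes in the XML. This works fine and all matches are correct, but not all expected matches are found. The reason is that the not() line evidently targets all XML nodes, not just the nodes specified in the XPath. If any child is found that violates the not() rule, there is no match, even if that node is not specified in the XPath. For example, the XPath above does not match the XML below, because inside cat="np" there is a node whose begin attribute is greater than that of a following sibling, which the not() rule does not allow.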

<node begin="4" cat="np" id="8" rel="obj1">
    <node begin="4" id="9" pos="det" pt="vnw" rel="det" word="die" lemma="die" />
    <node begin="5" id="10" pos="adj" pt="adj" rel="mod" word="veelzijdige" />
    <node begin="6" id="11" pos="adj" pt="adj" rel="mod" word="getalenteerde" />
    <node begin="7" id="12" pos="noun" pt="n" rel="hd" word="figuren" />
    <node begin="8" id="31" index="1" rel="obj1" />
    <node begin="2" id="32" index="2" rel="obj2" />
</node>
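To see which children trigger the rejection, the not() predicate can be emulated in Python. This is only a sketch of the predicate's logic with ElementTree, not a run of the XPath itself, and `violators` is a hypothetical helper name. It mirrors the general comparison in the XPath: a child counts as a violator if its begin is greater than the begin of any later sibling.

```python
import xml.etree.ElementTree as ET

NP = """
<node begin="4" cat="np" id="8" rel="obj1">
    <node begin="4" id="9" pos="det" pt="vnw" rel="det" word="die" lemma="die" />
    <node begin="5" id="10" pos="adj" pt="adj" rel="mod" word="veelzijdige" />
    <node begin="6" id="11" pos="adj" pt="adj" rel="mod" word="getalenteerde" />
    <node begin="7" id="12" pos="noun" pt="n" rel="hd" word="figuren" />
    <node begin="8" id="31" index="1" rel="obj1" />
    <node begin="2" id="32" index="2" rel="obj2" />
</node>
"""


def violators(parent):
    """Return the ids of all children, except the last, whose begin
    is greater than the begin of at least one later sibling."""
    kids = list(parent)
    bad = []
    for i, kid in enumerate(kids[:-1]):
        later = [int(k.get("begin")) for k in kids[i + 1:]]
        if any(int(kid.get("begin")) > b for b in later):
            bad.append(kid.get("id"))
    return bad


np_node = ET.fromstring(NP)
print(violators(np_node))  # ['9', '10', '11', '12', '31']
```

Every child before id="32" is reported, because each of them has a begin greater than 2. The single out-of-order node at the end is therefore enough to make the not() clause fail for the whole cat="np".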

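However, I do want this cat="np" to match, and I want to make the not() function less aggressive: only the nodes specified in the XPath (in this case rel="det" pt="vnw" lemma="die" and the two rel="mod" pt="adj" nodes) should have to follow the order requirement that the begin attribute is smaller than that of the next item in the XPath structure. Other elements inside cat="np" that are not specified in the XPath are allowed to have a begin attribute greater than that of their next sibling.

Note that the last item of the XPath structure (matched by id="11" in the example XML) does not necessarily need a begin attribute lower than that of the nodes that follow it in the XML (which are not specified in the XPath).

As before, I am particularly interested in how to solve this with pure XPath, but XQuery alternatives are welcome too. Ideally it would be a function that takes an XPath structure as input and applies "word order" to its topmost node and all of its descendants. Example code, using the XPath shown here as an example, is encouraged.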
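The relaxed requirement can be written down as a small Python sketch. It is an assumption about the intended semantics, not a pure XPath answer: a cat="np" matches if some determiner (rel="det", pt="vnw", lemma="die") is followed, by begin value, by at least two rel="mod" pt="adj" children, while all other children are ignored. The name `matches_in_order` is hypothetical.

```python
import xml.etree.ElementTree as ET

NP = """
<node begin="4" cat="np" id="8" rel="obj1">
    <node begin="4" id="9" pos="det" pt="vnw" rel="det" word="die" lemma="die" />
    <node begin="5" id="10" pos="adj" pt="adj" rel="mod" word="veelzijdige" />
    <node begin="6" id="11" pos="adj" pt="adj" rel="mod" word="getalenteerde" />
    <node begin="7" id="12" pos="noun" pt="n" rel="hd" word="figuren" />
    <node begin="8" id="31" index="1" rel="obj1" />
    <node begin="2" id="32" index="2" rel="obj2" />
</node>
"""


def matches_in_order(np):
    """Order is checked only between the specified nodes:
    one determiner and at least two adjectival modifiers after it."""
    if np.get("cat") != "np":
        return False
    kids = np.findall("node")
    det_begins = [
        int(k.get("begin")) for k in kids
        if k.get("rel") == "det" and k.get("pt") == "vnw" and k.get("lemma") == "die"
    ]
    mod_begins = [
        int(k.get("begin")) for k in kids
        if k.get("rel") == "mod" and k.get("pt") == "adj"
    ]
    # Unspecified siblings (ids 12, 31, 32 here) never enter the comparison.
    return any(sum(1 for m in mod_begins if m > d) >= 2 for d in det_begins)


np_node = ET.fromstring(NP)
print(matches_in_order(np_node))  # True

# Moving the determiner behind the modifiers breaks the required order.
np_node.find("node[@rel='det']").set("begin", "9")
print(matches_in_order(np_node))  # False
```

The design choice is to compare only the begin values of nodes that satisfy the specified conditions, instead of testing every child against its siblings. An XPath or XQuery formulation would need to encode the same restriction.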

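Best answer

The part of the question that I think I understand is this:

Say I want to match XML in which each direct child of the root has a begin attribute that is smaller than that of its next sibling.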

<node id="0" begin="2">
    <node id="1" begin="2">
        <node id="2" begin="2"/>
        <node id="3" begin="3"/>
    </node>
    <node id="4" begin="5">
        <node id="5" begin="5"/>
    </node>
    <node id="6" begin="6"/>
</node>

This XPath should match:

/node/node[number(@begin) < number(../node/@begin)]

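Now, it is clear why you get the error. Inside the predicate, .. selects the node with id=0, which has three children (with ids 1, 4, and 6), each of which has a @begin attribute. So in number(../node/@begin), the argument ../node/@begin selects a sequence of three attributes, whereas number() accepts at most one item.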
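The size of that sequence is easy to confirm outside an XPath 2.0 processor. The sketch below uses Python's ElementTree, which only supports a small XPath subset and is used here purely to list the values. `single_number` is a hypothetical helper that imitates the "at most one item" rule of number(); it is not part of any library.

```python
import xml.etree.ElementTree as ET

DATA = """
<node id="0" begin="2">
    <node id="1" begin="2">
        <node id="2" begin="2"/>
        <node id="3" begin="3"/>
    </node>
    <node id="4" begin="5">
        <node id="5" begin="5"/>
    </node>
    <node id="6" begin="6"/>
</node>
"""


def single_number(values):
    """Imitate number(): zero items give NaN, one item is converted,
    more than one item is rejected."""
    if len(values) > 1:
        raise TypeError(f"more than one item: {values}")
    return float(values[0]) if values else float("nan")


root = ET.fromstring(DATA)
# Equivalent of ../node/@begin as seen from any direct child of the root.
sibling_begins = [child.get("begin") for child in root.findall("node")]
print(sibling_begins)  # ['2', '5', '6']

try:
    single_number(sibling_begins)
except TypeError as err:
    print("rejected:", err)
```

Three values come back, one per direct child of the root, which corresponds to the sequence reported in the error message.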

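Your query does not seem to bear any relationship to the prose requirement, namely

where each direct child of the root has a begin attribute that is smaller than that of its next sibling

The condition for that is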

node[every $n in node[position() lt last()] satisfies (number($n/@begin) lt number($n/following-sibling::node[1]/@begin))]
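The quantified condition compares each child with its next sibling only. In XPath that means taking the first node on the following-sibling axis (following-sibling::node[1]), so that number() receives a single attribute. A minimal Python sketch of the same pairwise check follows; `children_ascending` is a hypothetical helper, and ElementTree stands in for the XQuery processor.

```python
import xml.etree.ElementTree as ET

DATA = """
<node id="0" begin="2">
    <node id="1" begin="2">
        <node id="2" begin="2"/>
        <node id="3" begin="3"/>
    </node>
    <node id="4" begin="5">
        <node id="5" begin="5"/>
    </node>
    <node id="6" begin="6"/>
</node>
"""


def children_ascending(parent):
    """True if every direct child has a smaller begin than its next sibling.
    Like `every ... satisfies`, this is True when there is nothing to compare."""
    begins = [float(child.get("begin")) for child in parent.findall("node")]
    # zip pairs each value with its immediate successor only.
    return all(a < b for a, b in zip(begins, begins[1:]))


root = ET.fromstring(DATA)
print(children_ascending(root))  # True, since 2 < 5 < 6

# Putting the last child out of order makes the check fail.
root.find("node[@id='6']").set("begin", "1")
print(children_ascending(root))  # False
```

Nodes with no children or a single child pass the check trivially, which matches the behaviour of a quantified expression over an empty sequence.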

A similar question about xml - XPath pulling more than one match can be found on Stack Overflow: https://stackoverflow.com/questions/42673237/
