java - Searching nested documents in Solr(J)

coder 2024-03-16

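I implemented a simple indexing tool for PDF documents. I first create a master document (SolrInputDocument) that represents the whole PDF. It gets fields such as author, file hash, keywords, 'content_type=document', etc.

After that I create one SolrInputDocument per page. Each of these gets an id like 'parentID_p01', the page text as a field value, 'content_type=page', etc.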

Finally, I use addChildDocument() to attach all page documents to the master document.
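The 'parentID_p01' id scheme described above can be wrapped in two small helpers, which is handy later when page hits have to be mapped back to their PDF. This is only a minimal sketch of that scheme; the class and method names are hypothetical and not part of SolrJ. Note that two-digit padding keeps ids sorting correctly as strings only up to page 99.

```java
import java.util.Locale;

// Hypothetical helpers (not part of SolrJ) for the 'parentID_p01' page-id scheme.
class PageIds {
    /** Builds a page id such as "doc1_p01" from the parent id and a 1-based page number. */
    static String pageId(String parentId, int page) {
        // %02d zero-pads to two digits: 1 -> "p01", 14 -> "p14"; 100 becomes "p100"
        return String.format(Locale.ROOT, "%s_p%02d", parentId, page);
    }

    /** Recovers the parent id by cutting at the LAST "_p" marker, so "my_pdf_p14" -> "my_pdf". */
    static String parentIdOf(String pageId) {
        int cut = pageId.lastIndexOf("_p");
        if (cut < 0) {
            throw new IllegalArgumentException("not a page id: " + pageId);
        }
        return pageId.substring(0, cut);
    }

    /** Parses the page number back out of a page id, e.g. "doc1_p14" -> 14. */
    static int pageNumberOf(String pageId) {
        int cut = pageId.lastIndexOf("_p");
        if (cut < 0) {
            throw new IllegalArgumentException("not a page id: " + pageId);
        }
        return Integer.parseInt(pageId.substring(cut + 2));
    }

    public static void main(String[] args) {
        String id = pageId("doc1", 14);
        System.out.println(id + " -> parent " + parentIdOf(id) + ", page " + pageNumberOf(id));
    }
}
```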

Now my question is: how can I search for a given word across all pages of all documents and get a result like this:

Document1.pdf 'this is my doc1 title' [2 matches]
    [Page 14] 'Example phrase for special a <em>word</em> given by.... 
    [Page 55] 'another <em>word</em> for this test

Document2.pdf 'doc2 title' [X matches]
    [Page 1] 'given <em>word</em> in this text 
    [Page 2] '<em>words</em> hit more than fists
    [Page 99] 'some <em>words</em> of wisdom

My first idea was a simple search for "text:word~" followed by grouping the hits by parent document, but I couldn't find a good way to do that :-(
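The "search pages, then group by parent" idea can also be done on the client. A minimal sketch, assuming a query restricted to page documents (for example text:word with a content_type:page filter) whose hits each carry their parent id, either from a stored field or derived from the 'parentID_p01' id; PageHit and PageHitGrouper are hypothetical names. Solr's own result grouping (group=true with group.field) on a single-valued parent-id field would be a server-side alternative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical value class for one matching page returned by a page-level query.
class PageHit {
    final String parentId; // id (or title) of the PDF the page belongs to
    final int page;        // page number
    final String snippet;  // highlighted text for this page

    PageHit(String parentId, int page, String snippet) {
        this.parentId = parentId;
        this.page = page;
        this.snippet = snippet;
    }
}

class PageHitGrouper {
    /** Groups hits by parent, keeping the order in which each parent first appears in the hit list. */
    static Map<String, List<PageHit>> groupByParent(List<PageHit> hits) {
        Map<String, List<PageHit>> groups = new LinkedHashMap<>();
        for (PageHit hit : hits) {
            groups.computeIfAbsent(hit.parentId, k -> new ArrayList<>()).add(hit);
        }
        return groups;
    }

    /** Renders the groups in the layout shown in the question. */
    static String render(Map<String, List<PageHit>> groups) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, List<PageHit>> group : groups.entrySet()) {
            out.append(group.getKey())
               .append(" [").append(group.getValue().size()).append(" matches]\n");
            for (PageHit hit : group.getValue()) {
                out.append("    [Page ").append(hit.page).append("] ")
                   .append(hit.snippet).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<PageHit> hits = Arrays.asList(
                new PageHit("Document1.pdf", 14, "Example phrase for special a <em>word</em>"),
                new PageHit("Document2.pdf", 1, "given <em>word</em> in this text"),
                new PageHit("Document1.pdf", 55, "another <em>word</em> for this test"));
        System.out.print(render(groupByParent(hits)));
    }
}
```

Because LinkedHashMap keeps insertion order, a PDF ends up ranked by its best-scoring page when the hits arrive sorted by score. Only the rows actually fetched are grouped, so the match counts describe the current result page, not the total.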

Nested documents seem to be fairly new in Solr, and I couldn't find a simple solution for SolrJ.

Thanks in advance

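Best answer

I created parent-child data in Solr in the following format. One insurance_accounts record holds vehicle insurance for several people, and a person can own several vehicles such as cars and bikes. I index each person as a parent document and each vehicle as a child document.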

<insurance_accounts>
            <person>
                <firstname>John</firstname>
                <lastname>Jones</lastname>
                ...other details...
                <cars>
                    <car>
                        <make>honda</make>
                        <model>accord</model>
                    </car>
                    <car>
                        <make>Nissan</make>
                        <model>Maxima</model>
                    </car>
                </cars>
                <bikes>
                    <bike>
                        <make>yamaha</make>
                        <model>passion</model>
                    </bike>
                    <bike>
                        <make>Peugeot</make>
                        <model>Vivacity</model>
                    </bike>
                </bikes>
            </person>
            <person>
                ...
            </person>
          </insurance_accounts>
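Solr has no equivalent of the <cars> and <bikes> wrapper elements: in the code below every vehicle becomes a flat child document tagged with content_type. If a client needs that grouping back, a minimal sketch could regroup the returned children by content_type. The class name is hypothetical; it relies only on the fact that SolrDocument implements Map<String, Object>, so a List<SolrDocument> from getChildDocuments() can be passed in directly. The multiValued handling is an assumption about the schema (the stock example schema may declare content_type as multiValued).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: regroups flat child docs (field name -> value maps) by content_type.
class ChildPartition {
    static <D extends Map<String, Object>> Map<String, List<D>> byContentType(List<D> children) {
        Map<String, List<D>> byType = new LinkedHashMap<>();
        for (D child : children) {
            Object raw = child.get("content_type");
            // If content_type is multiValued in the schema, the value comes back as a list
            if (raw instanceof Collection && !((Collection<?>) raw).isEmpty()) {
                raw = ((Collection<?>) raw).iterator().next();
            }
            // Keys appear in the order each type is first seen, e.g. "car" before "bike"
            byType.computeIfAbsent(String.valueOf(raw), k -> new ArrayList<>()).add(child);
        }
        return byType;
    }
}
```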

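In the Java code below, I use SolrJ 4.9 to create the documents and run a search query against Solr. I process the QueryResponse to print the matching parents together with their children, and I also print the Solr query URL.

You can adapt the snippet below; let me know whether it works for you.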

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.LinkedHashMap;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;

public class SolrNestedSearch {

    static final String SOLR_URL = "http://localhost:8983/solr/collection1";

    public static void main ( String[] args ) throws Exception {

        HttpSolrServer solr = new HttpSolrServer( SOLR_URL );

        // Docs to submit
        Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();

        // Parent Doc 1, a person named John Jones
        SolrInputDocument person1 = new SolrInputDocument();
        person1.addField( "id",            "john_jones" );
        person1.addField( "content_type",  "person"     );
        // "_t" suffix tells Solr that it's text
        person1.addField( "first_name_t",  "John"       );
        person1.addField( "last_name_t",   "Jones"      );
        // states and history used in edismax examples
        person1.addField( "states_t",      "California Nevada Idaho Maine" );
        person1.addField( "history_t",     "safe accident accident accident accident accident" );

        // child docs, the vehicles he owns
        SolrInputDocument p1_car1 = new SolrInputDocument();
        p1_car1.addField( "id",            "jj_car1"    );
        p1_car1.addField( "content_type",  "car"        );
        // For cars "make" is an alias for "manufacturer"
        p1_car1.addField( "make_t",        "Honda"      );
        p1_car1.addField( "model_t",       "Accord"     );

        SolrInputDocument p1_car2 = new SolrInputDocument();
        p1_car2.addField( "id",            "jj_car2"    );
        p1_car2.addField( "content_type",  "car"        );
        p1_car2.addField( "make_t",        "Nissan"     );
        p1_car2.addField( "model_t",       "Maxima"     );

        SolrInputDocument p1_bike1 = new SolrInputDocument();
        p1_bike1.addField( "id",           "jj_bike1"   );
        p1_bike1.addField( "content_type", "bike"       );
        p1_bike1.addField( "make_t",       "Yamaha"     );
        p1_bike1.addField( "model_t",      "Passion"    );

        SolrInputDocument p1_bike2 = new SolrInputDocument();
        p1_bike2.addField( "id",           "jj_bike2"   );
        p1_bike2.addField( "content_type", "bike"       );
        p1_bike2.addField( "make_t",       "Peugeot"    );
        p1_bike2.addField( "model_t",      "Vivacity"   );

        // Add children to parent
        person1.addChildDocument( p1_car1  );
        person1.addChildDocument( p1_car2  );
        person1.addChildDocument( p1_bike1 );
        person1.addChildDocument( p1_bike2 );

        // Add parent to batch
        batch.add( person1 );


        // Parent Doc 2, a person named Satish Smith
        SolrInputDocument person2 = new SolrInputDocument();
        person2.addField( "id",           "satish_smith" );
        person2.addField( "content_type", "person"       );
        person2.addField( "first_name_t", "Satish"       );
        person2.addField( "last_name_t",  "Smith"        );
        person2.addField( "states_t",     "California Texas California Maine Vermont Connecticut" );
        person2.addField( "history_t",    "safe safe safe safe safe safe safe safe accident" );

        // Vehicles (child docs)
        SolrInputDocument p2_car1 = new SolrInputDocument();
        p2_car1.addField( "id",            "ss_car1"     );
        p2_car1.addField( "content_type",  "car"         );
        p2_car1.addField( "make_t",        "Peugeot"     );
        p2_car1.addField( "model_t",       "iOn"         );
        SolrInputDocument p2_bike1 = new SolrInputDocument();
        p2_bike1.addField( "id",           "ss_bike1"    );
        p2_bike1.addField( "content_type", "bike"        );
        p2_bike1.addField( "make_t",       "Honda"       );
        p2_bike1.addField( "model_t",      "Spree"       );
        // link objects and add to batch
        person2.addChildDocument( p2_car1  );
        person2.addChildDocument( p2_bike1 );
        batch.add( person2 );

        System.out.println( "Adding batch of " + batch.size() + " parent docs" );

        // Submit as a group
        solr.add( batch );
        solr.commit();



        Map<String,String> params = new LinkedHashMap<>();
        params.put( "parent_filter", "content_type:person" );
        params.put( "defType", "edismax" );
        params.put( "qf", "history_t states_t^100" );
        doQuery(solr,
                "eDismax3: Drivers matching accidents and/or California, and all of their vehicles, boost on State",
                "California accident report",
                "{!parent which=$parent_filter}",
                "*,[child parentFilter=$parent_filter]",
                params );

        // Release the HTTP client resources held by HttpSolrServer
        solr.shutdown();
    }


    static void doQuery( HttpSolrServer solr, String description, String queryStr, String optFilter,
            String optFields, Map<String,String>extraParams ) throws Exception
    {
        SolrQuery q = new SolrQuery( queryStr );
        if ( null!=optFilter ) {
            q.addFilterQuery( optFilter );
        }
        if ( null!=optFields ) { 
            q.setParam( "fl", optFields );   
        }
        else {
            q.addField( "*" );  
        }
        if ( null!=extraParams ) {
            for ( Entry<String,String> param : extraParams.entrySet() ) {
                q.set( param.getKey(), param.getValue() );
            }
        }

        // Run and show results
        QueryResponse rsp = solr.query( q );
        SolrDocumentList docs = rsp.getResults();
        long numFound = docs.getNumFound();
        System.out.println( "Matched: " + numFound );
        int docCounter = 0;
        for (SolrDocument doc : docs) {
            docCounter++;
            System.out.println( "Doc # " + docCounter );
            for ( Entry<String, Object> field : doc.entrySet() ) {
                String name = field.getKey();
                Object value = field.getValue();
                System.out.println( "\t" + name + "=" + value );
            }
            List<SolrDocument> childDocs = doc.getChildDocuments();
            if ( null!=childDocs ) {
                for ( SolrDocument child : childDocs ) {
                    System.out.println( "\tChild doc:" );
                    for ( Entry<String, Object> field : child.entrySet() ) {
                        String name = field.getKey();
                        Object value = field.getValue();
                        System.out.println( "\t\t" + name + "=" + value );
                    }
                }
            }
        }

        System.out.println( "Query URL:" );
        System.out.println( SOLR_URL + "/select?" + q );
    }
}
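Mapped onto the question's schema, the same block-join pattern might look like the sketch below. It assumes parents carry content_type=document, pages carry content_type=page, and the page text lives in a field called text (as in the question's text:word~); the class and method names are hypothetical and the term is inserted without escaping. {!parent which=$parent_filter} returns the PDFs that have at least one matching page, and the childFilter option of the [child] transformer attaches only the matching pages instead of all of them. Two caveats: the transformer caps the number of children per parent by default (its limit option), and highlighting applies to the top-level results, so per-page <em> snippets are likely easier to get from a page-level query grouped on the client.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical builder for the question's PDF/page schema (field names are assumptions).
class PageSearchParams {
    static final String PARENT_FILTER = "content_type:document";

    /** Block-join query: parents that have at least one child matching field:term. */
    static String parentQuery(String field, String term) {
        return "{!parent which=$parent_filter}" + field + ":" + term;
    }

    /** Extra request parameters in the same shape as doQuery's extraParams argument. */
    static Map<String, String> extraParams(String field, String term) {
        Map<String, String> params = new LinkedHashMap<>();
        // Dereferenced as $parent_filter by both the query parser and the transformer
        params.put("parent_filter", PARENT_FILTER);
        // Return each parent plus only those pages that match the same term
        params.put("fl", "*,[child parentFilter=$parent_filter childFilter=" + field + ":" + term + "]");
        return params;
    }
}
```

With the answer's helper this could be called as doQuery(solr, "PDFs with pages containing 'word'", PageSearchParams.parentQuery("text", "word"), null, null, PageSearchParams.extraParams("text", "word")); the fl entry overrides the default "*" that doQuery adds when optFields is null.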

Regarding java - Searching nested documents in Solr(J), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/27983644/
