MongoDB AggregationOutput takes a long time to respond

coder, 2023-05-05 (original post)

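I have a collection named "logTransaction". I want to get the result you see in the attached image.

logTransaction has many fields, but the ones used for this image are:

customer, environment, firstTime, lastTime, integrationIds[] (a transaction can have more than one integration), transactionStatus (FINISHED, UNFINISHED, FAILED)

I am using AggregationOutput to get this result, but it takes more than 30 seconds, which (I think) is far too long for the amount of data I have. I would like to know whether I can improve this by modifying what I already have, or whether I should change it completely. What kind of index should I use to speed things up?

I am using MongoDB with Grails. My current method looks like this: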

def myCustomAggregation(integrations, timestamp_lt, timestamp_gt, cust, env) {
    def currentRequest = RequestContextHolder.requestAttributes

    def customer = cust ?: currentRequest?.session?.customer
    def environment = env ?: currentRequest?.session?.environment

    //$match
    DBObject matchMap = new BasicDBObject('integrationIds', new BasicDBObject('$in', integrations.collectAll { it?.baselineId }))
    matchMap.put("firstTimestamp", new BasicDBObject('$lte', timestamp_lt as Long).append('$gte', timestamp_gt as Long))
    matchMap.put("customer",customer)
    matchMap.put("environment",environment)
    DBObject match = new BasicDBObject('$match',matchMap);

    //$group1
    Map<String, Object> dbObjIdMap1 = new HashMap<String, Object>();
    dbObjIdMap1.put('integrationId', '$integrationIds');
    dbObjIdMap1.put('transactionStatus', '$transactionStatus');
    DBObject groupFields1 = new BasicDBObject( "_id", new BasicDBObject(dbObjIdMap1));
    groupFields1.put('total', new BasicDBObject( '$sum', 1));
    DBObject group1 = new BasicDBObject('$group', groupFields1);

    //$group2
    DBObject groupFields2 = new BasicDBObject( "_id", '$_id.integrationId');
    groupFields2.put('total_finished',
        new BasicDBObject('$sum', new BasicDBObject('$cond', [
            new BasicDBObject('$eq', ['$_id.transactionStatus', 'FINISHED']), '$total', 0
        ]))
    );
    groupFields2.put('total_unfinished',
        new BasicDBObject('$sum', new BasicDBObject('$cond', [
            new BasicDBObject('$eq', ['$_id.transactionStatus', 'UNFINISHED']), '$total', 0
        ]))
    );
    groupFields2.put('total_failed',
        new BasicDBObject('$sum', new BasicDBObject('$cond', [
            new BasicDBObject('$eq', ['$_id.transactionStatus', 'FAILED']), '$total', 0
        ]))
    );
    DBObject group2 = new BasicDBObject('$group', groupFields2);
    // This takes more than 30 seconds, which is too long for the amount of data in the database.
    AggregationOutput output = db.logTransaction.aggregate(match,group1,group2)
    return output.results()
}

Edit:

I created a compound index, as HoefMeistert suggested:

db.logTransaction.createIndex({integrationIds: 1, firstTimestamp: -1, customer: 1, environment: 1})

But when I run explain on this aggregation:

db.logTransaction.explain().aggregate( [
    { $match: {integrationIds: {$in: ["INT010","INT011","INT012A","INT200"]}, "firstTimestamp": { "$lte" : 1476107324000 , "$gte" : 1470002400000}, "customer": "Awsome_Company", "environment": "PROD"}},
    { $group: { _id: {"integrationId": '$integrationIds', "transactionStatus": '$transactionStatus'}, total: {$sum: 1}}},
    { $group: { _id: "$_id.integrationId", "total_finished": {$sum: {$cond: [{$eq: ["$_id.transactionStatus", "FINISHED"]}, "$total", 0]}}, "total_unfinished": {$sum: {$cond: [{$eq: ["$_id.transactionStatus", "UNFINISHED"]}, "$total", 0]}}, "total_failed": {$sum: {$cond: [{$eq: ["$_id.transactionStatus", "FAILED"]}, "$total", 0]}}}}
]);

I still get this winning plan every time:

"winningPlan" : {
                "stage" : "CACHED_PLAN",
                "inputStage" : {
                    "stage" : "FETCH",
                    "filter" : {
                        "$and" : [
                                {
                                    "environment" : {
                                            "$eq" : "PROD"
                                    }
                                },
                                {
                                    "integrationIds" : {
                                        "$in" : [
                                            "INT010",
                                            "INT011",
                                            "INT012A",
                                            "INT200"
                                        ]
                                    }
                                }
                        ]
                    },
                    "inputStage" : {
                            "stage" : "IXSCAN",
                            "keyPattern" : {
                                "tenant" : 1,
                                "firstTimestamp" : -1
                            },
                            "indexName" : "customer_1_firstTimestamp_-1",
                            "isMultiKey" : false,
                            "isUnique" : false,
                            "isSparse" : false,
                            "isPartial" : false,
                            "indexVersion" : 1,
                            "direction" : "forward",
                            "indexBounds" : {
                                "customer" : [
                                    "[\"Awsome_Company\", \"Awsome_Company\"]"
                                ],
                                "firstTimestamp" : [
                                    "[1476107324000.0, 1470002400000.0]"
                                ]
                            }
                    }
                }
        },

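These are the current indexes on the collection in the development environment. The speed is better than before, but when the time span is longer than one week I still get a SocketTimeoutException (3 minutes):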

"customer_1_firstTimestamp_-1" : 56393728,
"firstTimestamp_-1_customer_1" : 144617472,
"integrationIds_1_firstTimestamp_-1" : 76644352,
"integrationId_1_firstTimestamp_-1" : 56107008,
"transactionId_1_firstTimestamp_-1" : 151429120,
"firstTimestamp_1" : 56102912,
"transactionId_1" : 109445120,
"integrationIds_1_firstTimestamp_-1_customer_1_environment_1" : 247790976

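Best answer

Which indexes do you currently have? Looking at your aggregation, make sure you have an index on the fields you are matching on:

integrationIds
firstTimestamp
customer
environment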
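
As a rough sketch of what such an index could look like from the mongo shell: the field order below (equality fields first, the range field last) follows MongoDB's general equality-before-range guideline and is an assumption, not something stated in the question. `buildMatchIndexKeys` is a hypothetical helper name used only for illustration.

```javascript
// Hypothetical helper: builds an index key spec covering the $match fields.
// Assumed order: exact-match fields (customer, environment), then the
// $in field (integrationIds), then the range field (firstTimestamp).
function buildMatchIndexKeys() {
  return {
    customer: 1,
    environment: 1,
    integrationIds: 1,
    firstTimestamp: -1
  };
}

// In the mongo shell (where `db` is defined) this could be applied with:
//   db.logTransaction.createIndex(buildMatchIndexKeys())
```

Whether the planner actually prefers this index over the existing `customer_1_firstTimestamp_-1` one depends on the data, so it is worth confirming with explain() rather than assuming it.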

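After the first ($match) stage, indexes are no longer relevant. As elixir asked, how is the performance in the shell/editor? Is it slow there too? If so, try to find the "slow" stage.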
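
One possible way to look for the slow stage is to run the pipeline one stage at a time and time each prefix. The sketch below is an assumption about how that could be done, not part of the original answer; `pipelinePrefixes`, `timePrefixes`, and the `run` callback are hypothetical names.

```javascript
// Returns the pipeline prefixes [stage1], [stage1, stage2], ... so that
// each prefix can be executed and timed on its own.
function pipelinePrefixes(pipeline) {
  return pipeline.map(function (_, i) {
    return pipeline.slice(0, i + 1);
  });
}

// Times every prefix using a caller-supplied runner. The runner is assumed
// to execute the given pipeline and consume all of its results.
function timePrefixes(pipeline, run) {
  return pipelinePrefixes(pipeline).map(function (prefix) {
    var start = Date.now();
    run(prefix);
    return { stages: prefix.length, millis: Date.now() - start };
  });
}

// In the mongo shell, a runner could look like:
//   function (p) { db.logTransaction.aggregate(p).itcount(); }
```

Note that the timing for a prefix ending in $match also includes sending every matched document to the client, so a large jump between two prefixes is a hint about where to look rather than an exact per-stage cost.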

Update: You can also help the aggregation pipeline optimizer ;-) Rewrite the match as a single $and match, from:

{ $match: {integrationIds: {$in: ["INT010","INT011","INT012A","INT200"]}, "firstTimestamp": { "$lte" : 1476107324000 , "$gte" : 1470002400000}, "customer": "Awsome_Company", "environment": "PROD"}}

to:

    { $match: { $and : [
      {integrationIds: {$in: ["INT010","INT011","INT012A","INT200"]}},
      {"firstTimestamp": { "$lte" : 1476107324000 , "$gte" : 1470002400000}},
      {"customer": "Awsome_Company"},
      {"environment": "PROD"}]
    }}
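
The rewrite is mechanical, so it can be generated from the existing match document. The following is a minimal sketch under that assumption; `toExplicitAnd` is a hypothetical helper, and the values are the ones from the explain call in the question.

```javascript
// Converts an implicit-AND match document into the explicit $and form,
// producing one single-field clause per top-level key.
function toExplicitAnd(match) {
  var clauses = Object.keys(match).map(function (key) {
    var clause = {};
    clause[key] = match[key];
    return clause;
  });
  return { $and: clauses };
}

var match = {
  integrationIds: { $in: ["INT010", "INT011", "INT012A", "INT200"] },
  firstTimestamp: { $lte: 1476107324000, $gte: 1470002400000 },
  customer: "Awsome_Company",
  environment: "PROD"
};

// The resulting stage can be passed as the first element of the pipeline.
var stage = { $match: toExplicitAnd(match) };
```

Both forms express the same conjunction of conditions, so it is worth comparing the explain() output before and after to see whether the chosen plan actually changes.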

This question about MongoDB AggregationOutput taking a long time to respond comes from Stack Overflow: https://stackoverflow.com/questions/39916540/

