
performance - Goroutine overhead and profiling when subsetting DataFrames (Gota)

coder 2023-06-27 Original post

Since early 2016 I have been working on an implementation of Pandas/R-style DataFrames for Go: https://github.com/kniren/gota

Recently I have been focusing on improving the library's performance to try to match Pandas/Dplyr. You can follow the current progress here: https://github.com/kniren/gota/issues/16

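Since one of the most frequently used operations is DataFrame subsetting, I thought it might be a good idea to introduce concurrency to try to improve performance.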

Before:

columns := make([]series.Series, df.ncols)
for i, column := range df.columns {
    s := column.Subset(indexes)
    columns[i] = s
}

After:

columns := make([]series.Series, df.ncols)
var wg sync.WaitGroup
wg.Add(df.ncols)
for i := range df.columns {
    go func(i int) {
        columns[i] = df.columns[i].Subset(indexes)
        wg.Done()
    }(i)
}
wg.Wait()

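As far as I know, creating a goroutine for each column of the DataFrame should not introduce much overhead, so I expected at least a 2x speedup over the serial version (at least for large datasets). However, when benchmarking this change with datasets and indexes of different sizes, the results were very disappointing (benchmark names follow the pattern NROWSxNCOLS_INDEXSIZE-CPUCORES):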

benchmark                                          old ns/op      new ns/op      delta
BenchmarkDataFrame_Subset/1000000x20_100           55230          109349         +97.99%
BenchmarkDataFrame_Subset/1000000x20_100-2         51457          67714          +31.59%
BenchmarkDataFrame_Subset/1000000x20_100-4         49845          70141          +40.72%
BenchmarkDataFrame_Subset/1000000x20_1000          518506         518085         -0.08%
BenchmarkDataFrame_Subset/1000000x20_1000-2        476661         311379         -34.67%
BenchmarkDataFrame_Subset/1000000x20_1000-4        505023         316583         -37.31%
BenchmarkDataFrame_Subset/1000000x20_10000         6621116        6314112        -4.64%
BenchmarkDataFrame_Subset/1000000x20_10000-2       7316062        4509601        -38.36%
BenchmarkDataFrame_Subset/1000000x20_10000-4       6483812        8394113        +29.46%
BenchmarkDataFrame_Subset/1000000x20_100000        105341711      106427967      +1.03%
BenchmarkDataFrame_Subset/1000000x20_100000-2      94567729       56778647       -39.96%
BenchmarkDataFrame_Subset/1000000x20_100000-4      91896690       60971444       -33.65%
BenchmarkDataFrame_Subset/1000000x20_1000000       1538680081     1632044752     +6.07%
BenchmarkDataFrame_Subset/1000000x20_1000000-2     1292113119     1100075806     -14.86%
BenchmarkDataFrame_Subset/1000000x20_1000000-4     1282367864     949615298      -25.95%
BenchmarkDataFrame_Subset/100000x20_100            50286          106850         +112.48%
BenchmarkDataFrame_Subset/100000x20_100-2          54537          70492          +29.26%
BenchmarkDataFrame_Subset/100000x20_100-4          58024          76617          +32.04%
BenchmarkDataFrame_Subset/100000x20_1000           541600         625967         +15.58%
BenchmarkDataFrame_Subset/100000x20_1000-2         493894         362894         -26.52%
BenchmarkDataFrame_Subset/100000x20_1000-4         535373         349211         -34.77%
BenchmarkDataFrame_Subset/100000x20_10000          6298063        7678499        +21.92%
BenchmarkDataFrame_Subset/100000x20_10000-2        5827185        4832560        -17.07%
BenchmarkDataFrame_Subset/100000x20_10000-4        8195048        3660077        -55.34%
BenchmarkDataFrame_Subset/100000x20_100000         105108807      82976477       -21.06%
BenchmarkDataFrame_Subset/100000x20_100000-2       92112736       58317114       -36.69%
BenchmarkDataFrame_Subset/100000x20_100000-4       92044966       63469935       -31.04%
BenchmarkDataFrame_Subset/1000x20_10               9741           53365          +447.84%
BenchmarkDataFrame_Subset/1000x20_10-2             9366           36457          +289.25%
BenchmarkDataFrame_Subset/1000x20_10-4             9463           46682          +393.31%
BenchmarkDataFrame_Subset/1000x20_100              50841          103523         +103.62%
BenchmarkDataFrame_Subset/1000x20_100-2            49972          62344          +24.76%
BenchmarkDataFrame_Subset/1000x20_100-4            72014          81808          +13.60%
BenchmarkDataFrame_Subset/1000x20_1000             457799         571292         +24.79%
BenchmarkDataFrame_Subset/1000x20_1000-2           460551         405116         -12.04%
BenchmarkDataFrame_Subset/1000x20_1000-4           462928         416522         -10.02%
BenchmarkDataFrame_Subset/1000x200_10              90125          688443         +663.88%
BenchmarkDataFrame_Subset/1000x200_10-2            85259          392705         +360.60%
BenchmarkDataFrame_Subset/1000x200_10-4            87412          387509         +343.31%
BenchmarkDataFrame_Subset/1000x200_100             486600         1082901        +122.54%
BenchmarkDataFrame_Subset/1000x200_100-2           471154         732304         +55.43%
BenchmarkDataFrame_Subset/1000x200_100-4           542846         659571         +21.50%
BenchmarkDataFrame_Subset/1000x200_1000            5926086        6686480        +12.83%
BenchmarkDataFrame_Subset/1000x200_1000-2          5364091        3986970        -25.67%
BenchmarkDataFrame_Subset/1000x200_1000-4          5904977        4504084        -23.72%
BenchmarkDataFrame_Subset/1000x2000_10             1187297        7800052        +556.96%
BenchmarkDataFrame_Subset/1000x2000_10-2           1217022        3930742        +222.98%
BenchmarkDataFrame_Subset/1000x2000_10-4           1301666        3617871        +177.94%
BenchmarkDataFrame_Subset/1000x2000_100            6942015        10790196       +55.43%
BenchmarkDataFrame_Subset/1000x2000_100-2          6588351        7592847        +15.25%
BenchmarkDataFrame_Subset/1000x2000_100-4          7067226        14391327       +103.63%
BenchmarkDataFrame_Subset/1000x2000_1000           62392457       69560711       +11.49%
BenchmarkDataFrame_Subset/1000x2000_1000-2         57793006       37416703       -35.26%
BenchmarkDataFrame_Subset/1000x2000_1000-4         59572261       58398203       -1.97%

benchmark                                          old allocs     new allocs     delta
BenchmarkDataFrame_Subset/1000000x20_100           41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_100-2         41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_100-4         41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_1000          41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_1000-2        41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_1000-4        41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_10000         41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_10000-2       41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_10000-4       41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_100000        41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_100000-2      41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_100000-4      41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_1000000       41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_1000000-2     41             43             +4.88%
BenchmarkDataFrame_Subset/1000000x20_1000000-4     41             46             +12.20%
BenchmarkDataFrame_Subset/100000x20_100            41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_100-2          41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_100-4          41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_1000           41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_1000-2         41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_1000-4         41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_10000          41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_10000-2        41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_10000-4        41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_100000         41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_100000-2       41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_100000-4       41             42             +2.44%
BenchmarkDataFrame_Subset/1000x20_10               41             42             +2.44%
BenchmarkDataFrame_Subset/1000x20_10-2             41             42             +2.44%
BenchmarkDataFrame_Subset/1000x20_10-4             41             42             +2.44%
BenchmarkDataFrame_Subset/1000x20_100              41             42             +2.44%
BenchmarkDataFrame_Subset/1000x20_100-2            41             42             +2.44%
BenchmarkDataFrame_Subset/1000x20_100-4            41             42             +2.44%
BenchmarkDataFrame_Subset/1000x20_1000             41             42             +2.44%
BenchmarkDataFrame_Subset/1000x20_1000-2           41             42             +2.44%
BenchmarkDataFrame_Subset/1000x20_1000-4           41             42             +2.44%
BenchmarkDataFrame_Subset/1000x200_10              401            402            +0.25%
BenchmarkDataFrame_Subset/1000x200_10-2            401            402            +0.25%
BenchmarkDataFrame_Subset/1000x200_10-4            401            402            +0.25%
BenchmarkDataFrame_Subset/1000x200_100             401            402            +0.25%
BenchmarkDataFrame_Subset/1000x200_100-2           401            402            +0.25%
BenchmarkDataFrame_Subset/1000x200_100-4           401            402            +0.25%
BenchmarkDataFrame_Subset/1000x200_1000            401            402            +0.25%
BenchmarkDataFrame_Subset/1000x200_1000-2          401            402            +0.25%
BenchmarkDataFrame_Subset/1000x200_1000-4          401            402            +0.25%
BenchmarkDataFrame_Subset/1000x2000_10             4001           4002           +0.02%
BenchmarkDataFrame_Subset/1000x2000_10-2           4001           4002           +0.02%
BenchmarkDataFrame_Subset/1000x2000_10-4           4001           4002           +0.02%
BenchmarkDataFrame_Subset/1000x2000_100            4001           4002           +0.02%
BenchmarkDataFrame_Subset/1000x2000_100-2          4001           4002           +0.02%
BenchmarkDataFrame_Subset/1000x2000_100-4          4001           4002           +0.02%
BenchmarkDataFrame_Subset/1000x2000_1000           4001           4002           +0.02%
BenchmarkDataFrame_Subset/1000x2000_1000-2         4001           4010           +0.22%
BenchmarkDataFrame_Subset/1000x2000_1000-4         4001           4003           +0.05%

benchmark                                          old bytes     new bytes     delta
BenchmarkDataFrame_Subset/1000000x20_100           32400         32416         +0.05%
BenchmarkDataFrame_Subset/1000000x20_100-2         32400         32416         +0.05%
BenchmarkDataFrame_Subset/1000000x20_100-4         32400         32416         +0.05%
BenchmarkDataFrame_Subset/1000000x20_1000          298880        298896        +0.01%
BenchmarkDataFrame_Subset/1000000x20_1000-2        298880        298896        +0.01%
BenchmarkDataFrame_Subset/1000000x20_1000-4        298880        298896        +0.01%
BenchmarkDataFrame_Subset/1000000x20_10000         2971520       2971536       +0.00%
BenchmarkDataFrame_Subset/1000000x20_10000-2       2971520       2971536       +0.00%
BenchmarkDataFrame_Subset/1000000x20_10000-4       2971520       2971536       +0.00%
BenchmarkDataFrame_Subset/1000000x20_100000        29083520      29083536      +0.00%
BenchmarkDataFrame_Subset/1000000x20_100000-2      29083520      29083547      +0.00%
BenchmarkDataFrame_Subset/1000000x20_100000-4      29083542      29083563      +0.00%
BenchmarkDataFrame_Subset/1000000x20_1000000       290121600     290121616     +0.00%
BenchmarkDataFrame_Subset/1000000x20_1000000-2     290121600     290121696     +0.00%
BenchmarkDataFrame_Subset/1000000x20_1000000-4     290121600     290121840     +0.00%
BenchmarkDataFrame_Subset/100000x20_100            32400         32416         +0.05%
BenchmarkDataFrame_Subset/100000x20_100-2          32400         32416         +0.05%
BenchmarkDataFrame_Subset/100000x20_100-4          32400         32416         +0.05%
BenchmarkDataFrame_Subset/100000x20_1000           298880        298896        +0.01%
BenchmarkDataFrame_Subset/100000x20_1000-2         298880        298896        +0.01%
BenchmarkDataFrame_Subset/100000x20_1000-4         298880        298896        +0.01%
BenchmarkDataFrame_Subset/100000x20_10000          2971520       2971536       +0.00%
BenchmarkDataFrame_Subset/100000x20_10000-2        2971520       2971536       +0.00%
BenchmarkDataFrame_Subset/100000x20_10000-4        2971520       2971536       +0.00%
BenchmarkDataFrame_Subset/100000x20_100000         29083520      29083536      +0.00%
BenchmarkDataFrame_Subset/100000x20_100000-2       29083520      29083536      +0.00%
BenchmarkDataFrame_Subset/100000x20_100000-4       29083542      29083553      +0.00%
BenchmarkDataFrame_Subset/1000x20_10               4880          4896          +0.33%
BenchmarkDataFrame_Subset/1000x20_10-2             4880          4896          +0.33%
BenchmarkDataFrame_Subset/1000x20_10-4             4880          4896          +0.33%
BenchmarkDataFrame_Subset/1000x20_100              32400         32416         +0.05%
BenchmarkDataFrame_Subset/1000x20_100-2            32400         32416         +0.05%
BenchmarkDataFrame_Subset/1000x20_100-4            32400         32416         +0.05%
BenchmarkDataFrame_Subset/1000x20_1000             298880        298896        +0.01%
BenchmarkDataFrame_Subset/1000x20_1000-2           298880        298896        +0.01%
BenchmarkDataFrame_Subset/1000x20_1000-4           298880        298896        +0.01%
BenchmarkDataFrame_Subset/1000x200_10              49568         49584         +0.03%
BenchmarkDataFrame_Subset/1000x200_10-2            49568         49584         +0.03%
BenchmarkDataFrame_Subset/1000x200_10-4            49568         49585         +0.03%
BenchmarkDataFrame_Subset/1000x200_100             324768        324784        +0.00%
BenchmarkDataFrame_Subset/1000x200_100-2           324768        324784        +0.00%
BenchmarkDataFrame_Subset/1000x200_100-4           324768        324784        +0.00%
BenchmarkDataFrame_Subset/1000x200_1000            2989568       2989584       +0.00%
BenchmarkDataFrame_Subset/1000x200_1000-2          2989568       2989584       +0.00%
BenchmarkDataFrame_Subset/1000x200_1000-4          2989569       2989588       +0.00%
BenchmarkDataFrame_Subset/1000x2000_10             491072        491088        +0.00%
BenchmarkDataFrame_Subset/1000x2000_10-2           491072        491133        +0.01%
BenchmarkDataFrame_Subset/1000x2000_10-4           491072        491088        +0.00%
BenchmarkDataFrame_Subset/1000x2000_100            3243072       3243088       +0.00%
BenchmarkDataFrame_Subset/1000x2000_100-2          3243074       3243102       +0.00%
BenchmarkDataFrame_Subset/1000x2000_100-4          3243076       3243100       +0.00%
BenchmarkDataFrame_Subset/1000x2000_1000           29891072      29891088      +0.00%
BenchmarkDataFrame_Subset/1000x2000_1000-2         29891086      29891797      +0.00%
BenchmarkDataFrame_Subset/1000x2000_1000-4         29891115      29891167      +0.00%

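Running the profiler (cpu/mem) on this benchmark does not seem to reveal anything significant. The concurrent version appears to spend some time in runtime.mach_semaphore_signal, but I suppose that is to be expected while waiting for the goroutines to finish.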
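For readers who want to reproduce this kind of measurement outside of `go test`, the following is a minimal sketch that captures a CPU profile around a goroutine-per-task workload using the standard `runtime/pprof` package. The names `busyWork` and `profileConcurrent` are hypothetical and are not part of Gota; `busyWork` only stands in for a CPU-bound call such as `Subset`.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"sync"
)

// busyWork is a hypothetical CPU-bound stand-in for a column subset.
func busyWork(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
	}
	return sum
}

// profileConcurrent runs one goroutine per task while the CPU profiler
// is active and returns the raw profile data written by pprof.
func profileConcurrent(tasks, n int) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	var wg sync.WaitGroup
	wg.Add(tasks)
	for t := 0; t < tasks; t++ {
		go func() {
			defer wg.Done()
			busyWork(n)
		}()
	}
	wg.Wait()
	// StopCPUProfile flushes all buffered profile data into buf.
	pprof.StopCPUProfile()
	return buf.Bytes(), nil
}

func main() {
	prof, err := profileConcurrent(20, 5000000)
	if err != nil {
		panic(err)
	}
	fmt.Printf("captured %d bytes of CPU profile data\n", len(prof))
}
```

In practice the profile would be written to a file rather than a buffer, so that it can be inspected with `go tool pprof`.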

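I tried limiting the number of goroutines launched to the maximum number of cores reported by runtime.GOMAXPROCS(0), but the results were even worse. Am I doing something wrong here, or is the overhead of goroutines really large enough to have such a significant impact on performance?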

Best answer

Goroutines are cheap, but not free.

I have not read your code, but if you spawn NCOLS_INDEXSIZE goroutines for every row you process, that is a very bad practice.

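This can be seen in your benchmarks where you have 2k columns and only 1k rows: there you get a very big improvement. But in all other cases, when the number of columns < the number of rows, goroutine [...]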

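Instead, you should spawn a pool of goroutines (close to your number of CPUs) and distribute the work among them through channels; this is the canonical way. You may want to read https://blog.golang.org/pipelines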
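As a rough illustration of the pool approach, here is a minimal sketch in which a fixed number of workers receive column positions over a channel. The `column` type and the `subset` and `subsetColumns` functions are hypothetical stand-ins and are not Gota's actual API; whether this beats the serial loop still depends on how much work each column subset involves.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// column is a hypothetical stand-in for gota's series.Series.
type column []float64

// subset returns the elements of c at the given positions.
func (c column) subset(indexes []int) column {
	out := make(column, len(indexes))
	for i, idx := range indexes {
		out[i] = c[idx]
	}
	return out
}

// subsetColumns subsets every column using a fixed pool of workers
// that receive column positions over an unbuffered channel.
func subsetColumns(cols []column, indexes []int, workers int) []column {
	if workers < 1 {
		workers = 1
	}
	out := make([]column, len(cols))
	jobs := make(chan int)

	var wg sync.WaitGroup
	wg.Add(workers)
	for w := 0; w < workers; w++ {
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each job writes to its own slot, so no lock is needed.
				out[i] = cols[i].subset(indexes)
			}
		}()
	}
	for i := range cols {
		jobs <- i
	}
	close(jobs) // lets the workers' range loops terminate
	wg.Wait()
	return out
}

func main() {
	cols := []column{{1, 2, 3, 4}, {10, 20, 30, 40}, {100, 200, 300, 400}}
	res := subsetColumns(cols, []int{3, 0}, runtime.GOMAXPROCS(0))
	fmt.Println(res)
}
```

Sending one column index per job keeps the sketch simple; with many small columns, sending ranges of columns per job would reduce the channel traffic.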

A similar question on performance - Goroutine overhead and profiling when subsetting DataFrames (Gota) can be found on Stack Overflow: https://stackoverflow.com/questions/41209838/
