ruby - Garbage collector in Ruby 2.2 provokes unexpected CoW

coder 2025-05-31 (original post)

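How do I prevent the GC from provoking copy-on-write when I fork my process? I have recently been analyzing the garbage collector's behavior in Ruby because of memory issues in my program (I run out of memory on my 60-core, 0.5 TB machine even for fairly small tasks). For me this really limits the usefulness of Ruby for running programs on multicore servers. I would like to present my experiments and results here.

The issue appears when the garbage collector runs in a forked process. I investigated three cases that illustrate it.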

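Case 1: We allocate a lot of objects (strings no longer than 20 bytes) in memory using an array. The strings are created from a random number with string formatting. When the process forks and we force the GC to run in the child, all the shared memory goes private, duplicating the initial memory.

Case 2: We allocate a lot of objects (strings) in memory using an array, but the strings are created with rand.to_s, so compared to the previous case we drop the formatting of the data. We end up using less memory, presumably because there is less garbage. When the process forks and we force the GC to run in the child, only part of the memory goes private. The initial memory is duplicated, but to a lesser extent.

Case 3: We allocate fewer objects than before, but the objects are bigger, so the amount of memory allocated stays the same as in the previous cases. When the process forks and we force the GC to run in the child, all the memory stays shared, i.e. there is no memory duplication.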
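Case 2 only guesses that dropping the formatting produces less garbage. One way to check that guess, sketched here under the assumption of Ruby 2.2 or later (where GC.stat accepts the :total_allocated_objects key), is to count allocations for both ways of building a string. The helper name allocations_for is made up for this illustration and is not part of the original script.

```ruby
# Hypothetical helper: count the objects allocated while the block is
# called n times. GC.stat(:total_allocated_objects) is a cumulative
# counter, so a GC run in between does not disturb the difference.
def allocations_for(n)
  before = GC.stat(:total_allocated_objects)
  n.times { yield }
  GC.stat(:total_allocated_objects) - before
end

n = 10_000
formatted = allocations_for(n) { "%20.18f" % rand }  # as in case 1
plain     = allocations_for(n) { rand.to_s }         # as in case 2

puts "formatted: #{formatted} allocations for #{n} strings"
puts "to_s:      #{plain} allocations for #{n} strings"
```

Any allocation beyond one per resulting string is an intermediate object that becomes garbage right away, which is the quantity the comparison between the two cases is about.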

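Below is the Ruby code used for these experiments. To switch between cases you only need to change the "option" value in the memory_object function. The code was tested with Ruby 2.2.2, 2.2.1, 2.1.3, 2.1.5 and 1.9.3 on an Ubuntu 14.04 machine.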
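Instead of editing the default inside memory_object for every run, the case could be taken from the command line, which is what invocations such as "ruby gc-test.rb 0" further down suggest. The code that does this is not shown anywhere in the thread, so the following is only a sketch; parse_option is a hypothetical helper name.

```ruby
# Hypothetical helper: read the case number from the first command-line
# argument and fall back to a default when no argument is given.
def parse_option(argv, default = 1)
  return default if argv.empty?
  Integer(argv.first)  # raises ArgumentError for non-numeric input
end

# Possible use in the main section of the script:
#   big_memory = memory_object( 1000 * 1000 * 10, parse_option(ARGV))
puts parse_option(["2"])
```

Using Integer() rather than to_i makes a typo on the command line fail loudly instead of silently selecting case 0.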

Sample output for case 1:

ruby version 2.2.2 
 proces   pid log                   priv_dirty   shared_dirty 
 Parent  3897 post alloc                   38            0 
 Parent  3897 4 fork                        0           37 
 Child   3937 4 initial                     0           37 
 Child   3937 8 empty GC                   35            5

The same code was written in Python, and in all cases CoW works perfectly fine.

Sample output for case 1:

python version 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] 
 proces   pid log                   priv_dirty shared_dirty 
 Parent  4308 post alloc                35             0 
 Parent  4308 4 fork                     0            35 
 Child   4309 4 initial                  0            35 
 Child   4309 10 empty GC                1            34

Ruby code

$start_time=Time.new

# Monitor the private and shared dirty memory of a process via /proc/<pid>/smaps.
class Memory

    shared_dirty = '.+?Shared_Dirty:\s+(\d+)'
    priv_dirty = '.+?Private_Dirty:\s+(\d+)'
    MEM_REGEXP = /#{shared_dirty}#{priv_dirty}/m

    # get memory usage
    def self.get_memory_map( pids)
        memory_map = {}
        memory_map[ :pids_found] = {}
        memory_map[ :shared_dirty] = 0
        memory_map[ :priv_dirty] = 0

        pids.each do |pid|
            begin
                lines = nil
                lines = File.read( "/proc/#{pid}/smaps")
            rescue
                lines = nil
            end
            if lines
                lines.scan(MEM_REGEXP) do |shared_dirty, priv_dirty|
                    memory_map[ :pids_found][pid] = true
                    memory_map[ :shared_dirty] += shared_dirty.to_i
                    memory_map[ :priv_dirty] += priv_dirty.to_i
                end
            end
        end
        memory_map[ :pids_found] = memory_map[ :pids_found].keys
        return memory_map
    end

    # get the processes and get the value of the memory usage
    def self.memory_usage( )
        pids   = [ $$]
        result = self.get_memory_map( pids)

        result[ :pids]   = pids
        return result
    end

    # print the values of the private and shared memories
    def self.log( process_name='', log_tag="")
        if process_name == "header"
            puts " %-6s %5s %-12s %10s %10s\n" % ["proces", "pid", "log", "priv_dirty", "shared_dirty"]
        else
            time = Time.new - $start_time
            mem = Memory.memory_usage( )
            puts " %-6s %5d %-12s %10d %10d\n" % [process_name, $$, log_tag, mem[:priv_dirty]/1000, mem[:shared_dirty]/1000]
        end
    end
end

# function to delay the processes a bit
def time_step( n)
    while Time.new - $start_time < n
        sleep( 0.01)
    end
end

# Create an array of strings totalling roughly `size` bytes. Change the option argument (0 to 2) to see how the GC behaves in each case.
#
# case 0 : we make a huge array of small objects by formatting a string
# case 1 (the default set below) : we make a huge array of small objects without formatting a string (we use the to_s function)
# case 2 : we make a smaller array of big objects
def memory_object( size, option=1)
    result = []
    count = size/20

    if option > 2 or option < 1
        count.times do
            result << "%20.18f" % rand
        end
    elsif option == 1
        count.times do
            result << rand.to_s
        end
    elsif option == 2
        count = count/10
        count.times do
            result << ("%20.18f" % rand)*30
        end
    end

    return result
end

##### main #####

puts "ruby version #{RUBY_VERSION}"

GC.disable

# print the column headers and first line
Memory.log( "header")

# Allocation of memory
big_memory = memory_object( 1000 * 1000 * 10)

Memory.log( "Parent", "post alloc")

lab_time = Time.new - $start_time
if lab_time < 3.9
    lab_time = 0
end

# start the forking
pid = fork do
    time = 4
    time_step( time + lab_time)
    Memory.log( "Child", "#{time} initial")

    # force GC when nothing happened
    GC.enable; GC.start; GC.disable

    time = 8
    time_step( time + lab_time)
    Memory.log( "Child", "#{time} empty GC")

    sleep( 1)
    STDOUT.flush
    exit!
end

time = 4
time_step( time + lab_time)
Memory.log( "Parent", "#{time} fork")

# wait for the child to finish
Process.wait( pid)
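
The Memory class above pairs each Shared_Dirty field with the Private_Dirty field that follows it using one multi-line regular expression. A line-by-line variant may be easier to follow. This is only a sketch: sum_dirty is a made-up name, and the sample text is an invented excerpt in the layout /proc/<pid>/smaps uses, not real measurement data.

```ruby
# Sum the Shared_Dirty and Private_Dirty fields (reported in kB) over all
# mappings found in the text of a /proc/<pid>/smaps file.
def sum_dirty(smaps_text)
  totals = { shared_dirty: 0, priv_dirty: 0 }
  smaps_text.each_line do |line|
    case line
    when /\AShared_Dirty:\s+(\d+) kB/  then totals[:shared_dirty] += $1.to_i
    when /\APrivate_Dirty:\s+(\d+) kB/ then totals[:priv_dirty]   += $1.to_i
    end
  end
  totals
end

# Invented excerpt with two mappings, for illustration only.
sample = <<SMAPS
00400000-00401000 r-xp 00000000 08:01 123 /usr/bin/ruby
Shared_Dirty:          4 kB
Private_Dirty:         8 kB
7f0000000000-7f0000100000 rw-p 00000000 00:00 0
Shared_Dirty:         16 kB
Private_Dirty:        32 kB
SMAPS

p sum_dirty(sample)

# On Linux the same function can read the current process:
#   p sum_dirty(File.read("/proc/self/smaps"))
```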

Python code

import re
import time
import os
import random
import sys
import gc

start_time=time.time()

# Monitor the private and shared dirty memory of a process via /proc/<pid>/smaps.
class Memory:   

    def __init__(self):
        self.shared_dirty = '.+?Shared_Dirty:\s+(\d+)'
        self.priv_dirty = '.+?Private_Dirty:\s+(\d+)'
        self.MEM_REGEXP = re.compile("{shared_dirty}{priv_dirty}".format(shared_dirty=self.shared_dirty, priv_dirty=self.priv_dirty), re.DOTALL)

    # get memory usage
    def get_memory_map(self, pids):
        memory_map = {}
        memory_map[ "pids_found" ] = {}
        memory_map[ "shared_dirty" ] = 0
        memory_map[ "priv_dirty" ] = 0

        for pid in pids:
            try:
                lines = None

                with open( "/proc/{pid}/smaps".format(pid=pid), "r" ) as infile:
                    lines = infile.read()
            except:
                lines = None

            if lines:
                for shared_dirty, priv_dirty in re.findall( self.MEM_REGEXP, lines ):
                    memory_map[ "pids_found" ][pid] = True
                    memory_map[ "shared_dirty" ] += int( shared_dirty )
                    memory_map[ "priv_dirty" ] += int( priv_dirty )     

        memory_map[ "pids_found" ] = memory_map[ "pids_found" ].keys()
        return memory_map

    # get the processes and get the value of the memory usage   
    def memory_usage( self):
        pids   = [ os.getpid() ]
        result = self.get_memory_map( pids)

        result[ "pids" ]   = pids

        return result

    # print the values of the private and shared memories
    def log( self, process_name='', log_tag=""):
        if process_name == "header":
            print " %-6s %5s %-12s %10s %10s" % ("proces", "pid", "log", "priv_dirty", "shared_dirty")
        else:
            global start_time
            Time = time.time() - start_time
            mem = self.memory_usage( )
            print " %-6s %5d %-12s %10d %10d" % (process_name, os.getpid(), log_tag, mem["priv_dirty"]/1000, mem["shared_dirty"]/1000)

# function to delay the processes a bit
def time_step( n):
    global start_time
    while (time.time() - start_time) < n:
        time.sleep( 0.01)

# Create a list of strings totalling roughly `size` bytes. Change the option argument (0 to 2) to see how the GC behaves in each case.
#
# case 0 : we make a huge list of small objects by formatting a string
# case 1 : we make a huge list of small objects without formatting a string (we use the str function)
# case 2 (the default set below) : we make a smaller list of big objects
def memory_object( size, option=2):
    count = size/20

    if option > 2 or option < 1:
        result = [ "%20.18f"% random.random() for i in xrange(count) ]

    elif option == 1:
        result = [ str( random.random() ) for i in xrange(count) ]

    elif option == 2:
        count = count/10
        result = [ ("%20.18f"% random.random())*30 for i in xrange(count) ]

    return result

##### main #####

print "python version {version}".format(version=sys.version)

memory = Memory()

gc.disable()

# print the column headers and first line
memory.log( "header")   # Print the headers of the columns

# Allocation of memory
big_memory = memory_object( 1000 * 1000 * 10)   # Allocate memory

memory.log( "Parent", "post alloc")

lab_time = time.time() - start_time
if lab_time < 3.9:
    lab_time = 0

# start the forking
pid = os.fork()     # fork the process
if pid == 0:
    Time = 4
    time_step( Time + lab_time)
    memory.log( "Child", "{time} initial".format(time=Time))

    # force GC when nothing happened
    gc.enable(); gc.collect(); gc.disable();

    Time = 10
    time_step( Time + lab_time)
    memory.log( "Child", "{time} empty GC".format(time=Time))

    time.sleep( 1)

    sys.exit(0)

Time = 4
time_step( Time + lab_time)
memory.log( "Parent", "{time} fork".format(time=Time))

# Wait for child process to finish
os.waitpid( pid, 0)

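Edit

Indeed, calling the GC several times before forking the process solves the issue, and I am quite surprised. I have also run the code using Ruby 2.0.0 and the issue doesn't even appear, so it must be related to the generational GC, as you mentioned. However, if I call the memory_object function without assigning the output to any variable (I am only creating garbage), then the memory is duplicated. The amount of memory copied depends on the amount of garbage I create: the more garbage, the more memory becomes private.

Is there any way to prevent this?

Here are some results

Running the GC in 2.0.0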

ruby version 2.0.0
 proces   pid log          priv_dirty shared_dirty
 Parent  3664 post alloc           67          0
 Parent  3664 4 fork                1         69
 Child   3700 4 initial             1         69
 Child   3700 8 empty GC            6         65

Calling memory_object( 1000*1000) in the child

ruby version 2.0.0
 proces   pid log          priv_dirty shared_dirty
 Parent  3703 post alloc           67          0
 Parent  3703 4 fork                1         70
 Child   3739 4 initial             1         70
 Child   3739 8 empty GC           15         56

Calling memory_object( 1000*1000*10)

ruby version 2.0.0
 proces   pid log          priv_dirty shared_dirty
 Parent  3743 post alloc           67          0
 Parent  3743 4 fork                1         69
 Child   3779 4 initial             1         69
 Child   3779 8 empty GC           89          5

Best answer

UPD2

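It suddenly dawned on me why all the memory goes private when the strings are formatted: formatting generates garbage while the GC is disabled, then the GC is enabled and run, and you are left with holes of released objects in the generated data. Then you fork, and new garbage starts to occupy these holes. The more garbage, the more private pages.

So I added a cleanup function to run the GC every 2000 cycles (just enabling lazy GC didn't help):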

count.times do |i|
  cleanup(i)
  result << "%20.18f" % rand
end

#......snip........#

def cleanup(i)
  if (i % 2000).zero?
    GC.enable; GC.start; GC.disable
  end
end

##### main #####

Which results in (with memory_object( 1000 * 1000 * 10) generated after the fork):

$ RUBY_GC_HEAP_INIT_SLOTS=600000 ruby gc-test.rb 0
ruby version 2.2.0
 proces   pid log          priv_dirty shared_dirty
 Parent  2501 post alloc           35          0
 Parent  2501 4 fork                0         35
 Child   2503 4 initial             0         35
 Child   2503 8 empty GC           28         22

Yes, it affects performance, but only before forking, i.e. it increases load time in your case.
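
To judge how many holes are left before forking, the heap's slot counters can be inspected. The sketch below assumes Ruby 2.2 or later, where GC.stat exposes :heap_live_slots and :heap_free_slots; heap_slot_report is a hypothetical helper, and the allocation sizes are arbitrary.

```ruby
# Hypothetical helper: report how many object slots are in use and how
# many are free. Free slots in pages shared with the parent are where
# objects newly allocated in a forked child may end up.
def heap_slot_report
  stat = GC.stat
  { live: stat[:heap_live_slots], free: stat[:heap_free_slots] }
end

data = Array.new(100_000) { rand.to_s }  # objects that stay referenced
100_000.times { "%20.18f" % rand }       # garbage only
GC.start                                 # sweep the garbage, leaving free slots

report = heap_slot_report
puts "live slots: #{report[:live]}, free slots: #{report[:free]}"
puts "retained strings: #{data.size}"
```

A large free count right before the fork hints that the child has plenty of room to write into shared pages. The numbers depend on the Ruby version and heap settings.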

UPD1

Just found the criteria by which Ruby 2.2 sets the old-object bit: it is 3 GCs, so if you add the following before forking:

GC.enable; 3.times {GC.start}; GC.disable
# start the forking

you get (the option on the command line is 1):

$ RUBY_GC_HEAP_INIT_SLOTS=600000 ruby gc-test.rb 1
ruby version 2.2.0
 proces   pid log          priv_dirty shared_dirty
 Parent  2368 post alloc           31          0
 Parent  2368 4 fork                1         34
 Child   2370 4 initial             1         34
 Child   2370 8 empty GC            2         32

But this needs further testing of how such objects behave in future GCs. At least after 100 GCs :old_objects stays constant, so I suppose it should be OK.

The log with GC.stat is here
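
The promotion can also be watched directly. The following sketch assumes Ruby 2.2 or later, where GC.stat accepts the :old_objects key; the helper name old_objects_after_full_gcs and the array size are made up for this illustration.

```ruby
# Hypothetical helper: run the given number of full GCs and return the
# old-object count before and after.
def old_objects_after_full_gcs(runs)
  before = GC.stat(:old_objects)
  runs.times { GC.start }
  [before, GC.stat(:old_objects)]
end

# Objects that stay referenced, so they survive every GC run below.
retained = Array.new(50_000) { rand.to_s }

before, after = old_objects_after_full_gcs(3)
puts "old objects before: #{before}, after 3 GC runs: #{after}"
puts "retained strings still referenced: #{retained.size}"
```

If the count grows by roughly the number of retained objects, they have been promoted, and later GC runs in a child have less reason to touch them.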

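By the way, there is also an option RGENGC_OLD_NEWOBJ_CHECK to create old objects from the start. I doubt it's a good idea, but it may be useful for some particular case.

First answer

My proposition in the comment above was wrong; actually, bitmap tables are the saviour.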

(option = 1)

ruby version 2.0.0
 proces   pid log          priv_dirty shared_dirty
 Parent 14807 post alloc           27          0
 Parent 14807 4 fork                0         27
 Child  14809 4 initial             0         27
 Child  14809 8 empty GC            6         25 # << almost everything stays shared <<

I also tested Ruby Enterprise Edition by hand; it is only half better than the worst case.

ruby version 1.8.7
 proces   pid log          priv_dirty shared_dirty
 Parent 15064 post alloc           86          0
 Parent 15064 4 fork                2         84
 Child  15065 4 initial             2         84
 Child  15065 8 empty GC           40         46

(I made the script run strictly 1 GC by increasing RUBY_GC_HEAP_INIT_SLOTS to 600k)
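
Whether a heap setting really limits the number of GC runs can be checked with GC.count, which counts every run since the process started. This is a minimal sketch: gc_runs_during is a hypothetical helper and the allocation size is arbitrary.

```ruby
# Hypothetical helper: count how many GC runs (minor or major) happen
# while the block executes.
def gc_runs_during
  before = GC.count
  yield
  GC.count - before
end

runs = gc_runs_during do
  Array.new(200_000) { rand.to_s }
end
puts "GC ran #{runs} time(s) during allocation"
```

Running this with different values of RUBY_GC_HEAP_INIT_SLOTS in the environment shows how the initial heap size changes the count.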

Regarding ruby - Garbage collector in Ruby 2.2 provokes unexpected CoW, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/30353272/
