python - How do I parse a .txt file into .xml?

coder 2024-07-01 Original post

Here is my txt file:

In File Name:   C:\Users\naqushab\desktop\files\File 1.m1
Out File Name:  C:\Users\naqushab\desktop\files\Output\File 1.m2
In File Size:   Low:    22636   High:   0
Total Process time: 1.859000
Out File Size:  Low:    77619   High:   0

In File Name:   C:\Users\naqushab\desktop\files\File 2.m1
Out File Name:  C:\Users\naqushab\desktop\files\Output\File 2.m2
In File Size:   Low:    20673   High:   0
Total Process time: 3.094000
Out File Size:  Low:    94485   High:   0

In File Name:   C:\Users\naqushab\desktop\files\File 3.m1
Out File Name:  C:\Users\naqushab\desktop\files\Output\File 3.m2
In File Size:   Low:    66859   High:   0
Total Process time: 3.516000
Out File Size:  Low:    217268  High:   0

I am trying to parse it into XML like this:

<?xml version='1.0' encoding='utf-8'?>
<root>
    <filedata>
        <InFileName>File 1.m1</InFileName>
        <OutFileName>File 1.m2</OutFileName>
        <InFileSize>22636</InFileSize>
        <OutFileSize>77619</OutFileSize>
        <ProcessTime>1.859000</ProcessTime>
    </filedata>
    <filedata>
        <InFileName>File 2.m1</InFileName>
        <OutFileName>File 2.m2</OutFileName>
        <InFileSize>20673</InFileSize>
        <OutFileSize>94485</OutFileSize>
        <ProcessTime>3.094000</ProcessTime>
    </filedata>
    <filedata>
        <InFileName>File 3.m1</InFileName>
        <OutFileName>File 3.m2</OutFileName>
        <InFileSize>66859</InFileSize>
        <OutFileSize>217268</OutFileSize>
        <ProcessTime>3.516000</ProcessTime>
    </filedata>
</root>

Here is the code I tried (I am using Python 2):

import re
import xml.etree.ElementTree as ET

rex = re.compile(r'''(?P<title>In File Name:
                       |Out File Name:
                       |In File Size:   Low:
                       |Total Process time:
                       |Out File Size:  Low:
                     )
                     (?P<value>.*)
                     ''', re.VERBOSE)

root = ET.Element('root')
root.text = '\n'    # newline before the celldata element

with open('Performance.txt') as f:
    celldata = ET.SubElement(root, 'filedata')
    celldata.text = '\n'    # newline before the collected element
    celldata.tail = '\n\n'  # empty line after the celldata element
    for line in f:
        # Empty line starts new celldata element (hack style, ugly)
        if line.isspace():
            celldata = ET.SubElement(root, 'filedata')
            celldata.text = '\n'
            celldata.tail = '\n\n'

        # If the line contains the wanted data, process it.
        m = rex.search(line)
        if m:
            # Fix some problems with the title as it will be used
            # as the tag name.
            title = m.group('title')
            title = title.replace('&', '')
            title = title.replace(' ', '')

            e = ET.SubElement(celldata, title.lower())
            e.text = m.group('value')
            e.tail = '\n'

# Display for debugging
ET.dump(root)

# Include the root element to the tree and write the tree
# to the file.
tree = ET.ElementTree(root)
tree.write('Performance.xml', encoding='utf-8', xml_declaration=True)

But all I get are empty values. Is it possible to parse this txt file into XML?

Best answer

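A correction to your regular expression: it should be

m = re.search('(?P<title>(In File Name)|(Out File Name)|(In File Size: *Low)|(Total Process time)|(Out File Size: *Low)):(?P<value>.*)',line)

rather than the one you gave. Your pattern is compiled with re.VERBOSE, which ignores unescaped whitespace inside the pattern, so In File Name: is compiled as InFileName: and never matches any line of the file. That is why the filedata elements come out empty. The alternation itself is not the problem: | separates the whole phrases In File Name and Out File Name, not single characters.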
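One thing worth checking in the question's pattern is the re.VERBOSE flag, which drops unescaped whitespace from the pattern. The following is a minimal sketch of that effect, written for Python 3 (an assumption; the question uses Python 2). The names verbose_rex and plain_rex and the shortened alternation are illustrative only.

```python
import re

line = r"In File Name:   C:\Users\naqushab\desktop\files\File 1.m1"

# With re.VERBOSE, unescaped whitespace in the pattern is ignored,
# so this pattern is compiled as "InFileName:" and cannot match the line.
verbose_rex = re.compile(r"(?P<title>In File Name:)(?P<value>.*)", re.VERBOSE)

# Without re.VERBOSE, the spaces in the pattern are matched literally.
plain_rex = re.compile(r"(?P<title>In File Name|Out File Name):(?P<value>.*)")

verbose_match = verbose_rex.search(line)
plain_match = plain_rex.search(line)

print(verbose_match)                       # None
print(plain_match.group("title"))          # In File Name
print(plain_match.group("value").strip())  # the full path, without padding
```

If the verbose layout is preferred, the spaces can be kept by escaping them (for example `In\ File\ Name:`) or by writing them as `[ ]`.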

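A suggestion:

You can do this without a regular expression at all. xml.dom.minidom can be used to pretty-print your XML string.

I have added inline comments to make the code easier to follow.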

Node.toprettyxml([indent=""[, newl=""[, encoding=""]]])

Return a pretty-printed version of the document. indent specifies the indentation string and defaults to a tabulator; newl specifies the string emitted at the end of each line and defaults to
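As a small illustration of toprettyxml, here is a sketch for Python 3 (an assumption; the thread targets Python 2). The element names are taken from the question, and the four-space indent is just one possible choice.

```python
import xml.etree.ElementTree as ET
import xml.dom.minidom as minidom

# Build a tiny tree with ElementTree.
root = ET.Element("root")
filedata = ET.SubElement(root, "filedata")
ET.SubElement(filedata, "InFileName").text = "File 1.m1"

# ET.tostring returns bytes in Python 3; minidom.parseString accepts them.
raw = ET.tostring(root)

# Without an encoding argument, toprettyxml returns a str.
pretty = minidom.parseString(raw).toprettyxml(indent="    ")
print(pretty)
```

Passing encoding='utf-8' makes toprettyxml return bytes instead of str in Python 3, in which case the output file has to be opened in binary mode.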

Edit

import itertools as it
[line[0] for line in it.groupby(lines)]
You can use groupby from the itertools package to collapse runs of consecutive duplicate entries in the list lines into a single entry.
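A short sketch of what groupby does to such a list, with made-up sample data (the list contents and the name collapsed are illustrative):

```python
import itertools as it

lines = ["a", "", "", "b", "b", "", "a"]

# groupby yields one (key, group) pair per run of equal consecutive items;
# keeping only the key collapses each run to a single entry.
collapsed = [key for key, _group in it.groupby(lines)]
print(collapsed)  # ['a', '', 'b', '', 'a']
```

Only consecutive duplicates are merged, so the second "a" survives. For the log file this means several blank lines in a row count as one separator, which avoids creating empty filedata elements between records.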

So:

import xml.etree.ElementTree as ET
root = ET.Element('root')

with open('file1.txt') as f:
    lines = f.read().splitlines()

#add first subelement
celldata = ET.SubElement(root, 'filedata')

import itertools as it
#for every line in input file
#group consecutive dedup to one 
for line in it.groupby(lines):
    line=line[0]
    #if its a break of subelements  - that is an empty space
    if not line:
        #add the next subelement and get it as celldata
        celldata = ET.SubElement(root, 'filedata')
    else:
        #otherwise, split with : to get the tag name
        tag = line.split(":")
        #format tag name
        el=ET.SubElement(celldata,tag[0].replace(" ",""))
        tag=' '.join(tag[1:]).strip()
        
        #get file name from file path
        if 'File Name' in line:
            tag = line.split("\\")[-1].strip()
        elif 'File Size' in line:
            splist =  filter(None,line.split(" "))
            tag = splist[splist.index('Low:')+1]
            #splist[splist.index('High:')+1]
        el.text = tag

#prettify xml
import xml.dom.minidom as minidom
formatedXML = minidom.parseString(
                          ET.tostring(
                                      root)).toprettyxml(indent=" ",encoding='utf-8').strip()
# Display for debugging
print formatedXML

#write the formatedXML to file.
with open("Performance.xml","w+") as f:
    f.write(formatedXML)

Output: Performance.xml

<?xml version="1.0" encoding="utf-8"?>
<root>
 <filedata>
  <InFileName>File 1.m1</InFileName>
  <OutFileName>File 1.m2</OutFileName>
  <InFileSize>22636</InFileSize>
  <TotalProcesstime>1.859000</TotalProcesstime>
  <OutFileSize>77619</OutFileSize>
 </filedata>
 <filedata>
  <InFileName>File 2.m1</InFileName>
  <OutFileName>File 2.m2</OutFileName>
  <InFileSize>20673</InFileSize>
  <TotalProcesstime>3.094000</TotalProcesstime>
  <OutFileSize>94485</OutFileSize>
 </filedata>
 <filedata>
  <InFileName>File 3.m1</InFileName>
  <OutFileName>File 3.m2</OutFileName>
  <InFileSize>66859</InFileSize>
  <TotalProcesstime>3.516000</TotalProcesstime>
  <OutFileSize>217268</OutFileSize>
 </filedata>
</root>
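The output above keeps the tag TotalProcesstime and the line order of the input, whereas the question asks for ProcessTime placed after OutFileSize. The following is a sketch of one way to get that layout, written for Python 3 (an assumption; the thread uses Python 2). The helper names parse_block and build_xml, the ORDER tuple, and the inline SAMPLE text are all hypothetical and exist only for this example.

```python
import re
import xml.etree.ElementTree as ET
import xml.dom.minidom as minidom

# Tag order taken from the XML layout requested in the question.
ORDER = ("InFileName", "OutFileName", "InFileSize", "OutFileSize", "ProcessTime")

SAMPLE = r"""In File Name:   C:\Users\naqushab\desktop\files\File 1.m1
Out File Name:  C:\Users\naqushab\desktop\files\Output\File 1.m2
In File Size:   Low:    22636   High:   0
Total Process time: 1.859000
Out File Size:  Low:    77619   High:   0

In File Name:   C:\Users\naqushab\desktop\files\File 2.m1
Out File Name:  C:\Users\naqushab\desktop\files\Output\File 2.m2
In File Size:   Low:    20673   High:   0
Total Process time: 3.094000
Out File Size:  Low:    94485   High:   0
"""


def parse_block(block):
    """Turn one blank-line-separated block into a {tag: text} dict."""
    record = {}
    for line in block.splitlines():
        label, _, rest = line.partition(":")
        tag = label.replace(" ", "")
        if "File Name" in label:
            # Keep only the file name, dropping the Windows directory part.
            record[tag] = rest.rsplit("\\", 1)[-1].strip()
        elif "File Size" in label:
            # rest looks like "Low: 22636 High: 0"; take the value after "Low:".
            parts = rest.split()
            record[tag] = parts[parts.index("Low:") + 1]
        elif label.strip() == "Total Process time":
            record["ProcessTime"] = rest.strip()
    return record


def build_xml(text):
    """Build the pretty-printed XML string for the whole log text."""
    root = ET.Element("root")
    # Split on blank lines; empty chunks are skipped so that repeated or
    # trailing blank lines do not create empty <filedata> elements.
    for block in re.split(r"\n\s*\n", text.strip()):
        record = parse_block(block)
        if not record:
            continue
        filedata = ET.SubElement(root, "filedata")
        for tag in ORDER:
            if tag in record:
                ET.SubElement(filedata, tag).text = record[tag]
    return minidom.parseString(ET.tostring(root)).toprettyxml(indent="    ")


xml_text = build_xml(SAMPLE)
print(xml_text)
```

To work on the real file, the text could be read with open('Performance.txt') and the result written with open('Performance.xml', 'w', encoding='utf-8'). Collecting each record into a dict first is what makes the reordering possible; writing elements line by line, as in the answer's code, preserves the input order instead.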

Hope this helps!

For python - How do I parse a .txt file into .xml?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42835956/

