How to extract certain text from a file using Python?

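I want to extract certain parts of a text file. My input file:
- num cell port function safe [ccell desval rslt]
" 17 (BC_1, CLK, input, X)," &
" 16 (BC_1, OC_NEG, input, X)," & - Merged input/
" 8 (BC_1, D(8), input, X)," & - cell 16 @ 1 -> Hi-Z
" 7 (BC_1, Q(1), output3, X, 16, 1, Z)," &
" 0 (BC_1, Q(8), output3, X, 16, 1, Z)";
I need output like this:
num cell port function safe ccell
17 BC_1 CLK input X
16 BC_1 OC_NEG input X
16 BC_1 * control 1
8 BC_1 D8 input X
7 BC_1 Q1 output3 X 16 1
0 BC_1 Q8 output3 X 16 1
So far I have tried the code below, but it gives an index error. Please advise.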

import re
lines=open("input.txt",'r').readlines()

for line in lines:
    a=re.findall(r'\w+',line)    # all word tokens on this line
    print re.findall(r'\w+',line)
    # assumes every line has at least 7 tokens
    print a[0],a[1],a[2],a[3],a[4],a[5],a[6]

I am using Python 2.6.6 on Win 7, and the error is as follows:
['num', 'cell', 'port', 'function', 'safe', 'ccell', 'disval', 'desval', 'rslt']
num cell port function safe ccell disval
['17', 'BC_1', 'CLK', 'input', 'X']
17 BC_1 CLK input X
Traceback (most recent call last):
File "c:\users\ctee1\desktop\pyparsing\outputparser.py", line 39
print a[0],a[1],a[2],a[3],a[4],a[5],a[6]
IndexError: list index out of range
Thanks, Maximus

Attached files

input.txt (863 bytes, 683 views)

output.txt (528 bytes, 599 views)

# Answer 1

I believe this is because there are only 5 elements in "17 BC_1 CLK input X", whereas you are trying to print 7 (a[0] to a[6]).
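
To see the mismatch directly, here is a minimal sketch that counts the tokens `re.findall` returns for one data line. The sample line is copied from the question and its exact spacing is an assumption; the variable names are only illustrative. The `print(...)` calls use the single-argument form, so the snippet should behave the same on Python 2.6 and Python 3.

```python
import re

# Sample data line from the question (spacing assumed to match the real file).
line = '   " 17 (BC_1, CLK, input, X)," &'

tokens = re.findall(r'\w+', line)
print(len(tokens))          # number of word tokens found on this line

# Indexing past the end raises IndexError; a slice never does.
first_seven = tokens[:7]    # at most 7 items, fewer if the line is short
print(' '.join(first_seven))

try:
    tokens[6]
    index_error_raised = False
except IndexError:
    index_error_raised = True
print(index_error_raised)
```

Slicing (`tokens[:7]`) is one way to ask for "up to seven" items without having to know how many a given line contains.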

# Answer 2

Thanks, I will look into it again.

# Answer 3

A simple way to do this more safely is:

import re
lines=open("input.txt",'r').readlines()

for line in lines:
    a=re.findall(r'\w+',line)
    print re.findall(r'\w+',line)
    #print a[0],a[1],a[2],a[3],a[4],a[5],a[6]
    for b in a:
        print b,    # trailing comma: stay on the same output line
    print           # finish the output line

# Answer 4

Hi, thanks for the feedback.
I wanted to check whether the last print line is a typo? And is there another print inside the for loop?
I am thinking of doing:

for line in lines:
    a=line.split('-')[0]
    print a
    for b in a:
        print b,
    print
 

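# Answer 5

"print b," prints b without moving to a new line. It is equivalent to print a[0],a[1],a[2],... for as many items as there are.
The final "print" starts a new line.
I was assuming that your regular expression was working, but maybe it is not. I cannot imagine your split expression working either.
Perhaps you could explain the logic of what you are trying to achieve. Given that input, there are many ways to get that output. But what is the more general rule? For example, is the first line always in this format? Do all lines come in one of two formats? If you can give more information about what you are trying to achieve, it will be easier to help.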
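
As an illustration of the doubt about the split expression: `line.split('-')[0]` returns a single string, not a list of fields, so looping over it yields one character at a time. A minimal sketch, using a sample line from the question (its exact spacing is an assumption):

```python
line = '   " 16 (BC_1, OC_NEG, input, X)," & - Merged input/'

head = line.split('-')[0]      # text before the first '-' (still one string)
chars = [c for c in head]      # what `for b in a` would iterate over
fields = head.split()          # whitespace-separated pieces, for comparison

print(type(head).__name__)
print(chars[:5])
print(fields)
```

So `split('-')` is useful for cutting off the trailing comment, but a second step is still needed to break the remaining text into fields.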

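# Answer 6

Apologies for the confusion.
General rules:
1) The first line inside input.txt (as mentioned in the first post):
- num cell port function safe [ccell desval rslt]
I only need num cell port function safe ccell
However, my script below cannot get this, so I skipped the line. It would be great if you could show me how.
2) I am trying to convert lines in the input such as
" 17 (BC_1, CLK, input, X)," &
into 17 BC_1 CLK input X
Basically, I am just extracting the columns for num, cell, port, function, safe and ccell. The rest is not needed.
and " 7 (BC_1, Q(1), output3, X, 16, 1, Z)," &
into 7 BC_1 Q1 output3 X 16 1
So far, my script produces output.txt (as attached in the first post):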

import re

fileIn = open("input.txt", "rb")
fileOut = open("output.txt", "w")

for strData in fileIn:
    # Keep only the text before the first '-'; this drops trailing comments
    # and leaves the header line (which starts with '-') empty
    strData = strData.split('-')[0]

    if("input" in strData):
        a=re.split("\W+", strData)
        #print a
        #fileOut.write (' '.join(a[1:7]) )
        fileOut.write(a[1]+' '+a[2]+' '+a[3]+' '+a[4]+' '+a[5]+' '+a[6]+'\n')

    if("output" in strData):
        a=re.split("\W+", strData)
        #print a
        fileOut.write(a[1]+' '+a[2]+' '+a[3]+' '+a[4]+' '+a[5]+' '+a[6]+' '+a[7]+'\n')

fileIn.close()
fileOut.close()
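
The two rules described in this post can be sketched in one function. This is only a sketch under assumptions: it assumes the header is the line starting with `-`, that the bracketed part should be cut after its first word, and that data lines look like the samples in the first post. `convert_line` is a hypothetical helper name, not something from the thread. Parentheses in port names are dropped (`Q(1)` becomes `Q1`) and at most six fields are kept after the number, to mirror the requested output.

```python
import re

def convert_line(line):
    """Return the cleaned-up text for one input line, or None to skip it."""
    stripped = line.strip()
    if stripped.startswith('-'):
        # Header: keep the words, but only the first one inside [...]
        words = re.findall(r'\w+', stripped)
        bracket = re.search(r'\[\s*(\w+)', stripped)
        if bracket:
            words = words[:words.index(bracket.group(1)) + 1]
        return ' '.join(words)
    # Data line: a number, then a parenthesised, comma-separated field list.
    m = re.match(r'\s*"\s*(\d+)\s*\((.*)\)', line)
    if not m:
        return None
    fields = [f.strip() for f in m.group(2).split(',')]
    fields = [re.sub(r'[()]', '', f) for f in fields]   # Q(1) -> Q1
    fields = fields[:6]      # assumption: anything after six fields is unwanted
    return ' '.join([m.group(1)] + fields)

sample = [
    '- num cell port function safe [ccell desval rslt]',
    '   " 17 (BC_1, CLK, input, X)," &',
    '   " 16 (BC_1, *, control, 1)," &',
    '   " 7 (BC_1, Q(1), output3, X, 16, 1, Z)," &',
    '   " 0 (BC_1, Q(8), output3, X, 16, 1, Z)";',
]
results = [convert_line(s) for s in sample]
for r in results:
    print(r)
```

Handling the header and the data lines in separate branches avoids the fixed `a[1]` to `a[7]` indexing, so lines with four fields and lines with seven fields go through the same code.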
 
 

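# Answer 7

Regular expressions are what you need. They take some getting used to, but once you get the hang of them they work brilliantly.
The lines in your input.txt file have different numbers of fields, so the following works with the first format you posted but not with the second: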

import re

lines=open("input.txt","r")

# Five groups: the number, then four comma-separated fields inside (...)
p=re.compile('   " *(.*) \((.*), (.*), (.*), (.*)\)," &.*')

for line in lines:
    m=p.match(line)
    if not m: continue    # skip lines that do not fit the pattern
    for i in range(1,6):
        print m.group(i),
    print

which gives:

17 BC_1 CLK input X
15 BC_1 D(1) input X
14 BC_1 D(2) input X
13 BC_1 D(3) input X
12 BC_1 D(4) input X
11 BC_1 D(5) input X
10 BC_1 D(6) input X
9 BC_1 D(7) input X
8 BC_1 D(8) input X
7 BC_1, Q(1), output3, X 16 1 Z
6 BC_1, Q(2), output3, X 16 1 Z
5 BC_1, Q(3), output3, X 16 1 Z
4 BC_1, Q(4), output3, X 16 1 Z
3 BC_1, Q(5), output3, X 16 1 Z
2 BC_1, Q(6), output3, X 16 1 Z
1 BC_1, Q(7), output3, X 16 1 Z
 

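# Answer 8

Wow, only a few lines of code. I am not familiar with regular expressions, and they are hard.
What I don't understand:
1) p = re.compile('   " *(.*) \((.*), (.*), (.*), (.*)\)," &.*')
2) m = p.match(line): what does matching the line mean?
3) m.group(i): what is the group?
I tried printing it, but it only prints an address.
Thanks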

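# Answer 9

You can look here for more details on regular expressions.
I don't understand your statement "I tried printing it, but it only prints an address". What do you mean by that? The code should work and create the output I gave.
re.compile builds a pattern that you can match text against. In this case, it looks for the following:
'   "' is the start of the string
' *' is any number of spaces (greater than or equal to 0)
'(.*)' is a string of any characters (.) of any length (*). The parentheses say that this is one of the groups you want to find (so m.group(1) will return the string inside it)
' \(' looks for the string " (". You need the escape character ('\') because ( is one of the special characters (see above)
'(.*), ' finds the next string of characters (for group 2), followed by a comma (,) and a space ( ).
And so on. You probably get the idea by now.
m is then a match object, the result of matching the pattern (p) against line (which is a line of input.txt).
m.group(i) then refers to the groups you marked for selection by putting parentheses () around them.
Note that regular expressions are "greedy", in that they find the largest string that fits the pattern (starting from the left). So for the last 7 lines of input.txt, group(2) is the string "BC_1, Q(1), output3, X", which I don't think is what you want.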
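
To make the greedy behaviour concrete, the sketch below runs the pattern from this thread on one of the longer lines and then tries an alternative in which each field is written as `[^,]*`, so that one group can never span a comma. The second pattern and the names `strict` and `fields` are illustrative assumptions, not part of the original answer, and the sample line's spacing is assumed to match the real file.

```python
import re

line = '   " 7 (BC_1, Q(1), output3, X, 16, 1, Z)," &'

# Greedy version from the earlier post: each (.*) grabs as much as it can.
greedy = re.compile(r'   " *(.*) \((.*), (.*), (.*), (.*)\)," &.*')
greedy_groups = greedy.match(line).groups()
print(greedy_groups)        # one group ends up holding several fields

# Alternative: [^,]* cannot cross a comma, so each piece is exactly one field.
# The (?:, [^,]*)* part allows any number of further fields (an assumption).
strict = re.compile(r'\s*" *(\d+) \(([^,]*)((?:, [^,]*)*)\)," &')
m = strict.match(line)
fields = [m.group(2)] + m.group(3).split(', ')[1:]
print(m.group(1))
print(fields)
```

The same `strict` pattern also matches the shorter four-field lines, since the repeated part simply matches fewer times.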

# Answer 10

Thanks for the guidance. RE is one of the hardest topics in Python to understand.
Sorry for the confusion. I tried to understand your code by printing. For example:

p=re.compile('   " *(.*) \((.*), (.*), (.*), (.*)\)," &.*')
print p

printed:
and

    m=p.match(line)
    print m

printed:
None
17 BC_1 CLK input X
None
None
15 BC_1 D(1) input X
14 BC_1 D(2) input X
13 BC_1 D(3) input X
12 BC_1 D(4) input X
11 BC_1 D(5) input X
10 BC_1 D(6) input X
9 BC_1 D(7) input X
8 BC_1 D(8) input X
7 BC_1, Q(1), output3, X 16 1 Z
6 BC_1, Q(2), output3, X 16 1 Z
5 BC_1, Q(3), output3, X 16 1 Z
4 BC_1, Q(4), output3, X 16 1 Z
3 BC_1, Q(5), output3, X 16 1 Z
2 BC_1, Q(6), output3, X 16 1 Z
1 BC_1, Q(7), output3, X 16 1 Z
None

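# Answer 11

Yes, I'm afraid re objects don't print nicely. You need to use their attributes or methods. Debugging and understanding them can be a bit frustrating. Try reading the documentation link in my previous post. Good luck!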
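
As a small sketch of what "use their attributes or methods" can look like in practice: `pattern`, `group()` and `groups()` are standard parts of Python's `re` module, while the sample line and variable names here are only illustrative.

```python
import re

p = re.compile(r'   " *(.*) \((.*), (.*), (.*), (.*)\)," &.*')
line = '   " 17 (BC_1, CLK, input, X)," &'

print(p.pattern)            # the pattern text, instead of the object's address

m = p.match(line)
if m is None:
    print('no match')       # match() returns None when the line does not fit
else:
    print(m.group(0))       # the whole matched text
    print(m.groups())       # all captured groups as a tuple
    print(m.group(3))       # a single group, numbered from 1

# A line that does not fit the pattern gives None rather than a match object.
no_match = p.match('- num cell port function safe')
print(no_match)
```

Checking for `None` before calling `group()` is what the `if not m: continue` line in the earlier code does.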

# Answer 12

Alternatively, you can do it without re:

lines=open("input.txt","r")

for line in lines:
    l1=line.replace(" ","").replace('"','').split(",")  #Remove the spaces and quotes from the line and separate on ,
    if len(l1)<2: continue   #to avoid lines that don't fit the general pattern
    l2=l1[0].split("(")      #number and first field, e.g. ['17', 'BC_1']
    l3=[l1[-2].replace(")","")]   #second-to-last item without its closing bracket
    l4=l2+l1[1:-2]+l3
    print l4

which gives:

>>>
['17', 'BC_1', 'CLK', 'input', 'X']
['16', 'BC_1', 'OC_NEG', 'input', 'X']
['16', 'BC_1', '*', 'control', '1']
['15', 'BC_1', 'D(1)', 'input', 'X']
['14', 'BC_1', 'D(2)', 'input', 'X']
['13', 'BC_1', 'D(3)', 'input', 'X']
['12', 'BC_1', 'D(4)', 'input', 'X']
['11', 'BC_1', 'D(5)', 'input', 'X']
['10', 'BC_1', 'D(6)', 'input', 'X']
['9', 'BC_1', 'D(7)', 'input', 'X']
['8', 'BC_1', 'D(8)', 'input', 'X']
['7', 'BC_1', 'Q(1)', 'output3', 'X', '16', '1', 'Z']
['6', 'BC_1', 'Q(2)', 'output3', 'X', '16', '1', 'Z']
['5', 'BC_1', 'Q(3)', 'output3', 'X', '16', '1', 'Z']
['4', 'BC_1', 'Q(4)', 'output3', 'X', '16', '1', 'Z']
['3', 'BC_1', 'Q(5)', 'output3', 'X', '16', '1', 'Z']
['2', 'BC_1', 'Q(6)', 'output3', 'X', '16', '1', 'Z']
['1', 'BC_1', 'Q(7)', 'output3', 'X', '16', '1', 'Z']
['0', 'BC_1', 'Q(8)', 'output3', 'X', '16', '1']
 

Obviously, you can then use the contents of the list however you need.
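
For instance, to go from one of these lists to the space-separated form requested in the first post, a minimal sketch could look like this. It assumes that only the first seven columns are wanted and that the parentheses in port names should be removed; the variable names are illustrative.

```python
# One list as produced by the split-based code in this post.
row = ['7', 'BC_1', 'Q(1)', 'output3', 'X', '16', '1', 'Z']

# Drop the parentheses inside each item, e.g. 'Q(1)' -> 'Q1'.
cleaned = [item.replace('(', '').replace(')', '') for item in row]

# Keep the first seven columns and join them with single spaces.
text = ' '.join(cleaned[:7])
print(text)
```

The same three lines work for the shorter five-item lists, because the slice simply returns everything when there are fewer than seven items.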
