Parsing HTML files into Excel
Hi,
Hope you are all doing well.
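Problem:
I have several HTML files (invoices) in a directory.
I need to read the HTML content (customer name, bill_no, dues, etc.) and then store it in a .csv file...
[I have attached the file Invoice.txt; please open it in IE.]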
I already have a working parser (attached, Parser.sh).
Please help me convert the existing shell parser into a Python parser.
The shell parser code is as follows:
```bash
#!/bin/bash
echo "Script started"

# Write the CSV header (a backslash-newline inside double quotes joins the lines).
echo "\"BILL_NUMBER\",\"ACCOUNT_NUMBER\",\"USERNAME\",\"CUSTOMER_NAME\",\"CONTACT_NO\",\"EMAIL\",\
\"PACKAGE_PLAN\",\"TOTAL_AMOUNT_DUE_TOP\",\"PAYMENT_DUE_DATE\",\"TOTAL_AMOUNT_AFTER_DUE_DATE\",\"BILLING_PERIOD\",\
\"PERVIOUS_BALANCE\",\"PAYMENTS\",\"NET_PREVIOUS_BALANCE\",\"SUBORDINATE_AC_CHARGES\",\"INITIAL_CHARGES\",\
\"MONTHLY_LINE_RENT\",\"ANTIVIRUS_LINE_RENT\",\"PARENTAL_LINE_RENT\",\"EXTRA_USAGE\",\"Extra Usage-2GB_COUNT\",\
\"Extra Usage-2GB_AMOUNT\",\"Extra Usage-5GB_COUNT\",\"Extra Usage-5GB_AMOUNT\",\"SPEED_BOOST_COUNT\",\
\"SPEED_BOOST_AMOUNT\",\"HAPPY_DAYS_1_COUNT\",\"HAPPY_DAYS_1_AMOUNT\",\"HAPPY_DAYS_3_COUNT\",\
\"HAPPY_DAYS_3_AMOUNT\",\"StaticIP_COUNT\",\"StaticIP_AMOUNT\",\"PayAsYouGo_05Day_COUNT\",\"PayAsYouGo_05Day_AMOUNT\",\
\"PayAsYouGo_10Day_COUNT\",\"PayAsYouGo_10Day_AMOUNT\",\"PayAsYouGo_30Day_COUNT\",\"PayAsYouGo_30Day_AMOUNT\",\
\"PayAsYouGo_03Day_COUNT\",\"PayAsYouGo_03Day_AMOUNT\",\"PayAsYouGo_07Day_COUNT\",\"PayAsYouGo_07Day_AMOUNT\",\
\"PayAsYouGo_15Day_COUNT\",\"PayAsYouGo_15Day_AMOUNT\",\"Power Hours-30Days_COUNT\",\"Power Hours-30Day_AMOUNT\",\
\"DISCOUNTS\",\"ADJUSTMENTS\",\"DEVICE_CHANGE_CHARGES\",\"PLAN_CHANGE_CHARGES\",\"DEVICE_DAMAGE_CHARGES\",\
\"DEVICE_LOST_CHARGES\",\"ACCOUNT_FREEZE_CHARGES\",\"LATE_PAYMENT_CHARGES\",\"SUBTOTAL\",\"FEDERAL_EXCISE_DUTY\",\
\"ADVANCE_WITHHOLDING_TAX\",\"CURRENT_CHARGES\",\"ToTAL_DUE\"" > parsed.csv

# For each invoice: keep the lines containing '<!--B'. RS="" (paragraph mode)
# makes them one record in which both '|' and newline separate fields; print
# the part after '##' of every field as a quoted CSV value, one row per file.
for file in *.html; do
    grep '<!--B' "$file" |
        awk 'BEGIN { RS = ""; FS = "|" }
             { for (i = 1; i <= NF; i++) { split($i, a, "##"); printf "\"%s\",", a[2] }
               printf "\n" }' >> parsed.csv
done

echo "Script finished"
```
It would be great if someone could share a Python version.
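A minimal Python 3 sketch of the same logic, using only the standard library (`csv`, `glob`, `re`). It assumes, exactly as the shell version does, that the invoice values sit on lines containing `<!--B`, with fields separated by `|` and each value following `##`. The function names `extract_values` and `main` and the UTF-8 input encoding are assumptions for illustration, not something taken from the attached files.

```python
#!/usr/bin/env python3
"""Sketch of a Python port of Parser.sh (assumptions noted in comments)."""
import csv
import glob
import re

# Same column names as the shell header, in the same order (59 columns).
HEADER = (
    "BILL_NUMBER,ACCOUNT_NUMBER,USERNAME,CUSTOMER_NAME,CONTACT_NO,EMAIL,"
    "PACKAGE_PLAN,TOTAL_AMOUNT_DUE_TOP,PAYMENT_DUE_DATE,TOTAL_AMOUNT_AFTER_DUE_DATE,BILLING_PERIOD,"
    "PERVIOUS_BALANCE,PAYMENTS,NET_PREVIOUS_BALANCE,SUBORDINATE_AC_CHARGES,INITIAL_CHARGES,"
    "MONTHLY_LINE_RENT,ANTIVIRUS_LINE_RENT,PARENTAL_LINE_RENT,EXTRA_USAGE,Extra Usage-2GB_COUNT,"
    "Extra Usage-2GB_AMOUNT,Extra Usage-5GB_COUNT,Extra Usage-5GB_AMOUNT,SPEED_BOOST_COUNT,"
    "SPEED_BOOST_AMOUNT,HAPPY_DAYS_1_COUNT,HAPPY_DAYS_1_AMOUNT,HAPPY_DAYS_3_COUNT,"
    "HAPPY_DAYS_3_AMOUNT,StaticIP_COUNT,StaticIP_AMOUNT,PayAsYouGo_05Day_COUNT,PayAsYouGo_05Day_AMOUNT,"
    "PayAsYouGo_10Day_COUNT,PayAsYouGo_10Day_AMOUNT,PayAsYouGo_30Day_COUNT,PayAsYouGo_30Day_AMOUNT,"
    "PayAsYouGo_03Day_COUNT,PayAsYouGo_03Day_AMOUNT,PayAsYouGo_07Day_COUNT,PayAsYouGo_07Day_AMOUNT,"
    "PayAsYouGo_15Day_COUNT,PayAsYouGo_15Day_AMOUNT,Power Hours-30Days_COUNT,Power Hours-30Day_AMOUNT,"
    "DISCOUNTS,ADJUSTMENTS,DEVICE_CHANGE_CHARGES,PLAN_CHANGE_CHARGES,DEVICE_DAMAGE_CHARGES,"
    "DEVICE_LOST_CHARGES,ACCOUNT_FREEZE_CHARGES,LATE_PAYMENT_CHARGES,SUBTOTAL,FEDERAL_EXCISE_DUTY,"
    "ADVANCE_WITHHOLDING_TAX,CURRENT_CHARGES,ToTAL_DUE"
).split(",")


def extract_values(lines):
    """Mimic grep '<!--B' | awk 'BEGIN{RS=""; FS="|"} ...' for one file.

    Returns the text after '##' in every field, or [] when the file has
    no '<!--B' lines (awk then sees no record and prints nothing).
    """
    # grep '<!--B': keep only the marked lines, without their line endings.
    marked = [line.rstrip("\r\n") for line in lines if "<!--B" in line]
    if not marked:
        return []
    # RS="" puts all marked lines in one record; newline then also splits fields.
    record = "\n".join(marked)
    values = []
    for field in re.split(r"[|\n]", record):
        parts = field.split("##")  # awk: split($i, a, "##")
        values.append(parts[1] if len(parts) > 1 else "")  # a[2], empty if missing
    return values


def main(pattern="*.html", out_path="parsed.csv"):
    print("Script started")
    # newline="" is what the csv module expects for files it writes to.
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        # QUOTE_ALL quotes every value like the shell version; "\n" matches echo.
        writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
        writer.writerow(HEADER)
        for path in sorted(glob.glob(pattern)):
            # Assumption: invoices are UTF-8; undecodable bytes are replaced.
            with open(path, encoding="utf-8", errors="replace") as f:
                values = extract_values(f)
            if values:
                writer.writerow(values)
    print("Script finished")


if __name__ == "__main__":
    main()
```

Run it from the invoice directory, the same way as Parser.sh. One deliberate difference: `csv.writer` does not add the trailing comma that the awk `printf` leaves on every data row, and it doubles any `"` inside a value, so each row is well-formed CSV. If something downstream relies on that extra empty column, append `""` to `values` before writing the row.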
Attached Files
Invoice.txt (333.5 KB)