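Article series

Chapter 1: Python Office Automation, Batch-Modifying docx Files (Using a Word File with Tables as an Example)

Chapter 2: Python Office Automation, Batch-Generating docx Files (Generating Word from Excel, and Excel from Word)

Chapter 3: Python Office Automation with python-docx and openpyxl (Filling Word Tables from Excel, and Excel from Word Tables)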


I. Reading from Excel

Use pandas to read the Excel data.

import pandas as pd

Excel_1 = pd.read_excel("样例详情.xlsx", sheet_name=0)
Excel_2 = pd.read_excel("样例详情.xlsx", sheet_name=1)
print(Excel_1)

Output:

   乙方    合同金额   工期
0  刘一    10000   10
1  陈二    20000   20
2  张三    30000   30
3  李四    40000   40
4  王五    50000   50
5  赵六    60000   60
6  钱七    70000   70
7  周八    80000   80
8  吴九    90000   90
9  郑十   100000  100
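If the sample workbook is not at hand, a DataFrame with the same contents as the printed output can be built inline in order to follow the remaining steps. This is only a convenience sketch: the values are copied from the output above, the name `Excel_1` matches the variable used in this article, and the English glosses in the comments are my own reading of the column names.

```python
import pandas as pd

# Same rows as the printed sheet, built in memory instead of read from disk.
names = ["刘一", "陈二", "张三", "李四", "王五", "赵六", "钱七", "周八", "吴九", "郑十"]
Excel_1 = pd.DataFrame({
    "乙方": names,                                     # party B
    "合同金额": [10000 * (k + 1) for k in range(10)],  # contract amount
    "工期": [10 * (k + 1) for k in range(10)],         # duration
})

# iloc[row, column] selects by position, which is how later steps read a cell.
first_row = Excel_1.iloc[0, :]
print(first_row["乙方"], first_row["合同金额"], first_row["工期"])
```

Positional access with `iloc` keeps the later code independent of the column names, at the cost of depending on the column order.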

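II. Writing to docx

Assume the Word file is already given. It can be a contract, a payslip, a notice, and so on, but it must be in .docx format; otherwise the python-docx package cannot process it.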
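Since the .docx requirement is easy to trip over with legacy .doc files, a quick pre-check can save a confusing error later. The sketch below relies on the fact that a .docx file is a zip package containing `word/document.xml`; the helper name `looks_like_docx` is hypothetical and not part of python-docx, and the check is a heuristic rather than a full validation.

```python
import os
import tempfile
import zipfile


def looks_like_docx(path):
    """Heuristic check: .docx extension, zip container, and a
    word/document.xml part inside (hypothetical helper)."""
    if not path.lower().endswith(".docx"):
        return False
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as package:
        return "word/document.xml" in package.namelist()


# Quick demonstration on throwaway files
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "contract.docx")
    with zipfile.ZipFile(good, "w") as package:
        package.writestr("word/document.xml", "<w:document/>")
    legacy = os.path.join(tmp, "contract.doc")
    with open(legacy, "wb") as handle:
        handle.write(b"not a zip package")
    results = {
        "contract.docx": looks_like_docx(good),
        "contract.doc": looks_like_docx(legacy),
    }
print(results)
```

A file that fails this check would first have to be converted to .docx, for example by saving it again from Word.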

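1. Import the modules and read the specified file

from docx import Document
import re  # the re module is used to locate where to write

# Load the Word file as a Document instance
document = Document("建设工程勘察合同.docx")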

2. Define functions for operating on the Word document

# Find the paragraphs in which the keyword appears
def find_index_paragraph(pattern, document):
    id_ = []
    for i, paragraph in enumerate(document.paragraphs):
        if re.findall(pattern, paragraph.text):
            print("Line", i, "Exist：", paragraph.text)
            id_.append([i, paragraph.text])
    print("*" * 20)
    return id_  # paragraph index and the paragraph's text


# List the runs (python-docx's name for the text segments of a paragraph) in the given paragraphs
def find_index_run(line_number, document):
    idx_ = []
    for j in line_number:
        for i, run in enumerate(document.paragraphs[j].runs):
            print("line_number", j, "Run", i, "Content：", run.text)
            idx_.append([j, i, run.text])
    print("*" * 20)
    return idx_  # paragraph index, run index, and the run's text


# Write the given string at the specified positions
def change_paragraph_value(line_run_number, change, document, bold=True, underline=True):
    for i in line_run_number:
        run = document.paragraphs[i[0]].runs[i[1]]
        run.bold = bold            # set bold
        run.underline = underline  # set underline
        run.text = change

3. Use the defined functions to locate the positions to modify

find_paragraph = find_index_paragraph("勘察人（全称）：", document)
print(find_paragraph)

Output:

Line 2 Exist： 勘察人（全称）：_______________________________
********************
[[2, '勘察人（全称）：_______________________________']]

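In the original document only one paragraph (Line 2) contains the string "勘察人（全称）：" ("Surveyor (full name):"), so this is the paragraph to modify.

# Narrow down from the paragraph to the specific run (paragraph ---> run)
line_number = [x[0] for x in find_paragraph]
find_run = find_index_run(line_number, document)
print(find_run)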

Output:

line_number 2 Run 0 Content： 勘察人（全称）：
line_number 2 Run 1 Content： _______________________________
********************
[[2, 0, '勘察人（全称）：'], [2, 1, '_______________________________']]

Run 0 of paragraph 2 is the string '勘察人（全称）：' and run 1 is the string '_______________________________'. Run 1 of paragraph 2, that is [2, 1], is therefore the first position to modify.

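In the same way, searching for the strings "元)" and "（总日历天数）" gives the other two positions to modify, [17, 5] and [12, 3]. Note that the pattern is passed to re.findall as a regular expression, so a search string containing an ASCII parenthesis has to be escaped first (for example with re.escape). These three values are the positions to modify, stored as Location.

Location = [[2, 1], [17, 5], [12, 3]]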
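Because the search functions hand the pattern to `re.findall`, a keyword such as "元)" with an ASCII parenthesis is not a valid regular expression on its own. The sketch below shows one way to search for literal text, assuming a literal match is what is wanted; `find_literal` is a hypothetical helper that works on plain strings, and the two sample strings are made up for illustration rather than taken from the contract.

```python
import re


def find_literal(keyword, texts):
    """Return [index, text] pairs for every text containing keyword literally."""
    pattern = re.compile(re.escape(keyword))  # escape regex metacharacters
    return [[i, text] for i, text in enumerate(texts) if pattern.search(text)]


# The unescaped keyword is rejected by the regex compiler.
try:
    re.compile("元)")
    unescaped_ok = True
except re.error:
    unescaped_ok = False

# Made-up sample strings, standing in for paragraph.text values
sample_texts = ["合同金额（人民币：_____元)", "工期：____天（总日历天数）"]
hits = find_literal("元)", sample_texts)
print(unescaped_ok, hits)
```

The same idea applies inside `find_index_paragraph`: wrapping the pattern in `re.escape` before calling `re.findall` makes any keyword safe to search for.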

4. Use the defined functions to modify the specified positions and save

print(Excel_1.iloc[0, :])

Output:

乙方         刘一
合同金额    10000
工期          10
Name: 0, dtype: object

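Modify and save the file:

i = 0  # index of the Excel row to write into the document
# change_paragraph_value expects a list of [paragraph, run] positions, so each position is wrapped in a list
change_paragraph_value([Location[0]], str(Excel_1.iloc[i, 0]), document)
change_paragraph_value([Location[1]], str(Excel_1.iloc[i, 1]), document)
change_paragraph_value([Location[2]], str(Excel_1.iloc[i, 2]), document)
document.save("Sample." + str(i) + str(Excel_1.iloc[i, 0]) + ".docx")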

III. Batch-generating docx files

import os

file_path = os.path.join(os.getcwd(), "Sample")
if not os.path.exists(file_path):  # if the directory does not exist
    try:
        os.mkdir(file_path)  # create the Sample folder to hold the generated Word files
    except Exception as e:
        print(e)

for i in range(Excel_1.shape[0]):
    document = Document("建设工程勘察合同.docx")  # reload the template for every row
    change_paragraph_value([Location[0]], str(Excel_1.iloc[i, 0]), document)
    change_paragraph_value([Location[1]], str(Excel_1.iloc[i, 1]), document)
    change_paragraph_value([Location[2]], str(Excel_1.iloc[i, 2]), document)
    document.save(os.path.join(file_path, "Sample" + str(i) + str(Excel_1.iloc[i, 0]) + ".docx"))
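The folder creation and file naming in the loop can be isolated into a small helper, which also makes the path handling independent of the operating system's separator. This is a sketch only: `build_output_path` is a hypothetical name, and the demonstration runs in a temporary directory rather than the working directory used above.

```python
import os
import tempfile


def build_output_path(folder, index, name):
    """Create folder if needed and return the path of the index-th output
    file, following the Sample<i><name>.docx naming scheme (hypothetical helper)."""
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
    return os.path.join(folder, "Sample{}{}.docx".format(index, name))


with tempfile.TemporaryDirectory() as base:
    target_folder = os.path.join(base, "Sample")
    first = build_output_path(target_folder, 0, "刘一")
    second = build_output_path(target_folder, 1, "陈二")  # folder exists by now
    folder_created = os.path.isdir(target_folder)
print(os.path.basename(first), os.path.basename(second))
```

With `exist_ok=True` the explicit existence check and the try/except around `os.mkdir` are no longer needed.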


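IV. The reverse operation: generating Excel from Word

Location = [[2, 1], [17, 5], [12, 3]]  # known positions [paragraph, run] of the information to extract
files = os.listdir(file_path)  # os.walk() also works and is more powerful
excel_file = []
for file in files:
    if not file.endswith(".docx"):  # skip anything else in the folder, e.g. sample.xlsx from an earlier run
        continue
    document = Document(os.path.join(file_path, file))
    in_run = []
    for i in Location:
        in_run.append(document.paragraphs[i[0]].runs[i[1]].text)
    excel_file.append(in_run)
print(excel_file)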
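When reading the folder back, two details are worth guarding against: `os.listdir` returns names in arbitrary order, and the folder may contain files that are not documents, such as the "~$" lock files Word creates while a document is open. The helper below is a sketch under those assumptions; `list_docx_files` is a hypothetical name, and the demonstration uses empty placeholder files.

```python
import os
import tempfile


def list_docx_files(folder):
    """Return sorted paths of the .docx files in folder, skipping Word's
    '~$' lock files (hypothetical helper)."""
    names = sorted(os.listdir(folder))  # os.listdir order is arbitrary
    return [
        os.path.join(folder, name)
        for name in names
        if name.lower().endswith(".docx") and not name.startswith("~$")
    ]


with tempfile.TemporaryDirectory() as folder:
    for name in ["Sample1陈二.docx", "Sample0刘一.docx", "sample.xlsx", "~$mple0刘一.docx"]:
        open(os.path.join(folder, name), "wb").close()  # empty placeholder file
    found = [os.path.basename(path) for path in list_docx_files(folder)]
print(found)
```

The sort is lexicographic, so with ten or more files a name like Sample10 sorts before Sample2; zero-padding the index when saving would avoid that.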

Output:

[['刘一', '10000', '10'],['陈二', '20000', '20'],['张三', '30000', '30'],['李四', '40000', '40'],['王五', '50000', '50'],['赵六', '60000', '60'],['钱七', '70000', '70'],['周八', '80000', '80'],['吴九', '90000', '90'],['郑十', '100000', '100']]

Next, add a header row and write the result to Excel.

output = pd.DataFrame(excel_file, columns=['乙方', '金额', '工期'])
output.to_excel(os.path.join(file_path, "sample.xlsx"))
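Every value extracted from a run is a string, so the amount and duration columns would be written to Excel as text. A possible refinement, sketched here on the first two rows of the extracted list, is to convert those columns to integers before writing; this assumes every cell in them holds a plain integer string.

```python
import pandas as pd

# First two rows of the extracted list; every cell is a string.
excel_file = [["刘一", "10000", "10"], ["陈二", "20000", "20"]]

output = pd.DataFrame(excel_file, columns=["乙方", "金额", "工期"])
# Convert the numeric columns so Excel stores numbers rather than text.
output = output.astype({"金额": int, "工期": int})
total = int(output["金额"].sum())
print(output.dtypes)
print(total)
```

When writing the file, passing `index=False` to `to_excel` leaves out the extra index column that pandas adds by default.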

The related resources have been uploaded to CSDN and can be downloaded there for 0 points.