Batch File Formatting with Python
Below is an example of how paragraphs of text in Microsoft Word documents can be formatted in a batch, based on the text they contain. For this to work, the third-party python-docx package must be installed (for example, with pip install python-docx). In code, it is imported as docx.
Firstly, the folder path is set, together with a list of the paragraph texts to format. A check is then made to see whether the path exists and whether the folder contains any files. Each file is then processed in turn. Files without a ‘.docx’ extension are ignored, as are Word's temporary lock files, whose names begin with ‘~’. For each paragraph in a document, its text is compared with the items in the list of paragraphs to format. If it matches one exactly, the paragraph is formatted.
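The two checks described above, choosing which files to open and which paragraphs to format, can be sketched as small standard-library functions that are easy to test on their own. The names find_word_documents and paragraphs_to_format are hypothetical helpers, not part of python-docx. This sketch differs slightly from the script in two ways. It also skips sub-folders, and it accepts ‘.DOCX’ in upper case. It also converts the target list to a set so that each lookup does not scan the whole list.

```python
import os


def find_word_documents(folder):
    """Return the names of .docx files in folder, skipping '~' lock files and sub-folders."""
    names = []
    for name in sorted(os.listdir(folder)):
        # Ignore anything that is not a regular file, such as a sub-folder
        if not os.path.isfile(os.path.join(folder, name)):
            continue
        # Keep Word documents, but not the temporary files Word creates while a document is open
        if name.lower().endswith('.docx') and not name.startswith('~'):
            names.append(name)
    return names


def paragraphs_to_format(texts, targets):
    """Return the indexes of the paragraph texts that exactly match one of the targets."""
    wanted = set(targets)  # set membership avoids scanning the list for every paragraph
    return [index for index, text in enumerate(texts) if text in wanted]


targets = ['Example Heading 1', 'Example Heading 2', 'Example Heading 3']
texts = ['Introduction', 'Example Heading 1', 'Body text', 'Example Heading 3']
print(paragraphs_to_format(texts, targets))  # → [1, 3]
```

The match is exact, as in the script. A paragraph reading ‘Example Heading 1 ’ with a trailing space would not be formatted.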
Note that in python-docx, a paragraph of text is made up of one or more ‘runs’. A run is a section of text in which the formatting is the same. For example, if a paragraph has a section of bold text in the middle, it is made up of three runs: one for the text before the bold section, one for the bold text itself, and one for the text after it. The example below assumes that each paragraph being formatted contains only one run, so only the first run, para.runs[0], is formatted. Runs are indexed from zero, so the first run has an index of zero, not one.
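Feedback is given on the number of paragraphs formatted in each file, followed by a total count of the files formatted at the end. A ‘try-except’ block handles the two errors most likely to occur. The first is a PermissionError, raised for example when a document is open in Word and cannot be saved. The second is python-docx's PackageNotFoundError, raised when a file cannot be opened as a Word package, such as a corrupt file or one that only has a ‘.docx’ extension.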
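The feedback and error-handling pattern just described can be sketched without python-docx by passing in the per-file formatting step as a function. In this sketch, process_files and count_message are hypothetical names, and fake_format is a stand-in that only pretends to format files. With python-docx you would pass a real formatting function and errors=(PermissionError, PackageNotFoundError). The helper count_message replaces the separate if/else branches the script uses to choose between singular and plural wording.

```python
def count_message(count, singular, plural):
    """Return e.g. '1 paragraph' or '3 paragraphs'."""
    return f'{count} {singular if count == 1 else plural}'


def process_files(filenames, format_file, errors=(PermissionError,)):
    """Call format_file on each name and collect per-file and summary messages.

    format_file returns the number of paragraphs it formatted, or raises one
    of the exception types in errors if the file cannot be opened or saved.
    """
    files_formatted = 0
    messages = []
    for name in filenames:
        try:
            count = format_file(name)
        except errors:
            messages.append(f'The file "{name}" could not be formatted.')
            continue
        # Only files in which something changed are counted and reported
        if count > 0:
            files_formatted += 1
            messages.append(f'{count_message(count, "paragraph", "paragraphs")} formatted in the file "{name}".')
    messages.append(f'{count_message(files_formatted, "file has", "files have")} been formatted.')
    return files_formatted, messages


# Stand-in for the real formatting step, for illustration only
fake_counts = {'a.docx': 2, 'b.docx': 0, 'c.docx': 1}


def fake_format(name):
    if name == 'locked.docx':
        raise PermissionError(name)
    return fake_counts[name]


total, lines = process_files(['a.docx', 'b.docx', 'c.docx', 'locked.docx'], fake_format)
for line in lines:
    print(line)
```

This prints one line each for a.docx and c.docx, a failure message for locked.docx, and finally ‘2 files have been formatted.’ Because the loop catches only the listed exception types, any other unexpected error still stops the script.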
```python
# Import required modules
import os

import docx
from docx.opc.exceptions import PackageNotFoundError
from docx.shared import Pt

# File path
filePath = r'C:\demo'

# List of paragraphs to format
parasToFormat = ['Example Heading 1', 'Example Heading 2', 'Example Heading 3']

# Check to see if the file path exists
if os.path.exists(filePath):
    # Check if there are any files in the chosen directory
    if not os.listdir(filePath):
        print('There are no files to format.')
    else:
        # Formatted file count
        filesFormatted = 0

        # Process the files at the path
        for filename in os.listdir(filePath):
            # Check if the file is a Word document, excluding temp files
            if filename.endswith('.docx') and not filename.startswith('~'):
                # Full path to the file, so the working directory does not need changing
                fullPath = os.path.join(filePath, filename)
                # Reset the paragraph count for every file, even if the previous one failed
                paragraphsFormatted = 0
                try:
                    # Open the current file
                    currentDoc = docx.Document(fullPath)

                    # Process the paragraphs in the document
                    for para in currentDoc.paragraphs:
                        # Check if the paragraph text is one that needs formatting
                        if para.text in parasToFormat and para.runs:
                            # Format the font, weight and size of the first (assumed only) run
                            para.runs[0].font.name = 'Arial'
                            para.runs[0].font.bold = True
                            para.runs[0].font.size = Pt(14)
                            # Indicate a paragraph has been formatted
                            paragraphsFormatted += 1

                    # Only save documents in which at least one paragraph was formatted
                    if paragraphsFormatted > 0:
                        currentDoc.save(fullPath)
                        filesFormatted += 1

                        # Message displaying file formatting information
                        if paragraphsFormatted == 1:
                            print('1 paragraph formatted in the file "' + filename + '".')
                        else:
                            print(str(paragraphsFormatted) + ' paragraphs formatted in the file "' + filename + '".')
                except (PermissionError, PackageNotFoundError):
                    print('The file "' + filename + '" could not be formatted.')

        # Message displaying the number of files formatted
        if filesFormatted == 1:
            print('1 file has been formatted.')
        else:
            print(str(filesFormatted) + ' files have been formatted.')
else:
    # Display a message stating that the file path does not exist
    print('File path does not exist.')
```