Batch File Formatting with Python

The example below shows how paragraphs in Microsoft Word documents can be formatted in a batch, based on their text. It requires the third-party python-docx package, which is installed with "pip install python-docx" and imported as docx.

First, the file path is set, together with a list of paragraph texts to format. A check is then made that the file path exists and that it contains files. Each file is then processed in turn; files without a ‘.docx’ extension, and temporary files whose names begin with ‘~’, are ignored. For each paragraph in a document, its text is compared with the items in the list of paragraphs to format. If the text matches an item exactly, the paragraph is formatted.
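
The file selection step can be tried on its own before any formatting is attempted. The following is a minimal sketch using only the standard library; the function name find_word_documents is hypothetical, and the case-insensitive extension check is an assumption that differs slightly from the main example, which matches ‘.docx’ in lower case only.

```python
import os


def find_word_documents(folder):
    """Return the .docx file names in folder, skipping Word temp files."""
    # A missing folder simply yields no files to format
    if not os.path.isdir(folder):
        return []

    # Keep .docx files, ignoring temp files such as '~$report.docx'
    return sorted(
        name for name in os.listdir(folder)
        if name.lower().endswith('.docx') and not name.startswith('~')
    )


# List the documents that would be processed at the example path
print(find_word_documents('c:\\demo'))
```

Returning an empty list for a missing folder keeps the calling code simple, at the cost of not distinguishing between a missing folder and an empty one.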

Note that in python-docx, a paragraph of text is made up of one or more ‘runs’. A run is a section of text where the formatting is the same. For example, if a paragraph has a section of bold text in the middle, the paragraph is made up of three runs: one up to the bold text, one for the bold text itself and one for the text after it. The example below assumes that the paragraphs being formatted contain only one run, so only the first run is formatted. The first run in a paragraph has an index value of zero, not one.

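Feedback is given on the number of paragraphs formatted in each file, followed by a total count of files formatted at the end. A ‘try-except’ block handles two errors that can occur when opening or saving a file: a PermissionError (for example, when the document is open in Word) and the python-docx PackageNotFoundError (when the file cannot be opened as a Word document).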

# Import required modules
import os

import docx
import docx.opc.exceptions
import docx.shared

# File path
filePath = 'c:\\demo'

# List of paragraphs to format
parasToFormat = ['Example Heading 1',
                 'Example Heading 2',
                 'Example Heading 3']

# Check to see if the file path exists
if os.path.exists(filePath):

    # Change the current working directory
    os.chdir(filePath)

    # Check if there are any files in the chosen directory
    if len(os.listdir(filePath)) == 0:

        print('There are no files to format.')

    else:

        # Formatted file and paragraph counts
        filesFormatted = 0
        paragraphsFormatted = 0

        # Process the files at the path
        for filename in os.listdir(filePath):

            # Check if the file is a Word document, excluding temp files
            if filename.endswith('.docx') and not filename.startswith('~'):

                try:

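                    # Reset the count so that a failure on a previous file
                    # cannot carry its paragraphs over to this one
                    paragraphsFormatted = 0

                    # Assign current file to a variable
                    currentDoc = docx.Document(filename)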

                    # Process the paragraphs in the document
                    for para in currentDoc.paragraphs:

                        # Check if the paragraph text is one that needs formatting
                        if para.text in parasToFormat:

                            # Format the paragraph font, weight and size
                            para.runs[0].font.name = 'Arial'
                            para.runs[0].bold = True
                            para.runs[0].font.size = docx.shared.Pt(14)

                            # Indicate a paragraph has been formatted
                            paragraphsFormatted += 1

                    # Check if any paragraphs have been formatted
                    if paragraphsFormatted > 0:

                        # Save the document
                        currentDoc.save(filename)

                        # Increment the files formatted count
                        filesFormatted += 1

                        # Message displaying file formatting information
                        if paragraphsFormatted == 1:

                            print(str(paragraphsFormatted)
                                  + ' paragraph formatted in the file "'
                                  + filename + '".')

                        else:

                            print(str(paragraphsFormatted)
                                  + ' paragraphs formatted in the file "'
                                  + filename + '".')

                        # Reset the paragraphs formatted variable
                        paragraphsFormatted = 0

                except (PermissionError,
                        docx.opc.exceptions.PackageNotFoundError) as e:

                    # Report the file and the reason it was skipped
                    print('The file "' + filename
                          + '" could not be formatted: ' + str(e))

        # Message displaying the number of files formatted
        if filesFormatted == 1:
            print(str(filesFormatted) + ' file has been formatted.')
        else:
            print(str(filesFormatted) + ' files have been formatted.')

else:

    # Display a message stating that the file path does not exist
    print('File path does not exist.')
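
The listing chooses between singular and plural wording in two places. As a possible tidy-up, that choice could be moved into a small helper. The sketch below is only an illustration: the function name describe_count and the file name "report.docx" are hypothetical and are not part of the listing above.

```python
def describe_count(count, singular, plural):
    """Return the count followed by the singular or plural wording."""
    return str(count) + ' ' + (singular if count == 1 else plural)


# Per-file message
print(describe_count(1, 'paragraph', 'paragraphs')
      + ' formatted in the file "report.docx".')

# Final summary message
print(describe_count(2, 'file has', 'files have') + ' been formatted.')
```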