Merging Files with PowerShell

Below is an example of how multiple PDF documents, each with any number of pages, can be merged into one file that contains the first page of every document, followed by the second page of every document, and so on. The script relies on the iTextSharp library.

Firstly, the folder path is set and the script checks that it exists. If it does, the current working directory is changed to that folder and all the PDF documents in it are retrieved. The script then checks that there is more than one PDF to merge. Each file is opened in turn to find its page count, and the count is stored in an ordered dictionary keyed by file name. Whilst doing this, the script records the largest number of pages found in any single file.
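The bookkeeping described above can be tried on its own. The sketch below is a minimal illustration that assumes hypothetical file names and page counts in place of the values the script reads with PdfReader, so it runs without iTextSharp. It shows the ordered dictionary being filled and the running maximum being updated.

```powershell
# Hypothetical file names and page counts, standing in for the values the
# script reads from each file with iTextSharp's PdfReader.NumberOfPages.
$pageCounts = @(
    [pscustomobject]@{ Name = 'a.pdf'; Pages = 3 },
    [pscustomobject]@{ Name = 'b.pdf'; Pages = 1 },
    [pscustomobject]@{ Name = 'c.pdf'; Pages = 2 }
)

# Running maximum of pages in any single file.
$maxPages = 0

# [ordered]@{} keeps keys in insertion order; a plain @{} would not.
$filesToProcess = [ordered]@{}

foreach ($entry in $pageCounts)
{
    # Update the maximum if this file has more pages.
    if ($entry.Pages -gt $maxPages)
    {
        $maxPages = $entry.Pages
    }

    # Record the page count against the file name.
    $filesToProcess.Add($entry.Name, $entry.Pages)
}

Write-Host ($filesToProcess.Keys -join ', ')   # a.pdf, b.pdf, c.pdf
Write-Host $maxPages                           # 3
```

An ordered dictionary is used rather than a plain hashtable so that the files are later visited in the same order in which they were retrieved.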

The dictionary and the maximum page count are then used to step through the files one page number at a time: for each page number, the script copies that page from every file that actually has it, which allows PDFs with different numbers of pages to be merged. The result is saved as combined.pdf in the same folder. Finally, a confirmation message states how many files have been merged.
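The page order this produces can be worked out without touching any PDFs. The following sketch uses a hypothetical helper, Get-InterleavedPageOrder (it is not part of the script or of iTextSharp), which applies the same nested loops to an ordered dictionary of page counts and returns labels describing the sequence in which pages would be added.

```powershell
# Hypothetical helper: returns the order in which pages would be copied,
# given an ordered dictionary of file name -> page count.
function Get-InterleavedPageOrder
{
    param(
        [System.Collections.Specialized.OrderedDictionary] $PageCounts
    )

    # Find the largest page count (the number of passes needed).
    $maxPages = 0
    foreach ($count in $PageCounts.Values)
    {
        if ($count -gt $maxPages) { $maxPages = $count }
    }

    $order = New-Object System.Collections.Generic.List[string]

    # Page 1 from every file, then page 2 from every file, and so on,
    # skipping files that have run out of pages.
    for ($pageIndex = 1; $pageIndex -le $maxPages; $pageIndex++)
    {
        foreach ($entry in $PageCounts.GetEnumerator())
        {
            if ($pageIndex -le $entry.Value)
            {
                $order.Add("$($entry.Key) p$pageIndex")
            }
        }
    }

    # Emit the labels to the pipeline.
    $order
}

$counts = [ordered]@{ 'a.pdf' = 3; 'b.pdf' = 1; 'c.pdf' = 2 }
$order = @(Get-InterleavedPageOrder -PageCounts $counts)
Write-Host ($order -join ', ')
# a.pdf p1, b.pdf p1, c.pdf p1, a.pdf p2, c.pdf p2, a.pdf p3
```

Each file contributes one page per pass until it runs out, so the total (six here) equals the sum of the individual page counts. The script follows the same pattern, calling GetImportedPage and AddPage where this sketch only records a label.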

# Clear the console window.
Clear-Host

# Folder containing the PDF files to merge.
$filePath = "c:\demo"

# Check to see if the file path exists.
if (Test-Path $filePath)
{

    # Change the current working directory.
    Set-Location $filePath

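    # Retrieve the PDF files, excluding any combined.pdf left by a previous
    # run. @() guarantees an array, so .Count works even for a single file.
    $files = @(Get-ChildItem -Path *.pdf -File |
        Where-Object { $_.Name -ne 'combined.pdf' })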

    # Check if there are any PDF files to merge.
    if ($files.Count -eq 0)
    {

        Write-Host "There are no PDF files to merge."

    }
    elseif ($files.Count -eq 1)
    {

        Write-Host "There is only one PDF file at the specified location."

    }
    else
    {

        # Load the iTextSharp assembly (adjust the path to wherever
        # itextsharp.dll is stored).
        Add-Type -Path "c:\scripts\itextsharp.dll"

        # Maximum number of pages.
        $maxPages = 0

        # Ordered dictionary mapping each file's full path to its page
        # count, kept in the order in which the files were retrieved.
        $filesToProcess = [ordered]@{}

        # Collect page count information for each PDF file.
        foreach ($file in $files)
        {

            try
            {

                # Assign current PDF to a reader object.
                $pdfReader = New-Object iTextSharp.text.pdf.PdfReader `
                    -ArgumentList $file.FullName

                # Assign the number of pages to the maximum if greater
                # than current value.
                if ($pdfReader.NumberOfPages -gt $maxPages)
                {

                    $maxPages = $pdfReader.NumberOfPages

                }

                # Add the file information to the sorted dictionary.
                $filesToProcess.Add($file.FullName, $pdfReader.NumberOfPages)

                # Dispose of the reader object.
                $pdfReader.Dispose()

            }
            catch
            {

                # Message stating the file could not be read and will be skipped.
                Write-Host "The file `"$($file.Name)`" could not be merged."

            }

        }

        # If there are PDFs to merge, process them.
        if ($maxPages -gt 0 -and $filesToProcess.Count -gt 1)
        {

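            # Create the output file, overwriting any existing combined.pdf.
            # FileMode Create truncates the file, whereas OpenOrCreate would
            # leave stale bytes behind if the old file was larger.
            $output = [System.IO.Path]::Combine($filePath, 'combined.pdf')
            $fileStream = New-Object System.IO.FileStream($output, `
                [System.IO.FileMode]::Create)

            # Create and open the new document that pages are copied into.
            $document = New-Object iTextSharp.text.Document
            $writer = New-Object iTextSharp.text.pdf.PdfSmartCopy($document, $fileStream)
            $document.Open()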

            try
            {

                # Process PDF files up to the maximum number of pages.
                for ($pageIndex = 1; $pageIndex -le $maxPages; $pageIndex++)
                {

                    # Add the desired page from each PDF to the new PDF.
                    foreach ($pdfFile in $filesToProcess.GetEnumerator())
                    {

                        # Check if current file has the desired page to merge.
                        if ($pageIndex -le $pdfFile.Value)
                        {

                            # Assign the current PDF to a reader object.
                            $pdfReader = New-Object iTextSharp.text.pdf.PdfReader `
                                -ArgumentList $pdfFile.Name

                            # Extract the desired page.
                            $page = $writer.GetImportedPage($pdfReader, $pageIndex)

                            # Add the extracted page to the combined PDF.
                            $writer.AddPage($page)

                            # Dispose of the reader object.
                            $pdfReader.Dispose()

                        }

                    }

                }

                # Dispose of objects to clean up.
                $document.Dispose()
                $writer.Dispose()

                # Feedback that the file merge has been successful.
                Write-Host "$($filesToProcess.Count) PDF files merged successfully."

            }
            catch
            {

                # Display a message stating the merge was unsuccessful, with the reason.
                Write-Host "The file merge was unsuccessful: $($_.Exception.Message)"

            }
            finally
            {

                # Always release the output file, even if the merge failed.
                if ($null -ne $fileStream)
                {

                    $fileStream.Dispose()

                }

            }

        }
        else
        {

            # Display a message stating there are not enough readable files to merge.
            Write-Host "There are not enough readable PDF files to merge."

        }

    }

}
else
{

    # Message stating file path does not exist.
    Write-Host "File path does not exist."

}