scitex_msword.diff

Paragraph-level diff between two DOCX documents.

This module implements diff_docx, which compares two .docx files (or two already-loaded python-docx Document objects) and returns a list of paragraph-level operations describing the changes.

The diff is computed with difflib.SequenceMatcher over paragraph text. For modify operations, per-paragraph run-level formatting deltas (bold, italic, underline, font, highlight) are also captured.
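
The SequenceMatcher step can be illustrated with a minimal sketch that works on plain lists of paragraph strings rather than real DOCX files. This is not the module's implementation: the helper name paragraph_ops, the "insert" and "delete" type names, the choice of which document "index" refers to, and the pairing rule inside a replaced block are all assumptions made for illustration. Only the "op", "index", "text_a" and "text_b" keys and the "modify" operation are taken from the description above.

```python
from difflib import SequenceMatcher


def paragraph_ops(paras_a, paras_b):
    """Hypothetical helper: diff two lists of paragraph strings.

    Returns a list of dicts shaped like the operations described above.
    The exact schema used by diff_docx may differ.
    """
    ops = []
    # autojunk=False keeps frequently repeated paragraphs (e.g. empty ones)
    # from being treated as junk in long documents.
    matcher = SequenceMatcher(a=paras_a, b=paras_b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # For "replace", pair paragraphs one-to-one as "modify" (assumption);
        # for pure "insert" or "delete" blocks there is nothing to pair.
        paired = min(i2 - i1, j2 - j1) if tag == "replace" else 0
        for k in range(paired):
            ops.append({
                "op": "modify",
                "index": i1 + k,  # assumption: index into document A
                "text_a": paras_a[i1 + k],
                "text_b": paras_b[j1 + k],
            })
        # Leftover paragraphs on the A side were removed.
        for i in range(i1 + paired, i2):
            ops.append({"op": "delete", "index": i, "text_a": paras_a[i]})
        # Leftover paragraphs on the B side were added.
        for j in range(j1 + paired, j2):
            ops.append({"op": "insert", "index": j, "text_b": paras_b[j]})
    return ops


if __name__ == "__main__":
    for op in paragraph_ops(["A", "B", "C"], ["A", "B2", "C", "D"]):
        print(op["op"], op["index"], op.get("text_b") or op.get("text_a"))
```

Comparing whole paragraph strings means a paragraph with even a one-character edit counts as changed. In this sketch such a paragraph surfaces as a "modify" only when it falls inside a replaced block.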

Typical usage

>>> from scitex_msword.diff import diff_docx
>>> ops = diff_docx("v15.docx", "v16.docx")
>>> for op in ops:
...     print(op["op"], op["index"], op.get("text_b") or op.get("text_a"))

Functions

diff_docx(a, b, *[, include_run_diff])

Compute paragraph-level diff between two DOCX documents.

summarize_diff(ops)

Convenience helper: count operations by type.
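
Counting operations by type, as summarize_diff is described as doing, could look roughly like the sketch below. The function name count_ops is hypothetical, and the dict return shape is an assumption. The real helper may return a different structure or include additional fields.

```python
from collections import Counter


def count_ops(ops):
    """Hypothetical stand-in for summarize_diff: tally operations by "op"."""
    # Each operation is assumed to be a dict with an "op" key, as in the
    # usage example at the top of this page.
    return dict(Counter(op["op"] for op in ops))


if __name__ == "__main__":
    sample = [{"op": "modify"}, {"op": "insert"}, {"op": "modify"}]
    print(count_ops(sample))
```

Apart from "modify", the operation names in the sample are placeholders. Substitute whatever type strings diff_docx actually emits.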