scitex_msword.diff
Paragraph-level diff between two DOCX documents.
This module implements diff_docx, which compares two .docx files (or two
already-loaded python-docx Document objects) and returns a list of
paragraph-level operations describing the changes.
The diff is computed with difflib.SequenceMatcher over paragraph text,
and per-paragraph run-level formatting deltas (bold / italic / underline /
font / highlight) are also captured for modify operations.
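The paragraph-level matching described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual implementation: the helper name `diff_paragraphs` is hypothetical, it works on plain lists of paragraph strings rather than Document objects, and the "insert" and "delete" operation names and the meaning of "index" are assumptions (only "modify" and the keys "op", "index", "text_a", "text_b" appear in this page). Run-level formatting deltas are left out.

```python
from difflib import SequenceMatcher


def diff_paragraphs(paras_a, paras_b):
    """Hypothetical helper: diff two lists of paragraph strings.

    Returns a list of dicts with an "op" key. The names "insert" and
    "delete" are assumptions; only "modify" is named in the module docs.
    """
    ops = []
    # autojunk=False disables the popularity heuristic, so repeated
    # paragraphs (e.g. empty ones) are still eligible for matching.
    matcher = SequenceMatcher(a=paras_a, b=paras_b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # For a "replace" block, pair paragraphs one-to-one as "modify".
        paired = min(i2 - i1, j2 - j1) if tag == "replace" else 0
        for k in range(paired):
            ops.append({
                "op": "modify",
                "index": i1 + k,  # assumed: position in document A
                "text_a": paras_a[i1 + k],
                "text_b": paras_b[j1 + k],
            })
        # Unpaired paragraphs left in A were removed.
        for i in range(i1 + paired, i2):
            ops.append({"op": "delete", "index": i, "text_a": paras_a[i]})
        # Unpaired paragraphs left in B were added.
        for j in range(j1 + paired, j2):
            ops.append({"op": "insert", "index": j, "text_b": paras_b[j]})
    return ops


old = ["Title", "Intro text.", "Old closing."]
new = ["Title", "Intro text, revised.", "New paragraph.", "Old closing."]
ops = diff_paragraphs(old, new)
```

With real files, the two input lists could be built from `[p.text for p in document.paragraphs]` on each python-docx Document.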
Typical usage
>>> from scitex_msword.diff import diff_docx
>>> ops = diff_docx("v15.docx", "v16.docx")
>>> for op in ops:
...     print(op["op"], op["index"], op.get("text_b") or op.get("text_a"))
Functions
Compute paragraph-level diff between two DOCX documents.
Convenience helper: count operations by type.
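Counting operations by type could look something like the sketch below. The function name `count_ops` is hypothetical (the helper's real name and signature are not shown on this page), and the sample operation dicts are made up for illustration; the only assumption about the input is that each operation carries an "op" key, as in the usage example.

```python
from collections import Counter


def count_ops(ops):
    """Hypothetical helper: tally operations by their "op" value."""
    return dict(Counter(op["op"] for op in ops))


# Illustrative operations, not output from a real document pair.
sample_ops = [
    {"op": "modify", "index": 1, "text_a": "a", "text_b": "b"},
    {"op": "insert", "index": 2, "text_b": "c"},
    {"op": "modify", "index": 4, "text_a": "d", "text_b": "e"},
]
summary = count_ops(sample_ops)
```

A summary like this is convenient for a quick sanity check before walking the full list of operations.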