Truncation Of Cell Data When using OpenPyxl
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | New | Undecided | Unassigned |
Bug Description
When using openpyxl 2.5.12 (required in our case) to write data into an .xlsx cell, some string data is only partially written: the text in the cell is truncated at an arbitrary point.
openpyxl uses lxml, and the bug appears only with lxml 4.7.0 or later; earlier versions do not truncate cell data.
The attached Python script writes repeated, randomly generated Bulgarian strings and should generate 'test.xlsx', which contains a single cell whose data has been truncated. In this case the text was cut off 4007 characters into the 15971-character original, but the truncation point is not consistent across occurrences. The truncation may be directly affected by Cyrillic Unicode characters, but the issue does not reproduce when only the lines where the truncation occurs are written; it usually appears several lines deep into a text sample.
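The attached script is not reproduced here. As a rough, standard-library-only sketch of the kind of check involved, the following generates pseudo-random Cyrillic text and locates where a read-back value stops matching the original. The names `random_bulgarian_text` and `truncation_point` are hypothetical helpers introduced for illustration, and the truncated value is simulated with a slice rather than produced by openpyxl.

```python
import random

# The 30 lowercase letters of the Bulgarian alphabet.
BULGARIAN_LETTERS = "абвгдежзийклмнопрстуфхцчшщъьюя"


def random_bulgarian_text(n_chars, seed=0):
    """Return exactly n_chars of space-separated pseudo-random Cyrillic 'words'.

    Hypothetical helper: the words are random letter sequences, not real Bulgarian.
    """
    rng = random.Random(seed)
    text = ""
    while len(text) < n_chars:
        word_len = rng.randint(2, 10)
        text += "".join(rng.choice(BULGARIAN_LETTERS) for _ in range(word_len)) + " "
    return text[:n_chars]


def truncation_point(original, read_back):
    """Return None if the strings are equal, else the first index where they differ.

    If read_back is a strict prefix of original, this is len(read_back).
    """
    if original == read_back:
        return None
    limit = min(len(original), len(read_back))
    for i in range(limit):
        if original[i] != read_back[i]:
            return i
    return limit


text = random_bulgarian_text(15971)
simulated_cell_value = text[:4007]  # stands in for a truncated cell, not real openpyxl output
print(truncation_point(text, simulated_cell_value))  # 4007
```

With openpyxl installed, the read-back value could instead come from writing `text` to a cell, saving the workbook, and reloading it with `load_workbook("test.xlsx").active["A1"].value`; whether truncation then appears depends on the openpyxl and lxml versions in use.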
This issue may stem from the following change, in which parsing is done by encoding directly to UTF-8 instead of using Py_UNICODE strings as in previous versions:
https:/
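One reason the UTF-8 path is a plausible place to look: for Cyrillic text the character count and the UTF-8 byte count differ, so any length computed in one unit and applied in the other would cut the text short. The snippet below only illustrates that difference; it does not demonstrate the lxml behaviour itself, and `char_and_byte_length` is a hypothetical helper.

```python
def char_and_byte_length(text):
    """Return (number of characters, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


sample = "Здравей, свят"  # "Hello, world" in Bulgarian
chars, utf8_bytes = char_and_byte_length(sample)
print(chars, utf8_bytes)  # 13 24: each Cyrillic letter takes two bytes in UTF-8
```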
The attached script reproduces the error only with openpyxl 2.5.12 and lxml 4.7.0 (or later), and creates a new .xlsx file in the current working directory the script is run from.
Report Information:
Python : sys.version_
lxml.etree : (4, 7, 0, 0)
libxml used : (2, 9, 12)
libxml compiled : (2, 9, 12)
libxslt used : (1, 1, 34)
libxslt compiled : (1, 1, 34)
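For anyone comparing environments, the fields above can be gathered with a short script along these lines. This is a sketch: `collect_versions` is a hypothetical helper, the lxml import is guarded in case the package is missing, and `sys.version_info` is an assumed choice for the Python field, which is cut off in the report.

```python
import sys


def collect_versions():
    """Return a dict of version tuples similar to the report fields above."""
    # Assumption: sys.version_info is used here; the report's Python field is truncated.
    info = {"Python": tuple(sys.version_info)}
    try:
        from lxml import etree
    except ImportError:
        # lxml is not installed in this environment.
        info["lxml.etree"] = None
        return info
    info["lxml.etree"] = etree.LXML_VERSION
    info["libxml used"] = etree.LIBXML_VERSION
    info["libxml compiled"] = etree.LIBXML_COMPILED_VERSION
    info["libxslt used"] = etree.LIBXSLT_VERSION
    info["libxslt compiled"] = etree.LIBXSLT_COMPILED_VERSION
    return info


versions = collect_versions()
for name, value in versions.items():
    print(f"{name:<18}: {value}")
```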