Activity log for bug #1677244

Date Who What changed Old value New value Message
2017-03-29 13:24:23 Jamie Strandboge bug added bug
2017-03-29 13:25:15 Jamie Strandboge bug task added python3.5 (Ubuntu)
2017-03-29 13:25:43 Jamie Strandboge description (old and new values differ only in whitespace)

The following script works fine on 16.04 LTS:

    #!/usr/bin/python3
    import magic
    import os

    dir = "/usr/share/ca-certificates/mozilla"
    mime = magic.open(magic.MAGIC_MIME)
    mime.load()
    for root, dirnames, filenames in os.walk(dir):
        for f in filenames:
            fn = os.path.join(root, f)
            print("%s: %s" % (fn, mime.file(fn)))

Eg:

    $ python3 /tmp/test.py
    /usr/share/ca-certificates/mozilla/TWCA_Root_Certification_Authority.crt: text/plain; charset=us-ascii
    /usr/share/ca-certificates/mozilla/Baltimore_CyberTrust_Root.crt: text/plain; charset=us-ascii
    /usr/share/ca-certificates/mozilla/Comodo_AAA_Services_root.crt: text/plain; charset=us-ascii
    /usr/share/ca-certificates/mozilla/Hellenic_Academic_and_Research_Institutions_RootCA_2011.crt: text/plain; charset=us-ascii
    /usr/share/ca-certificates/mozilla/TC_TrustCenter_Class_3_CA_II.crt: text/plain; charset=us-ascii
    /usr/share/ca-certificates/mozilla/Security_Communication_RootCA2.crt: text/plain; charset=us-ascii
    /usr/share/ca-certificates/mozilla/EBG_Elektronik_Sertifika_Hizmet_Sağlayıcısı.crt: text/plain; charset=us-ascii
    ...

(notice the last filename before the ellipsis)

But on 17.04, this happens:

    $ python3 /tmp/test.py
    /usr/share/ca-certificates/mozilla/TWCA_Root_Certification_Authority.crt: text/plain; charset=us-ascii
    /usr/share/ca-certificates/mozilla/Baltimore_CyberTrust_Root.crt: text/plain; charset=us-ascii
    /usr/share/ca-certificates/mozilla/Comodo_AAA_Services_root.crt: text/plain; charset=us-ascii
    /usr/share/ca-certificates/mozilla/Hellenic_Academic_and_Research_Institutions_RootCA_2011.crt: text/plain; charset=us-ascii
    /usr/share/ca-certificates/mozilla/TC_TrustCenter_Class_3_CA_II.crt: text/plain; charset=us-ascii
    /usr/share/ca-certificates/mozilla/Security_Communication_RootCA2.crt: text/plain; charset=us-ascii
    Traceback (most recent call last):
      File "/home/ubuntu/test.py", line 15, in <module>
        print("%s: %s" % (fn, mime.file(fn)))
      File "/usr/lib/python3/dist-packages/magic.py", line 130, in file
        bi = bytes(filename, 'utf-8')
    UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc4' in position 69: surrogates not allowed

I'm guessing this is a change in python3 that python3-magic hasn't accounted for, but I'm not sure. Adding python3 task just in case.
2017-08-16 21:23:55 Michael Hudson-Doyle python3.5 (Ubuntu): status New Fix Released
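
The traceback in the description fails at `bytes(filename, 'utf-8')`, and the character it rejects (`'\udcc4'`) is a lone surrogate. Python produces such surrogates when `os.walk()` returns a filename whose raw bytes could not be decoded with the current filesystem encoding (the `surrogateescape` error handler). The sketch below reproduces that failure and shows one way a caller or wrapper could convert such a name back to its original bytes with `os.fsencode()`. It is only an illustration under POSIX assumptions: the helper name `to_fs_bytes` is hypothetical, and the log does not say what the released fix actually changed.

```python
import os


def to_fs_bytes(filename):
    """Hypothetical helper: return a filename as bytes for a C-level API.

    str names are encoded with os.fsencode(), which on POSIX uses the
    filesystem encoding plus the 'surrogateescape' handler, so bytes that
    could not be decoded earlier are restored unchanged.
    """
    if isinstance(filename, bytes):
        return filename
    return os.fsencode(filename)


# Raw on-disk name of the certificate that triggers the failure (UTF-8 bytes).
raw = b"EBG_Elektronik_Sertifika_Hizmet_Sa\xc4\x9flay\xc4\xb1c\xc4\xb1s\xc4\xb1.crt"

# Simulate what os.walk() hands back when the name is decoded as ASCII with
# 'surrogateescape': each non-ASCII byte becomes a lone surrogate (\udcXX).
name = raw.decode("ascii", "surrogateescape")

try:
    bytes(name, "utf-8")  # the call shown in the traceback
except UnicodeEncodeError as exc:
    print("strict encode fails:", exc.reason)

# Round-trips back to the original bytes instead of raising.
print(to_fs_bytes(name) == raw)
```

In the reporter's script, the same idea would amount to passing `os.fsencode(fn)` rather than `fn` to the magic binding, assuming that binding accepts bytes paths, which this log does not confirm.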