utf16 file detected as binary file

Bug #267296 reported by JoeDevSys
This bug affects 5 people
Affects   Status     Importance  Assigned to  Milestone
Bazaar    Confirmed  Medium      Unassigned
Breezy    Triaged    Medium      Unassigned

Bug Description

bzr 1.5 on Windows

1. Create a new text file in Notepad and save it with Encoding set to "Unicode" in the Save As dialog. (The first two bytes of the file are 0xFF 0xFE.)
2. Commit.
3. Edit the file, adding new text.
4. Run bzr diff: it treats the file as binary and does not report the text differences.

Interestingly, "UTF-8" files are handled correctly.
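The difference between the two encodings is visible in the raw bytes. This is a minimal Python sketch, assuming (as the report and the first reply indicate) that Notepad's "Unicode" option writes UTF-16 LE with a 0xFF 0xFE byte order mark; the sample string is arbitrary.

```python
import codecs

text = "hello\r\n"  # arbitrary sample text with a Notepad-style CRLF ending

# Assumed Notepad "Unicode" output: UTF-16 little-endian preceded by a BOM.
utf16_bytes = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
utf8_bytes = text.encode("utf-8")

print(utf16_bytes[:4])         # b'\xff\xfeh\x00': BOM, then 'h' and a NUL byte
print(b"\x00" in utf16_bytes)  # True: each ASCII character is paired with a NUL
print(b"\x00" in utf8_bytes)   # False: UTF-8 only emits NUL for the NUL character
```

Every ASCII character in UTF-16 carries a zero byte, which is why a tool that looks for NULs sees the file as binary while the UTF-8 version passes.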


Wesley J. Landaker (wjl) wrote :

Sounds like your file is saved as UTF-16 LE, which looks just like a binary file to almost any tool, not just Bazaar.

Saving your files as UTF-8 is almost always the right thing to do, as it is the only Unicode encoding fully supported by almost every tool.
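Following that advice, an existing UTF-16 file can be re-encoded as UTF-8 before committing. A minimal Python sketch, assuming the file starts with a BOM (as Notepad's "Unicode" files do) so the `utf-16` codec can pick the byte order; `convert_to_utf8` is a hypothetical helper name.

```python
from pathlib import Path


def convert_to_utf8(path: str) -> None:
    """Re-encode a BOM-prefixed UTF-16 text file as UTF-8 in place (hypothetical helper)."""
    p = Path(path)
    # The "utf-16" codec reads the BOM to choose LE/BE and strips it from the result.
    text = p.read_bytes().decode("utf-16")
    # Work on bytes rather than text mode so CRLF line endings are kept unchanged.
    p.write_bytes(text.encode("utf-8"))
```

Reading and writing bytes avoids Python's universal-newline translation, so the only change bzr sees is the encoding itself.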

Jelmer Vernooij (jelmer) wrote :

We probably need something more sophisticated than the simple null-byte check we use at the moment, as far as I know.
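A check of the kind described can be sketched as follows. This illustrates the idea only and is not bzr's actual code: the function name `looks_binary` and the 1024-byte window are assumptions.

```python
import codecs


def looks_binary(data: bytes, window: int = 1024) -> bool:
    """Naive heuristic: treat content as binary if a NUL byte appears early on.

    Hypothetical sketch of a null-byte check; the window size is an assumption.
    """
    return b"\x00" in data[:window]


utf16 = codecs.BOM_UTF16_LE + "plain text\r\n".encode("utf-16-le")
utf8 = "plain text\r\n".encode("utf-8")
print(looks_binary(utf16))  # True: the UTF-16 text file is misclassified
print(looks_binary(utf8))   # False
```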

Changed in bzr:
status: New → Confirmed
importance: Undecided → Low
status: Confirmed → Triaged
importance: Low → Medium
tags: added: binary utf16
Jelmer Vernooij (jelmer) wrote :

That is, a more complex check /to determine whether a file is binary/.
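One possible shape for a more complex check is to recognise a Unicode byte order mark before falling back to the NUL test. This is a minimal sketch, not a proposed bzr patch: the name `looks_binary_bom_aware` is hypothetical, and it assumes BOM detection is enough (BOM-less UTF-16 would still be treated as binary).

```python
import codecs

# BOMs to recognise, UTF-32 first so that the UTF-32 LE mark (FF FE 00 00)
# is not mistaken for the UTF-16 LE mark (FF FE).
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def looks_binary_bom_aware(data: bytes, window: int = 1024) -> bool:
    """Treat BOM-prefixed UTF-16/UTF-32 as text; otherwise fall back to a NUL check."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            try:
                data[len(bom):].decode(encoding)
            except UnicodeDecodeError:
                return True  # claims to be Unicode but does not decode cleanly
            return False
    return b"\x00" in data[:window]


utf16 = codecs.BOM_UTF16_LE + "plain text\r\n".encode("utf-16-le")
print(looks_binary_bom_aware(utf16))                        # False
print(looks_binary_bom_aware(b"\x89PNG\r\n\x1a\n\x00\x00"))  # True
```

Decoding the whole payload guards against binary data that merely happens to start with 0xFF 0xFE; a real implementation would also need diff to decode the text before comparing lines.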

Martin Pool (mbp)
summary: - unicode file treated as binary file
+ utf16 file detected as binary file
Martin Pool (mbp) wrote :

As a workaround, you can use the --using option of bzr diff to run an external program that does understand UTF-16 or UCS-2.
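Such an external program can be a small script. Below is a Python sketch that decodes two files as UTF-16 and prints a unified diff with `difflib`. It assumes both files carry a BOM and that bzr passes the old and new file paths as the last two arguments; check `bzr help diff` for how `--using` actually invokes the command. The script name `utf16diff.py` is hypothetical.

```python
import difflib
import sys


def read_utf16_lines(path: str) -> list[str]:
    """Read a BOM-prefixed UTF-16 file and return its lines with endings kept."""
    with open(path, "rb") as f:
        return f.read().decode("utf-16").splitlines(keepends=True)


def utf16_diff(old_path: str, new_path: str) -> str:
    """Return a unified diff of two UTF-16 text files as a single string."""
    old_lines = read_utf16_lines(old_path)
    new_lines = read_utf16_lines(new_path)
    return "".join(
        difflib.unified_diff(old_lines, new_lines, fromfile=old_path, tofile=new_path)
    )


# Assumption: the old and new paths arrive as the final two command-line arguments.
if __name__ == "__main__" and len(sys.argv) >= 3:
    sys.stdout.write(utf16_diff(sys.argv[-2], sys.argv[-1]))
```

It might then be run as `bzr diff --using "python utf16diff.py"`; this invocation is untested, and how the command string is split may vary between bzr versions.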

Tymek (maju7) wrote :

This is minor.

We had a "text treated as binary" problem. BranchA and BranchB are both branches of Trunk. We:
1. Saved FileA as ANSI (Windows-1252) in BranchA; the standard diff worked in text mode (good).
2. Merged BranchA into Trunk.
3. Saved FileA as ANSI 1252 in BranchB; the standard diff again worked in text mode (good).
4. Merged Trunk into BranchB.

Bzr still "complained" that it was a binary file. We did this twice to make sure the .BASE file was in 1252.

Only when BranchB took the file from BranchA, re-applied its own changes afterwards (using an external diff) and committed did bzr start reporting the conflicts as text conflicts.

Changed in bzr:
assignee: nobody → Gary van der Merwe (garyvdm)
status: Triaged → In Progress
John A Meinel (jameinel) wrote :

I don't think this is actively in progress anymore, so I'm taking it off Gary's plate.
I will mark it as "work to do" for ~canonical-bazaar, since he did some work on this and we would like that work not to be lost. Consider it something a Patch Pilot can pick up as desired.

Changed in bzr:
assignee: Gary van der Merwe (garyvdm) → canonical-bazaar (canonical-bazaar)
status: In Progress → Confirmed
John A Meinel (jameinel)
Changed in bzr:
assignee: canonical-bazaar (canonical-bazaar) → nobody
tags: added: patch-needswork
Jelmer Vernooij (jelmer)
tags: added: check-for-breezy
Jelmer Vernooij (jelmer)
tags: removed: check-for-breezy
Changed in brz:
status: New → Triaged
importance: Undecided → Medium