Comment 9 for bug 883315
Chris Eberle (eberle1080) wrote :

Which paths are you referring to? The only time I ask for real unicode strings is when dealing with disk paths (this is Python's preference). Everything else is actual byte strings. For my part, though, yes: when converting to strings I have been assuming a UTF-8 encoding. I figured this was a relatively safe assumption since the old code assumed 8-bit characters. Now of course, one of the advantages of the old way is that there was no implied encoding, so you could just stuff bytes in there. That's why in general I've been preferring bytes objects.

However, in the few instances where strings make more sense (committer name, email address, commit message, diff strings, email messages, etc.) I've been assuming UTF-8. I think somewhere in one of my dozens of commits I said something like "I guess I'll be using utf-8 until someone corrects me". Frankly it was a guess. The default encoding is 7-bit ASCII, which I found to be insufficient for most things (e.g. I couldn't even view the git log for dulwich because some committers have international characters in their names). So obviously I'm open to any and all suggestions. But as mentioned, there are actually some places where real unicode strings are required by Python.
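
To make the ASCII vs UTF-8 point concrete, here is a minimal sketch. It is not the actual dulwich code: the committer value and the `to_text` helper are made up for illustration, and it only assumes that the raw field is kept as bytes and decoded when a string is needed.

```python
# Hypothetical raw committer field, kept as bytes exactly as it would be stored.
# (The name and address are invented for this example.)
raw_committer = "J\u00f6rg M\u00fcller <jorg@example.com>".encode("utf-8")


def to_text(raw, encoding="utf-8"):
    """Decode bytes for display (hypothetical helper, not a dulwich API).

    Undecodable bytes are replaced with U+FFFD instead of raising, so a
    wrong guess about the encoding degrades the output rather than crashing.
    """
    return raw.decode(encoding, errors="replace")


# Strict 7-bit ASCII decoding fails as soon as a name has a non-ASCII character.
try:
    raw_committer.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Assuming UTF-8 recovers the original text.
text = to_text(raw_committer)
```

Keeping the bytes around and decoding only at the edges means the UTF-8 guess can be swapped for something else later without touching the stored data.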

As for rebasing everything, I'm all for that. I can definitely see that as an advantage. I say go for it. :)