UnicodeFix v1.2.1: Smoother Installs, Cleaner Reports, Less Weirdness
UnicodeFix v1.2.1: The “Smooth the Edges” Release
Not every release needs fireworks.
Some releases need to do something quieter, and honestly more useful: remove friction.
UnicodeFix v1.2.1 is that kind of release.
This update is about making the tool feel better in real use:
- easier to install
- easier to understand
- less confusing when you ask for metrics
- more honest in how it reports what it found
In other words: less “why is it doing that?” and more “yes, that is exactly what I wanted.”
First: Installation Finally Got Less Annoying
One of the big cleanup jobs in this release was the install path.
Before, UnicodeFix had a few too many ways to get itself onto a system. That usually sounds flexible until you’re the person trying to figure out which path is the real one.
v1.2.1 tightens that up.
Now the project uses pyproject.toml as the single source of truth for dependencies, and setup.sh acts like a sane bootstrap script instead of a side quest.
That means:
- no split-brain dependency story between
requirements.txtand package metadata - a cleaner path for contributors and regular users alike
- better alignment between local installs, docs, and CI
It also behaves more politely with Python environments now. If you’re already inside Conda, UnicodeFix should act like a good guest and install there instead of wandering off and building a surprise environment on its own.
That sounds small. If you’ve ever lost ten minutes to “which Python did this install into?” you know it is not small.
The Hyphen Fix That Should Have Been There Already
This release also fixes one of those maddening Unicode details that only becomes obvious after it breaks something important.
UnicodeFix now normalizes additional dash and hyphen variants, including the non-breaking hyphen (U+2011).
That matters because these characters are sneaky. To a human reader they often look perfectly normal. To a terminal, linter, parser, or diff tool, they are very much not normal.
So if you’ve ever had text that looked clean but still behaved strangely, this is exactly the kind of thing that causes it.
Now UnicodeFix catches more of those cases by default.
Metrics Now Behave Like a Person Thought About Them
The --metrics feature got one of the most practical improvements in the entire release.
Before, the behavior around metrics could feel a little too clever in the wrong way. You might ask for metrics and get output that wasn’t quite what you expected. Or ask for cleaned output and wonder why the reporting behavior changed.
v1.2.1 makes that much clearer:
cleanup-text --metrics file.txtgives you a report- if you explicitly ask for cleaned output with
-oor-t, UnicodeFix still writes the cleaned file - and now the report is emitted as a side report, so you can see what happened without losing the clean output you asked for
That is a much better contract.
You say what you want. The tool does that. And it doesn’t make you decode its mood afterward.
Reports Are More Honest Now
This is my favorite kind of fix: the kind that makes a tool more trustworthy.
UnicodeFix’s scanner/reporting logic now does a better job separating what is merely informational from what is actually an anomaly.
A good example is quote counting.
This release splits quote-like characters into more useful categories, so the tool can distinguish between:
- normal ASCII quote characters
- Unicode quote-like characters that may actually be suspicious or worth cleaning
That means if you clean a file and run it back through the tool, you are less likely to get a misleading “something is still wrong” vibe just because the file contains ordinary apostrophes or quotation marks.
That is a subtle change, but it matters a lot. If a report tool overstates the problem, people stop trusting the report.
This release pushes UnicodeFix in the opposite direction: fewer false alarms, clearer signal.
A Quiet But Important Python Compatibility Fix
UnicodeFix now behaves better across Python versions too.
There was a compatibility issue around tomllib, which is available in newer Python versions but not older ones like 3.9 and 3.10.
That got cleaned up by falling back to tomli where needed.
The result:
- smoother installs on older supported Python versions
- better CI coverage
- fewer “works on my machine” moments
And yes, the CI matrix was expanded too, because if you’re going to claim support, you should actually test it.
Who This Release Is Really For
This update helps a few different kinds of people.
If you’re technical
You get:
- a cleaner packaging story
- better CI behavior
- saner metrics/report interactions
- more accurate anomaly reporting
- fewer environment surprises
If you’re not especially technical
You get something just as valuable:
- install it more easily
- run it with less confusion
- trust the output more quickly
- spend less time wondering whether the weirdness is your fault
That is one of my favorite things about this release. It improves the developer experience without making the tool more intimidating.
A Nice Example of What “Polish” Really Means
People talk about polish like it’s decorative.
It usually isn’t.
Most of the time, polish is just this:
removing the little moments where software makes you stop and ask, “wait… what?”
UnicodeFix v1.2.1 does a lot of that.
It doesn’t reinvent the project. It makes the project feel more deliberate.
And for a tool that exists to clean up tiny hidden problems, that feels exactly right.
Try It
If you’ve already been using UnicodeFix, this is a very easy upgrade to appreciate.
If you haven’t used it yet, this is a nice release to start with.
Use it when:
- pasted text starts acting cursed
- your docs look normal but diff weirdly
- CI starts complaining about characters nobody can see
- you want a fast sanity check on whether a file is carrying invisible baggage
UnicodeFix is still the same basic idea: clean the text, remove the gremlins, move on with your life.
This release just makes that whole experience smoother.
In Short
UnicodeFix v1.2.1 brings:
- a unified install path
- better Conda/env behavior
- improved Unicode hyphen handling
- cleaner metrics/report behavior
- more accurate quote/anomaly reporting
- stronger Python compatibility and CI coverage
Not flashy. Very useful. Exactly the kind of release you want in a tool that lives in your daily workflow.
Get It / Support It
Repo, issues, and discussions:
- https://github.com/unixwzrd/UnicodeFix
UnicodeFix is used pretty widely now, which is great. What it has not done particularly well is pay for the time it takes to keep improving it.
This project keeps moving because people use it, report weird edge-cases, open issues, send patches, and occasionally throw a little fuel on the fire so more of that work can happen.
If UnicodeFix has saved you time, cleaned up a mess, helped your workflow, or spared you one of those invisible-character debugging sessions that steals an afternoon, support is appreciated.
Support the work:
Bug reports welcome. PRs welcome. New glyph sightings especially welcome.
This project evolves because people keep finding new ways for text to become cursed.
UnicodeFix is how we un-curse it.
-
- Mia & the unixwzrd*