919 stories
·
28 followers

Fish 4.0: The Fish Of Theseus

2 Shares

About two years ago, our head maintainer @ridiculousfish opened what quickly became our most-read pull request:

Truth be told, we did not quite expect that to be as popular as it was. It was written as a bit of an in-joke for the fish developers first, and not really as a press release to be shared far and wide. We didn’t post it anywhere, but other people did, and we got a lot of reactions.

Observant readers will note that the PR was a proposal to rewrite the entirety of fish in Rust, from C++.

Fish is no stranger to language changes - it was ported from pure C to C++ earlier in its life, but this was a much bigger project, porting to a much more different language that didn’t even exist when fish was started in 2007.

Now that we’ve released the beta of fish 4.0, containing 0% C++ and almost 100% pure Rust, let’s look back to see what we’ve learned, what went well, what could have gone better and what we can do now.

We’re writing this so others can learn from our experience, but it is our experience and not an exhaustive study. We hope that you’ll be able to follow along even if you have never written any rust, but experience with a roughly C++-shaped language should help.

Why are we doing this again?

We’ve experienced some pain with C++. In short:

  • tools and compiler/platform differences
  • ergonomics and (thread) safety
  • community

Frankly, the tooling around the language isn’t good, and we had to take on some additional pain in order to support our users. We want to provide up-to-date fish packages for systems that aren’t up-to-date, like LTS Linux and older macOS. But there is no ‘rustup’ for C++, no standard way to install recent C++ compilers on these operating systems. This means adopting recent C++ standards would complicate the lives of packagers and would-be contributors1. For example, we started using C++11 in 2016, and yet we still needed to upgrade the compilers on our build machines until 2020.

Fish also uses threads for its award-winning (note to editor: find an actual award) autosuggestions and syntax highlighting, and one long-term project is to add concurrency to the language.

Here’s a dirty secret: while external commands run in parallel, fish’s execution of internal commands (builtins and functions) is currently serial and can’t be backgrounded. Lifting this limitation will enable features like asynchronous prompts or non-blocking completions, as well as performance gains.

POSIX shells use subshells to get around this, but subshells are a leaky abstraction that can bite you in the behind when you least expect it. For instance, you can’t set variables from inside a pipe (except on some shells, but only in the last part of the pipe, maybe, if you have enabled the correct option). We would like to avoid that, and so the heavy hand of forking off a process isn’t appealing.

We prototyped true multithreaded execution in C++, but it just didn’t work out. For example, it was too easy to accidentally share objects across threads, with only post-hoc tools like Thread Sanitizer to prevent it.

The ergonomics of C++ are also simply not good - header files are annoying, templates are complicated, you can easily cause a compile error that throws pages of overloads in the standard library at you. Many functions are unsafe to use. C++ string handling is very verbose with easily confusable overloads of many methods, making it attractive to drop down to C-style char pointers, which are quite unsafe.

And the standard prioritizes performance over ergonomics. Consider for instance string_view, which provides a non-owning slice of a string. This is an extremely modern, well-liked feature that C++ programmers often claim is a great reason to switch to C++17. And it is extremely easy to run into use-after-free bugs with it, because the ergonomics weren’t a priority.

One good case study of the deficiencies of C++-in-practice is a C library: curses. This is a venerable library to access terminal features, and we use it to access the terminfo database, which describes differences in terminal features and behavior.

This not only caused us grief by being unsafe to use in weird ways - the “cur_term” pointer (or sometimes macro!) can be NULL, and it is dereferenced in surprising places, but also caused a surprisingly high number of issues when building from source. This was either because there are multiple implementations of it with differences as useless as “this function takes a char on system X but an int on system Y”, but also because users kept coming to us with new and exciting(ly terrible) ways to package and install it. The dependency system is the system package manager.

Finally, subjectively, C++ isn’t drawing in the crowds. We have never had a lot of C++ contributors. Over the 11 years fish used C++, only 17 people have at least 10 commits to the C++ code. We also don’t know a lot of people who would love to work on a C++ codebase in their free time.

Some parting thoughts we can give the C++ community: We would like to see improvements to ergonomics and safety of the language and the tools prioritized over performance, and we would like to see efforts to make C++ compilers easier to upgrade on real systems.

Why Rust?

We need to get one thing out of the way: Rust is cool. It’s fun.

It’s tempting to try to sweep this under the rug because it feels gauche to say, but it’s actually important for a number of reasons.

For one, fish is a hobby project, and that means we want it to be fun for us. Nobody is being paid to work on fish, so we need it to be fun. Being fun and interesting also attracts contributors.

Rust also has great tooling. The tools have really paid a lot of attention to use, and the compiler errors are terrific. Not even “compared to C++”, they just actually rule. And as we have tried to pay attention to our own error messages (fish has a bespoke error for if it thinks a file you told it to run has Windows line endings), we like it.

And it is easy to get that tooling installed - rustup is magic, and allows people to get started quickly, with minimal fuss or root permissions. When the answer to “how to upgrade C++ compiler” is “find a repository (with root permissions), compile it yourself, install some other repository or a docker image”, it is amazing how the Rust answer can just be “use rustup”.

Rust has great ergonomics - the difference between C++’s pointers (which can always be NULL) and Rust’s Options are apparent very quickly even to those of us who had never used it before. We did have a backport of C++’s optional, and liked using it, but it was never as integrated as Rust’s Options were.

Having an explicit use system where you know exactly which function comes from which module is a great improvement over #include.

Rust makes it nice to add dependencies. We don’t want to go overboard with it, but we do want to change our history format from our homegrown “I can’t believe it’s not YAML” to something specified that other tools can actually read, and Rust makes it easy to add support for YAML/JSON/KDL.

But the killer feature of Rust, from fish-shell’s perspective, is Send and Sync, statically enforcing rules around threading. “Fearless concurrency” is too strong - you can still blow your leg off with fork or signal handlers - but Send and Sync will be the key to unlocking fully multithreaded execution, with confidence in its correctness.

We did not do a comprehensive survey of other languages. We were confident Rust was up to the task and either already knew it or wanted to learn it, so we picked it.

Platform Support

A lot of hay has also been made online about Rust’s platform support (e.g. in the git project). We don’t see a big problem here - all of our big platforms (macOS, Linux, the BSDs) are supported, as are Opensolaris/Illumos and Haiku. We have never heard of anyone trying to run fish on NonStop.

Architecture support is even less of a problem - going by Debian’s popcon, 99.9995% (the actual result, not an exaggeration) of machines run an architecture that has Rust packages in Debian. Given that fish is installed on 1.92% of Debian systems, we would project two (2) or three (3) machines of the quarter million responses to have fish on an unsupported architecture 2.

Unlike what some online have assumed, a native Windows port was not a reason for switching to Rust as it was never in the cards. Fish is, at heart, a UNIX shell that relies not only on UNIX APIs but also their semantics, and exposes them in the scripting language. What would test -x say on Windows, which has no executable bit? These are issues that could be solved with a lot of work, but we’re unix nerds making a unix shell, not one for Windows.

The one platform we care about a bit that it does not currently seem to have enough support for is Cygwin, which is sad, but we have to make a cut somewhere.

The Story Of The Port

We had decided we were gonna do a “Fish Of Theseus” port - we would move over, component by component, until no C++ was left. And at every stage of that process, it would remain a working fish.

This was a necessity - if we didn’t, we would not have a working program for months, which is not only demoralizing but would also have precluded us from using most of our test suite - which is end-to-end tests that run a script or fake a terminal interaction. We would also not have been able to do another C++ release, putting some cool improvements into the hands of our users.

Had we chosen to disappear into a hole we might not have finished at all, and we would have to re-do a bunch of work once it became testable. We also mostly kept the structure of the C++ code intact - if a function is in the “env” subsystem, it would stay there. Resisting the temptation to clean up allowed us to compare the before and after to find places where we had mistranslated something.

So we used autocxx to generate bindings between C++ and Rust code, allowing us to port one component at a time.

We started3 by porting the builtins. These are essentially little self-contained programs, with their own arguments, streams, exit code, etc. That means it’s easy to port them separately from the rest of the shell once you have a way to call a Rust builtin from C++, which we had as part of the initial pull request.

Where they connected to the main shell, we used one of three approaches:

  1. Add some FFI glue to the C++ to make it callable from Rust, port the caller and leave the callee for later
  2. Move the callee to Rust and, if necessary, make it callable from C++
  3. Write a Rust version of the callee and call it from the ported caller, but leave the C++ version around

For instance, almost every builtin needs to parse its options. We have our own implementation of getopt, that we reimplemented in Rust in the initial PR, but the C++ version stuck around until it had no more callers remaining. Otherwise we would have had to write a C++-to-Rust bridge and adjust the C++ callers to use it.

Or the builtin builtin (the builtin called builtin) needs access to the names of all builtins to print them for builtin --get-names. In that case we bridged some access to what amounts to a constant vector of strings in the C++, and eventually moved it over once the users were in Rust.

That’s how it went for a while, but we finally hit the more entangled systems, where porting larger chunks felt more productive, since that reduced the amount of tricky FFI code to be written only to be thrown away. These were ported in solo efforts. This includes the input/output “reader”, which is, unsurprisingly, one of fish’s biggest parts, ending up at about 13000 lines of Rust.

During the port, we hit a bunch of snags with (auto)cxx. Sometimes it would just not understand a particular C++ construct, and we spent a lot of time trying to figure out ways to please it. As an example, we introduced a struct on the C++ side that wrapped C++’s vector, because for some reason autocxx liked to complain about vector<wstring>. We had to fork it to add support for wstring/wchar, which is understandable because using wchar is a horrible decision - we only do it because it’s a historical mistake.

Similarly, we had to wrap some C++ variables in unique_ptr and similar to make the ownership rules understandable to (auto)cxx, or copy values that didn’t strictly need to be copied. This caused the performance during the port to go down quite a bit, but we regained all of it in most spots, and even beat the C++ version in some.

We also patched autocxx to remove the requirement to use unsafe to invoke any C++ API, because that would have obscured uses of unsafe that wouldn’t disappear just by porting the callee. We were building something temporary, so sometimes it is okay to do something a little underhanded. If you used this for a permanent bridge between Rust and C++ in a few parts of your code, the unsafe markers might be useful, but in our case they were noise.

Because autocxx generated a lot of code, some tools also were less helpful than they’d usually be. rust-analyzer for instance was extremely slow.

So, even though our codebase was fairly amenable to being moved to Rust because we didn’t use exceptions or a lot of templates, autocxx isn’t the easiest to work with. It is absolutely magical that it works at all, and it enabled us to do this port, but it has a hard task to perform and isn’t perfect at it.

The Timeline

  • The initial PR was opened on 28th January 2023, merged on 19th February 2023

  • fish 3.7.0, another release in the C++ branch to flush out some accumulated improvements, was released in January 2024

  • The last C++ code was removed in January 2024 (and some additional test code was ported from C++ to C 12th of June 2024)

  • The first beta was released 17th of December 2024

The initial PR had a timeline of “handwaving, half a year”. It was clear to all of us that it might very well be entirely off, and we’re not disappointed that it was. Frankly, 14 months was still a pretty good pace, especially considering that we made a C++ release in-between, so it did not throw off our usual release cadence.

Most of the work was done by 7 people (going by those with at least 10 commits to “.rs” files), but we got a lot of help from interested community members.

The delay after that was down to a few reasons:

  1. The “second 90%” - testing that everything worked. We flushed out a lot of bugs in this time, and if we made a release at that time it would have been a bad one.
  2. Having something to release that’s visible to users - there’s no point in making a release that does the same thing in new code, you need it to do different things. So we held off until we had something.
  3. Simple availability - sometimes, some of us took time off.

So if you are trying to draw any conclusions from this, consider the context: A group of people working on a thing in their free time, diverting some effort to work on something else, and deciding that after the work is finished it actually isn’t.

The Gripes

It won’t surprise anyone who has spent any time on this world of ours that Rust is not, in fact, perfect. We have some gripes with it.

Chief among them is how Rust handles portability. While it offers many abstractions over systems, allowing you to target a variety of systems with the same code, when it comes to adapting your code to systems at a lower-level, it’s all based on enumerating systems by hand, using checks like #[cfg(any(target_os = "freebsd", target_os = "netbsd", target_os = "openbsd"))].

This is an imperfect solution, allowing you to miss systems and ignoring version differences entirely. From what we can tell, if FreeBSD 12 gains a function that we want to use, libc would add it, but calling it would then fail on FreeBSD 11 without a good way to check, at the moment.

But listing targets in our code is also fundamentally duplicating work that the libc crate (in our case) has already done. If you want to call libc::X, which is only defined on systems A, B and C, you need to put in that check for A, B and C yourself and if libc adds system D you need to add it as well. Instead of doing that, we are using our own rsconf crate to do compile-time feature detection in build.rs.

Most of this would be solved if Rust had some form of saying “compile this if that function exists” - #[cfg(has_fn = "fstatat")]. With that, the libc crate could do whatever checks it wants and fish would just follow what it did, and we could remove a lot of the use for rsconf. It would not really help support older distributions that lack some features, tho. That could be solved by something like the min_target_API_version cfg.

While we’re on portability, the tools also sometimes fail to consider other targets - clippy may warn about a conversion being useless when it isn’t on another system, it is often better to use if cfg!(...) instead of #[cfg(...)] because code behind the latter is eliminated very early, so it may be entirely wrong and only shows up when building on the affected system.

We’ve also had issues with localization - a lot of the usual Rust relies on format strings that are checked at compile-time, but unfortunately they aren’t translatable. We ported printf from musl, which we required for our own printf builtin anyway, which allows us to reuse our preexisting format strings at runtime.

The Mistakes

We’ve hit some false starts, dead ends and other kinds of mistakes. For instance we originally used a fancy macro to allow us to write our strings as "foo"L, but that did not end up carrying its weight and we removed it in favor of a regular L!("foo") macro call.

We were confused by a deprecation warning in the libc crate, which explains that “time_t” will be switched to 64-bit on musl in the future. We initially tried to work around it, adding a lot of wrappers to try to stay agnostic on that size, but only later figured out that it does not affect us, as we do not pass a time_t we get from one C library to another. (<a href="https://github.com/fish-shell/fish-shell/issues/10634" rel="nofollow">https://github.com/fish-shell/fish-shell/issues/10634</a>)

Some bugs appeared because we missed subtleties of the original code. Often this turned into a crash because we used asserts or assert’s modern cousin “.unwrap()”. This was often the easiest way to translate the C++, and sometimes it simply turned out to be not accurate, and had to be replaced with different error handling.

But overall most of these were, once found, pretty shallow - “it panics here, why would it do that? oh, this can be an Err? Okay, what leads to that? Ah, okay, let’s handle that in this way”.

We’ve also caused some friction by turning on link-time-optimization combined with having release builds as the default in CMake (currently needed to run the full test suite), which makes it easy to accidentally have very long build time.

The Good

A lot of the benefits of porting to Rust will appear over time, but some are already here.

Remember our issues with (n)curses? We will no longer have any, because we no longer use curses. Instead we switched to a Rust crate that gives us just what we need, which is access to terminfo and expanding its sequences. This removes some awkward global state, and means those building from source no longer need to ensure that curses is installed “correctly” on their system - cargo just downloads a crate and builds it.

We do still read terminfo, which means users need to install that, but that can be done at runtime, is preinstalled on all mainstream systems and if it can’t be found we just use an included copy of the xterm-256color definitions4.

We have also managed to create “self-installable” fish packages that include all the functions, completions and other asset files in the fish binary to be written out at runtime. That allowed us to create statically linked versions of fish (for linux this uses musl, because glibc has unavoidable crashes!), so for the first time we have one file you can download and run on any linux (the only requirement being that the architecture matches!).

This is a pretty big boon for people who want to use fish but sometimes ssh to servers, where they might not have root access to install a package. So they can just scp a single file and it’s available.

This might be possible with C23’s #embed, but Rust allowed us to do it now and, overall, pretty easily.

The Sad

The one goal of the port we did not succeed in was removing CMake.

That’s because, while cargo is great at building things, it is very simplistic at installing them. Cargo wants everything in a few neat binaries, and that isn’t our use case. Fish has about 1200 .fish scripts (961 completions, 217 associated functions), as well as about 130 pages of documentation (as html and man pages), and the web-config tool and the man page generator (both written in python).

It also has a test suite that is light on unit tests but heavy on end-to-end script and interactive tests. The scripted tests run through our own littlecheck tool, which runs a script and compares its output to embedded comments. The interactive tests are driven by pexpect, which fakes terminal interaction and checks that the right thing happens when you press buttons.

We kept cmake, in a simplified form, for these tasks, but let it hand over the responsibility of building to cargo.

It would be possible to switch all that to a simpler task runner like Just or even plain old makefiles, but since we already have this system we’re keeping it for now. The upside is that the build process hasn’t really changed for packagers.

We’re also losing Cygwin as a supported platform for the time being, because there is no Rust target for Cygwin and so no way to build binaries targeting it. We hope that this situation changes in future, but we had also hoped it would improve during the almost two years of the port. For now, the only way to run fish on Windows is to use WSL.

The Present & The Future

We’ve succeeded. This was a gigantic project and we made it. The sheer scale of this is perhaps best expressed in numbers:

  • 1155 files changed, 110247 insertions(+), 88941 deletions(-) (excluding translations)
  • 2604 commits by over 200 authors
  • 498 issues
  • Almost 2 years of work
  • 57K Lines of C++ to 75K Lines of Rust 5 (plus 400 lines of C 6)
  • C++–

The beta works very well. Performance is usually slightly better in terms of time taken, memory use has a slightly higher floor but a lower ceiling - it will use 8M instead of 7M at rest, but e.g. globbing a big directory won’t make it go up as much. These things can all be improved, of course, but for a first result it is encouraging.

Fish is still a bit of an odd duck…fish as a Rust program. It has some bits that smell like C spirit, directly using the C API and e.g. passing around file descriptors instead of File objects. It still uses UTF-32 strings - which is why we are using a fork of the pcre2 crate because we couldn’t convince the pcre2-crate maintainer to add UTF-32 support. We hope to find a nicer solution here, but it wasn’t necessary for the first release.

The port wasn’t without challenges, and it did not all go entirely as planned. But overall, it went pretty dang well. We’re now left with a codebase that we like a lot more, that has already gained some features that would have been much more annoying to add with C++, with more on the way, and we did it while creating a separate 3.7 release that also included some cool stuff.

And we had fun doing it.

Read the whole story
luizirber
17 days ago
reply
Davis, CA
acdha
17 days ago
reply
Washington, DC
Share this story
Delete

The Federation Deathmatch

2 Shares

It’s the weekend, and I have some Thoughts about federated social media. So, buckle up, I guess, it’s time to start some fights.


Recently there has been some discourse about Bluesky’s latest fundraising round. I’ve been participating in conversations about this on Mastodon, and I think I might sometimes come across as a Mastodon partisan, but my feelings are complex and I really don’t want to be boosting the ActivityPub Fediverse without qualification.

So here are some qualifications.

Bluesky Is Evil

To the extent that I am an ActivityPub partisan in the discourse between ActivityPub and ATProtocol, it is because I do not believe that Bluesky is a meaningfully decentralized social network. It is a social network, run by a company, which has a public API with some elements that might, one day, make it possible for it to be decentralized. But today, it is not, either practically or theoretically.

The Bluesky developers are putting in a ton of effort to maybe make it decentralized, hypothetically, someday. A lot of people think they will succeed. But ActivityPub (and, of course, Mastodon specifically) are already, today, meaningfully decentralized, as you can see on FediDB, there are instances with hundreds of thousands of people on them, before we even get to esoterica like the integrations Threads, Wordpress, Flipboard, and Ghost are doing.

The inciting incident for this post — that a lot of people are also angry about Bluesky raising millions of dollars from Evil Guys Doing Evil Stuff Capitalis indeed a serious concern. It lights the fuse that burns towards their eventual, inevitable incredible journey. ATProtocol is just an API, and that API will get shut off one day, whenever their funders get bored of the pretense of their network being “decentralized”.

At time of writing, it is also interesting that 3 of the 4 times that the CEO of Bluesky has even skeeted the word “blockchain” is to say “no blockchain”, to reassure users that the scam magnet of “Blockchain” is not actually near their product or protocol, which is a much harder position to maintain when your lead investor is “Blockchain Capital”.

I think these are all valid criticisms of Bluesky. But I also think that the actual engineers working on the product are aware of these issues, and are making a significant effort to address them or mitigate them in any way they can. All that work can still be easily incinerated by a slow quarter in terms of user growth numbers or a missed revenue forecast when the VCs are getting impatient, but it’s not nothing, it is a life’s work.

Really, who among us could not have our life’s ambitions trivially destroyed in an afternoon, simply because a billionaire decided that they should be? If you feel like you are safe from this, I have some bad news about how money works. So we are all doing our best in an imperfect system and maybe Bluesky is on to something here. That’s eminently possible. They’re certainly putting forth an earnest effort.

Mastodon Is Stupid

Meanwhile, not nearly as much has been made recently of Mastodon refusing funding from a variety of sources, when all indications are that funding is low, and plummeting, far below the level required to actually sustain the site, and they haven’t done a financial transparency report for over a year, and that report was already nearly a year late.

Mastodon and the fediverse are not nearly in a position to claim moral superiority over Bluesky. Sure, taking blockchain VC money might seem like a rookie mistake, but going out of business because you are spurning every possible source of funding is not that wise either.

Some might think that, sure, Mastodon the company might die but at least the Fediverse as a whole will keep going strong, right? Lots of people run their own instances! I even find elements of this argument convincing, and I think there is probably some truth to it. But to really believe this argument as claimed, that it’s a fait accompli that the fediverse will survive in some form, that all those self-run servers will be a robust network that will self-repair, requires believing some obviously false stuff. It is frankly unprofitable to run a Fediverse instance. Realistically, if you want to operate a mastodon server for yourself, it is going to cost at least $100/year once you include stuff like having a domain name, and managing the infrastructure costs is a complex problem that keeps getting harder to manage as the software itself gets slower.

Cory Doctorow has recently argued that this is all worth it, because at least on Mastodon, you’re in control, not at the whims of centralized website operators like Bluesky. In his words,

On Mastodon (and other services based on Activitypub), you can easily leave one server and go to another, and everyone you follow and everyone who follows you will move over to the new server. If the person who runs your server turns out to be imperfect in a way that you can’t endure, you can find another server, spend five minutes moving your account over, and you’re back up and running on the new server

He concludes:

Any system where users can leave without pain is a system whose owners have high switching costs and whose users have none

(Emphasis mine).

This is a beautiful vision. It is, however, an incorrect assessment of the state of the Fediverse as it stands today. It’s not true in two important ways:

First, if you look at any account of a user’s fediverse account migration, like this one from Steve Bate or this one from the Ente project or this one from Erin Kissane, you will see that it is “painful for the foreseeable future” or “wasn’t as seamless as advertised”, and that “the best time to […] migrate instances […] is never”. This language does not presage a pleasant experience, as Doctorow puts it, “without pain”.

Second, migration is an active process that requires engagement from the instance that hosts you. If you have been blocked or banned, or had your account terminated, you are just out of luck. You do not have control over your data or agency over your online identity unless you’ve shelled out the relatively exorbitant amount of money to actually operate your own instance.

In short, ActivityPub is no panacea. A federated system is not really a “decentralized” system, as much as it is a bunch of smaller centralized systems that all talk to each other. You still need to know, and care, about your social and financial relationship to the operators of your instance. There is probably no getting away from this, like, just generally on the Internet, no matter how much peer-to-peer software we deploy, but there certainly isn’t in the incomplete mess that is ActivityPub.

JOIN, or DIE.

Neither Mastodon (or ActivityPub) nor Bluesky (or ATProtocol) has a comprehensive solution to the problem of decentralized social media. These companies, and these protocols, are both deeply flawed and if everything keeps bumping along as it is, I believe both are likely to fail. At different times, on different timelines, and for different reasons, but fail nonetheless.

However, these networks are both small and growing, and we are not yet in the phase of enshittification where margins are shrinking and audiences are captured and the screws must be tightened to juice revenue. There are stil possibilities. Mastodon is crowdfunded and what they lack in resources they make up for in flexibility and scrappiness. Bluesky has money and while there will eventually be a need to monetize somehow, they have plenty of runway to come up with that answer, and a lot of sophisticated protocol work has been done. Not enough to make a complete circut and allow users true, practical decentralization, but it’s not nothing, either.

Mastodon and Bluesky are both organizations with humans in them, and piles of data that is roughly schema-compatible even if the nuances and details are different. I know that there is a compatible model becuse thanks to both platforms being relatively open, there is a functioning ActivityPub/ATProtocol bridge in the form of Brid.gy Fed. You can use it today, and I highly recommend that you do so, so that “choice of protocol” does not fully define your audience. If you’re on bluesky, follow this account, and if you’re on Mastodon or elsewhere on the Fediverse, search for and follow @bsky.brid.gy@bsky.brid.gy.

The reality that fans of decentralized, independent social media must confront is that we are a tiny audicence right now. Whichever site we are looking at, we are talking about a few million monthly active users at best, in a world where even the pathetic husk of Twitter still has hundreds of millions and Facebook has billions. Interneceine fights are not going to get us anywhere. We need to build bridges and links and connect our networks as densely as possible. If I’m being honest, Bridgy Fed looks like a pretty janky solution, but it’s something, and we need to start doing something soon, so we do not collectively become a permanent minority that mass markets can safely ignore.

As users, we need to set an example, so that the developers of the respective platforms get their shit together and work together directly so that workarounds like Bridgy are not required. Frankly, this is mostly on the ActivityPub and Mastodon devs, as far as I can tell. Unfortunately, not a lot of this seems to be public, or at least I haven’t witnessed a lot of it directly, but I have heard repeatedly that the ActivityPub developers are prickly, and this is one high-profile public example where an ActivityPub partisan is incredibly, pointlessly hostile and borderline harrassing towards someone — Mike Masnick, a long-time staunch advocate for open protocols and open patents, someone with a Mastodon account, and thus as good a prospective ally as the ActivityPub fediverse might reasonably find — explaining some of the relative benefits of Bluesky.

Most of us are technology nerds in one way or another. In that way we can look at signifiers like “ActivityPub” and “ATProtocol”, and feel like these are hard boundaries around different all-encompassing structures for the future, and thus tribes we must join and support.

A better way to look at this, however, is to see social entities like Mastodon gGmbH and Bluesky PBC — or, more to the point, Fosstodon, SFBA Social, Hachyderm (and maybe, one day, even an instance which isn’t fully just for software development nerds), as groups that deploy these protocols to access some data that they publish, just as they might publish their website over HTTP or their newsletters over SMTP. There are technical challenges involved in bridging between mutually unintelligible domain models, but that is, like, network software's whole deal. Most software is just some kind of translation from one format or context to another. The best possible future for the fediverse is the one where users care as much about the distinction between ATProtocol and ActivityPub as they do about the distinction between POP3 and IMAP.

To both developers and users of these systems, I say: get it together. Be nice to each other. Because the rest of the social media ecosystem is sure as shit not going to be nice to us if we ever see even a hint of success and start to actually cut into their user base.

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor!

Read the whole story
luizirber
68 days ago
reply
Davis, CA
acdha
68 days ago
reply
Washington, DC
Share this story
Delete

Against Exploration

1 Comment and 3 Shares
Against Exploration / Cat: The urge to explore - / Cat: an INVIOLABLE part of human nature Girl: For jerks! / Cat: to take a chance Girl: There's always a couple of jerks who won't SHUT UP about it / Cat: See what there is to see - Girl: So you give them a boat and off they go! THEN everything is FINE / Cat: TAKE what there is to TAKE Girl: Until some OTHER place's jerks shoe up in THEIR boat! / Girl: ugh / Cat: What lies beyond the sea? What lies beyond the stars? Girl: MORE JERKS / Cat: I won't settle for less. Girl: You've settled PLENTY
Read the whole story
jepler
88 days ago
reply
I feel so rebuked, as a person who used to be so excited about space colonialism
Earth, Sol system, Western spiral arm
paniczap4335
87 days ago
Ayyyeeee
luizirber
80 days ago
reply
Davis, CA
Share this story
Delete

Get Me Out Of Data Hell

2 Shares

It is 9:59 AM in Melbourne, 9th October, 2024. Sunlight filters through my windows, illuminating swirling motes of dust across my living room. There is a cup of tea in my hand. I take a sip and savor it.

I text the other senior engineer, who unlike me is full-time, on the team: "I'm ready to start at 10", as is our custom.

The minute hand moves.

It is 10:00 AM in Melbourne, 9th October, 2024. The sun is immediately extinguished and replaced by a shrieking skull hanging low in a frigid sky. I glance down at my tea, and it is as blood. I take a sip and savor it.

I text the other senior engineer on the team: "Are you ready to enter the Pain Zone?"1, as is our custom.

I. The Pain Zone

The Pain Zone, coated in grass which rends those who tread upon it like a legion of upraised spears, is an enterprise data warehouse platform. At the small scale we operate at, with little loss of detail, a data warehouse platform simply means that we copy a bunch of text files from different systems into a single place every morning.

The word enterprise means that we do this in a way that makes people say "Dear God, why would anyone ever design it that way?", "But that doesn't even help with security" and "Everyone involved should be fired for the sake of all that is holy and pure."

For example, the architecture diagram which describes how we copy text files to our storage location has one hundred and four separate operations on it. When I went to count this, I was expecting to write forty and that was meant to illustrate my point. Instead, I ended up counting them up three times because there was no way it could be over a hundred. This whole thing should have ten operations in it.

Retrieve file. Validate file. Save file. Log what you did. Those could all be one point on the diagram, but I'm being generous. And you can keep the extra six points for stuff I forgot. That's ten. Why are there a hundred and four? Sweet merciful Christ, why?

At two of the four businesses I've worked at, the most highly-performing engineers have resorted to something that I think of as Pain Zone navigation. It's the practice of never working unless pair programming simply to have someone next to you, bolstering your resolve, so that you can gaze upon the horrors of the Pain Zone without immediately losing your mind. Of course, no code alone can make people this afraid of work. Code is, ultimately, characters on a screen, and software engineers do nothing but hammer that code into shapes that spark Joy and Money. The fear and dread comes from a culture where people feel bad that they can't work quickly enough in the terrible codebase, where they feel judged for slowing down to hammer the code into better shapes that sadly aren't on the Jira board, and where management looks down on people who practice craftsmanship.

The last doesn't even require malicious management — it just needs people that don't respect how deep craftsmanship can go. These are the same people that do not appreciate that an expert pianist is not simply pressing keys, they are obsessively perfecting timing and the force applied to each key, alongside dozens of factors that I can't comprehend. An unthoughtful person will see something and think "It can't be that hard" or more generously "That looks quite hard" . The respectful thought to have when viewing any competent professional in a foreign domain, in every domain that I'm aware of, is "That must be way harder than it looks."

I have now seen enough workplaces, and thanks to the blog have access to enough executives, to know that this is what most cultures degenerate into. Terrible companies are perpetual cognitohazards where everyone is bullied all day. The median companies (which some people call "good" for lack of ever having seen better) lack the outright bullying but still consist of people that are trying to convince themselves that it's fine to feel disempowered or subservient all day. There are times in my life where I have to deal with this, but it is hardly fine. The best are places where you can get at least some of the things that a person needs other than rent money2 .

This place has a better culture than most, as bullying is mostly not tolerated, we have fully remote work, and it is a terrible faux pas to accuse someone of not working fast enough outright, though you can hint at it gently.

It is worse than most places at software engineering, as... oh, you'll see.

In any case, I have a deal with the team. Every morning, grab your coffee, attend your meetings, and at 10 AM we navigate the Pain Zone together for at least three to four hours. Management is blissfully unaware that this force of camaraderie and mutual psychotherapy is the only way that things continue to limp along.

II. I Am Lost To The Pain Zone

We have one simple job today. The organization wants to know:

  1. Is data coming into our system?
  2. Is any data being lost?
  3. Each data source goes through approximately thirteen steps on average — how many are getting stuck along the way?

Someone has already landed all of the logs our system produces in the data warehouse, so we can examine them in there, alongside the actual data. Is that smart? I dunno, something feels a bit weird about it but I have no concrete objection. My co-navigator and I decide to look at the logs for one data source.

This is where, five seconds in, we begin to become lost in the Pain Zone.

Let's say the data source we picked was "Google Analytics". We search the landed logs for Google Analytics, expecting to see something like this.

Source
Google Analytics

Here is what I actually see in the source column, and yes, it actually looked exactly this bad.

Source
6g94-8jjf-eo84757h4758z", "jobStatus": "JobStatus.Waiting", "jobExpiry":"2023-10

That... is not "Google Analytics". In fact, what the fuck is that? It looks like someone has dumped a random snippet of JSON into the logs, but not even the entirety of the JSON. The strings aren't terminated. We should have around fifty source systems, so how many distinct source systems appear in— FIFTY-SEVEN THOUSAND?

We've been writing total nonsense to half the logs for over a year and no one noticed? We only have two jobs. Get the data and log that we got the data. But the logs are nonsense, so we aren't doing the second thing, and because the logs are nonsense I don't know if we've been doing the first thing.

I take a deep breath. The plan is to submit my notice on December 2nd anyway, so this is fine. This is so fine. The other engineers already know I'm leaving, and we've all committed to do the best we can for two months for our spiritual growth. The ability to do painful things is a virtuous skill to cultivate as a responsible adult.

Okay, how is this happening? Well, it turns out that we're embedding a huge amount of metadata in filenames, and the Lambda functions that produce all of this — of course, we're serverless, because how can you hurt yourself without a cutting-edge? — use lots of regex to extract data. Unfortunately, because we don't have any tests, someone eventually wrote some code to download data that passed a big JSON blob instead of a filename to the logging function, and that function happily went "Great, I'll just regex out the source system from the file name!" Except it wasn't a filename, so it has instead spewed garbage into the system for months.

I find something like this every time we enter the Pain Zone. Sometimes we've laughed so hard that we've cried at the things we've seen $2,000 per day consultants do.

The issue is raised with the team, but because fixing this critical error in our auditability is not on the board and Velocity Must Be Up, fixing the logs is judged to be less important than... parsing... the nonsense logs. Why? We have another saying on our team, which is "Stop asking questions, you're only going to hurt yourself".

I take another deep breath.

Okay, we'll continue with the work instead of fixing the critical production error. We can't query the Google Analytics stuff based on source system, so let's pick another one. We also draw data from Twitter once every two hours, and that source column isn't broken for that. I just need to be able to associate the log events to begin working on that. That is, we'll have one log that says "I downloaded the data from Twitter" and another log that says "I checked that it had all the correct stuff in it", and I just have to tie them together.

I don't like the way this table is configured for various reasons, but I'm expecting to see:

Event ID Source Event Success
1 Twitter Downloaded True
1 Twitter Validated True

Then I can just do something like:

select
  event,
  success
from
  log_table
where event_id = 1 and source = 'twitter'

Now I can see if all the correct stuff happened.

But I cannot find an event_id column or anything that looks like one. I hit up the expert on this system, and am informed that I should use the awslog column. I look at it.

It looks like this:

awslog
converted/twitter/retweets_per_post/year=2023/month=03/day=11/retweets_per_post_fact-00045-8b3226g9.txt | Validated

I mean, firstly, what the fuck is this? Secondly, what the fuck is this? Thirdly, well, you get it. Why not just store this in a relational format? Why are they all in one column? Why do you hate me specifically?

Stop asking questions, you're only going to hurt yourself.

I am expected to use regular expressions to construct a key in my query. As far as I can tell, the numbers and letters don't represent or uniquely identify anything, they've really just been appended for no reason. I waste a fair amount of time figuring out if I can use them.

December 2nd, I tell myself. Of course, I could be working on a book, shipping a hobby project, and dedicating more time to the business we're committing to in January, but December 2nd was the plan. It will be a great exercise to come up with a plan to gradually refactor all of this while delivering the things we're supposed to. It will make me a better engineer. December 2nd, December 2nd, hold fast.

III. I'm Out

Okay, we can write a regular expression to identify all Twitter sources that came from 11/03/2023. This is very stupid, but compared to minimum wage in my home country, I am being compensated spectacularly to deal with this particular brand of stupidity.

But wait, we retrieve this once every two hours, which means that while I can find all the Twitter data pulls from the 11th, I can't actually tell which rows are associated with the 8 AM run versus the 2 PM run. This perplexing awslog column only identifies things down to the day, not the hour. We have another column that logs the exact time down to the second that a Lambda function has fired, but each step happens at a slightly different time, and each source takes different amounts of time based on filesize.

I message the team. "Any ideas for how to identify specific runs that don't assume there is only one run per day?"

We take a ten minute break. We return.

I am informed that there is no way to do this. All I can think of is to create a heuristic per data source, such that I see when the file was acquired then scan for the validation event that happens closest to the acquisition event without going so far ahead that I read the next successful validation event by mistake. I just wanted to see if data was landing in the platform. And to make things worse, I suddenly remember that I've seen this awslog thing before. A month after I joined the business, I saw it, and I said that it was unacceptably bad. The response was that it's okay because all the data we want is technically inside those strings, and this design is more flexible. Of course, since then we've added our first data source that is downloaded more than once a day, so it turns out, shockingly, that they should have Just Used Postgres and not tried to be excessively clever. As always.

How have we been running things like this for two years? Millions of dollars were spent on this system. Our CTO, who has never written code themselves, gets on stages every few months and just lies to people about things that the CTO can't possibly understand, pretending that any of this works and that they're a leader in the space. Then their friends buy the same software — I know because recruiters keep calling to ask me if I'll help lead the efforts. Almost every large business in Melbourne is rushing to purchase our tooling, tools like Snowflake and Databricks, because the industry is pretending that any of this is more important than hiring competent people and treating them well. I could build something superior to this with an ancient laptop, an internet connection, and spreadsheets. It would take me a month tops.

I've known for a long time that I can't change things here. But in this moment, I realize that the organization values things that I don't value, and it's as simple as that. I could pretend to be neutral and say that my values aren't better, but you know what, my values are better. Having tested code is better. Having comprehensible logs is better. I'm wasting their money sitting around until December, which is unethical. I'm disrespecting myself waiting two more months for a measly Christmas break payout, which is unwise. I've even degraded team morale because I've convinced some of the engineers that things should be better, but not management, so now some of the engineers are upset. I'm a net negative for this team, except for that one time I saved them so much money that it continues to cover all three of our managers' salaries combined.

As an afterthought, the person who just informed us that we have no way to associate logs to their respective ingestion events adds:

"By the way, I think that there's a chance some of the logs don't actually report the right things. Like the ones that say Validated: True are actually just hardcoded strings in the Lambda functions, and the people that wrote them may have meant to type in things like File Landed: True but made mistakes."

I am dumbstruck. The other senior is laughing hysterically.

It is 11:30 AM in Melbourne, 9th October, 2024. The wind is a vortex of ghost-knives sending birds careening from the sky. I glance down at my tea, and it is liquid hatred. I take a sip and savor it.

"Hey, are you still there?", my pairing partner replies.

"Yeah. Yeah. Listen, I'm done. I'm out today."

"What? What about December?"

"I could get the entire terrible first draft of a whole book out by December if I wasn't wasting time on this."

"... Fair."

I briefly consider contacting my partner, but I know she'll support me. I could check in with my parents, but they'd just worry for no reason. I could chat with my co-founders, but they're just going to tell me to do what I need to do. I could sleep on it, but that would just be to give myself the illusion of responsibility even as I barrel towards wasting two more months to earn money that, thanks to five years of diligently navigating various Pain Zones, I don't even need.

I resign at 2:00 PM.

IV. Blessed Freedom

It is 3:00 PM in Melbourne, 9th October, 2024. I have called my director, who is highly competent, and explained why every engineer wants to quit, and finalized the paperwork. My last day is the 5th of November, 2024. My only job title is now director of my own consultancy, and in January my savings will start to tick down. I glance down at my tea, and it is tea. I take a sip and savor it.

PS:

Firstly, I gave a talk at GDG Melbourne which you can watch here. The audio quality is not great, so I forgive anyone who taps out. The comments are weird because I asked people to flip a coin and respond with "This guy is the next Steve Jobs" or "This guy seems like a real piece of work", which I only regret a little bit. I should not be allowed to run a business.

Secondly, I gave a webinar to US board members at the invitation of the Financial Times. Suffice it to say that while people are sincerely trying their best, our leaders are not even remotely equipped to handle the volume of people just outright lying to them about IT. Also apparently my psychotic blog does not disqualify me from Financial Times affiliation, which is wild, but is maybe a useful lesson that the world is desperate for sincerity even when it isn't dressed up as corporate maturity.


  1. We do actually say this every morning. 

  2. The people that expect to get all of them are probably not doing themselves any favors either. 

Read the whole story
luizirber
94 days ago
reply
Davis, CA
Share this story
Delete

Simplify Your Bioinformatics Workflow with Pixi: A Fresh Take on Conda

1 Share
How to adopt pixi on an HPC cluster near you. Simplify your bioinformatics workflow today, with Pixi!
Read the whole story
luizirber
146 days ago
reply
Davis, CA
Share this story
Delete

I Want Enemies

2 Shares
Read the whole story
luizirber
160 days ago
reply
Davis, CA
Share this story
Delete
Next Page of Stories