The Federation Deathmatch

It’s the weekend, and I have some Thoughts about federated social media. So, buckle up, I guess, it’s time to start some fights.


Recently there has been some discourse about Bluesky’s latest fundraising round. I’ve been participating in conversations about this on Mastodon, and I think I might sometimes come across as a Mastodon partisan, but my feelings are complex and I really don’t want to be boosting the ActivityPub Fediverse without qualification.

So here are some qualifications.

Bluesky Is Evil

To the extent that I am an ActivityPub partisan in the discourse between ActivityPub and ATProtocol, it is because I do not believe that Bluesky is a meaningfully decentralized social network. It is a social network, run by a company, which has a public API with some elements that might, one day, make it possible for it to be decentralized. But today, it is not, either practically or theoretically.

The Bluesky developers are putting in a ton of effort to maybe make it decentralized, hypothetically, someday. A lot of people think they will succeed. But ActivityPub (and, of course, Mastodon specifically) are already, today, meaningfully decentralized. As you can see on FediDB, there are instances with hundreds of thousands of people on them, before we even get to esoterica like the integrations Threads, WordPress, Flipboard, and Ghost are doing.

The inciting incident for this post (that a lot of people are also angry about Bluesky raising millions of dollars from Evil Guys Doing Evil Stuff Capital) is indeed a serious concern. It lights the fuse that burns towards their eventual, inevitable incredible journey. ATProtocol is just an API, and that API will get shut off one day, whenever their funders get bored of the pretense of their network being “decentralized”.

At time of writing, it is also interesting that 3 of the 4 times that the CEO of Bluesky has even skeeted the word “blockchain” were to say “no blockchain”, to reassure users that the scam magnet of “Blockchain” is not actually near their product or protocol, which is a much harder position to maintain when your lead investor is “Blockchain Capital”.

I think these are all valid criticisms of Bluesky. But I also think that the actual engineers working on the product are aware of these issues, and are making a significant effort to address them or mitigate them in any way they can. All that work can still be easily incinerated by a slow quarter in terms of user growth numbers or a missed revenue forecast when the VCs are getting impatient, but it’s not nothing, it is a life’s work.

Really, who among us could not have our life’s ambitions trivially destroyed in an afternoon, simply because a billionaire decided that they should be? If you feel like you are safe from this, I have some bad news about how money works. So we are all doing our best in an imperfect system and maybe Bluesky is on to something here. That’s eminently possible. They’re certainly putting forth an earnest effort.

Mastodon Is Stupid

Meanwhile, not nearly as much has been made recently of Mastodon refusing funding from a variety of sources, when all indications are that funding is low, and plummeting, far below the level required to actually sustain the site, and they haven’t done a financial transparency report for over a year, and that report was already nearly a year late.

Mastodon and the fediverse are not nearly in a position to claim moral superiority over Bluesky. Sure, taking blockchain VC money might seem like a rookie mistake, but going out of business because you are spurning every possible source of funding is not that wise either.

Some might think that, sure, Mastodon the company might die but at least the Fediverse as a whole will keep going strong, right? Lots of people run their own instances! I even find elements of this argument convincing, and I think there is probably some truth to it. But to really believe this argument as claimed, that it’s a fait accompli that the fediverse will survive in some form, that all those self-run servers will be a robust network that will self-repair, requires believing some obviously false stuff. It is frankly unprofitable to run a Fediverse instance. Realistically, if you want to operate a Mastodon server for yourself, it is going to cost at least $100/year once you include stuff like having a domain name, and managing the infrastructure costs is a complex problem that keeps getting harder to manage as the software itself gets slower.

Cory Doctorow has recently argued that this is all worth it, because at least on Mastodon, you’re in control, not at the whims of centralized website operators like Bluesky. In his words,

On Mastodon (and other services based on Activitypub), you can easily leave one server and go to another, and everyone you follow and everyone who follows you will move over to the new server. If the person who runs your server turns out to be imperfect in a way that you can’t endure, you can find another server, spend five minutes moving your account over, and you’re back up and running on the new server

He concludes:

Any system where users can leave without pain is a system whose owners have high switching costs and whose users have none

(Emphasis mine).

This is a beautiful vision. It is, however, an incorrect assessment of the state of the Fediverse as it stands today. It’s not true in two important ways:

First, if you look at any account of a user’s fediverse account migration, like this one from Steve Bate or this one from the Ente project or this one from Erin Kissane, you will see that it is “painful for the foreseeable future” or “wasn’t as seamless as advertised”, and that “the best time to […] migrate instances […] is never”. This language does not presage a pleasant experience, as Doctorow puts it, “without pain”.

Second, migration is an active process that requires engagement from the instance that hosts you. If you have been blocked or banned, or had your account terminated, you are just out of luck. You do not have control over your data or agency over your online identity unless you’ve shelled out the relatively exorbitant amount of money to actually operate your own instance.

In short, ActivityPub is no panacea. A federated system is not really a “decentralized” system, as much as it is a bunch of smaller centralized systems that all talk to each other. You still need to know, and care, about your social and financial relationship to the operators of your instance. There is probably no getting away from this, like, just generally on the Internet, no matter how much peer-to-peer software we deploy, but there certainly isn’t in the incomplete mess that is ActivityPub.

JOIN, or DIE.

Neither Mastodon (or ActivityPub) nor Bluesky (or ATProtocol) has a comprehensive solution to the problem of decentralized social media. These companies, and these protocols, are both deeply flawed and if everything keeps bumping along as it is, I believe both are likely to fail. At different times, on different timelines, and for different reasons, but fail nonetheless.

However, these networks are both small and growing, and we are not yet in the phase of enshittification where margins are shrinking and audiences are captured and the screws must be tightened to juice revenue. There are still possibilities. Mastodon is crowdfunded and what they lack in resources they make up for in flexibility and scrappiness. Bluesky has money and while there will eventually be a need to monetize somehow, they have plenty of runway to come up with that answer, and a lot of sophisticated protocol work has been done. Not enough to make a complete circuit and allow users true, practical decentralization, but it’s not nothing, either.

Mastodon and Bluesky are both organizations with humans in them, and piles of data that is roughly schema-compatible even if the nuances and details are different. I know that there is a compatible model because, thanks to both platforms being relatively open, there is a functioning ActivityPub/ATProtocol bridge in the form of Brid.gy Fed. You can use it today, and I highly recommend that you do so, so that “choice of protocol” does not fully define your audience. If you’re on Bluesky, follow this account, and if you’re on Mastodon or elsewhere on the Fediverse, search for and follow @bsky.brid.gy@bsky.brid.gy.

The reality that fans of decentralized, independent social media must confront is that we are a tiny audience right now. Whichever site we are looking at, we are talking about a few million monthly active users at best, in a world where even the pathetic husk of Twitter still has hundreds of millions and Facebook has billions. Internecine fights are not going to get us anywhere. We need to build bridges and links and connect our networks as densely as possible. If I’m being honest, Bridgy Fed looks like a pretty janky solution, but it’s something, and we need to start doing something soon, so we do not collectively become a permanent minority that mass markets can safely ignore.

As users, we need to set an example, so that the developers of the respective platforms get their shit together and work together directly so that workarounds like Bridgy are not required. Frankly, this is mostly on the ActivityPub and Mastodon devs, as far as I can tell. Unfortunately, not a lot of this seems to be public, or at least I haven’t witnessed a lot of it directly, but I have heard repeatedly that the ActivityPub developers are prickly, and this is one high-profile public example where an ActivityPub partisan is incredibly, pointlessly hostile and borderline harassing towards someone (Mike Masnick, a long-time staunch advocate for open protocols and open patents, someone with a Mastodon account, and thus as good a prospective ally as the ActivityPub fediverse might reasonably find) explaining some of the relative benefits of Bluesky.

Most of us are technology nerds in one way or another. In that way we can look at signifiers like “ActivityPub” and “ATProtocol”, and feel like these are hard boundaries around different all-encompassing structures for the future, and thus tribes we must join and support.

A better way to look at this, however, is to see social entities like Mastodon gGmbH and Bluesky PBC — or, more to the point, Fosstodon, SFBA Social, Hachyderm (and maybe, one day, even an instance which isn’t fully just for software development nerds), as groups that deploy these protocols to access some data that they publish, just as they might publish their website over HTTP or their newsletters over SMTP. There are technical challenges involved in bridging between mutually unintelligible domain models, but that is, like, network software's whole deal. Most software is just some kind of translation from one format or context to another. The best possible future for the fediverse is the one where users care as much about the distinction between ATProtocol and ActivityPub as they do about the distinction between POP3 and IMAP.

To both developers and users of these systems, I say: get it together. Be nice to each other. Because the rest of the social media ecosystem is sure as shit not going to be nice to us if we ever see even a hint of success and start to actually cut into their user base.

Acknowledgments

Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor!

Against Exploration

Against Exploration / Cat: The urge to explore - / Cat: an INVIOLABLE part of human nature Girl: For jerks! / Cat: to take a chance Girl: There's always a couple of jerks who won't SHUT UP about it / Cat: See what there is to see - Girl: So you give them a boat and off they go! THEN everything is FINE / Cat: TAKE what there is to TAKE Girl: Until some OTHER place's jerks show up in THEIR boat! / Girl: ugh / Cat: What lies beyond the sea? What lies beyond the stars? Girl: MORE JERKS / Cat: I won't settle for less. Girl: You've settled PLENTY
jepler
I feel so rebuked, as a person who used to be so excited about space colonialism

Get Me Out Of Data Hell

It is 9:59 AM in Melbourne, 9th October, 2024. Sunlight filters through my windows, illuminating swirling motes of dust across my living room. There is a cup of tea in my hand. I take a sip and savor it.

I text the other senior engineer on the team, who unlike me is full-time: "I'm ready to start at 10", as is our custom.

The minute hand moves.

It is 10:00 AM in Melbourne, 9th October, 2024. The sun is immediately extinguished and replaced by a shrieking skull hanging low in a frigid sky. I glance down at my tea, and it is as blood. I take a sip and savor it.

I text the other senior engineer on the team: "Are you ready to enter the Pain Zone?"[1], as is our custom.

I. The Pain Zone

The Pain Zone, coated in grass which rends those who tread upon it like a legion of upraised spears, is an enterprise data warehouse platform. At the small scale we operate at, with little loss of detail, a data warehouse platform simply means that we copy a bunch of text files from different systems into a single place every morning.

The word enterprise means that we do this in a way that makes people say "Dear God, why would anyone ever design it that way?", "But that doesn't even help with security" and "Everyone involved should be fired for the sake of all that is holy and pure."

For example, the architecture diagram which describes how we copy text files to our storage location has one hundred and four separate operations on it. When I went to count this, I was expecting to write forty and that was meant to illustrate my point. Instead, I ended up counting them up three times because there was no way it could be over a hundred. This whole thing should have ten operations in it.

Retrieve file. Validate file. Save file. Log what you did. Those could all be one point on the diagram, but I'm being generous. And you can keep the extra six points for stuff I forgot. That's ten. Why are there a hundred and four? Sweet merciful Christ, why?

At two of the four businesses I've worked at, the most highly-performing engineers have resorted to something that I think of as Pain Zone navigation. It's the practice of never working unless pair programming simply to have someone next to you, bolstering your resolve, so that you can gaze upon the horrors of the Pain Zone without immediately losing your mind. Of course, no code alone can make people this afraid of work. Code is, ultimately, characters on a screen, and software engineers do nothing but hammer that code into shapes that spark Joy and Money. The fear and dread comes from a culture where people feel bad that they can't work quickly enough in the terrible codebase, where they feel judged for slowing down to hammer the code into better shapes that sadly aren't on the Jira board, and where management looks down on people who practice craftsmanship.

The last doesn't even require malicious management — it just needs people that don't respect how deep craftsmanship can go. These are the same people that do not appreciate that an expert pianist is not simply pressing keys, they are obsessively perfecting timing and the force applied to each key, alongside dozens of factors that I can't comprehend. An unthoughtful person will see something and think "It can't be that hard" or more generously "That looks quite hard". The respectful thought to have when viewing any competent professional in a foreign domain, in every domain that I'm aware of, is "That must be way harder than it looks."

I have now seen enough workplaces, and thanks to the blog have access to enough executives, to know that this is what most cultures degenerate into. Terrible companies are perpetual cognitohazards where everyone is bullied all day. The median companies (which some people call "good" for lack of ever having seen better) lack the outright bullying but still consist of people that are trying to convince themselves that it's fine to feel disempowered or subservient all day. There are times in my life where I have to deal with this, but it is hardly fine. The best are places where you can get at least some of the things that a person needs other than rent money[2].

This place has a better culture than most, as bullying is mostly not tolerated, we have fully remote work, and it is a terrible faux pas to accuse someone of not working fast enough outright, though you can hint at it gently.

It is worse than most places at software engineering, as... oh, you'll see.

In any case, I have a deal with the team. Every morning, grab your coffee, attend your meetings, and at 10 AM we navigate the Pain Zone together for at least three to four hours. Management is blissfully unaware that this force of camaraderie and mutual psychotherapy is the only way that things continue to limp along.

II. I Am Lost To The Pain Zone

We have one simple job today. The organization wants to know:

  1. Is data coming into our system?
  2. Is any data being lost?
  3. Each data source goes through approximately thirteen steps on average — how many are getting stuck along the way?

Someone has already landed all of the logs our system produces in the data warehouse, so we can examine them in there, alongside the actual data. Is that smart? I dunno, something feels a bit weird about it but I have no concrete objection. My co-navigator and I decide to look at the logs for one data source.

This is where, five seconds in, we begin to become lost in the Pain Zone.

Let's say the data source we picked was "Google Analytics". We search the landed logs for Google Analytics, expecting to see something like this.

Source
Google Analytics

Here is what I actually see in the source column, and yes, it actually looked exactly this bad.

Source
6g94-8jjf-eo84757h4758z", "jobStatus": "JobStatus.Waiting", "jobExpiry":"2023-10

That... is not "Google Analytics". In fact, what the fuck is that? It looks like someone has dumped a random snippet of JSON into the logs, but not even the entirety of the JSON. The strings aren't terminated. We should have around fifty source systems, so how many distinct source systems appear in— FIFTY-SEVEN THOUSAND?

We've been writing total nonsense to half the logs for over a year and no one noticed? We only have two jobs. Get the data and log that we got the data. But the logs are nonsense, so we aren't doing the second thing, and because the logs are nonsense I don't know if we've been doing the first thing.

I take a deep breath. The plan is to submit my notice on December 2nd anyway, so this is fine. This is so fine. The other engineers already know I'm leaving, and we've all committed to do the best we can for two months for our spiritual growth. The ability to do painful things is a virtuous skill to cultivate as a responsible adult.

Okay, how is this happening? Well, it turns out that we're embedding a huge amount of metadata in filenames, and the Lambda functions that produce all of this — of course, we're serverless, because how can you hurt yourself without a cutting-edge? — use lots of regex to extract data. Unfortunately, because we don't have any tests, someone eventually wrote some code to download data that passed a big JSON blob instead of a filename to the logging function, and that function happily went "Great, I'll just regex out the source system from the file name!" Except it wasn't a filename, so it has instead spewed garbage into the system for months.
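The failure described here (a JSON blob reaching a function that assumes it was handed a filename) is the kind of thing a strict match can reject at the door. What follows is a minimal sketch, not the team's actual code. It assumes file keys follow a `stage/source/dataset/year=/month=/day=/file` layout; the pattern, the function name `extract_source`, and the `jobId` field mentioned below are all hypothetical.

```python
import re

# Assumed key layout (hypothetical):
#   <stage>/<source>/<dataset>/year=YYYY/month=MM/day=DD/<file>
KEY_PATTERN = re.compile(
    r"(?P<stage>[^/]+)/(?P<source>[^/]+)/(?P<dataset>[^/]+)/"
    r"year=(?P<year>\d{4})/month=(?P<month>\d{2})/day=(?P<day>\d{2})/"
    r"(?P<filename>[^/]+)"
)


def extract_source(key: str) -> str:
    """Return the source system from a file key, or raise if it is not a key.

    fullmatch() requires the WHOLE string to fit the layout, so a JSON blob
    (or any other stray value) fails loudly instead of yielding garbage.
    """
    match = KEY_PATTERN.fullmatch(key)
    if match is None:
        # Truncate the echoed value so a huge blob does not flood the logs.
        raise ValueError(f"not a recognised file key: {key[:60]!r}")
    return match.group("source")
```

Under these assumptions a well-formed key yields its source, while something like `{"jobId": "...", "jobStatus": "JobStatus.Waiting"}` raises `ValueError` rather than being written to the source column.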

I find something like this every time we enter the Pain Zone. Sometimes we've laughed so hard that we've cried at the things we've seen $2,000 per day consultants do.

The issue is raised with the team, but because fixing this critical error in our auditability is not on the board and Velocity Must Be Up, fixing the logs is judged to be less important than... parsing... the nonsense logs. Why? We have another saying on our team, which is "Stop asking questions, you're only going to hurt yourself".

I take another deep breath.

Okay, we'll continue with the work instead of fixing the critical production error. We can't query the Google Analytics stuff based on source system, so let's pick another one. We also draw data from Twitter once every two hours, and that source column isn't broken for that. I just need to be able to associate the log events to begin working on that. That is, we'll have one log that says "I downloaded the data from Twitter" and another log that says "I checked that it had all the correct stuff in it", and I just have to tie them together.

I don't like the way this table is configured for various reasons, but I'm expecting to see:

Event ID | Source  | Event      | Success
1        | Twitter | Downloaded | True
1        | Twitter | Validated  | True

Then I can just do something like:

select
  event,
  success
from
  log_table
where event_id = 1 and source = 'Twitter'

Now I can see if all the correct stuff happened.
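For concreteness, the expected layout and query can be tried end to end. This is a minimal sketch using Python's built-in `sqlite3` with an in-memory database; SQLite, the column types, the third sample row, and the helper name `check_run` are assumptions made for illustration, and the real warehouse is something else entirely.

```python
import sqlite3


def check_run(conn: sqlite3.Connection, event_id: int, source: str):
    """Return (event, success) rows for one ingestion run of one source."""
    cur = conn.execute(
        "SELECT event, success FROM log_table "
        "WHERE event_id = ? AND source = ? "
        "ORDER BY event",
        (event_id, source),
    )
    return cur.fetchall()


# Build the hoped-for table in memory. SQLite has no boolean type,
# so success is stored as 1 or 0.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log_table ("
    "event_id INTEGER, source TEXT, event TEXT, success INTEGER)"
)
conn.executemany(
    "INSERT INTO log_table VALUES (?, ?, ?, ?)",
    [
        (1, "Twitter", "Downloaded", 1),
        (1, "Twitter", "Validated", 1),
        (2, "Twitter", "Downloaded", 1),  # a later run, not yet validated
    ],
)

rows = check_run(conn, 1, "Twitter")
```

With a shared `event_id`, telling a complete run from a stuck one is a single filter: run 1 returns both steps, run 2 returns only the download.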

But I cannot find an event_id column or anything that looks like one. I hit up the expert on this system, and am informed that I should use the awslog column. I look at it.

It looks like this:

awslog
converted/twitter/retweets_per_post/year=2023/month=03/day=11/retweets_per_post_fact-00045-8b3226g9.txt | Validated

I mean, firstly, what the fuck is this? Secondly, what the fuck is this? Thirdly, well, you get it. Why not just store this in a relational format? Why are they all in one column? Why do you hate me specifically?

Stop asking questions, you're only going to hurt yourself.

I am expected to use regular expressions to construct a key in my query. As far as I can tell, the numbers and letters don't represent or uniquely identify anything, they've really just been appended for no reason. I waste a fair amount of time figuring out if I can use them.
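To make the regex chore concrete, here is a minimal sketch that splits an `awslog` value shaped like the example above into named fields. The field names (`stage`, `source`, `dataset`), the `LogRecord` class, and `parse_awslog` are hypothetical labels, and the pattern is inferred from that single example, so real values may well break it.

```python
import re
from dataclasses import dataclass
from datetime import date

# Inferred from one example value; every field name here is an assumption.
AWSLOG_PATTERN = re.compile(
    r"(?P<stage>[^/]+)/(?P<source>[^/]+)/(?P<dataset>[^/]+)/"
    r"year=(?P<year>\d{4})/month=(?P<month>\d{2})/day=(?P<day>\d{2})/"
    r"(?P<filename>[^/|]+?)\s*\|\s*(?P<event>\w+)"
)


@dataclass(frozen=True)
class LogRecord:
    stage: str
    source: str
    dataset: str
    day: date
    filename: str
    event: str

    @property
    def key(self):
        # The finest-grained key the string supports: one per day, not per run.
        return (self.source, self.dataset, self.day)


def parse_awslog(value: str) -> LogRecord:
    """Split 'path/to/file | Event' into structured fields, or raise."""
    m = AWSLOG_PATTERN.fullmatch(value.strip())
    if m is None:
        raise ValueError(f"unrecognised awslog value: {value[:60]!r}")
    return LogRecord(
        stage=m["stage"],
        source=m["source"],
        dataset=m["dataset"],
        day=date(int(m["year"]), int(m["month"]), int(m["day"])),
        filename=m["filename"],
        event=m["event"],
    )
```

The `key` property deliberately stops at the day, because nothing in the string identifies the hour of the run.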

December 2nd, I tell myself. Of course, I could be working on a book, shipping a hobby project, and dedicating more time to the business we're committing to in January, but December 2nd was the plan. It will be a great exercise to come up with a plan to gradually refactor all of this while delivering the things we're supposed to. It will make me a better engineer. December 2nd, December 2nd, hold fast.

III. I'm Out

Okay, we can write a regular expression to identify all Twitter sources that came from 11/03/2023. This is very stupid, but compared to minimum wage in my home country, I am being compensated spectacularly to deal with this particular brand of stupidity.

But wait, we retrieve this once every two hours, which means that while I can find all the Twitter data pulls from the 11th, I can't actually tell which rows are associated with the 8 AM run versus the 2 PM run. This perplexing awslog column only identifies things down to the day, not the hour. We have another column that logs the exact time down to the second that a Lambda function has fired, but each step happens at a slightly different time, and each source takes different amounts of time based on filesize.

I message the team. "Any ideas for how to identify specific runs that don't assume there is only one run per day?"

We take a ten minute break. We return.

I am informed that there is no way to do this. All I can think of is to create a heuristic per data source, such that I see when the file was acquired then scan for the validation event that happens closest to the acquisition event without going so far ahead that I read the next successful validation event by mistake. I just wanted to see if data was landing in the platform. And to make things worse, I suddenly remember that I've seen this awslog thing before. A month after I joined the business, I saw it, and I said that it was unacceptably bad. The response was that it's okay because all the data we want is technically inside those strings, and this design is more flexible. Of course, since then we've added our first data source that is downloaded more than once a day, so it turns out, shockingly, that they should have Just Used Postgres and not tried to be excessively clever. As always.
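The per-source heuristic described above could look roughly like this. It is a sketch under stated assumptions: each event is a `(timestamp, name)` pair for a single source, the event names are literally `"Downloaded"` and `"Validated"`, and a run always finishes before the next one starts. The function name `pair_runs` is hypothetical. If a validation ever runs past the next download, this pairs it with the wrong run, which is exactly the fragility a real run ID would avoid.

```python
from datetime import datetime
from typing import List, Optional, Tuple

Event = Tuple[datetime, str]


def pair_runs(events: List[Event]) -> List[Tuple[datetime, Optional[datetime]]]:
    """Pair each download with the first validation before the next download.

    Returns (download_time, validation_time) tuples; validation_time is None
    when no validation falls inside that run's window.
    """
    ordered = sorted(events)
    downloads = [ts for ts, name in ordered if name == "Downloaded"]
    validations = [ts for ts, name in ordered if name == "Validated"]

    pairs = []
    v = 0  # index of the next unconsumed validation
    for i, start in enumerate(downloads):
        next_start = downloads[i + 1] if i + 1 < len(downloads) else None
        # Discard validations that happened before this download began.
        while v < len(validations) and validations[v] < start:
            v += 1
        matched = None
        if v < len(validations) and (
            next_start is None or validations[v] < next_start
        ):
            matched = validations[v]
            v += 1
        pairs.append((start, matched))
    return pairs
```

A download with no validation inside its window comes back paired with `None`, which is one way to surface a run that got stuck.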

How have we been running things like this for two years? Millions of dollars were spent on this system. Our CTO, who has never written code themselves, gets on stages every few months and just lies to people about things that the CTO can't possibly understand, pretending that any of this works and that they're a leader in the space. Then their friends buy the same software — I know because recruiters keep calling to ask me if I'll help lead the efforts. Almost every large business in Melbourne is rushing to purchase our tooling, tools like Snowflake and Databricks, because the industry is pretending that any of this is more important than hiring competent people and treating them well. I could build something superior to this with an ancient laptop, an internet connection, and spreadsheets. It would take me a month tops.

I've known for a long time that I can't change things here. But in this moment, I realize that the organization values things that I don't value, and it's as simple as that. I could pretend to be neutral and say that my values aren't better, but you know what, my values are better. Having tested code is better. Having comprehensible logs is better. I'm wasting their money sitting around until December, which is unethical. I'm disrespecting myself waiting two more months for a measly Christmas break payout, which is unwise. I've even degraded team morale because I've convinced some of the engineers that things should be better, but not management, so now some of the engineers are upset. I'm a net negative for this team, except for that one time I saved them so much money that it continues to cover all three of our managers' salaries combined.

As an afterthought, the person who just informed us that we have no way to associate logs to their respective ingestion events adds:

"By the way, I think that there's a chance some of the logs don't actually report the right things. Like the ones that say Validated: True are actually just hardcoded strings in the Lambda functions, and the people that wrote them may have meant to type in things like File Landed: True but made mistakes."

I am dumbstruck. The other senior is laughing hysterically.

It is 11:30 AM in Melbourne, 9th October, 2024. The wind is a vortex of ghost-knives sending birds careening from the sky. I glance down at my tea, and it is liquid hatred. I take a sip and savor it.

"Hey, are you still there?", my pairing partner replies.

"Yeah. Yeah. Listen, I'm done. I'm out today."

"What? What about December?"

"I could get the entire terrible first draft of a whole book out by December if I wasn't wasting time on this."

"... Fair."

I briefly consider contacting my partner, but I know she'll support me. I could check in with my parents, but they'd just worry for no reason. I could chat with my co-founders, but they're just going to tell me to do what I need to do. I could sleep on it, but that would just be to give myself the illusion of responsibility even as I barrel towards wasting two more months to earn money that, thanks to five years of diligently navigating various Pain Zones, I don't even need.

I resign at 2:00 PM.

IV. Blessed Freedom

It is 3:00 PM in Melbourne, 9th October, 2024. I have called my director, who is highly competent, and explained why every engineer wants to quit, and finalized the paperwork. My last day is the 5th of November, 2024. My only job title is now director of my own consultancy, and in January my savings will start to tick down. I glance down at my tea, and it is tea. I take a sip and savor it.

PS:

Firstly, I gave a talk at GDG Melbourne which you can watch here. The audio quality is not great, so I forgive anyone who taps out. The comments are weird because I asked people to flip a coin and respond with "This guy is the next Steve Jobs" or "This guy seems like a real piece of work", which I only regret a little bit. I should not be allowed to run a business.

Secondly, I gave a webinar to US board members at the invitation of the Financial Times. Suffice it to say that while people are sincerely trying their best, our leaders are not even remotely equipped to handle the volume of people just outright lying to them about IT. Also apparently my psychotic blog does not disqualify me from Financial Times affiliation, which is wild, but is maybe a useful lesson that the world is desperate for sincerity even when it isn't dressed up as corporate maturity.


  1. We do actually say this every morning. 

  2. The people that expect to get all of them are probably not doing themselves any favors either. 

Simplify Your Bioinformatics Workflow with Pixi: A Fresh Take on Conda

How to adopt pixi on an HPC cluster near you. Simplify your bioinformatics workflow today, with Pixi!
I Want Enemies

We Have a Mouse
