If the original sin of Web 1.0 was the pop-up ad, the original sin of Web 2.0 was the move to algorithmic feeds. Opaque optimization strategies, aimed at maximizing private revenue from platforms that were otherwise billed as public goods, became increasingly toxic, spawning discourse about echo chambers and filter bubbles.
Now at its zenith, the feed writ large has carried what were once proof-of-concept Rails apps and PHP dorm-room projects to an unprecedented concentration of wealth in a new gilded age. The election this week only further underscores the need to move beyond the mode of content provisioning that dominated the last 20 years and to figure out how to build new channels of communication away from the media strip-mining operation that has laid waste to our ecosystem.
To me, federated social media feels like the right way out. A system that is inherently designed to resist the concentration of power that got us into this mess feels like the right next step towards a better internet landscape. To that end, the remarkable growth of Bluesky over the past few weeks feels like an inflection point. I have had an account since April 2023, but I didn't use it - after all, none of my friends were there. But sometime in the last while, the network effects, at least locally, finally kicked in. Just like that, I've totally transitioned, and I couldn't imagine going back. It feels like Old Twitter, in the best way possible.
To be sure, Bluesky isn't the only attempt at more federated tools, but to me it strikes the right balance: the ease of use that is mandatory for non-dweebs to get on board, while still having many of the properties that will make it more resistant to the diseases that have taken over the other platforms. To that end, I've been looking for ways to contribute for the last few weeks, and I think I've found my niche.
If the algorithmic feed is the original sin of Web 2.0, what does a path forwards look like? I've read purists time and again insisting on the straight dough - time-descending, all posts from followed accounts. Of course, in practice, this is such a narrow way forwards that it ignores the practical constraint of needing to winnow posts down to relevance in at least some fashion. People like feeds; they just don't like several properties of the feeds. What are those properties, and how can we work out a way forwards?
Principally, the feed does not work for you -- you work for the feed. On any moderately successful platform, you are the paypig for the feed. How can we use federated tools to invert this power dynamic?
My thoughts on this are getting firmed up in my sky-feeder repo. On Bluesky, instead of the feed running internally on the company's servers, and as such being subject to company needs and whims, feeds can run on your own machines. Of course, there is the burden of hosting the machines that operate the feeds -- but at the same time, there is the liberation of being able to dictate exactly what the feed is. In sky-feeder, I am envisioning different types of algorithmic modules that can be chained together with arbitrarily nested boolean logic operators. By building out the fundamental building blocks, I'm hoping we can give engineers, and eventually non-technical folks, the pieces to design their own rules - hard-stop regular expression filters, loose "more like this" transformer-based vector similarity filters, or even a custom probability-based ML classifier trained on whatever arbitrary optimization strategy you could come up with.
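To make the chaining idea concrete, here is a minimal sketch in Python of how a nested boolean filter tree might be walked against a single post. It is purely illustrative and not sky-feeder's actual API; the evaluate function, the post dict, and the rule literal are all hypothetical stand-ins.

# Minimal sketch (not sky-feeder's actual API): walk a nested boolean
# filter tree, expressed as plain dicts, against a single post.
import re

def evaluate(node, post):
    """Recursively evaluate a filter node against a post dict."""
    if "and" in node:
        return all(evaluate(child, post) for child in node["and"])
    if "or" in node:
        return any(evaluate(child, post) for child in node["or"])
    if "not" in node:
        return not evaluate(node["not"], post)
    if "regex_matches" in node:
        var, pattern = node["regex_matches"]
        return re.search(pattern, post[var["var"]]) is not None
    if "regex_negation_matches" in node:
        var, pattern = node["regex_negation_matches"]
        return re.search(pattern, post[var["var"]]) is None
    raise ValueError(f"Unknown filter node: {node}")

post = {"text": "An important update about the rollout"}
rule = {
    "and": [
        {"regex_matches": [{"var": "text"}, r"\bimportant\b"]},
        {"regex_negation_matches": [{"var": "text"}, r"\bunwanted_term\b"]},
    ]
}
print(evaluate(rule, post))  # True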
As of this morning, I've created the first feed using this library. Feeds in sky-feeder use declarative manifest.json files that look like the following:
{
  "filter": {
    "and": [
      {
        "regex_matches": [
          {"var": "text"},
          "\\bimportant\\b"
        ]
      },
      {
        "regex_negation_matches": [
          {"var": "text"},
          "\\bunwanted_term\\b"
        ]
      },
      {
        "text_similarity": [
          {"var": "text"},
          {
            "model_name": "all-MiniLM-L6-v2",
            "anchor_text": "This is an important update"
          },
          ">=",
          0.3
        ]
      },
      {
        "model_probability": [
          {"model_name": "news_without_science_model"},
          ">=",
          0.9
        ]
      }
    ]
  },
  "models": [
    {
      "model_name": "news_without_science_model",
      "training_file": "prototype_labeled_dataset.json",
      "feature_modules": [
        {"type": "time_features"},
        {"type": "vectorizer", "model_name": "all-MiniLM-L6-v2"},
        {"type": "post_metadata"}
      ]
    }
  ],
  "author": {
    "username": "devingaffney.com",
    "password": "app-password"
  }
}
In this example, we provide for regular-expression-based filters, sentence-transformer-based vector-distance filters, and the very primitive beginnings of full-on ML models.
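For instance, here is a hedged sketch of how the text_similarity clause above could be scored with the sentence-transformers library; the text_similarity_passes helper is a hypothetical stand-in, not the real sky-feeder code.

# Illustrative only: scoring a post against an anchor text with
# sentence-transformers, mirroring the text_similarity clause above.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def text_similarity_passes(post_text, anchor_text, threshold=0.3):
    """Embed both texts and compare cosine similarity to the threshold."""
    embeddings = model.encode([post_text, anchor_text])
    score = util.cos_sim(embeddings[0], embeddings[1]).item()
    return score >= threshold

print(text_similarity_passes(
    "Breaking: an important update on the new policy",
    "This is an important update",
))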
In my Big Grand Vision, we use this type of approach to give people full autonomy - people who are good at coming up with interesting machine learning models that optimize for "meaningful engagement," whatever that may end up looking like, can publish their packaged-up models, which can then be ingested and deployed into anyone else's manifest. People can remix, share, modify, build upon, and publish their own feeds, and through radical transparency about recipes and democratized experimentation... maybe we can find a way out?
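To gesture at what training and packaging one of those models could look like, here is a rough sketch of fitting a probability classifier from a labeled dataset like the training_file above and saving it for reuse in someone else's manifest. The dataset shape, the embeddings-only features (ignoring the time and metadata modules), and the output file name are all assumptions, not the project's implementation.

# Rough sketch, assumptions throughout: train a probability classifier
# from a labeled JSON dataset and package it for reuse elsewhere.
import json

import joblib
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

# Assumed dataset shape: [{"text": "...", "label": 0 or 1}, ...]
with open("prototype_labeled_dataset.json") as f:
    rows = json.load(f)

encoder = SentenceTransformer("all-MiniLM-L6-v2")
X = encoder.encode([row["text"] for row in rows])
y = [row["label"] for row in rows]

clf = LogisticRegression(max_iter=1000).fit(X, y)

# predict_proba yields the score a model_probability clause would threshold.
print(clf.predict_proba(encoder.encode(["A new preprint on exoplanets"]))[:, 1])

# Save the trained classifier so it can be shared and dropped into
# another manifest.
joblib.dump({"encoder_name": "all-MiniLM-L6-v2", "classifier": clf},
            "news_without_science_model.joblib")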
I don't know. Maybe none of this works! But it's worth a shot. It's at least a colorable path forwards. I'm excited to see what we can all come up with.