Well. Hello everyone.

So with all the nonsense happening on Twitter lately, I decided I’d give Mastodon a shot, like hundreds of thousands of others it seems. Except, even though I have a Twitter account, I hardly ever use it. I never really got into it.

Really, I just don’t do social media in general. I had a Facebook account too, used it for a while, but then gradually just…stopped. I had an unused MySpace back when that was a thing. Google+, too. Really, if you can think of a major social network, I’ve probably had an account there, but it never got used much. Perhaps the algorithms were to blame, I don’t know.

Well, none of that matters now. I’ve got a Mastodon account. In fact, I set up an entire Mastodon server: https://mastodon.ramble.moe. It’s a single-user instance for now, though I might invite some close friends on later depending on how it goes. Let me regale you with the story of how I set up Mastodon, which took 3 days, and how that in a weird way led to the birth of this blog.

Let’s join the Fediverse!

I had heard about the Fediverse for a while, and I had been considering setting up an instance, or at least joining one, for most of that time. With all the hubbub about Mastodon around the Internet, now seemed like a good time. So I started by examining my options.

https://fediverse.info/explore/projects

Huh. Well. Those are some options. I knew about PeerTube, but wasn’t interested in setting one up. I also knew about Mastodon, not just from the Twitter stuff happening now but from following Hacker News and Reddit for years now. Everything else? New to me. Time to do some research!

I hunted around a lot. Reddit was super useful here, but there were blog posts and social media posts all over the place to help inform my decision. Eventually, I narrowed my options down to four, in order of preference, with an optional fifth:

Soapbox/Rebased - a fork of Pleroma and the most polished-looking from what I can find
Akkoma - a different fork of Pleroma with different features and better deployment strategies
Pleroma - an ActivityPub-compliant piece of software that should be much more performant than Mastodon
Mastodon - the current poster child and the most well-known
Pixelfed - maybe I want to try an Instagram-esque thing instead of microblogging?

Honorable mention goes to Misskey, which I ruled out at this point because I couldn’t find any servers where I could explore the public feed easily. Even if I did find one, being able to quickly rule an option out helped a bit. Admittedly, I do like Misskey’s theming, but I can live without it.

First, I decided to explore Rebased. Let me just say - none of these tools have good deployment strategies. In fact, let’s take an aside to talk about my infrastructure!

8 ways to k8s

I deploy everything into Kubernetes. I do mean everything. None of my services actually run outside of Kubernetes, and this helps me keep track of where everything is. Even this blog is just a container in my Kube cluster!

I’ll make another post later about how terrible of an idea running self-hosted k8s is, but for now let’s just accept that’s the way things are.

This means that any service I choose must run in containers, and must be deployable into Kubernetes. I can’t go and manually install anything, I won’t “just spin up a VPS,” and I refuse to use a Raspberry Pi for this. With that in mind, let’s get back to the story.

Back onto my Soapbox.

I started by looking at the Soapbox installation docs. Those only describe installing in the “traditional” way, which doesn’t help me much. Time to search for some help!

This is when I learn an unfortunate truth. Apparently, the developers for both Pleroma and Rebased are problematic. I didn’t really go looking for posts by either of them, or start digging into their backgrounds at all. I can’t really prove this one way or another as a result. That doesn’t actually matter, though.

I might disagree with someone’s politics, I might disagree with their views. If they’re keeping their personal and professional lives separate, I typically don’t mind (within reason, of course). Here, though, there’s a problem. The development of a social network necessarily involves the intertangling of your personal and professional life, at least to some extent.

In part because of this, a lot of servers will just straight block installs of Pleroma or Rebased, regardless of what’s on them. I don’t really want to deal with that. I want to be able to follow who I want and not have to worry about server admins defederating with me just because of my choice of software. Between that and what I’ve read about some of the developers, I mentally bump these down a few notches on my list.

That’s not enough to completely deter me though! Rebased is probably out, but some of the bigger instances, like mastodon.tech, run on Pleroma and are fine. Maybe that, or the Akkoma fork where the developers are less disliked in the community, will be good!

So I investigated. Turns out, Pleroma doesn’t quite work with containers (correctly) on its own because of Elixir. I ruled it out.

Akkoma supports docker! This was an exciting development, because Pleroma and the Akkoma fork both have better resource utilization than Mastodon and this means I should be able to deploy Akkoma in Kubernetes! Except, nope. If you go through those docs, you’ll see that this doesn’t quite work in a way that’s conducive to Kubernetes. I need to be able to just deploy and let it go, with no interaction. This might be possible with some effort, but I decided to move on to the next option instead.

Time to stop avoiding the Mastodon in the room.

I guess I’m using Mastodon after all. So, I needed to figure out how to deploy it into Kubernetes. Considering it’s meant to be “horizontally scalable,” this should be easy to do, right?

Well, “easy” is relative. After hunting around, I discovered that there’s an official helm chart for Mastodon. That’s great! Nothing else so far has had anything remotely close to that. They don’t make it easy to find–it’s not in the docs at all–but it’s there.

Getting the helm chart working was interesting in its own right. First, they don’t actually publish the chart, a sort of critical part of using Helm. It isn’t strictly necessary, but it makes life much, much easier if the chart is published somewhere officially. Luckily, one of the devs of the chart, Alex Nordlund, has a helm chart for it at https://charts.xd.cm/mastodon/. That helps, and gives us a starting point!

By the way, if you go this route, make sure you put Mastodon in its own namespace. I started out just using my default Kubernetes namespace (I know I know), and man, that was a mistake. This helm chart spins up a ton of pods. Sixteen of them in fact! It’s best practice just to have these things in their own namespace anyway, so just do yourself a favor and do it right from the start. Don’t be lazy like me.

Anyway, I run the helm install command, aaand… huh. It’s not spinning up postgres or redis, and the db-migrate job is failing as a result. After debugging this for a while, it turns out that I need to specify the storageClass for these two containers (and later I realize I need to do this for elasticsearch too). Okay! That’s done. Great, now all the containers are spinning up!

It’s still not working, though. What’s going on in the db-migrate that’s failing? Role 'mastodon' does not exist. Uh, isn’t the helm chart supposed to configure this for me?

Looks like I’ll have to take the Helm.

Everything else seems to be spinning up fine, so I don’t think the helm chart is handling postgres correctly. After banging my head on trying various permutations in the values file to get this working, and even pulling the helm chart down and manually bumping the dependency versions, I eventually give up on the postgres in this helm chart.

Fortunately, for another service I run in Kubernetes, I already run a postgres server. I’m familiar with a container that I know does the job: the one made by Sameer Naik. So, in the Mastodon namespace, I spin up just the postgres container using the README that Sameer has written as my guide. Finally, I appropriately update the values file to not spin up its own postgres and instead connect to this one.

After fighting a bit with DNS, it manages to connect! At this point, db-migrate is working! Yesss! Okay, now I just need to let it go and see if it works.

I let it bake for about a half-hour. I don’t know if it actually took this long, I just kinda set my laptop aside and went to do other things. When I came back to it, it looked like it finished up! Did it work? Did it not? I didn’t know yet, there was one more thing to do.

Now I just need to get Traefik, my reverse proxy, to allow Mastodon to work. This is pretty simple, really–all I need to do is configure an IngressRoute, along with an HTTPS redirect. I also need to set up a webfinger redirect as described in the docs, because I want my account to just be @ramble.moe, not @mastodon.ramble.moe.

So, I build the manifest, apply it, and try to go to https://mastodon.ramble.moe, aaand… Success! Time to play with Mastodon!

Why aren’t follows working‽

So, after an hour or so I notice none of the folks I followed are actually, well, followed. At least, they aren’t listed in my profile. Federating seems to be working though, as I can see their posts in my federated timeline. What now?

Mastodon, when you try to Follow someone, tells you that you are “making a follow request” and that it is “waiting for a response.” My first thought was “Oh, this is just how Mastodon works,” and I actually believed this for over 24 hours while I kept playing with Mastodon! How embarrassing. Then, after talking with someone, it hit me: “Wait, those folks with thousands of followers. No way are they approving each individual follow, that would be their whole life! Surely some, if not most, people just have some sort of auto-accept?” Instead of doing the sane thing and looking at the settings to see if this was possible, I started by looking at the Sidekiq logs.

Sidekiq was saying that the remote servers hosting the people I was trying to follow were returning 401 errors. Now why is that? After I search around for a bit, I discover that while it could mean many things, one of the most common causes is the remote server having trouble accessing the webfinger service.

See, when a Mastodon server tries to find a specific user on another server, it uses a protocol called WebFinger to do so. WebFinger gets its name from the old finger service and command in the early days of networking. finger itself isn’t frequently used anymore for various reasons, and WebFinger is actually a fair bit different from the original finger. At their heart, though, they both can give you the same basic info: does this person exist at this location on the internet.

WebFinger does this by trying to hit a specific resource via HTTPS: /.well-known/webfinger. If that resource exists, whatever service is providing WebFinger, Mastodon in this case, will perform a user lookup and return whatever information is appropriate.

What’s important to my problem, though, is that remote Mastodon instances do not know about mastodon.ramble.moe. When I follow someone, all they see is a follow request from an account at ramble.moe, not one at mastodon.ramble.moe. What Mastodon is doing under the hood is taking my full account name and converting it to a standard HTTPS WebFinger request against the only domain it’s aware of - ramble.moe.
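To make that conversion concrete, here is a minimal Python sketch of how a handle maps to a WebFinger URL. It is an illustration of the lookup format, not Mastodon's actual code; the function name `webfinger_url` and the username `alice` are made up for the example.

```python
from urllib.parse import urlencode


def webfinger_url(handle: str) -> str:
    """Build the WebFinger lookup URL for a handle like '@user@example.com'.

    Hypothetical helper for illustration only.
    """
    # Drop the optional leading '@', then split the user from the domain.
    user, sep, domain = handle.lstrip("@").partition("@")
    if not (user and sep and domain):
        raise ValueError(f"expected '@user@domain', got {handle!r}")
    # The request goes to the domain in the handle, which is not
    # necessarily the host that actually runs the server.
    query = urlencode({"resource": f"acct:{user}@{domain}"}, safe=":@")
    return f"https://{domain}/.well-known/webfinger?{query}"


# 'alice' is a placeholder username.
print(webfinger_url("@alice@ramble.moe"))
```

Note that the resulting URL points at ramble.moe itself, so something on that bare domain has to answer, or redirect, the request.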

This is why the redirect was so important earlier. So why isn’t it working now?

I messed around with this for a bit before realizing that Traefik wasn’t actually performing the redirect. See, I told it to do a RedirectRegex for the webfinger endpoint, except I forgot a very, very important detail.

RedirectRegex is what Traefik calls “Middleware”. That means that it handles the request after Traefik picks it up at the entrypoint, but before it sends it on to the appropriate route, or endpoint. Traefik can’t actually perform “Middleware” tasks if there’s no pipeline for the request to go into, that is, if there’s no entrypoint and no endpoint behind it. The IngressRoute needs to define all three things.

Until this point, ramble.moe didn’t actually exist as a service; only various subdomains of it did. In order for WebFinger to work, that had to change. Just to get something going, I spun up the basic nginx container with nothing in it at all, so that Traefik could create the route.
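For the curious, the rewriting that a regex redirect performs can be sketched in a few lines of Python. This is not Traefik configuration, and the pattern and replacement below are assumptions chosen to mirror the redirect described here rather than my exact rule; Traefik's own replacement syntax also differs from Python's `\1`.

```python
import re
from typing import Optional

# Assumed pattern: match the WebFinger path on the bare domain and
# capture whatever follows it (the query string).
PATTERN = re.compile(r"^https://ramble\.moe/\.well-known/webfinger(.*)$")
# Assumed target: the same path on the Mastodon host, query string kept.
REPLACEMENT = r"https://mastodon.ramble.moe/.well-known/webfinger\1"


def redirect_target(url: str) -> Optional[str]:
    """Return where `url` would be redirected, or None if the rule does not apply."""
    if PATTERN.match(url) is None:
        return None
    return PATTERN.sub(REPLACEMENT, url)


# 'alice' is a placeholder username.
print(redirect_target("https://ramble.moe/.well-known/webfinger?resource=acct:alice@ramble.moe"))
print(redirect_target("https://ramble.moe/"))
```

The rule itself is simple. The catch was only that it never got a chance to run until a route for ramble.moe existed.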

It worked! My follows slowly came in as Sidekiq reprocessed the jobs. People can follow me now, too! And people actually are! Amazing, who would want to listen to me? I’m happy about it, though.

Mastodon now appears to be working.

What do I do with nginx now?

So now I’ve got an nginx container sitting there doing nothing. This isn’t good. In theory I could just spin up an IngressRoute pointing at a Service that doesn’t have any associated Pods, but where’s the fun in that? Not to mention that means I’ve got random bespoke objects lying around that are just kinda taking up space.

After thinking about it for a bit, I decided on a blog. I was going to do a landing page that might redirect to Mastodon, or maybe a links page to help me get around my infrastructure. The blog seemed like the best option, though.

I’ve tried blogging a few times before, and never really followed through. I was trying those old blogs on my “professional” website, so felt I had to stick to a high standard. I think this was ultimately just too limiting, and in the end I never felt anything was high enough quality to stick up there, or felt that I would be too biased about things one way or the other. There’s no way to separate that website from my identity either, the domain is my real name.

Over here though, I don’t feel quite as burdened by professionalism. It’s not like my real identity is too hard to find (though I would appreciate it if you didn’t try). I’ve actually linked this identity to my real name before, so I know going into this that I’m not truly anonymous. But there’s just that tidbit of separation, just enough pseudo-anonymity, that I think I’ll feel more comfortable making less “professional” and more “fun” posts.

I’ll still post about “professional” things from time to time, mostly tech stuff since I’m in the tech industry. I’ll stay away from anything directly related to my job though. This just doesn’t seem like the place for it.

Does this mean I’ll blog frequently? Almost certainly not. Does this mean I’ll keep it around? Only time will tell. Does this mean I might actually have fun doing it? Yeah, probably.

And so, here we are.

Welcome to my blog, born from my adventures in setting up Mastodon. A crazy experience to be sure, and certainly was not in my plans for this service. Heck, wasn’t even in my plans at all, considering my past experiences with blogging.

It’s all good though. I hope you enjoy whatever nonsense I’m rambling about whenever you stop by this way.

Find me on Mastodon: @[email protected].

Follow up.

I haven’t even published the blog yet and I already have a follow up!

While I was setting up this blog, my Mastodon instance fell over. I wasn’t too surprised, it’s a fairly brittle piece of software from what I can see. I was still annoyed, though.

Investigating, it appears the issue was now in Redis. Neither Sidekiq nor the Web container can talk with the Redis master, they’re both getting Connection Timed Out errors. My Redis master was spitting out a lot of errors that looked like:

* Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.

Well, that’s not good. This was, I presume, slowing Redis down enough that the Bitnami liveness and readiness scripts were failing, so Kubernetes marked the instance “NotReady” and the various Mastodon services crashed. So, let’s investigate why that was.

Working hard drives are important.

I noticed that Redis isn’t the only thing having issues. As I kept investigating, it turns out anything with I/O hitting persistent storage in Kubernetes is having problems. That’s suspicious, and takes the heat off of Mastodon and Redis being the problem.
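If you want a rough way to check whether storage is the culprit, timing a few write-and-fsync rounds on the suspect mount is one option. The Python sketch below is an assumption-laden illustration, not how Redis measures anything and not the exact check I ran; the function name, round count, and payload size are all arbitrary.

```python
import os
import tempfile
import time
from typing import List


def fsync_latencies(directory: str, rounds: int = 5, size: int = 64 * 1024) -> List[float]:
    """Write `size` bytes and fsync, `rounds` times; return seconds taken per round."""
    payload = os.urandom(size)
    latencies = []
    # The temporary file is created inside `directory`, so the writes land
    # on whatever storage backs that path. It is deleted on exit.
    with tempfile.NamedTemporaryFile(dir=directory) as handle:
        for _ in range(rounds):
            start = time.perf_counter()
            handle.write(payload)
            handle.flush()             # push Python's buffer to the OS
            os.fsync(handle.fileno())  # ask the OS to push it to the disk
            latencies.append(time.perf_counter() - start)
    return latencies


if __name__ == "__main__":
    # Point this at a directory on the volume you suspect.
    for seconds in fsync_latencies(tempfile.gettempdir()):
        print(f"{seconds * 1000:.1f} ms")
```

Running it once against a local disk and once against the NFS-backed volume gives a quick comparison. If the second set of numbers is wildly larger, the storage layer deserves the blame more than the application does.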

It turns out that for some reason my I/O speed has reduced drastically. I use NFS to mount persistent volumes off of an eight-disk btrfs RAID10 filesystem. Each disk is a 7200 RPM 4TB drive, from varying manufacturers to reduce the likelihood of multiple drive failures.

This isn’t the first time I’ve experienced a drastic slowdown with I/O. Each time previously has been due to a drive failure in this filesystem, and I tend to experience about one per year. The problem is, I’m usually travelling when the failure happens and so I can’t deal with it right away.

This year is no different.

I haven’t been able to pinpoint which specific drive is failing. Three of the eight are suspicious, but only one of them is the true problem. The others are probably in a pre-failure state, so should likely be replaced soon as well, but can survive for a few months at least. Not that any of that matters right now–I’m thousands of miles away from my server, how the hell do I install a new drive into it?

Well, the answer is I can’t. In the meantime, despite the fact that I didn’t want to do this and said I wasn’t going to, I’m temporarily hosting both Mastodon and this blog on a VPS. Who knows, maybe I’ll prefer to have it there and won’t mind dropping the cash for it each month. I doubt it though.

That’s where we are today. I’m migrating a bunch of my more critical services away from the server for the time being. I’m not horribly concerned with the data, because I do have backups of everything important. As long as I can get some key services going again, including this Mastodon thing that I just got set up, then I’ll be able to survive my travels.

I hope.
