Stupid Bits From Waterfall's Code Over The Last Eighteen Months
I'm going to be honest, I know you're meant to use title case in titles but I don't actually know which words you're not meant to capitalise
I was dealing with a bug report in the Discord earlier and it hit me - all currently known bugs are artefacts of some code I wrote over a year ago and shoved in a file, never to be touched again.
Or, well, no, that's a little bit of an exaggeration. But not much - working on Waterfall has taught me a lot over the last year, so I thought, while I'm working on the API rewrite, it might be amusing to go over some of the more stupid bits of code I've written, the bad decisions, and just things that made me laugh.
All the dumb shit here is fixed in the API rewrite.
The bug report that spawned this post is this one, actually. Someone knows they've reblogged something, but the button never turns green. Annoying, since, other than XKit, we were the first Tumblelog site to implement that feature to my knowledge. Let's look at what could be the problem, shall we?
Notes are stored in their own table in the database, and it's currently over a million entries long. If something is meant to be displayed to a user, it comes from this table. In the case of detecting a reblog, it searches a combination - has [blog ID] reblogged [post source ID]? If yes, it'll be green.
The problem here arises from the simple fact that the notes table is not expected to be 100% consistent. In fact, it's the lowest priority part of the site - while obviously it's not desired, it's considered that if some notes don't get logged, that's fine. Avoid it if possible, but if it happens, no big deal. All the data to reconstruct the table is elsewhere anyway. We could nuked it and have every entry regenerated in less than an hour if we needed to.
And there's the problem. Searching the notes table is inefficient and stupid, especially since it's not consistent. What we SHOULD be doing is searching the Posts table, and asking whether that blog has reblogged a post with the same source post. About 5x faster, and significantly more accurate.
NSFW tags are mandatory. Always have been. Side ramble: I've started seeing a couple blogs putting "I define NSFW as..." in their blog descriptions. No, you're wrong, go and read the rules again.
Anyway, the code, right now, has hardcoded checks for specific tags. When we overhauled the tag system a few months ago, we pre-seeded two tags - "nsfw" and "NSFW". These are tag IDs 1 and 2 respectively (others were added to the database in the order they first appeared in posts - so the first actual tag on the site has an ID of 3). If you're logged out, a minor, or just have adult content turned off, the site runs a separate query to get posts for you that deliberately excludes posts tagged with that. Over time, the query has been amended to also include tag 938 in the query, because someone on mobile did "Nsfw" at some point.
You can see where this is going. Every tag variation, it needs to be added manually to the queries. The same is true of DNR and DNI tags - variations in casing keep appearing and have to be added.
This doesn't go into people who use the tag "nsfw gif" or some such, and not the plain nsfw tag - those are completely omitted from the queries and not caught. There's a really, REALLY obvious solution I missed here, and I'm upset it took me so long to realise it.
When the post is loaded, just. Have it check the first four letters of all the tags and see if they match "nsfw", and if it is, skip it and get another one if appropriate to do so. It's so simple. I'm so dumb. Writing this I realise I haven't gotten it set up in the API rewrite to use the same tactic for DNR and DNI yet. I'm an idiot.
3. Upload Limits
This was just dumb on my part. Most people think of megabytes as being 1000 kilobytes. In actuality, it's 1024 kilobytes. A gigabyte is 1024 megabytes, and a kilobyte is 1024 bytes, etc. When I added the upload limits, I completely forgot about this fact, as well as, apparently, the fact that I have a degree in computer science, and used 1000 as the figure.
This is fixed now, but there is still an issue relating to uploading stuff near the limit that's probably some weird base64 encoding issue. Oh yeah, speaking of.
This particular bug related to when you're editing a post rather than making a new one. If you uploaded a gif in the original and go to edit it, the GIF plays normally. But when you hit save, it turns into a static image at whatever frame it was on when you hit save.
And you know what? There's at least three more I should put here. I really want to. There's another dozen I could put here but aren't as interesting. But I've held my head in my hands three times already reading stuff, so I'll leave it there.
The point is - this has been one hell of a learning experience.