
  • I was thinking of it as a drop-in replacement for “hot” just so that it doesn’t require any changes on the UI to implement. I’m a bit rusty with UI development, lol. The frontends wouldn’t have to add a new button, and the Lemmy API wouldn’t need to add a new sort type. That said, maybe that sort of thing is easy to do?

    As far as how it would work, Thiele’s elimination method is computed roughly as follows (I’m assuming that only upvotes are counted; I haven’t yet considered whether the process still works if downvotes count as a vote of “-1”, or how it could remain scalable if an abstention counts as a vote of “0.5”):

    begin with the list of posts, list of users, and list of votes
    
    # initial weighting, takes O(E)
    for each post:
        for each vote on the post:
            lookup the user that voted on the post
            based on the number of votes the user has given, determine how much the user would be made "unhappy" if the current post was removed
            # the basic idea here is that if the user didn't vote for a post, then they won't care if its removed
            # if the user did vote for a post, but also voted for 100 others, then they probably won't care if one gets removed as long as 99 remain
            # if the user did vote for a post, but only voted for 2 or 1 others, then they'll care more if this one gets removed
            # if this is the only post the user voted for, then they'll care a lot if it gets removed
            # LiquidFeedback uses a formula of "1/r", where r is the total number of votes the user has given
            # as posts get removed, the votes get removed too, so surviving votes get more weight
            # for the sake of efficiency, I'll probably use a formula like "if r > 20 then 0 else 1/r" so that users only start to contribute weight to posts once they only have 20 approvals left. Replace 20 with a constant of your choice
            add the user's resistance to the post being removed to the post
    
    # initial heap construction, takes O(C)
    construct a min-heap of the posts based on the sum of the users' resistances to the post being removed
    
    # iterative removal of posts
    while posts remain in the heap: # O(C)
        remove the first post in the heap - this is the post with the least resistance to being marked 'last' among the current set # O(logC)
        yield the removed post
    
        for each vote for the removed post: # in total, O(E) - every vote is iterated once, across the entire lifetime of the heap
            lookup the user that voted on the post
            compute this user's resistance to this post being removed
            remove this vote from the user
            based on the number of remaining votes the user has given, compute the user's resistance to the next post being removed
            compute how much the user's resistance to their next post being removed increased (let this be "resistance increase")
            if "resistance increase" is nonzero (with my formula, this happens once the user has about 20 or fewer votes remaining, and never while they have more than 20):
                for each vote for a different post by this user:
                    increase the post resistance to removal by "resistance increase"
                    perform an "increase_key" operation on the min-heap for this post # this will be O(logC)
    
                    # worst-case, each user will perform 20 + 19 + 18 + ... + 1 "increase_key" operations -
                    # they only begin once the user has 20 votes remaining

                    # when they have 20 votes remaining, they have 20 increase_key's to do
                    # when they have 19 votes remaining, they have 19 increase_key's to do
                    # etc.

                    # because 20 + 19 + ... + 1 is a constant, it doesn't contribute to the time complexity analysis
                    # so each user performs at worst a constant number of O(logC) operations
                    # so the overall time complexity of the "increase_key" operations is O(VlogC)
    
    

    For this algorithm, the yield the removed post statement returns the posts in reverse sorted order - worst to best. You could also interpret that statement as “give the post a rank of count(posts) - (i++) in the final sorting”.
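    Here’s a toy Python sketch of the loop above - not Lemmy code, just the algorithm. One liberty taken: Python’s heapq has no increase_key, so I use the standard lazy-update trick instead (push a fresh entry on every change and skip stale entries when popped), which keeps the same asymptotics since each vote triggers at most a constant number of pushes.

```python
import heapq
from collections import defaultdict

CUTOFF = 20  # users only contribute weight once they have <= 20 approvals left

def weight(r):
    """Per-vote weight for a user with r surviving approvals:
    0 above the cutoff, else 1/r (the LiquidFeedback-style formula)."""
    return 0.0 if r > CUTOFF or r == 0 else 1.0 / r

def thiele_elimination(votes):
    """votes: dict of user -> set of posts they upvoted.
    Yields posts worst-first, i.e. in reverse of the final ranking."""
    voters = defaultdict(set)  # post -> users who approved it
    for user, posts in votes.items():
        for p in posts:
            voters[p].add(user)
    remaining = {u: len(ps) for u, ps in votes.items()}  # surviving approvals
    resistance = {p: sum(weight(remaining[u]) for u in us)
                  for p, us in voters.items()}
    heap = [(res, p) for p, res in resistance.items()]
    heapq.heapify(heap)  # min-heap: least-resistance post on top
    removed = set()
    while heap:
        res, p = heapq.heappop(heap)
        if p in removed or res != resistance[p]:
            continue  # stale entry left over from a lazy update; skip it
        removed.add(p)
        yield p
        for u in voters[p]:  # every vote is visited once over the whole run
            old = weight(remaining[u])
            remaining[u] -= 1
            inc = weight(remaining[u]) - old  # the "resistance increase"
            if inc:
                for q in votes[u]:
                    if q not in removed:
                        resistance[q] += inc
                        heapq.heappush(heap, (resistance[q], q))
```

    Reversing the generator’s output gives the best-first ranking: ranking = list(thiele_elimination(votes))[::-1].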

    Thiele says the process can be used to elect a committee of size N by stopping the removal once N posts remain. But because it’s a “house monotonic” process (electoral speak for “increasing the size of the committee by one and re-running the election is guaranteed not to cost any existing member their seat”), I figure it could be repurposed to produce a ranking as well - the top item is the best one, the top two items are the best two, the top three are the best three, etc.

    To make the above process work for approvals that decay over time, we’d just treat a decayed approval as a partial approval. I still have some work to do on how exactly to integrate partial approvals into the “resistance to removing each post” calculations without ruining my time complexity. But basically it’s a proportional score voting election instead of proportional approval.
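    One shape the decay might take (purely a speculative sketch of the partial-approval idea, with an arbitrary half-life): give each upvote a value in (0, 1] that shrinks with age, and sum those fractional values instead of counting whole votes.

```python
import time

HALF_LIFE = 7 * 24 * 3600.0  # one week, in seconds - an arbitrary choice

def decayed_approval(vote_ts, now=None):
    """Value of an upvote treated as a partial approval:
    1.0 when fresh, halving every HALF_LIFE seconds."""
    now = time.time() if now is None else now
    age = max(0.0, now - vote_ts)
    return 0.5 ** (age / HALF_LIFE)
```

    A post’s resistance would then sum these fractional values (weighted as before), which is what turns the process into a proportional score election rather than proportional approval.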



  • This is exactly the info I’m looking for, thanks! I knew there’d have to be some kind of scheduled task to recompute the rankings (IIRC the lemmy docs say ~10 minutes for recomputing ranks), I just wasn’t sure where it was.

    The change that would require the least effort to implement my voting system (whether the Lemmy maintainers would accept it is another question) would be to target that scheduled task and introduce a server-wide configuration option letting admins pick whether their server’s “hot” algorithm is based on Block Approval (what we have now) or Proportional Approval (what I’m introducing). No API or frontend changes required. Then, work towards community mods being able to set that on a per-community basis.

    Something for me to experiment with, anyway.


  • I don’t think it would work for my specific algorithm, unfortunately. To compute PAV, I need access to the “raw votes” of each individual user.

    PAV doesn’t need to know the identity of the user behind each upvote, but it does need to be able to correlate which upvotes originated from the same user so that once a user is determined to be “satisfied” by having a comment they upvoted given a high rank, all of their other upvotes need to be deweighted for the remainder of the process to “make room” for other users’ opinions.
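    Concretely, the correlation only needs a stable per-user key, not an identity. Assuming the like lists can be boiled down to (person_id, comment_id) pairs (the exact response shape doesn’t matter here), grouping them into per-user vote sets is trivial:

```python
from collections import defaultdict

def build_votes(like_rows):
    """like_rows: iterable of (person_id, comment_id) pairs.
    Returns person_id -> set of comment_ids: the per-user vote sets
    that the proportional computation needs."""
    votes = defaultdict(set)
    for person_id, comment_id in like_rows:
        votes[person_id].add(comment_id)
    return votes
```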

    I checked the Lemmy API docs: that information is available at /api/v4/post/like/list and /api/v4/comment/like/list, so I could have a frontend scrape every comment and then every like of every comment in a community (ignoring how inefficient that would be) - but both of those endpoints are tagged as “Admin-only”.

    Plus, even if I could do that, my process needs the upvotes of every comment in a post (or every post in a community) before it knows which one is on top, which seems like too much to put on a client.