YSK: Your Lemmy activities (e.g. downvotes) are far from private
Edit: obligatory explanation (thanks mods for squaring me away)...
What you see via the UI isn't "all that exists". Unlike Reddit, where everything is a black box, there are a lot more eyeballs who can see "under the hood". Any instance admin, proper or rogue, gets a ton of information that users won't normally see. The attached example demonstrates that while users will only see upvote/downvote tallies, admins can see who actually performed those actions.
Edit: To clarify, not just YOUR instance admin gets this info. This is ANY instance admin across the Fediverse.
To anyone surprised at this: welcome to the fediverse, please treat everything you do or say as public.
The way to achieve privacy around here is by following the long-forgotten arts of the old internet before Facebook was a thing: use a nickname and don't tell strangers on the internet your real identity.
Your home instance acts as a proxy, and only it has access to your email and IP address. That does stay private.
So, as long as you trust your home instance to not leak or disclose your connection or sign up data (which would be illegal in EU countries), just sign up with an alias.
A very positive aspect of this is that it should allow us to detect voting manipulation by correlating the activity of certain potentially malicious actors. If Lemmy instances take vote manipulation seriously and do their best to block bots, this has the chance to make Lemmy / Kbin much more transparent and credible than Reddit ever was.
To illustrate OP's point, I'm going to spin up an instance, federate with everyone, and not tell anyone what that instance is.
Then I'm going to feed all that data into my new website, called Open Lemmy Stats, where anyone can query the user data I've accumulated. The homepage will be rife with insights, leaderboards and all kinds of data on prolific users.
Additionally, I'll display a snapshot/profile of a random user by feeding that user's data to GPT-4 to make inferences about their political affiliations, and display the results.
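As a rough illustration of that kind of correlation, here is a minimal sketch, assuming a hypothetical list of (account, post, score) vote records. Comparing accounts by the Jaccard similarity of their vote sets, and the 0.8 threshold, are my own choices for the example, not something Lemmy or Kbin ship.

```python
from itertools import combinations

# Invented example data: (account, post_id, score), +1 = upvote, -1 = downvote.
VOTES = [
    ("alice@a.example", "p1", 1),
    ("alice@a.example", "p5", -1),
    ("alice@a.example", "p6", 1),
    ("bot1@b.example", "p1", 1),
    ("bot1@b.example", "p2", 1),
    ("bot1@b.example", "p3", 1),
    ("bot1@b.example", "p4", 1),
    ("bot2@b.example", "p1", 1),
    ("bot2@b.example", "p2", 1),
    ("bot2@b.example", "p3", 1),
    ("bot2@b.example", "p4", 1),
]


def vote_sets(votes):
    """Group votes into one set of (post_id, score) pairs per account."""
    sets = {}
    for account, post_id, score in votes:
        sets.setdefault(account, set()).add((post_id, score))
    return sets


def jaccard(a, b):
    """Share of votes two accounts have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suspicious_pairs(votes, threshold=0.8, min_votes=3):
    """Return account pairs whose voting records overlap at least `threshold`."""
    sets = vote_sets(votes)
    flagged = []
    for x, y in combinations(sorted(sets), 2):
        # Skip accounts with too few votes to say anything meaningful.
        if len(sets[x]) < min_votes or len(sets[y]) < min_votes:
            continue
        similarity = jaccard(sets[x], sets[y])
        if similarity >= threshold:
            flagged.append((x, y, round(similarity, 2)))
    return flagged


print(suspicious_pairs(VOTES))  # [('bot1@b.example', 'bot2@b.example', 1.0)]
```

A high overlap is only a hint, not proof: two real people with similar tastes can also vote alike, so anything flagged this way would still need a human to look at it.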
Worst of all, I'm not going to reveal my instance, so nobody will know it's the one to defederate. In fact I'm spinning up a few instances that will host innocuous communities that I plan to mod and support to give my instances cover for their true purpose: redundant fediverse datastreams for my site, Open Lemmy Stats.
I'll also have a store where anyone can buy my collected fediverse data for a handsome sum.
Just kidding I'm not doing any of this. But someone absolutely will or already is.
People raise a good point that in countries where political dissent can actually be dangerous, this would very much dissuade people from voting on things they believe in, or even coming anywhere near Lemmy period.
A better approach I think would be to have the user's host instance save their votes (the database obviously needs to remember what you voted on), but when federating those votes with other instances just hand over a cumulative total, e.g., "here on vlemmy.net we have +18 votes for this comment", which the other instances can then add. There's no need to send user information with that data.
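A minimal sketch of that idea, assuming a hypothetical in-memory vote table and invented comment URLs: the home instance collapses its users' votes into one net score per comment, and a receiving instance keeps the latest total from each instance and adds those together.

```python
from collections import Counter

# Hypothetical local vote table on one instance (e.g. vlemmy.net):
# (local_user, comment_url) -> +1 or -1. URLs are invented examples.
LOCAL_VOTES = {
    ("alice", "https://lemmy.example/comment/1"): 1,
    ("bob", "https://lemmy.example/comment/1"): 1,
    ("carol", "https://lemmy.example/comment/1"): -1,
    ("alice", "https://lemmy.example/comment/2"): 1,
}


def instance_totals(votes):
    """Collapse per-user votes into one net score per comment, dropping usernames."""
    totals = Counter()
    for (_user, comment_url), score in votes.items():
        totals[comment_url] += score
    return dict(totals)


def merge_reports(reports):
    """Sum the latest totals reported by each instance.

    `reports` maps instance name -> {comment_url: net score}. A new report from
    an instance replaces its old entry in `reports`, so re-sent totals are not
    double-counted; only totals from different instances are added together.
    """
    merged = Counter()
    for totals in reports.values():
        merged.update(totals)  # Counter.update adds values instead of replacing
    return dict(merged)


reports = {
    "vlemmy.net": instance_totals(LOCAL_VOTES),
    "other.example": {"https://lemmy.example/comment/1": 5},
}
print(merge_reports(reports))
```

The trade-off is that the receiving instance has to take each reported total on trust: nothing in this sketch lets it check that the numbers correspond to real accounts.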
Reading these comments, seeing so many excuses, sarcastic responses, and handwaving, makes me realize a great deal of users really need to develop some imagination.
This is not about privacy. It's about data that can easily be used for targeting and profiling users, and how that creates countless avenues for targeted harassment and wide scale retaliation. It's about all of the innumerable ways public vote information can and will be abused to manipulate scoring across the site with targeted/automated shadow banning and shared blocklists. Raise your hand if you trust every single admin to never abuse such a tool to curate the outward appearance of an instance to fit a narrative.
For a different example: I could say something about how great Nazis are right now, and have a bot programmed to read every single person that downvoted me, add those names to a shared blocklist, and voilà, I've made myself and all my alts invisible to the people that would challenge me on a massive scale.
I promise you this is going to be a big issue as tools for this site get more sophisticated over time.
Activities are public and easily viewable on kbin. It's been interesting. Seems mostly positive other than people harassing those who down-vote them demanding explanations.
There's something amusing about people feeling violated by their activity being made public, but not necessarily by corporations hoarding and capitalizing on that activity & data. I mean, one of them is out in the open. The other is pure abuse.
It's the only way to avoid double voting from the same account or to remove the reverse vote if one changes one's mind and votes the other way.
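To make that concrete, here is a minimal sketch (a hypothetical class, not Lemmy's actual code) of why the vote record has to include who voted: keying each vote by (user, object) is what lets a repeat vote replace the old one instead of adding to it.

```python
class VoteStore:
    """Minimal per-instance vote table keyed by (user, object). Hypothetical sketch."""

    def __init__(self):
        self._votes = {}  # (user, object_url) -> +1 or -1

    def vote(self, user, object_url, score):
        if score not in (1, -1):
            raise ValueError("score must be +1 or -1")
        # Same key as any earlier vote by this user, so it overwrites, never stacks.
        self._votes[(user, object_url)] = score

    def unvote(self, user, object_url):
        self._votes.pop((user, object_url), None)

    def score(self, object_url):
        return sum(s for (_user, obj), s in self._votes.items() if obj == object_url)


store = VoteStore()
store.vote("alice", "post/1", 1)
store.vote("alice", "post/1", 1)   # double vote: still counts once
store.vote("alice", "post/1", -1)  # changed her mind: replaces the upvote
print(store.score("post/1"))  # -1
```

Without the user in the key, the server could only keep a running total and would have no way to tell a second click from a second person.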
Did you think that it was any different on Reddit and that no random employee with access to their database could run a similar SQL query with a couple of joins and end up with nicknames, e-mails and IP addresses?!
Do you know which Reddit employees have access to their database or a copy of it? Have you had a chance to vet them? I don't think so.
At least here it's a bit more transparent.
The only shocking thing in this is that anybody is shocked by it.
Not to sound harsh or anything, but those of you saying that it's okay that all this data is public are insane. This completely goes against the entire philosophy of the Fediverse and FOSS in general. The reason we are all fleeing from Big Tech is because they collect so much data on us. At least they keep it hidden from public view. This is a major issue in my opinion, and needs to be addressed ASAP before we claim to have superior platforms on the Fediverse. Why can't this data at least be encrypted?
I downvoted the beans and I don't care who knows about it. I'd do it again.
This is useful to know though, thanks. I guess assume everything is public short of your password (unless your admin is particularly nefarious and has altered the code to store passwords in plaintext for some reason).
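For context on the password part: stock server software normally stores only a salted, slow hash of your password, not the password itself. Lemmy uses bcrypt as far as I know; this minimal sketch swaps in PBKDF2 from Python's standard library, and the iteration count is an assumed value.

```python
import hashlib
import hmac
import os

# Work factor for PBKDF2-HMAC-SHA256; 600,000 is an assumed value, tune to your hardware.
ITERATIONS = 600_000


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); only these are stored, never the password itself."""
    salt = os.urandom(16)  # random per-user salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Re-derive the digest from the login attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

Note the limit of this: the plaintext password still reaches the server on every login, so modified server code could log it before hashing. Hashing protects you if the database leaks, not from a malicious admin.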
Suppose there is someone who wants to maintain their anonymity and privacy on Lemmy so that it couldn't be tied to their real identity, what do you think is the best way to do that?
Hmm, I, famous Hollywood actress Margot Robbie and star of "Barbie", sure am stumped.
Isn't that kind of the point? You don't get very far hiding in a social setting. You're on a public website talking to other people. Your posts should be public, comments, etc. At least people should treat all websites or apps they didn't develop personally like they're public. I mean you don't really have a right to privacy in public.
And I'm not trying to say this with some malicious tone or anything but it's just my view on it.
At first I agreed with the general "whatever" sentiment. It has some important implications, however.
It discourages people from voting if they're concerned about other people seeing their activity. This could result in a lower quality of scoring for posts.
So when Threads decides to federate, they can slurp all this information.
That would be massively concerning and that should be blocked. Ideally votes should remain only on the current instance. Anything shared with other instances should be anonymised. This would need to be re-architected imho.
People come here to get away from Reddit now that trust has gone. Trust and a feeling of safety is vitally important to continue to build this platform.
So any instance admin can analyze all users' upvotes/downvotes and possibly derive political standpoints, likes/dislikes, opinions and location data from them.
Redditors already scream at people when they get a downvote and blame it on the person that replies to them, even if that person didn't downvote them.
I can see this being dangerous and leading to a lot of bullying. I know kbin already publicly shows this. I can see who downvotes my comments/posts when I open up the post in a kbin instance, without even being a member.
Couldn’t we just use a hash for the usernames instead?
Nothing too over the top, but just a simple hash and match that instead?
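Here is a minimal sketch of what that could look like, assuming SHA-256 and a hypothetical account name. A plain hash is deterministic, so it hides the name but not the pattern: all of one account's votes still share a digest, and since usernames are public, anyone can hash a known name and look for it. A keyed hash (HMAC) with a secret held only by the home instance (a hypothetical setup) addresses the second problem but not the first.

```python
import hashlib
import hmac


def plain_hash(username: str) -> str:
    """Unkeyed SHA-256 of a username: the "simple hash" idea."""
    return hashlib.sha256(username.encode()).hexdigest()


def keyed_pseudonym(username: str, instance_key: bytes) -> str:
    """HMAC-SHA256 with a secret key that only the home instance holds (hypothetical)."""
    return hmac.new(instance_key, username.encode(), hashlib.sha256).hexdigest()


user = "alice@lemmy.world"  # hypothetical account
# Same input, same output: every vote by this account still links together.
print(plain_hash(user) == plain_hash(user))  # True
# Usernames are public, so anyone can hash a known name and match it.
print(plain_hash(user) == hashlib.sha256(b"alice@lemmy.world").hexdigest())  # True
# Different keys give unrelated outputs, so without the key the name can't be matched this way.
print(keyed_pseudonym(user, b"secret-1") == keyed_pseudonym(user, b"secret-2"))  # False
```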
Also, there’s way too much trust in instances. Like, one person could easily make a post on lemmy.world, go on their personal instance, and just give themselves, say, 2000 upvotes.
Instances should have their own settings on what instances are allowed to keep a local copy. (Default behavior should be to get the post itself from the instance “hosting” it).
For me, it makes so much sense. Likes and dislikes, besides serving as a means of sorting posts and comments, also serve as a shortcut for leaving a comment saying, "This^" or "I disagree."
The comment_like database table in Lemmy also has a timestamp on it, the "published" field, which discloses what time you voted. This reveals patterns of your Lemmy usage to other federated servers.
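To show how little it takes, here is a minimal sketch using an in-memory SQLite database as a stand-in for Postgres. The table name comment_like and the published column come from the comment above; the other columns and all rows are simplified assumptions, not Lemmy's exact schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, assumed schema; published is stored as text for SQLite.
conn.execute(
    "CREATE TABLE comment_like ("
    " person_id INTEGER, comment_id INTEGER, score INTEGER, published TEXT)"
)
conn.executemany(
    "INSERT INTO comment_like VALUES (?, ?, ?, ?)",
    [  # invented rows
        (7, 101, 1, "2023-07-01 08:05:00"),
        (7, 102, -1, "2023-07-01 08:40:00"),
        (7, 103, 1, "2023-07-02 08:15:00"),
        (7, 104, 1, "2023-07-02 22:30:00"),
        (9, 101, 1, "2023-07-01 13:00:00"),
    ],
)


def votes_by_hour(conn, person_id):
    """Count one person's votes per hour of day, i.e. when they are usually online."""
    rows = conn.execute(
        "SELECT CAST(strftime('%H', published) AS INTEGER) AS hour, COUNT(*) "
        "FROM comment_like WHERE person_id = ? GROUP BY hour ORDER BY hour",
        (person_id,),
    ).fetchall()
    return dict(rows)


print(votes_by_hour(conn, 7))  # {8: 3, 22: 1}
```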
Can someone explain why r/privacy is so up in arms about this? Seems fairly obvious that my actions in the public domain are public, but they’re all “Lemmy doesn’t care about your privacy”. Why?
I would hope this would be obvious to anyone. If your client can highlight which posts you have upvoted in the web and app UI then the fact that your user specifically upvoted that post must be recoverable from the instance server and thus must be recoverable by the instance admins. I would not expect anything different.
I'm already questioning the whole system behind it, not just votes.
Say you have critical information that you want to delete, but other instances can just ignore the deletion request. Then I could technically write a plugin that uses an extra instance to always display all deleted comments to me, despite being a regular user.
On other sites you'd need a crawler that captures this information quickly enough to be usable, which takes a lot of extra programming work.
At this point we might as well remove the option to delete or edit a comment, as everyone can host their own instance, which wouldn't be possible with proprietary tools.
If someone can simply see votes the same way, we might as well add a mouse hover function that displays the username of whoever upvoted.
Our data has never been 'invisible'... We've just trusted that places like Reddit and their staff will do the right thing. That's literally how it already works.
If you sign up for Reddit, Reddit staff can see your posts and votes if they want to.
If you sign up for a private forum the admin there can also see database contents.
One-way encryption isn't possible without breaking functionality... If data about you were encrypted, then posts you make couldn't be displayed. If you include a means to decrypt, then there was no point encrypting anyway.
This is how it's always been, and Lemmy doesn't change this status quo much.
A faceless corporation that has had access to your data is just replaced by a variety of admins distributed across instances.
This isn't a good or bad thing, the potential for abuse does exist, but when we have literally made agreements with places like Reddit that they can use and sell our data... then what difference does it make if an admin takes a peek?
It wouldn't be great... but nothing is perfect.
It's still worth working on, however, to see if a better solution can be found. For now I'd say just be aware that your data can be seen, and understand that if you need to communicate something private, the only safeguard is direct messaging with end-to-end encryption.
Sounds like a "non-issue" to me, really. That's kind of the point with the fediverse. If I run an instance, I have access to its database and, thus, everything stored in it. That was the case with old phpBB forums: admins could see everything.
The question is what ends up stored from outside my own instance. I haven't looked at the source, but I would hazard a guess that it's mostly some JSON blobs and/or pointers to users/instances.
I mean... you can get information accessing the database. Can anyone access the instance DBs? No. How would you know reddit doesn't log these in its database somewhere?
On its own, it's not a problem IMO. Why would you want to show all information stored on the frontend? But if you have to investigate something, it's not bad to have stuff in your database that can help.
Granted, if an admin is a shitface, they can look at this information. And then...? Make fun of people who downvote? Go to another instance and that's it.
It's not just upvotes and downvotes. Instance admins also know your email and can store your password in plaintext if they want to. It's up to the user to decide whether to trust the instance admin.
I wonder what the GDPR implications of this are. As far as I understand, even free, privately run services are required to abide by GDPR and offer data insight and deletion. They're also required to state clearly what happens to user data.
Edit: Apparently people have varying takes and feelings on what the GDPR does and does not say, so I urge you to please read the summary of GDPR data privacy here: https://gdpr.eu/data-privacy/ as well as the summary of what constitutes personal data here: https://gdpr.eu/eu-gdpr-personal-data/ It's easier to have a good and fruitful discussion if we talk about what the GDPR actually says.
Admins can see literally everything. If you can see it (from your end, like whether you've upvoted something), it has to be stored somewhere, and of course the server owners can see it.
"unlike reddit" mm I'm sure they have RIGOROUS controls over which creepy staff / disgruntled plutocrats / repressive regimes get access to their voting database..
Is the poster's IP address, system, or other system identifier/location, tracked?
If I have the users giantshortfacedbear and throwaway123, it could be inferred or implied that they are the same person if they come from the same IP or phone.
The things I upvote and downvote are in line with my personal values and I am not ashamed of that. I have no issues with anyone knowing my reaction to a post. On Discord anyone can see who leaves reactions on a message. Same with Facebook. It will show you who added what reaction.
Out of curiosity, is there a particular set of circumstances where knowing how you voted on certain posts a bad thing? I would imagine that if you didn't want people to know you're voting/looking at specific posts, then you either don't vote/look at the posts, or you set yourself up an alt account on a different server. But let's be honest, if you'd be embarrassed by something you're looking at, maybe you shouldn't be looking at it. Just my 2¢.
Fully expected to be buried since I'm late to the party.
That's really only half of it, there is no real erasure possible when everyone's holding a cached copy. Personally... I kind of like it, I don't hold any value to the words I contribute here as long as they're for everyone.
But everything and everyone is living in concentric glass houses here.
That said, don't just call people out who downvote you. No one owes you an explanation if they thought your post was bad. I've already seen it once and it was pretty childish.
If you are doing anything that could get you in legal trouble on the internet, only use accounts that cannot be linked to your real-life identity, and always use tools like Tor. Do not depend on features like private messages, private voting, etc. In those cases there is always someone who can give you away, and service admins will give out information when the feds come knocking.
Back in my day everyone knew that once you put something on the internet it's there forever to be seen by all. Has everyone already forgotten this?
This is nothing new and in fact the way it's always been!
Now get off my lawn!
Shortly after joining I realized I was being a bit too honest on here lol. Can't help it. Haven't been on SM in a few days, in hiding from people, now back to my ditch to die. Love you!
For transparency, this is what a Like payload looks like. The first part is just context for the ActivityPub protocol and is pretty much the same for each message. The second part contains the actual data of the message, and the most personal details in it are the URL of your own profile and the URL of the post/comment you like:
{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    "https://w3id.org/security/v1",
    {
      "lemmy": "https://join-lemmy.org/ns#",
      "litepub": "http://litepub.social/ns#",
      "pt": "https://joinpeertube.org/ns#",
      "sc": "http://schema.org/",
      "ChatMessage": "litepub:ChatMessage",
      "commentsEnabled": "pt:commentsEnabled",
      "sensitive": "as:sensitive",
      "matrixUserId": "lemmy:matrixUserId",
      "postingRestrictedToMods": "lemmy:postingRestrictedToMods",
      "removeData": "lemmy:removeData",
      "stickied": "lemmy:stickied",
      "moderators": {
        "@type": "@id",
        "@id": "lemmy:moderators"
      },
      "expires": "as:endTime",
      "distinguished": "lemmy:distinguished",
      "language": "sc:inLanguage",
      "identifier": "sc:identifier"
    }
  ],
  "actor": "--URL OF THE USER PROFILE--",
  "object": "--URL OF THE POST OR COMMENT--",
  "type": "Like",
  "id": "-- URL TO THE INSTANCE THAT PASSED THE MESSAGE--",
  "audience": "-- URL TO THE COMMUNITY THE POST IS PART OF--"
}
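Here is a minimal sketch of what any receiving instance can pull out of such a message. The URLs are invented stand-ins for the placeholders, the long @context is trimmed, and treating a downvote as a "Dislike" activity of the same shape is my understanding of how Lemmy federates them, so take that part as an assumption.

```python
import json

# A Like activity shaped like the payload above, @context trimmed, invented URLs.
RAW = """
{
  "@context": ["https://www.w3.org/ns/activitystreams"],
  "actor": "https://lemmy.example/u/alice",
  "object": "https://other.example/comment/42",
  "type": "Like",
  "id": "https://lemmy.example/activities/like/0001",
  "audience": "https://other.example/c/beans"
}
"""


def summarize_vote(raw: str) -> dict:
    """Extract who voted on what from a Like/Dislike activity."""
    activity = json.loads(raw)
    kind = activity["type"]
    if kind not in ("Like", "Dislike"):
        raise ValueError(f"not a vote: {kind}")
    return {
        "voter": activity["actor"],      # the voter's profile URL, in plain text
        "target": activity["object"],    # the post or comment voted on
        "score": 1 if kind == "Like" else -1,
        "community": activity.get("audience"),
    }


print(summarize_vote(RAW)["voter"])  # https://lemmy.example/u/alice
```

Nothing here needs special access: the voter's identity is a plain field in every vote message an instance receives.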
Well, that's probably the wrong kind of 'open' compared to what FOSS means by 'open', yet I'm not convinced. With the whole 'anybody can make an instance and collect all the data they want' setup, it's kind of awkward and messy. How much of that data can you obscure or encode without losing the openness between instances?
Because if one instance can't verify actions of another then you have an issue dealing with bots and overall the platform becomes way more obscure and less reliable as a source of information.
And if the buttons themselves openly showed who upvoted/downvoted a post, how much of a difference would that make here? I don't feel like it's such a concern.
The point about deletion/edits: it's not about removing your info from the internet, it's about correcting what's wrong for the sake of providing correct information. If it's on the internet once, it's there forever. I don't see people complaining about the Wayback Machine doing its thing, yet it does exactly the thing whose mere possibility upsets so many people here.
If you monkey-brain posted your home address and where the keys are, it's on you, not on the internet for storing the info.
The only real point I see here is corporations/governments scraping all this data for their use. Yet as long as they can federate there's nothing much to do and if you try to restrict federation then it's just a bunch of forums with extra features.
Obviously, this isn't ideal. But this isn't as damning as some of the other commenters believe.
The way Reddit operates is that they are "trusted" with all our data. They can (and do) sell any data they like, to whomever they like. They store much more information than simply who upvoted what. They can't simply allow upvotes with no claimant, or they'd have no way of stopping or identifying bots or illegitimate upvotes.
This system is not ideal, but it's also not necessarily worse. We're still operating under that system; the only real difference is that we get to choose who that trusted party is. We get to move instances if the host's interests become misaligned with our own.
Ultimately, there needs to be a smart solution to this problem to ensure it's not abused. We can't completely remove collection of the data, otherwise upvotes will be meaningless and hijacked by agendas. We can't simply encrypt the data, if there's a genuine use for it (which we've discussed), who SHOULD be allowed to decrypt it?
I completely understand the concern, and I share it. But this isn't an issue so much with Lemmy, it's an issue with upvotes on distributed social media.
Edit: Okay, ANY instance admin is where the issue lies. That much I agree with.
Just muddling around I've built queries that:
(a) list all of my posts & comments, everybody who voted on them, and their votes
(b) tally how many times specific users have upvoted or downvoted me
(c) identify the most prolific voters across the Fediverse and the communities they vote in
(d) identify users with the same username or display name across all instances and correlate the activity across those accounts.
These are all for the sake of learning and are innocuous the way I'm using them. It is plain to see that someone with skills and an agenda could make more out of it than I have.
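The actual queries aren't shared, so this is only a guess at the shape of something like (b), run against an in-memory SQLite database with a simplified, invented schema and data (Lemmy's real tables use numeric person IDs and more columns).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, assumed schema and invented rows.
conn.executescript("""
CREATE TABLE comment (id INTEGER PRIMARY KEY, creator TEXT);
CREATE TABLE comment_like (voter TEXT, comment_id INTEGER, score INTEGER);
INSERT INTO comment VALUES (1, 'me'), (2, 'me'), (3, 'someone_else');
INSERT INTO comment_like VALUES
  ('alice', 1, 1), ('alice', 2, 1), ('bob', 1, -1),
  ('bob', 2, -1), ('bob', 3, 1), ('carol', 2, 1);
""")


def tally_votes_on(conn, creator):
    """Per voter: how many of `creator`'s comments they upvoted and downvoted."""
    return conn.execute(
        """
        SELECT l.voter,
               SUM(CASE WHEN l.score > 0 THEN 1 ELSE 0 END) AS ups,
               SUM(CASE WHEN l.score < 0 THEN 1 ELSE 0 END) AS downs
        FROM comment_like AS l
        JOIN comment AS c ON c.id = l.comment_id
        WHERE c.creator = ?
        GROUP BY l.voter
        ORDER BY l.voter
        """,
        (creator,),
    ).fetchall()


print(tally_votes_on(conn, "me"))  # [('alice', 2, 0), ('bob', 0, 2), ('carol', 1, 0)]
```

It is one join and a GROUP BY, which is the point being made: anyone holding a copy of the vote data can do this.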
I'm fine with it too. Don't think I'd be here if I wasn't okay with sharing these sort of things. If I wanted privacy for my upvotes or downvotes (why tho?), I'd do it anonymously.
And yeah, I upvoted the beans as well. Ate beans 90% of the time as a student. Still farting from it 20 years later.
I think this is to be expected - some instances have downvotes disabled but that doesn't seem to be the rule of thumb.
There are quite a few questions about data retention, usage, retrieval, compliance and how it is shared which will need to be addressed as the platform grows.
I agree that this is a good fit for YSK, however, I think it's important to keep in mind that privacy isn't a main goal of the system. It's designed to distribute the cost and responsibility and be difficult to take down or influence as a whole network, but it does not appear to be designed to hide user activities.
In fact, I propose that we keep this information publicly listed so that users are under no illusion that their interaction with Lemmy is private. Transparency and communication prevents misunderstandings.
Good data if you're trying to find the homophobes and transphobes who think they're "infiltrating" and voting down every single one of those posts. They out themselves.
Wait, is there a granular way to give access to my information? Like say I don't mind people seeing my comment history but would like to hide what posts and comments I upvote and downvote.
If you ask me, I'd make upvotes/downvotes public overall. Always hated how on reddit some miserable people downvote lots of innocent stuff, hiding behind their anonymity.
Lemmy & Reddit are public discussion platforms, everything you do here should be public, it's not like you use them to store private information.
Is it just user activity that's public? Curious to know about what is preserved on the backend, like if user removed posts/etc get stored somewhere accessible like this too.
Yes... That's how social networking works. ANY site you go to will have this much info, if not more, since most "social networks" want YOU: your personal info, etc. Lemmy is just a username attached to posts and comments, so in a way it's actually less than other networks like Meta, for instance.
Well of course. The instance stores all data in a Postgres database. How else will it be able to remember anything?
Maybe this is not obvious to non-programmers but you never see everything in the user interface for any system. There are tons of records needed for the system to track everything that goes on.
Since posts are federated, they will exist in the local db as well as on each instance.
Does anybody know if your subscriptions can be seen by admins of other instances? It doesn't seem like that information would need to be shared, but maybe it is anyway.
This is what lemmy.world tells me when I want to delete my account:
"Warning: this will permanently delete all of your data from this instance. Your data may not be deleted on other, existing instances. Enter your password to confirm."
Edit: So if we want to own our data, we should only post, comment and vote within our own instance, or just keep in mind that whatever we do on other instances might be there indefinitely.
I don't mind this, but what about my email, is that also publicly available? What about my password? I had to give my email to confirm my signup to this instance. It would be pretty shitty if my email was up for grabs now. Think of the poor idiots who use the same password for every service they use.
I think this is a good conversation to have. I'm assuming there are no security checks to make sure that instances connecting to each other run legitimately released, community-code-reviewed software? I'm also curious whether you could run a malicious instance that garners a lot more information from your users than is necessary, or uses security holes to gather information from other instances. This could send this entire experiment down the toilet very fast. For instance, HTTPS guarantees that you are connecting to who they say they are, as vouched for by a trusted source. At the very least it would be nice to have control over your credentials and history, and only release them to trusted instances.
Good point, it's really easy to categorize different users by their voting habits alone. There's actually a platform that does this called meneame.net, and it isn't like it has burned down completely. It is more data available for marketing, or worse, in regards to Big Data, but when it's available to the users, it does allow you to see who's going on downvoting sprees and which bubble they come from. On Meneame, some people have resorted to using separate accounts just to downvote from.
Besides that, it's not that bad: just imagine yourself making a comment that says "I'm downvoting you". Honestly, everyone should try being able to view that information.
I have actually been really surprised by the amount of anti free speech and anti privacy attitudes that I have seen since joining Lemmy. It seems that a lot of the people that made Reddit the shit hole that it was, are the ones who have been early adopters of Lemmy.
God I miss Voat, that was true free speech with a heavy emphasis on privacy.