Anonymity - Stylometry protection (Running a Local LLM and copy-pasting messages) #13
Assigned to pippin
price: 40 euros
deadline: 24th September
past the deadline, unassigning
see https://github.com/jasonacox/TinyLLM?tab=readme-ov-file#run-a-chatbot; it might be a good candidate
to be showcased:
- how to set it up locally, in a Whonix VM: https://github.com/jasonacox/TinyLLM?tab=readme-ov-file#run-a-chatbot or https://github.com/lmstudio-ai (LM Studio MAY complain about threads if you only give it 4 vCPUs >> not adapted to a VM setup ???)
  --> IT CANNOT BE RUN OUTSIDE OF THE WHONIX VM! need to find a way.
- make it run a TINY model to reduce the CPU usage as much as possible: https://lmstudio.ai/model/llama-3.2-1b-instruct ???
- how to use it, from within the Whonix VM (and see if the performance sucks or not)
assigned to: WonderfulEpitome
price: 50 euros
deadline: 19th November
potential solution: g66ol3eb5ujdckzqqfmjsbpdjufmjd5nsgdipvxmsh7rckzlhywlzlqd.onion/post/c67d64ec4355ec872373
The Best Way to Evade Linguistic Analysis | llamafile Setup Guide
by /u/inadahime • 17 hours ago in /d/OpSec
The tutorial section of this post assumes a Linux-based operating system, and the presence of common programs like `curl`; however, because llamafile[1] runs on any operating system, you should be able to replicate this anywhere with relative ease!
The subject of stylometry, or linguistic analysis, has come up rather frequently here on Dread. However, in case you’re wondering:
Background // What is stylometry?
According to Wiktionary, stylometry is defined as: “A statistical method of analyzing a text to determine its author.” This is synonymous with the term linguistic analysis. In practical use, stylometry is a form of OSINT where forensic analysis is applied to determine a text’s origin.

As an example, presume you have an account both here and on Reddit, with the Reddit profile tied to your real-life identity in some way. Let’s assume your OpSec is bloody perfect; there’s no traditional way to correlate your clearnet identity to your Dread account. Here’s where stylometry comes into play. As I said, it’s a sort of OSINT strategy, meaning that LEA can scrape Dread and Reddit, building linguistic profiles for every member of a certain subreddit, especially ones that may be correlated to Dread use (/r/DreadAlert, /r/onions, et cetera). After scraping both platforms and building profiles for each user, they can then compare the profiles to find which Dread users have the most similar styles to which clearnet users. So even if all other areas of your OpSec are flawless, linguistic analysis can help LE narrow down the search for your clearnet account, especially if they start with a broader scope than the two subreddits I listed.
A practical example: linguistic analysis allowed the FBI to successfully acquire a search warrant for the Unabomber, Ted Kaczynski[2].
If you’re interested in the more technical aspects of stylometry, or how it works behind the scenes, this paper offers a good overview: “A Survey of Modern Authorship Attribution Methods” by Efstathios Stamatatos (2009)[3]
Methods of Obfuscation
Over the years, there have been a variety of ways to obfuscate one’s “true” writing style, but here are the important ones:
Local LLM use // Choice of Model, Software
Model
I’m going to focus on local LLMs. When the entire point of this process is data privacy and anonymity, the last thing you want is to send your data over to OpenAI.
Language models are categorised by parameter count, or how many “B” they have (where B means billion parameters). My favourite model for stylometric obfuscation is Gemma 2 2B IT Abliterated, an uncensored version of an incredibly small model developed by Google (see ref. [4] for GGUFs). When choosing your model, try to keep the parameter count below 7B. Regardless of model, you need to get a GGUF (quantised) version of it; this format is what allows us to actually run the model. When picking your quantisation (GGUF), try to stay above “Q4.” For example, with gemma-2-2b-it-abliterated, I use the Q6_K version of the GGUF.
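As a quick sketch, fetching the GGUF might look like the following; the URL is a placeholder, so take the real download link from ref. [4]:

```sh
# Placeholder URL: substitute the actual GGUF link from ref. [4].
curl -LO 'https://huggingface.co/<repo>/resolve/main/gemma-2-2b-it-abliterated-Q6_K.gguf'
```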
Software
The best software for running LLMs locally is llamafile[1]. This is a fork of another program, llama.cpp[5], with more optimisations and the capability to be a single executable that runs on nearly any operating system. In the rest of the post, I’ll expand upon llamafile and show you how to create a single program that takes your text input and provides an anonymised version of it as output.
llamafile Setup // CounterStylometry.llamafile
Now, let’s create a program that anonymises your text! For this section, it is assumed that the GGUF of your model is gemma-2-2b-it-abliterated-Q6_K.gguf. First, download (and then mark as executable) the `llamafile` and `zipalign` executables from the llamafile release page on GitHub. In the terminal, you would do this like:
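The post’s exact commands aren’t preserved here, so what follows is a sketch based on llamafile’s GitHub releases; the version number is illustrative, so substitute the latest one:

```sh
# Download the llamafile runtime and the zipalign packing tool
# (0.8.13 is a placeholder; check https://github.com/Mozilla-Ocho/llamafile/releases).
curl -LO https://github.com/Mozilla-Ocho/llamafile/releases/download/0.8.13/llamafile-0.8.13
curl -LO https://github.com/Mozilla-Ocho/llamafile/releases/download/0.8.13/zipalign-0.8.13
chmod +x llamafile-0.8.13 zipalign-0.8.13
```

Next, let’s write to a file named “.args” (without the quotation marks). The original post’s exact arguments aren’t preserved, but llamafile’s .args convention is one argument per line, with a trailing “...” line that passes through anything extra you supply at runtime. A minimal sketch:

```
-m
gemma-2-2b-it-abliterated-Q6_K.gguf
--temp
0.7
...
```

Then, pack the runtime, the model, and the arguments into one portable program. This mirrors the “Creating llamafiles” flow documented in the llamafile README; the output name follows this post’s title:

```sh
cp llamafile-0.8.13 CounterStylometry.llamafile
./zipalign-0.8.13 -j0 CounterStylometry.llamafile gemma-2-2b-it-abliterated-Q6_K.gguf .args
```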
To remove the now-unneeded files:
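A sketch, assuming the file names from the steps above (the model and arguments are now embedded inside CounterStylometry.llamafile):

```sh
rm gemma-2-2b-it-abliterated-Q6_K.gguf .args llamafile-0.8.13 zipalign-0.8.13
```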
Anonymising Your Text // Using CounterStylometry.llamafile
We're all done! Assuming you have executed the steps prior to this correctly, usage is simple! Just run the program in your terminal like this, and type your text in:
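A minimal sketch of that command, with the rewriting prompt as my assumption rather than the original author’s wording (depending on the llamafile version, running it with no arguments may launch the local web UI instead; check `--help` for a CLI or chat flag if so):

```sh
./CounterStylometry.llamafile -p "Rewrite the following text so the meaning is preserved but the writing style is completely different: <paste your text here>"
```

Copy the rewritten output and post it in place of what you originally wrote.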