The whole article is quite funny, especially the lists of most used tankie words, or the branding of foreignpolicy as a left-wing news source.
Since you are reading it, what is the ranking in the screenshot based on?
So far, no data or source code has been provided alongside the paper on the arXiv page. I’d expect Jupyter notebooks, or something like that at least, for reproducibility. Perhaps they will be added later, or they are linked from a part of the paper that I have not yet read.
In any case, the screenshot is of Table 11, and it is found in Appendix D, Domain Analysis:
Describing foreignpolicy.com as left-wing is an example of miscategorization by the authors, as is calling redsails.org a “Chinese far-left platform.” Neither of these is an accurate statement, and they undercut trust that the authors are correctly and thoroughly labeling and interpreting their data. There are other glaring oversights in Table 12, which purports that domains like “redditsave.com,” “ko-fi.com,” “twimg.com,” and “archive.is” are “representative domains of tankies” specifically, and supposedly not heavily found in other similar far-left communities (as per the authors’ description of the TF-IDF algorithm and their motivation for its use). Taken together, there is a compelling case that the authors (1) do not themselves possess a sufficient understanding of left-wing ideology, much less Marxist-Leninist ideology, to label it accurately, and (2) may have been sloppy with their data analysis (though this can’t be definitively known without access to the underlying datasets and analysis source code).
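For anyone unfamiliar with TF-IDF, here is a rough sketch of how it could be applied to shared domains, treating each community as a "document" and each domain as a "term." The paper's exact variant is not known to me, so the formula (relative frequency times log inverse document frequency), the function name `tfidf_by_community`, and the toy community and domain names are all assumptions for illustration only.

```python
import math
from collections import Counter

def tfidf_by_community(domain_counts):
    """domain_counts maps community -> Counter(domain -> times shared).

    Returns community -> {domain: score}, using one common TF-IDF variant
    (an assumption, not necessarily what the paper uses):
      tf  = shares of the domain / all shares in that community
      idf = log(number of communities / communities sharing the domain)
    """
    n_communities = len(domain_counts)
    # Document frequency: in how many communities does each domain appear?
    df = Counter()
    for counts in domain_counts.values():
        df.update(counts.keys())
    scores = {}
    for community, counts in domain_counts.items():
        total = sum(counts.values())
        scores[community] = {
            domain: (count / total) * math.log(n_communities / df[domain])
            for domain, count in counts.items()
        }
    return scores

# Made-up numbers, purely to show the behaviour of the formula.
toy = {
    "community_a": Counter({"shared-site.example": 50, "niche-site.example": 5}),
    "community_b": Counter({"shared-site.example": 40, "other-site.example": 10}),
}
scores = tfidf_by_community(toy)
print(scores["community_a"])
```

Under this formulation, a domain that shows up in every community scores zero no matter how often it is linked, while a domain linked in only one community gets a positive score even if it is shared rarely.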
Majestic is described on the cited URL as: “The million domains we find with the most referring subnets.” Basically, of the 7,049 different domains contained in the 146,078 URLs the authors found in their crawl, they removed any that appear in the top 1,000 domains as defined by Majestic, i.e. domains like google.com, facebook.com, and reddit.com. (Whether the authors recognize the potential problem with excluding reddit.com in particular from the table is unknown at this point; I have not finished reviewing the paper.)
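To make that filtering step concrete, here is a minimal sketch of it as I understand it: extract the domain from each URL, count shares per domain, and drop anything on the popularity list. The helper names, the sample URLs, and the three-entry stand-in for the top-1,000 list are hypothetical; I am also assuming a simple exact match on the host with a leading "www." stripped, which may not be how the authors matched domains.

```python
from collections import Counter
from urllib.parse import urlparse

def domain_of(url: str) -> str:
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def count_uncommon_domains(urls, top_domains):
    """Count shares per domain, skipping domains on the popularity list."""
    excluded = {d.lower() for d in top_domains}
    counts = Counter(domain_of(u) for u in urls)
    return Counter(
        {d: n for d, n in counts.items() if d and d not in excluded}
    )

# Made-up sample data; the real inputs would be the crawled URLs and
# the first 1,000 entries of the Majestic list.
sample_urls = [
    "https://www.reddit.com/r/example/comments/1",
    "https://lemmygrad.ml/post/1",
    "https://lemmygrad.ml/post/2",
    "https://redsails.org/some-article/",
    "https://www.google.com/search?q=x",
]
sample_top = ["google.com", "facebook.com", "reddit.com"]

result = count_uncommon_domains(sample_urls, sample_top)
print(result.most_common())
```

Note that this sketch does not collapse subdomains to a registered domain, so a host such as a CDN subdomain would be counted separately from its parent; how the paper handles that is not something I can tell without the source code.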
Thanks
Which of the many rankings? The website ranking?
The one that places lemmygrad.ml in 4th place. It’s in one of the screenshots in Roderick Day’s tweet.
The number of times the domain was shared on a tankie subreddit, after removing the 1,000 most common domains.
Then they apply some algorithm and post results and Lemmygrad becomes the 3rd most tankie site.
I thought it was based on how many times the website was mentioned. That’s how redditsave.com and foreignpolicy.com ended up on it too.