The whole article is quite funny, especially the lists of most used tankie words, or the branding of foreignpolicy as a left-wing news source.
I think I’ll actually review the paper. Because I think it’ll make a great use-case for the argument that you can’t automated-sentiment-analysis your way to a cogent political assessment of entire populations. No matter how hard you want to.
A review of the paper. I’ll try and update this as I go.
Abstract
Perhaps there is a reason that most of the research on extremism finds itself looking at right-wing examples.
“Relatively high toxicity” screams horseshoe theory. What and/or who the extremists are “being toxic” about matters, not merely that they “are toxic.” (Spoiler alert: far-left “extremists” score very high on being “toxic” about fascists and fascism; not exactly a novel observation)
Introduction
Comparing “increasing prevalence of populist parties and leaders” to “a steady rise in political rhetoric characterizing mainstream political parties as far-left extremists” is not the comparison the authors think it is. “Actually existing far-right parties and leaders” aren’t in the same ballpark as “some people say that some other people are far-left.” Further, this doesn’t state where that political rhetoric is coming from. So I checked the sources:
Lo and behold, the “other side” of the far-right extremism coin is… the far-right complaining about the far-left.
That “or” is doing some heavy leg work to try and equivocate between “unreputable” and “overtly biased” sources. Let’s see what source 122 is about:
And some choice quotes from the article:
These are the only quotes in the source that could conceivably have some way of bolstering the claim that “many of the characteristics and behavior we associate with right-wing extremism online have historically applied to hardline left-wing extremists as well.” The first comes closest to offering support. Alas, it doesn’t apply, because “partisan liberals” aren’t far-left. The next two could only conceivably “apply” in a very hand-wavy “China = far-left” sense (which, as we’ll see later, the authors make liberal use of). The last is merely a re-stating of the claim without supporting evidence.
Not a good start.
I think this might be a misprint? As in, it was supposed to read “despite the impact of left-wing online extremists.” Because structurally the sentence doesn’t make sense otherwise. And also, there is no citation given for “a history of violence and chaos attributed to far-left extremists” either. Which is odd, because there are examples you can dig for and cite within the United States, a la the Animal Liberation Front and the Earth Liberation Front.
The definition is crude but in the ballpark, excluding the “Stalinist” jab, given that Stalin died in 1953, the Hungarian uprising was in 1956, and Khrushchev was not at all a fan of his predecessor Stalin. Curiously, the authors are already aware of this distinction (Appendix C Misalignment Analysis):
Perhaps different parts of this paper were written in isolation by each of the authors. In any event…
Examining the sources:
That is actually a healthy listing of sources. I may or may not come back to review each of them in turn. I’ve been at this for several hours now :) (TODO)
Using “CCP” instead of “CPC” is a telling choice of terminology. One that they consistently use throughout the paper until they have to examine “tankie subreddits” specifically later, and find themselves needing to use the correct “CPC” version for misalignment analysis (Tables 4, 10), as well as:
Moving on…
I mean, at least the authors recognize that Russia is “non-socialist.” And it is true that socialists of varying stripes are against NATO, not just “tankies.”
Examining the sources:
These sources faithfully recount the fact that Marxist-Leninists (“tankies”) are not uncritically accepting NATO’s framing of the war. Using the Foreign Policy article as an example:
The article stretches hard to say that horseshoe theory is real and its basis is a yearning for populism, but it is a decent read at least for getting inside the mind of someone who considers themselves not on either end of an extreme. If nothing else, it does support the authors’ contention that “tankies” – though of course, other socialists as well – are anti-NATO. A contention that I don’t think anyone here would object to.
The “Uyghur genocide” narrative has been debunked ad nauseam. Denying the “Uyghur genocide” is in no way comparable to denying actual genocide. But for the sake of completeness, user /u/ComradePubIvy has already taken a peek at the source:
And in the “NOTES” section of this book, here are the sources given for its preface:
And here are the first ten sources for its introduction:
RFA, SCMP, Zenz, et al. Not exactly reliable sources.
Not sure what they mean by “scant literature that exists on tankies” when they just cited seven sources concerning the term’s etymology and history. Perhaps they mean scant literature on the evolved definition which they get into in section two (Background and Related Work). But regardless, this does sound like an interesting way to approach analyzing political communities within Reddit.
(emphasis added)
Hoo boy that’s not a good methodology. You’ll want to examine links made by users within one subreddit to another subreddit and weigh the edges accordingly. Otherwise, the only sampling you’re getting is from moderators and admins of the subreddits – seeing as they are the only ones with the ability to update the sidebar – and the only weight you’re getting is binary yes/no on links existing. That’s a start, I suppose, but you’re gonna have some heavy bias and skew in there.
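To sketch what I mean (purely illustrative; the comment-record fields and the networkx weighting here are my own assumptions, not anything taken from the paper):

```python
# Sketch: build a weighted subreddit-to-subreddit graph from links that users
# themselves make in comments, rather than from static sidebar links.
# Assumes `comments` is an iterable of dicts with "subreddit" and "body" keys
# (e.g., parsed from a Pushshift dump); all field names are illustrative.
import re
from collections import Counter

import networkx as nx

SUBREDDIT_LINK = re.compile(r"(?:^|[\s(])/?r/([A-Za-z0-9_]{3,21})")

def build_crosslink_graph(comments):
    edge_weights = Counter()
    for comment in comments:
        source = comment["subreddit"].lower()
        for match in SUBREDDIT_LINK.finditer(comment["body"]):
            target = match.group(1).lower()
            if target != source:
                # Each user-made mention adds weight, so heavily-linked
                # subreddits stand out instead of a binary sidebar yes/no.
                edge_weights[(source, target)] += 1

    graph = nx.DiGraph()
    for (source, target), weight in edge_weights.items():
        graph.add_edge(source, target, weight=weight)
    return graph
```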
This might be interesting, depending on how they measure engagement within individual subreddits to ascertain overlap.
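One straightforward way to quantify that overlap (my own sketch, not necessarily what the authors did) is the Jaccard similarity of each pair of subreddits’ active commenter sets:

```python
# Sketch: user overlap between subreddits as Jaccard similarity of their
# active commenter sets. `comments` is assumed to carry "author" and
# "subreddit" fields; nothing here is taken from the paper itself.
from collections import defaultdict

def commenter_sets(comments):
    users_by_sub = defaultdict(set)
    for comment in comments:
        users_by_sub[comment["subreddit"].lower()].add(comment["author"])
    return users_by_sub

def jaccard_overlap(users_a, users_b):
    union = len(users_a | users_b)
    return len(users_a & users_b) / union if union else 0.0
```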
And this is where the sentiment analysis will come in. These tools are notoriously flaky, but we’ll take a look at how they’ve been deployed, and how their limitations have been accounted for.
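As a toy illustration of why the limitations matter (using VADER as a stand-in; I don’t yet know which tools the authors actually deployed), a sentence condemning fascism scores as strongly “negative” even though that negativity is entirely warranted:

```python
# Sketch: off-the-shelf sentiment scoring flags anti-fascist statements as
# highly negative, which says nothing about whether the speaker is "toxic".
# VADER is used here purely as an example; the paper's actual tooling may
# behave differently in detail.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

for text in [
    "Fascism is a murderous, hateful ideology and must be opposed.",
    "The weather today is lovely and calm.",
]:
    scores = analyzer.polarity_scores(text)
    print(f"{scores['compound']:+.2f}  {text}")
# The first sentence scores far more negative than the second, even though
# *what* the negativity targets is the politically relevant question.
```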
This could actually be interesting! Do specific users migrate over time in identifiable paths? E.g., “I was a liberal, then a Bernie supporter, then a Democratic Socialist, then a Marxist-Leninist.”
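A naive sketch of how one might reconstruct such paths (the field names and the first-comment heuristic are my assumptions, not the authors’ method):

```python
# Sketch: a per-user "migration path" as the ordered sequence of subreddits
# in which that user first became active, using first-comment timestamps.
from collections import defaultdict

def migration_paths(comments):
    first_seen = defaultdict(dict)  # user -> {subreddit: earliest timestamp}
    for c in comments:
        user, sub, ts = c["author"], c["subreddit"], c["created_utc"]
        if sub not in first_seen[user] or ts < first_seen[user][sub]:
            first_seen[user][sub] = ts

    # e.g. {"some_user": ["sub_a", "sub_b", "sub_c"]} in chronological order
    return {
        user: [sub for sub, _ in sorted(subs.items(), key=lambda item: item[1])]
        for user, subs in first_seen.items()
    }
```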
I’m guessing this is where Lemmy.ml and LemmyGrad.ml come in.
Background and Related Work
TODO
For $500k USD, you can get the low quality ArXiv article; for free, you can have this high quality teardown of said article.
Thank you for the amount of effort this took to put together. I’ve done only a quick skim but I’m going to give it a full read. Some stuff that definitely stood out to me is: the horseshoe theory nonsense; and the “rude words mean evil person” nonsense. Use of charged words or negative sentiment doesn’t make you bad or wrong; arguably, negative sentiment is the only rational response to a lot of the topics at hand.
But they used AI, and the Free one used Human Power
And they say commies are the lazy ones! This is great!
Background and Related Work
What is a “tankie?”
The first cluster of sources (along with source 43) is the same as earlier in Section 1 (I have yet to interrogate all of them; TODO, though I suspect the overall thrust of the sources will accurately characterize the history and etymology of the term “tankie”). Source 104 has also already been briefly examined and leans heavily on Zenz, Radio Free Asia, and South China Morning Post in the sources that were examined from it. The remaining sources (72, 10, and 44) are:
The article from Made In China Journal is from someone who appears to be a Maoist.
Consulting an author from an ideology (Maoism) opposed to the ideology in question (“tankie”; more specifically here though, perhaps “Dengist”/“supporter of Reform and Opening Up”) is, charitably, an exercise in dialectical materialism of a sort, I suppose. Nevertheless, Marxist-Leninists do broadly support China’s policies, with varying degrees of enthusiasm or restraint, and Maoists broadly don’t. So that does at least offer some categorical boundaries for the authors to work with in forming cohorts around different far-left ideologies.
The article from The Diplomat, however, is a far less nuanced take:
There was no massacre of students in Tiananmen Square. There was certainly fighting in the streets – away from where the students in the square were – and the CPC itself even lists the dead from this fighting at 241, a far cry from the “around 1,000 student protesters” given, both in terms of the number and in terms of who died.
Regardless, here, tankie is “young western supporters of communist authoritarian regimes.” This definition is, at best, orthogonal to the previous ones proffered. The article has some other choice bits:
The sources and claims either stand up to scrutiny or they don’t. That holds for all inquiry.
Again with citing those with opposing ideologies from the ideology in question. Though I suppose this does dovetail nicely with citing a Maoist.
Not a particularly objectionable definition to me, though also incredibly broad. From the introduction up until now, the paper has struggled to pin down what, precisely, constitutes a “tankie.” I’ll give the authors some slack, in that ideologies are fluid and dynamic things that, to some extent, certainly seem to intentionally defy neat categorization. And we can of course also recognize the nature of Contradiction more broadly and take a charitable overview of the authors’ frequent citation of an ideology’s opponents in coming to define it. No ideological framework can be entirely free of contradiction, after all. But that slack can also be used to hang oneself in later analysis. Specifically, I can think of two scenarios where that might happen:
At this point, I’m strongly suspicious of the second option having occurred at least, especially given the quality and ideological leanings of the sources cited so far.
Studies on Extremist Online Communities
Sources in question…
All three share a common author, Ryan Scrivens. If I were concerned with an unfair bias against right-wing extremism, I might dig into the networks of authors involved to root out that bias. And yet that doesn’t occur here. More to the point though, all three of these sources are focused on right-wing extremism. This undercuts their assertion in the next sentences:
Apart from being unsupported by the sources cited in the prior sentence, the sentences themselves are uncited. The citation given next:
also does not support a “both-sides” reading of left-wing and right-wing extremism, as the “overlap” in question is between stages of a pipeline within an ideological gradient, not between thoroughly contradictory ideological gradients.
If we have evidence of broad diversity across these two wings, and the strongest examples we have of left-wing and right-wing extremism being similar to each other is both sides saying “ISIS bad” and fighting against them, then perhaps that lends credence to the alternative answer: that the similarities either are not strong, or do not even exist.
Nothing particularly objectionable here. Social media is important for all political leanings, left and right, extreme and moderate.
So… the only study that could be found works against the notion that right-wing and left-wing extremists are comparable.
This entire segment is, in itself, adequate explanation for the complaints of the next section: that there is “imbalance in research on online extremism.” There is imbalance because left-wing and right-wing extremists are not, in fact, isomorphic. There are differences that matter, and those differences inform where researchers spend their limited time, budget, and energy.
If I can help, let me know. I am less than 4 pages in and this is already the worst paper I have ever read, somehow doing worse than the paper I read saying that renewables will never succeed because a solar field takes up more space than a coal-powered power plant.
It’s a grift trading on supposed support for “data-driven” analyses. It’s just a speculative opinion piece. Its data handling and analysis is anything but academic.
“there are well documented limitations to our tools and methodology but as we can support the conclusions we wanted to make, we’re going to trust the data anyway”
Please, do! It gets so tiring seeing all of those AI data “scientists” believing they can say whatever they want about fields they know jack about because “the p-value is low.” They’re the biggest reason I decided to quit.
Are you saying a data science major lets me write a paper that calls theirs BS because the p-value is low?
Just train a doc2vec regression on a dataset of designated bad papers, then apply to their articles. If you disagree with the regression just stir it until it looks right.
Good idea! Maybe we can use ChatGPT to do the stirring so it’s still machine-generated
Since you are reading it, what is the ranking in the screenshot based on?
So far, on the arXiv page, no data or source code have been provided alongside the paper. I’d expect Jupyter notebooks, or something like that at least, for reproducibility. Perhaps they will be added later, or they are provided at a URL within the paper that I have not yet reached.
In any case, the screenshot is of Table 11, and it is found in Appendix D, Domain Analysis:
Describing foreignpolicy.com as left-wing is an example of miscategorization by the authors, as is calling redsails.org a “Chinese far-left platform.” Neither of these is an accurate statement, and both undercut trust that the authors are correctly and thoroughly labeling and interpreting their data. Between this and other glaring oversights in Table 12 – which purports that domains like “redditsave.com,” “ko-fi.com,” “twimg.com,” and “archive.is” are “representative domains of tankies” specifically and supposedly not heavily found in other similar far-left communities (as per the authors’ description of the Tf-Idf algorithm and their motivation for its use) – there is a compelling case that the authors (1) do not themselves possess a sufficient level of understanding of left-wing ideology – much less Marxist-Leninist ideology – to label it accurately, and (2) may have been sloppy with their data analysis (though this can’t be definitively known without access to the underlying datasets and analysis source code).
Majestic is described on the cited URL as: “The million domains we find with the most referring subnets.” Basically, of the 7,049 different domains contained in the 146,078 URLs the authors found in their crawl, remove any that appear in the top 1,000 domains as defined by Majestic – domains like google.com, facebook.com, and reddit.com (whether or not the authors recognize the potential problem with excluding that last result from the table is unknown at this point; I have not finished reviewing the paper).
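For concreteness, here’s roughly what that filter-then-rank pipeline could look like (my own reconstruction; the Majestic CSV layout, the Tf-Idf weighting, and every identifier here are assumptions, not the authors’ code):

```python
# Sketch: drop domains in the Majestic top-1,000, then rank the remaining
# domains per community with Tf-Idf. Column names, the exclusion cutoff, and
# the weighting scheme are all guesses at what the paper describes.
import csv
from urllib.parse import urlparse

from sklearn.feature_extraction.text import TfidfVectorizer

def load_top_domains(majestic_csv_path, limit=1000):
    with open(majestic_csv_path, newline="") as handle:
        rows = csv.DictReader(handle)
        return {row["Domain"].lower() for _, row in zip(range(limit), rows)}

def domains_per_community(urls_by_community, excluded):
    # One "document" of space-joined domains per community, after filtering.
    docs = {}
    for community, urls in urls_by_community.items():
        domains = (urlparse(u).netloc.lower().removeprefix("www.") for u in urls)
        docs[community] = " ".join(d for d in domains if d and d not in excluded)
    return docs

def top_tfidf_domains(docs, community, k=10):
    vectorizer = TfidfVectorizer(token_pattern=r"\S+")
    matrix = vectorizer.fit_transform(docs.values())
    row = matrix[list(docs).index(community)].toarray()[0]
    terms = vectorizer.get_feature_names_out()
    return sorted(zip(terms, row), key=lambda pair: pair[1], reverse=True)[:k]
```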
Thanks
Which of the many rankings? The website ranking?
The one that places lemmygrad.ml on 4th. It’s in one of the screenshots in Roderick Day’s tweet.
The number of times the domain was shared on a tankie subreddit, after removing the most common 1,000 domains.
Then they apply some algorithm, post the results, and Lemmygrad comes out as the 3rd most tankie site.
I thought it was based on how many times the website was mentioned. That’s how redditsave and foreignpolicy ended up on there too.