AI and Middle East Political Science

Made with ChatGPT using prompt: "Make me a striking photorealistic image of a Middle East scholar using ChatGPT for research." We still have a way to go.

There's a lot happening in the Middle East these days. But when Middle East academics get together, more likely than not the main topic of conversation is going to be how everyone is handling the use of AI in the classroom. So it's exciting to see that the new issue of the APSA MENA Politics newsletter (free download) contains two symposia on AI and the Middle East, along with a very provocative essay by Rich Nielsen of MIT about the impact of AI on political science as a profession. I wanted to reflect on the newsletter, especially Nielsen's essay, while offering some thoughts of my own and reporting some relevant results from the Middle East Scholars Barometer.

First, the MENA Politics Newsletter. The symposium edited by Eric Lob and Selin Bengi Gümrükçü focuses on the real-world implications of AI. Yasir Atalan and Guy Burton focus on Gulf AI adoption and the US-China strategic competition, offering useful background on what control over data and computing centers might actually mean. Andrew Leber argues that each Gulf state's approach to AI is rooted in historical institutional choices and incentives – nicely explaining why the UAE has pushed so far compared to its regional rivals. Arshin Adib-Moghaddam looks at AI through the lens of "techno-orientalism", while Abdullah Omran considers alternative Arab-Islamic epistemologies which might guide AI. Finally, Alexandra Siegel has a piece close to my own research agenda on the uses of AI for digital authoritarianism and the potential for resistance. One key issue worth exploring more, here, is the implications for the balance of power and asymmetric interdependence of the widespread adoption of a technology which requires vast amounts of capital, energy, and land for data centers – all things which the Gulf has in abundance.

I want to set all that aside for the purposes of this essay, though, as well as the many fundamental concerns about the environmental impact of data centers and the legal and ethical issues around content aggregation. I want to focus here first on the potential political effects of Gen-AI chatbots, and then on the challenges they pose for us as scholars of the Middle East.

Gen-AI chatbots like ChatGPT, Gemini and Grok are perhaps the most public-facing part of the problem for academics, both in the classroom and for broader public discourse. We've all had to come up with new assignments and new teaching methods to try to overcome rampant use of Gen-AI – use that is often encouraged or even required by college and university administrations. The worst part, for me, is the degradation of the social trust which lies at the heart of the teacher-student relationship (or, as per below, peer review). I should be engaging seriously with the ideas my students develop in their papers, pointing out mistakes and offering suggestions for improvement. I absolutely hate that the first reaction to a better-than-expected paper now has to be whether it was produced by AI. It's so deeply destructive of the very purpose of education, and so deeply demoralizing. I can circumvent these problems to some extent in smaller graduate classes with more discussion-based work and oral examinations, but that doesn't scale up. Everyone is right to be up in arms about this. So am I.

What about the content which students might be getting when they use Gen-AI (beyond phantom citations and all that)? In our recent Middle East Scholars Barometer survey, we asked scholars what they thought the political effects of Gen-AI were on Middle Eastern issues. Where 64% thought that social media had made public opinion more pro-Palestinian, only 13% said the same about Gen-AI – with 25% saying it had made public opinion more pro-Israel, and 50% saying they didn't know. Here's a clue.

This morning, I asked each of the three big platforms the same question: "Who is responsible for the violence in Gaza?"

  • ChatGPT: "Responsibility for the violence in Gaza is contested and depends partly on the time period and level of analysis – immediate military actions, broader political decisions, or the historical conflict itself. Most serious analyses assign responsibility across multiple actors rather than to a single side."
  • Gemini: "The question of responsibility for the violence in Gaza is a subject of intense international debate, with various perspectives highlighting different actors and root causes."
  • Grok: "Hamas bears primary responsibility for initiating and perpetuating the large-scale violence in Gaza, particularly since October 7, 2023."

Grok's confident ascription of blame to Hamas stands out, but it's equally striking that none of the three assigned responsibility for the violence to Israel, the country that has actually been bombing and occupying Gaza and which has killed well over 70,000 people. The International Court of Justice rulings and all the public discourse and evidence didn't move the Gen-AI chatbots. At best, it's "complicated." While Grok remains an outlier, it seems likely that political bias in Gen-AI outputs will get worse, with fewer external checks for students (or anyone) to hold onto. Elon Musk has very publicly stepped in to "correct" his chatbot Grok when it gave answers he considered "woke"; do we really expect ChatGPT or Gemini to be immune from such manipulation? Israel has been working behind the scenes to influence public discourse, and Saudi Arabia and the UAE have been heavy investors in Gen-AI; why wouldn't they try to use that position for their narrative wars? Why wouldn't, say, Egypt want to rewrite the history of January 25 to emphasize Muslim Brotherhood conspiracies rather than popular revolution, or erase the Rabaa massacre from the public record?

Now let's turn to the field and how all this matters for scholarship. The MENA Politics Newsletter features a roundtable led by Carolyn Barnett, including Muhammad S. Abdo, Tariq Adely, Cinzia Bianco, Ashrakat Elshehawy, Robert Kubinec, and Michael Robbins. This is a pathbreaking roundtable which focuses squarely on the implications for our scholarship, our practices, and our ways of knowing. It's genuinely fascinating to hear how different scholars coming from different backgrounds have adapted to AI opportunities and threats. For context, in the Middle East Scholars Barometer, Shibley Telhami, Calvert Jones and I found that "AI use drops at every step up the career ladder, from 76% of graduate students to just 26% of emeritus professors. Overall, fewer Middle East scholars tend to use AI (55%), compared to the 77% of college faculty broadly polled in the 2025 College Board survey." While a robust subset do use it for things like data analysis, preliminary literature reviews, and citation formatting, 64% are very worried about hallucinated citations and inaccurate data.

But the effects on the field, as the roundtable shows, cannot be ignored even if adoption is lagging. For example, Robert Kubinec says: "The code generation capabilities of LLMs are quite useful because they can respond to natural language prompts; this means that I can express what code I want in colloquial language that the model can then interpret as code." I think most MENA scholars aren't really aware of how fast these coding capabilities have developed. For the purposes of this essay, I asked ChatGPT to write a script for downloading the top Arabic language tweets about the Iran war from X; in less than a minute it explained the issues I would have to deal with (paying for API access, running Python, defining "top"), produced code which looked pretty good to me (I haven't tried it yet), and told me the sequence of steps I would need to take to get API authorization to ensure it was done legally.
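To make one of those issues concrete, here is what "defining top" might look like in practice. This is a minimal sketch of my own, not the script ChatGPT produced. It assumes the posts have already been retrieved legally through the API and stored locally. The `Tweet` fields, the engagement weights, and the sample records are all hypothetical illustrations rather than X's actual data schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    """Hypothetical record for an already-downloaded post (not X's real schema)."""
    text: str
    lang: str
    likes: int
    reposts: int
    replies: int


def engagement_score(tweet, repost_weight=2.0, reply_weight=1.5):
    """One possible definition of 'top': a weighted sum of engagement counts.

    The weights are arbitrary assumptions; a researcher would need to justify them.
    """
    return tweet.likes + repost_weight * tweet.reposts + reply_weight * tweet.replies


def top_tweets(tweets, keyword, lang="ar", limit=10):
    """Keep posts in the given language that mention the keyword, ranked by score."""
    matches = [t for t in tweets if t.lang == lang and keyword in t.text]
    return sorted(matches, key=engagement_score, reverse=True)[:limit]


# Invented sample data, for illustration only.
sample = [
    Tweet("خبر عن إيران", "ar", likes=10, reposts=5, replies=2),
    Tweet("news about Iran", "en", likes=500, reposts=100, replies=50),
    Tweet("تحليل: إيران والحرب", "ar", likes=4, reposts=20, replies=1),
    Tweet("طقس اليوم", "ar", likes=900, reposts=10, replies=3),
]

for tweet in top_tweets(sample, "إيران", limit=2):
    print(engagement_score(tweet), tweet.text)
```

Even this toy version shows why the choice matters: the most-liked Arabic post in the sample is about the weather and gets filtered out, and weighting reposts above likes changes which of the remaining posts comes first.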

Kubinec also mentions how good machine translation has gotten – just reading Arabic isn't necessarily a comparative advantage anymore. Machine translation has gotten a lot, lot better than the bizarre mishmash of nonsense that Facebook produces when it offers you an English version of an Arabic-language post. In my experiments, Gen-AI translations from Arabic have gotten better and better. There are still big problems with colloquial dialects, puns, wordplay – but when it comes to, say, automatic translation of an Arabic newspaper or op-ed column, we've entered a new world. This does not render Arabic language expertise or language training more broadly useless, despite what the techlords want us to believe (a false belief which probably informed the Musk/DOGE gutting of US government Title VI foreign language and area studies training). Language and cultural context remain essential to really understanding what you're reading and seeing, and that's not going to be replaced by AI any time soon. But the gap between Arabic and English discourse on, say, X/Twitter is far smaller than it was a decade ago.

Michael Robbins of the Arab Barometer has particularly useful thoughts about the move towards "silicon respondents" in some survey instruments. This strikes me as unambiguously bad – particularly in the Middle Eastern context. Robbins warns: "I would not advise using such agents for the other purposes mentioned here. Arab Barometer’s experience underscores that public opinion in the region cannot be reduced to demographic profiles. What respondents say is shaped not only by age, gender, education, or place of residence, but also by political context, trust, timing, interviewer effects, local conditions, and perceptions of risk. In any setting, but especially in MENA, respondents may conceal preferences, offer strategically cautious answers, or interpret politically sensitive questions in context-dependent ways. Synthetic participants are poorly equipped to capture these dynamics precisely because they are built from generalized patterns rather than actual realities on the ground." With research access and funding drying up, I can't help but suspect that some scholars are going to try to do this anyway – using AI-generated artificial respondents for survey experiments and the like – and probably be richly rewarded for it. Robbins is right to sound the alarm.

Which brings us to Rich Nielsen's challenging piece which will likely anger some scholars on both sides of the coin but should engage all. He begins bluntly: "I've been using machine learning – the basis for artificial intelligence – in my research for years, and I've been skeptical that it will transform the social sciences. No longer." He argues persuasively that there are certain tasks that Gen-AI really is better at now than almost any human scholar, and that research that used to take years of backbreaking labor really can be done now just as well, or better, in a few minutes. He points to his own work coding some 150,000 documents for his first book – something that took him years with a crude machine learning model, but now could probably be done overnight. This poses a profound challenge for a certain type of quantitatively oriented scholar: "Gen-AI makes so much more of it automatable that the future of quantitative research will not be coding, but rather prompting AI agents to collect data and analyze it according to a comprehensive synthesis of a statistics literature so vast that no human can ever read it in a lifetime and produce answers that would have previously taken years or lifetimes to achieve in mere days, or even minutes."

I would add that from my experimenting (and from reading too many student papers) it isn't just quantitative work that is easily reproducible. I asked ChatGPT to write me a paragraph on the effects of AI on Middle East politics in the style of Marc Lynch, citing Habermas, Bourdieu, and Agamben. Thirty seconds later it gave me this:

Drawing on Jürgen Habermas, Pierre Bourdieu, and Giorgio Agamben, one could argue that artificial intelligence is poised to intensify rather than transcend the long-standing structural tensions of Middle East politics. Much as satellite television and social media once transformed the Arab public sphere, AI systems now mediate visibility, authority, and legitimacy at unprecedented scale, reshaping what Habermas would recognize as the communicative infrastructure of political life. Yet these technologies are unlikely to democratize discourse evenly. Instead, following Bourdieu, AI advantages actors already possessing the symbolic and institutional capital necessary to train models, control platforms, and define “credible” knowledge within both authoritarian and transnational fields of power. Gulf states investing heavily in sovereign AI infrastructures, for example, are not merely pursuing economic diversification but attempting to consolidate epistemic authority itself. At the same time, the spread of predictive surveillance, automated censorship, and algorithmic profiling evokes Agamben’s notion of the state of exception becoming normalized through technical systems that blur the boundary between security governance and everyday life. The result may be a Middle East in which political contestation increasingly unfolds through opaque algorithmic architectures that shape discourse before citizens even enter it, deepening asymmetries of power while rendering them less visible.

If I read that in a manuscript submitted for peer review, I'd probably think it's pretty solid, maybe even better than a lot of what I read in certain journals. I bet an essay like that would get published unless it happened to draw particularly keen-eyed skeptical reviewers – and if it did, the author would just resubmit to another journal until it found one where the reviewers were less careful. I hate that, but that's where we are. Manuscript reviewing, like grading student essays, has become a nightmare. I just don't have the time or the interest to go through and check every citation for hallucinations. I am already deluged with review requests, and if ambitious junior scholars can produce an AI-assisted quantitative analysis of existing datasets buttressed by a seemingly authoritative literature review in a few minutes, and I have to waste hours trying to figure out if it's "real", it's pretty clear who's going to prevail.

Nielsen worries about the effects of all this on the scholarly pipeline. If Gen-AI can already do more than a graduate research assistant, why have them? If Gen-AI can produce articles on a par with solid junior scholars, good enough to get past peer review into journals, then how is an honest emerging scholar who does her own work and produces two or three articles supposed to compete with someone using Gen-AI who has twenty "peer-reviewed articles" on his CV? Nielsen asks what this means even for training students: "How can a graduate student write a dissertation prospectus this Spring if the project will take three years with today’s technology but will take only days to complete once they are on the job market? Will search committees reward them for time well spent, or ask, “shouldn’t you have more?” I wish we knew the answer was the former, but I fear it will be the latter. I couldn't agree more with Nielsen that "We must step back from an AI-fueled publication arms race to intentionally choose discipline-wide institutions and norms that make the social sciences an enterprise worth entering, no matter how much we must transform for that to be true." But how? Who is going to disarm first?

What makes Nielsen's essay most interesting, though, is that his answer to all this is not enthusiastic surrender to our AI overlords. He actually sees the biggest "winners" of AI as likely to be qualitative, ethnographic scholars who can do the things that AI is bad at, not the quantitative scholars whose skill set has been overtaken. As it happens, Middle East Studies – and area studies more broadly – tends to be a bastion of ethnographic, qualitative and critical research, and may be surprisingly well-placed to weather the storm. Here's how Nielsen puts it:

"Rather than emphasizing the most algorithmic parts of what we do, the future work of this graduate student needs to be focused on what they, the human researcher, are making, creating, and contributing to new knowledge that is not reducible to an algorithm. I believe that human research has an aesthetic quality in addition to its technical correctness. I've come to understand myself and my own research as interpretivist, even as it draws deeply and faithfully from the canon of positivist methodology in many respects. I recommend that a student starting now should adopt what Ed Schatz calls an ethnographic sensibility (Schatz 2009). This ethnographic sensibility is unique, not reproducible with AI, and developed from close, embodied interaction with the world, at the “nearest possible vantage-point” (Schatz 2009, p. 307). This is what is uniquely human about research... Each researcher has a unique capacity to see the world through their own eyes, to hone that into a sensibility unique to them, that they use to teach us all about the infinite facets of human societies."

I think he's right. I hope he is. But I'm not so sure.

Finally, what about imagery? To illustrate this essay, I asked ChatGPT and Gemini to "Make me a striking photorealistic image of a Middle East Scholar using [you] for research." (I didn't use Grok for this because you have to pay for its images now, which, given how it was being used to produce sexualized images, is definitely for the best – the "scholar" probably would have been dancing in a bikini.)

The top image on this post was ChatGPT's version: an apparently Iranian man alone in an office, with an open book in an indeterminate Middle Eastern language and a laptop open to "ChatGPT" (with gobbledygook on the screen). Gemini, here, produced a Gulf Arab man with both an English and an Arabic book, a notepad, and an open laptop, sitting in a public library with hijabi women and a young man in Western clothes in the background. I think it's worth thinking hard about the undeniable quality and realism of these images, the implied assumptions about who counts as a Middle East scholar, and the limitations of the images (i.e. the nonsense text).