The British newspaper The Guardian published a report by Harry Davies and Yuval Abraham stating that an Israeli military surveillance unit has used a dataset of intercepted Palestinian communications to build an AI tool similar to ChatGPT, hoping it would revolutionize its spying capabilities.
A joint investigation with +972 Magazine and the Hebrew-language outlet Local Call revealed that Unit 8200 developed an AI model capable of understanding Arabic, training it on large amounts of phone and text conversations intercepted from the occupied Palestinian territories.
According to sources familiar with the project, the unit began developing the model to create an advanced tool similar to a chatbot, capable of answering questions about the people being monitored and providing insights into the massive amounts of surveillance data it collects.
Unit 8200, which compares its capabilities to those of the U.S. National Security Agency (NSA), accelerated its efforts to develop the system after the Israeli occupation’s attack on Gaza in October 2023. The model was still in the training phase during the second half of last year, and it is unclear whether it has been deployed yet.
The efforts to develop the large language model (LLM), a deep-learning system that generates human-like text, came to light in part through a little-noticed public lecture given by a former military intelligence technology expert who supervised the project.
The former official, Shaked Roger Joseph Sidoff, said at an AI military intelligence conference in Tel Aviv last year: “We tried to create the largest possible dataset [and] gather all the data Israel ever obtained in Arabic.”
Sidoff added that the model requires "massive" amounts of data. Three former intelligence officials familiar with the initiative confirmed the existence of the machine-learning program and shared details about its construction.
Several other sources described how Unit 8200 had used smaller machine learning models in the years leading up to the launch of this ambitious project and the impact this technology had already made.
A source familiar with Unit 8200’s development of AI models in recent years said: “AI enhances power.” They added, “It’s not just about preventing gunfire attacks; I can track human rights activists, monitor Palestinian construction in Area C [of the West Bank], and I have more tools to know what everyone is doing in the West Bank.”
The report notes that details about the scale of the new model shed light on how extensively Unit 8200 retains the content of intercepted communications, enabling it to conduct sweeping surveillance of Palestinian communications, as described by current and former Israeli and Western intelligence officials.
The project also shows how Unit 8200, like many intelligence agencies worldwide, is seeking to leverage advances in AI to carry out complex analytical tasks and make sense of the vast amounts of information it routinely collects, volumes that increasingly exceed what human analysts can process alone.
However, integrating systems like large language models into intelligence analysis carries significant risks: experts and human rights activists warn that such systems can amplify biases and are prone to errors, and the opaque nature of AI can make it difficult to understand how the conclusions these systems generate are reached.
Zack Campbell, a researcher in surveillance at Human Rights Watch, expressed concern that Unit 8200 might use AI programs to make critical decisions about the lives of Palestinians under military occupation. He said, “It’s a guessing machine, and ultimately these guesses could be used to criminalize people.”
A spokesperson for the Israeli military declined to say whether the unit uses the large language model but said the army "uses a variety of methods to identify and thwart the activities of hostile organizations in the Middle East."
The report further indicates that Unit 8200 has developed several AI-powered methods in recent years. Systems like “Gospel” and “Lavender,” which were quickly integrated into combat operations during the war on Gaza, played an important role in Israeli military strikes by helping identify potential targets (both individuals and buildings) for deadly airstrikes.
The report continues, “For nearly a decade, the unit has also used AI to analyze intercepted and stored communications, using a series of machine learning models to sort information into pre-designated categories, identify patterns, and make predictions.”
It also highlights that when the Israeli military mobilized hundreds of thousands of reservists, officers with experience in building large language models returned to the unit from the private sector, some of them from major American tech companies such as Google, Meta, and Microsoft.
Google said that the work its employees do as reservists is "unrelated" to the company, while Meta and Microsoft declined to comment.
The newspaper quoted a source as saying that the small team of experts began building a large language model capable of understanding Arabic, but essentially had to start from scratch after discovering that commercial and open-source Arabic models had been trained on standard written Arabic, the form used in formal communications, literature, and media, rather than on spoken Arabic.
One of the sources said, “There are no call transcripts or WhatsApp chats available online in sufficient quantity to train such a model.” They added that the challenge was “gathering all the spoken Arabic texts the unit has ever acquired and centralizing them.”
They stated that the training data for the model ultimately consisted of about 100 billion words. A source familiar with the project told The Guardian that this vast quantity of communications included conversations in both Lebanese and Palestinian dialects.
Other sources, according to the newspaper, said that “the unit also sought to train the model to understand specific military terminology used by armed groups. However, the process of gathering such massive training data appears to have involved collecting large amounts of communications with no intelligence value about the daily lives of Palestinians.”
The report notes that Unit 8200 is not the only intelligence agency experimenting with generative AI. In the United States, the CIA has launched a ChatGPT-like tool to sift through information from open sources.
Similarly, intelligence agencies in the UK are developing their own AI programs, which are also said to be trained on datasets from open sources.
However, many former security officials in the U.S. and UK have said that “the Israeli intelligence community seems to be taking greater risks than its closest allies in integrating new AI-based systems into intelligence analysis.”
A former Western intelligence official said Israel's extensive collection of Palestinian communications content has allowed it to use AI in ways that "would not be acceptable" among intelligence agencies in countries with stricter oversight of surveillance powers and of the handling of sensitive personal data.
Campbell from Human Rights Watch stated that “using surveillance materials to train the AI model is a human rights violation,” and that Israel, as an occupying power, is obligated to protect Palestinians’ privacy rights.
He added, “We’re talking about highly personal data taken from people who are not suspected of any crime and using it to train a tool that could then be used to criminalize people.”
Nadeem Nashif, director of “7amleh,” a Palestinian digital rights group, said, “Palestinians have become subjects in Israel’s lab for developing these technologies and weaponizing AI, all for the purpose of maintaining an apartheid and occupation system where these technologies are used to control a people and control their lives.”
According to the newspaper, experts warn above all of the mistakes AI models could make. Brianna Rosen, a former senior official at the U.S. National Security Council and now a leading researcher at Oxford University, noted that while a ChatGPT-like tool could help intelligence analysts detect threats that humans might miss, even before they arise, it also risks drawing false connections and reaching erroneous conclusions.
She said, “It is crucial that intelligence agencies using these tools can understand the logic behind the answers they generate.” She emphasized, “Mistakes will happen, and some of these mistakes could have very severe consequences.”
In February, the Associated Press reported that “intelligence officers may have used AI to help select a target in an Israeli airstrike on Gaza in November 2023, which killed four people, including three girls. A message reviewed by the news agency indicated that the airstrike was carried out by mistake.”