We The Subjects — Plundering Health Data

When geneticist Jingyuan Fu heard that an artificial intelligence (AI) group in China had downloaded a large biomedical dataset her team built in Europe, she felt pride — and a jolt of unease. “We spent millions on that dataset,” says Fu, a professor of systems medicine at the University of Groningen in the Netherlands. “And the Chinese bought the whole thing for around €2,000.” In recent years, Fu’s group, like many others, has also begun using such data as feedstock for artificial intelligence. The AI group in China that downloaded her dataset had the same goal. “The Chinese wanted all our data,” Fu says. “And they also wanted our insights into how to mine it for AI development.”
From her perspective, today’s global scramble for biomedical data looks increasingly lopsided. “China has collected a huge amount of data,” she says. “But their own data sharing and openness is very limited.”… China already holds the largest data repositories, with 1.4 billion people using the WeChat app, many of whom are already connected to hospital databases for data integration, analysis and even healthcare delivery. “China also runs the largest number of clinical trials in the world generating massive drug response and real-world-evidence datasets.”

[A]fter decades of policies pushing ‘open science’, governments are now promoting ‘data sovereignty’: the idea that sensitive datasets should remain under national control and foreign access should be conditional. [In Europe] the stance is defensive. [Europe] is embarrassed about having allowed Chinese AI developers to plunder European biomedical databases, even while China blocks foreign access to Chinese datasets. They are now belatedly closing international access to biomedical databases, after years of championing cross-border sharing… According to the European Commission, “there are currently no partnerships involving the sharing of such data with China or the United States for AI development”….

As of April 2025, the 2.5 petabytes of omics data in the US Cancer Genome Atlas Program database are now closed to Chinese researchers, and UK Biobank data, containing whole-genome and exome sequences for 500,000 people, is no longer internationally downloadable. UK Biobank data must now be analyzed on the Biobank’s own platform, which provides a cloud-based ‘reading room’ without allowing individual data downloads… In September 2025, the US National Institutes of Health issued new regulations for genomic data repositories and users aimed at “protecting Americans’ sensitive personal health-related data from misuse by foreign adversaries” while enhancing “the privacy and autonomy of research participants”…. In December 2025, the US State Department launched its Pax Silica initiative, aimed at forming an international AI alliance that hedges against China. [Furthermore], data that are generated and held by hospitals, insurers, device makers, drug makers and data platform companies are abundant. For example, US-based electronic health records vendor Epic Systems Corporation manages records for over 300 million US patients and says that it has more than 150 AI features in development…

[But] AI models developed using sequestered datasets often ‘overfit’ to the specific demographics or clinical practices of their training environment…. “Without external, international validation, these biases are frequently only discovered after they have caused clinical harm,” … [For example] many high-performing AI tools for melanoma detection show a precipitous drop in accuracy when applied to darker skin tones. “Because major datasets are often skewed toward light-skinned northern European or North American populations, these tools can misclassify malignant lesions as benign in under-represented groups.”

Excerpt from Paul Webster, Who Owns by Health Data?, Nature Medicine, April 24, 2026