Tag Archives: Data-Driven Discovery of Models (D3M)

Who Cares? Clicking Away Privacy Rights

The latest developments in a high-profile criminal probe by US special counsel John Durham show the extent to which the world’s internet traffic is being monitored by a coterie of network researchers and security experts inside and outside the US government. The monitoring is made possible by little-scrutinized partnerships, both informal and formal, among cybersecurity companies, telecommunications providers and government agencies.

The U.S. government is obtaining bulk data about network usage, according to federal contracting documents and people familiar with the matter, and has fought disclosure about such activities. Academic and independent researchers are sometimes tapped to look at data and share any findings with the government without warrants or judicial authorization…

Unlike the disclosures by former intelligence contractor Edward Snowden from nearly a decade ago, which revealed U.S. intelligence programs that relied on covert access to private data streams, the sharing of internet records highlighted by Mr. Durham’s probe concerns commercial information that is often being shared with or sold to the government in bulk. Such data sets can possess enormous intelligence value, according to current and former government officials and cybersecurity experts, especially as the power of computers to derive insights from massive data sets has grown in recent years.

Such network data can help governments and companies detect and counter cyberattacks. But that capability also has privacy implications, despite assurances from researchers that most of the data can’t be traced back to individuals or organizations.

At issue are several kinds of internet logs showing the connections between computers, typically collected on networking devices such as switches or routers. They are the rough internet equivalent of logs of phone calls—showing which computers are connecting and when, but not necessarily revealing anything about the content of the transmissions. Modern smartphones and computers generate thousands of such logs a day just by browsing the web or using consumer apps…

“A question worth asking is: Who has access to large pools of telecommunications metadata, such as DNS records, and under what circumstances can those be shared with the government?…Surveillance takes the path of least resistance…,” according to Julian Sanchez, a senior fellow at the Cato Institute.

Excerpts from Byron Tau et al., Probe Reveals Unregulated Access to Data Streams, WSJ, Feb. 28, 2022

Addictive Ads and Digital Dignity

Social-media firms make almost all their money from advertising. This pushes them to collect as much user data as possible, the better to target ads. Critics call this “surveillance capitalism”. It also gives them every reason to make their services as addictive as possible, so users watch more ads…

The new owner could turn TikTok from a social-media service to a digital commonwealth, governed by a set of rules akin to a constitution with its own checks and balances. User councils (a legislature, if you will) could have a say in writing guidelines for content moderation. Management (the executive branch) would be obliged to follow due process. And people who felt their posts had been wrongfully taken down could appeal to an independent arbiter (the judiciary). Facebook has toyed with platform constitutionalism and now has an “oversight board” to hear user appeals…

Why would any company limit itself this way? For one thing, it is what some firms say they want. Microsoft in particular claims to be a responsible tech giant. In January 2020 its chief executive, Satya Nadella, told fellow plutocrats in Davos about the need for “data dignity”—ie, granting users more control over their data and a bigger share of the value these data create…Governments increasingly concur. In its Digital Services Act, to be unveiled in 2020, the European Union is likely to demand transparency and due process from social-media platforms…In the United States, Andrew Yang, a former Democratic presidential candidate, has launched a campaign to get online firms to pay users a “digital dividend”. Getting ahead of such ideas makes more sense than re-engineering platforms later to comply.

Excerpt from: Reconstituted: Schumpeter, Economist, Sept 5, 2020

See also Utilities for Democracy: Why and How the Algorithmic Infrastructure of Facebook and Google Must Be Regulated (2020)

An Army of Virtual Data Scientists

Popular search engines are great at finding answers for point-of-fact questions like the elevation of Mount Everest or current movies running at local theaters. They are not, however, very good at answering what-if or predictive questions—questions that depend on multiple variables, such as “What influences the stock market?” or “What are the major drivers of environmental stability?” In many cases that shortcoming is not for lack of relevant data. Rather, what’s missing are empirical models of complex processes that influence the behavior and impact of those data elements. In a world in which scientists, policymakers and others are awash in data, the inability to construct reliable models that can deliver insights from that raw information has become an acute limitation for planners.

To free researchers from the tedium and limits of having to design their own empirical models, DARPA today launched its Data-Driven Discovery of Models (D3M) program. The goal of D3M is to help overcome the data-science expertise gap by enabling non-experts to construct complex empirical models through automation of large parts of the model-creation process. If successful, researchers using D3M tools will effectively have access to an army of “virtual data scientists.”…

“We have an urgent need to develop machine-based modeling for users with no data-science background. We believe it’s possible to automate certain aspects of data science, and specifically to have machines learn from prior example how to construct new models.”…

Some experts project deficits of 140,000 to 190,000 data scientists worldwide in 2016 alone, and increasing shortfalls in coming years. Also, because the process to build empirical models is so manual, their relative sophistication and value are often limited.

Excerpts from DARPA Goes “Meta” with Machine Learning for Machine Learning: Data-Driven Discovery of Models (D3M) seeks to increase pace of scientific discovery and improve military planning, logistics and intelligence outcomes, DARPA Website, June 17, 2016