Tag Archives: data science

Who Cares? Clicking Away Privacy Rights

The latest developments in a high-profile criminal probe by US special counsel John Durham show the extent to which the world’s internet traffic is being monitored by a coterie of network researchers and security experts inside and outside the US government. The monitoring is made possible by little-scrutinized partnerships, both informal and formal, among cybersecurity companies, telecommunications providers and government agencies.

The U.S. government is obtaining bulk data about network usage, according to federal contracting documents and people familiar with the matter, and has fought disclosure about such activities. Academic and independent researchers are sometimes tapped to look at data and share any findings with the government without warrants or judicial authorization…

Unlike the disclosures by former intelligence contractor Edward Snowden from nearly a decade ago, which revealed U.S. intelligence programs that relied on covert access to private data streams, the sharing of internet records highlighted by Mr. Durham’s probe concerns commercial information that is often being shared with or sold to the government in bulk. Such data sets can possess enormous intelligence value, according to current and former government officials and cybersecurity experts, especially as the power of computers to derive insights from massive data sets has grown in recent years.

Such network data can help governments and companies detect and counter cyberattacks. But that capability also has privacy implications, despite assurances from researchers that most of the data can’t be traced back to individuals or organizations.

At issue are several kinds of internet logs showing the connections between computers, typically collected on networking devices such as switches or routers. They are the rough internet equivalent of logs of phone calls—showing which computers are connecting and when, but not necessarily revealing anything about the content of the transmissions. Modern smartphones and computers generate thousands of such logs a day just by browsing the web or using consumer apps…

“A question worth asking is: Who has access to large pools of telecommunications metadata, such as DNS records, and under what circumstances can those be shared with the government?…Surveillance takes the path of least resistance…,” according to Julian Sanchez, a senior fellow at the Cato Institute.

Excerpts from Byron Tau et al., Probe Reveals Unregulated Access to Data Streams, WSJ, Feb. 28, 2022

Who Owns Your Voice? Grabbing Biometric Data

Increasingly sophisticated technology that detects nuances in sound inaudible to humans is capturing clues about people’s likely locations, medical conditions and even physical features. Law-enforcement agencies are turning to those clues from the human voice to help sketch the faces of suspects. Banks are using them to catch scammers trying to imitate their customers on the phone, and doctors are using such data to detect the onset of dementia or depression. That has… raised fresh privacy concerns, as consumers’ biometric data is harnessed in novel ways.

“People have known that voice carries information for centuries,” said Rita Singh, a voice and machine-learning researcher at Carnegie Mellon University who receives funding from the Department of Homeland Security…Ms. Singh measures dozens of voice-quality features—such as raspiness or tremor—that relate to the inside of a person’s vocal tract and how an individual voice is produced. She detects so-called microvolumes of air that help create the sound waves that make up the human voice. The way they resonate in the vocal tract, along with other voice characteristics, provides clues on a person’s skull structure, height, weight and physical surroundings, she said.

Nuance’s voice-biometric and recognition software is designed to detect the gender, age and linguistic background of callers and whether a voice is synthetic or recorded. It helped one bank determine that a single person was responsible for tens of millions of dollars of theft, or 18% of the fraud the firm encountered in a year, said Brett Beranek, general manager of Nuance’s security and biometrics business.

Audio data from customer-service calls is also combined with information on how consumers typically interact with mobile apps and devices, said Howard Edelstein, chairman of behavioral biometric company Biocatch. The company can detect the cadence and pressure of swipes and taps on a smartphone.  How a person holds a smartphone gives clues about their age, for example, allowing a financial firm to compare the age of the normal account user to the age of the caller…

If such data collected by a company were improperly sold or hacked, some fear recovering from identity theft could be even harder because physical features are innate and irreplaceable.

Sarah Krouse, What Your Voice Reveals About You, WSJ, Aug. 13, 2019

Who Controls People’s Data?

The McKinsey Global Institute estimates that cross-border flows of goods, services and data added 10 per cent to global gross domestic product in the decade to 2015, with data providing a third of that increase. That share of the contribution seems likely to rise: conventional trade has slowed sharply, while digital flows have surged. Yet as the whole economy becomes more information-intensive — even heavy industries such as oil and gas are becoming data-driven — the cost of blocking those flows increases…

Yet that is precisely what is happening. Governments have sharply increased “data localisation” measures requiring information to be held in servers inside individual countries. The European Centre for International Political Economy, a think-tank, calculates that in the decade to 2016, the number of significant data localisation measures in the world’s large economies nearly tripled from 31 to 84.

Even in advanced economies, exporting data on individuals is heavily restricted because of privacy concerns, which have been highlighted by the Facebook/Cambridge Analytica scandal. Many EU countries have curbs on moving personal data even to other member states. Studies for the Global Commission on Internet Governance, an independent research project, estimate that current constraints — such as restrictions on moving data on banking, gambling and tax records — reduce EU GDP by half a per cent.

In China, the champion data localiser, restrictions are even more severe. As well as long-established controls over technology transfer and state surveillance of the population, such measures form part of its interventionist “Made in China 2025” industrial strategy, designed to make it a world leader in tech-heavy sectors such as artificial intelligence and robotics.

China’s Great Firewall has long blocked most foreign web applications, and a cyber security law passed in 2016 also imposed rules against exporting personal information, forcing companies including Apple and LinkedIn to hold information on Chinese users on local servers. Beijing has also given itself a variety of powers to block the export of “important data” on grounds of reducing vaguely defined economic, scientific or technological risks to national security or the public interest. “The likelihood that any company operating in China will find itself in a legal blind spot where it can freely transfer commercial or business data outside the country is less than 1 per cent,” says ECIPE director Hosuk Lee-Makiyama…

Other emerging markets, such as Russia, India, Indonesia and Vietnam, are also leading data localisers. Russia has blocked LinkedIn from operating there after it refused to transfer data on Russian users to local servers.

Business organisations including the US Chamber of Commerce want rules to restrain what they call “digital protectionism”. But data trade experts point to a serious hole in global governance, with a coherent approach prevented by different philosophies between the big trading powers. Susan Aaronson, a trade academic at George Washington University in Washington, DC, says: “There are currently three powers — the EU, the US and China — in the process of creating separate data realms.”

The most obvious way to protect international flows of data is in trade deals — whether multilateral, regional or bilateral. Yet the only World Trade Organization laws governing data flows predate the internet and have not been thoroughly tested through litigation. The WTO recently recruited Alibaba co-founder Jack Ma to front an ecommerce initiative, but officials involved admit it is unlikely to produce anything concrete for a long time. In any case, Prof Aaronson says: “While data has traditionally been addressed in trade deals as an ecommerce issue, it goes far wider than that.”

The internet has always been regarded by pioneers and campaigners as a decentralised, self-regulating community. Activists have tended to regard government intervention with suspicion, except for its role in protecting personal data, and many are wary of legislation to enable data flows. “While we support the approach of preventing data localisation, we need to balance that against other rights such as data protection, cyber security and consumer rights,” says Jeremy Malcolm, senior global policy analyst at the Electronic Frontier Foundation, a campaign group for internet freedom…

Europe has traditionally had a very different philosophy towards data and privacy from the US. In Germany, for instance, public opinion tends to support strict privacy laws — usually attributed to lingering memories of surveillance by the Stasi secret police in East Germany. The EU’s new General Data Protection Regulation (GDPR), which comes into force on May 25, 2018, imposes a long list of requirements on companies processing personal data on pain of fines that could total as much as 4 per cent of annual turnover… But trade experts warn that the GDPR is very cautiously written, with a blanket exemption for measures claiming to protect privacy. Mr Lee-Makiyama says: “The EU text will essentially provide no meaningful restriction on countries wanting to practice data localisation.”

Against this political backdrop, the prospects for broad and binding international rules on data flow are dim… In the battle for dominance over setting rules for commerce, the EU and US often adopt contrasting approaches. While the US often tries to export its product standards in trade diplomacy, the EU tends to write rules for itself and let the gravity of its huge market pull other economies into its regulatory orbit. Businesses faced with multiple regulatory regimes will tend to work to the highest standard, known widely as the “Brussels effect”. Companies such as Facebook have promised to follow GDPR throughout their global operations as the price of operating in Europe.

Excerpts from Data protectionism: the growing menace to global business, Financial Times, May 13, 2018

An Army of Virtual Data Scientists

Popular search engines are great at finding answers for point-of-fact questions like the elevation of Mount Everest or current movies running at local theaters. They are not, however, very good at answering what-if or predictive questions—questions that depend on multiple variables, such as “What influences the stock market?” or “What are the major drivers of environmental stability?” In many cases that shortcoming is not for lack of relevant data. Rather, what’s missing are empirical models of complex processes that influence the behavior and impact of those data elements. In a world in which scientists, policymakers and others are awash in data, the inability to construct reliable models that can deliver insights from that raw information has become an acute limitation for planners.

To free researchers from the tedium and limits of having to design their own empirical models, DARPA today launched its Data-Driven Discovery of Models (D3M) program. The goal of D3M is to help overcome the data-science expertise gap by enabling non-experts to construct complex empirical models through automation of large parts of the model-creation process. If successful, researchers using D3M tools will effectively have access to an army of “virtual data scientists”…

“We have an urgent need to develop machine-based modeling for users with no data-science background. We believe it’s possible to automate certain aspects of data science, and specifically to have machines learn from prior example how to construct new models.”…

Some experts project deficits of 140,000 to 190,000 data scientists worldwide in 2016 alone, and increasing shortfalls in coming years. Also, because the process to build empirical models is so manual, their relative sophistication and value are often limited.

Excerpts from DARPA Goes “Meta” with Machine Learning for Machine Learning: Data-Driven Discovery of Models (D3M) seeks to increase pace of scientific discovery and improve military planning, logistics and intelligence outcomes, DARPA Website, June 17, 2016