Tag Archives: DARPA Memex

The Reckless Gambles that Changed the World: DARPA

Using messenger RNA to make vaccines was an unproven idea. But if it worked, the technique would revolutionize medicine, not least by providing protection against infectious diseases and biological weapons. So in 2013 America’s Defense Advanced Research Projects Agency (DARPA) gambled. It awarded a small, new firm called Moderna $25m to develop the idea. Eight years, and more than 175m doses later, Moderna’s covid-19 vaccine sits alongside weather satellites, GPS, drones, stealth technology, voice interfaces, the personal computer and the internet on the list of innovations for which DARPA can claim at least partial credit.

It is the agency that shaped the modern world, and this success has spurred imitators. In America there are ARPAs for homeland security, intelligence and energy, as well as the original defense one… Germany has recently established two such agencies: one civilian (the Federal Agency for Disruptive Innovation, or SPRIN-D) and another military (the Cybersecurity Innovation Agency). Japan’s interpretation is called Moonshot R&D.

As governments across the rich world begin, after a four-decade lull, to spend more on research and development, the idea of an agency to invent the future (and, in so doing, generate vast industries) is alluring and, the success of DARPA suggests, no mere fantasy. In many countries there is displeasure with the web of bureaucracy that entangles funding systems, and hope that the DARPA model can provide a way of getting around it. But as some have discovered, and others soon will, copying DARPA requires more than just copying the name. It also needs commitment to the principles which made the original agency so successful—principles that are often uncomfortable for politicians.

On paper, the approach is straightforward. Take enormous, reckless gambles on things so beneficial that only a handful need work to make the whole venture a success. As Arun Majumdar, founding director of ARPA-E, America’s energy agency, puts it: “If every project is succeeding, you’re not trying hard enough.” Current (unclassified) DARPA projects include mimicking insects’ nervous systems in order to reduce the computation required for artificial intelligence and working out how to protect soldiers from the enemy’s use of genome-editing technologies.

The result is a mirror image of normal R&D agencies. Whereas most focus on basic research, DARPA builds things. Whereas most use peer review and carefully selected measurements of progress, DARPA strips bureaucracy to the bones (the conversation in 1965 which led the agency to give out $1m for the first cross-country computer network, a forerunner to the internet, took just 15 minutes). All work is contracted out. DARPA has a boss, a small number of office directors and fewer than 100 program managers, hired on fixed short-term contracts, who act in a manner akin to venture capitalists, albeit with the aim of generating specific outcomes rather than private returns.

Excerpt from Inventing the future: A growing number of governments hope to clone America’s DARPA, Economist, June 5, 2021

Investigating the Deep Dark Web

DARPA’s Memex search technologies have garnered much interest due to their initial mainstream application: to uncover human trafficking operations taking place on the “dark web”, the catch-all term for the various internet networks the majority of people never use, such as Tor, Freenet and I2P. And a significant number of law enforcement agencies have inquired about using the technology. But Memex promises to be disruptive across both criminal and business worlds.

Christopher White, who leads the team of Memex partners, which includes members of the Tor Project, a handful of prestigious universities, NASA and research-focused private firms, tells FORBES the project is so ambitious in its scope, it wants to shake up a staid search industry controlled by a handful of companies: Google, Microsoft and Yahoo.

Putting those grandiose ideas into action, DARPA will today open source various components of Memex, allowing others to take the technologies and adapt them for their own use. As is noticeable from the list of technologies below, there’s great possibility for highly-personalised search, whether for agents trying to bring down pedophiles or the next Silk Road, or anyone who wants a less generic web experience.

Uncharted Software, University of Southern California and Next Century Corporation
These three have produced the front-end interfaces, called TellFinder and DIG, currently being used by Memex’s law enforcement partners. “They’re very good at making things look slick and shiny. Processing and displaying information is really hard and quite subjective,” says White.

The ArrayFire tech is a software library designed to support accelerated computing, turbo-boosting web searches over GPUs. “A few lines of code in ArrayFire can replace dozens of lines of parallel computing code, saving users valuable time and lowering development costs,” the blurb for the technology reads.

Carnegie Mellon University (CMU) is building various pieces of the Memex puzzle, but its TJBatchExtractor is what’s going open source today. It allows a user to extract data, such as a name, organisation or location, from advertisements. It was put to good use in the anti-human trafficking application already in use by law enforcement agencies.

Diffeo’s Dossier Stack learns what a user wants as they search the internet. “Instead of relying on Google’s ranking to tell you what’s important, you can say, ‘I want the Thomas that’s in the UK not the US, so don’t send me anything that has US-oriented information,’” explains White.

Hyperion Gray’s crawlers are designed to replicate human interaction with websites. “Think of what they do as web crawling on steroids,” says White. Its AutoLogin component takes authentication credentials funnelled into the system to crawl into password-protected areas of websites, whilst Formasaurus does the same but for web forms, determining what happens when fields are filled in. The Frontera, SourcePin and Splash tools make it easy for the average user to organise and view the kind of content they want in their results. Its HG Profiler code looks for matches of data across different pages where there’s no hyperlink making it obvious. Hyperion Gray also built Scrapy-Dockerhub, which allows easy repackaging of crawlers into Docker containers, allowing for “better and easier web crawling”, notes White.

IST Research and Parse.ly: “These tools [Scrapy Cluster, pykafka and streamparse] are major infrastructure components so that you can build a very scalable, real-time web crawling architecture.”

Jet Propulsion Laboratory (JPL). This NASA-based organisation has crafted a slew of Memex building blocks, four of which – ImageCat, FacetSpace, LegisGATE and ImageSpace – are applications built on top of Apache Software Foundation projects that allow users to analyse and manipulate vast numbers of images and masses of text. JPL also created a video and image analysis system called SMQTK to rank that kind of visual content based on relevance, making it easy for the user to connect files to the topic they care about. Its Memex Explorer brings all those tools together under a common interface.

MIT Lincoln Laboratory. Three of MIT’s contributions – Text.jl, MITIE, Topic – are natural language processing tools. They allow the user, for example, to search for where two organisations are mentioned in different documents, or to ask for terse descriptions of what a document or a webpage is about.

New York University. NYU, in collaboration with JPL and Continuum Analytics, has created an interface called Topic, which lets the user interact with “focused crawlers”, which consistently update indexes to produce what’s relevant to the user, always “narrowing the thing they’re crawling”, notes White. “We have a few of these different kinds of crawlers as it’s not clear for every domain what the right crawling strategy is.”
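The “focused crawler” idea can be illustrated with a minimal sketch. It is not NYU’s implementation: the toy in-memory `pages` dictionary, the keyword-overlap `relevance` score and the function names are all assumptions made for illustration. The crawler keeps a priority queue of unvisited links and always fetches next the link found on the most relevant page so far, so the crawl drifts towards on-topic content.

```python
import heapq


def relevance(text, topic_terms):
    """Fraction of words in `text` that are topic terms (a deliberately crude score)."""
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in topic_terms)
    return hits / len(words)


def focused_crawl(pages, seed, topic_terms, max_pages=10, threshold=0.1):
    """Crawl a toy web, visiting links found on relevant pages first.

    `pages` is a hypothetical stand-in for the network: it maps a URL to a
    (text, outgoing_links) pair. Returns relevant URLs in visiting order.
    """
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, seed)]
    seen = {seed}
    relevant = []
    visited = 0
    while frontier and visited < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = pages.get(url, ("", []))
        visited += 1
        score = relevance(text, topic_terms)
        if score >= threshold:
            relevant.append(url)
        # A link inherits the score of the page it was found on.
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))
    return relevant
```

With a small page budget, links discovered on off-topic pages sit at the back of the queue and may never be fetched, which is one simple reading of “narrowing the thing they’re crawling”.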

Qadium. This San Francisco firm has submitted a handful of utilities that allow for “data marshalling”, a way to organise data so it can be inspected in different ways.

Sotera Defense Solutions. This government contractor has created the aptly-named DataWake. It collects all links that the user didn’t click on but could, and maybe should, have. This “wake” includes the data behind those links.
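The “wake” described here can be pictured with a short sketch. This is not DataWake’s code: the `LinkCollector` class and the `unclicked_links` function are hypothetical names, and the sketch assumes the page HTML and the set of clicked links are already known. It collects every link on a page and keeps those the user did not follow.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def unclicked_links(html, clicked):
    """Return links present in `html` that are not in `clicked`, without duplicates."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    seen = set(clicked)
    wake = []
    for link in parser.links:
        if link not in seen:
            seen.add(link)
            wake.append(link)
    return wake
```

A fuller system would also fetch and store the content behind each of those links, as the description above notes; the sketch stops at identifying them.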

SRI International. SRI is working alongside the Tor Project, the US Navy and some of the original creators of Tor, the anonymising browser that encrypts traffic and loops users through a number of servers to protect their identities. SRI has developed a “dark crawler” called the Hidden Service Forum Spider, which grabs content from Hidden Services – sites that are hosted on Tor nodes and used for especially private services, be they drug markets or human rights forums for those living under repressive regimes. The HSProbe, meanwhile, looks for Hidden Service domains. The Memex team is keen to learn more about the darker corners of the web, partly to help law enforcement clean it of illegal content, but also to get a better understanding of how big the unmapped portions of the internet are.

DARPA is funding the Tor Project, which is one of the most active supporters of privacy in the technological world, and the US Naval Research Laboratory to test the Memex tools. DARPA said Memex wasn’t about destroying the privacy protections offered by Tor, even though it wanted to help uncover criminals’ identities. “None of them [Tor, the Navy, Memex partners] want child exploitation and child pornography to be accessible, especially on Tor. We’re funding those groups for testing,” says White.

DeepDive from Stanford turns text and multimedia into “knowledge bases”, creating connections between relationships of the different people or groups being searched for. “It’s machine learning tech for inferring patterns, working relationships… finding links across a very large amount of documents,” adds White.

Excerpts from Thomas Fox-Brewster, Watch Out Google, DARPA Just Open Sourced All This Swish ‘Dark Web’ Search Tech, Forbes, Apr. 17, 2015

For extensive information see DARPA MEMEX

How to Search the Deep Web: DARPA MEMEX

From the DARPA website

Today’s web searches use a centralized, one-size-fits-all approach that searches the Internet with the same set of tools for all queries. While that model has been wildly successful commercially, it does not work well for many government use cases. For example, it still remains a largely manual process that does not save sessions, requires nearly exact input with one-at-a-time entry, and doesn’t organize or aggregate results beyond a list of links. Moreover, common search practices miss information in the deep web—the parts of the web not indexed by standard commercial search engines—and ignore shared content across pages.

To help overcome these challenges, DARPA has launched the Memex program. Memex seeks to develop the next generation of search technologies and revolutionize the discovery, organization and presentation of search results. The goal is for users to be able to extend the reach of current search capabilities and quickly and thoroughly organize subsets of information based on individual interests. Memex also aims to produce search results that are more immediately useful to specific domains and tasks, and to improve the ability of military, government and commercial enterprises to find and organize mission-critical publicly available information on the Internet…

Initially, DARPA intends to develop Memex to address a key Defense Department mission: fighting human trafficking. Human trafficking is a factor in many types of military, law enforcement and intelligence investigations and has a significant web presence to attract customers. The use of forums, chats, advertisements, job postings, hidden services, etc., continues to enable a growing industry of modern slavery. An index curated for the counter-trafficking domain, along with configurable interfaces for search and analysis, would enable new opportunities to uncover and defeat trafficking enterprises.

The Memex program gets its name and inspiration from a hypothetical device described in “As We May Think,” a 1945 article for The Atlantic Monthly written by Vannevar Bush, director of the U.S. Office of Scientific Research and Development (OSRD) during World War II. Envisioned as an analog computer to supplement human memory, the memex (a combination of “memory” and “index”) would store and automatically cross-reference all of the user’s books, records and other information.