Government Data Mining in 2019

Facial Recognition Grows Up

Observers could spend every working minute analyzing facial recognition just to keep up with its constant changes. For example, Amazon recently announced a change to its Rekognition software that “improved accuracy for emotion detection (for all 7 emotions: ‘Happy’, ‘Sad’, ‘Angry’, ‘Surprised’, ‘Disgusted’, ‘Calm’ and ‘Confused’) and added a new emotion: ‘Fear’. Lastly, we have improved age range estimation accuracy; you also get narrower age ranges across most age groups.”

Somehow Amazon is still working on age estimation accuracy, but can detect fear.

Facebook also announced new privacy settings for DeepFace, its facial recognition software. That sounds nice, but remember that DeepFace is believed to be the largest facial recognition database in the world thanks to the 250 billion photos that have been voluntarily uploaded to Facebook. The company claims that it beats the FBI’s facial recognition programs with 15% more accuracy.

Google’s Face Match algorithm now makes use of a camera in its Nest Hub smart home display, which is a nice way of saying that Google’s thermostat- and light-controlling gizmos point an always-on camera at your living space. You can learn more about that in CNET’s excellent “Google collects face data now. Here’s what it means and how to opt out.”

The race to get this facial data isn’t only about selling you more stuff, although that’s certainly helpful. Live Nation and its Ticketmaster subsidiary have said that they will use facial recognition at live events. Not so fast, say some artists like the aptly named Rage Against The Machine.

More than half of U.S. adults trust law enforcement agencies to responsibly use facial recognition, according to Pew Research. The approval rating drops to 36% for technology companies and 18% for advertisers. California lawmakers sent a bill last week to Governor Gavin Newsom that would ban state and local police from using facial recognition software on their body cameras.

Tattletale Apps and Ancillary Data

Scary stories about phone apps, browser extensions, and smart devices abound in our society. We’re no longer surprised when we learn that a tech company is selling ovulation data from apps women use to track their periods or that Foursquare doesn’t care if you use their app to check in to a location since they have “passive” data collection.

Personal data from all of your transactions constantly flows into buckets at data brokerages around the world. WaPo columnist Geoffrey Fowler wrote a blockbuster expose this summer about browser extensions that seem innocuous but “leak information” directly to data brokers. In Fowler’s expose, one of the browser extensions was used to magnify images on a screen, but requested the ability “to read and change your browsing history.” The extension had 800,000 users and was packaging each user’s search history.

At a large family gathering this weekend, I was asked to troubleshoot someone’s PC because it seemed like Google was unresponsive. After only fifteen minutes of tinkering I found a Firefox extension that promised private browsing. Instead, it read search data and routed each request to another network. Luckily, the results came back from Yahoo! search rather than Google, which was my first clue that something terrible was happening.

Don’t forget that the absence of data is also data. Netflix raised eyebrows last month when The Verge found that Netflix was monitoring a phone’s physical activity sensor. Netflix later said it was a test to see if they could improve video quality while people were watching on the move. But the question remains why a video app gets to track your movements and activity. Fitness trackers, phones, and smart watches all have the ability to understand where you are and what you are doing or not doing.

Even medical data isn’t protected despite health privacy laws. ProPublica found 5 million health records on hundreds of computer servers worldwide. Anyone with a web browser or a few lines of computer code can view patient records, they found, including names in some cases. They didn’t do any hacking or nefarious activities because the records—either for consultation or stored for archives—were publicly accessible on the Internet.

Google, Amazon, and Microsoft are part of a new trade group called the CARIN Alliance that is creating a universal standard for patient medical records. You’re probably already thinking to yourself, “What could go wrong with those three setting up programs accessing my most personal data?” Good news. The federal government, many state governments, and major health insurance companies are also participating.

The point is that your transactions every day create a growing pool of data about you. Here in northern Virginia, our state is one of several using “remote sensing” that checks a vehicle’s emissions when it passes through a toll booth. The program is a great way to monitor air quality but also allows local jurisdictions to understand which vehicles don’t meet emissions standards and the locations that they travel through.

Foursquare would call that a passive check-in.

The Algorithms

DNA testing at home led to big databases stuffed with results—and helped police solve multiple cold case crimes, including a 52-year-old murder case in Seattle. GEDmatch, one of the larger aggregators of uploaded DNA data, is the database police most often use. That old Seattle case and the Golden State Killer case received headline attention, but law enforcement agencies are solving dormant cases every week using this unique collaboration between the public and law enforcement.

Users can opt in to allow police genealogy experts to work with crime scene DNA results and genealogy hobbyist results, and to create family trees for people who are still living.

Technology is also fueling the New York Police Department’s real-life example of a detective movie staple. The NYPD uses Patternizr, software it developed and then made public for free, to find similarities between crimes. Like the genealogy situation, Patternizr requires human analysts to sort through the program’s output and decide which results to send to detectives.

Police are also finding new ways to use older technology like cameras and scanners. In London, the BBC reported that police tested rail passengers for hidden explosives or knives using new scanners that provide imaging from up to thirty feet away. Cameras are more widely used in other countries to surveil cities, according to Comparitech. Its overview shows that London and Atlanta are the only non-Chinese cities on a list of the ten most surveilled cities, but plenty of Western cities made the top 20, including Chicago, Sydney, and Berlin.

Benign social media use exists throughout law enforcement. We’ve all read tweets and social media updates about events in our communities as well as efforts to humanize officers. For example, the Gloucester (NJ) Police post images of recovered bicycles on Pinterest. But for every wholesome use of technology, we also see complaints like a 2016 ACLU of California warning about some police departments tracking activists and their movements on social media.

What Happens Next

Ivanka Trump didn’t start the trend, but quickly tried linking gun violence prevention legislation the White House finds troubling to a new federal agency proposal called the Health Advanced Research Projects Agency, or HARPA. Proponents see the agency as a medical science equivalent of the military’s DARPA, which created the technology that evolved into the Internet.

The administration specifically wanted to know if this new agency could help identify people who were on the brink of becoming mass shooters. Washington Post reporting shows that the three-page proposal included tracking data from fitness trackers, smart watches, and mobile phones used by mentally ill consumers. That presupposes that gun violence is linked to mental health, something that is in no way proven.

The HARPA example of analyzing Fitbit data is one extreme but real example of government data mining and law enforcement using technology in preemptive ways. Another extreme recent example is Wednesday’s news that the Department of Justice will authorize Homeland Security to collect DNA from all migrants who are detained rather than only those who are arrested. We’ve covered DNA databases before, but this is DNA involuntarily seized when a non-American is detained. That DNA will also undoubtedly be used to identify American citizens, leading many to question the constitutionality of the federal government collecting the data.

In addition to physical tracking, government agencies are also increasingly interested in using semantic analysis to scrutinize the words people post to social media. This type of analysis has been around for years and is behind robust marketing concepts like search engine optimization and advertising, but government plans call for wholesale monitoring of all platforms.

Israeli startup Zencity expanded into the U.S. last year and already has deals in place with local governments in Chicago, San Francisco, and Houston to monitor social media and telephone calls to city services while classifying citizen sentiment. This is no longer about counting complaints, but using software to classify the severity of the feedback. Federal offices increasingly want this information too, and Attorney General William Barr co-signed a joint US-UK open letter Thursday that urges Facebook not to encrypt communications.

The French government also wants social media access, The Guardian reported last Tuesday, but for tax purposes. The French Public Action and Accounts Minister said last year in an interview that “the tax office will be able to see that if you have numerous pictures of yourself with a luxury car while you don’t have the means to own one, then maybe your cousin or your girlfriend has lent it to you, or maybe not.”

China remains the foreign government most invested in social media. The country’s Social Credit System remains a hodgepodge of basic counting (think: number of complaints), business information, and traditional credit reporting (which some may argue is already creepy enough).

China’s vague plans were written about in breathless terms by Western media, especially in America, and have served as the backdrop or inspiration for more than one television show. Since then, privacy advocates in the West have agreed that social credit scores could be very bad indeed, but no one yet understands how to codify those concerns.

A fantastic infographic by Visual Capitalist explains how social credit grew out of financial markets and has been used to stop people with unpaid taxes from leaving China and to potentially take dogs away from owners who don’t clean up after them. Both of those penalties sound fine. But there are warning signs too, including citizens being blocked from purchasing air or rail tickets or from being eligible for a job.

The Bottom Line: Nothing summarizes the dynamic nature of governments using consumer technology to govern better than what happened as we wrote this series. We developed the idea to write about government data mining at the end of this summer and began the series in September. Since then we have had opportunities to include multiple new stories each week.

What was written about China’s systems in 2015 and 2016 is inaccurate now. Either a new administration or a Trump reelection in 2020 will create additional programs.

And there are ever-increasing numbers of private programs, such as the DRN vehicle location database created entirely by companies that repossess vehicles. They’re tracking locations of all vehicles, not only the ones they’re interested in pursuing. They’re likely tracking your car too, which begs an answer to the oft-asked question: whose data is it anyway?