
Data Mining: The New Gold Rush

Photo via Arbeck of WikiMedia Commons


Data and the insight it provides is power. Simply look at the rash of privacy breaches that struck the NSA, Target, iCloud, Samsung and the United States Postal Service to see what most organizations consider private. Data is growing exponentially, and now more than ever online users need to understand what happens to their data in order to avoid, as Dropbox CEO Drew Houston infamously put it, a “trade off between privacy and convenience.”

The Digital Universe is doubling in size every two years. By 2020, the amount of data will have increased from 4.4 trillion gigabytes in 2013 to 44 trillion gigabytes, according to a 2014 study done by the International Data Corporation. In more human terms, today the average household creates enough data to fill 65 32GB iPhones per year. In 2020 this will increase to 318 iPhones, according to EMC – a corporation that offers data storage and analysis.
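As a rough check on those figures, a tenfold increase over seven years does work out to a doubling roughly every two years. The short Python sketch below assumes steady exponential growth between 2013 and 2020, which is a simplification made for illustration, and the function name is hypothetical rather than anything taken from the study.

```python
import math

def doubling_time_years(start, end, years):
    """Years needed to double, assuming constant exponential growth (an assumption)."""
    return years * math.log(2) / math.log(end / start)

# Figures quoted from the IDC study: 4.4 trillion GB (2013) to 44 trillion GB (2020).
idc_doubling = doubling_time_years(4.4, 44, 2020 - 2013)
print(f"Implied doubling time: {idc_doubling:.1f} years")
```

Under that assumption the implied doubling time is about 2.1 years, consistent with the "every two years" claim.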

“The amount of data created in the past two years is more than the amount of data we’ve ever had… So there is a huge amount of data and a need for a way to sort through them,” says Hui Yang, an assistant professor in the Computer Science department at SF State.

The bulk of this data is metadata, or information generated when you use technology. It is everyday data collected from consumers’ activities and can contain information such as locations, IP addresses, web searches and other browser histories. By law, most metadata can be stored indefinitely and, through data mining – a field in computer science that analyzes the patterns and connections among data – can be used to classify anything from relationships between genes and diseases, to which internet users are more likely to buy a company’s product. Using this information for commercial purposes is where data mining gets a bad rep.

Although “data mining” is a currently popular term, for decades it has played an intangible role in the growth and comprehension of the digital universe. It helps find patterns among vast amounts of data that human eyes cannot discover. And while data mining is used to analyze everything from medical data to business data to human rights, it is also one of the tools used by data brokers – companies that collect, maintain, and sell data on millions of consumers, generally without the consumer’s permission or knowledge.

The negative stigma that now surrounds any and all kinds of large data collection is a recent development, one that is more visible than the data collection itself, and can largely be attributed to the business built around selling people’s metadata.

According to last year’s report from the International Data Corporation, a market research and analysis firm, “In 2013, two-thirds of the digital universe bits were created or captured by consumers and workers, yet enterprises had liability or responsibility for 85% of the digital universe.”

Data brokers are among these enterprises.

Much of the personal information analyzed through data mining and collected by data brokers is demographic and transaction information about the user, the device, and the activities occurring in between. But credit card information, census data, and other public records are also included.

“This information makes clear that consumers going about their daily activities – from making purchases online and at brick-and-mortar stores, to using social media, to answering surveys to obtain coupons or prizes, to filing for a professional license – should expect that they are generating data that may well end up in the hands of data brokers… without their permission to construct detailed profiles on them reflecting judgments about their characteristics and predicted behaviors,” reads a 2014 Senate committee report.

Generally, analyzed metadata only aims to deduce codes and statistics like IP addresses, but when tracked across multiple platforms, the paper trail can become pretty direct.

Even then, the Senate report goes on to say, “Some privacy and information experts have expressed concerns that re-identification techniques may be used with such data, and questioned whether data that identifies specific computers and devices can truly be considered anonymous.”

Anonymous to whom? When the Senate asked data brokers who buys their gathered information, companies across all platforms were named.

“12 of the top 15 credit card issuers; seven of the top 10 retail banks; eight of the top 10 telecom/media companies… three of the top 10 pharmaceutical manufacturers; five of the top 10 life/health insurance providers; nine of the top 10 property and casualty insurers,” reads the 2014 Senate report.

Some of the best-known offenders: Yahoo, Twitter, YouTube, Google or DoubleClick, and AOL. But what’s surprising is the types of companies that buy and sell consumers’ metadata.

Just recently, the Associated Press reported that the Affordable Care Act website, where Americans can sign up to receive health care, was sending users’ information to a number of third-party companies.

So it seems that no matter how personal, some information is not private information, at least not to these companies. The amount and type of information gathered and analyzed is ultimately unknown to most users, which makes opting out of having your data collected almost impossible.

But fear not, online user security is becoming more of an immediate concern. In February, President Obama announced new rules requiring intelligence agencies, like the NSA, to delete private information they may accidentally collect about Americans. The President also spoke at The White House Summit on Cybersecurity and Consumer Protection at Stanford University on February 13, discussing legislation intended to strengthen cybersecurity, an issue that he likened to “the wild wild west,” according to the New York Times.

Since 2009 and continuing into 2014, the United States Federal Trade Commission has recommended that Congress develop legislation that allows consumers to view the information data brokers hold about them. One of the few online consumer rights laws is California’s “Shine the Light” law, which requires companies doing business with Californians to allow customers to opt out of information sharing, or disclose how personal information will be shared.

Hence, obscure and needlessly long privacy terms and agreements are more relevant than ever.

“Data mining is relatively new and it’s affecting everyone but it does not have many laws. It’s like a free market,” Yang says. “People definitely feel like they are being watched, but if you look at privacy and then what people post, (privacy) needs a lot of work.”

To some extent, the fear about data mining can be attributed to a general lack of knowledge and regulation, fueled by headlines about the NSA. On the other hand, users are actively creating and allowing the collection and analysis of their information.

Last quarter, Facebook reported an average of 890 million daily active users. A 2012 survey done by Pew Research Center shows that, “More than half of social networking site users (58 percent) say their main profile is set to private.” That still leaves the data of 42 percent of social media users unprotected.

Data is constantly being created, but in the current age it has also come to mean more, not only to users but also to the companies that consume the data. Data has become a panopticon, a platform on which we create our own images and through which others see our constant updates.

Ultimately, it is up to the user to manage what information they put online. Data mining and other computer sciences can be used by consumers as both an advantage and a disadvantage.

When data is pooled about locations and transactions, business with the companies who analyze this data can be much more personalized. Take Google Now as an example. If you input information such as the location of your home or work, your favorite sports teams, your most frequently made food or grocery orders, or even your airplane tickets, Google Now will provide “relative suggestions” on routes to work, restaurants and events in your area, provide updates on your favorite team, and remind you of when your flight is and when you should leave to arrive on time.

In this setting, what can be considered private information can be sacrificed for convenient personalization.

On the other hand, organizations like Stop Data Mining provide “opt out lists” with links to the opt out pages of companies that collect data. Or for a simpler solution, almost all major browsers contain a “Do Not Track” preference. Other options include removing or manually managing the “cookies” that collect metadata, switching to alternative search engines like DuckDuckGo, which doesn’t collect or share personal information, and of course adjusting privacy settings on social media.

As the amount of data continues to grow exponentially, there will be a need for more ways to organize and sort it. What’s data mining’s future?

“More of it. More people from more backgrounds becoming data scientists. More tools for data scientists. More schools teaching data science. More products built on data understanding. Oh, and robots,” says Todd Holloway, a data analyst at Trulia and an organizer of the San Francisco Data Mining Group, which teaches how to effectively use data mining to, say, win at fantasy football.

Whichever way you bend, know the power of the data you put out and the transparency that it carries.


A look into beer making with Method Brewing

Paul Tiplady, one of the four guys in Method Brewing, keeps an eye on the water pump and plate chiller as they transfer and cool the beer into buckets right before adding the yeast, while making a few batches of beer in San Francisco Sunday, Feb. 22. Photos by Daniel Porter

By day, they are scientists—immersed in labs, handling cells, and manipulating sterile cultures… But when the weekend rolls around, they are debugging data on the science of brewing.

Ryan Dalton and Kenton Hokanson, graduate students in the University of California, San Francisco’s neuroscience program, began homebrewing together shortly after becoming roommates.

“Essentially everyone in the life sciences seems to brew their own beer. This is like the only skill you pick up as a biologist, other than doing biology,” Dalton says.

Dalton and Hokanson began attending other people’s brew sessions and picked up on the process of beer making. Soon after, they met Paul Tiplady, a software engineer, and Robert Schiemann, a software developer, through mutual friends, and bonded immediately.

The Method Brewing team was formed from pure experimentation and has been homebrewing obsessively for the past three years. Their expansive drink list includes hundreds of unorthodox flavors not typically seen in the realm of craft brewing, including jalapeño, coconut, mole, and yogurt.

“It used to be an afternoon social occasion,” remembers Tiplady.

Kenton Hokanson and Robert Schiemann, two of the four guys in Method Brewing, pour in the yeast for the jalapeño IPA they just finished brewing in San Francisco Sunday, Feb. 22.

They would get together, brew bubbly concoctions, and make a mess of the house. Shortly thereafter, they began throwing around unconventional ideas for flavor combinations, eventually brewing them all.

“We brew every recipe that we can think of to see what works and we’re not really afraid of making a bad beer–just dump it if we don’t like it,” shares Hokanson.

On Feb. 11, their innovative beers made their first appearance at San Francisco’s Beer Week. Their event, BrewFlood VII, drew favorable responses from the beer community and local entrepreneurs.

Their beer recipes are created with just about anything you’ve ever had in your fridge. The idea of creating their signature Jalapeño Imperial India Pale Ale (JIIPA) came about simply; they all liked jalapeño and they all liked beer.

The five-hour process to create a 10-gallon batch begins with a culture of yeast bubbling in a flask on top of a hot plate.

Kenton Hokanson, one of the four guys in Method Brewing, grinds up barley for the base of the beer in San Francisco Sunday, Feb. 22.

Outdoors, in the shady backyard patio, Hokanson begins to mill the grain, crushing it, just enough to expose riveted sugar pellets. Behind him, fire raises the temperature of 15 gallons of water to 185 degrees. The scalding hot liquid is poured into the accumulation of grains, and as it settles, begins to bubble ferociously. The batch begins to look like a witch’s brew as Schiemann stirs it with a large wooden paddle.

The simple name of the JIIPA, sitting in one of the 23 kegs made for SF Beer Week, is enough to send beer expert Jared Funkhouser running for the hills.

“I’m a sucker for spice,” he exclaims as he picks up the imperial IPA loaded with fresh jalapeños and house made jalapeño tincture.

First, he takes a whiff. “The first thing I smell is the initial bite of the jalapeño,” he says, nervously.

Then, he takes a sip. “It’s amazing,” Funkhouser blurts out with a shocked expression. There is just enough spice, without being daunting. It’s nice and smooth with a subtle floral note to balance out the flavors, he says.

Paul Tiplady, one of the four guys in Method Brewing, slices jalapeños and takes out the seeds for the jalapeño IPA they are brewing in San Francisco Sunday, Feb. 22.

Two deseeded jalapeños go into each gallon of beer; today’s batch has 20 total. Tiplady remembers when they first began using jalapeño in their beer and he used his fingers, as opposed to a spoon, to remove the seeds.

“It was a searing pain that lasted for days on the inside of my fingernails,” he recalls.

The beer’s spice is regulated by creating a tincture made by soaking jalapeño seeds in Everclear, a grain alcohol. This method results in a concentrated neon green liquid that is tested on the Scoville scale, a systematic measure of spiciness.

“We don’t try to make a pepper beer that everyone likes, we try to make a pepper beer that some people won’t like but the people who like peppers will love and kill for,” says Hokanson.

On a recent Monday, they sit on the roof of 326 1st Street, a rundown building in between the SOMA district’s skyscrapers, and the future site of their brewery, Methodology.

The brewery, set to open after a full-blown demolition and remodel, will have a ground-level bar modeled after an industrial laboratory, with their beers on tap, and a rooftop beer garden that will serve as a relaxed, warm alternative.

Historically, San Francisco has been underserved in terms of bars per capita compared to beer meccas like Portland, according to Tiplady. In 2014, nineteen new breweries opened up in San Francisco, and nine are currently in planning, according to the San Francisco Brewers Guild. Tiplady predicts that in the next five years, local brewpubs will grow exponentially, as well as provide the freshest of beers.

“There are a lot of breweries that are doing classic styles, and a lot of people like that kind of beer. But we are trying to push out and do very experimental, very weird stuff and that’s kind of a niche but there’s no one doing that in San Francisco,” says Tiplady.

Kenton Hokanson and Robert Schiemann, two of the four guys in Method Brewing, pour out some water that is up to temperature to mix with the ground barley to make the mash for the beer in San Francisco Sunday, Feb. 22.