
Sept. 28 - Oct. 3, 2013

DataWeek 2013 Conference and Expo

Archive for the Category DataWeek 2013


Neo Technology Discusses An Introduction to Graph Databases

We at Neo Technology believe there’s a whole world of information out there where size is not king, and connectedness assumes the throne. Everything in the real – and digital – world is connected and the amount of value in these relationships is tremendous.

Historical events, for example, are interconnected with political arenas and individual participants. Gene expression is derived from both DNA and environmental factors. Networks, computers, applications and users form intricate interaction networks. The truth is that every aspect of our lives is dominated by connected information and things. Today, big internet companies are trying to harness this power with efforts like the Google Knowledge Graph or Facebook Graph Search.

And whenever we want to store this real-world data in a database, we somehow have to take care of this fact. Usually the connections are ignored, denormalized or aggregated to fit the data model and make operations fast enough. What you lose by doing this is the richness of the information that you could have retained with a different data model and database. That’s where the property graph model and graph databases come in. If graph-shaped data shows up in a relational database, on the other hand, you’ll easily recognize it by the sheer number of intermediate join tables and join statements in your queries (and by dropping performance levels).
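
To make the join-table symptom concrete, here is a minimal sketch using Python's built-in sqlite3 module. The person and knows tables and the sample names are hypothetical, invented only for illustration. The point is that each extra hop through the network adds another join.

```python
import sqlite3

# In-memory database with a hypothetical schema (not from the article).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    -- The intermediate join table that stands in for a 'knows' relationship.
    CREATE TABLE knows (from_id INTEGER, to_id INTEGER);
    INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO knows VALUES (1, 2), (2, 3);
""")

# Friends of friends: every additional hop means another pass through 'knows'.
rows = conn.execute("""
    SELECT fof.name
    FROM person p
    JOIN knows k1 ON k1.from_id = p.id
    JOIN knows k2 ON k2.from_id = k1.to_id
    JOIN person fof ON fof.id = k2.to_id
    WHERE p.name = 'Alice'
""").fetchall()
friends_of_friends = [name for (name,) in rows]
```

A two-hop question already needs three joins here. Deeper traversals keep adding joins in the same way.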

Graph theory is much older than anyone would think. Treating graphs explicitly with database semantics like ACID transactions is new, however. Graph databases are part of the recent NoSQL movement, a term that mostly means non-relational databases. Most of them are open source, developer-friendly and come with a dedicated data model that suits a certain use case.

Graph databases like Neo4j are well suited to storing, retrieving and quickly querying interesting networks of information. This kind of connected data is also known as graphs, not to be confused with artwork, charts or diagrams. Graphs consist of nodes and directed, typed relationships, both of which can hold arbitrary numbers and types of attributes (key-value properties). That is all there is to the graph model.
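
The model described above is small enough to sketch in a few lines of Python. This is a toy in-memory illustration under our own assumptions, not Neo4j's API. The class names (Node, Relationship, PropertyGraph), the KNOWS type, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    """A graph node holding arbitrary key-value properties."""
    node_id: int
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed relationship that can carry its own properties."""
    start: Node
    end: Node
    rel_type: str
    properties: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Toy in-memory property graph (illustrative only, not Neo4j's API)."""

    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}
        self.relationships: List[Relationship] = []

    def add_node(self, **properties: Any) -> Node:
        node = Node(node_id=len(self.nodes), properties=properties)
        self.nodes[node.node_id] = node
        return node

    def relate(self, start: Node, rel_type: str, end: Node,
               **properties: Any) -> Relationship:
        rel = Relationship(start, end, rel_type, properties)
        self.relationships.append(rel)
        return rel

    def neighbors(self, node: Node, rel_type: str) -> List[Node]:
        """Follow outgoing relationships of one type from a node."""
        return [r.end for r in self.relationships
                if r.start is node and r.rel_type == rel_type]


graph = PropertyGraph()
alice = graph.add_node(name="Alice")
bob = graph.add_node(name="Bob")
graph.relate(alice, "KNOWS", bob, since=2010)
friend_names = [n.properties["name"] for n in graph.neighbors(alice, "KNOWS")]
```

Because relationships are directed, following KNOWS from Alice reaches Bob, while following it from Bob reaches nobody. A real graph database adds indexing, persistence and transactions on top of this shape.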

Graph databases are used in all sorts of interesting applications, including the following:

  • Facebook Graph Search by Max De Marzi imports data from Facebook and converts natural language queries into Cypher statements

  • Rik Van Bruggen’s beer graph shows that even a non-technical person can create a graph model and data and then run interesting queries on it

  • Open Tree Of Life is working on creating a graph of all the organisms in the world

  • Shutl finds the best courier and route for instant (minutes) delivery of goods purchased through e-commerce channels

  • Telenor handles complex ACL resolution algorithms on top of a graph model

Ready to learn more about graphs and graph databases and how they can make your life and development easier? Check out the book Graph Databases, attend a local GraphConnect conference or join a Neo4j training. You can learn more at www.neotechnology.com.

Author Bio


Michael Hunger has been passionate about software development for a long time. He is particularly interested in the people who develop software, software craftsmanship, programming languages, and improving code. For the last few years he has been working with Neo Technology on the Neo4j graph database. As the project lead of Spring Data Neo4j, he helped develop the idea into a convenient and complete solution for object graph mapping. He is also taking care of Neo4j cloud hosting efforts. Good relationships are everywhere in Michael’s life. His life concerns his family and children, running his coffee shop and co-working space, having fun in the depths of a text-based multi-user dungeon, tinkering with and without Lego and much more. Follow Michael @mesirii and Neo Technology @neo4j.

Data Week: Three Startups That Tell Amazing Stories With Their User Data


Most startups struggle to get their story heard by the press. Fortunately, data can be a secret weapon.

Data tells a story all its own, by giving a consistent voice to the behavior of large, diverse groups of users. In an increasingly noisy media landscape, companies that can tell great stories with their data stand the greatest chance of establishing valuable relationships with journalists, grabbing the attention of customers, and engaging strategic partners.

The following are three examples of companies that turned their platform data into entertaining, enlightening and compelling stories. In the process they became regular fixtures in the press, grew their brands and expanded their customer bases.


OK Cupid

The OK Cupid company blog OK Trends is the mother of all data storytelling sites. Although it hasn’t been updated since April 2011, it will be a case study for years to come. The popular free dating site had 3.5 million users when it was purchased by Match.com for $50 million, and much of the value was the detailed personal information users shared about themselves in hopes of finding love.

OK Cupid co-founder Christian Rudder was a master of dissecting, interrogating and interpreting the site’s data to reach spellbinding conclusions. What’s the best type of photograph to get dates? What shutter speed and exposure will get you noticed? And can someone’s preference for beer determine whether you’re going to get lucky? These are all questions OK Cupid was able to answer from hard data.

Journalists, bloggers and online influencers couldn’t get enough of OK Trends, which is why you’ll still hear so many people express dismay that the blog is no longer operational.


Flurry

Flurry is a mobile app analytics platform that provides its customers deep insights into app store trends, mobile app usage and detailed predictions on the future of mobile computing. Flurry claims to reach 1 billion mobile users per month, with the ability to gather information from 3.5 billion app sessions daily. That’s a lot of data.

Flurry reports also set the tone for a lot of coverage. Recently TechCrunch declared that the age of the paid mobile app is dead, based on Flurry data showing that the average price of an iPhone app is just 19 cents. Furthermore, Flurry is able to break down app trends by device and by region, giving abundant ammunition to journalists who cover key technology topics, such as China, South America, mobile gaming, education and entertainment.

By regularly supplying journalists with such juicy data on the state of mobile computing, Flurry has become virtually synonymous with mobile analytics, and keeps its name in heavy rotation among top news outlets.


Bit.ly

If you use social media for business, you’re probably familiar with link shortener Bit.ly, which makes long links fit snugly into 140-character tweets. While the ability to say more in a tweet is nice, the real power of Bit.ly is its ability to gain unique insight into what is happening on the social web, and by extension, what matters to people out in the world.

At the 2011 Web 2.0 Summit, Bit.ly data scientist Hilary Mason shared the story of how her company watched the events of the Arab Spring unfold in real time by tracking what content was being shared, and by whom. From small flashes of sharing activity to a flood, Bit.ly’s real-time data was able to unmask the key online influencers, and tell the story of a social movement in a way that even a journalist could not.

And just this week Bit.ly unveiled its Real-time Media Map, which shows the state and location where content is being viewed and shared for top American publications. Though it might lack the gravity of the Arab Spring, it’s a no less impressive technical feat.

The real lesson is that by giving journalists something that’s easy to cover, you can fast-track your company’s story in the press. Journalists love nothing more than to talk about themselves. I should know; I was one.


There are many great ways to turn your company data into great news stories. I’ve covered three quick examples. Data gives large groups of your customers and users a voice in aggregate, and provides journalists and writers a new way to see human behavior. And this is what makes data-driven storytelling so powerful.

The data you collect is unique, specialized and timely, which means that it provides a window into the lives of hundreds of thousands or millions of people as events are unfolding. By sharing this data with journalists you’re able to tell a story that no one else knows, and this is exciting and enlightening.

Whether you choose to create charts or infographics, or simply share the results of a user survey, there are many great ways to tell stories with data that will make you stand out in a crowded field and get the attention your startup deserves.


About Author

Chikodi Chima is a former VentureBeat staff reporter whose consultancy Moonshot helps startups with their public relations and marketing. His writing has appeared in Fast Company, Mashable and GigaOm. Read his blog: PR Tips For Startups.


Why Mobile Security is All About the Data

Mobile malware may dominate headlines, but according to the recent LinkedIn Information Security Community survey of 1,600 IT administrators, data loss is a bigger priority in their organizations than malware (75% versus 47%). With 28% of corporate data accessed through mobile devices, it’s no wonder they’re concerned.

Today, 62% of workers use their personal smartphones for work. While the majority of these users are not thinking about the security of corporate data, corporate security teams need to be on alert and proactively addressing the risk. As a former CISO, I have faced this problem firsthand. I remember the moment when we began to trade user experience for the sake of security. My job became all about saying “no,” which didn’t work then, and increasingly won’t work now, in a world where users can bring their own apps and devices to be more productive. A mobilized workforce means increased flexibility and productivity, but it also means a dramatic shift in the way that organizations handle security.

While most organizations make investments in mobile management, the majority of solutions available today focus on IT asset management and configuration of devices, not on securing data, enterprise access, and the end user. The old model of data protection in a walled garden just doesn’t apply to the ever-changing enterprise where data flows in and out of SaaS services through employee-owned devices. The wave of mobile security threats we see rolling in means that we must begin with a new approach to address both threats and user needs so they won’t need to go around IT controls to do their job.

(Users have shown themselves to be highly effective in circumventing mobile security controls, with a quarter of them having done so to get their jobs done. When given a choice, they will simply not participate in BYOD programs.)

To avoid a user rebellion, we must embrace mobile security that doesn’t sacrifice user experience or enterprise security needs, thus allowing users to be productive with the apps and devices they need. To that end, we must begin the long and important process of building a data security model that fulfills, and does not conflict with, the spirit of BYOD.

About the Author:

Adam Ely is the Founder and COO of Bluebox. Prior to this role, Adam was the CISO of the Heroku business unit at Salesforce where he was responsible for application security, security operations, compliance, and external security relations. Adam was named one of the top 25 security influencers to follow in 2012 for his industry contributions and is the author of the forthcoming McGraw-Hill book, Information Security Business & Strategy Essentials.  Follow Adam on Twitter @adamely.


Top 5 Data Industries Attendees Want to Know More About at DataWeek 2013

[Interactive visualization: DataWeek Attendee Interest by Industry]


This week over 4,000 data professionals are gathering in San Francisco for DataWeek 2013.

Sign-ups for DataWeek have been rolling in over the past few months, and we wanted to find out which industries in the data universe were of interest to DataWeek attendees.


We partnered up with Algorithms.io to create a chord diagram of the DataWeek attendee professional graph.


We sampled 1,000 DataWeek attendee profiles to create the diagram. It shows the strength of the relationship between different data industry segments.


Here are the Top 5 Data Trends at DataWeek, counting down from fifth place to first:

  5. Business Intelligence takes fifth place on our list by number of interested attendees.

  4. The Business Software industry is deep in data and holds a strong fourth place.

  3. Geo/Mobile data barely missed second place and ranks as the third-hottest data industry at DataWeek.

  2. Data science for Business Analytics companies is the second-largest data industry of interest at DataWeek.

  1. Taking the #1 spot, Big Data is what data experts indicated as the must-know topic at DataWeek 2013.


The diagram shows the relationship between different market segments that attendees work in. For example, if someone works in both Big Data and Advertising, then you would see a chord connecting those two. The thicker the chord, the more shared relationships between those two segments.
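
As a rough sketch of how chord thickness could be derived, the snippet below counts how many profiles list both segments of each pair. The sample profiles and the chord_weights helper are hypothetical, and the actual pipeline behind the diagram may differ.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample: each attendee's self-reported industry segments.
attendee_segments = [
    {"Big Data", "Advertising"},
    {"Big Data", "Business Intelligence"},
    {"Big Data", "Advertising", "Geo/Mobile"},
    {"Business Software"},
]


def chord_weights(profiles):
    """Count, for every pair of segments, how many profiles list both."""
    weights = Counter()
    for segments in profiles:
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(segments), 2):
            weights[pair] += 1
    return weights


weights = chord_weights(attendee_segments)
```

In this sample the Advertising and Big Data pair is shared by two profiles, so it would get the thickest chord. A profile with a single segment contributes no chord at all.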


An interesting result is which industries “Big Data” is connected to. Many people self-identify as being a Big Data company, but based on the chords coming from the Big Data segment we get a better understanding of which industries Big Data companies are actually working in.


This also gives an indication of where the current business revenue opportunities are in Big Data (more companies = more revenue).


About the Author:


Andy Bartley is co-founder and CEO of Algorithms.io. Algorithms.io was developed in 2012 and now provides algorithms as a service for application developers and data scientists to build intelligent applications. Follow Andy on Twitter @algorithms_io.


Transparency as a Service: How We Got Here

During our time building privacy-friendly data products at Enliken, we’ve learned a lot about consumers’ opinions and perceptions of the information that describes them. Along the way we also talked to dozens of companies about how they gather and use data.

Our key insights:

  • Consumers are happy to share some information with businesses they trust, and fiercely protect the rest.

  • When consumers say “privacy” they don’t mean isolation, they mean control.

  • There are 4 factors that influence a consumer’s perception of privacy – transparency, content, use and retention.

  • The data that advertisers want most (intent and affinity data) is information that consumers are happy to share.

Our new product leverages those insights to create a solution to the privacy paradox facing digital marketers today: How can I provide relevance while respecting privacy? Last week Julie Bernard described the paradox at D2:

“There’s a funny consumer thing,” she said. “They’re worried about our use of data, but they’re pissed if I don’t deliver relevance. … How am I supposed to deliver relevance and magically deliver what they want if I don’t look at the data?”

As it turns out, this might be a false choice. Consumers don’t want to deny brands the ability to capture and use data as much as they want the ability to control the terms under which it happens. This is especially true for brands like Macy’s, which wield such strong brand equity that they could easily ask for more data. Research by PWC, DMA (UK), McCann and others has confirmed this.

So what are these magic terms that make consumers comfortable with advertisers using their data for marketing? How can we tap the full potential of data-driven marketing?


It’s our opinion that individuals simply want to know what data is being gathered, what it’s being used for, and for certain types of data, how long it’s going to be kept. In the case of digital advertising, the use is obvious and retention isn’t a factor because the content is innocuous.

Let’s talk more about the content. In most behavioral advertising, content is an age range, a salary range, the types of things you are shopping for, maybe whether you own pets, and so on. You’d expect people to rate this stuff as pretty harmless, and they do. Earlier this year we showed 600 people their profiles from 5 major online data brokers and asked them to rate sensitivity. Overall they rated only 9% of data points as sensitive. After drilling down to look at data about travel, shopping intent, and interests, we found their sensitivity tracked very close to 0%.

After people saw their profiles, in most cases they shrugged and said, “This is what all the fuss is about??” We saw the same behavior after the Acxiom dashboard was launched last week, and their data is arguably more sensitive because it includes much more granular details and is tied to a name and social security number.

So consumers aren’t alarmed or offended when they see data. But how do regulators feel about transparency?

“A recurring theme I have emphasized — and one that runs through the agency’s privacy work — is the need to move commercial data practices into the sunlight. For too long, the way personal information is collected and used has been at best an enigma ‘enshrouded in considerable smog.’ We need to clear the air.”

The Privacy Challenges of Big Data: A View from the Lifeguard’s Chair by FTC Chairwoman Edith Ramirez

“Consumers have a right to access and correct personal data in usable formats, in a manner that is appropriate to the sensitivity of the data and the risk of adverse consequences to consumers if the data are inaccurate.”

President Obama’s Privacy Bill of Rights

It’s clear that regulators are in favor of transparency, and consumers are satisfied and generally uninterested once they see their profiles, but the advantages of transparency go beyond appeasement. As consumers become familiar with their online profiles, they will find data usage even less sensitive, allowing marketers to leverage data safely in more ways with even less risk of pissing off the consumer.

The best part about transparency is that it’s simple, easy and doesn’t require a business to change how they gather or use data. This is why we’re re-launching Enliken today as Transparency as a Service. We’ve made it easy for any digital marketer to securely and safely disclose consumer profiles.

About the Author:

Marc is co-founder and CEO of Enliken, a consumer-friendly data company. He’s an advocate for transparency around the gathering, use and retention of data. Previously Marc was founding CEO of Spongecell. He holds a degree in Social Decision Sciences from Carnegie Mellon University. Follow Marc on Twitter @guldi.

Zipfian Bets on Immersive Data Science Education

Zipfian Academy, a school for data science, threw open its doors this week for its inaugural class of students, each starting a career in what has been called the “sexiest profession of the 21st century” by the Harvard Business Review.  Monday kicked off the 12-week intensive training program, where students are learning the multi-faceted craft of distilling intelligence from data.

“We’re seeing incredible demand for data science education,” says Zipfian Academy co-founder and CEO Ryan Orban. “This is one of the most sought-after careers of our time, and universities are not moving fast enough to meet the demand. Companies are ready to fight for great candidates, but there are not many academic settings that merge statistics, business analysis, and computer programming with the critical thinking skills that make a great data scientist.”

Over 220 applicants vied for just 13 spots in Zipfian Academy’s inaugural class. The hand-picked group will spend the next three months immersed in data science fundamentals in a hands-on learning environment, working full-time from a sunny, open office in San Francisco. Zipfian Academy has attracted students with quantitative backgrounds who want to jump feet-first into the emerging profession. Students in the first cohort come from academic research, consulting, and software development backgrounds, and come ready to join top tech companies or startups with their newly developed skills.

Students will learn the full stack of abilities required to be successful as a data scientist, from asking the right questions to comprehensive data analysis and effective communication. This includes both the critical thinking abilities and the problem-solving mindset that are key to success, as well as deep dives into computer programming and statistical tools for uncovering insights from data.

The students are betting that the skills and network they develop will be more valuable than traditional routes of advanced education.

“Education is changing rapidly,” says co-founder and CTO Jonathan Dinu. “Online courses have democratized the knowledge once locked up in the halls of universities. We’re offering a hands-on experience created by data science experts that compiles the most important aspects of data science into one carefully assembled package.”

Students at Zipfian Academy have access to exclusive recruiting events with world-class companies in technology and consumer retail, and a hiring day for smaller companies and startups looking to hire their first data scientist. Negotiations are underway with a number of recruiting partners. The company will also develop its partnerships with a select handful of Silicon Valley tech startups in the months ahead.

“We’re seeing first-hand that companies cannot find the data science talent they need,” said Orban. “The companies we work with want to hire great people who can deliver valuable insights, not just a candidate that looks qualified on paper. We’re part of something here that’s really exciting: teaching our students the tools they need to do amazing things with data science in a world where skills matter more and more.”

Zipfian Academy will be presenting the story behind the school on the main stage at DataWeek. They’ll also be exhibiting at the DataWeek Expo, and teaching a 1-Day Bootcamp on Data Science and Machine Learning on Monday, 9/30.

Learn more at zipfianacademy.com.

About the Company:

Zipfian Academy is a school teaching data science through an immersive 12-week course. The program helps students develop the mindset and technical skills they need to launch a data science career. Students receive access to a world-class network of data science experts, personalized instruction featuring hands-on projects, a collaborative learning space in San Francisco, and exclusive recruiting events with top-tier technology companies. Follow Zipfian Academy on Twitter @zipfianacademy.