Some of this data has been collected in the past, but for various reasons (the biggest being the majors' control of the market and its channels) almost all data points beyond Soundscan and Billboard charts seem to have been ignored. Now, however, there is a perfect storm for music data: more artists than ever before, more listeners than ever before with easier access through more channels than ever before, and more technology for data collection than ever before.
Put simply: music data is both more important and more available than ever before.
This shouldn’t come as a surprise, and John is not alone in asking for more data. Companies are springing up left and right to help answer those prayers. Companies like Next Big Sound, BandMetrics, and Music Metric are busy collecting third-party data from around the web into single dashboards. Companies like Bandize and Artist Data are helping bands organize their own data. Companies like HypeMachine and Elbo.ws are aggregating marketing/A&R data in the form of blog and Twitter charts. Companies like Songkick and Gigulate help fans track basic concert data and music news, respectively, creating interesting data sets that are sure to surface soon.
The issue isn’t in the data collection or availability; it’s in the connecting, processing, and understanding. Like all data, the points themselves are meaningless without context – a MySpace play doesn’t inherently mean anything, nor does a friend on Facebook. Data can only be understood with relevant ratios.
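To make the point about ratios concrete, here is a minimal sketch. The counts are made up purely for illustration, and the metric names (`plays_per_listener`, `sales_per_thousand_plays`) are my own assumptions rather than any industry standard.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator


# Hypothetical weekly counts for one artist (invented numbers, not real data).
weekly = {
    "myspace_plays": 12000,
    "unique_listeners": 3000,
    "itunes_sales": 150,
}

# A raw play count says little; plays per listener hints at repeat listening.
plays_per_listener = ratio(weekly["myspace_plays"], weekly["unique_listeners"])

# Sales per 1,000 plays hints at how well attention converts into purchases.
sales_per_thousand_plays = ratio(
    weekly["itunes_sales"] * 1000, weekly["myspace_plays"]
)

print(plays_per_listener, sales_per_thousand_plays)
```

Neither ratio is the "right" one; the point is only that each number becomes readable once it is set against another.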
Connecting the data might be the biggest obstacle at the moment, but I’m hopeful that will change. It’s primarily an issue of a few major players (namely, the retail channels) neither offering open APIs nor sharing any of the data with the artists – when a fan buys an album on iTunes, that fan data belongs to iTunes and iTunes doesn’t want to part with it.
The other obstacle in connecting the data is how widely music discovery, engagement, and purchasing have spread. You used to get your discovery metrics solely from radio numbers, your engagement metrics from concert tickets, and your purchasing metrics from Soundscan. Now discovery can span blogs, P2P, internet radio, and beyond; engagement can be a play on a blog, a MySpace page, an iPod, or anywhere else; and while record purchases are still largely covered by Soundscan, records are an increasingly small portion of the overall business picture for an artist.
Pulling all that data together is not easy, but I think it can be done as long as those collecting the data are ensuring it’s clean and are willing to make it accessible. Purchases should be easiest, as long as retail channels come around. Last.fm and Twones are handling engagement fairly well (though neither has all the necessary channels covered, by any stretch). Discovery is incredibly tough without having the engagement piece nailed – if you don’t have a view of all a fan’s engagement data, how do you know the first time/place they interacted with an artist?
Progress is coming, but only as a stream of one-dimensional data points. More radical change is needed in order to get to the ratios that are truly relevant and key to making informed business decisions. We need to be able to close the loop on fan data – how fans move down the funnel from discovery to engagement to purchasing. Knowing that you had a spike in MySpace plays around the same time you had a spike in iTunes sales shouldn’t be surprising, and is thus practically worthless. (It isn’t totally worthless: a spike in plays without a matching spike in sales would indicate a strong PR effort but a product not good enough for fans to take out their wallets.) Knowing that the majority of the folks who purchased your deluxe edition CD through Amazon first heard your song on Pitchfork and downloaded your EP through P2P is highly valuable.
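As a rough sketch of what "closing the loop" could look like once the data is connected: assume every channel reports events against a shared fan identifier (which does not exist today and is the hard part). The event log, field layout, and function name below are all hypothetical.

```python
from collections import Counter

# Hypothetical connected event log: (fan_id, timestamp, stage, channel).
# Assumes a shared fan_id across channels; all rows are invented.
events = [
    ("fan1", 1, "discovery", "Pitchfork"),
    ("fan1", 2, "engagement", "P2P"),
    ("fan1", 3, "purchase", "Amazon"),
    ("fan2", 1, "discovery", "internet radio"),
    ("fan2", 4, "engagement", "MySpace"),
    ("fan3", 2, "discovery", "Pitchfork"),
    ("fan3", 5, "purchase", "iTunes"),
]


def first_discovery_of_buyers(events):
    """Count, per channel, how many purchasers first discovered the artist there."""
    first_seen = {}  # fan_id -> channel of earliest discovery event
    buyers = set()   # fan_ids with at least one purchase event
    for fan_id, _ts, stage, channel in sorted(events, key=lambda e: e[1]):
        if stage == "discovery" and fan_id not in first_seen:
            first_seen[fan_id] = channel
        elif stage == "purchase":
            buyers.add(fan_id)
    return Counter(first_seen[f] for f in buyers if f in first_seen)


discovery_counts = first_discovery_of_buyers(events)
print(discovery_counts.most_common())
```

In this toy log both buyers trace back to Pitchfork, while the internet radio listener never purchased. That is the kind of per-fan path a spike chart can't show.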
The technology to make this happen is not expensive, nor particularly difficult to implement, with APIs and OpenID or OAuth. Thanks to the web's ever-expanding processing power, we have far easier access to more data than ever before – it’s no longer reserved for the major corporations who can pay hundreds of thousands of dollars to market research firms to run surveys. We just need to have the right technology implemented properly in order to make the data connection easier.
If we can connect all the data collection points properly, the processing isn’t horribly difficult – it’s merely implementing techniques that economists and social scientists have used for ages to slice and dice large data sets. Everyone should be able to be an armchair statistician (as Google Analytics has enabled web hosts to be); the winners will be the ones who have the deepest understanding of how to find the most relevant data and take the most appropriate actions as a result.
The understanding is the place where I fear the music business will have the most catching up to do. From my personal experience, data analysis seems to be a new concept to many industry veterans. Many are excited by the prospect of data being available, but few seem to know what to do with it – it serves as little more than eye-candy, another blurry trend chart at a corporate meeting (to be fair, it's not yet easily available in a relevant and easy-to-read format without some tech savvy and Excel skills). Layer on top of that the fact that people in the music business seem to be among the world’s biggest optimists, and you’ve got a recipe for disaster when it comes to biased or outright incorrect interpretation of data.
The winner of the data wars among the music startups will be the one that can provide the deepest insights, not just the most data. It will preemptively answer both “What does this mean?” and “What should I do?”
The biggest winner will also have the tools to power the “What should I do?” actions. This means marketing, ecommerce, and access to relevant channels. These tools need to empower the artist, but also add value. In the end, I believe success may be measured as some variation of the following equation (would love to hear your feedback on improvements to the equation), and the winners among the startups will be the ones who can prove they consistently impact the result the most:
(Engagement + (Investment x Demand Generation Execution))^Quality = $$$
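To get a feel for how the terms interact, here is a toy calculation of the equation as written. The units and input values are arbitrary assumptions of mine (the equation doesn't define any), so only the relative changes mean anything.

```python
def payoff(engagement, investment, execution, quality):
    """(Engagement + (Investment x Demand Generation Execution)) ^ Quality."""
    return (engagement + investment * execution) ** quality


# Arbitrary, unitless inputs chosen only to make the arithmetic easy to follow.
baseline = payoff(engagement=50, investment=10, execution=5, quality=1.0)

# Doubling investment moves the result linearly (inside the parentheses).
more_spend = payoff(engagement=50, investment=20, execution=5, quality=1.0)

# Raising quality from 1.0 to 1.5 acts on the whole base as an exponent.
better_product = payoff(engagement=50, investment=10, execution=5, quality=1.5)

print(baseline, more_spend, better_product)
```

With these inputs, doubling investment lifts the result by half, while a modest bump in quality multiplies it roughly tenfold.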
Note that quality, as the exponent, has an outsized effect and falls almost entirely on the artist. Even the best data analysis in the world can’t make up for a product that doesn’t appeal. But it can tell you why your product isn’t selling as well as your competitors’, and what you can do to make incremental improvements from a business perspective. And that makes data incredibly valuable, and a worthwhile pursuit -- it will undoubtedly unlock a plethora of new business models for music, and allow artists to figure out their own optimal model.