There Is a Subtle Power Struggle for Control of Music Metadata
The word "metadata" requires the presence of another entity to which the data at hand refers. Increasingly, it now points toward the listener instead of toward whatever they're listening to.
Waveforms. Image: Matthew Potter/Flickr
Art is shaped by its infrastructure. This is obvious enough during creation, since a paintbrush and an electronic stylus deposit their marks differently, but it's also true during consumption, because as the presentation changes, the material being shown jockeys for position in the new frame.
Format shifts have already altered the mechanics of music several times over the past few decades, and the recent rush toward streaming services like Spotify and Google Play now positions a technology company between the listener and the material. Surely remote cloud storage is a new audio format at least as much as the Walkman?
This is a new kind of consumer relationship, and the play button has a different meaning for each side; to the business, it does more than just switch on entertainment. As a result, there's now a sort of subtle power play occurring over control of the metadata which surrounds the music and connects it to search fields, filters, and playlists. This matters, because our ability to meaningfully engage with something depends first and foremost on whether we can find it at all.
Many modern businesses now treat data as a valuable asset, even if they are still figuring out what exactly they might want to do with it. This is perhaps an appealing new way to view a music industry that has found its income decimated by copyright infringement; audience-focused data metrics can be used to define success in ways that differ from the dreary traditional sales figures. Unfortunately, it is also a tricky proposition technically, because although the way audio is encoded digitally is great for reproducing sounds, after the drums and guitars get all mashed together it becomes essentially impossible to "search" through the audio in the traditional sense, the way we might extract text passages from ebooks.
Important metadata about the music thus must usually exist outside the audio stream itself, as distinct text fields, which makes it easily readable by computers without requiring any interaction with the audio content. In other words, metadata is not an intrinsic feature of digital audio, and as a result it was always bundled into the file formats used as wrappers for storing that audio. Now, however, the two are quietly being decoupled by streaming services.
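To make the "wrapper" idea concrete, here is a minimal sketch of reading the oldest MP3 tagging scheme, ID3v1, which stores fixed-width text fields in the final 128 bytes of the file. The helper names `read_id3v1` and `build_id3v1` are invented for this illustration, and the sample values are made up; real-world files more often carry the richer ID3v2 or Vorbis comment formats, which this sketch does not handle.

```python
import struct

# ID3v1 layout: "TAG" marker, then title(30) artist(30) album(30) year(4)
# comment(30) genre(1), for 128 bytes in total at the very end of the file.
ID3V1_FIELDS = struct.Struct("<30s30s30s4s30sB")


def _clean(raw: bytes) -> str:
    """Drop NUL padding and decode a fixed-width text field."""
    return raw.split(b"\x00", 1)[0].decode("latin-1").strip()


def read_id3v1(data: bytes):
    """Return the ID3v1 text fields found at the end of `data`, or None."""
    if len(data) < 128 or data[-128:-125] != b"TAG":
        return None
    title, artist, album, year, _comment, _genre = ID3V1_FIELDS.unpack(data[-125:])
    return {
        "title": _clean(title),
        "artist": _clean(artist),
        "album": _clean(album),
        "year": _clean(year),
    }


def build_id3v1(title: str, artist: str, album: str, year: str) -> bytes:
    """Build a tag for the demo below (hypothetical helper, not a full writer)."""
    def pad(text: str, width: int) -> bytes:
        return text.encode("latin-1")[:width].ljust(width, b"\x00")
    return (b"TAG" + pad(title, 30) + pad(artist, 30) + pad(album, 30)
            + pad(year, 4) + pad("", 30) + bytes([255]))


if __name__ == "__main__":
    # Stand-in "audio" bytes followed by a tag; the audio is never decoded.
    fake_file = b"\xff\xfb" * 100 + build_id3v1("Blue Moon", "Example Artist", "Demo Album", "2016")
    print(read_id3v1(fake_file))
```

The point of the sketch is that the reader never touches the audio frames: the descriptive text lives beside the sound, not inside it, which is why it can be stripped away without changing what you hear.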
Music libraries transitioning to online services is a perfectly natural evolution. Similar changes have already happened to movies with Netflix, and even to messaging before that through webmail services like Gmail. But with audio files, the shift to the cloud has already undermined a huge trove of data that had previously existed offline, embedded in the files. It's no longer just about attributes of the songs: The word "metadata" requires the presence of another entity to which the data at hand refers, and increasingly, it now points toward the listener instead of toward whatever they're listening to. Maybe this was inevitable, since the songs themselves don't have any autonomous purchasing power.
Last summer, Spotify unexpectedly ignited a small firestorm after it announced that its mobile app would soon start making requests for new kinds of user data, like tapping into a smartphone's GPS or searching through its contacts and photographs. The proposed uses for the extra information were all relatively mundane—GPS-related workout features for runners, sharing favorite songs with your friends, customizing playlist views with personal photographs—but nonetheless, Spotify CEO Daniel Ek quickly posted a blog entry titled "SORRY."
"We should have done a better job in communicating what these policies mean and how any information you choose to share will—and will not—be used," Ek wrote.
But in theory, one of the purposes of a subscription fee is to isolate users from this sort of data mining: If a company can establish its financial solvency through direct payments, then it won't be forced to commodify the private information of freeloading users. The two strategies aren't mutually exclusive, however, and the tension between them is central to the ongoing efforts to convert music from a discrete product that can be pirated into an ongoing service.
Plenty of companies now have business models that rely on in-depth user analytics and profiling, but Spotify's newly apparent interest in this data is especially illuminating.
In early 2014, the company paid a reported $100 million to acquire a startup called the Echo Nest, originally a project of the MIT Media Lab, which presented a powerful mish-mash of music data capabilities. Flashiest among these was a way to algorithmically determine musical features like tempo and timbre by analyzing the audio, essentially allowing automated detection of metadata reflecting intrinsic structural qualities of the music. Just as crucial, though, was a subset of the functionality dubbed "Rosetta Stone," which allowed queries to use ID codes from other metadata sources and services, functionally resolving multiple data sets to create one giant data interchange. The Echo Nest was quickly absorbed into Spotify, but the results of its algorithmic audio analyses still aren't outwardly apparent to consumers anywhere in Spotify's product offering. The Echo Nest blog hasn't been updated since shortly after the acquisition, but the data service still operates independently, and has at times even been used by Spotify's competitors.
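The ID-resolution idea behind a feature like "Rosetta Stone" can be sketched in a few lines. This is an illustration only, not the Echo Nest's actual API: the class `IdResolver`, the catalog names, and every ID below are hypothetical.

```python
class IdResolver:
    """Toy cross-catalog ID map: many foreign IDs point at one canonical ID."""

    def __init__(self):
        self._to_canonical = {}  # (catalog, foreign_id) -> canonical_id
        self._to_foreign = {}    # canonical_id -> {catalog: foreign_id}

    def link(self, canonical_id, catalog, foreign_id):
        """Record that `foreign_id` in `catalog` refers to `canonical_id`."""
        self._to_canonical[(catalog, foreign_id)] = canonical_id
        self._to_foreign.setdefault(canonical_id, {})[catalog] = foreign_id

    def resolve(self, catalog, foreign_id):
        """Look up the canonical ID for another catalog's ID, or None."""
        return self._to_canonical.get((catalog, foreign_id))

    def translate(self, from_catalog, foreign_id, to_catalog):
        """Convert an ID from one catalog into another catalog's ID, or None."""
        canonical_id = self.resolve(from_catalog, foreign_id)
        if canonical_id is None:
            return None
        return self._to_foreign[canonical_id].get(to_catalog)


if __name__ == "__main__":
    resolver = IdResolver()
    # All identifiers here are made up for the example.
    resolver.link("artist:0001", "catalog_a", "A-93")
    resolver.link("artist:0001", "catalog_b", "b_7781")
    print(resolver.translate("catalog_a", "A-93", "catalog_b"))
```

Once every source's identifier resolves to the same canonical record, a query phrased in one service's vocabulary can be answered with data held by another, which is what turns several separate data sets into one interchange.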
One of those competitors, Rdio, was purchased by internet radio behemoth Pandora last year for $75 million, and then eventually shut down after all the assorted intellectual property and data assets had changed hands. When asked about competitive advantages during the investor call immediately following the announcement of the Rdio acquisition, Pandora CEO Brian McAndrews specifically referred to Pandora's 60 billion "thumbs"—the company's idiosyncratic terminology for the preferences expressed by users as they click thumbs-up and thumbs-down icons. Rdio complemented the thumbs with compelling audio metadata, but it mostly just repurposed a third-party library, licensed for a hefty fee from Rovi, an entertainment and media data service.
Rovi's is a terrific, robust data set which includes factual statistics, short qualitative descriptions (rousing, martial, elegiac) and themes (open road, girls' night out, zeitgeist). The entries are managed by a staff of roughly 100 editors with specific genre specialties, and are accompanied by extensive custom-commissioned editorial content like album reviews and artist biographies to explain the nuances that can't be captured as pure data points.
And Rovi isn't alone. Apple Music and Spotify both license data from a company called Gracenote, which evolved from an early service that simply stored song titles for use by CD players with text displays. Gracenote's information, like Rovi's, is compelling and meticulous, including both well-researched factual details like the release date and subjective descriptive terms, which are in some cases calculated mechanically and in others compiled by a trained staff, with any closely related terms numerically weighted to characterize the precise nature of their overlaps.
Given that all this detailed music metadata is available for batch import from dedicated data services to which the streaming providers already subscribe, it's baffling that so little of it is exposed to users. In theory, huge centralized media libraries should make it much easier to provide powerful and accurate metadata, because any given field can just be added, encoded, or corrected once, and all the downstream customers would benefit. But with slight variations, lackluster metadata is the norm across all the streaming services.
The Spotify interface reveals only a handful of fields for each song: artist name, song title, album title, duration, track number, year of release, popularity, and cover art. Tidal, the beleaguered high-fidelity streaming service launched last year by a coalition of artists led by Jay-Z, displays between two and four fields in its "track information" panel for most songs; those fields are often near-duplicates like performer, composer, and lyricist, containing identical or nearly identical values. This is even more bizarre considering that Tidal's audio file format of choice is FLAC, which has particularly advanced metadata capabilities baked in.
The Apple Music implementation is especially offensive: In the iTunes desktop application, the exact same metadata viewing window is used for both local files and streaming items, but in the latter case it is simply faded out and inaccessible to the user. For local files, however, it exposes more than thirty data points for each song, like "tempo" and "composer" (often different from "artist," especially with classical music), with additional user-customizable fields. The iTunes user interface has grown cluttered as the emphasis has shifted to promoting its various paid services, but for many years it was actually quietly formidable as a metadata tool, with every click optimized for efficient data management. The operational logic of the iPod was completely powered by audio file tags for its first dozen years; in their absence you'd get only playlists consisting of baffling empty grids, track numbers, and the phrase "Unknown Artist." Metadata tags were the local media equivalent of Google's indexing of the internet.
Rovi mostly supplies a concrete data library, and it's their clients who are doing the valuable user-level aggregations about listening patterns, tastes, and behavior – Rovi never had access to Rdio's listening logs, for example. Nonetheless, one of the requests from Rovi clients which has been increasing in popularity most rapidly over the past few years has been "the need for more 'dynamic metadata'–to make music more contextual, searchable, and relevant to the individual," according to Kathy Weidman, Senior Vice President and General Manager for Metadata. "Dynamic" is a euphemism here: Weidman later characterized it as "relevant very specifically, based on previous searches of the individual."
So this is an unfortunate combination: Users can no longer access metadata about the music, and yet that same music metadata is then being used to generate metadata about the users! Maybe it's a good idea on the product side, in that printing too much data on the screen might scare away more casual users. The proportion of listeners who utilized the most advanced metadata capabilities of local media files was always minuscule, so maybe it makes sense to abstract it away.
On the other hand, if the metadata pop-up window is there in the first place, it probably shouldn't be empty, right? Either way, users who make specific requests probably aren't as valuable as users who have been conditioned to trust that the service is smart enough to deliver what they want. A customer who can request an upbeat bluegrass song and then find one makes for a less compelling business proposition than a product that can intuit the desire.
This brings us back to the Echo Nest, which finally popped up again in early March with a new Spotify feature. "Fresh Finds" complements the personalized recommendations of "Discover Weekly" by adding five new genre-specific playlists. These are built by first selecting rising new acts by crawling the internet outside Spotify, identifying thousands of anonymous "tastemakers" among the Spotify users who are already listening to those artists, and then analyzing the other things they are listening to. It's a novel approach to music discovery that obviously wouldn't be possible without Spotify's proprietary data about listener behavior, but there's also an angle from which this starts to look like a dystopian science-fiction worst-case scenario for art: brains in vats unaware they're being networked to create logic systems which dictate what everyone else should like.
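The three steps described for "Fresh Finds" can be made concrete with a toy sketch. Spotify's actual method is not documented here, so everything below is an assumption for illustration: the function names, the `min_hits` threshold, and the tiny listening log are all invented, and the list of rising acts is simply taken as given rather than crawled from the web.

```python
from collections import Counter


def find_tastemakers(listens, rising, min_hits=2):
    """Users who already listen to at least `min_hits` of the rising acts."""
    return {user for user, artists in listens.items()
            if len(artists & rising) >= min_hits}


def fresh_picks(listens, rising, min_hits=2, top_n=3):
    """Rank the other artists the tastemakers listen to, most shared first."""
    counts = Counter()
    for user in find_tastemakers(listens, rising, min_hits):
        counts.update(listens[user] - rising)  # ignore the rising acts themselves
    # Sort by descending count, then by name so ties come out in a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [artist for artist, _ in ranked[:top_n]]


if __name__ == "__main__":
    # Hypothetical listening log: user -> set of artists played.
    listens = {
        "u1": {"Act A", "Act B", "Band X", "Band Y"},
        "u2": {"Act A", "Act B", "Band X"},
        "u3": {"Act A", "Band Z"},
        "u4": {"Band Q", "Band Z"},
    }
    rising = {"Act A", "Act B"}
    print(find_tastemakers(listens, rising))
    print(fresh_picks(listens, rising))
```

Note what the inputs are: nothing in this sketch describes the music. The only data it consumes is who listened to what, which is exactly the shift from metadata about songs to metadata about listeners.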
Technology is inherently confusing. This is, in fact, one of the very things that makes us consider it technology, rather than mundane machinery. Our ability to comprehend the bleeding edge therefore requires familiar reference points. The years conventionally considered to include the collapse of the record industry have also been accompanied by a huge increase in the breadth of easily accessible music: indie rock bands, home recordings, a generation of critical favorites who can't afford health insurance. Audio metadata alone won't save the music industry, but it could make this newfound chaos comprehensible again if the music industry would stop studying its customers long enough to empower them.
Music with metadata is a fundamentally different product from music without metadata, roughly comparable to the difference between cans of food with and without paper labels. This is about more than audio, though. Increasingly, we are keeping bits of ourselves in artifacts stored on servers—thoughts, messages, plans, art, memories—and our access to our digital metadata will then guide the ways in which we interact with the original entities, whether they're songs, photos, letters, or whatever else gets digitized next. If you lose track of something you love because you can't query your way back to it, did you ever really love it in the first place?
That's a tough question currently, but perhaps the data models will one day become sophisticated enough to tell you. If you're really lucky, maybe you'll even be a part of the algorithm yourself.