Speccing the Void: Adventures in the Audiobooks Abyss – Tech Forum 2019

– [Host] Our next speaker is Wendy Reid. Wendy is a senior QA analyst at Rakuten Kobo Inc., and has spent the last few years on the other side of the EPUB mysteries and reading system technology. She is currently one of the co-chairs of the Publishing Working Group of the W3C and leads the Audiobook TaskForce. In her abundant spare time, she likes to learn about new technologies and read as many ebooks as she can. Welcome, Wendy.

– [Wendy] Thank you so much for the amazing introduction.

Today, I'm going to be talking about the process of creating specifications where there aren't any in the first place, and then why I'm personally doing this myself. If anyone was here yesterday, you probably saw Dave Cramer giving his amazing rant about the state of EPUB and digital publications. I'm really hoping that 20 years from now, Dave gives that same rant about this talk. So, goals. I was already partially introduced.

I am a senior QA analyst on the product quality and technology services team at Kobo now. I love to read. I love to listen to audiobooks. And I'm just gonna introduce: why audiobooks? Why is this my passion, and why am I trying to change the way that probably some of you work on a day-to-day basis? Firstly, my interest is personal. This image is actually from the fall of 2017.

This t-shirt is also from the fall of 2017. Kobo launched Audiobooks, I believe, the first week of September. That time was a blur for me. A lot of late nights. And nothing says celebration like giant balloon letters of the thing that you're celebrating.

But the story starts several months earlier. Software does not get released overnight. I spent about six months as one of the iOS testers on the iOS development team, and it was Audiobooks 24 hours a day. And as the project started, you know, I'm in QA.

I log bugs. That's my job. And so a lot of the testing was finding bugs in the software, and most issues were software related: buttons in the wrong place, the player wouldn't open all the time, lots of things would happen. But that was to do with development.

That was bad code. That was missing implementations, a bad API. But then, as the project continued and we were testing, doing betas, and getting feedback from some of our users, the bugs were more likely to be related to content, and it was a big struggle for us. There's a problem with content. And there's a lack of a standard.

It was weird, you know, because I'm an EPUB person originally. I'm one of the people in the company that people go to when there's a book problem.

And so when I have a book problem, I have things I can go to to find out what that book problem is. I can check the EPUB spec. I can check HTML, CSS, the OPF. There are so many things that I can look to as a resource when something goes wrong. Audiobooks don't have any of that.

I can maybe check that an MP3 is valid, but that doesn't take me very far when an investigation happens. And as we discovered very quickly, each company (we work with three different distributors) had its own way of processing files and its own way of creating manifests. We didn't know what the files looked like before they reached us, or even before they reached them. And we learned of insane cases where people were manually creating tables of contents by sitting in a room with a pair of headphones and a text document, noting the exact timestamps: Chapter 1, 23 minutes and 52 seconds long.

Crazy. And bad process equals bad data. When we had bad data, we would get app issues, and the side effects of those issues would be things like chapter skipping, because the processor doesn't understand a slight difference in the length of a file. That's a human error.
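To make the chapter-skipping failure concrete, here is a hypothetical check a retailer could run on a hand-made table of contents before it ships. It assumes each TOC entry names an audio file by its index in the delivery and gives a start offset in seconds within that file; `TocEntry` and `find_bad_entries` are illustrative names, not part of any distributor's real format. An entry that starts at or past the end of its file, which is what happens when a timestamp typed by ear no longer matches a re-encoded file, gets flagged instead of silently making the player skip.

```python
from dataclasses import dataclass


@dataclass
class TocEntry:
    """One hand-entered table-of-contents line (hypothetical structure)."""
    title: str
    track: int             # index of the audio file this chapter starts in
    start_seconds: float   # offset of the chapter start within that file


def find_bad_entries(toc: list[TocEntry], track_lengths: list[float]) -> list[str]:
    """Return the titles of entries that cannot be played as written.

    track_lengths holds the measured duration of each delivered audio file,
    in seconds. An entry is bad if it names a file that does not exist, or
    starts before the beginning or at/after the end of its file.
    """
    bad = []
    for entry in toc:
        if not 0 <= entry.track < len(track_lengths):
            bad.append(entry.title)
        elif not 0 <= entry.start_seconds < track_lengths[entry.track]:
            bad.append(entry.title)
    return bad
```

For example, if the first file is actually 1432.0 seconds long but a chapter was typed as starting 1432.4 seconds into it, that chapter's title is reported rather than handed to a player that would jump past the end of the file.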

Computers don't understand human error. And so we struggled with this for a long time. Nevertheless, it got shipped. Audiobooks launched. Our users are very happy.

I still see lots of reports on a daily basis from people who love Audiobooks and people who struggle with Audiobooks. And we tackle issues as they come in. No software is perfect. We still have problems with the ebook renderer. We know how to handle these things.

But my main thought the entire time was, "There's gotta be a better way to do this. There's got to be something we can do." But the project was over. I actually moved to a different team shortly after the launch of Audiobooks, and so it wasn't quite on my mind anymore.

When I switched teams, I still wanted to be involved in content. I had moved to doing strategy work, so I wasn't really doing the rendering stuff anymore. I missed it. I really wanted to stay in it. Then a colleague of mine joined the Publishing Working Group.

And he suggested, "You like EPUB. You should join the Publishing Working Group too." And so later in 2017, I joined and felt very cool, like the characters in Clueless. The Publishing Working Group was kind of nebulous to me. They were talking about this thing called Web Publications.

It might become EPUB 4 one day, which was where I was looking. And there was some talk of audiobooks: nothing solid, but maybe we'll talk about it. So, to describe the Publishing Working Group: we're focused on the future of publishing. The current problems that we face, things like audiobooks, things like "How do we publish on the web? How do we stay current?", are the drivers for what we are looking to do in the future.

The tricky part about doing specifications work is the time horizon. When you work in an agile environment like I do, you're looking a month ahead, three months ahead. When you're working on specs, you're looking one year ahead, two years ahead, five years ahead.

And that's just hoping that someone adopts your thing, which is sometimes hard to wrap your head around. But as we started talking in the group about things that we could do to make a really big impact with our specification, audiobooks kept coming up. We were like, "Well, there's no spec for audiobooks." Web Publications was challenging because we already have EPUB, but audiobooks don't have anything. So in May 2018, we formed the Audiobooks TaskForce.

It was a very small group: probably about 10 of us, very passionate people who wanted to sit down and talk about the use cases and issues that affect audiobooks differently from publications like journal articles or books. We examined the challenges, the opportunities, and the unique characteristics of this area. So I'm gonna show you something scary. This is the current state of the union, massively simplified.

As far as I can tell from my discussions so far with publishers, this is normal; it feels normal. "Our pipeline works. What's the problem?" Books get to users every day. Audiobooks are an ever-increasing percentage of the industry. What could possibly be wrong with this? Publishers send their content to distributors. Distributors do whatever they need to get it ready for retailers or their own apps, and retailers take in the content.

Again, maybe they do some stuff to it to make it work for them, and then readers get it. But let's look at it step by step, because this is terrifying. So here are the basics. A publisher has decided to produce an audio version of Jane Eyre, my favourite book. In a basic audiobook, there are about four or five things, depending on the way you build files.

Audio files, in whatever format you choose. Everyone uses different ones: you might have lossless FLAC files, MP3s, MP4s, WAV files. There's a lot of variety out there. You usually have a cover, some sort of artwork.

Metadata: ONIX or, you know, even just the basics, like title, author, narrator. Potentially a manifest file, or just a list of what's in the delivery. And as an extra bonus, you might have supplemental content, things like PDFs or images that accompany your book. So there we have Jane Eyre. And Jane Eyre needs to be sent to the distributor.

We want to start selling this new audio copy of Jane Eyre. So distributor A has their requirements for the publisher. They've said, "We will take the following file formats. Audio files need to be MP3s. The cover should be a JPG. ONIX is our metadata standard. And the manifest, we'll take as an Excel file, but deliver it separately as an email. Don't just ship it."

But distributor B is a little bit different. Distributor B wants WAV files for audio, and they want a JPG cover, pretty standard.

They want ONIX metadata too, but they want a JSON manifest in the distributor's preferred format. But publishers are good at that; workflow is something they're good at, so they get it out there. And then distributor A has to send it to the retailer, and now the audio is being delivered via hosted URLs on Amazon Web Services or something. And that Excel manifest they received by email has been converted into JSON, and is being delivered by the same API that delivers the files.

And likely, it's now in a format that is very different from what the publisher sent. It's probably a hybrid of what the retailer and the distributor have agreed upon as a standard. And the files and the URLs are still delivered separately, in different feeds: one feed sends the audio files, another feed sends the metadata.

You know, there are a lot of places where this can go really wrong really quickly. And then we close the circle with distributor B: similar thing, JSON, ONIX, JPGs and MP3s. And at every step along the way, there's potential for things to change. File formats can change.

A distributor might convert those WAV files to MP3. They might take your Excel file and turn it into a JSON manifest. Structure changes. There might be a different expectation on, you know, "Do we include the bitrate for the retailer? Do we not include the bitrate for the retailer?" This workflow from publisher all the way to reader is not smooth, and there's a lot of change. And every change in the process is a potential point of failure.

And the person most likely to experience that failure is the reader, not the retail company. They'll probably catch it in their feed. So when I found out about all of this, this is how I felt: a little bit like Bill Nye with, you know, mind blown. So what are we gonna do about it? We're gonna write a specification. The first step of this process sounds the most insane: "What is an audiobook?" How do you define the thing that you're trying to specify? If you've ever read a specification, they are incredibly precise and they sound a little bit pedantic.

And there are definitions for words that you wouldn't think need definitions, like the word "should". That gets defined very clearly at the beginning of a spec. But when you're doing specification work, you have to be precise, because if there's any vagueness in your specification, what is a technical implementer supposed to do with it? So when we talk about implementation, we have to define what we mean by an audiobook. So what is an audiobook? I spent the first hour of the Audiobooks TaskForce just talking about what an audiobook was.

And these are some of the quotes: "Agree that it is important to define exactly what we mean by audiobook. Is it a pure audio-only book, or read along?" That was actually a very long discussion. "Audio-only is somewhat misleading, since we really mean audio plus various other metadata that comes with Web Publications. And I think we should be careful not to place too much emphasis on simple audiobooks, as even simple audiobooks can have some complexity." You can imagine how that went.

But by the end, we came to a simple definition. We defined an audiobook as a collection of audio resources with a defined reading order, plus additional resources like a cover or supplemental content, all defined in a manifest. That is an audiobook, plain and simple. So the second step is to define the use cases. We gathered use cases from our own experiences.
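The TaskForce's definition of an audiobook (audio resources in a defined reading order, plus other resources, described by a manifest) reads almost directly as a data structure. The sketch below is a hypothetical model, not the spec's manifest: `Audiobook`, `AudioResource` and their fields are illustrative names, durations are plain seconds, and the JSON key names in `to_manifest` are only loosely modeled on Schema.org (`name`, `author`, `readBy`, `duration`) and the W3C publication drafts (`readingOrder`, `resources`). What it shows is that one manifest carries the core metadata and the reading order together, so a processor can check one against the other.

```python
from dataclasses import dataclass, field


@dataclass
class AudioResource:
    """One entry in the reading order: an audio file plus its length."""
    url: str
    duration_seconds: float
    name: str = ""


@dataclass
class Audiobook:
    """Ordered audio plus other resources, described by one manifest."""
    title: str
    author: str
    narrator: str
    duration_seconds: float
    reading_order: list[AudioResource]
    resources: list[str] = field(default_factory=list)  # cover, PDFs, images

    def problems(self) -> list[str]:
        """List basic consistency issues a processor could catch on intake."""
        issues = []
        if not self.reading_order:
            issues.append("reading order is empty")
        total = sum(track.duration_seconds for track in self.reading_order)
        if abs(total - self.duration_seconds) > 1.0:  # allow 1 s of rounding
            issues.append(
                f"declared duration {self.duration_seconds} s does not match "
                f"the sum of the tracks, {total} s"
            )
        return issues

    def to_manifest(self) -> dict:
        """Serialize to a JSON-ready dict (key names are illustrative)."""
        return {
            "name": self.title,
            "author": self.author,
            "readBy": self.narrator,
            "duration": self.duration_seconds,
            "readingOrder": [
                {"url": t.url, "name": t.name, "duration": t.duration_seconds}
                for t in self.reading_order
            ],
            "resources": [{"url": url} for url in self.resources],
        }
```

Because the reading order is an explicit ordered list rather than a file naming convention, a player never has to guess which file comes next, and a declared total duration that disagrees with the tracks is caught on intake.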

Most of us on the Audiobooks TaskForce are either publishers or distributors, like Kobo, Google, etc. And so we have our own personal experiences, and most of us are pretty avid audiobook users, so we know what we would like to do. Use cases are meant to be descriptive and very specific, so reading a use case document can be a little bit repetitive. But the expectation is that the spec should support them.

So if we have decided that a use case is something that we want to address, the spec needs to facilitate it. And user agents, like a web browser or reading system, that are seeking to implement our spec should be able to do so using the framework that we've created. A sample use case from the Web Publications document about audiobooks: "Wendy is listening to The Iliad on her walk to work. The audiobook is 14 hours long, but her walk only takes 30 minutes. She would like to stop the playback when she arrives, and continue where she left off on her trip home."

That sounds pretty simple, but how would you translate that into spec language? For us, it means that whatever format we create should be bookmarkable. A user agent should be able to say that this point is a saved point, and the next time a session starts, that's the point it works from. And we should not do anything in our spec to prevent that, which is surprisingly difficult sometimes. So what goes into a spec? Our spec will cover a manifest (how to create a manifest file), cover images, audio files, a table of contents, supplemental content, and packaging. Deciding on that list is kind of difficult sometimes.
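One way a user agent might meet the Iliad use case, assuming the manifest lists each track's duration in reading order: store the bookmark as a single position on the whole-book timeline, then map it back to a file and an offset when playback resumes. `locate` and `to_position` are hypothetical helpers, not anything the spec defines, and the 24 tracks of 35 minutes used below are invented numbers that add up to a 14-hour book.

```python
from bisect import bisect_right
from itertools import accumulate


def locate(position_seconds: float, track_durations: list[float]) -> tuple[int, float]:
    """Map a bookmark on the whole-book timeline to (track index, offset in track)."""
    if not track_durations:
        raise ValueError("the reading order is empty")
    if position_seconds < 0:
        raise ValueError("position must be non-negative")
    ends = list(accumulate(track_durations))  # ends[i]: where track i finishes
    starts = [0.0] + ends[:-1]                # starts[i]: where track i begins
    if position_seconds >= ends[-1]:
        # Past the end of the book: clamp to the end of the last track
        return len(track_durations) - 1, track_durations[-1]
    # Last track whose start is <= the position (skips zero-length tracks)
    index = bisect_right(starts, position_seconds) - 1
    return index, position_seconds - starts[index]


def to_position(index: int, offset_seconds: float, track_durations: list[float]) -> float:
    """Inverse of locate: turn (track index, offset) back into a book position."""
    return sum(track_durations[:index]) + offset_seconds
```

With 24 tracks of 2100 seconds, stopping after the 30-minute walk saves position 1800, which resolves to 1800 seconds into the first track; after the walk home the bookmark is 3600, which falls 1500 seconds into the second track. Keeping one number as the bookmark means it stays valid however a player buffers files, as long as the reading order and durations do not change.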

How do you define the bounds of what you're gonna cover? When people are throwing ideas around, things can get really crazy. You can spend a lot of time thinking, "Oh, what if we did this?" We don't want to stifle innovation, but we have to be mindful that we don't really know what's gonna happen five years from now. So be flexible, but also be strict. It can be a bit challenging. We want to create a single, flexible manifest format: a single way to express the core metadata of an audiobook, and a single reading order.

Metadata like "Who is the author? Who's the narrator? What is the title? What is the duration of this file?" A single way to express this means that regardless of where the file gets sent, whatever is processing it, be it a distributor's CMS or a retailer's app, will know that duration is the length of the book and the narrator is the narrator. Because Audiobooks is built on the Web Publications framework, we have to consider how Audiobooks will work on the web platform. Our specification's goal is implementation. It's great to produce a document, but if no one actually uses it, what's the point? So we need to make sure that user agents, browsers and reading systems, will understand the format natively, which means they can more or less open it without a lot of work, and that they'll be able to implement it with their current standards.

We're also creating packaging, which is actually a whole other conversation, because we want to be able to take an audiobook and enjoy it not just on the internet, but also on a device: moving that file around in a tight, single package, unlike the way it is today.

Because today, as mentioned before, it's often a collection of files loosely bound together by maybe a file naming convention. But what if it could just be one file? What if, like EPUB, which is essentially just a collection of files in one little package, we had an AudioPUB? What if you could send a publisher or a consenting user an AudioPUB file for review? Or you could download your files from your retailer and choose where you listen to them. These are all possible. And packaging also means that the file moves as a single item through the entire flow. A publisher can send a single file to a retailer, as opposed to one feed with MP3s, another feed with the manifest file, and an Excel file in an email.
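A minimal sketch of the single-file idea, in the spirit of EPUB's ZIP container: the manifest and every resource it lists go into one archive, and packaging fails if the manifest points at a file that was not supplied. The archive layout, the manifest file name `publication.json`, and the `readingOrder`/`resources` keys are assumptions for illustration; at the time of this talk, packaging was still an open conversation in the group.

```python
import io
import json
import zipfile


def build_package(manifest: dict, files: dict[str, bytes]) -> bytes:
    """Bundle a manifest and its resources into one ZIP archive (as bytes).

    files maps paths inside the archive, e.g. "audio/ch01.mp3", to contents.
    """
    listed = {
        item["url"]
        for item in manifest.get("readingOrder", []) + manifest.get("resources", [])
    }
    missing = listed - set(files)
    if missing:
        raise ValueError(f"manifest lists files that were not supplied: {sorted(missing)}")

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The manifest sits at the root under an assumed, fixed name.
        archive.writestr(
            "publication.json",
            json.dumps(manifest, indent=2),
            compress_type=zipfile.ZIP_DEFLATED,
        )
        for path, data in files.items():
            # Audio is already compressed, so store it as-is.
            archive.writestr(path, data, compress_type=zipfile.ZIP_STORED)
    return buffer.getvalue()
```

The design choice worth noting is the up-front check: a missing file fails at packaging time at the publisher, rather than surfacing later as a broken chapter in a reader's app.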

And everything's a little more controlled. What won't the spec cover? Unfortunately, what you leave out matters almost as much as what you put in. A common discussion we have is whether something should be a best practice or something that we cover in the spec. Some decisions are really, really easy: as a W3C working group, we are not allowed to talk about DRM.

So I'm sorry if anyone has questions for me afterwards about DRM; I cannot answer them. But other things are a little more complicated. We actually had a discussion very recently about file naming conventions. Should we tell content creators how to name the files in an audiobook? And it was, you know, kind of a fraught discussion.

Because it's like, well, what if you say a sample file naming convention would be ISBN_Author_Title? That's a pretty common naming convention.

I've seen it in a lot of audiobooks, and ebooks as well. But what if a publisher doesn't use ISBNs? Does that fail the system? Does that mean that Kobo won't take your book? No, that's crazy. That's not a reasonable thing to expect. And there's a lot of trust involved: we know that publishers are really good at getting their files to the places they need to go. Everyone wants to sell their books, so let's trust them to do the best for their content.

And conversations can go like this forever. But the main question that we always ask is, "Will something benefit from a specification? Would there be a net gain, a net positive as opposed to a net negative?" Another question is, "Is there already a standard?" We don't talk a lot about metadata, aside from internal metadata, because we have ONIX. What's the point of us trying to rewrite ONIX? ONIX is robust and widely used. And the other question is, "Is the thing that we want to include so big that it might need a spec of its very own?" We're having this discussion right now about Sync Media, because Sync Media is fairly closely entwined with audiobooks in terms of its requirements. It's also closely entwined with Web Publications, because it uses the same technology.

But it's also really big. It's very complicated. Do we need to write a separate spec that just works together with everything else? We're having that conversation right now, so this might change; it might end up in the audiobook spec. But that's the nebulous, fascinating thing about specifications: things change all the time, and you won't know what decisions got made until it's kind of done.

So there's a process to this, and the process is long. Right now, we're actually still at the very, very beginning. At the end of this month, I will deliver the first working draft of the audiobook spec. I'm very excited. Sitting on my computer right now is a very, very rough-looking HTML document.

But we're working on the working draft. Once it is complete, it'll be published to the web. People will be able to look at it, and there will be an actual document that is open for review. Anyone in the public can take a look at it, provide comments, and, if you're a big fan of GitHub, report issues there, and our working group can modify it as needed. It's very nebulous at this point.

It can change. Our Web Publications working draft has changed probably 20 times in the last year. And not little changes, not spelling corrections: big, big, big changes. It takes a long time to get something to a finished spec, by the way. And a side note: all W3C specifications are actually called recommendations, which I think is a really cute way to describe something, since what they're really saying is, "HTML is a recommendation of the W3C." I think, as we all know, that is not a recommendation.

Like, you have to do it. But hopefully we will get there. Once we think that the draft is in a really solid state, we've tackled most of the major issues, and we've addressed most of the feedback, we move it to what's called a candidate recommendation. That means we're opening it up to testing. We're asking implementers, people like my company or, you know, Google, to attempt to implement our spec.

And what we're asking of them is: if you're attempting to implement, tell us what's going wrong. Is there something in our spec that we didn't think about that is actually preventing implementation? Or is it causing an undue amount of work that could become a roadblock later down the line? For us to go to a full recommendation, we need two successful implementations. So this step matters, because without it we can't get to where we need to be. And at any time in this process, we can go back to a draft, back to that rough document that changes a lot.

And we're always mindful that we need to make things as open and as public as possible in order to get to where we want to be. And hopefully, when we get to full recommendation, I get to go to publishers and distributors who are not already aware that we're doing this (even though I'm trying my best to make sure most of them are) and tell them that this is a new and exciting way they can produce amazing content for their retailers and for their readers. This is the goal: Jane Eyre gets packaged. It gets sent to distributor A and distributor B.

It gets sent as the same file to distributor A and distributor B. And maybe distributor A and distributor B change things. Maybe they convert the files to MP3s, because it's a little bit easier and the publishers are sending lossless files. Maybe they do some massaging of the manifest file to make sure it meets their standards.

Maybe they do some bug fixing. But in general, the file itself should not wildly diverge from what the publisher originally sent. And it means that publishers get pretty much full control over their content. They write their manifest files. They decide what goes in the TOC.

There's no person in a room somewhere listening with their headphones, writing the TOC by hand. They can say the timestamps are exact and the order is exact. And it's a simple one-file system, as opposed to sending an Excel file to one person and a JSON file to another. Why are we doing this? Everything was working just fine, right? We're looking for uniformity, we're looking for consistency, and we're looking for accessibility. One format for all audiobooks means that there's one format for publishers both big, the Penguins and the Random Houses of the world, and small.

Maybe independent authors can start publishing their own AudioPUBs. It's a non-proprietary format, which means we don't have to worry about whether something is, you know, in the Kindle universe, or the Kobo universe, or anything like that.

It's the same file. Everyone should be accepting the same file. And it's a format that readers can interact with directly or consume via services like Kobo or Audible. You can listen to it on your computer. You can listen to it on your phone.

You get the choice about where you want to listen to your content, much like you can with EPUB today in certain ways. And it's easily distributable, so accessibility organizations can send files easily, knowing that they're trustworthy and that they're contained in a way that their users will be able to enjoy and have full access to. And it's simplification. There's a complicated ecosystem out there, and if everyone's working off the same template, it makes things a lot easier.

We value the experience of all parts of the supply chain. When we talk about specs, and when we do this work, we're not just talking about readers. We're not just talking about publishers. We're not just talking about the distributors. It's very important to us that content creators are comfortable with what we're creating.

We know we are generally asking them to do something new and scary. But at the end of the day, it's always about the reader. Sometimes it doesn't feel that way, especially when you're getting really deep in the weeds of a discussion about duration. We spent two meetings talking about duration, because we couldn't decide what format we wanted to put it in. There was a pretty strong discussion, because we use a vocabulary called Schema.org.

And Schema.org had defined duration for a different use. They used it on the audiobook type, but they had defined it for something else: a résumé. So the duration was the duration of employment.

So a duration value would have the date and the time. Does anyone remember what time of day they started their last job? I don't know why time was in there. But they were like, "Yeah, this is the standard for duration: this day to this day, this time to this time." And you didn't technically have to use the date, but it would get kind of buggy if you did.

So we looked at it, and we argued and argued. We were like, "Guys, Schema.org is a little bit wrong about this." And we actually ended up talking to the people who run Schema.org to say, "Hey, you know, this is a really bad way to express duration." And they were like, "Oh, you can just use seconds if you want." And it was like, "Okay, problem solved." But it doesn't always feel like that discussion would actually help a real person, until you really drill down into the details. By making sure that duration is properly specified, at both the book level and the chapter level, a reading system can expect to receive that data and use it to show chapter durations in the table of contents. It can calculate download sizes for single chapters.
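A small sketch of what chapter-level durations make possible, assuming durations are stored as plain seconds (the option the Schema.org maintainers suggested) and that each audio file has a constant bitrate. Schema.org's `Duration` type is documented as an ISO 8601 duration, so `iso8601_duration` shows how seconds can be rendered in that form when a consumer expects it. The function names and the constant-bitrate size estimate are illustrative assumptions; real files also carry container and tag overhead.

```python
def iso8601_duration(seconds: int) -> str:
    """Render whole seconds as an ISO 8601 duration, e.g. 1432 -> 'PT23M52S'."""
    if seconds < 0:
        raise ValueError("duration cannot be negative")
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    parts = "".join(
        f"{value}{unit}"
        for value, unit in ((hours, "H"), (minutes, "M"), (secs, "S"))
        if value
    )
    return "PT" + (parts or "0S")


def chapter_durations(chapter_starts: list[float], total_seconds: float) -> list[float]:
    """Length of each chapter from its start time on the whole-book timeline."""
    ends = list(chapter_starts[1:]) + [total_seconds]
    return [end - start for start, end in zip(chapter_starts, ends)]


def estimated_download_bytes(duration_seconds: float, bitrate_kbps: int) -> int:
    """Rough size of a constant-bitrate file: kbit/s * 1000 / 8 bytes per second."""
    return round(duration_seconds * bitrate_kbps * 1000 / 8)
```

At 128 kbps, for instance, the 23-minute-52-second chapter mentioned earlier in the talk comes to roughly 22.9 MB, a figure a reading system can show before the listener starts the download.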

There are many ways in which the little decisions or the big decisions that we make can actually affect every part of the system. And specs can feel... Should have brought water. They can feel big. They can feel scary. They can feel complicated. But the specificity and precision that we apply to every decision we make means that expectations are clear for publishers, they're clear for reading systems, and they're clear for the reader. The challenges we face today feel really tough.

I have really hard conversations with a lot of publishers about, "Oh, things are fine. Why do you want to change stuff?" But I'd like to think that we're on the cusp of a new era. I'd like to think that today, much like EPUB 20 years ago, we're just kind of there. And that 5 years from now, 10 years from now, 20 years from now, this is not gonna feel any different. The talks that we're having will be Dave's rant about what we did wrong in Audiobooks 1.0.

And maybe 20 years from now there will be an Audiobooks 2.0; that's what I'm hoping for, and we can rant about that later. So I want to thank everyone for listening. Thank you so much.

And I hope this was really helpful and informative for everyone, and inspires people to get involved. We are always looking for feedback. Contact me by email or Twitter; I'll have the links in the presentation. If you work in audiobooks, or actually in podcasting, let me know, because we are looking to make contacts in both of those industries.

Specifically for audiobooks, we want your feedback. We want to know how we can potentially make your workflow easier. And if you're in podcasting, we kind of think it might work for podcasts, but we don't know, because we haven't talked to anyone yet.

And if you're really, really, really interested and you want to invest the time, be really cool, and do standards stuff, consider joining the W3C Publishing Working Group. It is entirely worth it, and it is a lot of fun. And thank you so much.

Source: Youtube
