Amazon Needs Some Catalogers

Usually I like Amazon. They do a lot of things well. Delivery is fast. Customer service is usually good. I save a lot of money in shipping with the Prime account, and the Prime video is a good supplement to Netflix. Plus, I don’t have a neighborhood bookstore for them to drive out of business, so I don’t have to feel guilty about that, either. However, considering that they started out in the book-selling business, and have been pretty good at it by all accounts, you’d think they would make it easier to find the exact book you want when you’re looking for it. Amazon sometimes has the devil of a time distinguishing between different expressions and manifestations of the same work, especially translations.

Here’s an example. After reading A Guide to the Good Life: the Ancient Art of Stoic Joy (definitely recommended) I wanted to read Epictetus’ Discourses, and the Stanford Encyclopedia of Philosophy entry on Epictetus recommends the Robin Hard translation from Everyman’s Library, i.e., this translation. So far, so good. I considered a library copy, but my library doesn’t own that translation (since ordered). But I wanted my own copy and it’s inexpensive. However, Amazon says it’s temporarily out of stock. Okay, fine, I could wait, except I don’t need to because supposedly it’s available in “other formats,” including hardcover. That’s great. I love Everyman’s Library hardbacks because they’re well made.

So I click on over to the used hardcovers, relishing the first sale doctrine and the money it’s about to save me. Had I not been paying attention, weirdbooks would have been a few dollars richer because they advertise the lowest-priced “used–very good” copy and that’s what I usually buy. Fortunately, I glanced at the picture of the book at the top of the page, and knew that whatever that green book was, it wasn’t an Everyman’s Library edition. The title says “Heritage Press” edition. If collecting old translations of classics hardbound in slipcovers is your thing, then the Heritage Press is the publisher for you. Truth be told, that green volume would probably go well with the sofa in my den, so it was tempting. Regardless, I knew at a glance that it couldn’t be the translation from 1995.

From that page, you can click on “return to product information.” I clicked on it, but returned nowhere. Instead I was taken to the product information for the Heritage Press edition, which lists the translator as P.E. Matheson. Unless Robin Hard was using a pseudonym, or unless P.E. Matheson also translates under his porn-star name, those are probably not the same people, and thus not the same translations. And it gets worse! One of the reviews on the page of the Heritage Edition reads: “I read A. A. Long’s, “Epictetus: A Stoic and Socratic Guide to Life” (2002, also rated five stars). Long wrote that the best translation was by Robin Hard (this edition).” But obviously we’re not on the page for that edition. There are a couple of reviewers skewering the translation and copy editing, but that’s true on the Everyman’s Library edition as well, because the reviews are identical. The same reviews are also on this edition, which is obviously a public domain reprint with no translator even listed. But they’re missing from this edition, this edition, this edition, this edition, and this edition, despite them all bearing the same generic title The Discourses of Epictetus. What gives?

I’ve found the same thing throughout the Kindle store as well, especially because for just about any classic work there are several “publishers” hoping to make a few bucks by copying text from Project Gutenberg, converting it to a .mobi file, and uploading it to Amazon, and the results are all lumped together. All of which leads me to conclude that Amazon either needs to improve their algorithms or hire some catalogers. I’d go with the latter, because technology can only take us so far without some human intervention.

6 thoughts on “Amazon Needs Some Catalogers”

  1. I find that unless a book has an ISBN number it is nearly impossible to track down. Older stuff you are looking for used can be hit-or-miss on whether you actually get the right edition of something. Amazon probably makes so little on antique stuff like that. They are very sloppy with records, and I don’t like that some of the user reviews are the same, whether the book is audio, print, or ebook. Formats do make a difference (especially when it’s a narrator).

  2. The unfortunate thing is I know that some people, and not just novice students, use Amazon to find books for research, and the mishmash of records could be confusing. Not that 6 records for the same book in WorldCat isn’t confusing as well, which is how many exist for the Everyman Epictetus.

  3. There are a couple of things you haven’t considered. The first problem is that Amazon relies largely on data provided to it by vendors, merchants, and, yes, library catalogs. These data are riddled with errors. Don’t comfort yourself by claiming that the libraries are getting it right: often, their data is as bad as anyone else’s. I have seen it firsthand.

    Secondly: think about the Amazon catalog. Think about how big it is. How many millions of items. By now, it may be in the billions. All that data, loaded in, and presented to you, the customer. Now, think about your cataloging department. How big is the cataloging backlog? A hundred books? A thousand? What’s the latency between when a new book arrives and when it’s on the shelf? And what’s standing in the way? Original cataloging, or picking a record from the LOC and copying it into your catalog?

    The latency at Amazon is generally a few minutes. That’s the time it takes for the item record to be read by a server, compared to the items in the catalog, and reconciled with any existing data. It’s reviewed periodically after that both by people and by automated systems.

    What I’m trying to say is that there are very few operations that are capable of organizing as much data as quickly as Amazon can. I would argue that few if any libraries can do it as well on that scale.

    Now, you might say that that’s not what you want – that you’d prefer to have your own data carefully looked over. But that means that every niche needs to be filled by someone who can carefully shepherd each book through. I would argue that that’s not going to happen in every case, and there will still be a “long tail” of books that have bad data, or are simply unavailable. Is that better? I would argue that the answer is clearly no.

    So, you’re right. Any big catalog is full of mistakes. There are tons. Congratulations, you found one. Treat yourself to a cookie. But don’t assume you can do it better without thinking carefully about what that really means.

    • Oh man, I’m a big mean grouch. I owe you an apology for swinging hard when a civil counterargument would have sufficed. Having seen the issue from both sides, I can attest that the problems are difficult to address no matter how you go about it. Computer algorithms rely on heuristics and statistical methods that do their best, but are still often (and often hilariously) wrong. People, on the other hand, can resolve ambiguities and put care and love into solving problems that no machine ever will (or so I hope!). But people can never move as fast or operate at such a high precision.

      You have my apology! And I hope you really do get to have a cookie.

  4. Well, you did come on just a little strong, but no hard feelings. I was just messing with you in my last comment. You’re definitely right, and I knew that going in. I sometimes enjoy poking at the margins of giant services like Google or Amazon to show that they’re not always better than human intervention. The kind of thing I wrote about here I’ve encountered quite a bit in Amazon. It’s just that this time I was amused at just how clumsy it was and felt like doing a little writing. I almost compared–and in retrospect I should have–this to WorldCat’s results, where instead of two books being conflated because of some algorithmic mischief, there are five or so records for the Everyman’s edition of Epictetus, creating the opposite problem. And that’s a system that relies much more heavily on human intervention. The precision of the records and the human cataloging means that usually subject headings or ToC aren’t misattributed to another book, but it can be confusing, and I’ve done searches with students where we find dozens of records and it turns out there are only about three books. At this point systems like Amazon are so big it’s a wonder they work so well most of the time.
