Hello,

How accurate does an index need to be? Repeating this out loud to myself, it sounds like a trick question. The value of an index is tied, in part, to its accuracy, right? Of course an index needs to be accurate.

Considered another way, I don’t expect a human-written index to be 100% accurate. Perhaps instead of “accurate,” it is better to say “correct.” Humans are fallible, including myself. I know, or at least suspect, that errors make their way into the indexes that I write. Clients will occasionally point out the odd misspelling or omission that they spot during their review, which I apologize for and gladly correct. But I still strive for the indexes I submit to be as accurate as possible. I believe that my editing process is thorough, and I am confident that I catch most errors before indexes ship. So is the accuracy of my indexes good enough?

I have a theory that if a reader finds more than 3-4 errors or barriers in an index, then the reader will lose trust in the index and will stop using it. I don’t actually know if this is true; I haven’t scientifically tested it. And a reader may simply be unlucky enough to hit upon the handful of errors that do exist, even if the index overall is quite good. But I think there is something true about people not using tools if they lack trust in the tool’s usefulness. An index needs to prove itself to be trustworthy, and part of that is being accurate.

Before I go further, what do I mean by accuracy? A subject index, after all, has an inherently subjective aspect in the decisions about what to pick up, how to phrase headings and subheadings, how to structure the index, and how best to fit the space available. How can accuracy even be measured?

To start, as a baseline, locators need to match the text. For a subject index, the broad sweep of the book needs to be recognizable in the index. How exactly the contents of the book are translated into the index will vary from indexer to indexer, and will also depend on constraints such as space, but generally speaking, the reader should get an accurate sense of the book from the index. For other types of indexes, such as a name or scripture index, the general rule is that every mention is picked up, and so in that context, accuracy also encompasses thoroughness.

If a reader can trust that an index is accurate in these senses, then I believe they are more likely to use the index. If a reader loses trust because they are finding errors or omissions, my theory is that they will stop using the index, and the index is now wasted words on the page.

I’ve been thinking about all of this because I was contacted last week by a new company, IndexLabs, which is developing AI indexing tools. IndexLabs has already released a tool specifically for scripture indexes and appears to also be developing a tool for subject indexes.

For scripture indexes, IndexLabs claims an accuracy rate of 99.1%. I don’t know if that number is based on a single index (the John Piper index they showcase on their website) or if it reflects multiple tests. I’ve asked and haven’t yet received a reply. But it sounds impressive: 99.1% accuracy compared to the original, human-written index. One of the criticisms of AI indexing tools, from Elizabeth Bartmess and others who have tested such tools, is that these tools are neither accurate nor thorough. Has IndexLabs managed to crack that problem?

My concern, though, is that an accuracy rate also implies an error rate.
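To make that implication concrete, here is a quick back-of-the-envelope sketch in Python. The reference counts are the illustrative figures I walk through below, not data from IndexLabs; the 0.9% error rate is simply what a 99.1% accuracy claim leaves over, assuming errors are spread evenly across entries.

```python
# Back-of-the-envelope: expected incorrect entries at a given error rate.
# A 99.1% accuracy claim implies a 0.9% error rate (1 - 0.991).
error_rate = 0.009

# Hypothetical index sizes, from a short scripture index to a large one.
for references in (300, 600, 1500):
    expected_errors = references * error_rate
    print(f"{references:>5} references -> ~{expected_errors:.1f} incorrect entries")

# Output:
#   300 references -> ~2.7 incorrect entries
#   600 references -> ~5.4 incorrect entries
#  1500 references -> ~13.5 incorrect entries
```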
For a short index, with about 300 scripture references, a 0.9% error rate works out to about 2-3 incorrect entries. For 600 references, about 5-6 incorrect entries. For 1,500 references, about 13-14 incorrect entries. And so on, as the number of scripture references increases.

Personally, I am not comfortable with there being 13-14 incorrect entries, or more, even for a large book and index. I would expect myself to do better. If I subcontracted, I would expect the subcontractor to do better. Not 100%, given that we are human, but better. Humans have the potential to recognize their mistakes and to build in ways to double-check and correct. When I write a scripture index, or any index, I frequently stop, a few seconds at a time, to confirm that locators and headings are accurate.

So far as I know, AI finds and decides based on an algorithm and probability. It does not review its own work. If an AI tool has a discernible error rate, then how much error is acceptable? Am I supposed to still approve the work? Are the author and publisher supposed to accept that a certain percentage of the index is likely inaccurate? Or do I try to edit the output to find and correct errors? Editing someone else’s index can be very tedious and time-consuming, especially if I don’t trust the work and feel compelled to check every entry. Editing the work could negate the time savings.

My purpose in writing this isn’t to beat up on IndexLabs. To be clear, I have not tried their tool, and I have no opinion on how accurate or useful it actually is. I also acknowledge that I have, in the past, expressed interest in an AI tool that picks up specific, easily identifiable entries, such as scripture references, which can be time-consuming to collect. I suspect that is why IndexLabs contacted me. While I remain skeptical about AI’s ability to index, and I believe that any AI tool should be thoroughly tested, I also think it is important to keep a somewhat open mind and to be aware of new developments.

I am concerned, though, when the completeness or readiness for publication of an index is framed in terms of its accuracy. On the one hand, indexes absolutely need to be accurate. On the other hand, if accuracy becomes something that can be easily measured and quantified, does that normalize a certain amount of error?

Being human, I know that no index is perfect, including the indexes I write. I’ve made my peace with that. I also still think that humans are best positioned to write accurate, trustworthy indexes. Humans best understand the end user and the end goal of the index, and humans can learn how to find their own errors and course correct. I don’t want to live in a world, or work in an industry, in which error and substandard quality are accepted simply because the tool is cheap, convenient, and fast. We can do better, as a society and as an industry. Humans can do better.

Yours in indexing,
Stephen
2x award-winning book indexer and author of Book Indexing: A Step-by-Step Guide. I teach you how to write excellent indexes and share reflections on succeeding as a freelance indexer.