By Thomas Bonk
On August 1, 1981, the very first music video to air on the new, smash-hit channel MTV was Video Killed the Radio Star. eDiscovery practitioners often suggest that the rise of AI portends the same demise for keyword search that The Buggles foretold for radio-only musicians. However, despite persistent rumors of keyword obsolescence, the practice remains a standard approach that legal teams use to cull non-responsive content and identify a narrowed set of relevant data.
To help understand the staying power of using keyword search, I interviewed Kate O’Brien, Managing Director and search expert at Prism Litigation Technology. Kate’s decades of experience working with legal teams to design a comprehensive, narrative approach to search provided a wealth of information on how the practice can maximize efficiency and create a defensible data minimization strategy.
Q&A with Kate O’Brien
Q. Tell me about your unique background in Information and Library Sciences, and how you’ve applied those skills working with legal teams to create search strategies for eDiscovery matters.
A. I have a Master’s degree in Library and Information Science from Syracuse University, where I started my career as a Research Librarian and spent many years working for large consulting firms as a researcher. Through this experience, I gained a deep understanding of the nuances of identifying important information, separating it from the noise, and locating it within a massive amount of data. This expertise translated easily to eDiscovery use cases. Even though my initial job in eDiscovery was with one of the first eDiscovery vendors to offer conceptual search algorithms, I’ve found that the more basic narrative analysis approach I’ve developed over the years serves our clients more effectively. The methodology is founded on information retrieval principles combined with linguistic analysis and data mining techniques, and it provides not only better results but also greater control, transparency, and defensibility to case teams.
Q. In your experience, what are the key steps that legal teams should take to improve their outcomes when executing a search strategy?
A. A good search strategy requires a thoughtful approach. Throwing a bunch of keywords at the data set to see what sticks just doesn’t cut it. Here are a few key pointers for planning an effective strategy:
- It’s important to start with an understanding of the end goal of the exercise. Search strategies for production are likely to be different than strategies used for examining an incoming production, identifying privileged information, or preparing for a deposition.
- Get a full understanding of the claims and defenses by reading the case materials and talking to the lead attorney. Understanding their narrative and the story they want to build with the documents is critical.
- Address each issue of the case with at least one search. It may be multiple searches, but this ensures thoroughness.
- Understand the language of the organization. Every company develops an internal nomenclature for certain processes, events, or products, including its own jargon and frequently used acronyms. You need to fully understand how this unique language is used. Interview individuals in the enterprise to expand your knowledge of the language of the organization and the data that you are searching.
Q. What are some common gotchas that you address when refining a search term list?
A. Not everyone can be a search expert. I’ve been doing this for 35+ years and have been involved in building search tools, so I’ve seen it all. Typical problems that I observe include:
- The use of wildcards. If overused, wildcards can pull in unrelated documents; if underutilized, they can cause you to miss relevant documents.
- Misuse of proximity operators. When choosing a proximity value, think about how the narrative reads in the documents: what are the context and relationship of the terms you are searching for? For instance, I rarely use the AND operator unless it’s combining two very specific phrases.
- Lack of adjustments for data types. Search syntax for emails should be different from search syntax for texts or chats from a mobile device. Again, think about the narratives in the documents, how they read, and adjust accordingly.
- Lack of understanding and adjustment for stop words. Depending on the search engine being used, some common terms are stop words, meaning they are not indexed by the search engine, and therefore, won’t show up in a search. In a recent case, the team wanted to search for “Made in America,” but the terms “made” and “in” are stop words in the review platform, so that search returned only documents with the word “America.” Most eDiscovery tools allow for an adjustment to the underlying index to add and revise the stop word listing, but most users are unaware of this.
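To make the stop word problem above concrete, here is a minimal Python sketch. It is not how any particular review platform works: the stop word list, the `tokenize`, `indexed_terms`, and `matches` helpers, and the two sample documents are all hypothetical, and matching is simplified to "every indexed query term appears somewhere in the document" rather than a true phrase match.

```python
import re

# Hypothetical stop word list; real platforms ship their own defaults.
STOP_WORDS = {"made", "in", "the", "a", "of"}

def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())

def indexed_terms(text, stop_words):
    """Return the tokens that survive stop word removal, as an index would store them."""
    return [t for t in tokenize(text) if t not in stop_words]

def matches(query, document, stop_words):
    """A document matches if it contains every indexed query term (order ignored)."""
    query_terms = indexed_terms(query, stop_words)
    doc_terms = set(indexed_terms(document, stop_words))
    return bool(query_terms) and all(t in doc_terms for t in query_terms)

# Illustrative documents, invented for this sketch.
docs = [
    "This product is proudly Made in America.",
    "We shipped the order to South America last week.",
]

# With the default list, the query collapses to the single term "america",
# so both documents are returned.
default_hits = [d for d in docs if matches("Made in America", d, STOP_WORDS)]

# After removing "made" and "in" from the stop list, the query is selective again.
adjusted_stop_words = STOP_WORDS - {"made", "in"}
adjusted_hits = [d for d in docs if matches("Made in America", d, adjusted_stop_words)]

print(len(default_hits), len(adjusted_hits))
```

In this toy setup the default configuration returns both documents, while the adjusted stop list returns only the one that actually says "Made in America."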
Q. How does the selected technology platform, whether it’s commonly used eDiscovery applications like Relativity or enterprise systems like Microsoft Purview, impact the results of a keyword search?
A. Search engines are all built differently, so you need to understand not only each engine’s proper search syntax and order of operations but also, as mentioned above, its internal configuration, which can affect stop words and even how it counts tokens (the words). Token counting, in turn, can affect the proximity value you use when combining phrases.
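The point about token counting can be illustrated with a small Python sketch. It assumes two hypothetical engines: one that counts every word when measuring distance, and one that drops stop words before counting. The `within` helper, the stop word set, and the sample sentence are invented for illustration and do not reflect the syntax or behavior of any named product.

```python
import re

def tokenize(text, drop=frozenset()):
    """Lowercase, split on non-alphanumerics, and discard any tokens in `drop`."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in drop]

def within(text, term_a, term_b, distance, drop=frozenset()):
    """True if term_a and term_b occur within `distance` token positions of each other."""
    tokens = tokenize(text, drop)
    pos_a = [i for i, t in enumerate(tokens) if t == term_a]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

text = "The contract was signed by the vendor in March."

# Engine A (hypothetical): every word counts, so "contract" and "vendor"
# sit 5 positions apart and a within-3 search misses the sentence.
counts_all_words = within(text, "contract", "vendor", 3)

# Engine B (hypothetical): stop words are dropped before counting, so the
# same two terms sit 2 positions apart and the within-3 search hits.
stop_words = frozenset({"the", "was", "by", "in"})
skips_stop_words = within(text, "contract", "vendor", 3, drop=stop_words)

print(counts_all_words, skips_stop_words)
```

The same proximity value gives different answers on the same sentence, which is why it helps to confirm how a given platform counts tokens before settling on a distance.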
Interviewer note: A recent Prism blog post discusses the drawbacks of Microsoft Purview, as highlighted by Phil Favro in his decision as appointed Special Master in Deal Genius LLC v. O2Cool LLC.
Q. Finally, with the emergence of applications using AI and other sophisticated search technology, is the use of keyword search for eDiscovery likely to be diminished or possibly displaced entirely?
A. My first assignment in the legal industry was working for a provider that formally introduced concept search to eDiscovery and document review. At the time of its release, the technology was new and novel, but the black box aspect of it concerned attorneys, and rightfully so. Since that time, simple concept search has grown and expanded with the introduction of TAR and CAL and all sorts of tools using some form of advanced analytics and AI. These are great tools to help case teams organize large volumes of data, meet accelerated production deadlines, and prioritize tranches of related data.
Personally, as a search expert, I think the idea that the surgical precision and transparency that can be obtained with a well-thought-out keyword search strategy will be replaced by AI is short-sighted and unlikely. These are tools that can and should be used in conjunction with each other. Each tool has its place as part of a comprehensive approach, depending on the goal and objectives of the case team. I don’t think a thoughtfully constructed keyword search that gives reliable and repeatable results will ever become obsolete.
Interviewer note: For more information on Prism Litigation Technology’s unique approach to search, please read the white paper, Don’t Stop Believin’; The Staying Power of Search Term Optimization.