Culling and Filtering

Litgistix supports law firms by providing eDiscovery solutions that help reduce the costs associated with the collection, processing, review, and production of electronically stored information (ESI). One way to reduce the cost of reviewing ESI is to cull the data set before attorney review. Litgistix works with law firms to set up criteria that reduce downstream eDiscovery costs by setting aside irrelevant documents.

At the earliest stages of eDiscovery, data culling begins by identifying sources of data, key custodians, and key date ranges, and by establishing a plan for managing the eDiscovery process. Litgistix uses industry-standard tools and techniques to cull data down to a manageable size. Common data culling techniques include de-NISTing, deduplication, date/metadata filtering, and keyword searching.

De-NISTing is a fairly new concept and refers to the removal of operating system files, program files, and other non-user-created data. The National Institute of Standards and Technology (NIST) is a federal agency that maintains a list of known software signatures, and our processing software can screen the files in a data set against that list.
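As a rough illustration of the screening step described above, the sketch below hashes each file and sets aside any file whose hash appears in a set of known software signatures. It is a minimal sketch, not the processing software's actual method: the function names, the in-memory `files` dictionary, and the tiny `known_hashes` set (a stand-in for the NIST list) are all assumptions made for the example.

```python
import hashlib


def sha1_hex(content: bytes) -> str:
    """Return the SHA-1 hash of the content as an uppercase hex string."""
    return hashlib.sha1(content).hexdigest().upper()


def de_nist(files, known_hashes):
    """Split files into (kept, set_aside) by screening against known hashes.

    files: dict mapping a file name to its raw bytes (hypothetical input shape).
    known_hashes: set of uppercase SHA-1 hex strings for known software files.
    """
    kept, set_aside = {}, {}
    for name, content in files.items():
        # A file whose hash is on the known list is treated as non-user data.
        target = set_aside if sha1_hex(content) in known_hashes else kept
        target[name] = content
    return kept, set_aside


# Hypothetical sample data: one "system" file and one user-created file.
known_hashes = {sha1_hex(b"operating system library")}
files = {
    "system.dll": b"operating system library",
    "memo.txt": b"Draft settlement memo",
}

kept, set_aside = de_nist(files, known_hashes)
print(sorted(kept))       # ['memo.txt']
print(sorted(set_aside))  # ['system.dll']
```

In practice the known list would be loaded from a published hash set rather than built by hand, and files would be read from disk in chunks instead of held in memory.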

Deduplication is the process of segregating exact duplicates in a data set. Using the hash value (MD5 and/or SHA-1) that is created when the electronic data is processed and indexed, Litgistix can eliminate duplicate email messages and duplicate copies of electronic documents. Deduplication techniques can differ from case to case. For instance, deduplication can be applied within a single custodian’s data or across the complete data set. The methodology and scope of deduplication are specific to each project and should be weighed against the ultimate goal of the review team.
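The difference between the two scopes can be sketched in a few lines. The example below is an assumption-laden illustration rather than a description of any particular processing tool: the `deduplicate` function, its `scope` parameter, and the document dictionaries are hypothetical, and SHA-1 is used where a real workflow might use MD5, SHA-1, or both.

```python
import hashlib


def deduplicate(documents, scope="global"):
    """Separate exact duplicates from unique documents.

    documents: list of dicts with 'custodian', 'name', and 'content' (bytes).
    scope: "global" compares hashes across the whole data set;
           "custodian" only compares documents held by the same custodian.
    """
    seen = set()
    unique, duplicates = [], []
    for doc in documents:
        digest = hashlib.sha1(doc["content"]).hexdigest()
        # The key decides what counts as "the same document" for this scope.
        key = digest if scope == "global" else (doc["custodian"], digest)
        if key in seen:
            duplicates.append(doc)
        else:
            seen.add(key)
            unique.append(doc)
    return unique, duplicates


# Hypothetical sample data: the same memo held three times by two custodians.
documents = [
    {"custodian": "alice", "name": "memo.docx", "content": b"Q3 forecast"},
    {"custodian": "bob", "name": "memo-copy.docx", "content": b"Q3 forecast"},
    {"custodian": "alice", "name": "memo2.docx", "content": b"Q3 forecast"},
    {"custodian": "bob", "name": "notes.txt", "content": b"call notes"},
]

global_unique, global_dupes = deduplicate(documents, scope="global")
custodian_unique, custodian_dupes = deduplicate(documents, scope="custodian")
print(len(global_unique), len(global_dupes))        # 2 2
print(len(custodian_unique), len(custodian_dupes))  # 3 1
```

Global deduplication removes more documents from review, while custodian-level deduplication preserves the fact that each custodian held a copy, which is why the choice depends on the goals of the review team.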

Metadata filtering and date filtering are common data culling techniques. For email, metadata filtering includes searching for names of interest, email addresses, domain names, or keywords across the Sender, Recipient, To, From, Cc, and Bcc fields. Litgistix can also run searches across key index fields such as subject, email body, and attachments to narrow the scope of review. Date filtering of emails can be applied before, after, or between specified dates, based on the date last modified, date sent, or date received. Searching can be done on both email and other electronic data. Metadata and data content can be indexed in advance of searching. Searches can use many different parameters, which may be applied alone or in combination with one another:

  • Keyword searching
  • Phrase searching
  • Boolean searching, which allows you to combine words and phrases using the words AND, OR, NOT and NEAR (otherwise known as Boolean operators) to limit, widen, or otherwise define your search
  • Proximity searching finds a word or phrase within “x” words of another word or phrase, for example: blueberry pie w/100 carrot cake
  • Directed proximity searching finds a word or phrase “x” words before/after another word or phrase: blueberry pie pre/100 carrot cake
  • Phonic searching finds words that sound alike, like “Smythe” in a search for “Smith”
  • Stemming finds variations on endings like “applies”, “applied”, “applying” in a search for “apply”
  • Numeric searching finds any number between two numbers, such as between 5 and 95
  • Wildcard searching support allows a question mark (“?”) to hold a single letter place, and an asterisk (“*”) to hold multiple letter places: “apple*” and not “appl?sauce”
  • Fuzzy searching

Streamlining your review using the tools and techniques discussed in this article can greatly reduce the costs associated with eDiscovery in your case.
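To make the proximity and directed proximity searches listed above more concrete, here is a minimal sketch of how a w/N or pre/N test might work on plain text. It is not how any specific search engine implements these operators: the `within` function and its helpers are hypothetical, and the example assumes that distance means the number of words between the end of one phrase and the start of the other, which real tools may count differently.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def phrase_positions(tokens, phrase):
    """Return every token index at which the phrase starts."""
    words = tokenize(phrase)
    if not words:
        return []
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def within(text, left, right, distance, directed=False):
    """True if `left` occurs within `distance` words of `right`.

    With directed=True, `left` must come before `right` (like pre/N);
    otherwise either order counts (like w/N).
    """
    tokens = tokenize(text)
    left_len, right_len = len(tokenize(left)), len(tokenize(right))
    for i in phrase_positions(tokens, left):
        for j in phrase_positions(tokens, right):
            if j >= i + left_len:                        # left comes first
                gap = j - (i + left_len)
            elif i >= j + right_len and not directed:    # right comes first
                gap = i - (j + right_len)
            else:
                continue  # overlapping phrases, or wrong order when directed
            if gap <= distance:
                return True
    return False


text = "She ordered blueberry pie and, a little later, a slice of carrot cake."
print(within(text, "blueberry pie", "carrot cake", 100))                 # True
print(within(text, "carrot cake", "blueberry pie", 100, directed=True))  # False
print(within(text, "blueberry pie", "carrot cake", 5))                   # False
```

The second call fails because “carrot cake” never appears before “blueberry pie” in the sample sentence, and the third fails because seven words separate the two phrases.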

Litgistix continues to lead as a trusted advisor to law firms in Oklahoma. Call Litgistix today to set up a meeting to discuss a document management strategy for your case.