Question

Lack of OCR in Starter?

February 7, 2023
3 replies
429 views

JoanCurbow
Known Participant

I’m saving our press releases that are emailed to staff and faculty with a link. The link takes you to a webpage where the release is posted. I can use Conifer to record the webpage and ingest it into Preservica, but for the best discoverability, it would be nice if these articles could be OCR’d. It doesn’t seem like that’s a feature in Starter, which I understand. How are other Starter users compensating for this?

My own method so far: if there’s a photo (which is most of the time), I make subject headings for the people identified in it. I had an article where the photo only named one person, but the article named seven or eight others. I decided to list those names in the Description field of my DC metadata. Don’t ask me why. I’m trying different approaches.

Am I missing something obvious or do you have better methods?
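
For anyone who wants to try the same workaround in bulk, here is a minimal sketch of the approach described above: pictured people go into subject headings, and the other people named in the article go into Description. It builds a simple Dublin Core record with Python's standard library. The function name `build_dc_record`, the "People named in the article" label, and the choice of the `oai_dc` wrapper element are all assumptions for illustration; check which DC schema your own Starter setup expects before relying on it.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core element namespace and the common OAI-DC wrapper namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"


def build_dc_record(title, pictured, also_named):
    """Return a Dublin Core XML string (hypothetical helper, not a Preservica API).

    pictured   -> one dc:subject per person identified in the photo
    also_named -> a single dc:description listing people named only in the text
    """
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("oai_dc", OAI_DC_NS)

    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    ET.SubElement(root, f"{{{DC_NS}}}title").text = title

    for name in pictured:
        ET.SubElement(root, f"{{{DC_NS}}}subject").text = name

    if also_named:
        # Assumed label; any consistent wording works as long as the names are present.
        description = "People named in the article: " + "; ".join(also_named)
        ET.SubElement(root, f"{{{DC_NS}}}description").text = description

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_dc_record("Sample press release", ["Person A"], ["Person B", "Person C"]))
```

Keeping the names in one consistently worded Description line means a keyword search on the metadata can still find the article even when the page capture itself is not OCR'd.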

Harland Harris
New Participant
February 9, 2023

I don't have the answers for you, but this is a good question. Does Preservica Starter search live text (i.e., non-rasterized and non-vectored) embedded inside a PDF? If it does, you could use another tool like ABBYY FineReader to OCR the text before uploading to PS. Better yet, can you request the source file and convert it into a searchable PDF/A prior to upload?

alarge
Known Participant
March 14, 2023

I know that the Enterprise edition does have OCR capabilities. I wonder if there are any plans to implement this function in Starter/Starter Plus...

Aubrey Shanahan
Inspiring
March 16, 2023

Text-based PDFs and Word documents should be full-text searchable. Sometimes it takes a while (up to a few days) for indexing to complete, but it should work. For instance, I recently uploaded a PDF created from a Word doc, and after a couple of days it was showing up in my search results. Otherwise, OCR is not included until Enterprise, as Ashley mentioned here, but there are free OCR solutions that you can use prior to ingesting into Starter.

Here’s an article from TechRadar on OCR solutions (free options are listed at the end).

Hope this is helpful!
