Local computer path in Search Result Snippets

pseudo visitor
Participant
5 years, 5 months ago #8350
We have uploaded quite a few old newspapers to our website and indexed them with WPSOLR. When we search for them (the search term was simply 1930, since all of the papers uploaded so far are from that year), the following, or something similar, shows up in the snippet of each paper.
These papers were printed in the Cyrillic alphabet, so we assume that is why a local computer path ends up as the snippet and the snippet basically makes no sense. We want to know whether it is possible to configure the snippet in a way that removes the highlighted yellow portion you see above. Basically, could we eliminate the snippet only for the papers that are printed in Cyrillic, while still keeping the snippets for those that are in English? As far as I have seen with our current subscription, we can only control the number of characters in the snippet, which we assume would apply to every snippet, not to specific ones.
Alternatively, is there a way to configure the WPSOLR search engine to recognize other languages (e.g. Serbian Cyrillic or Serbian Latin)?
wpsolr
Keymaster
5 years, 5 months ago #8351
If you click on the link in the results, can you see the PDF content?
pseudo visitor
Participant
5 years, 5 months ago #8352
If I click on the name of the file, yes. We managed to fix that error we had been having last time we chatted, so we can see all the pdfs now.
We aren’t really sure how WPSOLR decides what ends up in the snippet, because sometimes metadata shows up there as well.
wpsolr
Keymaster
5 years, 5 months ago #8353
Indeed, metadata is indexed along with the file content.
WPSOLR replaces the output of get_the_excerpt() in your theme’s search loop. You can display something else there to avoid showing metadata.
pseudo visitor
Participant
5 years, 5 months ago #8354
If I search for similar words, the same paper shows up in the results, but one result has a complete snippet and the other doesn’t. Why would this happen?
wpsolr
Keymaster
5 years, 5 months ago #8355
When I open your paper, it seems to contain a scanned image rather than text that can be indexed.
pseudo visitor
Participant
5 years, 5 months ago #8356
When it comes to the Cyrillic ones (the ones from 1930), they are just images. But all those in English contain text that can be indexed, copied, pasted, etc…
Is there a way for us to integrate these Cyrillic papers into the archive without those strange-looking snippets?
wpsolr
Keymaster
5 years, 5 months ago #8357
If the Cyrillic PDFs contain only scanned images, then their metadata is the only text that gets indexed, which explains why metadata is so prominent in their result excerpts.
pseudo visitor
Participant
5 years, 5 months ago #8358
Makes sense to me as well. Basically, we are hoping there would be a way to have these things remain searchable without indexing them, or at the very least removing those snippets to avoid any confusion for users on the site.
wpsolr
Keymaster
5 years, 5 months ago #8359
In your theme’s search.php, replace the_excerpt() with anything you like.
pseudo visitor
Participant
5 years, 5 months ago #8360
I assume that whatever we change the_excerpt() to will become the default for every entry in the search results, correct?
wpsolr
Keymaster
5 years, 5 months ago #8361
You can adapt your entry output to the entry itself. For instance, replace the_excerpt() only for attachments of type pdf.
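A minimal sketch of what that suggestion could look like in a theme's search.php, assuming PDF search results appear in the loop as attachment posts with the MIME type application/pdf. The helper name mytheme_is_pdf_attachment() and the markup are hypothetical, not part of WPSOLR or WordPress core; the other functions are standard WordPress template tags.

```php
<?php
/**
 * Hypothetical helper (not provided by WPSOLR or WordPress):
 * returns true when a search result is a PDF attachment.
 *
 * @param string|false $post_type Value of get_post_type() for the entry.
 * @param string|false $mime_type Value of get_post_mime_type() for the entry.
 * @return bool
 */
function mytheme_is_pdf_attachment( $post_type, $mime_type ) {
	return 'attachment' === $post_type && 'application/pdf' === $mime_type;
}
?>

<?php while ( have_posts() ) : the_post(); ?>
	<h2><a href="<?php the_permalink(); ?>"><?php the_title(); ?></a></h2>

	<?php if ( ! mytheme_is_pdf_attachment( get_post_type(), get_post_mime_type() ) ) : ?>
		<?php // Regular entries keep their snippet. ?>
		<?php the_excerpt(); ?>
	<?php endif; ?>
	<?php // PDF attachments fall through with title and link only, no snippet. ?>
<?php endwhile; ?>
```

Note that this hides the snippet for every PDF, including the English ones. To hide it only for the scanned Cyrillic papers, the condition would need an extra marker that distinguishes them, for example a custom field or a category set on those attachments; which marker is available depends on how the papers were uploaded.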
