Index Words in Another Language

  snfpaper
    Participant
    # 10 months ago

    On our site, we index several digital newspapers, and many of the words and names that users search for are not found in the results. For instance, a search for the surname Pribanic returns no results, even though the newspapers are properly indexed and the word Pribanic appears in their content.

    Is there a way to configure WPSOLR to index words in multiple languages at once (i.e., Serbian in the Latin alphabet, and English)? We need users to be able to find anything they search for, including words that are not recognized as English, such as these surnames.

    The site is https://www.snfpaper.org if you would like to take a look.

    Thanks!

    wpsolr
    Keymaster
    # 10 months ago

    Hi,

    “we have several digital newspapers being indexed”

    Can you explain exactly what is indexed? Are they .pdf files?

    snfpaper
    Participant
    # 10 months ago

    Hi,

    Yes, the newspapers are .pdf files. We also index the blog posts, pages, etc., but those haven’t caused any issues. Our main concern is how to index words in these PDFs that may not be recognized as English.

    Thanks!

    wpsolr
    Keymaster
    # 10 months ago

    Can you tell me the url of a pdf, already indexed, containing “Pribanic”?

    snfpaper
    Participant
    # 10 months ago

    https://www.snfpaper.org/wp-content/uploads/_pda/2018/10/2017-08-16.pdf

    You will need to register for an account on the site in order to view the pdfs since they are protected. If you prefer, I could create the account for you on this end.

    wpsolr
    Keymaster
    # 10 months ago

    Thanks. I created an account.

    Searching https://www.snfpaper.org/?s=Pribanic returns only one result, and it is a different pdf file: https://www.snfpaper.org/wp-content/uploads/_pda/2018/12/2014-06-04.pdf. However, this pdf does not seem to contain “Pribanic” (checked with Ctrl-F).

    So: the pdf with “Pribanic” (https://www.snfpaper.org/wp-content/uploads/_pda/2018/10/2017-08-16.pdf) is not found, and the pdf without “Pribanic” (https://www.snfpaper.org/wp-content/uploads/_pda/2018/12/2014-06-04.pdf) is found.

    Can you give my account access to your WP admin, so I can have a quick check?

    snfpaper
    Participant
    # 10 months ago

    Updated your status from Subscriber to Admin. Thanks!

    wpsolr
    Keymaster
    # 10 months ago

    Thanks. I can see you use a Solr index. How did you create it?

    wpsolr
    Keymaster
    # 10 months ago

    I found an issue: the “fuzzy” option was selected in screen 2.1. It can produce unexpected results.

    Now there are 3 results for “Pribanic”.
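    One plausible reason a fuzzy search can return a document that does not contain the exact word is that fuzzy matching accepts any indexed term within a small edit distance of the query. The thread does not describe what WPSOLR's “fuzzy” option does internally; this minimal sketch assumes it behaves like a Lucene/Solr fuzzy query (which by default allows up to 2 edits). The function names and the word list are hypothetical, invented for illustration, and not taken from the snfpaper.org index.

    ```python
    # Hypothetical illustration of edit-distance ("fuzzy") term matching.
    # Assumption: WPSOLR's "fuzzy" option works like a Lucene fuzzy query,
    # which by default matches terms within 2 edits (Lucene also counts a
    # transposition as one edit; this sketch uses plain Levenshtein distance).

    def edit_distance(a: str, b: str) -> int:
        """Levenshtein distance: minimum insertions, deletions, substitutions."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            curr = [i]
            for j, cb in enumerate(b, start=1):
                cost = 0 if ca == cb else 1
                curr.append(min(prev[j] + 1,          # deletion
                                curr[j - 1] + 1,      # insertion
                                prev[j - 1] + cost))  # substitution
            prev = curr
        return prev[-1]


    def fuzzy_matches(query, vocabulary, max_edits=2):
        """Return the indexed terms within max_edits of the query (case-insensitive)."""
        q = query.lower()
        return [t for t in vocabulary if edit_distance(q, t.lower()) <= max_edits]


    # Hypothetical indexed terms, invented for this example.
    terms = ["Pribanic", "Prebanic", "Pribanić", "Ribanic", "Banic", "Paprika"]
    print(fuzzy_matches("Pribanic", terms))
    # → ['Pribanic', 'Prebanic', 'Pribanić', 'Ribanic']
    ```

    Every term within two edits counts as a hit, so a page containing only a near-miss spelling can be returned for “Pribanic”, which may explain the earlier result for the 2014-06-04 pdf.
    
    
    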

    snfpaper
    Participant
    # 10 months ago

    Not quite sure how to answer your question. We installed Apache Solr and then WPSOLR PRO, and configured the index in step 0 by filling in the corresponding fields.

    snfpaper
    Participant
    # 10 months ago

    Thank you!

    wpsolr
    Keymaster
    # 10 months ago

    Is it solved?

    snfpaper
    Participant
    # 10 months ago

    It appears so – thank you for your help!
