Few questions about search and results

  • dnikola
    Participant
    4 years, 7 months ago #14293

    Hi Patrice,

    I have indexed about 4000 posts of a custom post type in Solr.

    These are the settings I have set up in 2.2: https://prntscr.com/p9vdhl
    All the text that needs to be searched is inside the field transkript.

    So my questions are:

    1. Did I set this up correctly?

    2. If I search for the same word in two different scripts, for example Patrice and Патрице, will it return the same results for both searches? If not, how can I make a search for one also return the other?

    3. If I search for Patri, for example, can it return everything that starts with the searched term (Patricing, Patrice, Patria, Patriamo, etc.)? Is it enough just to check this option: https://prntscr.com/p9vgzf

    Thanks!

    wpsolr
    Keymaster
    4 years, 7 months ago #14294

    1. You indexed your field in 2.2 settings. But you need to search in your field, and for that use the 2.2 option “Index custom fields and categories” or add your field as a boost in 2.3

    2. I don’t understand what you intend to do. Automatic translation during search?

    3. Yes, partial search should work fine.

    dnikola
    Participant
    4 years, 7 months ago #14299

    1. OK, that is something I am already using on page 2.2 (https://prntscr.com/p9vnon), but I am not sure what page 2.3 is for. Do you have some examples showing how to use it? 🙂

    2. In my language we have two scripts, Latin and Cyrillic. Some of the indexed text is Latin and some is Cyrillic. So if users type with a Latin keyboard, will they get results from text indexed in Cyrillic?

    3. OK, but I think there are more results than the search is finding. How can I debug this?

    wpsolr
    Keymaster
    4 years, 7 months ago #14301

    1. With 2.2 option, it’s fine

    2. What language are we talking about?

    3. Try using the Solr admin with search query like: text:Patri*
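
    The same check can also be run outside the admin UI. Below is a minimal Python sketch that builds the request URL for the prefix query above. The host, port, and index name (`my_index`) are assumptions for illustration; replace them with the values of your own Solr install.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, rows=10):
    """Build a URL for Solr's standard /select handler.

    base_url is assumed to look like http://host:port/solr/<index name>.
    """
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical local index; only the query text comes from the thread.
url = build_select_url("http://localhost:8983/solr/my_index", "text:Patri*")
print(url)
```

    Opening the printed URL in a browser should show the matching documents as JSON, which makes it easier to compare the number of hits with what the site search returns.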

    dnikola
    Participant
    4 years, 7 months ago #14304

    1. Ok

    2. Serbian

    3. OK, I will try. If I don't manage to find a way to do it, I will ask 🙂

    wpsolr
    Keymaster
    dnikola
    Participant
    4 years, 7 months ago #14307

    yes.

    Where should I look for the config file?

    wpsolr
    Keymaster
    4 years, 7 months ago #14308

    Edit the schema.xml file installed by the plugin in your index folders.

    The field analyser to change is “text”:

    
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
                    catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.EnglishPossessiveFilterFactory"/>
            <filter class="solr.PorterStemFilterFactory"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <!--<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>-->
            <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
                    catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.EnglishPossessiveFilterFactory"/>
            <filter class="solr.PorterStemFilterFactory"/>
        </analyzer>
    </fieldType>
    
    dnikola
    Participant
    4 years, 7 months ago #14315

    OK, it seems that I have already made some changes,

    but I also made them for query.

    Should I remove that?

    If I leave this for text, will it be OK? https://prntscr.com/p9x6e1

    thanks!

    wpsolr
    Keymaster
    4 years, 7 months ago #14316

    You can use the example from the Solr documentation:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
     <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SerbianNormalizationFilterFactory" haircut="bald"/>
     </analyzer>
    </fieldType>

    Then restart Solr, or “reload” the index, and reindex everything in the plugin.
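
    To illustrate why this kind of normalization lets a Latin query match Cyrillic text, here is a rough Python sketch of the idea. It is a simplified, hand-written mapping for illustration only, not the code Solr actually runs: both scripts are folded to the same "bald" Latin form (no diacritics) before comparison.

```python
# Simplified Serbian Cyrillic -> "bald" Latin table (illustrative assumption,
# not copied from the Solr/Lucene filter).
CYRILLIC_TO_BALD_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "dj", "е": "e",
    "ж": "z", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "c", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "c",
    "џ": "dz", "ш": "s",
}

# Latin letters with diacritics are folded to the same bald form.
LATIN_DIACRITICS = {"đ": "dj", "ž": "z", "ć": "c", "č": "c", "š": "s"}


def normalize_serbian(text):
    """Lowercase the text and fold both scripts to one bald Latin form."""
    out = []
    for ch in text.lower():
        out.append(CYRILLIC_TO_BALD_LATIN.get(ch, LATIN_DIACRITICS.get(ch, ch)))
    return "".join(out)


# Both spellings from the question collapse to the same token.
print(normalize_serbian("Патрице"), normalize_serbian("Patrice"))
```

    Because the analyzer is applied both when indexing and when searching, the indexed text and the query end up in the same form. That is also why a full reindex is needed after the change.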

    dnikola
    Participant
    4 years, 7 months ago #14317

    OK, I tried bald and regular, but in both cases I get the same results.

    So my question is: are serialized meta fields causing some kind of problem with search?
    I enabled the option to debug while indexing and no errors were shown.

    But when I search for a specific word, I know there are 35 posts in one category containing that word, yet the search returns only 33 of them.
    I know exactly which posts are not shown.

    What can I do about that?

    Thanks!

    wpsolr
    Keymaster
    4 years, 7 months ago #14318

    The 2 posts might not be indexed, might be in the wrong category, or the searched keywords might not be in the post title/content (or not exactly).

    Try and verify manually with Solr queries in the Solr admin.

    dnikola
    Participant
    4 years, 7 months ago #14319

    I know that the posts should be in the results, but they are not there. It is not related to categories. I don't have post text; search depends on a post meta field with serialized data, but it works fine for the other posts…

    Should an error be thrown during indexing if a post could not be indexed?

    wpsolr
    Keymaster
    4 years, 7 months ago #14321

    Try to locate your 2 missing posts in your Solr admin search by their ID (query is: PID:xxxx), and check their fields content.
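
    Once the PID query returns a document, comparing the length of each indexed field with the original post data is a quick way to spot truncated content. A minimal Python sketch, assuming the JSON response has been saved or fetched already; the field name `transkript_str` and the sample values are hypothetical.

```python
def text_length(value):
    """Length of a field value; multi-valued fields are summed."""
    if isinstance(value, list):
        return sum(len(str(v)) for v in value)
    return len(str(value))


def field_lengths(solr_response):
    """Map each document's PID to the character length of every field."""
    docs = solr_response.get("response", {}).get("docs", [])
    return {
        doc.get("PID"): {name: text_length(value) for name, value in doc.items()}
        for doc in docs
    }


# Hypothetical response for a query like PID:123 (standard Solr JSON layout).
sample = {
    "response": {
        "numFound": 1,
        "docs": [{"PID": "123", "transkript_str": [">> TEXT TEXT"]}],
    }
}
print(field_lengths(sample))
```

    A field that is much shorter in Solr than in the WordPress database suggests the content was cut or altered before it reached the index.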

    dnikola
    Participant
    4 years, 7 months ago #14354

    Ok,

    I have done some research and noticed that this field is only partially indexed for these posts.

    It seems to break because of the >> characters.

    For example, there is a string like:

    >> TEXT TEXT << text between >> MORE TEXT <<

    So basically the text in between is skipped, as if everything between << and >> is being ignored.
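
    One possible explanation, which is only an assumption and not confirmed in this thread, is that HTML tags are stripped from the text before indexing, so `<< text between >>` is treated like a tag. The Python sketch below reproduces that effect with a naive tag stripper and shows one hypothetical workaround: replacing the doubled angle brackets with guillemets before stripping.

```python
import re

# Naive tag pattern: "<", anything that is not ">", then ">".
TAG_PATTERN = re.compile(r"<[^>]*>")


def naive_strip_tags(text):
    """Remove anything that looks like an HTML tag."""
    return TAG_PATTERN.sub("", text)


def protect_angle_quotes(text):
    """Replace << and >> with guillemets so they no longer look like tags."""
    return text.replace("<<", "«").replace(">>", "»")


raw = ">> TEXT TEXT << text between >> MORE TEXT <<"

stripped = naive_strip_tags(raw)
protected = naive_strip_tags(protect_angle_quotes(raw))

print(stripped)   # the words "text between" are gone
print(protected)  # the words "text between" survive
```

    If the missing text in the index matches what the naive stripper removes, the fix probably belongs in the step that prepares the field content, not in the Solr schema.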

