Search keyword not working properly

    wpsolr
    Keymaster
    1 year, 10 months ago #30117

    This is a file format issue, often due to XML (generated from a PDF, for instance) containing the forbidden “<” character. There is no obvious solution apart from removing the documents or fixing their content: https://www.wpsolr.com/forums/topic/indexing-error-invalid-utf-8-middle-byte-0x3c/
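
To locate the offending bytes before deciding whether to remove or fix a document, one option is to scan its raw content for positions where strict UTF-8 decoding fails. The following is a minimal Python sketch, not part of WPSolr; the function name `find_invalid_utf8` is hypothetical, and it assumes you can export the post or attachment content as bytes.

```python
def find_invalid_utf8(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte_value) for each place strict UTF-8 decoding fails."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            # Strict decoding raises on the first malformed sequence.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as exc:
            bad = pos + exc.start          # offset of the first undecodable byte
            problems.append((bad, data[bad]))
            pos = bad + 1                  # resume scanning after that byte
    return problems


if __name__ == "__main__":
    # Hypothetical sample: a multi-byte start byte (0xE2) followed by "<".
    sample = b"Boston\xe2<b>"
    for offset, value in find_invalid_utf8(sample):
        print(f"invalid sequence at byte {offset}: 0x{value:02x}")
```

Note that Python reports the first byte of the broken sequence, while the Solr message names the unexpected byte that follows it (0x3c is the ASCII code for “<”).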

    hsurekar
    Participant
    1 year, 10 months ago #30204

    OK, is there any way to index these articles? They are not being indexed, and we have approximately 30,000 articles. How can I fix the format issue?

    wpsolr
    Keymaster
    1 year, 10 months ago #30205

    You could try Elasticsearch or OpenSearch. They may be less prone to format issues.

    hsurekar
    Participant
    1 year, 10 months ago #30232

    OK, now I am getting the error below, but I am not getting any post ID to check what the issue is. Could you please let me know how to fix it?
    Batch process : 1
    Debug mode : On

    An error or timeout occured.

    Error code: parsererror

    Error message: SyntaxError: Unexpected token P in JSON at position 0

    Posts excluded from the index:

    ******** DEBUG ACTIVATED – Beginning of new loop (batch size) *******

    ******** DEBUG ACTIVATED – Query documents from last post date *******

    Query:
    SELECT ID, post_modified, post_parent, post_type FROM wp_1_posts AS A WHERE ((post_modified = ‘2010-12-10 17:34:08’ AND ID > 97920) OR (post_modified > ‘2010-12-10 17:34:08’)) AND ( post_status IN (‘publish’) AND ( post_type = ‘post’ ) ) ORDER BY post_modified ASC, ID ASC LIMIT 1

    Last post date:
    2010-12-10 17:34:08

    Last post ID:
    97920

    Post to be sent:
    { “id”: “97923”, “PID”: “97923”, “type”: “post”, “meta_type_s”: “post_type”, “displaymodified”: “2010-12-10T18:05:00Z”, “title”: “CJAM relocates to Adams court “, “title_s”: “CJAM relocates to Adams court “, “permalink”: “https:\/\/masslawyersweekly.com\/2010\/12\/10\/cjam-relocates-to-adams-court\/”, “post_status_s”: “publish”, “content”: “Chief Justice for Administration and Management Robert A. Mulligan has moved his office from Boston\u2019s Center Plaza to the first-floor mezzanine of the John Adams Courthouse.\r\n\r\nThe move further reduces the space leased at Center Plaza for the Administrative Office of the Trial Court. Since late 2008, consolidations and relocations have reduced the leased space there by 32 percent, and the landlord has renegotiated the lease.\r\n\r\nDepartments remaining at Center Plaza include human resources, legal, fiscal, support services and the Judicial Institute. The Administrative Office of the Juvenile Court and the Sentencing Commission also remain at Center Plaza.\r\n\r\nTelephone numbers for the CJAM, Chief of Staff Bob Panneton and Executive Director Frank Carney remain the same. The new address is: John Adams Courthouse, One Pemberton Square, Boston, MA, 02108. Mail for the AOTC departments should continue to be sent to Center Plaza.”, “snippet_s”: “Chief Justice for Administration and Management Robert A. Mulligan has moved his office from Boston?”, “post_author_s”: “542”, “author”: “Mass. Lawyers Weekly Staff”, “menu_order_i”: 0, “PID_i”: “97923”, “author_s”: “https:\/\/masslawyersweekly.com\/author\/mass-lawyersweeklystaff\/”, “displaydate”: “2010-12-10T18:05:00Z”, “displaydate_dt”: “2010-12-10T18:05:00Z”, “date”: “2010-12-10T23:05:00Z”, “displaymodified_dt”: “2010-12-10T18:05:00Z”, “modified”: “2010-12-10T23:05:00Z”, “displaymodified_dt_i”: “1292004300000”, “displaymodified_dt_y_i”: “2010”, “displaymodified_dt_ym_i”: “12”, “displaymodified_dt_yw_i”: “49”, “displaymodified_dt_yd_i”: “344”, “displaymodified_dt_md_i”: “10”, “displaymodified_dt_wd_i”: “6”, “displaymodified_dt_dh_i”: “18”, “displaymodified_dt_dm_i”: “5”, “displaymodified_dt_ds_i”: 0, “displaydate_dt_i”: “1292004300000”, “displaydate_dt_y_i”: “2010”, “displaydate_dt_ym_i”: “12”, “displaydate_dt_yw_i”: “49”, “displaydate_dt_yd_i”: “344”, “displaydate_dt_md_i”: “10”, “displaydate_dt_wd_i”: “6”, “displaydate_dt_dh_i”: “18”, “displaydate_dt_dm_i”: “5”, “displaydate_dt_ds_i”: 0, “date_i”: “1292022300000”, “date_y_i”: “2010”, “date_ym_i”: “12”, “date_yw_i”: “49”, “date_yd_i”: “344”, “date_md_i”: “10”, “date_wd_i”: “6”, “date_dh_i”: “23”, “date_dm_i”: “5”, “date_ds_i”: 0, “displaydate_i”: “1292004300000”, “displaydate_y_i”: “2010”, “displaydate_ym_i”: “12”, “displaydate_yw_i”: “49”, “displaydate_yd_i”: “344”, “displaydate_md_i”: “10”, “displaydate_wd_i”: “6”, “displaydate_dh_i”: “18”, “displaydate_dm_i”: “5”, “displaydate_ds_i”: 0, “modified_i”: “1292022300000”, “modified_y_i”: “2010”, “modified_ym_i”: “12”, “modified_yw_i”: “49”, “modified_yd_i”: “344”, “modified_md_i”: “10”, “modified_wd_i”: “6”, “modified_dh_i”: “23”, “modified_dm_i”: “5”, “modified_ds_i”: 0, “comments”: [], “numcomments”: 0, “categories_str”: [ “News Briefs” ], “categories”: [ “News Briefs”, “MALW”, “Subscriber Only”, “Yes” ], “flat_hierarchy_categories_str”: [ “News Briefs” ], “non_flat_hierarchy_categories_str”: [ “News Briefs” ], “tags”: [ “CJAM”, “Dec. 13 2010 issue.” ], “dmcss_pub_code_str”: [ “MALW” ], “dmcss_security_policy_str”: [ “Subscriber Only” ], “we_own_it_str”: [ “Yes” ] }

    {“nb_results”:0,”status”:400,”message”:”Solr HTTP error: Bad Request (400)\n{\n "responseHeader":{\n "status":400,\n "QTime":0},\n "error":{\n "metadata":[\n "error-class","org.apache.solr.common.SolrException",\n "root-error-class","java.io.CharConversionException"],\n "msg":"Invalid UTF-8 middle byte 0x3c (at char #1563, byte #127)",\n "code":400}}\n”,”indexing_complete”:false}
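
In UTF-8, every byte after the first in a multi-byte character must be in the range 0x80 to 0xBF, so an ASCII byte such as 0x3c (“<”) in that position is rejected. In the debug output above, `snippet_s` ends at “Boston?” where the content has “Boston\u2019s”. This suggests, though the thread does not confirm it, that the snippet was cut in the middle of a multi-byte character. The Python sketch below illustrates that failure mode and a cut that respects character boundaries; `truncate_utf8` is a hypothetical helper, not a WPSolr function.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    raw = text.encode("utf-8")[:max_bytes]
    # errors="ignore" drops the incomplete trailing sequence left by the cut.
    return raw.decode("utf-8", errors="ignore")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    text = "Boston\u2019s Center Plaza"
    # Naive byte cut: keeps only the first byte (0xE2) of the 3-byte U+2019.
    naive = text.encode("utf-8")[:7] + b"<br>"
    safe = truncate_utf8(text, 7).encode("utf-8") + b"<br>"
    print(is_valid_utf8(naive))
    print(is_valid_utf8(safe))
```

If this is what happens on your site, the fix belongs wherever the snippet is generated; the sketch only shows why the resulting bytes are invalid.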

    hsurekar
    Participant
    1 year, 10 months ago #30235

    I found the issue. Please see the post content below.
    If I add the content below, it gives me an error:
    “It’s no longer just the customers and internal clients and regulators of our business, but additional pressures externally from the SEC, analysts and shareholders.””

    If I add the content below, it works properly:

    “It’s no longer just the customers and internal clients and regulators of our business, but additional pressures externally from the SEC, analysts and shareholders.”

    So could you please check and provide us with a solution for this quote-related issue?

    Thanks
    Hemant

    wpsolr
    Keymaster
    1 year, 10 months ago #30237

    If I add below content then it will give me error.
    “It’s no longer just the customers and internal clients and regulators of our business, but additional pressures externally from the SEC, analysts and shareholders.””

    In your PDF content, or in your post’s description content?

    wpsolr
    Keymaster
    1 year, 10 months ago #30238

    (I tried your sentence in a post’s content and indexed it successfully in Solr.)

    hsurekar
    Participant
    1 year, 10 months ago #30239

    Can we have a quick call so I can show you?
    Thanks

    hsurekar
    Participant
    1 year, 10 months ago #30240

    See the attached screenshot; you have added a different quote. [attached image]

    wpsolr
    Keymaster
    1 year, 10 months ago #30241

    I cannot access your latest image link.

    hsurekar
    Participant
    1 year, 10 months ago #30244

    Please check now. [attached image]

    wpsolr
    Keymaster
    1 year, 10 months ago #30247

    Thanks. Can you copy this sentence from the “Text” tab in your WP editor (it contains the HTML)?

    hsurekar
    Participant
    1 year, 10 months ago #30250

    Please see the content below, which is what we are adding. Your site’s editor converts it to a normal quote, but the WordPress editor does not.

    One thing about him that might surprise people: “I’m pretty involved in charities involving kids, like Boys Town New England, where I’m on the board of directors.”

    hsurekar
    Participant
    1 year, 10 months ago #30251

    I have attached a TXT file with that content. Please check it.

    TXT file : https://drive.google.com/file/d/1CAqqjVwZIix_ayMikrDOq7Om1Uj3NoUT/view?usp=sharing

    wpsolr
    Keymaster
    1 year, 10 months ago #30252

    I could index your TXT file content, unfortunately.

    How do you create this character?
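
Because forum and WordPress editors may silently convert quote characters when text is pasted, one way to pin down exactly which character is involved is to print the code point of every non-ASCII character in the stored content. A minimal Python sketch follows; `describe_non_ascii` is a hypothetical helper, and the sample string is only an illustration, not the actual post content.

```python
import unicodedata


def describe_non_ascii(text: str) -> list[tuple[int, str, str]]:
    """List (index, 'U+XXXX', Unicode name) for every non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


if __name__ == "__main__":
    # Illustrative sample with curly quotes, written as escapes so that
    # no editor can convert them on the way in.
    sample = "\u201cIt\u2019s no longer just the customers\u201d"
    for index, code_point, name in describe_non_ascii(sample):
        print(index, code_point, name)
```

Running this on the content copied from the database (rather than from an editor view) avoids any conversion the editor applies on display.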

Viewing 15 posts - 61 through 75 (of 114 total)
