Can WPSOLR index PDFs & Other Documents not located in the WordPress Media Library?

Published November 14, 2016; updated January 15, 2017; by admin

Yes, WPSOLR can index PDFs and other documents that are not located in the WordPress Media Library.

WordPress's standard search, like most WordPress search plugins, cannot search the encoded contents of documents at all. The few plugins that can often lack the ability to fetch documents located outside the Media Library.

Steps WPSOLR follows to index a file, whether or not it is in the Media Library

[Figure: WPSOLR indexes files from the Media Library or from an external URL]

– WPSOLR loads the file from the Media Library, or downloads it from an external URL
– WPSOLR sends the encoded file content to Apache Solr, which uses Apache Tika to decode it to plain text (extraction)
– WPSOLR adds the decoded plain text to the post content
– WPSOLR indexes the post
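The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not WPSOLR's own code (WPSOLR is a PHP plugin): the helper names (`Post`, `load_file`, `solr_cell_extract`, `index_post_with_file`) are hypothetical, and the extraction helper assumes a Solr core with the Solr Cell `/update/extract` handler enabled, called with `extractOnly=true` so that Solr returns the Tika-extracted text instead of indexing it. Extraction and indexing are passed in as functions so each step can be swapped out or tested on its own.

```python
# Hypothetical sketch of the four indexing steps (not WPSOLR's PHP code).
import json
import urllib.parse
import urllib.request
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict


@dataclass
class Post:
    post_id: int
    content: str


def load_file(location: str) -> bytes:
    """Step 1: read a local (Media Library) file, or download an external URL."""
    if location.startswith(("http://", "https://")):
        with urllib.request.urlopen(location) as response:
            return response.read()
    return Path(location).read_bytes()


def solr_cell_extract(data: bytes, solr_core_url: str, name: str = "document") -> str:
    """Step 2: have Solr Cell (Apache Tika) return plain text without indexing.

    Assumption: the extracted text is the first string value in the JSON
    response other than "responseHeader" (Solr keys it by the stream name).
    """
    params = urllib.parse.urlencode({
        "extractOnly": "true",    # return the extraction, do not index it
        "extractFormat": "text",  # plain text instead of XHTML
        "wt": "json",
        "resource.name": name,    # file name hint for Tika's type detection
    })
    request = urllib.request.Request(
        f"{solr_core_url}/update/extract?{params}",
        data=data,  # supplying a body makes this a POST request
        headers={"Content-Type": "application/octet-stream"},
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    for key, value in payload.items():
        if key != "responseHeader" and isinstance(value, str):
            return value
    return ""


def index_post_with_file(
    post: Post,
    location: str,
    extract: Callable[[bytes], str],
    index: Callable[[Dict[str, object]], None],
    load: Callable[[str], bytes] = load_file,
) -> Dict[str, object]:
    raw = load(location)             # step 1: load or download the file
    text = extract(raw).strip()      # step 2: extract plain text
    document = {                     # step 3: append the text to the post content
        "id": post.post_id,
        "content": f"{post.content}\n{text}".strip(),
    }
    index(document)                  # step 4: index the post
    return document
```

For an external file, a call could look like `index_post_with_file(post, "https://example.com/report.pdf", extract=lambda b: solr_cell_extract(b, "http://localhost:8983/solr/mycore"), index=my_indexer)`, where the URL, `mycore` and `my_indexer` are placeholders.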

Which URLs are indexed?

WPSOLR is not a web crawler, so it does not follow the hyperlinks contained in a post body.

Instead, the user has to perform one of the following actions:

– Add an ACF file field (of type “url”) to the post
– Insert a shortcode of the plugin “Embed Any Document” in the post content
– Insert a shortcode of the plugin “PDF Embedder” in the post content
– Insert a shortcode of the plugin “Google Doc Embedder” in the post content
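Because nothing is crawled, the URLs to fetch come only from the post's ACF field and the supported shortcodes. A minimal Python sketch of that lookup follows; it is not WPSOLR's implementation, and the shortcode tags and URL attributes it assumes (`[embeddoc url=...]`, `[pdf-embedder url=...]`, `[gview file=...]`) should be checked against each plugin's own documentation. `document_urls` and its `acf_url_value` parameter are hypothetical names.

```python
# Hypothetical sketch: collect document URLs without crawling hyperlinks.
import re
from typing import List, Optional

# Assumed shortcode tag -> attribute holding the document URL.
SHORTCODE_URL_ATTRS = {
    "embeddoc": "url",      # Embed Any Document
    "pdf-embedder": "url",  # PDF Embedder
    "gview": "file",        # Google Doc Embedder
}


def _attr_pattern(tag: str, attr: str) -> "re.Pattern[str]":
    # Matches e.g. [embeddoc url="https://..."] with single or double quotes.
    return re.compile(
        r"\[" + re.escape(tag) + r"\b[^\]]*?\b" + re.escape(attr)
        + r"\s*=\s*[\"']([^\"']+)[\"']"
    )


def document_urls(post_content: str, acf_url_value: Optional[str] = None) -> List[str]:
    urls: List[str] = []
    if acf_url_value:  # value of an ACF field of type "url"
        urls.append(acf_url_value)
    for tag, attr in SHORTCODE_URL_ATTRS.items():
        urls.extend(_attr_pattern(tag, attr).findall(post_content))
    # Plain <a href="..."> links are deliberately ignored: no crawling.
    return list(dict.fromkeys(urls))  # de-duplicate while keeping order
```

Ordinary `<a href>` links in the post body are never returned, which mirrors the fact that WPSOLR does not follow hyperlinks; each URL found this way would then go through the load, extract and index steps.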
