YES, WPSOLR can index PDFs & Other Documents not located in the WordPress Media Library
WordPress standard search, and most WordPress search plugins, cannot search inside the encoded contents of documents at all. The few plugins that can often lack the ability to fetch documents located outside the Media Library.
Steps to index a file, in the media library or not, in WPSOLR
– WPSOLR loads the file from the media library, or downloads it from an external URL
– WPSOLR sends the encoded file content to Apache Tika, through Apache Solr, which decodes it to plain text (extraction)
– WPSOLR adds the decoded plain text to the post content
– WPSOLR indexes the post
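The four steps above can be sketched as follows. This is only an illustration in Python, not WPSOLR's actual code: the function names `load_file` and `build_indexable_content` are hypothetical, and the text extractor is passed in as a parameter because the real extraction is done by Solr and Tika, which this sketch does not call.

```python
import urllib.parse
import urllib.request
from typing import Callable, Iterable


def load_file(location: str) -> bytes:
    """Step 1: read a local file, or download it when given an http(s) URL."""
    scheme = urllib.parse.urlparse(location).scheme
    if scheme in ("http", "https"):
        with urllib.request.urlopen(location, timeout=30) as response:
            return response.read()
    with open(location, "rb") as handle:
        return handle.read()


def build_indexable_content(
    post_content: str,
    locations: Iterable[str],
    load: Callable[[str], bytes],
    extract_text: Callable[[bytes], str],
) -> str:
    """Steps 2 and 3: decode each file and append its text to the post content.

    `extract_text` is a stand-in (assumption) for the Solr/Tika extraction.
    The returned string is what would then be indexed with the post (step 4).
    """
    parts = [post_content]
    for location in locations:
        raw = load(location)             # encoded bytes (PDF, DOCX, ...)
        parts.append(extract_text(raw))  # decoded plain text
    return "\n".join(parts)
```

Passing the loader and the extractor as arguments keeps the sketch testable without a running Solr server; a real setup would plug in a function that posts the bytes to Solr's extraction handler.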
Which URLs are indexed?
WPSOLR is not a Web crawler, so it does not follow the hyperlinks contained in a post body.
Instead, the user has to do one of the following:
– Add an ACF file field (of type “url”) to the post
– Insert a shortcode of the plugin “Embed Any Document” in the post content
– Insert a shortcode of the plugin “PDF Embedder” in the post content
– Insert a shortcode of the plugin “Google Doc Embedder” in the post content
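To illustrate how document URLs can be picked out of shortcodes rather than crawled from hyperlinks, here is a minimal Python sketch. It is not WPSOLR's implementation. The shortcode tags and attribute names in the table (`embeddoc`/`url`, `pdf-embedder`/`url`, `gview`/`file`) are assumptions based on how those three plugins are commonly documented, and should be checked against the plugin versions in use. ACF file fields are stored as post metadata rather than in the post content, so they are not covered here.

```python
import re

# Assumed shortcode tag -> attribute holding the document URL.
SHORTCODE_URL_ATTRIBUTES = {
    "embeddoc": "url",      # Embed Any Document (assumed)
    "pdf-embedder": "url",  # PDF Embedder (assumed)
    "gview": "file",        # Google Doc Embedder (assumed)
}

# Matches "[tag attributes]" and captures the tag and the raw attribute string.
SHORTCODE_PATTERN = re.compile(r"\[([\w-]+)\b([^\]]*)\]")
# Matches name="value" or name='value' inside the attribute string.
ATTRIBUTE_PATTERN = re.compile(r"""([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def find_document_urls(post_content: str) -> list[str]:
    """Return the document URLs referenced by known shortcodes, in order."""
    urls = []
    for tag, raw_attributes in SHORTCODE_PATTERN.findall(post_content):
        wanted = SHORTCODE_URL_ATTRIBUTES.get(tag)
        if wanted is None:
            continue  # not one of the supported plugins' shortcodes
        for name, double_quoted, single_quoted in ATTRIBUTE_PATTERN.findall(raw_attributes):
            if name == wanted:
                urls.append(double_quoted or single_quoted)
    return urls
```

Plain hyperlinks and unrelated shortcodes in the post body are ignored, which matches the point made above: only explicitly declared documents are fetched and indexed.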