Error indexing text from Apache Tika in Solr

Error indexing text from Apache Tika in Solr - java

I am trying to integrate Apache Tika with Solr so that text extracted by Tika could be indexed in Solr.
I tried the following code:
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.UUID;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.DublinCore;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MimeTypes;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
public class Main {
private static SolrServer solr;
public static void main(String[] args) throws IOException, SAXException, TikaException {
try {
solr = new HttpSolrServer("http://localhost:8983/solr/#/");
String path = "C:\\content\\";
String file_html = path + "mobydick.htm";
String file_txt = path + "/home/ben/abc.warc";
String file_pdf = path + "callofthewild.pdf";
processDocument(file_html);
processDocument(file_txt);
processDocument(file_pdf);
solr.commit();
}
catch (Exception ex) {
System.out.println(ex.getMessage());
}
}
private static void processDocument(String pathfilename) {
try {
InputStream input = new FileInputStream(new File(pathfilename));
//use Apache Tika to convert documents in different formats to plain text
ContentHandler textHandler = new BodyContentHandler(10*1024*1024);
Metadata meta = new Metadata();
Parser parser = new AutoDetectParser(); //handles documents in different formats:
ParseContext context = new ParseContext();
parser.parse(input, textHandler, meta, context); //convert to plain text
//collect metadata and content from Tika and other sources
//document id must be unique, use guide
UUID guid = java.util.UUID.randomUUID();
String docid = guid.toString();
//Dublin Core metadata (partial set)
String doctitle = meta.get(DublinCore.TITLE);
String doccreator = meta.get(DublinCore.CREATOR);
//other metadata
String docurl = pathfilename; //document url
//content
String doccontent = textHandler.toString();
//call to index
indexDocument(docid, doctitle, doccreator, docurl, doccontent);
}
catch (Exception ex) {
System.out.println(ex.getMessage());
}
}
private static void indexDocument(String docid, String doctitle, String
doccreator, String docurl, String doccontent) {
try {
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", docid);
//map metadata fields to default schema
//location: path\solr-4.7.2\example\solr\collection1\conf\schema.xml
//Dublin Core
//thought: schema could be modified to use Dublin Core
doc.addField("title", doctitle);
doc.addField("author", doccreator);
//other metadata
doc.addField("url", docurl);
//content (and text)
//per schema, the content field is not indexed by default, used for returning and highlighting document content
//the schema "copyField" command automatically copies this to the "text" field which is indexed
doc.addField("content", doccontent);
//indexing
//when a field is indexed, like "text", Solr will handle tokenization, stemming, removal of stopwords etc, per the schema defn
//add to index
solr.add(doc);
}
catch (Exception ex) {
System.out.println(ex.getMessage());
}
}
}
Unfortunately I am hitting the Error below:
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/http/NoHttpResponseException at Main.main(Main.java:28)
Caused by: java.lang.ClassNotFoundException:
org.apache.http.NoHttpResponseException
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ... 1 more
Could you please help me with the resolution of this issue?

Related

java.lang.ClassNotFoundException: org.junit.Assert / Working with docising API

I have encountered these errors when trying to implement the docusing embedded signature into my web app (java -eclipse).
java.lang.ClassNotFoundException: org.junit.Assert
org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1412)
org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1220)
com.uniquedeveloper.registration.test2.EmbeddedSigningTest(test2.java:94)
com.uniquedeveloper.registration.test2.doPost(test2.java:79)
javax.servlet.http.HttpServlet.service(HttpServlet.java:681)
javax.servlet.http.HttpServlet.service(HttpServlet.java:764)
org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
Here is my code for the test2 class:
package com.uniquedeveloper.registration;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.stream.Collectors;
import javax.servlet.ServletContext;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.Part;
import javax.servlet.jsp.PageContext;
import org.apache.tomcat.util.http.fileupload.FileUpload;
import static org.junit.Assert.*;
import org.junit.Assert;
import org.junit.Test;
import com.docusign.esign.api.EnvelopesApi;
import com.docusign.esign.client.ApiClient;
import com.docusign.esign.client.ApiException;
import com.docusign.esign.model.Document;
import com.docusign.esign.model.EnvelopeDefinition;
import com.docusign.esign.model.EnvelopeSummary;
import com.docusign.esign.model.RecipientViewRequest;
import com.docusign.esign.model.Recipients;
import com.docusign.esign.model.SignHere;
import com.docusign.esign.model.Signer;
import com.docusign.esign.model.Tabs;
import com.docusign.esign.model.ViewUrl;
/**
* Servlet implementation class test2
*/
#WebServlet("/test2")
public class test2 extends HttpServlet {
private static final long serialVersionUID = 1L;
protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
String rEmail = request.getParameter("recipientem");
String rName = request.getParameter("recipientname");
/*Part file = request.getPart("file_upload");
*/
/* String filePath = (String)request.getServletContext().getInitParameter("file_upload");*/
/*Part filePart = request.getPart("file_upload");
String fileName=filePart.getSubmittedFileName();
for(Part part : request.getParts()) {
part.write("C:\\Bureau\\"+ fileName);
*/
/*Part filePart = request.getPart("file_upload");
InputStream filecontent = filePart.getInputStream();
if (filePart != null) {
filecontent = filePart.getInputStream();
}
else {
System.out.println("KHAWI");
}*/
String filePath = (String)request.getServletContext().getInitParameter("file_upload");
System.out.println("ZWIN1");
EmbeddedSigningTest(rEmail, rName, filePath);
}
#Test
public void EmbeddedSigningTest(String rEmail, String rName, String filePath ) {
String AccountId = "16501888";
System.out.println("\nEmbeddedSigningTest:\n" + "===========================================");
byte[] fileBytes = null;
try {
String currentDir = System.getProperty("user.dir");
Path path = Paths.get(currentDir + filePath);
fileBytes = Files.readAllBytes(path);
} catch (IOException ioExcp) {
Assert.assertNull(ioExcp);
}
System.out.println("ZWIN2");
// create an envelope to be signed
EnvelopeDefinition envDef = new EnvelopeDefinition();
envDef.setEmailSubject("Please Sign my Java SDK Envelope (Embedded Signer)");
envDef.setEmailBlurb("Hello, Please sign my Java SDK Envelope.");
// add a document to the envelope
Document doc = new Document();
String base64Doc = Base64.getEncoder().encodeToString(fileBytes);
doc.setDocumentBase64(base64Doc);
doc.setName(filePath);
doc.setDocumentId("1");
List<Document> docs = new ArrayList<>();
docs.add(doc);
envDef.setDocuments(docs);
// Add a recipient to sign the document
Signer signer = new Signer();
signer.setEmail(rEmail);
String name = "Pat Developer";
signer.setName(rName);
signer.setRecipientId("1");
// this value represents the client's unique identifier for the signer
String clientUserId = "2adce842-15eb-4744-9807-5a82020cc313 ";
signer.setClientUserId(clientUserId);
// Create a SignHere tab somewhere on the document for the signer to
// sign
SignHere signHere = new SignHere();
signHere.setDocumentId("1");
signHere.setPageNumber("1");
signHere.setRecipientId("1");
signHere.setXPosition("100");
signHere.setYPosition("100");
signHere.setScaleValue("0.5");
List<SignHere> signHereTabs = new ArrayList<>();
signHereTabs.add(signHere);
Tabs tabs = new Tabs();
tabs.setSignHereTabs(signHereTabs);
signer.setTabs(tabs);
// Above causes issue
envDef.setRecipients(new Recipients());
envDef.getRecipients().setSigners(new ArrayList<>());
envDef.getRecipients().getSigners().add(signer);
// send the envelope (otherwise it will be "created" in the Draft folder
envDef.setStatus("sent");
try {
EnvelopesApi envelopesApi = new EnvelopesApi();
EnvelopeSummary envelopeSummary = envelopesApi.createEnvelope(AccountId, envDef);
Assert.assertNotNull(envelopeSummary);
Assert.assertNotNull(envelopeSummary.getEnvelopeId());
System.out.println("EnvelopeSummary: " + envelopeSummary);
String returnUrl = "http://localhost:8080/index/";
RecipientViewRequest recipientView = new RecipientViewRequest();
recipientView.setReturnUrl(returnUrl);
recipientView.setClientUserId(clientUserId);
recipientView.setAuthenticationMethod("email");
recipientView.setUserName(name);
recipientView.setEmail(rEmail);
ViewUrl viewUrl = envelopesApi.createRecipientView(AccountId,
envelopeSummary.getEnvelopeId(), recipientView);
Assert.assertNotNull(viewUrl);
Assert.assertNotNull(viewUrl.getUrl());
// This Url should work in an Iframe or browser to allow signing
System.out.println("ViewUrl is " + viewUrl);
} catch (ApiException ex) {
Assert.fail("Exception: " + ex);
} catch (Exception e) {
Assert.fail("Exception: " + e.getLocalizedMessage());
}
}
}
I (pretty sure ) think it has something to do with the file not being uplaoded properly ( null).
Please help !

The error is telling you that the class org.junit.Assert is not on the classpath. Meaning that the JUnit library is missing or it has the wrong scope (if you are using a building tool like Maven or Gradle).
The whole class has several problems, I suspect you are trying to deploy the servlet on Tomcat and there the JUnit dependency is missing. You should redesign it and keep the test separated from the servlet.

Set up URI or catalog resolver with Saxon/XQuery

I am developing a simple command line application in Java to mine data from a large XML data set (15,000+ XML files). I have chosen to use Saxon S9API as the XQuery processor for this. Everything works fine so long as there is open access to the internet where the parser used by Saxon can resolve the xsi:noNamespaceSchemaLocation URI (or any other I will assume).
I have scoured Stackoverflow, as well as general Google searching, for answers on how to provide a catalog to the XQuery processor. I have not found a good explanation on how to do so.
This is the simple code I have at this point, which as I stated works fine when there is open access the Internet:
package ipd.part.info.mining.app;
import java.io.File;
import java.util.List;
import java.util.Scanner;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import net.sf.saxon.Configuration;
import net.sf.saxon.TransformerFactoryImpl;
import net.sf.saxon.s9api.DOMDestination;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.QName;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XQueryCompiler;
import net.sf.saxon.s9api.XQueryEvaluator;
import net.sf.saxon.s9api.XQueryExecutable;
import net.sf.saxon.s9api.XdmAtomicValue;
import net.sf.saxon.lib.*;
import static org.apache.xerces.jaxp.JAXPConstants.JAXP_SCHEMA_LANGUAGE;
import static org.apache.xerces.jaxp.JAXPConstants.W3C_XML_SCHEMA;
import org.apache.xerces.util.XMLCatalogResolver;
import org.apache.xml.resolver.tools.CatalogResolver;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
/**
*
* #author tfurst
*/
public class IPDPartInfoMiningApp {
/**
* #param args the command line arguments
*/
private static Scanner scanner = new Scanner(System.in);
private static String ietmPath;
private static String outputPath;
private static CatalogResolver resolver;
private static org.apache.xerces.util.XMLCatalogResolver xres;
private static ErrorHandler eHandler;
private static DocumentBuilderFactory DBF;
private static DocumentBuilder DB;
public static void main(String[] args) {
initDb();
try {
// TODO code application logic here
System.out.println("Enter path to complete IETM Export:");
ietmPath = scanner.nextLine();
System.out.println("Enter path to save report:");
outputPath = scanner.nextLine();
Processor proc = new Processor(true);
XQueryCompiler comp = proc.newXQueryCompiler();
//File xq = fixXquery(new File(XQ));
//XQueryExecutable exp = comp.compile(xq);
XQueryExecutable exp = comp.compile("declare variable $path external;\n" +
"\n" +
"let $coll := collection(concat($path,'?select=*.xml'))//itemSequenceNumber \n" +
"\n" +
"return\n" +
"<parts>\n" +
"{\n" +
" for $mod in $coll\n" +
" let $pn := normalize-space($mod/partNumber)\n" +
" let $nomen := $mod/partIdentSegment[1]/descrForPart\n" +
" let $smr := $mod/locationRcmdSegment/locationRcmd/sourceMaintRecoverability\n" +
" order by $pn\n" +
" return <part pn=\"{$pn}\" nomen=\"{$nomen}\" smr=\"{$smr}\"/>\n" +
"}\n" +
"</parts>");
//Serializer out = proc.newSerializer(System.out);
Document dom = DB.newDocument();
XQueryEvaluator ev = exp.load();
ev.setExternalVariable(new QName("path"), new XdmAtomicValue(new File(ietmPath).toPath().toUri().toString().substring(0, new File(ietmPath).toPath().toUri().toString().lastIndexOf("/"))));
ev.run(new DOMDestination(dom));
TransformerFactoryImpl tfact = new net.sf.saxon.TransformerFactoryImpl();
Transformer trans = tfact.newTransformer();
DOMSource src = new DOMSource(dom);
StreamResult res = new StreamResult(new File(outputPath + File.separator + "output.xml"));
trans.transform(src, res);
} catch (SaxonApiException | TransformerException ex) {
Logger.getLogger(IPDPartInfoMiningApp.class.getName()).log(Level.SEVERE, null, ex);
}
}
private static XMLCatalogResolver createXMLCatalogResolver(CatalogResolver resolver)
{
int i = 0;
List files = resolver.getCatalog().getCatalogManager().getCatalogFiles();
String[] catalogs = new String[files.size()];
XMLCatalogResolver xcr = new XMLCatalogResolver();
for(Object file : files)
{
catalogs[i] = new File(file.toString()).getAbsolutePath();
}
xcr.setCatalogList(catalogs);
return xcr;
}
private static void initDb()
{
try
{
resolver = new CatalogResolver();
eHandler = new DocumentErrorHandler();
xres = createXMLCatalogResolver(resolver);
DBF = DocumentBuilderFactory.newInstance();
DBF.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
DBF.setNamespaceAware(true);
DB = DBF.newDocumentBuilder();
DB.setEntityResolver(xres);
DB.setErrorHandler(eHandler);
}
catch (ParserConfigurationException ex)
{
ex.printStackTrace();
}
}
}
I am receiving this error when I disconnect my machine from the network:
C:\Users\tfurst\Desktop\XQuery Test\testXml\test\tool>java -jar IPD_Part_Info_Mining_App.jar
Enter path to complete IETM Export:
C:\Users\tfurst\Desktop\Wire Repl Testing
Enter path to save report:
C:\Users\tfurst\Desktop\Wire Repl Testing\report
Error on line 6 column 2
collection(): failed to parse XML file
file:/C:/Users/tfurst/Desktop/Wire%20Repl%20Testing/DMC-HH60W-A-52-21-0001-04AAA-520A-B.xml: I/O error reported by XML parser processing file:/C:/Users/tfurst/Desktop/Wire%20Repl%20Testing/DMC-HH60W-A-52-21-0001-04AAA-520A-B.xml: Read timed out
Aug 20, 2019 2:55:23 PM ipd.part.info.mining.app.IPDPartInfoMiningApp main
SEVERE: null
net.sf.saxon.s9api.SaxonApiException: collection(): failed to parse XML file file:/C:/Users/tfurst/Desktop/Wire%20Repl%20Testing/DMC-HH60W-A-52-21-0001-04AAA-520A-B.xml: I/O error reported by XML parser processing file:/C:/Users/tfurst/Desktop/Wire%20Repl%20Testing/DMC-HH60W-A-52-21-0001-04AAA-520A-B.xml: Read timed out
at net.sf.saxon.s9api.XQueryEvaluator.run(XQueryEvaluator.java:372)
at ipd.part.info.mining.app.IPDPartInfoMiningApp.main(IPDPartInfoMiningApp.java:80)
Caused by: net.sf.saxon.trans.XPathException: collection(): failed to parse XML file file:/C:/Users/tfurst/Desktop/Wire%20Repl%20Testing/DMC-HH60W-A-52-21-0001-04AAA-520A-B.xml: I/O error reported by XML parser processing file:/C:/Users/tfurst/Desktop/Wire%20Repl%20Testing/DMC-HH60W-A-52-21-0001-04AAA-520A-B.xml: Read timed out
at net.sf.saxon.resource.XmlResource.getItem(XmlResource.java:113)
at net.sf.saxon.functions.CollectionFn$2.mapItem(CollectionFn.java:246)
at net.sf.saxon.expr.ItemMappingIterator.next(ItemMappingIterator.java:113)
at net.sf.saxon.expr.ItemMappingIterator.next(ItemMappingIterator.java:108)
at net.sf.saxon.expr.ItemMappingIterator.next(ItemMappingIterator.java:108)
at net.sf.saxon.om.FocusTrackingIterator.next(FocusTrackingIterator.java:85)
at net.sf.saxon.expr.ContextMappingIterator.next(ContextMappingIterator.java:59)
at net.sf.saxon.expr.sort.DocumentOrderIterator.<init>(DocumentOrderIterator.java:47)
at net.sf.saxon.expr.sort.DocumentSorter.iterate(DocumentSorter.java:230)
at net.sf.saxon.expr.flwor.ForClausePush.processTuple(ForClausePush.java:34)
at net.sf.saxon.expr.flwor.FLWORExpression.process(FLWORExpression.java:841)
at net.sf.saxon.expr.instruct.ElementCreator.processLeavingTail(ElementCreator.java:337)
at net.sf.saxon.expr.instruct.ElementCreator.processLeavingTail(ElementCreator.java:284)
at net.sf.saxon.expr.instruct.Instruction.process(Instruction.java:151)
at net.sf.saxon.query.XQueryExpression.run(XQueryExpression.java:411)
at net.sf.saxon.s9api.XQueryEvaluator.run(XQueryEvaluator.java:370)
... 1 more
C:\Users\tfurst\Desktop\XQuery Test\testXml\test\tool>pause
Press any key to continue . . .
I am sure this is probably a relatively simple fix, most likely something I have overlooked. I know how to handle this when working with XSL tranformations, by supplying a catalog and the location of the schemas. Thanks in advance for any help, much appreciated.

To use an XML catalog file something like the following in your code should work:
Processor proc = new Processor(false); //false for Saxon-HE
XQueryCompiler compiler = proc.newXQueryCompiler();
XmlCatalogResolver.setCatalog("path/catalog.xml", proc.getUnderlyingConfiguration(), false);
...

EDI Must be a minimum of 1 instances of segment [UNS]

Am newbie to EDI. And i just converted the ORDERS edi file to XML using smooks api. Some of the ORDER example files are working fine in following example. But i got the following exception when i running the following edi file. Am stuck with this. Here is my example and EDI data
package example;
import org.json.JSONObject;
import org.json.XML;
import org.milyn.Smooks;
import org.milyn.SmooksException;
import org.milyn.io.StreamUtils;
import org.milyn.smooks.edi.unedifact.UNEdifactReaderConfigurator;
import org.xml.sax.SAXException;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.StringWriter;
public class Main {
public static int PRETTY_PRINT_INDENT_FACTOR = 4;
protected static String runSmooksTransform() throws IOException, SAXException, SmooksException {
Smooks smooks = new Smooks();
smooks.setReaderConfig(new UNEdifactReaderConfigurator("urn:org.milyn.edi.unedifact:d93a-mapping:*"));
try {
StringWriter writer = new StringWriter();
smooks.filterSource(new StreamSource(new FileInputStream("EDI.edi")), new StreamResult(writer));
return writer.toString();
} finally {
smooks.close();
}
}
public static void main(String[] args) throws IOException, SAXException, SmooksException {
System.out.println("\n\n==============Message In==============");
System.out.println(readInputMessage());
System.out.println("======================================\n");
String messageOut = Main.runSmooksTransform();
System.out.println("==============Message Out=============");
System.out.println(messageOut);
System.out.println("======================================\n\n");
JSONObject xmlJSONObj = XML.toJSONObject(messageOut);
String jsonPrettyPrintString = xmlJSONObj.toString(PRETTY_PRINT_INDENT_FACTOR);
System.out.println(jsonPrettyPrintString);
}
private static String readInputMessage() throws IOException {
return StreamUtils.readStreamAsString(new FileInputStream("EDI.edi"));
}
}
And the exception with Sample EDI Data
Exception in thread "main" org.milyn.SmooksException: Failed to filter source.
at org.milyn.delivery.sax.SmooksSAXFilter.doFilter(SmooksSAXFilter.java:97)
at org.milyn.delivery.sax.SmooksSAXFilter.doFilter(SmooksSAXFilter.java:64)
at org.milyn.Smooks._filter(Smooks.java:526)
at org.milyn.Smooks.filterSource(Smooks.java:482)
at org.milyn.Smooks.filterSource(Smooks.java:456)
at example.Main.runSmooksTransform(Main.java:49)
at example.Main.main(Main.java:63)
Caused by: org.milyn.edisax.EDIParseException: EDI message processing failed [ORDERS][D:93A:UN]. Must be a minimum of 1 instances of segment [UNS]. Currently at segment number 9.
at org.milyn.edisax.EDIParser.mapSegments(EDIParser.java:499)
at org.milyn.edisax.EDIParser.mapSegments(EDIParser.java:450)
at org.milyn.edisax.EDIParser.parse(EDIParser.java:426)
at org.milyn.edisax.EDIParser.parse(EDIParser.java:410)
at org.milyn.edisax.unedifact.handlers.UNHHandler.process(UNHHandler.java:97)
at org.milyn.edisax.unedifact.handlers.UNBHandler.process(UNBHandler.java:75)
at org.milyn.edisax.unedifact.UNEdifactInterchangeParser.parse(UNEdifactInterchangeParser.java:113)
at org.milyn.smooks.edi.unedifact.UNEdifactReader.parse(UNEdifactReader.java:75)
at org.milyn.delivery.sax.SAXParser.parse(SAXParser.java:76)
at org.milyn.delivery.sax.SmooksSAXFilter.doFilter(SmooksSAXFilter.java:86)
... 6 more

Bad source data will cause this.
It looks like smooks is looking for a UNS segment which isn't in your data. The section control is mandatory per the D.93A standard.

Apache Tika and Apache Solr integration through Java API

I am trying to integrate Apache tika and Apache Solr so that I can index my parse data. I'm using Solr version 4.3.1 and Tika version as 2.11.6.
The code which I am following are like:
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.UUID;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.DublinCore;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MimeTypes;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
public class Main {
private static SolrServer solr;
public static void main(String[] args) throws IOException, SAXException, TikaException {
try {
solr = new HttpSolrServer("http://localhost:8983/solr/#/"); //create solr connection
//solr.deleteByQuery( "*:*" ); //delete everything in the index; good for testing
//location of source documents
//later this will be switched to a database
String path = "C:\\content\\";
String file_html = path + "mobydick.htm";
String file_txt = path + "/home/ben/abc.warc";
String file_pdf = path + "callofthewild.pdf";
processDocument(file_html);
processDocument(file_txt);
processDocument(file_pdf);
solr.commit(); //after all docs are added, commit to the index
//now you can search at http://localhost:8983/solr/browse
}
catch (Exception ex) {
System.out.println(ex.getMessage());
}
}
private static void processDocument(String pathfilename) {
try {
InputStream input = new FileInputStream(new File(pathfilename));
//use Apache Tika to convert documents in different formats to plain text
ContentHandler textHandler = new BodyContentHandler(10*1024*1024);
Metadata meta = new Metadata();
Parser parser = new AutoDetectParser();
//handles documents in different formats:
ParseContext context = new ParseContext();
parser.parse(input, textHandler, meta, context); //convert to plain text
//collect metadata and content from Tika and other sources
//document id must be unique, use guide
UUID guid = java.util.UUID.randomUUID();
String docid = guid.toString();
//Dublin Core metadata (partial set)
String doctitle = meta.get(DublinCore.TITLE);
String doccreator = meta.get(DublinCore.CREATOR);
//other metadata
String docurl = pathfilename; //document url
//content
String doccontent = textHandler.toString();
//call to index
indexDocument(docid, doctitle, doccreator, docurl, doccontent);
}
catch (Exception ex) {
System.out.println(ex.getMessage());
}
}
private static void indexDocument(String docid, String doctitle, String doccreator, String docurl, String doccontent) {
try {
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", docid);
//map metadata fields to default schema
//location: path\solr-4.7.2\example\solr\collection1\conf\schema.xml
//Dublin Core
//thought: schema could be modified to use Dublin Core
doc.addField("title", doctitle);
doc.addField("author", doccreator);
//other metadata
doc.addField("url", docurl);
//content (and text)
//per schema, the content field is not indexed by default, used for returning and highlighting document content
//the schema "copyField" command automatically copies this to the "text" field which is indexed
doc.addField("content", doccontent);
//indexing
//when a field is indexed, like "text", Solr will handle tokenization, stemming, removal of stopwords etc, per the schema defn
//add to index
solr.add(doc);
}
catch (Exception ex) {
System.out.println(ex.getMessage());
} } }
The Error I got
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/http/NoHttpResponseException
at Main.main(Main.java:28)
Caused by: java.lang.ClassNotFoundException: org.apache.http.NoHttpResponseException
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 1 more

This does not seem to be Tika related, so I would focus on Solr.
Specifically on how you included Solr libraries and its dependencies. If you pulled them through Maven, they should have worked. But if you did it manually, perhaps you missed one or two.
Specifically, the error message is about a missing class that is distributed with Apache Commons HTTP (client) library. Perhaps are missing it in dependencies or on the classpath.

Exception thrown when trying to load Pentaho report through java program

This code was mostly taken from Pentaho's website on sample programs. The goal is to use their API to generate HTML output with a Pentaho report.
The report was created with Pentaho 3.5.0-RC2
The version of jars I am using are the following:
pentaho-reporting-engine-classic-core-3.5.0-RC2.jar
libloader-1.1.1.jar
libbase-1.1.1.jar
The exception I am getting is the following:
org.pentaho.reporting.libraries.resourceloader.ContentNotRecognizedException: None of the selected factories was able to handle the given data: ResourceKey{schema=org.pentaho.reporting.libraries.resourceloader.loader.URLResourceLoader, identifier=file:/C:/Users/elias.kopsiaftis/workspace_experimental/TestPentahoReport/bin/Test.prpt, factoryParameters={}, parent=null}
at org.pentaho.reporting.libraries.resourceloader.DefaultResourceManagerBackend.create(DefaultResourceManagerBackend.java:295)
at org.pentaho.reporting.libraries.resourceloader.ResourceManager.create(ResourceManager.java:387)
at org.pentaho.reporting.libraries.resourceloader.ResourceManager.create(ResourceManager.java:342)
at org.pentaho.reporting.libraries.resourceloader.ResourceManager.createDirectly(ResourceManager.java:205)
at ReportTester.getReportDefinition(ReportTester.java:74)
at ReportTester.getReport(ReportTester.java:25)
at ReportTester.main(ReportTester.java:84)
java.lang.NullPointerException
at ReportTester.getReport(ReportTester.java:26)
at ReportTester.main(ReportTester.java:84)
And finally, the code snippet!
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.util.Map;
import java.util.HashMap;
import java.io.*;
import org.pentaho.reporting.engine.classic.core.DataFactory;
import org.pentaho.reporting.engine.classic.core.MasterReport;
import org.pentaho.reporting.engine.classic.core.ReportProcessingException;
import org.pentaho.reporting.engine.classic.core.modules.output.table.html.HtmlReportUtil;
import org.pentaho.reporting.libraries.resourceloader.Resource;
import org.pentaho.reporting.libraries.base.util.StackableException;
import org.pentaho.reporting.libraries.resourceloader.ResourceException;
import org.pentaho.reporting.libraries.resourceloader.ResourceManager;
public class ReportTester {
private String reportName = "FormCompleteUsagePerPartnerByMonth.prpt";
private String reportPath = "";
public void getReport() {
try {
final MasterReport report = getReportDefinition();
report.getParameterValues().put("partner", "");
report.getParameterValues().put("startingdate", "2015-03-01");
report.getParameterValues().put("endingdate", "2015-03-05");
HtmlReportUtil.createStreamHTML(report, System.out);
} catch(Exception e) {
e.printStackTrace();
}
}
private MasterReport getReportDefinition() {
try {
// Using the classloader, get the URL to the reportDefinition file
final ClassLoader classloader = this.getClass().getClassLoader();
final URL reportDefinitionURL = classloader.getResource(reportPath + reportName);
if(reportDefinitionURL == null) {
System.out.println("URL was null");
}
// Parse the report file
final ResourceManager resourceManager = new ResourceManager();
resourceManager.registerDefaults();
final Resource directly = resourceManager.createDirectly(reportDefinitionURL, MasterReport.class);
return (MasterReport) directly.getResource();
} catch (Exception e) {
e.printStackTrace();
}
return null;
}
public static void main(String[] args) {
ReportTester rt = new ReportTester();
rt.getReport();
}
}
Any advice on this would be much appreciated as I have been scratching my head on this all day. Thanks in advance!

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Error indexing text from Apache Tika in Solr - java

Related

java.lang.ClassNotFoundException: org.junit.Assert / Working with docising API

Set up URI or catalog resolver with Saxon/XQuery

EDI Must be a minimum of 1 instances of segment [UNS]

Apache Tika and Apache Solr integration through Java API

Exception thrown when trying to load Pentaho report through java program

Categories

Resources