TikaException: Failed to close temporary resources while parsing PDF - java

I am using Apache Tika on Windows 10, JRE 1.8.0_241, and I've imported Tika 1.24.1 using Ant. I have the code below to extract content from a PDF:
import java.io.File;
import java.io.IOException;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

public class TikaExtraction {
    public static void main(final String[] args) throws IOException, TikaException {
        // The PDF to extract sits at an absolute path on the desktop
        File file = new File("C:\\Users\\myPC\\Desktop\\testPDF.pdf");
        // Instantiating the Tika facade class
        Tika tika = new Tika();
        String filecontent = tika.parseToString(file);
        System.out.println("Extracted Content: " + filecontent);
    }
}
I get the exception below:
Exception in thread "main" org.apache.tika.exception.TikaException: Failed to close temporary resources
at org.apache.tika.io.TemporaryResources.dispose(TemporaryResources.java:174)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:150)
at org.apache.tika.Tika.parseToString(Tika.java:527)
at com.oracle.cegbu.filesearch.service.kafka.TikaExtraction.main(TikaExtraction.java:28)
Caused by: java.nio.file.FileSystemException: C:\Users\myPC\AppData\Local\Temp\apache-tika-6518312717498705085.tmp: The process cannot access the file because it is being used by another process.
at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
at sun.nio.fs.WindowsFileSystemProvider.implDelete(Unknown Source)
at sun.nio.fs.AbstractFileSystemProvider.delete(Unknown Source)
at java.nio.file.Files.delete(Unknown Source)
at org.apache.tika.io.TemporaryResources$1.close(TemporaryResources.java:84)
at org.apache.tika.io.TemporaryResources.close(TemporaryResources.java:145)
at org.apache.tika.io.TemporaryResources.dispose(TemporaryResources.java:172)
... 3 more

Let's use an InputStream instead of the File, wrapped in try-with-resources:
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

public class TikaExtraction {
    public static void main(final String[] args) throws IOException, TikaException {
        // Instantiating the Tika facade class
        Tika tika = new Tika();
        File file = new File("C:\\Users\\myPC\\Desktop\\testPDF.pdf");
        // try-with-resources guarantees the stream is closed after parsing
        try (InputStream inputStream = new FileInputStream(file)) {
            String filecontent = tika.parseToString(inputStream);
            System.out.println("Extracted Content: " + filecontent);
        }
    }
}
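If temp-file cleanup still fails on Windows, another option worth trying (a sketch, not from the original thread) is to hand Tika a TikaInputStream backed by the file's Path; parsers that need file access can then read the original file directly instead of a spooled temporary copy:
import java.io.File;
import java.io.IOException;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;
import org.apache.tika.io.TikaInputStream;

public class TikaExtraction {
    public static void main(final String[] args) throws IOException, TikaException {
        Tika tika = new Tika();
        File file = new File("C:\\Users\\myPC\\Desktop\\testPDF.pdf");
        // TikaInputStream.get(Path) exposes the underlying file to parsers,
        // so Tika does not have to create (and later delete) a temp copy
        try (TikaInputStream stream = TikaInputStream.get(file.toPath())) {
            String filecontent = tika.parseToString(stream);
            System.out.println("Extracted Content: " + filecontent);
        }
    }
}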

Related

Not able to read from Parquet Reader along with Hadoop configuration using Java

I need to read a Parquet file from S3 using Java with Maven support.
public static void main(String[] args) throws IOException, URISyntaxException {
    Path path = new Path("s3", "batch-dev", "/aman/part-e52b.c000.snappy.parquet");
    Configuration conf = new Configuration();
    conf.set("fs.s3.awsAccessKeyId", "xxx");
    conf.set("fs.s3.awsSecretAccessKey", "xxxxx");
    InputFile file = HadoopInputFile.fromPath(path, conf);
    ParquetFileReader reader2 = ParquetFileReader.open(conf, path);
    //MessageType schema = reader2.getFooter().getFileMetaData().getSchema();
    //System.out.println(schema);
}
The above code gives a FileNotFoundException.
Note that I am using the s3 scheme and not s3a. I am not sure whether Hadoop has support for the s3 scheme.
Exception in thread "main" java.io.FileNotFoundException: s3://batch-dev/aman/part-e52b.c000.snappy.parquet: No such file or directory.
at org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:334)
at org.apache.parquet.hadoop.util.HadoopInputFile.fromPath(HadoopInputFile.java:39)
at com.bidgely.cloud.core.cass.gb.S3GBRawDataHandler.main(S3GBRawDataHandler.java:505)
However, with the same path, I am able to get the object if I use the s3Client. The problem is that I cannot read the Parquet data from the input stream returned by the code below.
public static void main(String args[]) {
    AWSCredentials credentials = new BasicAWSCredentials("XXXXX", "XXXXX");
    AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
            .withRegion("us-west-2")
            .withCredentials(new AWSStaticCredentialsProvider(credentials))
            .build();
    S3Object object = s3Client.getObject(new GetObjectRequest("batch-dev", "/aman/part-e52b.c000.snappy.parquet"));
    System.out.println(object.getObjectContent());
}
Kindly help me with a solution. [I have to use Java only.]
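One direction worth trying (a sketch, not an answer from the original thread): modern Hadoop ships its S3 support through the s3a connector rather than the legacy s3 scheme, so with hadoop-aws and the AWS SDK on the classpath the same read can be attempted against an s3a:// path using the fs.s3a.* credential keys:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.util.HadoopInputFile;
import org.apache.parquet.schema.MessageType;

public class S3aParquetSketch {
    public static void main(String[] args) throws Exception {
        // s3a:// instead of s3://; the access keys are placeholders
        Path path = new Path("s3a://batch-dev/aman/part-e52b.c000.snappy.parquet");
        Configuration conf = new Configuration();
        conf.set("fs.s3a.access.key", "xxx");
        conf.set("fs.s3a.secret.key", "xxxxx");
        // ParquetFileReader implements Closeable, so try-with-resources cleans up
        try (ParquetFileReader reader =
                ParquetFileReader.open(HadoopInputFile.fromPath(path, conf))) {
            MessageType schema = reader.getFooter().getFileMetaData().getSchema();
            System.out.println(schema);
        }
    }
}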

I am trying to convert .jsp to .pdf format using PD4ML. When I start executing my code, I get the exception below.

public static void main(String[] args) {
    try {
        PdfViewerStarter jt = new PdfViewerStarter();
        jt.doConversion("http://pd4ml.com/sample.htm", "D:/pd4ml.pdf");
    } catch (Exception e) {
        e.printStackTrace();
    }
}

// userSpaceWidth, topValue, leftValue, bottomValue and rightValue are
// fields defined elsewhere in the class
public void doConversion(String url, String outputPath)
        throws InvalidParameterException, MalformedURLException, IOException {
    File output = new File(outputPath);
    java.io.FileOutputStream fos = new java.io.FileOutputStream(output);
    PD4ML pd4ml = new PD4ML();
    pd4ml.setHtmlWidth(userSpaceWidth);
    pd4ml.setPageSize(pd4ml.changePageOrientation(PD4Constants.A4));
    pd4ml.setPageInsetsMM(new Insets(topValue, leftValue, bottomValue, rightValue));
    pd4ml.useTTF("c:/windows/fonts", true);
    pd4ml.render(new URL(url), fos);
    fos.close();
    if (Desktop.isDesktopSupported()) {
        Desktop.getDesktop().open(output);
    } else {
        System.out.println("Awt Desktop is not supported!");
    }
    System.out.println(outputPath + "\ndone.");
}
Error:
Error. ss_css2.jar is not in the classpath. See README.txt
Exception in thread "main" java.lang.NoClassDefFoundError: org/w3c/css/sac/CSSException
at org.zefer.html.doc.PD4MLHtmlParser.o00000(Unknown Source)
at org.zefer.html.doc.PD4MLHtmlParser.<init>(Unknown Source)
at org.zefer.pd4ml.PD4ML.super(Unknown Source)
at org.zefer.pd4ml.PD4ML.render(Unknown Source)
at TestForPdfPD4ML.doConversion(TestForPdfPD4ML.java:42)
at TestForPdfPD4ML.main(TestForPdfPD4ML.java:24)
Caused by: java.lang.ClassNotFoundException: org.w3c.css.sac.CSSException
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 6 more
The JVM cannot find org.w3c.css.sac.CSSException on the classpath; the first error line says PD4ML expects it in ss_css2.jar. Add that jar (and PD4ML's other dependency jars) to the classpath. If you are running from the command line, try something like:
java -cp .;"The full path where the Jar File is\jarfile.jar" YourRunningClassFile

Jackson writeValue() to file not working with relative path

I'm trying to write an object to a JSON file with Jackson.
If I provide an absolute path, "D:/Projects/quiz-red/src/main/resources/com/models/Quizzes.json", it works and the file appears in the directory.
But if I provide a relative path, "/com/models/Quizzes.json", I just get Process finished with exit code 0 in the console and nothing happens. What am I doing wrong?
Here is my code:
// mapper is an ObjectMapper field defined elsewhere in the class
public static void writeEntityToJson(Object jsonDataObject, String path) throws IOException {
    ObjectWriter writer = mapper.writer(new DefaultPrettyPrinter());
    writer.writeValue(new File(path), jsonDataObject);
}

public static void main(String[] args) throws IOException {
    Quiz quiz = new Quiz(5L, "Title", "Short desc");
    writeEntityToJson(quiz, "/com/models/Quizzes2.json");
}
I want to save the file to resources from DataProvider using a relative path.
Exception:
Exception in thread "main" java.io.FileNotFoundException: com\models\Quizzes5.json (The system cannot find the path specified)
at java.base/java.io.FileOutputStream.open0(Native Method)
at java.base/java.io.FileOutputStream.open(FileOutputStream.java:298)
at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:237)
at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:187)
at com.fasterxml.jackson.core.JsonFactory.createGenerator(JsonFactory.java:1223)
at com.fasterxml.jackson.databind.ObjectWriter.writeValue(ObjectWriter.java:942)
at com.utils.DataProvider.writeEntityToJson(DataProvider.java:33)
at com.utils.DataProvider.main(DataProvider.java:50)
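For what it's worth (a hedged sketch, not an answer from the original thread): java.io.File does not resolve paths against the classpath. A leading "/" makes the path absolute from the drive root, while a plain relative path resolves against the JVM working directory, usually the project root when run from an IDE. One option is to build the target under src/main/resources explicitly and create the parent directories first:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import com.fasterxml.jackson.databind.ObjectMapper;

public class DataProvider {
    private static final ObjectMapper mapper = new ObjectMapper();

    public static void main(String[] args) throws IOException {
        // Quiz is the asker's model class
        Quiz quiz = new Quiz(5L, "Title", "Short desc");
        // Resolved against the working directory (the project root in most IDEs)
        Path target = Paths.get("src/main/resources/com/models/Quizzes2.json");
        // FileOutputStream will not create missing directories, so do it up front
        Files.createDirectories(target.getParent());
        mapper.writerWithDefaultPrettyPrinter().writeValue(target.toFile(), quiz);
    }
}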

java.lang.NoClassDefFoundError: math/geom2d/line/LinearShape2D (activiti)

I am trying to convert a BPMN 2.0 file to JSON, but I get this error:
java.lang.NoClassDefFoundError: math/geom2d/line/LinearShape2D
My code:
// filename is defined elsewhere and points at the BPMN 2.0 XML file
public void convertXmlToJson() throws Exception {
    XMLStreamReader streamReader = null;
    BpmnXMLConverter bpmnXMLConverter = new BpmnXMLConverter();
    XMLInputFactory factory = XMLInputFactory.newInstance();
    // Get a Reader connected to XML input from filename
    Reader reader = new FileReader(filename);
    streamReader = factory.createXMLStreamReader(reader);
    ObjectNode node = new BpmnJsonConverter().convertToJson(bpmnXMLConverter.convertToBpmnModel(streamReader));
    node.toString();
}
Well, one of the JARs in your build path is trying to load the class math.geom2d.line.LinearShape2D, but that class is not in your build path, so it cannot be found.
Add the jar containing this class to the build path and it should work.
It seems you need the javaGeom jar:
http://geom-java.sourceforge.net/
http://geom-java.sourceforge.net/api/math/geom2d/line/class-use/LinearShape2D.html

java.lang.NoClassDefFoundError when reading hadoop SequenceFile

I am trying to read a SequenceFile with a custom Writable in it.
Here's the code:
public static void main(String[] args) throws IOException {
    //String iFile = null;
    String uri = "/tmp/part-r-00000";
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(uri), conf);
    Path path = new Path(uri);
    // MyClass is the custom Writable stored in the SequenceFile
    MyClass value = new MyClass();
    SequenceFile.Reader reader = null;
    try {
        reader = new SequenceFile.Reader(fs, path, conf);
        while (reader.next(value)) {
            System.out.println(value.getUrl());
            System.out.println(value.getHeader());
            System.out.println(value.getImages().size());
            break;
        }
    } catch (Exception e) { // Catch exception if any
        System.err.println("Error: " + e.getMessage());
    } finally {
        IOUtils.closeStream(reader);
    }
}
When I run this, I get the following exception:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:37)
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:34)
at org.apache.hadoop.security.UgiInstrumentation.create(UgiInstrumentation.java:51)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:196)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:159)
at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:216)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:409)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:395)
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1418)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1319)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:226)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:109)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:210)
at com.iathao.run.site.emr.DecryptMapReduceOutput.main(DecryptMapReduceOutput.java:32)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
... 14 more
All libraries are packaged into the jar file and are present. What's wrong, and how do I fix this?
The class org.apache.commons.configuration.Configuration lives in commons-configuration-*.jar (a dependency of hadoop-common), so that jar has to be on the runtime classpath. Add it as a dependency.
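To confirm whether the jar is actually visible at runtime, a quick sanity check (a hedged suggestion, not from the original answer) is to load the class reflectively before opening the SequenceFile:
// Fails fast with a clear message if commons-configuration is missing
try {
    Class.forName("org.apache.commons.configuration.Configuration");
} catch (ClassNotFoundException e) {
    System.err.println("commons-configuration is not on the runtime classpath");
}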
