I am using the StAX XMLEventReader and XMLEventWriter.
I need to make a modified temporary copy of the original XML file and keep it in a byte array. If I do it like this (for debugging, I am writing to a file):
public boolean isCrcCorrect(Path path) throws IOException, XPathExpressionException {
ByteArrayOutputStream output = new ByteArrayOutputStream();
XMLEventFactory eventFactory = XMLEventFactory.newInstance();
XMLEventReader reader = null;
XMLEventWriter writer = null;
StreamResult result;
String tagContent;
if (!fileData.currentFilePath.equals(path.toString())) {
parseFile(path);
}
try {
System.out.println(path.toString());
reader = XMLInputFactory.newInstance().createXMLEventReader(new FileReader(path.toString()));
//writer = XMLOutputFactory.newInstance().createXMLEventWriter(output);
writer = XMLOutputFactory.newInstance().createXMLEventWriter(new FileWriter("f:\\Projects\\iqpdct\\iqpdct-domain\\src\\main\\java\\de\\iq2dev\\domain\\util\\debug.xml"));
writer.add(reader);
writer.close();
} catch(XMLStreamException strEx) {
System.out.println(strEx.getMessage());
}
crc.reset();
crc.update(output.toByteArray());
System.out.println(crc.getValue());
//return fileData.file_crc == crc.getValue();
return false;
}
the clone differs from the original.
Source:
<VendorText textId="T_VendorText" />
Clone:
<VendorText textId="T_VendorText"></VendorText>
Why is it writing the end tag? There is none in the source.
If you want a precise copy of a byte stream that happens to be an XML document, you must copy it as a byte stream. You can't copy it by providing a back-end to an XML parser, because the purpose of the parser front-end is to isolate your code from features that can vary but are semantically equivalent, such as, in your case, the two ways of writing an empty element.
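As a minimal sketch of the byte-stream approach, assuming the CRC is meant to be computed over the unmodified file contents: the class and method names below (ByteCopyCrc, copyBytes, crcOf) are made up for illustration and are not part of the original code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

final class ByteCopyCrc {
    // Reads the file as raw bytes, so the copy is identical to the original,
    // including the way empty elements are written.
    static byte[] copyBytes(Path source) throws IOException {
        return Files.readAllBytes(source);
    }

    // Computes the CRC-32 checksum of the given bytes.
    static long crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }
}
```

Because no parser is involved, `<VendorText textId="T_VendorText" />` stays exactly as it was written in the source file.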
The question is rather simple. I am using the aspose library to convert a pdf file to excel. The excel file is subsequently written to the database and this generated excel file is not needed in the future.
My method:
public void main(MultipartFile file) throws IOException {
InputStream inputStream = file.getInputStream();
Document document = new Document(inputStream);
ExcelSaveOptions options = new ExcelSaveOptions();
options.setFormat(ExcelSaveOptions.ExcelFormat.XLSX);
document.save("newExcelFile.xlsx", options);
}
In this method, the file is saved to the root folder of the project (if it is running locally). How can I avoid storing this file and make it temporary instead? The project runs on a server, and I would not like to create directories specifically for this file.
The Document.save() method has an overload for saving to an OutputStream (See here for the API reference).
Given that you can store the result to anything that implements an OutputStream, you can provide any implementation you want. One useful option is a ByteArrayOutputStream, which keeps the result in memory; another is to call Files.createTempFile() and create a FileOutputStream for that file.
For example, your code may be rewritten thus:
public byte[] convertToExcel(MultipartFile file) throws IOException {
InputStream inputStream = file.getInputStream();
Document document = new Document(inputStream);
ExcelSaveOptions options = new ExcelSaveOptions();
options.setFormat(ExcelSaveOptions.ExcelFormat.XLSX);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
document.save(baos, options);
return baos.toByteArray();
}
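If the result is too large to hold in memory comfortably, the temporary-file option could look roughly like the sketch below. The StreamWriter interface is a hypothetical stand-in for a call such as document.save(out, options), and the class name, file prefix and suffix are assumptions; only the java.nio.file calls are standard.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class TempFileConversion {
    // Hypothetical callback standing in for something like document.save(out, options).
    interface StreamWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    // Writes to a temporary file, reads the result back, and always deletes the file.
    static byte[] viaTempFile(StreamWriter writer) throws IOException {
        Path temp = Files.createTempFile("converted-", ".xlsx");
        try {
            try (OutputStream out = Files.newOutputStream(temp)) {
                writer.writeTo(out);
            }
            return Files.readAllBytes(temp);
        } finally {
            Files.deleteIfExists(temp);
        }
    }
}
```

The file is created in the system's default temporary directory, so no project-specific directory is needed on the server.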
I want to stream a ZIP file containing several very large (~1GByte) XML files. I could read the data from each zip file into a buffer and create a XMLStream from that - but to save on memory I would prefer to process the data on the fly.
@Test
public void zipStreamTest() throws IOException, XMLStreamException {
FileInputStream fis = new FileInputStream("archive.zip");
ZipInputStream zis = new ZipInputStream(fis);
ZipEntry ei;
while ((ei = zis.getNextEntry()) != null){
XMLEventReader xr = XMLInputFactory.newInstance().createXMLEventReader(zis);
while (xr.hasNext()) {
XMLEvent xe = xr.nextEvent();
// do some xml event processing..
}
zis.closeEntry();
}
zis.close();
}
The problem: I'm getting a java.io.IOException: Stream closed when executing zis.closeEntry();. When I remove that line, the same error is thrown at zis.getNextEntry() which closes previous entries if they're still open automatically.
It seems that my XML stream reader is breaking the stream at the end of the XML file so that the rest of the zip can't be processed.
Do I have an implementation error or is my conception of how streams work incorrect?
Note: To make this a minimal reproducible example, all you need is a zip file "archive.zip" which contains any valid XML file (no subdirectories inside the zip!). You can then run the snippet using JUnit.
You could try to open a separate InputStream for each entry using java.util.zip.ZipFile:
@Test
public void zipStreamTest() throws Exception {
ZipFile zipFile = new ZipFile("archive.zip");
Iterator<? extends ZipEntry> iterator = zipFile.entries().asIterator();
while (iterator.hasNext()) {
ZipEntry ze = iterator.next();
try (InputStream zis = zipFile.getInputStream(ze)) {
XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(zis);
while (reader.hasNext()) {
XMLEvent xe = reader.nextEvent();
// do some xml event processing
}
reader.close();
}
}
}
I would recommend using ZipFile instead of ZipInputStream, as suggested in the answer by Alexandra Dudkina.
However, if you're processing a data stream while, for example, downloading it, and therefore want to keep using ZipInputStream, you should wrap it in a CloseShieldInputStream from Apache Commons IO (see note 1) inside the getNextEntry() loop:
while ((ei = zis.getNextEntry()) != null) {
XMLEventReader xr = XMLInputFactory.newInstance().createXMLEventReader(new CloseShieldInputStream(zis));
// Process XML here
zis.closeEntry();
}
1) Or another similar helper class from a third-party library of your choice.
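If adding a library is not an option, the same idea can be sketched with the JDK alone: a FilterInputStream whose close() does nothing. The names NonClosingInputStream, ZipXmlCounter and countStartElements are made up for this example, and counting start elements is only a placeholder for real event processing.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

final class ZipXmlCounter {
    // Minimal stand-in for CloseShieldInputStream: close() does nothing,
    // so the XML parser cannot close the underlying ZipInputStream.
    static final class NonClosingInputStream extends FilterInputStream {
        NonClosingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() {
            // intentionally left open; the caller owns the wrapped stream
        }
    }

    // Counts start elements across all entries of a zip archive read as a stream.
    static int countStartElements(InputStream zipData) throws IOException, XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        int count = 0;
        try (ZipInputStream zis = new ZipInputStream(zipData)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                XMLEventReader reader =
                        factory.createXMLEventReader(new NonClosingInputStream(zis));
                while (reader.hasNext()) {
                    XMLEvent event = reader.nextEvent();
                    if (event.isStartElement()) {
                        count++;
                    }
                }
                reader.close();
                zis.closeEntry();
            }
        }
        return count;
    }
}
```

Only the try-with-resources block closes the ZipInputStream, after the last entry has been read.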
I have code that should read an HTML page and write the result to another file. The BufferedWriter writes the file, but when the code is run with a different URL it doesn't append; it rewrites the file and the previous content disappears.
The solution required is that when Jsoup processes a new HTML page, the result is added to the output file instead of overwriting it.
I have tried writer types other than BufferedWriter.
public class WriteFile
{
public static void main(String args[]) throws IOException
{
String url = "http://www.someurl.com/registers";
Document doc = Jsoup.connect(url).get();
Elements es = doc.getElementsByClass("a_code");
for (Element clas : es)
{
System.out.println(clas.text());
BufferedWriter writer = new BufferedWriter(new FileWriter("D://Author.html"));
writer.append(clas.text());
writer.close();
}
}
}
Don't mistake the append-method of the BufferedWriter as appending content to the file. It actually appends to the given writer.
To actually append additional content to the file you need to specify that when opening the file writer. FileWriter has an additional constructor parameter allowing to specify that:
new FileWriter("D://Author.html", /* append = */ true)
You may even be interested in the Java Files API instead, so you can spare yourself instantiating your own BufferedWriter, etc.:
Files.write(Paths.get("D://Author.html"), clas.text().getBytes(), StandardOpenOption.CREATE, StandardOpenOption.APPEND);
Your loop and what you are writing can probably be simplified further to something like the following (you may then even omit the APPEND open option again, if that makes sense):
Files.write(Paths.get("D://Author.html"),
String.join("" /* or new line? */,
doc.getElementsByClass("a_code")
.eachText()
).getBytes(),
StandardOpenOption.CREATE, StandardOpenOption.APPEND);
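Putting these points together, a minimal sketch might open the file once in append mode, outside the loop, and write each text on its own line. The class and method names (AppendLines, appendAll) and the choice of UTF-8 and line separators are assumptions for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

final class AppendLines {
    // Opens the file once in append mode and writes every entry on its own line.
    static void appendAll(Path target, List<String> texts) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (String text : texts) {
                writer.write(text);
                writer.newLine();
            }
        }
    }
}
```

With Jsoup this could be called as appendAll(Paths.get("D://Author.html"), doc.getElementsByClass("a_code").eachText()); running it again with another URL then adds to the existing content.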
Having read the documentation for copyBytes (of IOUtils), we can see its parameters here:
copyBytes:
public static void copyBytes(InputStream in,
OutputStream out,
int buffSize,
boolean close) throws IOException
Copies from one stream to another.
Parameters:
in - InputStream to read from
out - OutputStream to write to
buffSize - the size of the buffer
close - whether or not close the InputStream and OutputStream at the end. The streams are closed in the finally clause.
Throws:
IOException
So, with this information in mind, I've got a data structure like this:
List<String> inputLinesObject = IOUtils.readLines(in, "UTF-8");
This is what I hope will be an extensible array list of strings that I can populate with data from the file I'm reading with that copyBytes method.
However, here's the code I use when I call the copyBytes method:
IOUtils.copyBytes(in, inputLinesObject, 4096, false);
The place where you see inputLinesObject is where I'd like to put my extensible array list, so that it can collect the data and convert it to string format. The way I'm doing it now is not the right way, and I'm stuck: I can't see how to collect that data as an array list of strings. (What is it at this point? Since it comes from an InputStream, does that make it a byte array?)
Here's the full program. It reads in files from HDFS and is supposed to (though currently does not) output them to an array list of strings, which will finally be logged to the console with System.out.println.
// this concatenates output to terminal from hdfs
public static void main(String[] args) throws IOException {
// supply this as input
String uri = args[0];
// reading in from hdfs
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(uri), conf);
FSDataInputStream in = null;
// create arraylist for hdfs file to flow into
//List<String> inputLinesObject = new ArrayList<String>();
List<String> inputLinesObject = IOUtils.readLines(in, "UTF-8");
// TODO: how to make this go to a file rather than to the System.out?
try
{
in = fs.open(new Path(uri));
// The way:
IOUtils.copyBytes(in, inputLinesObject, 4096, false);
}
finally{
IOUtils.closeStream(in);
}
Use ByteArrayOutputStream, see here:
// supply this as input
String uri = args[0];
// reading in from hdfs
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create(uri), conf);
FSDataInputStream in = null;
// create arraylist for hdfs file to flow into
List<String> list = new ArrayList<String>(); // Initialize List
ByteArrayOutputStream baos = new ByteArrayOutputStream();
OutputStream os = new DataOutputStream(baos);
try
{
in = fs.open(new Path(uri));
// The way:
IOUtils.copyBytes(in, os, 4096, false);
}
finally{
IOUtils.closeStream(in);
}
byte[] data = baos.toByteArray();
String dataAsString = new String(data, "UTF-8"); // or whatever encoding
System.out.println(dataAsString);
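The snippet above ends with one large string. If a list of lines is what is needed, the bytes could be split afterwards, for example as in this sketch. It uses only the JDK, assumes the data is UTF-8 text, and the names BytesToLines and toLines are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class BytesToLines {
    // Decodes the bytes as UTF-8 and returns one list entry per line.
    static List<String> toLines(byte[] data) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

Calling toLines(baos.toByteArray()) after the copy would fill the list that the question asks for.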
I have tried different ways to write a string to a file.
File file = new File(eventPath)
file.withWriterAppend { it << xmlDocument }
OR
file << xmlDocument
With either of these, the string is cut off once the file size reaches 1 KB.
If I do it this way (as explained here: java: write to xml file)
File file = new File("foo")
if (file.exists()) {
assert file.delete()
assert file.createNewFile()
}
boolean append = true
FileWriter fileWriter = new FileWriter(file, append)
BufferedWriter buffWriter = new BufferedWriter(fileWriter)
100.times { buffWriter.write "foo" }
buffWriter.flush()
buffWriter.close()
the string gets repeated.
How can I use the first method without a limit on the string size? Thanks
Does:
new File(eventPath).withWriterAppend { it.writeLine xmlDocument }
work?