Un/marshalling base64 encoded binary data as stream

Un/marshalling base64 encoded binary data as stream - java

I have an application importing and exporting data from an Oracle database to/from XML using JAXB. Now there are some BLOB fields in the DB containing uploaded files which I would like to have in the XML as a base64 encoded string. This works quite well out of the box with JAXB using #XmlSchemaType(name = "base64Binary") as done below:
#XmlType
public class DocumentTemplateFile {
// other fields ommited
#XmlElement(required = true)
#XmlSchemaType(name = "base64Binary")
private byte[] data;
// other code ommited
}
The problem with this solution is that the whole file content is stored in memory because of the byte array. Depending on the size of the file this could cause some issues.
I was therefore wondering if there's a way to create an XmlAdapter or similar that gets Streams from and to the file, so that I can stream it directly to / from the DB's BLOB without having the whole content in memory. I was thinking on something similar to this:
public class BlobXmlAdapter extends XmlAdapter<InputStream, OutputStream> {
#Override
public InputStream marshal(final OutputStream value) throws Exception {
return null;
}
#Override
public OutputStream unmarshal(final InputStream value) throws Exception {
return null;
}
}
This is obviously only an illustrative example so that you can have an idea of what I'm looking for. The end solution doesn't necessarily have to make use of XmlAdaters. All I need is a way to hook on the un/marshalling process and stream the data through a buffer / queue rather than storing everything in a byte array.

this solution uses following third party library. you should use following maven dependency:
<dependency>
<groupId>jlibs</groupId>
<artifactId>jlibs-xsd</artifactId>
<version>2.0</version>
</dependency>
<repository>
<id>jlibs-snapshots-repository</id>
<name>JLibs Snapshots Repository</name>
<url>https://raw.githubusercontent.com/santhosh-tekuri/maven-repository/master</url>
<layout>default</layout>
</repository>
We need to use following custom XmlAdapter:
import javax.xml.bind.annotation.adapters.XmlAdapter;
import java.io.File;
/**
* #author Santhosh Kumar Tekuri
*/
public class Base64Adapter extends XmlAdapter<String, File>{
#Override
public File unmarshal(String v) throws Exception{
return new File(v);
}
#Override
public String marshal(File v) throws Exception{
throw new UnsupportedOperationException();
}
}
now change your pojo to use above adapter:
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlSchemaType;
import javax.xml.bind.annotation.adapters.XmlJavaTypeAdapter;
import java.io.File;
#XmlRootElement
public class DocumentTemplateFile {
#XmlElement(required = true)
public String userName;
#XmlElement(required = true)
#XmlSchemaType(name = "base64Binary")
#XmlJavaTypeAdapter(Base64Adapter.class)
public File data;
}
now the following helper class should be used to read xml file:
import jlibs.xml.Namespaces;
import jlibs.xml.xsd.DOMLSInputList;
import jlibs.xml.xsd.XSParser;
import jlibs.xml.xsd.XSUtil;
import org.apache.xerces.xs.XSElementDeclaration;
import org.apache.xerces.xs.XSModel;
import org.apache.xerces.xs.XSSimpleTypeDefinition;
import org.apache.xerces.xs.XSTypeDefinition;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.SchemaOutputResolver;
import javax.xml.namespace.QName;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Result;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import java.io.*;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
/**
* #author Santhosh Kumar Tekuri
*/
public class JAXBBlobUtil{
public static XSModel generateSchemas(Class clazz) throws Exception{
final Map<String, ByteArrayOutputStream> schemas = new HashMap<String, ByteArrayOutputStream>();
JAXBContext.newInstance(clazz).generateSchema(new SchemaOutputResolver(){
#Override
public Result createOutput(String namespaceUri, String suggestedFileName) throws IOException{
ByteArrayOutputStream bout = new ByteArrayOutputStream();
schemas.put(suggestedFileName, bout);
StreamResult result = new StreamResult(bout);
result.setSystemId(suggestedFileName);
return result;
}
});
DOMLSInputList lsInputList = new DOMLSInputList();
for(Map.Entry<String, ByteArrayOutputStream> entry : schemas.entrySet()){
ByteArrayInputStream bin = new ByteArrayInputStream(entry.getValue().toByteArray());
lsInputList.addStream(bin, entry.getKey(), null);
}
return new XSParser().parse(lsInputList);
}
private static Object unmarshal(Class clazz, InputSource is) throws Exception{
XSModel xsModel = generateSchemas(clazz);
JAXBContext context = JAXBContext.newInstance(clazz);
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setNamespaceAware(true);
XMLReader xmlReader = factory.newSAXParser().getXMLReader();
xmlReader = new Base64Filter(xmlReader, xsModel);
return context.createUnmarshaller().unmarshal(new SAXSource(xmlReader, is));
}
private static class Base64Filter extends XMLFilterImpl{
private XSModel schema;
private List<QName> xpath = new ArrayList();
private FileWriter fileWriter;
public Base64Filter(XMLReader parent, XSModel schema){
super(parent);
this.schema = schema;
}
#Override
public void startDocument() throws SAXException{
xpath.clear();
super.startDocument();
}
#Override
public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException{
super.startElement(uri, localName, qName, atts);
xpath.add(new QName(uri, localName));
XSElementDeclaration elem = XSUtil.findElementDeclaration(schema, this.xpath);
if(elem!=null){
XSTypeDefinition type = elem.getTypeDefinition();
if(type.getTypeCategory()==XSTypeDefinition.SIMPLE_TYPE){
XSSimpleTypeDefinition simpleType = (XSSimpleTypeDefinition)type;
while(!Namespaces.URI_XSD.equals(simpleType.getNamespace()))
simpleType = (XSSimpleTypeDefinition)simpleType.getBaseType();
if("base64Binary".equals(simpleType.getName())){
try{
File file = File.createTempFile("data", "binary");
file.deleteOnExit();
fileWriter = new FileWriter(file);
String absolutePath = file.getAbsolutePath();
super.characters(absolutePath.toCharArray(), 0, absolutePath.length());
}catch(IOException ex){
throw new SAXException(ex);
}
}
}
}
}
#Override
public void characters(char[] ch, int start, int length) throws SAXException{
try{
if(fileWriter==null)
super.characters(ch, start, length);
else
fileWriter.write(ch, start, length);
}catch(IOException ex){
throw new SAXException(ex);
}
}
#Override
public void endElement(String uri, String localName, String qName) throws SAXException{
xpath.remove(xpath.size() - 1);
try{
if(fileWriter!=null)
fileWriter.close();
fileWriter = null;
}catch(IOException ex){
throw new SAXException(ex);
}
super.endElement(uri, localName, qName);
}
};
}
Now read xml file as below:
public static void main(String[] args) throws Exception{
DocumentTemplateFile obj = (DocumentTemplateFile)unmarshal(DocumentTemplateFile.class, new InputSource("sample.xml"));
// obj.data refers to File which contains base64 encoded data
}

create custom XmlAdapter as below:
public class Base64FileAdapter extends XmlAdapter<String, File>{
#Override
public String marshal(File file) throws Exception {
// todo: read file and convert to base64 and return
}
#Override
public File unmarshal(String data) throws Exception {
File file = File.createTempFile("dataFile", "binary");
file.deleteOnExit();
//todo: base64 decode string data and write bytes to file
return file;
}
}
now inside you bean, use it:
#XmlElement(required = true)
#XmlJavaTypeAdapter(Base64FileAdapter.class)
private File dataFile;
now the entire binary content is stored in file. you can read/write from this file. and this file is deleted on jvm exit.

Related

StreamingFileSink bulk writer results in some checkpoint error when running in AWS EMR

Unable to use StreamingFileSink and store incoming events in compressed fashion.
I am trying to use StreamingFileSink to write unbounded event stream to S3. In the process, I would like to compress the data to make better use of storage size available.
I wrote a compressed string writer, by borrowing some code from SequenceFileWriterFactory from flink. It fails with the exception I described below.
If I try to use BucketingSink, it works great.
Using BucketingSink, I approached compressed string write as below. Again, I borrowed this code from some other pull request.
import org.apache.flink.streaming.connectors.fs.StreamWriterBase;
import org.apache.flink.streaming.connectors.fs.Writer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.Compressor;
import java.io.IOException;
public class CompressionStringWriter<T> extends StreamWriterBase<T> implements Writer<T> {
private static final long serialVersionUID = 3231207311080446279L;
private String codecName;
private String separator;
public String getCodecName() {
return codecName;
}
public String getSeparator() {
return separator;
}
private transient CompressionOutputStream compressedOutputStream;
public CompressionStringWriter(String codecName, String separator) {
this.codecName = codecName;
this.separator = separator;
}
public CompressionStringWriter(String codecName) {
this(codecName, System.lineSeparator());
}
protected CompressionStringWriter(CompressionStringWriter<T> other) {
super(other);
this.codecName = other.codecName;
this.separator = other.separator;
}
#Override
public void open(FileSystem fs, Path path) throws IOException {
super.open(fs, path);
Configuration conf = fs.getConf();
CompressionCodecFactory codecFactory = new CompressionCodecFactory(conf);
CompressionCodec codec = codecFactory.getCodecByName(codecName);
if (codec == null) {
throw new RuntimeException("Codec " + codecName + " not found");
}
Compressor compressor = CodecPool.getCompressor(codec, conf);
compressedOutputStream = codec.createOutputStream(getStream(), compressor);
}
#Override
public void close() throws IOException {
if (compressedOutputStream != null) {
compressedOutputStream.close();
compressedOutputStream = null;
} else {
super.close();
}
}
#Override
public void write(Object element) throws IOException {
getStream();
compressedOutputStream.write(element.toString().getBytes());
compressedOutputStream.write(this.separator.getBytes());
}
#Override
public CompressionStringWriter<T> duplicate() {
return new CompressionStringWriter<>(this);
}
}
BucketingSink<DeviceEvent> bucketingSink = new BucketingSink<>("s3://"+ this.bucketName + "/" + this.objectPrefix);
bucketingSink
.setBucketer(new OrgIdBasedBucketAssigner())
.setWriter(new CompressionStringWriter<DeviceEvent>("Gzip", "\n"))
.setPartPrefix("file-")
.setPartSuffix(".gz")
.setBatchSize(1_500_000);
The one with BucketingSink works.
Now my code snippets using StreamingFileSink involves the below set of code.
import org.apache.flink.api.common.serialization.BulkWriter;
import java.io.IOException;
public class CompressedStringBulkWriter<T> implements BulkWriter<T> {
private final CompressedStringWriter compressedStringWriter;
public CompressedStringBulkWriter(final CompressedStringWriter compressedStringWriter) {
this.compressedStringWriter = compressedStringWriter;
}
#Override
public void addElement(T element) throws IOException {
this.compressedStringWriter.write(element);
}
#Override
public void flush() throws IOException {
this.compressedStringWriter.flush();
}
#Override
public void finish() throws IOException {
this.compressedStringWriter.close();
}
}
import org.apache.flink.api.common.serialization.BulkWriter;
import org.apache.flink.core.fs.FSDataOutputStream;
import org.apache.hadoop.conf.Configuration;
import java.io.IOException;
public class CompressedStringBulkWriterFactory<T> implements BulkWriter.Factory<T> {
private SerializableHadoopConfiguration serializableHadoopConfiguration;
public CompressedStringBulkWriterFactory(final Configuration hadoopConfiguration) {
this.serializableHadoopConfiguration = new SerializableHadoopConfiguration(hadoopConfiguration);
}
#Override
public BulkWriter<T> create(FSDataOutputStream out) throws IOException {
return new CompressedStringBulkWriter(new CompressedStringWriter(out, serializableHadoopConfiguration.get(), "Gzip", "\n"));
}
}
import org.apache.flink.core.fs.FSDataOutputStream;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.Path;
import org.apache.flink.runtime.fs.hdfs.HadoopFileSystem;
import org.apache.flink.util.Preconditions;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.Compressor;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.io.Serializable;
public class CompressedStringWriter<T> implements Serializable {
private static final Logger LOG = LoggerFactory.getLogger(CompressedStringWriter.class);
private static final long serialVersionUID = 2115292142239557448L;
private String separator;
private transient CompressionOutputStream compressedOutputStream;
public CompressedStringWriter(FSDataOutputStream out, Configuration hadoopConfiguration, String codecName, String separator) {
this.separator = separator;
try {
Preconditions.checkNotNull(hadoopConfiguration, "Unable to determine hadoop configuration using path");
CompressionCodecFactory codecFactory = new CompressionCodecFactory(hadoopConfiguration);
CompressionCodec codec = codecFactory.getCodecByName(codecName);
Preconditions.checkNotNull(codec, "Codec " + codecName + " not found");
LOG.info("The codec name that was loaded from hadoop {}", codec);
Compressor compressor = CodecPool.getCompressor(codec, hadoopConfiguration);
this.compressedOutputStream = codec.createOutputStream(out, compressor);
LOG.info("Setup a compressor for codec {} and compressor {}", codec, compressor);
} catch (IOException ex) {
throw new RuntimeException("Unable to compose a hadoop compressor for the path", ex);
}
}
public void flush() throws IOException {
if (compressedOutputStream != null) {
compressedOutputStream.flush();
}
}
public void close() throws IOException {
if (compressedOutputStream != null) {
compressedOutputStream.close();
compressedOutputStream = null;
}
}
public void write(T element) throws IOException {
compressedOutputStream.write(element.toString().getBytes());
compressedOutputStream.write(this.separator.getBytes());
}
}
import org.apache.hadoop.conf.Configuration;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
public class SerializableHadoopConfiguration implements Serializable {
private static final long serialVersionUID = -1960900291123078166L;
private transient Configuration hadoopConfig;
SerializableHadoopConfiguration(Configuration hadoopConfig) {
this.hadoopConfig = hadoopConfig;
}
Configuration get() {
return this.hadoopConfig;
}
// --------------------
private void writeObject(ObjectOutputStream out) throws IOException {
this.hadoopConfig.write(out);
}
private void readObject(ObjectInputStream in) throws IOException {
final Configuration config = new Configuration();
config.readFields(in);
if (this.hadoopConfig == null) {
this.hadoopConfig = config;
}
}
}
My actual flink job
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
Properties kinesisConsumerConfig = new Properties();
...
...
DataStream<DeviceEvent> kinesis =
env.addSource(new FlinkKinesisConsumer<>(this.streamName, new DeviceEventSchema(), kinesisConsumerConfig)).name("source")
.setParallelism(16)
.setMaxParallelism(24);
final StreamingFileSink<DeviceEvent> bulkCompressStreamingFileSink = StreamingFileSink.<DeviceEvent>forBulkFormat(
path,
new CompressedStringBulkWriterFactory<>(
BucketingSink.createHadoopFileSystem(
new Path("s3a://"+ this.bucketName + "/" + this.objectPrefix),
null).getConf()))
.withBucketAssigner(new OrgIdBucketAssigner())
.build();
deviceEventDataStream.addSink(bulkCompressStreamingFileSink).name("bulkCompressStreamingFileSink").setParallelism(16);
env.execute();
I expect data to be saved in S3 as multiple files. Unfortunately no files are being created.
In the logs, I see below exception
2019-05-15 22:17:20,855 INFO org.apache.flink.runtime.taskmanager.Task - Sink: bulkCompressStreamingFileSink (11/16) (c73684c10bb799a6e0217b6795571e22) switched from RUNNING to FAILED.
java.lang.Exception: Could not perform checkpoint 1 for operator Sink: bulkCompressStreamingFileSink (11/16).
at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:595)
at org.apache.flink.streaming.runtime.io.BarrierBuffer.notifyCheckpoint(BarrierBuffer.java:396)
at org.apache.flink.streaming.runtime.io.BarrierBuffer.processBarrier(BarrierBuffer.java:292)
at org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:200)
at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:209)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.Exception: Could not complete snapshot 1 for operator Sink: bulkCompressStreamingFileSink (11/16).
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:422)
at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(StreamTask.java:1113)
at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1055)
at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:729)
at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:641)
at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:586)
... 8 more
Caused by: java.io.IOException: Stream closed.
at org.apache.flink.fs.s3.common.utils.RefCountedFile.requireOpened(RefCountedFile.java:117)
at org.apache.flink.fs.s3.common.utils.RefCountedFile.write(RefCountedFile.java:74)
at org.apache.flink.fs.s3.common.utils.RefCountedBufferingFileStream.flush(RefCountedBufferingFileStream.java:105)
at org.apache.flink.fs.s3.common.writer.S3RecoverableFsDataOutputStream.closeAndUploadPart(S3RecoverableFsDataOutputStream.java:199)
at org.apache.flink.fs.s3.common.writer.S3RecoverableFsDataOutputStream.closeForCommit(S3RecoverableFsDataOutputStream.java:166)
at org.apache.flink.streaming.api.functions.sink.filesystem.PartFileWriter.closeForCommit(PartFileWriter.java:71)
at org.apache.flink.streaming.api.functions.sink.filesystem.BulkPartWriter.closeForCommit(BulkPartWriter.java:63)
at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.closePartFile(Bucket.java:239)
at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.prepareBucketForCheckpointing(Bucket.java:280)
at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.onReceptionOfCheckpoint(Bucket.java:253)
at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.snapshotActiveBuckets(Buckets.java:244)
at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.snapshotState(Buckets.java:235)
at org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink.snapshotState(StreamingFileSink.java:347)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.trySnapshotFunctionState(StreamingFunctionUtils.java:118)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.snapshotFunctionState(StreamingFunctionUtils.java:99)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.snapshotState(AbstractUdfStreamOperator.java:90)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:395)
So wondering, what am I missing.
I am using AWS EMR latest (5.23).

In CompressedStringBulkWriter#close(), you are calling the close() method on the CompressionCodecStream which also closes the underlying the stream i.e. Flink's FSDataOutputStream. It has to be opened for the checkpointing to be done properly by Flink's internal to guarantee recoverable stream. That is why you are getting
Caused by: java.io.IOException: Stream closed.
at org.apache.flink.fs.s3.common.utils.RefCountedFile.requireOpened(RefCountedFile.java:117)
at org.apache.flink.fs.s3.common.utils.RefCountedFile.write(RefCountedFile.java:74)
at org.apache.flink.fs.s3.common.utils.RefCountedBufferingFileStream.flush(RefCountedBufferingFileStream.java:105)
at org.apache.flink.fs.s3.common.writer.S3RecoverableFsDataOutputStream.closeAndUploadPart(S3RecoverableFsDataOutputStream.java:199)
at org.apache.flink.fs.s3.common.writer.S3RecoverableFsDataOutputStream.closeForCommit(S3RecoverableFsDataOutputStream.java:166)
So instead of compressedOutputStream.close(), use compressedOutputStream.finish() which just flushes everything that's in buffer to the outputstream without closing it. BTW, there is an inbuilt HadoopCompressionBulkWriter made available in the latest version Flink, you can also use that.

Compile Java code in-memory [duplicate]

This question already has answers here:
Compile code fully in memory with javax.tools.JavaCompiler [duplicate]
(7 answers)
Closed 6 years ago.
I want to treat a String as a Java file then compile and run it. In other words, use Java as a script language.
To get better performance, we should avoid writing .class files to disk.

This answer is from one of my blogs, Compile and Run Java Source Code in Memory.
Here are the three source code files.
MemoryJavaCompiler.java
package me.soulmachine.compiler;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Writer;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.tools.*;
/**
* Simple interface to Java compiler using JSR 199 Compiler API.
*/
public class MemoryJavaCompiler {
private javax.tools.JavaCompiler tool;
private StandardJavaFileManager stdManager;
public MemoryJavaCompiler() {
tool = ToolProvider.getSystemJavaCompiler();
if (tool == null) {
throw new RuntimeException("Could not get Java compiler. Please, ensure that JDK is used instead of JRE.");
}
stdManager = tool.getStandardFileManager(null, null, null);
}
/**
* Compile a single static method.
*/
public Method compileStaticMethod(final String methodName, final String className,
final String source)
throws ClassNotFoundException {
final Map<String, byte[]> classBytes = compile(className + ".java", source);
final MemoryClassLoader classLoader = new MemoryClassLoader(classBytes);
final Class clazz = classLoader.loadClass(className);
final Method[] methods = clazz.getDeclaredMethods();
for (final Method method : methods) {
if (method.getName().equals(methodName)) {
if (!method.isAccessible()) method.setAccessible(true);
return method;
}
}
throw new NoSuchMethodError(methodName);
}
public Map<String, byte[]> compile(String fileName, String source) {
return compile(fileName, source, new PrintWriter(System.err), null, null);
}
/**
* compile given String source and return bytecodes as a Map.
*
* #param fileName source fileName to be used for error messages etc.
* #param source Java source as String
* #param err error writer where diagnostic messages are written
* #param sourcePath location of additional .java source files
* #param classPath location of additional .class files
*/
private Map<String, byte[]> compile(String fileName, String source,
Writer err, String sourcePath, String classPath) {
// to collect errors, warnings etc.
DiagnosticCollector<JavaFileObject> diagnostics =
new DiagnosticCollector<JavaFileObject>();
// create a new memory JavaFileManager
MemoryJavaFileManager fileManager = new MemoryJavaFileManager(stdManager);
// prepare the compilation unit
List<JavaFileObject> compUnits = new ArrayList<JavaFileObject>(1);
compUnits.add(fileManager.makeStringSource(fileName, source));
return compile(compUnits, fileManager, err, sourcePath, classPath);
}
private Map<String, byte[]> compile(final List<JavaFileObject> compUnits,
final MemoryJavaFileManager fileManager,
Writer err, String sourcePath, String classPath) {
// to collect errors, warnings etc.
DiagnosticCollector<JavaFileObject> diagnostics =
new DiagnosticCollector<JavaFileObject>();
// javac options
List<String> options = new ArrayList<String>();
options.add("-Xlint:all");
// options.add("-g:none");
options.add("-deprecation");
if (sourcePath != null) {
options.add("-sourcepath");
options.add(sourcePath);
}
if (classPath != null) {
options.add("-classpath");
options.add(classPath);
}
// create a compilation task
javax.tools.JavaCompiler.CompilationTask task =
tool.getTask(err, fileManager, diagnostics,
options, null, compUnits);
if (task.call() == false) {
PrintWriter perr = new PrintWriter(err);
for (Diagnostic diagnostic : diagnostics.getDiagnostics()) {
perr.println(diagnostic);
}
perr.flush();
return null;
}
Map<String, byte[]> classBytes = fileManager.getClassBytes();
try {
fileManager.close();
} catch (IOException exp) {
}
return classBytes;
}
}
MemoryJavaFileManager.java
package me.soulmachine.compiler;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.URI;
import java.nio.CharBuffer;
import java.util.HashMap;
import java.util.Map;
import javax.tools.FileObject;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.JavaFileObject.Kind;
import javax.tools.SimpleJavaFileObject;
/**
* JavaFileManager that keeps compiled .class bytes in memory.
*/
#SuppressWarnings("unchecked")
final class MemoryJavaFileManager extends ForwardingJavaFileManager {
/** Java source file extension. */
private final static String EXT = ".java";
private Map<String, byte[]> classBytes;
public MemoryJavaFileManager(JavaFileManager fileManager) {
super(fileManager);
classBytes = new HashMap<>();
}
public Map<String, byte[]> getClassBytes() {
return classBytes;
}
public void close() throws IOException {
classBytes = null;
}
public void flush() throws IOException {
}
/**
* A file object used to represent Java source coming from a string.
*/
private static class StringInputBuffer extends SimpleJavaFileObject {
final String code;
StringInputBuffer(String fileName, String code) {
super(toURI(fileName), Kind.SOURCE);
this.code = code;
}
public CharBuffer getCharContent(boolean ignoreEncodingErrors) {
return CharBuffer.wrap(code);
}
}
/**
* A file object that stores Java bytecode into the classBytes map.
*/
private class ClassOutputBuffer extends SimpleJavaFileObject {
private String name;
ClassOutputBuffer(String name) {
super(toURI(name), Kind.CLASS);
this.name = name;
}
public OutputStream openOutputStream() {
return new FilterOutputStream(new ByteArrayOutputStream()) {
public void close() throws IOException {
out.close();
ByteArrayOutputStream bos = (ByteArrayOutputStream)out;
classBytes.put(name, bos.toByteArray());
}
};
}
}
public JavaFileObject getJavaFileForOutput(JavaFileManager.Location location,
String className,
Kind kind,
FileObject sibling) throws IOException {
if (kind == Kind.CLASS) {
return new ClassOutputBuffer(className);
} else {
return super.getJavaFileForOutput(location, className, kind, sibling);
}
}
static JavaFileObject makeStringSource(String fileName, String code) {
return new StringInputBuffer(fileName, code);
}
static URI toURI(String name) {
File file = new File(name);
if (file.exists()) {
return file.toURI();
} else {
try {
final StringBuilder newUri = new StringBuilder();
newUri.append("mfm:///");
newUri.append(name.replace('.', '/'));
if(name.endsWith(EXT)) newUri.replace(newUri.length() - EXT.length(), newUri.length(), EXT);
return URI.create(newUri.toString());
} catch (Exception exp) {
return URI.create("mfm:///com/sun/script/java/java_source");
}
}
}
}
MemoryClassLoader.java
package me.soulmachine.compiler;
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
/**
* ClassLoader that loads .class bytes from memory.
*/
final class MemoryClassLoader extends URLClassLoader {
private Map<String, byte[]> classBytes;
public MemoryClassLoader(Map<String, byte[]> classBytes,
String classPath, ClassLoader parent) {
super(toURLs(classPath), parent);
this.classBytes = classBytes;
}
public MemoryClassLoader(Map<String, byte[]> classBytes, String classPath) {
this(classBytes, classPath, ClassLoader.getSystemClassLoader());
}
public MemoryClassLoader(Map<String, byte[]> classBytes) {
this(classBytes, null, ClassLoader.getSystemClassLoader());
}
public Class load(String className) throws ClassNotFoundException {
return loadClass(className);
}
public Iterable<Class> loadAll() throws ClassNotFoundException {
List<Class> classes = new ArrayList<Class>(classBytes.size());
for (String name : classBytes.keySet()) {
classes.add(loadClass(name));
}
return classes;
}
protected Class findClass(String className) throws ClassNotFoundException {
byte[] buf = classBytes.get(className);
if (buf != null) {
// clear the bytes in map -- we don't need it anymore
classBytes.put(className, null);
return defineClass(className, buf, 0, buf.length);
} else {
return super.findClass(className);
}
}
private static URL[] toURLs(String classPath) {
if (classPath == null) {
return new URL[0];
}
List<URL> list = new ArrayList<URL>();
StringTokenizer st = new StringTokenizer(classPath, File.pathSeparator);
while (st.hasMoreTokens()) {
String token = st.nextToken();
File file = new File(token);
if (file.exists()) {
try {
list.add(file.toURI().toURL());
} catch (MalformedURLException mue) {}
} else {
try {
list.add(new URL(token));
} catch (MalformedURLException mue) {}
}
}
URL[] res = new URL[list.size()];
list.toArray(res);
return res;
}
}
Explanations:
In order to represent a Java source file in memory instead of disk, I defined a StringInputBuffer class in the MemoryJavaFileManager.java.
To save the compiled .class files in memory, I implemented a class MemoryJavaFileManager. The main idea is to override the function getJavaFileForOutput() to store bytecodes into a map.
To load the bytecodes in memory, I have to implement a customized classloader MemoryClassLoader, which reads bytecodes in the map and turn them into classes.
Here is a unite test.
package me.soulmachine.compiler;
import org.junit.Test;
import java.io.IOException;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import static org.junit.Assert.assertEquals;
public class MemoryJavaCompilerTest {
private final static MemoryJavaCompiler compiler = new MemoryJavaCompiler();
#Test public void compileStaticMethodTest()
throws ClassNotFoundException, InvocationTargetException, IllegalAccessException {
final String source = "public final class Solution {\n"
+ "public static String greeting(String name) {\n"
+ "\treturn \"Hello \" + name;\n" + "}\n}\n";
final Method greeting = compiler.compileStaticMethod("greeting", "Solution", source);
final Object result = greeting.invoke(null, "soulmachine");
assertEquals("Hello soulmachine", result.toString());
}
}
Reference
JavaCompiler.java from Cloudera Morphlines
How to create an object from a string in Java (how to eval a string)?
InMemoryJavaCompiler
Java-Runtime-Compiler
动态的Java - 无废话JavaCompilerAPI中文指南

What's the most standard Java way to store raw binary data along with XML?

I need to store a huge amount of binary data into a file, but I want also to read/write the header of that file in XML format.
Yes, I could just store the binary data into some XML value and let it be serialized using base64 encoding. But this wouldn't be space-efficient.
Can I "mix" XML data and raw binary data in a more-or-less standardized way?
I was thinking about two options:
Is there a way to do this using JAXB?
Or is there a way to take some existing XML data and append binary data to it, in such a way that the the boundary is recognized?
Isn't the concept I'm looking for somehow used by / for SOAP?
Or is it used in the email standard? (Separation of binary attachments)
Scheme of what I'm trying to achieve:
[meta-info-about-boundary][XML-data][boundary][raw-binary-data]
Thank you!

You can leverage AttachementMarshaller & AttachmentUnmarshaller for this. This is the bridge used by JAXB/JAX-WS to pass binary content as attachments. You can leverage this same mechanism to do what you want.
http://download.oracle.com/javase/6/docs/api/javax/xml/bind/attachment/package-summary.html
PROOF OF CONCEPT
Below is how it could be implemented. This should work with any JAXB impl (it works for me with EclipseLink JAXB (MOXy), and the reference implementation).
Message Format
[xml_length][xml][attach1_length][attach1]...[attachN_length][attachN]
Root
This is an object with multiple byte[] properties.
import javax.xml.bind.annotation.XmlRootElement;
#XmlRootElement
public class Root {
private byte[] foo;
private byte[] bar;
public byte[] getFoo() {
return foo;
}
public void setFoo(byte[] foo) {
this.foo = foo;
}
public byte[] getBar() {
return bar;
}
public void setBar(byte[] bar) {
this.bar = bar;
}
}
Demo
This class has is used to demonstrate how MessageWriter and MessageReader are used:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import javax.xml.bind.JAXBContext;
public class Demo {
public static void main(String[] args) throws Exception {
JAXBContext jc = JAXBContext.newInstance(Root.class);
Root root = new Root();
root.setFoo("HELLO WORLD".getBytes());
root.setBar("BAR".getBytes());
MessageWriter writer = new MessageWriter(jc);
FileOutputStream outStream = new FileOutputStream("file.xml");
writer.write(root, outStream);
outStream.close();
MessageReader reader = new MessageReader(jc);
FileInputStream inStream = new FileInputStream("file.xml");
Root root2 = (Root) reader.read(inStream);
inStream.close();
System.out.println(new String(root2.getFoo()));
System.out.println(new String(root2.getBar()));
}
}
MessageWriter
Is responsible for writing the message to the desired format:
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import javax.activation.DataHandler;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.attachment.AttachmentMarshaller;
public class MessageWriter {
private JAXBContext jaxbContext;
public MessageWriter(JAXBContext jaxbContext) {
this.jaxbContext = jaxbContext;
}
/**
* Write the message in the following format:
* [xml_length][xml][attach1_length][attach1]...[attachN_length][attachN]
*/
public void write(Object object, OutputStream stream) {
try {
Marshaller marshaller = jaxbContext.createMarshaller();
marshaller.setProperty(Marshaller.JAXB_FRAGMENT, true);
BinaryAttachmentMarshaller attachmentMarshaller = new BinaryAttachmentMarshaller();
marshaller.setAttachmentMarshaller(attachmentMarshaller);
ByteArrayOutputStream xmlStream = new ByteArrayOutputStream();
marshaller.marshal(object, xmlStream);
byte[] xml = xmlStream.toByteArray();
xmlStream.close();
ObjectOutputStream messageStream = new ObjectOutputStream(stream);
messageStream.write(xml.length); //[xml_length]
messageStream.write(xml); // [xml]
for(Attachment attachment : attachmentMarshaller.getAttachments()) {
messageStream.write(attachment.getLength()); // [attachX_length]
messageStream.write(attachment.getData(), attachment.getOffset(), attachment.getLength()); // [attachX]
}
messageStream.flush();
} catch(Exception e) {
throw new RuntimeException(e);
}
}
private static class BinaryAttachmentMarshaller extends AttachmentMarshaller {
private static final int THRESHOLD = 10;
private List<Attachment> attachments = new ArrayList<Attachment>();
public List<Attachment> getAttachments() {
return attachments;
}
#Override
public String addMtomAttachment(DataHandler data, String elementNamespace, String elementLocalName) {
return null;
}
#Override
public String addMtomAttachment(byte[] data, int offset, int length, String mimeType, String elementNamespace, String elementLocalName) {
if(data.length < THRESHOLD) {
return null;
}
int id = attachments.size() + 1;
attachments.add(new Attachment(data, offset, length));
return "cid:" + String.valueOf(id);
}
#Override
public String addSwaRefAttachment(DataHandler data) {
return null;
}
#Override
public boolean isXOPPackage() {
return true;
}
}
public static class Attachment {
private byte[] data;
private int offset;
private int length;
public Attachment(byte[] data, int offset, int length) {
this.data = data;
this.offset = offset;
this.length = length;
}
public byte[] getData() {
return data;
}
public int getOffset() {
return offset;
}
public int getLength() {
return length;
}
}
}
MessageReader
Is responsible for reading the message:
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;
import javax.activation.DataHandler;
import javax.activation.DataSource;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.attachment.AttachmentUnmarshaller;
public class MessageReader {
private JAXBContext jaxbContext;
public MessageReader(JAXBContext jaxbContext) {
this.jaxbContext = jaxbContext;
}
/**
* Read the message from the following format:
* [xml_length][xml][attach1_length][attach1]...[attachN_length][attachN]
*/
public Object read(InputStream stream) {
try {
ObjectInputStream inputStream = new ObjectInputStream(stream);
int xmlLength = inputStream.read(); // [xml_length]
byte[] xmlIn = new byte[xmlLength];
inputStream.read(xmlIn); // [xml]
BinaryAttachmentUnmarshaller attachmentUnmarshaller = new BinaryAttachmentUnmarshaller();
int id = 1;
while(inputStream.available() > 0) {
int length = inputStream.read(); // [attachX_length]
byte[] data = new byte[length]; // [attachX]
inputStream.read(data);
attachmentUnmarshaller.getAttachments().put("cid:" + String.valueOf(id++), data);
}
Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
unmarshaller.setAttachmentUnmarshaller(attachmentUnmarshaller);
ByteArrayInputStream byteInputStream = new ByteArrayInputStream(xmlIn);
Object object = unmarshaller.unmarshal(byteInputStream);
byteInputStream.close();
inputStream.close();
return object;
} catch(Exception e) {
throw new RuntimeException(e);
}
}
private static class BinaryAttachmentUnmarshaller extends AttachmentUnmarshaller {
private Map<String, byte[]> attachments = new HashMap<String, byte[]>();
public Map<String, byte[]> getAttachments() {
return attachments;
}
#Override
public DataHandler getAttachmentAsDataHandler(String cid) {
byte[] bytes = attachments.get(cid);
return new DataHandler(new ByteArrayDataSource(bytes));
}
#Override
public byte[] getAttachmentAsByteArray(String cid) {
return attachments.get(cid);
}
#Override
public boolean isXOPPackage() {
return true;
}
}
private static class ByteArrayDataSource implements DataSource {
private byte[] bytes;
public ByteArrayDataSource(byte[] bytes) {
this.bytes = bytes;
}
public String getContentType() {
return "application/octet-stream";
}
public InputStream getInputStream() throws IOException {
return new ByteArrayInputStream(bytes);
}
public String getName() {
return null;
}
public OutputStream getOutputStream() throws IOException {
return null;
}
}
}
For More Information
http://bdoughan.blogspot.com/2011/03/jaxb-web-services-and-binary-data.html

I followed the concept suggested by Blaise Doughan, but without attachment marshallers:
I let an XmlAdapter convert a byte[] to a URI-reference and back, while references point to separate files, where raw data is stored. The XML file and all binary files are then put into a zip.
It is similar to the approach of OpenOffice and the ODF format, which in fact is a zip with few XMLs and binary files.
(In the example code, no actual binary files are written, and no zip is created.)
Bindings.java
import java.net.*;
import java.util.*;
import javax.xml.bind.annotation.*;
import javax.xml.bind.annotation.adapters.*;
final class Bindings {
static final String SCHEME = "storage";
static final Class<?>[] ALL_CLASSES = new Class<?>[]{
Root.class, RawRef.class
};
static final class RawRepository
extends XmlAdapter<URI, byte[]> {
final SortedMap<String, byte[]> map = new TreeMap<>();
final String host;
private int lastID = 0;
RawRepository(String host) {
this.host = host;
}
#Override
public byte[] unmarshal(URI o) {
if (!SCHEME.equals(o.getScheme())) {
throw new Error("scheme is: " + o.getScheme()
+ ", while expected was: " + SCHEME);
} else if (!host.equals(o.getHost())) {
throw new Error("host is: " + o.getHost()
+ ", while expected was: " + host);
}
String key = o.getPath();
if (!map.containsKey(key)) {
throw new Error("key not found: " + key);
}
byte[] ret = map.get(key);
return Arrays.copyOf(ret, ret.length);
}
#Override
public URI marshal(byte[] o) {
++lastID;
String key = String.valueOf(lastID);
map.put(key, Arrays.copyOf(o, o.length));
try {
return new URI(SCHEME, host, "/" + key, null);
} catch (URISyntaxException ex) {
throw new Error(ex);
}
}
}
#XmlRootElement
#XmlType
static final class Root {
#XmlElement
final List<RawRef> element = new LinkedList<>();
}
#XmlType
static final class RawRef {
#XmlJavaTypeAdapter(RawRepository.class)
#XmlElement
byte[] raw = null;
}
}
Main.java
import java.io.*;
import javax.xml.bind.*;
public class _Run {
public static void main(String[] args)
throws Exception {
JAXBContext context = JAXBContext.newInstance(Bindings.ALL_CLASSES);
Marshaller marshaller = context.createMarshaller();
marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
Unmarshaller unmarshaller = context.createUnmarshaller();
Bindings.RawRepository adapter = new Bindings.RawRepository("myZipVFS");
marshaller.setAdapter(adapter);
Bindings.RawRef ta1 = new Bindings.RawRef();
ta1.raw = "THIS IS A STRING".getBytes();
Bindings.RawRef ta2 = new Bindings.RawRef();
ta2.raw = "THIS IS AN OTHER STRING".getBytes();
Bindings.Root root = new Bindings.Root();
root.element.add(ta1);
root.element.add(ta2);
StringWriter out = new StringWriter();
marshaller.marshal(root, out);
System.out.println(out.toString());
}
}
Output
<root>
<element>
<raw>storage://myZipVFS/1</raw>
</element>
<element>
<raw>storage://myZipVFS/2</raw>
</element>
</root>

This is not natively supportted by JAXB as you do not want serialize the binary data to XML, but can usually be done in a higher level when using JAXB.
The way I do this is with webservices (SOAP and REST) is using MIME multipart/mixed messages (check multipart specification). Initially designed for emails, works great to send xml with binary data and most webservice frameworks such as axis or jersey support it in an almost transparent way.
Here is an example of sending an object in XML together with a binary file with REST webservice using Jersey with the jersey-multipart extension.
XML object
#XmlRootElement
public class Book {
private String title;
private String author;
private int year;
//getter and setters...
}
Client
byte[] bin = some binary data...
Book b = new Book();
b.setAuthor("John");
b.setTitle("wild stuff");
b.setYear(2012);
MultiPart multiPart = new MultiPart();
multiPart.bodyPart(new BodyPart(b, MediaType.APPLICATION_XML_TYPE));
multiPart.bodyPart(new BodyPart(bin, MediaType.APPLICATION_OCTET_STREAM_TYPE));
response = service.path("rest").path("multipart").
type(MultiPartMediaTypes.MULTIPART_MIXED).
post(ClientResponse.class, multiPart);
Server
#POST
#Consumes(MultiPartMediaTypes.MULTIPART_MIXED)
public Response post(MultiPart multiPart) {
for(BodyPart part : multiPart.getBodyParts()) {
System.out.println(part.getMediaType());
}
return Response.status(Response.Status.ACCEPTED).
entity("Attachements processed successfully.").
type(MediaType.TEXT_PLAIN).build();
}
I tried to send a file with 110917 bytes. Using wireshark, you can see that the data is sent directly over HTTP like this:
Hypertext Transfer Protocol
POST /org.etics.test.rest.server/rest/multipart HTTP/1.1\r\n
Content-Type: multipart/mixed; boundary=Boundary_1_353042220_1343207087422\r\n
MIME-Version: 1.0\r\n
User-Agent: Java/1.7.0_04\r\n
Host: localhost:8080\r\n
Accept: text/html, image/gif, image/jpeg\r\n
Connection: keep-alive\r\n
Content-Length: 111243\r\n
\r\n
[Full request URI: http://localhost:8080/org.etics.test.rest.server/rest/multipart]
MIME Multipart Media Encapsulation, Type: multipart/mixed, Boundary: "Boundary_1_353042220_1343207087422"
[Type: multipart/mixed]
First boundary: --Boundary_1_353042220_1343207087422\r\n
Encapsulated multipart part: (application/xml)
Content-Type: application/xml\r\n\r\n
eXtensible Markup Language
<?xml
<book>
<author>
John
</author>
<title>
wild stuff
</title>
<year>
2012
</year>
</book>
Boundary: \r\n--Boundary_1_353042220_1343207087422\r\n
Encapsulated multipart part: (application/octet-stream)
Content-Type: application/octet-stream\r\n\r\n
Media Type
Media Type: application/octet-stream (110917 bytes)
Last boundary: \r\n--Boundary_1_353042220_1343207087422--\r\n
As you see, binary data is sent has octet-stream, with no waste of space, contrarly to what happens when sending binary data inline in the xml. The is just the very low overhead MIME envelope.
With SOAP the principle is the same (just that it will have the SOAP envelope).

I don't think so -- XML libraries generally aren't designed to work with XML+extra-data.
But you might be able to get away with something as simple as a special stream wrapper -- it would expose an "XML"-containing stream and a binary stream (from the special "format"). Then JAXB (or whatever else XML library) could play with the "XML" stream and the binary stream is kept separate.
Also remember to take "binary" vs. "text" files into account.
Happy coding.

Get line number from xml node - java

I have parsed an XML file and have gotten a Node that I am interested in. How can I now find the line number in the source XML file where this node occurs?
EDIT:
Currently I am using the SAXParser to parse my XML. However I will be happy with a solution using any parser.
Along with the Node, I also have the XPath expression for the node.
I need to get the line number because I am displaying the XML file in a textbox, and need to highlight the line where the node occured. Assume that the XML file is nicely formatted with sufficient line breaks.

I have got this working by following this example:
http://eyalsch.wordpress.com/2010/11/30/xml-dom-2/
This solution follows the method suggested by Michael Kay. Here is how you use it:
// XmlTest.java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
public class XmlTest {
public static void main(final String[] args) throws Exception {
String xmlString = "<foo>\n"
+ " <bar>\n"
+ " <moo>Hello World!</moo>\n"
+ " </bar>\n"
+ "</foo>";
InputStream is = new ByteArrayInputStream(xmlString.getBytes());
Document doc = PositionalXMLReader.readXML(is);
is.close();
Node node = doc.getElementsByTagName("moo").item(0);
System.out.println("Line number: " + node.getUserData("lineNumber"));
}
}
If you run this program, it will out put: "Line number: 3"
PositionalXMLReader is a slightly modified version of the example linked above.
// PositionalXMLReader.java
import java.io.IOException;
import java.io.InputStream;
import java.util.Stack;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class PositionalXMLReader {
final static String LINE_NUMBER_KEY_NAME = "lineNumber";
public static Document readXML(final InputStream is) throws IOException, SAXException {
final Document doc;
SAXParser parser;
try {
final SAXParserFactory factory = SAXParserFactory.newInstance();
parser = factory.newSAXParser();
final DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
final DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
doc = docBuilder.newDocument();
} catch (final ParserConfigurationException e) {
throw new RuntimeException("Can't create SAX parser / DOM builder.", e);
}
final Stack<Element> elementStack = new Stack<Element>();
final StringBuilder textBuffer = new StringBuilder();
final DefaultHandler handler = new DefaultHandler() {
private Locator locator;
#Override
public void setDocumentLocator(final Locator locator) {
this.locator = locator; // Save the locator, so that it can be used later for line tracking when traversing nodes.
}
#Override
public void startElement(final String uri, final String localName, final String qName, final Attributes attributes)
throws SAXException {
addTextIfNeeded();
final Element el = doc.createElement(qName);
for (int i = 0; i < attributes.getLength(); i++) {
el.setAttribute(attributes.getQName(i), attributes.getValue(i));
}
el.setUserData(LINE_NUMBER_KEY_NAME, String.valueOf(this.locator.getLineNumber()), null);
elementStack.push(el);
}
#Override
public void endElement(final String uri, final String localName, final String qName) {
addTextIfNeeded();
final Element closedEl = elementStack.pop();
if (elementStack.isEmpty()) { // Is this the root element?
doc.appendChild(closedEl);
} else {
final Element parentEl = elementStack.peek();
parentEl.appendChild(closedEl);
}
}
#Override
public void characters(final char ch[], final int start, final int length) throws SAXException {
textBuffer.append(ch, start, length);
}
// Outputs text accumulated under the current node
private void addTextIfNeeded() {
if (textBuffer.length() > 0) {
final Element el = elementStack.peek();
final Node textNode = doc.createTextNode(textBuffer.toString());
el.appendChild(textNode);
textBuffer.delete(0, textBuffer.length());
}
}
};
parser.parse(is, handler);
return doc;
}
}

If you are using a SAX parser then the line number of an event can be obtained using the Locator object, which is notified to the ContentHandler via the setDocumentLocator() callback. This is called at the start of parsing, and you need to save the Locator; then after any event (such as startElement()), you can call methods such as getLineNumber() to obtain the current position in the source file. (After startElement(), the callback is defined to give you the line number on which the ">" of the start tag appears.)

Note that according to the spec (of Locator.getLineNumber()) the method returns the line number where the SAX-event ends!
In the case of "startElement()" this means:
Here the line number for Element is 1:
<Element></Element>
Here the line number for Element is 3:
<Element
attribute1="X"
attribute2="Y">
</Element>

priomsrb's answer is great and works. For my usecase i need to integrate it to an existing framework where e.g. the encoding is also covered. Therefore the following refactoring was applied to have a separate LineNumberHandler class.
Then the code will also work with a Sax InputSource where the encoding can be modified like this:
// read in the xml document
org.xml.sax.InputSource is=new org.xml.sax.InputSource();
is.setByteStream(instream);
if (encoding!=null) {
is.setEncoding(encoding);
if (Debug.CORE)
Debug.log("setting XML encoding to - "+is.getEncoding());
}
Separate LineNumberHandler
/**
* LineNumber Handler
* #author wf
*
*/
public static class LineNumberHandler extends DefaultHandler {
final Stack<Element> elementStack = new Stack<Element>();
final StringBuilder textBuffer = new StringBuilder();
private Locator locator;
private Document doc;
/**
* create a line number Handler for the given document
* #param doc
*/
public LineNumberHandler(Document doc) {
this.doc=doc;
}
#Override
public void setDocumentLocator(final Locator locator) {
this.locator = locator; // Save the locator, so that it can be used
// later for line tracking when traversing
// nodes.
}
#Override
public void startElement(final String uri, final String localName,
final String qName, final Attributes attributes) throws SAXException {
addTextIfNeeded();
final Element el = doc.createElement(qName);
for (int i = 0; i < attributes.getLength(); i++) {
el.setAttribute(attributes.getQName(i), attributes.getValue(i));
}
el.setUserData(LINE_NUMBER_KEY_NAME,
String.valueOf(this.locator.getLineNumber()), null);
elementStack.push(el);
}
#Override
public void endElement(final String uri, final String localName,
final String qName) {
addTextIfNeeded();
final Element closedEl = elementStack.pop();
if (elementStack.isEmpty()) { // Is this the root element?
doc.appendChild(closedEl);
} else {
final Element parentEl = elementStack.peek();
parentEl.appendChild(closedEl);
}
}
#Override
public void characters(final char ch[], final int start, final int length)
throws SAXException {
textBuffer.append(ch, start, length);
}
// Outputs text accumulated under the current node
private void addTextIfNeeded() {
if (textBuffer.length() > 0) {
final Element el = elementStack.peek();
final Node textNode = doc.createTextNode(textBuffer.toString());
el.appendChild(textNode);
textBuffer.delete(0, textBuffer.length());
}
}
};
PositionalXMLReader
public class PositionalXMLReader {
final static String LINE_NUMBER_KEY_NAME = "lineNumber";
/**
* read a document from the given input strem
*
* #param is
* - the input stream
* #return - the Document
* #throws IOException
* #throws SAXException
*/
public static Document readXML(final InputStream is)
throws IOException, SAXException {
final Document doc;
SAXParser parser;
try {
final SAXParserFactory factory = SAXParserFactory.newInstance();
parser = factory.newSAXParser();
final DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory
.newInstance();
final DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
doc = docBuilder.newDocument();
} catch (final ParserConfigurationException e) {
throw new RuntimeException("Can't create SAX parser / DOM builder.", e);
}
LineNumberHandler handler = new LineNumberHandler(doc);
parser.parse(is, handler);
return doc;
}
}
JUnit Testcase
package com.bitplan.common.impl;
import static org.junit.Assert.assertEquals;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import org.junit.Test;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import com.bitplan.bobase.PositionalXMLReader;
public class TestXMLWithLineNumbers {
/**
* get an Example XML Stream
* #return the example stream
*/
public InputStream getExampleXMLStream() {
String xmlString = "<foo>\n" + " <bar>\n"
+ " <moo>Hello World!</moo>\n" + " </bar>\n" + "</foo>";
InputStream is = new ByteArrayInputStream(xmlString.getBytes());
return is;
}
#Test
public void testXMLWithLineNumbers() throws Exception {
InputStream is = this.getExampleXMLStream();
Document doc = PositionalXMLReader.readXML(is);
is.close();
Node node = doc.getElementsByTagName("moo").item(0);
assertEquals("3", node.getUserData("lineNumber"));
}
}

How to parse GPX files with SAXReader?

I'm trying to parse a GPX file. I tried it with JDOM, but it does not work very well.
SAXBuilder builder = new SAXBuilder();
Document document = builder.build(filename);
Element root = document.getRootElement();
System.out.println("Root:\t" + root.getName());
List<Element> listTrks = root.getChildren("trk");
System.out.println("Count trk:\t" + listTrks.size());
for (Element tmpTrk : listTrks) {
List<Element> listTrkpts = tmpTrk.getChildren("trkpt");
System.out.println("Count pts:\t" + listTrkpts.size());
for (Element tmpTrkpt : listTrkpts) {
System.out.println(tmpTrkpt.getAttributeValue("lat") + ":" + tmpTrkpt.getAttributeValue("lat"));
}
}
I opened the example file (CC-BY-SA OpenStreetMap) and the output is just:
Root: gpx
Count trk: 0
What can I do? Should I us a SAXParserFactory (javax.xml.parsers.SAXParserFactory) and implement a Handler class?

Here is my gpx reader. It ignores some of the tags but I hope it will help.
package ch.perry.rando.geocode;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
/**
*
* #author perrym
*/
public class GpxReader extends DefaultHandler {
private static final DateFormat TIME_FORMAT
= new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
private List<Trackpoint> track = new ArrayList<Trackpoint>();
private StringBuffer buf = new StringBuffer();
private double lat;
private double lon;
private double ele;
private Date time;
public static Trackpoint[] readTrack(InputStream in) throws IOException {
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(true);
SAXParser parser = factory.newSAXParser();
GpxReader reader = new GpxReader();
parser.parse(in, reader);
return reader.getTrack();
} catch (ParserConfigurationException e) {
throw new IOException(e.getMessage());
} catch (SAXException e) {
throw new IOException(e.getMessage());
}
}
public static Trackpoint[] readTrack(File file) throws IOException {
InputStream in = new FileInputStream(file);
try {
return readTrack(in);
} finally {
in.close();
}
}
#Override
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
buf.setLength(0);
if (qName.equals("trkpt")) {
lat = Double.parseDouble(attributes.getValue("lat"));
lon = Double.parseDouble(attributes.getValue("lon"));
}
}
#Override
public void endElement(String uri, String localName, String qName)
throws SAXException {
if (qName.equals("trkpt")) {
track.add(Trackpoint.fromWGS84(lat, lon, ele, time));
} else if (qName.equals("ele")) {
ele = Double.parseDouble(buf.toString());
} else if (qName.equals("")) {
try {
time = TIME_FORMAT.parse(buf.toString());
} catch (ParseException e) {
throw new SAXException("Invalid time " + buf.toString());
}
}
}
#Override
public void characters(char[] chars, int start, int length)
throws SAXException {
buf.append(chars, start, length);
}
private Trackpoint[] getTrack() {
return track.toArray(new Trackpoint[track.size()]);
}
}

To read GPX files easily in Java see: http://sourceforge.net/p/gpsanalysis/wiki/Home/
example:
//gets points from a GPX file
final List points= GpxFileDataAccess.getPoints(new File("/path/toGpxFile.gpx"));

Ready to use, open source, and fully functional java GpxParser (and much more) here
https://sourceforge.net/projects/geokarambola/
Details here
https://plus.google.com/u/0/communities/110606810455751902142
With the above library parsing a GPX file is a one liner:
Gpx gpx = GpxFileIo.parseIn( "SomeGeoCollection.gpx" ) ;
Getting its points, routes or tracks trivial too:
for(Point pt: gpx.getPoints( ))
Location loc = new Location( pt.getLatitude( ), pt.getLongitude( ) ) ;

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Un/marshalling base64 encoded binary data as stream - java

Related

StreamingFileSink bulk writer results in some checkpoint error when running in AWS EMR

Compile Java code in-memory [duplicate]

What's the most standard Java way to store raw binary data along with XML?

Get line number from xml node - java

How to parse GPX files with SAXReader?

Categories

Resources