I have an OutputStream that can be initialized as a chain of OutputStreams. There could be any level of chaining. The only thing guaranteed is that at the end of the chain is a FileOutputStream.
I need to recreate this chained OutputStream with a modified filename in the FileOutputStream. This would have been possible if the out variable (which stores the underlying chained OutputStream) were accessible, as shown below.
public OutputStream recreateChainedOutputStream(OutputStream os) throws IOException {
    if (os instanceof FileOutputStream) {
        return new FileOutputStream("somemodified.filename");
    } else if (os instanceof FilterOutputStream) {
        // hypothetical: 'out' is protected, so this does not actually compile
        return recreateChainedOutputStream(((FilterOutputStream) os).out);
    }
    return os;
}
Is there any other way of achieving the same?
You can use reflection to access the os.out field of a FilterOutputStream. This has, however, some drawbacks:
If the other OutputStream is also a kind of RolloverOutputStream, you can have a hard time reconstructing it.
If the other OutputStream has custom settings, like a GZip compression parameter, you cannot reliably read them.
A quick and dirty implementation of recreateChainedOutputStream might be:
private static final Field out;
static {
    try {
        out = FilterOutputStream.class.getDeclaredField("out");
        out.setAccessible(true);
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
public OutputStream recreateChainedOutputStream(OutputStream os) throws IOException {
    if (os instanceof FilterOutputStream) {
        try {
            Class<?> c = os.getClass();
            Constructor<?> con = c.getConstructor(OutputStream.class);
            // rebuild this wrapper around the recreated inner stream
            return (OutputStream) con.newInstance(
                    recreateChainedOutputStream((OutputStream) out.get(os)));
        } catch (ReflectiveOperationException e) {
            throw new IOException(e);
        }
    } else {
        // Other output streams, e.g. the FileOutputStream at the end of the chain
        return new FileOutputStream("somemodified.filename");
    }
}
While this may be OK in your current application, this is a big no-no in the production world because of the large number of different kinds of OutputStreams your application may receive.
A better way to solve this would be a kind of Function<String, OutputStream> that works as a factory to create OutputStreams for the named file. This way the external API keeps its control over the OutputStreams while your API can address multiple file names. An example of this would be:
public class MyApi {
    private final Function<String, OutputStream> fileProvider;
    private OutputStream current;

    public MyApi(Function<String, OutputStream> fileProvider, String defaultFile) throws IOException {
        this.fileProvider = fileProvider;
        selectNewOutputFile(defaultFile);
    }

    public void selectNewOutputFile(String name) throws IOException {
        OutputStream previous = this.current;
        this.current = fileProvider.apply(name);
        if (previous != null) previous.close();
    }
}
This can then be used in other applications as:
MyApi api = new MyApi(name -> new FileOutputStream(name), "default.file");
for simple FileOutputStreams, or as:
MyApi api = new MyApi(name ->
        new GZIPOutputStream(
            new CipherOutputStream(
                new CheckedOutputStream(
                    new FileOutputStream(name),
                    new CRC32()),
                cipher),
            1024,
            true),
        "default.file");
for a file stream that is checksummed using new CRC32(), encrypted using cipher, and gzipped with a 1024-byte buffer in sync-flush mode. (Since FileOutputStream's constructor throws a checked exception, a plain java.util.function.Function lambda cannot throw it directly; in real code the lambda would wrap the exception, or fileProvider would use a throwing functional interface.)
I want to process files with a Flink stream in which two lines belong together. In the first line there is a header and in the second line the corresponding text.
The files are located on my local file system. I am using the readFile(fileInputFormat, path, watchType, interval, pathFilter, typeInfo) method with a custom FileInputFormat.
My streaming job class looks like this:
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<Read> inputStream = env.readFile(new ReadInputFormatTest("path/to/monitored/folder"), "path/to/monitored/folder", FileProcessingMode.PROCESS_CONTINUOUSLY, 100);
inputStream.print();
env.execute("Flink Streaming Java API Skeleton");
and my ReadInputFormatTest like this:
public class ReadInputFormatTest extends FileInputFormat<Read> {
    private transient FileSystem fileSystem;
    private transient BufferedReader reader;
    private final String inputPath;
    private String headerLine;
    private String readLine;

    public ReadInputFormatTest(String inputPath) {
        this.inputPath = inputPath;
    }

    @Override
    public void open(FileInputSplit inputSplit) throws IOException {
        FileSystem fileSystem = getFileSystem();
        this.reader = new BufferedReader(new InputStreamReader(fileSystem.open(inputSplit.getPath())));
        this.headerLine = reader.readLine();
        this.readLine = reader.readLine();
    }

    private FileSystem getFileSystem() {
        if (fileSystem == null) {
            try {
                fileSystem = FileSystem.get(new URI(inputPath));
            } catch (URISyntaxException | IOException e) {
                throw new RuntimeException(e);
            }
        }
        return fileSystem;
    }

    @Override
    public boolean reachedEnd() throws IOException {
        return headerLine == null;
    }

    @Override
    public Read nextRecord(Read r) throws IOException {
        r.setHeader(headerLine);
        r.setSequence(readLine);
        headerLine = reader.readLine();
        readLine = reader.readLine();
        return r;
    }
}
As expected, the headers and the text are stored together in one object. However, the file is read eight times. So the problem is the parallelization. Where and how can I specify that a file is processed only once, but several files in parallel?
Or do I have to change my custom FileInputFormat even further?
I would modify your source to emit the available filenames (instead of the actual file contents) and then add a new processor to read a name from the input stream and then emit pairs of lines. In other words, split the current source into a source followed by a processor. The processor can be made to run at any degree of parallelism and the source would be a single instance.
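A rough sketch of that split in Flink's DataStream API (FileNameSource is a hypothetical stand-in for whatever lists or monitors the folder; the pair-reading logic is lifted from your ReadInputFormatTest):
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Single-instance source: emits file paths only, never file contents
DataStream<String> paths = env
        .addSource(new FileNameSource("path/to/monitored/folder"))
        .setParallelism(1);

// Parallel processor: each subtask reads one whole file and emits line pairs
DataStream<Read> reads = paths.flatMap(new FlatMapFunction<String, Read>() {
    @Override
    public void flatMap(String path, Collector<Read> out) throws Exception {
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String header;
            while ((header = reader.readLine()) != null) {
                Read r = new Read();
                r.setHeader(header);
                r.setSequence(reader.readLine());
                out.collect(r);
            }
        }
    }
});
reads.print();
Because the source runs as a single instance, each file name is emitted exactly once, while the flatMap can still run at any degree of parallelism, so several files are processed concurrently.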
This is a followup to anonymous file streams reusing descriptors
As per my previous question, I can't depend on code like this (happens to work in JDK8, for now):
RandomAccessFile r = new RandomAccessFile(...);
FileInputStream f_1 = new FileInputStream(r.getFD());
// some io, not shown
f_1 = null;
FileInputStream f_2 = new FileInputStream(r.getFD());
// some io, not shown
f_2 = null;
FileInputStream f_3 = new FileInputStream(r.getFD());
// some io, not shown
f_3 = null;
However, to prevent accidental errors and as a form of self-documentation, I would like to invalidate each file stream after I'm done using it - without closing the underlying file descriptor.
Each FileInputStream is meant to be independent, with positioning controlled by the RandomAccessFile. I share the same FileDescriptor to prevent any race conditions arising from opening the same path multiple times. When I'm done with one FileInputStream, I want to invalidate it so as to make it impossible to accidentally read from it while using the second FileInputStream (which would cause the second FileInputStream to skip data).
How can I do this?
notes:
the libraries I use require compatibility with java.io.*
if you suggest a library (I prefer built-in Java semantics if at all possible), it must be commonly available (packaged) for Linux (the main target) and usable on Windows (experimental target)
but, Windows support isn't absolutely required
Edit: in response to a comment, here is my workflow:
RandomAccessFile r = new RandomAccessFile(path, "r");

int header_read;
int header_remaining = 4; // header length, initially
byte[] ba = new byte[header_remaining];
ByteBuffer bb = ByteBuffer.allocate(header_remaining);
while ((header_read = r.read(ba, 0, header_remaining)) > 0) {
    header_remaining -= header_read;
    bb.put(ba, 0, header_read);
}
byte[] header = bb.array();
// process header, not shown
// the RandomAccessFile above reads only a small amount, so buffering isn't required

r.seek(0);
FileInputStream f_1 = new FileInputStream(r.getFD());
Library1Result result1 = library1.Main.entry_point(f_1);
// process result1, not shown
// Library1 reads the InputStream in large chunks, so buffering isn't required
// invalidate f_1 (this question)

r.seek(0);
byte[] buffer = new byte[4096];
int read;
while ((read = r.read(buffer)) > 0 && library1.canContinue()) {
    library2.process(buffer, read);
}
// the RandomAccessFile above is read in large chunks, so buffering isn't required
// in a previous edit the RandomAccessFile was used to create a FileInputStream. Obviously that's not required, so ignore

r.seek(0);
Reader r_1 = new BufferedReader(new InputStreamReader(new FileInputStream(r.getFD())));
Library3Result result3 = library3.Main.entry_point(r_1);
// process result3, not shown
// I'm not sure how Library3 uses the reader, so I'm providing buffering
// invalidate r_1 (this question) - bonus: frees the buffer

r.seek(0);
FileInputStream f_2 = new FileInputStream(r.getFD());
result1 = library1.Main.entry_point(f_2);
// process result1 (reassigned), not shown
// Yes, I actually have to call 'library1.Main.entry_point' *again* - same comments apply as from before
// invalidate f_2 (this question)
//
// I've been told to be careful when opening multiple streams from the same
// descriptor if one is buffered. This is very vague. I assume because I only
// ever use any stream once and exclusively, this code is safe.
//
A pure Java solution might be to create a forwarding decorator that checks on each method call whether the stream is validated or not. For InputStream this decorator may look like this:
public final class CheckedInputStream extends InputStream {
    final InputStream delegate;
    boolean validated;

    public CheckedInputStream(InputStream stream) throws FileNotFoundException {
        delegate = stream;
        validated = true;
    }

    public void invalidate() {
        validated = false;
    }

    void checkValidated() {
        if (!validated) {
            throw new IllegalStateException("Stream is invalidated.");
        }
    }

    @Override
    public int read() throws IOException {
        checkValidated();
        return delegate.read();
    }

    @Override
    public int read(byte b[]) throws IOException {
        checkValidated();
        return read(b, 0, b.length);
    }

    @Override
    public int read(byte b[], int off, int len) throws IOException {
        checkValidated();
        return delegate.read(b, off, len);
    }

    @Override
    public long skip(long n) throws IOException {
        checkValidated();
        return delegate.skip(n);
    }

    @Override
    public int available() throws IOException {
        checkValidated();
        return delegate.available();
    }

    @Override
    public void close() throws IOException {
        checkValidated();
        delegate.close();
    }

    @Override
    public synchronized void mark(int readlimit) {
        checkValidated();
        delegate.mark(readlimit);
    }

    @Override
    public synchronized void reset() throws IOException {
        checkValidated();
        delegate.reset();
    }

    @Override
    public boolean markSupported() {
        checkValidated();
        return delegate.markSupported();
    }
}
You can use it like:
CheckedInputStream f_1 = new CheckedInputStream(new FileInputStream(r.getFD()));
// some io, not shown
f_1.invalidate();
f_1.read(); // throws IllegalStateException
Under Unix you could generally avoid such problems by dup'ing the file descriptor.
Since Java does not offer such a feature, one option would be a native library that exposes it. jnr-posix does that, for example. On the other hand, jnr depends on a lot more JDK implementation properties than your original question.
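If you want to experiment with that route, a heavily hedged sketch follows. It assumes jnr-posix's POSIX.dup() as I remember it, and it pokes at the private int field inside java.io.FileDescriptor, which is a JDK implementation detail (it happens to work on JDK 8, the same caveat as in your question):
import java.io.FileDescriptor;
import java.io.FileInputStream;
import java.io.RandomAccessFile;
import java.lang.reflect.Field;
import jnr.posix.POSIX;
import jnr.posix.POSIXFactory;

public class DupSketch {
    // Read the private int fd out of a FileDescriptor (JDK implementation detail).
    static int intFd(FileDescriptor fd) throws Exception {
        Field f = FileDescriptor.class.getDeclaredField("fd");
        f.setAccessible(true);
        return f.getInt(fd);
    }

    // Wrap a raw int fd back into a FileDescriptor (same caveat).
    static FileDescriptor fromIntFd(int fd) throws Exception {
        FileDescriptor fdObj = new FileDescriptor();
        Field f = FileDescriptor.class.getDeclaredField("fd");
        f.setAccessible(true);
        f.setInt(fdObj, fd);
        return fdObj;
    }

    public static void main(String[] args) throws Exception {
        POSIX posix = POSIXFactory.getPOSIX();
        try (RandomAccessFile r = new RandomAccessFile(args[0], "r")) {
            // dup(2) yields a separate descriptor, so closing the stream below
            // leaves r's descriptor open. Note that a dup'd descriptor still
            // shares the file offset with the original open file description.
            int dup = posix.dup(intFd(r.getFD()));
            try (FileInputStream in = new FileInputStream(fromIntFd(dup))) {
                // some io, not shown
            }
            // r is still usable here
        }
    }
}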
Is it possible to force Properties not to add the date comment in front? I mean something like the first line here:
#Thu May 26 09:43:52 CEST 2011
main=pkg.ClientMain
args=myargs
I would like to get rid of it altogether. I need my config files to be diff-identical unless there is a meaningful change.
Guess not. The timestamp is printed in a private method of Properties, and there is no property to control that behaviour.
The only idea that comes to my mind: subclass Properties, override store and copy/paste the content of the store0 method so that the date comment will not be printed.
Or provide a custom BufferedWriter that prints all but the first line (which will fail if you add real comments, because custom comments are printed before the timestamp...).
Given the source code of Properties: no, it's not possible. BTW, since Properties is in fact a hash table and its keys are thus not sorted, you can't rely on the properties always being in the same order anyway.
I would use a custom algorithm to store the properties if I had this requirement. Use the source code of Properties as a starter, as in the sketch below.
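A minimal sketch of such a custom store, assuming simple keys and values (it deliberately writes sorted key=value lines with no timestamp, and skips the escaping of spaces, colons and non-Latin-1 characters that Properties.store() performs):
import java.io.IOException;
import java.io.Writer;
import java.util.Properties;
import java.util.TreeSet;

public final class SortedPlainStore {
    // Writes sorted key=value lines and no timestamp comment.
    public static void store(Properties props, Writer out) throws IOException {
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            out.write(key + "=" + props.getProperty(key) + "\n");
        }
        out.flush();
    }
}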
Based on https://stackoverflow.com/a/6184414/242042 here is the implementation I have written that strips out the first line and sorts the keys.
public class CleanProperties extends Properties {
    private static class StripFirstLineStream extends FilterOutputStream {
        private boolean firstlineseen = false;

        public StripFirstLineStream(final OutputStream out) {
            super(out);
        }

        @Override
        public void write(final int b) throws IOException {
            if (firstlineseen) {
                super.write(b);
            } else if (b == '\n') {
                firstlineseen = true;
            }
        }
    }

    private static final long serialVersionUID = 7567765340218227372L;

    @Override
    public synchronized Enumeration<Object> keys() {
        return Collections.enumeration(new TreeSet<>(super.keySet()));
    }

    @Override
    public void store(final OutputStream out, final String comments) throws IOException {
        super.store(new StripFirstLineStream(out), null);
    }
}
Cleaning looks like this
final Properties props = new CleanProperties();
try (final Reader inStream = Files.newBufferedReader(file, Charset.forName("ISO-8859-1"))) {
    props.load(inStream);
} catch (final MalformedInputException mie) {
    throw new IOException("Malformed on " + file, mie);
}
if (props.isEmpty()) {
    Files.delete(file);
    return;
}
try (final OutputStream os = Files.newOutputStream(file)) {
    props.store(os, "");
}
This is useful if you want to modify a given xxx.conf file in place.
The write method skips the first line (the timestamp comment, e.g. #Thu May 26 09:43:52 CEST 2011) written by the store method: it discards everything up to the end of the first line, then writes normally.
public class CleanProperties extends Properties {
    private static class StripFirstLineStream extends FilterOutputStream {
        private boolean firstlineseen = false;

        public StripFirstLineStream(final OutputStream out) {
            super(out);
        }

        @Override
        public void write(final int b) throws IOException {
            if (firstlineseen) {
                super.write(b);
            } else if (b == '\n') {
                // keep the newline itself so the lines that follow
                // are written continuously, as in the original file
                super.write('\n');
                firstlineseen = true;
            }
        }
    }

    private static final long serialVersionUID = 7567765340218227372L;

    @Override
    public synchronized Enumeration<java.lang.Object> keys() {
        return Collections.enumeration(new TreeSet<>(super.keySet()));
    }

    @Override
    public void store(final OutputStream out, final String comments) throws IOException {
        super.store(new StripFirstLineStream(out), null);
    }
}
Can you not just flag up in your application somewhere when a meaningful configuration change takes place, and only write the file if that is set? Something like the sketch below.
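A minimal sketch of such a flag (the names are illustrative, not from any library):
import java.io.IOException;
import java.io.OutputStream;
import java.util.Objects;
import java.util.Properties;

public class TrackedProperties extends Properties {
    private boolean dirty = false;

    @Override
    public synchronized Object setProperty(String key, String value) {
        Object old = super.setProperty(key, value);
        if (!Objects.equals(old, value)) {
            dirty = true; // only a real change marks the file for rewriting
        }
        return old;
    }

    public void storeIfChanged(OutputStream out, String comments) throws IOException {
        if (dirty) {
            store(out, comments);
            dirty = false;
        }
    }
}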
You might want to look into Commons Configuration which has a bit more flexibility when it comes to writing and reading things like properties files. In particular, it has methods which attempt to write the exact same properties file (including spacing, comments etc) as the existing properties file.
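For example (this is against the Commons Configuration 1.x API as I recall it; 2.x moved things around, so treat it as a sketch):
import org.apache.commons.configuration.PropertiesConfiguration;

// Load app.properties, change one key, and save it back.
// PropertiesConfiguration keeps the existing layout and comments on save().
PropertiesConfiguration config = new PropertiesConfiguration("app.properties");
config.setProperty("main", "pkg.ClientMain");
config.save();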
You can handle this by following this Stack Overflow post to retain order:
How can I write Java properties in a defined order?
Then write the properties to a string, remove the comments as needed, and finally write the result to a file.
ByteArrayOutputStream baos = new ByteArrayOutputStream();
properties.store(baos, null);
String propertiesData = baos.toString(StandardCharsets.UTF_8.name());
propertiesData = propertiesData.replaceAll("^#.*(\r|\n)+", ""); // strip the leading timestamp comment
FileUtils.writeStringToFile(fileTarget, propertiesData, StandardCharsets.UTF_8);

// you may want to validate the file is readable by reloading it and checking
// that the expected number of keys matches
InputStream is = new FileInputStream(fileTarget);
Properties testResult = new Properties();
testResult.load(is);
I'm trying to extend my library for integrating Swing and JPA by making the JPA config as automatic (and portable) as can be done, and that means programmatically adding <class> elements. (I know it can be done via Hibernate's AnnotationConfiguration or EclipseLink's ServerSession, but - portability.) I'd also like to avoid using Spring just for this single purpose.
I can create a persistence.xml on the fly and fill it with <class> elements from specified packages (via the Reflections library). The problem starts when I try to feed this persistence.xml to a JPA provider. The only way I can think of is setting up a URLClassLoader, but I can't think of a way that wouldn't make me write the file to disk somewhere first, for the sole ability to obtain a valid URL. Setting up a socket for serving the file via an URL(localhost:xxxx) seems... I don't know, evil?
Does anyone have an idea how I could solve this problem? I know it sounds like a lot of work to avoid using one library, but I'd just like to know if it can be done.
EDIT (a try at being more clear):
Dynamically generated XML is kept in a String object. I don't know how to make it available to a persistence provider. Also, I want to avoid writing the file to disk.
For the purposes of my problem, a persistence provider is just a class that scans the classpath for META-INF/persistence.xml. Some implementations can be made to accept dynamic creation of the XML, but there is no common interface (especially for the crucial part of the file, the <class> tags).
My idea is to set up a custom ClassLoader - if you have any other I'd be grateful, I'm not set on this one.
The only easily extendable/configurable one I could find was URLClassLoader. It works on URL objects, and I don't know if I can create one without actually writing the XML to disk first.
This is how I'm setting things up, but it only works by writing persistenceXmlFile = new File("META-INF/persistence.xml") to disk:
Thread.currentThread().setContextClassLoader(
    new URLResourceClassLoader(
        new URL[] { persistenceXmlFile.toURI().toURL() },
        Thread.currentThread().getContextClassLoader()
    )
);
URLResourceClassLoader is URLCLassLoader's subclass, which allows for looking up resources as well as classes, by overriding public Enumeration<URL> findResources(String name).
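The question doesn't include that subclass, so here is my own sketch of what such an override might look like (an assumption about how findResources could map a resource name onto the registered file URLs, not the asker's actual class):
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

public class URLResourceClassLoader extends URLClassLoader {
    public URLResourceClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    public Enumeration<URL> findResources(String name) throws IOException {
        // Return every registered URL whose path ends with the requested name,
        // so "META-INF/persistence.xml" resolves to the file URL passed above.
        List<URL> matches = new ArrayList<>();
        for (URL url : getURLs()) {
            if (url.getPath().endsWith(name)) {
                matches.add(url);
            }
        }
        return Collections.enumeration(matches);
    }
}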
Maybe a bit late (after 4 years), but for others that are looking for a similar solution, you may be able to use the URL factory I created:
public class InMemoryURLFactory {

    public static void main(String... args) throws Exception {
        URL url = InMemoryURLFactory.getInstance().build("/this/is/a/test.txt", "This is a test!");
        byte[] data = IOUtils.toByteArray(url.openConnection().getInputStream());
        // Prints out: This is a test!
        System.out.println(new String(data));
    }

    private final Map<URL, byte[]> contents = new WeakHashMap<>();
    private final URLStreamHandler handler = new InMemoryStreamHandler();

    private static InMemoryURLFactory instance = null;

    public static synchronized InMemoryURLFactory getInstance() {
        if (instance == null)
            instance = new InMemoryURLFactory();
        return instance;
    }

    private InMemoryURLFactory() {
    }

    public URL build(String path, String data) {
        try {
            return build(path, data.getBytes("UTF-8"));
        } catch (UnsupportedEncodingException ex) {
            throw new RuntimeException(ex);
        }
    }

    public URL build(String path, byte[] data) {
        try {
            URL url = new URL("memory", "", -1, path, handler);
            contents.put(url, data);
            return url;
        } catch (MalformedURLException ex) {
            throw new RuntimeException(ex);
        }
    }

    private class InMemoryStreamHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL u) throws IOException {
            if (!u.getProtocol().equals("memory")) {
                throw new IOException("Cannot handle protocol: " + u.getProtocol());
            }
            return new URLConnection(u) {
                private byte[] data = null;

                @Override
                public void connect() throws IOException {
                    initDataIfNeeded();
                    checkDataAvailability();
                    // Protected field from superclass
                    connected = true;
                }

                @Override
                public long getContentLengthLong() {
                    initDataIfNeeded();
                    if (data == null)
                        return 0;
                    return data.length;
                }

                @Override
                public InputStream getInputStream() throws IOException {
                    initDataIfNeeded();
                    checkDataAvailability();
                    return new ByteArrayInputStream(data);
                }

                private void initDataIfNeeded() {
                    if (data == null)
                        data = contents.get(u);
                }

                private void checkDataAvailability() throws IOException {
                    if (data == null)
                        throw new IOException("In-memory data cannot be found for: " + u.getPath());
                }
            };
        }
    }
}
We can use Google's Jimfs library for that.
First, we need to add the maven dependency to our project:
<dependency>
    <groupId>com.google.jimfs</groupId>
    <artifactId>jimfs</artifactId>
    <version>1.2</version>
</dependency>
After that, we need to configure our filesystem behavior, and write our String content to the in-memory file, like this:
public static final String INPUT =
    "\n"
        + "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<note>\n"
        + "  <to>Tove</to>\n"
        + "  <from>Jani</from>\n"
        + "  <heading>Reminder</heading>\n"
        + "  <body>Don't forget me this weekend!</body>\n"
        + "</note>";

@Test
void usingJIMFS() throws IOException {
    try (var fs = Jimfs.newFileSystem(Configuration.unix())) {
        var path = fs.getPath(UUID.randomUUID().toString());
        Files.writeString(path, INPUT);
        var url = path.toUri().toURL();
        assertThat(url.getProtocol()).isEqualTo("jimfs");
        assertThat(Resources.asCharSource(url, UTF_8).read()).isEqualTo(INPUT);
    }
}
We can find more examples in the official repository.
If we look inside the Jimfs source code, we will find that the implementation is similar to @NSV's answer.
I'm writing a play 2.0 java application that allows users to upload files. Those files are stored on a third-party service I access using a Java library, the method I use in this API has the following signature:
void store(InputStream stream, String path, String contentType)
I've managed to make uploads working using the following simple controller:
public static Result uploadFile(String path) throws IOException {
    MultipartFormData body = request().body().asMultipartFormData();
    FilePart filePart = body.getFile("files[]");
    InputStream is = new FileInputStream(filePart.getFile());
    myApi.store(is, path, filePart.getContentType());
    return ok();
}
My concern is that this solution is not efficient because by default the play framework stores all the data uploaded by the client in a temporary file on the server then calls my uploadFile() method in the controller.
In a traditional servlet application I would have written a servlet behaving this way:
myApi.store(request.getInputStream(), ...)
I have been searching everywhere and didn't find any solution. The closest example I found is Why makes calling error or done in a BodyParser's Iteratee the request hang in Play Framework 2.0? but I couldn't figure out how to modify it to fit my needs.
Is there a way in Play 2 to achieve this behavior, i.e. to have the data uploaded by the client go "through" the web application directly to another system?
Thanks.
I've been able to stream data to my third-party API using the following Scala controller code:
def uploadFile() =
  Action(parse.multipartFormData(myPartHandler)) {
    request => Ok("Done")
  }

def myPartHandler: BodyParsers.parse.Multipart.PartHandler[MultipartFormData.FilePart[Result]] = {
  parse.Multipart.handleFilePart {
    case parse.Multipart.FileInfo(partName, filename, contentType) =>
      // Still dirty: the path of the file is in the partName...
      val path = partName

      // Set up the PipedOutputStream here, give the input stream to a worker thread
      val pos: PipedOutputStream = new PipedOutputStream()
      val pis: PipedInputStream = new PipedInputStream(pos)
      val worker: UploadFileWorker = new UploadFileWorker(path, pis)
      worker.contentType = contentType.get
      worker.start()

      // Read content to the POS
      Iteratee.fold[Array[Byte], PipedOutputStream](pos) { (os, data) =>
        os.write(data)
        os
      }.mapDone { os =>
        os.close()
        Ok("upload done")
      }
  }
}
The UploadFileWorker is a really simple Java class that contains the call to the third-party API.
public class UploadFileWorker extends Thread {
    String path;
    PipedInputStream pis;
    public String contentType = "";

    public UploadFileWorker(String path, PipedInputStream pis) {
        super();
        this.path = path;
        this.pis = pis;
    }

    @Override
    public void run() {
        try {
            myApi.store(pis, path, contentType);
            pis.close();
        } catch (Exception ex) {
            ex.printStackTrace();
            try { pis.close(); } catch (Exception ex2) {}
        }
    }
}
It's not completely perfect because I would have preferred to receive the path as a parameter to the Action, but I haven't been able to do so. I have thus added a piece of JavaScript that updates the name of the input field (and thus the partName), and it does the trick.