Caching FileInputStream

In my program I am repeatedly reading a number of files like this:
String myLetter = "CoverSheet.rtf"; // actually has a full path
FileInputStream in = new FileInputStream(myLetter);
letterSection.importRtfDocument(in);
in.close();
Because there are many small files which are components to add to the document with importRtfDocument, and thousands of letters to generate in a run, the processing is quite slow.
The importRtfDocument method comes from a library I'm using, and needs to be given a FileInputStream. This is where I'm stumped. I tried a few things like declaring a FileInputStream for each file in the class and keeping them open - but reset() isn't supported.
I have looked at other similar questions like this one:
How to Cache InputStream for Multiple Use
However, none seem to address my problem, to wit, how can I cache a FileInputStream?

I normally create my own pool to cache files. Consider the following simple code:
import java.io.IOException;
import java.net.URI;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
class CachedPool {
    private final Map<URI, CachedFile> pool = new HashMap<>();
    public <T> T getResource(URI uri) throws IOException {
        CachedFile file = pool.get(uri);
        if (file == null) {
            file = new CachedFile(uri); // Injection point to add resources
            pool.put(uri, file);
        }
        return file.getContent();
    }
}
class CachedFile {
    private final URI uri;
    private int counter;
    private final Date cachedTime;
    private final Object content;
    public CachedFile(URI uri) throws IOException {
        this.uri = uri;
        // Note: for file types without a registered content handler, getContent()
        // may return an InputStream rather than the fully read content.
        this.content = uri.toURL().getContent();
        this.cachedTime = new Date();
        this.counter = 0;
    }
    @SuppressWarnings("unchecked")
    public <T> T getContent() {
        counter++; // usage count, useful for eviction decisions
        return (T) content;
    }
    /** Override equals() and hashCode() **/
    /** Write getters for all instance variables **/
}
You can use the counter in CachedFile to evict files that are rarely used, either after a certain time period or when heap memory runs low.
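As a variation on the pool idea, here is a minimal sketch that caches the raw bytes of each file and hands out a fresh in-memory stream on every request. It assumes the import method can accept a general InputStream; the question states that the library wants a FileInputStream, so this only helps if that requirement can be relaxed. The class name FileBytesCache and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: caches file bytes and serves a new stream per request. */
class FileBytesCache {
    private final Map<Path, byte[]> cache = new ConcurrentHashMap<>();

    /** Reads the file from disk on first use only; later calls reuse the cached bytes. */
    InputStream open(Path path) {
        byte[] bytes = cache.computeIfAbsent(path, p -> {
            try {
                return Files.readAllBytes(p);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        // Wraps the cached array without copying it; each caller gets its own read position.
        return new ByteArrayInputStream(bytes);
    }

    /** Number of distinct files currently cached. */
    int size() {
        return cache.size();
    }

    public static void main(String[] args) throws IOException {
        Path letter = Files.createTempFile("CoverSheet", ".rtf");
        Files.write(letter, "{\\rtf1 Hello}".getBytes());
        FileBytesCache files = new FileBytesCache();
        try (InputStream in = files.open(letter)) {
            System.out.println("first read: " + in.readAllBytes().length + " bytes");
        }
        try (InputStream in = files.open(letter)) { // served from memory
            System.out.println("second read: " + in.readAllBytes().length + " bytes");
        }
    }
}
```

Because every call returns a new ByteArrayInputStream, there is no need for reset(): the stream is simply recreated over the same cached array.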

Related

LastModifiedFileListFilter for Sftp inbound adapter

I am trying to implement a LastModifiedFileListFilter, as it looks like there is no similar filter for spring-integration-sftp as of the 5.3.2 release. I tried to copy the LastModifiedFileListFilter from spring-integration-file, but the discard callback isn't working. Here is my implementation:
@Slf4j
@Data
public class LastModifiedLsEntryFileListFilter implements DiscardAwareFileListFilter<ChannelSftp.LsEntry> {
private static final long ONE_SECOND = 1000;
private static final long DEFAULT_AGE = 30;
private volatile long age = DEFAULT_AGE;
@Nullable
private Consumer<ChannelSftp.LsEntry> discardCallback;
public LastModifiedLsEntryFileListFilter(final long age) {
this.age = age;
}
@Override
public List<ChannelSftp.LsEntry> filterFiles(final ChannelSftp.LsEntry[] files) {
final List<ChannelSftp.LsEntry> list = new ArrayList<>();
final long now = System.currentTimeMillis() / ONE_SECOND;
for (final ChannelSftp.LsEntry file : files) {
if (this.fileIsAged(file, now)) {
log.info("File [{}] is aged...", file.getFilename());
list.add(file);
} else if (this.discardCallback != null) {
log.info("File [{}] is still being uploaded...", file.getFilename());
this.discardCallback.accept(file);
}
}
return list;
}
@Override
public boolean accept(final ChannelSftp.LsEntry file) {
if (this.fileIsAged(file, System.currentTimeMillis() / ONE_SECOND)) {
return true;
}
else if (this.discardCallback != null) {
this.discardCallback.accept(file);
}
return false;
}
private boolean fileIsAged(final ChannelSftp.LsEntry file, final long now) {
return file.getAttrs().getMTime() + this.age <= now;
}
@Override
public void addDiscardCallback(@Nullable final Consumer<ChannelSftp.LsEntry> discardCallbackToSet) {
this.discardCallback = discardCallbackToSet;
}
}
The filter correctly identifies the age of a file and discards it, but that file is not retried, which I believe is the job of the discard callback.
I guess my question is how to set the discard callback so that a discarded file keeps being retried until it ages. Thanks

"...not retried which I believe is part of discard callback."
I wonder what makes you think that way...
The fact that FileReadingMessageSource with its WatchService option has the logic like this:
if (filter instanceof DiscardAwareFileListFilter) {
((DiscardAwareFileListFilter<File>) filter).addDiscardCallback(this.filesToPoll::add);
}
doesn't mean that the SFTP implementation works the same way.
The retry is there anyway: on the next poll, a file that was not accepted will be checked again.
You probably aren't showing the other filters you use, and your file is filtered out before it reaches this LastModifiedLsEntryFileListFilter, e.g. by the SftpPersistentAcceptOnceFileListFilter. Consider making your "last-modified" filter the first one in the chain.
If you are not going to support a discard callback from the outside, you probably don't need to implement DiscardAwareFileListFilter at all.
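To illustrate why the order matters, here is a small plain-Java model of a filter chain. None of these types are Spring Integration classes: RemoteFile, acceptOnce, lastModified and poll are hypothetical stand-ins, and the model assumes that a later filter only sees files accepted by earlier ones and that the accept-once filter remembers a file as soon as it has seen it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Plain-Java model of a filter chain; stand-ins only, not Spring Integration classes. */
class FilterOrderDemo {
    /** Hypothetical remote file: a name plus its modification time in seconds. */
    static final class RemoteFile {
        final String name;
        final long mtimeSeconds;
        RemoteFile(String name, long mtimeSeconds) {
            this.name = name;
            this.mtimeSeconds = mtimeSeconds;
        }
    }

    /** Accepts each file name once and remembers it from then on. */
    static Predicate<RemoteFile> acceptOnce() {
        Set<String> seen = new HashSet<>();
        return f -> seen.add(f.name);
    }

    /** Accepts files that are at least ageSeconds old at time nowSeconds[0]. */
    static Predicate<RemoteFile> lastModified(long ageSeconds, long[] nowSeconds) {
        return f -> f.mtimeSeconds + ageSeconds <= nowSeconds[0];
    }

    /** Applies the filters in order; a later filter only sees what earlier ones accepted. */
    static List<RemoteFile> poll(List<RemoteFile> listing, List<Predicate<RemoteFile>> chain) {
        List<RemoteFile> accepted = new ArrayList<>();
        for (RemoteFile file : listing) {
            boolean ok = true;
            for (Predicate<RemoteFile> filter : chain) {
                if (!filter.test(file)) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                accepted.add(file);
            }
        }
        return accepted;
    }

    /** Polls once while the file is too young, then again later; returns the second poll's size. */
    static int secondPollCount(boolean lastModifiedFirst) {
        long[] now = {110};
        Predicate<RemoteFile> once = acceptOnce();
        Predicate<RemoteFile> aged = lastModified(30, now);
        List<Predicate<RemoteFile>> chain =
                lastModifiedFirst ? List.of(aged, once) : List.of(once, aged);
        List<RemoteFile> listing = List.of(new RemoteFile("a.csv", 100));
        poll(listing, chain); // first poll: the file is only 10 seconds old
        now[0] = 200;         // second poll: the file is now 100 seconds old
        return poll(listing, chain).size();
    }

    public static void main(String[] args) {
        System.out.println("accept-once first:   " + secondPollCount(false));
        System.out.println("last-modified first: " + secondPollCount(true));
    }
}
```

In this model the file is never delivered when the accept-once filter runs first (count 0), because it was already recorded during the poll in which it was still too young. With the last-modified filter first, the file is delivered on the later poll (count 1).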

Data operation in memory

I know it is generally faster to operate on data in memory than in a file. Currently, I am putting all incoming data in a static ArrayList, and when that ArrayList reaches 80 entries, my program saves its contents to a file and clears the list for the next wave of incoming data.
I wonder if it's better (or worse) to use Vector instead of ArrayList. If there is difference, which is better/worse? And in which case?
Here is my relevant code:
public class Exchange {
private static ArrayList<String> datain = new ArrayList<String> ();
public static void addData(String s) {
datain.add(s);
}
public static boolean checkSize() {
if (datain.size() >= 80)
return true;
else
return false;
}
public static void writeData() throws FileNotFoundException {
PrintWriter pw = new PrintWriter(new File ("myfile.txt"));
for (int i = 0; i < datain.size(); i++) {
pw.println(datain.get(i));
}
pw.close();
}
public static void clear() {
datain = new ArrayList<String>();
}
}
P.S. This approach currently works fine, I am just wondering whether using vector will be better for this case. Also, if you see any bad design, feel free to point it out. Thanks!

In the vast majority of cases, using ArrayList will suffice. The primary difference is that Vector's methods are synchronized (so individual operations are thread-safe), whilst ArrayList's are not, but seeing as you aren't working with multiple threads, there is no reason to prefer Vector over ArrayList in your code.
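Should the buffer ever be shared between threads, a common alternative to Vector is to wrap an ArrayList with Collections.synchronizedList. The sketch below is only an illustration of that option; the class SharedBufferDemo and the fill method are hypothetical names, not part of the question's code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SharedBufferDemo {
    /** Adds perThread entries from each of `threads` threads and returns the final size. */
    static int fill(List<String> buffer, int threads, int perThread) throws InterruptedException {
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    buffer.add("entry");
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join(); // wait until every writer has finished
        }
        return buffer.size();
    }

    public static void main(String[] args) throws InterruptedException {
        // Each add() on the wrapper is synchronized, so no entries are lost.
        List<String> safe = Collections.synchronizedList(new ArrayList<>());
        System.out.println("entries: " + fill(safe, 4, 10_000));
    }
}
```

Passing a plain ArrayList to fill from several threads would be a data race and could lose entries or throw, which is the situation the synchronized variants exist for. With a single thread, as in the question, the plain ArrayList is the simpler choice.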

This is how I would do it, unless I know there is a performance issue. Most of the cost is in the opening and closing of the file, so I would avoid doing that. I would also assume that appending is what you want.
public enum Logging {
; // no instances
public static final String FILE_NAME = "myfile.txt";
private static final PrintWriter pw;
static {
try {
pw = new PrintWriter(new FileWriter(FILE_NAME, true));
} catch (IOException ioe) {
throw new AssertionError(ioe);
}
}
public static void addData(String s) {
pw.println(s);
}
}

Proxy Pattern: how is it more efficient than creating the real object?

In the following example, from Wikibooks https://en.wikibooks.org/wiki/Computer_Science_Design_Patterns/Proxy
I am not sure how this is faster/more efficient than just creating the real object and calling displayImage on it, because the proxy creates the real object anyway within the displayImage method.
//on System B
class ProxyImage implements Image {
private RealImage image = null;
private String filename = null;
/**
* Constructor
* @param FILENAME
*/
public ProxyImage(final String FILENAME) {
filename = FILENAME;
}
/**
* Displays the image
*/
public void displayImage() {
if (image == null) {
image = new RealImage(filename);
}
image.displayImage();
}
}
Surely the proxy pattern wouldn't save memory, as it needs to instantiate two objects (proxy and real) rather than just one (real) if you didn't use a proxy?

From the link you posted (emphasis mine):
The proxy class ProxyImage is running on another system than the real image class itself and can represent the real image RealImage over there. The image information is accessed from the disk. Using the proxy pattern, the code of the ProxyImage avoids multiple loading of the image, accessing it from the other system in a memory-saving manner.
In short: it doesn't save memory; it speeds up the application because you don't need to access the disk every time you read the real image.
This is proven in this part of the code:
public void displayImage() {
//if image is not loaded into memory
if (image == null) {
//then load it, go to disk only once
image = new RealImage(filename);
}
//now it is in memory, display the real image
image.displayImage();
}
To have a better understanding of this problem, let's change the definitions of the class and the interface:
public interface Image {
String getName();
byte[] getData();
}
Now, the RealImage class, which always goes to disk for the data (in case the file has been deleted or renamed):
public class RealImage implements Image {
//implements all the operations going to disk...
private String fileName;
public RealImage(String fileName) {
this.fileName = fileName;
}
@Override
public String getName() {
String name = "";
//fancy operations to seek for the file in disk (in case it has been deleted)
//read the data from file in disk
//get the name
name = ...;
return name;
}
@Override
public byte[] getData() {
byte[] data;
//again, fancy operations to seek for the file in disk (in case it has been deleted)
//read the data from file in disk
//get the image data for displaying purposes
data = ...;
return data;
}
}
And now, our good ProxyImage that will act as a proxy for a RealImage to save the highly costly task of going to disk every time by saving the data into memory:
public class ProxyImage implements Image {
private String fileName;
private RealImage realImage;
private byte[] data;
private String name;
//caches the results so that we go to disk only when needed
public ProxyImage(String fileName) {
this.fileName = fileName;
}
@Override
public String getName() {
//in case we don't have the name of the image
if (this.name == null) {
//use a RealImage to retrieve the image name
//we will create the instance of realImage only if needed
if (realImage == null) {
realImage = new RealImage(fileName);
}
//getting the image from the real image is highly costly
//so we will do this only once
this.name = realImage.getName();
}
return this.name;
}
@Override
public byte[] getData() {
//similar behavior for the data of the image
if (this.data == null) {
if (realImage == null) {
realImage = new RealImage(fileName);
}
//highly costly operation
this.data = realImage.getData();
}
return this.data;
}
}
This shows the benefit of using a proxy for our RealImage.

The purpose of this particular proxy appears to be to implement what's called 'Lazy Loading.' It doesn't actually read the file and create the image in memory until some other piece of code actually attempts to display it. This could save a lot of time and memory compared to putting images into memory that you never use!
In small examples it's easy to think "Well, I could just program smarter and not load the silly thing." But imagine a bigger system where you are stuck with an API that takes a List<Image> as an argument, but only actually draws one when the user clicks the filename or something. This could be a significant boost.
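A small sketch can make the lazy-loading effect visible. It reuses the Image/RealImage/ProxyImage names from the example, but the LOADS counter and the gallery list are hypothetical additions used only to count how often the expensive constructor runs; the "load" here is simulated rather than a real disk read.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class LazyProxyDemo {
    interface Image {
        void displayImage();
    }

    /** Counts how many times the expensive load runs (a stand-in for disk access). */
    static final AtomicInteger LOADS = new AtomicInteger();

    static final class RealImage implements Image {
        private final String filename;

        RealImage(String filename) {
            this.filename = filename;
            LOADS.incrementAndGet(); // pretend the file is read from disk here
        }

        @Override
        public void displayImage() {
            System.out.println("Displaying " + filename);
        }
    }

    static final class ProxyImage implements Image {
        private final String filename;
        private RealImage image; // created on the first display only

        ProxyImage(String filename) {
            this.filename = filename;
        }

        @Override
        public void displayImage() {
            if (image == null) {
                image = new RealImage(filename);
            }
            image.displayImage();
        }
    }

    public static void main(String[] args) {
        // Three proxies are created, but nothing has been loaded yet.
        List<Image> gallery = List.of(
                new ProxyImage("a.png"), new ProxyImage("b.png"), new ProxyImage("c.png"));
        gallery.get(1).displayImage(); // loads b.png
        gallery.get(1).displayImage(); // reuses the already loaded image
        System.out.println("loads: " + LOADS.get());
    }
}
```

Only the image that is actually displayed is ever loaded, and it is loaded once. A list of RealImage objects would have paid the loading cost for all three entries up front.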

Reading and writing multiple files in parallel

I need to write a program in Java which will read a relatively large number (~50,000) of files in a directory tree, process the data, and output the processed data to a separate (flat) directory.
Currently I have something like this:
private void crawlDirectoryAndProcessFiles(File directory) {
for (File file : directory.listFiles()) {
if (file.isDirectory()) {
crawlDirectoryAndProcessFiles(file);
} else {
Data d = readFile(file);
ProcessedData p = d.process();
writeFile(p,file.getAbsolutePath(),outputDir);
}
}
}
Suffice to say that each of those methods is removed and trimmed down for ease of reading, but they all work fine. The whole process works fine, except that it is slow. The processing of data occurs via a remote service and takes between 5-15 seconds. Multiply that by 50,000...
I've never done anything multi-threaded before, but I figure I can get some pretty good speed increases if I do. Can anyone give some pointers how I can effectively parallelise this method?

I would use a ThreadPoolExecutor to manage the threads. You can do something like this:
private class Processor implements Runnable {
private final File file;
public Processor(File file) {
this.file = file;
}
@Override
public void run() {
Data d = readFile(file);
ProcessedData p = d.process();
writeFile(p,file.getAbsolutePath(),outputDir);
}
}
private void crawlDirectoryAndProcessFiles(File directory, Executor executor) {
for (File file : directory.listFiles()) {
if (file.isDirectory()) {
crawlDirectoryAndProcessFiles(file,executor);
} else {
executor.execute(new Processor(file));
}
}
}
You would obtain an Executor using:
ExecutorService executor = Executors.newFixedThreadPool(poolSize);
where poolSize is the maximum number of threads you want going at once. (It's important to have a reasonable number here; 50,000 threads isn't exactly a good idea. A reasonable number might be 8.) Note that after you've queued all the files, your main thread can wait until things are done by calling executor.shutdown() followed by executor.awaitTermination. Without a prior shutdown(), awaitTermination simply blocks until its timeout expires.
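Put together, the approach might look like the following sketch. The process method only counts files and stands in for the real readFile/process/writeFile steps, and the pool size and timeout values are arbitrary assumptions.

```java
import java.io.File;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ParallelCrawl {
    /** Counts processed files; a stand-in for the real read/process/write work. */
    static final AtomicInteger PROCESSED = new AtomicInteger();

    /** Walks the tree on the calling thread and queues one task per file. */
    static void crawl(File directory, ExecutorService executor) {
        File[] children = directory.listFiles();
        if (children == null) {
            return; // not a directory, or an I/O error occurred
        }
        for (File file : children) {
            if (file.isDirectory()) {
                crawl(file, executor);
            } else {
                executor.execute(() -> process(file));
            }
        }
    }

    /** Placeholder for the slow per-file work. */
    static void process(File file) {
        PROCESSED.incrementAndGet();
    }

    /** Returns true if all queued work finished within the timeout. */
    static boolean run(File root, int poolSize, long timeoutSeconds) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(poolSize);
        crawl(root, executor);
        executor.shutdown(); // stop accepting new tasks; already queued tasks still run
        return executor.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        File root = new File(args.length > 0 ? args[0] : ".");
        boolean finished = run(root, 8, 3600);
        System.out.println("finished=" + finished + ", files=" + PROCESSED.get());
    }
}
```

Since the per-file work in the question is dominated by waiting on a remote service rather than by CPU, a pool larger than the number of cores may be reasonable, as long as the remote service can handle that many concurrent requests.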

Assuming you have a single hard disk (i.e. something that only allows a single read operation at a time, not an SSD or RAID array, network file system, etc.), then you only want one thread performing IO (reading from/writing to the disk). Also, you only want as many threads doing CPU-bound operations as you have cores, otherwise time will be wasted in context switching.
Given the above restrictions, the code below should work for you. The single threaded executor ensures that only one Runnable executes at any one time. The fixed thread pool ensures no more than NUM_CPUS Runnables are executing at any one time.
One thing this does not do is to provide feedback on when processing is finished.
private final static int NUM_CPUS = 4;
private final Executor _fileReaderWriter = Executors.newSingleThreadExecutor();
private final Executor _fileProcessor = Executors.newFixedThreadPool(NUM_CPUS);
private final class Data {}
private final class ProcessedData {}
private final class FileReader implements Runnable
{
private final File _file;
FileReader(final File file) { _file = file; }
@Override public void run()
{
final Data data = readFile(_file);
_fileProcessor.execute(new FileProcessor(_file, data));
}
private Data readFile(File file) { /* ... */ return null; }
}
private final class FileProcessor implements Runnable
{
private final File _file;
private final Data _data;
FileProcessor(final File file, final Data data) { _file = file; _data = data; }
@Override public void run()
{
final ProcessedData processedData = processData(_data);
_fileReaderWriter.execute(new FileWriter(_file, processedData));
}
private ProcessedData processData(final Data data) { /* ... */ return null; }
}
private final class FileWriter implements Runnable
{
private final File _file;
private final ProcessedData _data;
FileWriter(final File file, final ProcessedData data) { _file = file; _data = data; }
@Override public void run()
{
writeFile(_file, _data);
}
private void writeFile(final File file, final ProcessedData data) { /* ... */ }
}
public void process(final File file)
{
if (file.isDirectory())
{
for (final File subFile : file.listFiles())
process(subFile);
}
else
{
_fileReaderWriter.execute(new FileReader(file));
}
}

The easiest (and probably one of the most reasonable) ways is to have a thread pool (take a look at the corresponding Executor). The main thread is responsible for crawling the directory. When a file is encountered, create a "Job" (which is a Runnable/Callable) and let the Executor handle the job.
(This should be sufficient for you to start. I prefer not to give too much concrete code because it should not be difficult for you to figure out once you have read up on Executor, Callable, etc.)
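For readers who want a starting point for the Callable variant, here is one possible sketch. The job simply returns the file size as a placeholder result; jobFor and totalSize are hypothetical names, and a real job would read, process, and write the file instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CallableJobs {
    /** Hypothetical job: returns the size of one file as a placeholder result. */
    static Callable<Long> jobFor(Path file) {
        return () -> Files.size(file);
    }

    /** Crawls on the calling thread, runs one job per regular file, and sums the results. */
    static long totalSize(Path root, int poolSize)
            throws IOException, InterruptedException, ExecutionException {
        List<Callable<Long>> jobs;
        try (Stream<Path> paths = Files.walk(root)) {
            jobs = paths.filter(Files::isRegularFile)
                    .map(CallableJobs::jobFor)
                    .collect(Collectors.toList());
        }
        ExecutorService executor = Executors.newFixedThreadPool(poolSize);
        try {
            long total = 0;
            // invokeAll blocks until every job has completed.
            for (Future<Long> result : executor.invokeAll(jobs)) {
                total += result.get(); // a failed job surfaces here as ExecutionException
            }
            return total;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        System.out.println("total bytes: " + totalSize(root, 8));
    }
}
```

Compared with a Runnable, a Callable can return a value and throw checked exceptions, so failures in individual files are reported through the Future instead of being lost inside the worker thread.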

Best-practice for documenting available/required Java properties file contents

Is there a well-established approach for documenting Java "properties" file contents, including:
specifying the data type/contents expected for a given key
specifying whether a key is required for the application to function
providing a description of the key's meaning
Currently, I maintain (by hand) a .properties file that is the default, and I write a prose description of the data type and description of each key in a comment before. This does not lead to a programmatically accessible properties file.
I guess what I'm looking for is a "getopt" equivalent for properties files...
[EDIT: Related]
Java Configuration Frameworks

You could use some of the features in the Apache Commons Configuration package. It at least provides typed access to your properties.
There are only conventions in the traditional java properties file. Some I've seen include providing, like you said, an example properties file. Another is to provide the default configuration with all the properties, but commented out.
If you really want to require something, maybe you're not looking for a properties file. You could use an XML configuration file and specify a schema with datatypes and requirements. You can use JAXB to compile the schema into Java and read the file that way. With validation you can make sure the required properties are there.
The best you could hope for is that when you execute your application, it reads, parses, and validates the properties in the file. If you absolutely had to stay properties-based and didn't want to go XML, but still needed this validation, you could have a secondary properties file that lists each property that could be included, its type, and whether it is required. You'd then have to write a properties file validator that takes in a file to validate as well as a validation schema-like properties file. Something like
#list of required properties
required=prop1,prop2,prop3
#all properties and their types
prop1.type=Integer
prop2.type=String
I haven't looked through all of the Apache Configuration package, but they often have useful utilities like this. I wouldn't be surprised if you could find something in there that would simplify this.
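A validator along the lines described above could be sketched as follows. It assumes the schema format shown in the example (a comma-separated required key plus one name.type key per property) and supports only the type names Integer, Boolean, and String; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class SchemaPropertiesValidator {
    /** Returns a list of problems; an empty list means the properties satisfy the schema. */
    static List<String> validate(Properties schema, Properties actual) {
        List<String> problems = new ArrayList<>();
        // Every name listed under "required" must be present.
        for (String key : schema.getProperty("required", "").split(",")) {
            String name = key.trim();
            if (!name.isEmpty() && actual.getProperty(name) == null) {
                problems.add("missing required property: " + name);
            }
        }
        // Every property that is present must be declared and match its type.
        for (String name : actual.stringPropertyNames()) {
            String type = schema.getProperty(name + ".type");
            if (type == null) {
                problems.add("unknown property: " + name);
            } else if (!matches(type, actual.getProperty(name))) {
                problems.add(name + " is not a valid " + type);
            }
        }
        return problems;
    }

    /** Checks a raw string value against one of the supported type names. */
    static boolean matches(String type, String value) {
        switch (type) {
            case "Integer":
                try {
                    Integer.parseInt(value.trim());
                    return true;
                } catch (NumberFormatException e) {
                    return false;
                }
            case "Boolean":
                return value.equalsIgnoreCase("true") || value.equalsIgnoreCase("false");
            case "String":
                return true;
            default:
                return false; // type name not supported by this sketch
        }
    }

    public static void main(String[] args) {
        Properties schema = new Properties();
        schema.setProperty("required", "prop1,prop2");
        schema.setProperty("prop1.type", "Integer");
        schema.setProperty("prop2.type", "String");

        Properties actual = new Properties();
        actual.setProperty("prop1", "not-a-number");

        validate(schema, actual).forEach(System.out::println);
    }
}
```

Collecting all problems in a list, rather than throwing on the first one, lets the application report every configuration mistake in a single run.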

Another option to check out is the project called OWNER. There, you define the interface that serves as the configuration object in your application, using types and annotations. Then, OWNER does the finding and parsing of the correct Properties file. Thus, you could write a javadoc for your interface and use that as the documentation.

I have never seen a standard way of doing it. What I would probably do is:
wrap or extend the java.util.Properties class
override (if extending) or provide (if wrapping) the store method (or storeToXML, etc.) so that it writes out a comment for each line.
have the method that stores the properties have some sort of input file where you describe the properties of each one.
It doesn't get you anything over what you are doing by hand, except that you can manage the information in a different way that might be easier to deal with. For example, you could have a program that spits out the comments to read in. It would potentially give you the programmatic access that you need, but it is a roll-your-own sort of thing.
Or it might just be too much work for too little to gain (which is why there isn't something obvious out there).
If you can specify the sort of comments you want to see I could take a stab at writing something if I get bored :-) (it is the sort of thing I like to do for fun, sick I know :-).
Ok... I got bored... here is something that is at least a start :-)
import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Properties;
public class PropertiesVerifier
{
private final Map<String, PropertyInfo> optionalInfo;
private final Map<String, PropertyInfo> requiredInfo;
{
optionalInfo = new HashMap<String, PropertyInfo>();
requiredInfo = new HashMap<String, PropertyInfo>();
}
public PropertiesVerifier(final PropertyInfo[] infos)
{
for(final PropertyInfo info : infos)
{
final Map<String, PropertyInfo> infoMap;
if(info.isRequired())
{
infoMap = requiredInfo;
}
else
{
infoMap = optionalInfo;
}
infoMap.put(info.getName(), info);
}
}
public void verifyProperties(final Properties properties)
{
for(final Entry<Object, Object> property : properties.entrySet())
{
final String key;
final String value;
key = (String)property.getKey();
value = (String)property.getValue();
if(!(isValid(key, value)))
{
throw new IllegalArgumentException(value + " is not valid for: " + key);
}
}
}
public boolean isRequired(final String key)
{
return (requiredInfo.get(key) != null);
}
public boolean isOptional(final String key)
{
return (optionalInfo.get(key) != null);
}
public boolean isKnown(final String key)
{
return (isRequired(key) || isOptional(key));
}
public Class getType(final String key)
{
final PropertyInfo info;
info = getPropertyInfoFor(key);
return (info.getType());
}
public boolean isValid(final String key,
final String value)
{
final PropertyInfo info;
info = getPropertyInfoFor(key);
return (info.verify(value));
}
private PropertyInfo getPropertyInfoFor(final String key)
{
PropertyInfo info;
info = requiredInfo.get(key);
if(info == null)
{
info = optionalInfo.get(key);
if(info == null)
{
// should be a better exception maybe... depends on how you
// want to deal with it
throw new IllegalArgumentException(key + " is not a valid property name");
}
}
return (info);
}
protected final static class PropertyInfo
{
private final String name;
private final boolean required;
private final Class clazz;
private final Verifier verifier;
protected PropertyInfo(final String nm,
final boolean mandatory,
final Class c)
{
this(nm, mandatory, c, getDefaultVerifier(c));
}
protected PropertyInfo(final String nm,
final boolean mandatory,
final Class c,
final Verifier v)
{
// check for null
name = nm;
required = mandatory;
clazz = c;
verifier = v;
}
@Override
public int hashCode()
{
return (getName().hashCode());
}
@Override
public boolean equals(final Object o)
{
final boolean retVal;
if(o instanceof PropertyInfo)
{
final PropertyInfo other;
other = (PropertyInfo)o;
retVal = getName().equals(other.getName());
}
else
{
retVal = false;
}
return (retVal);
}
public boolean verify(final String value)
{
return (verifier.verify(value));
}
public String getName()
{
return (name);
}
public boolean isRequired()
{
return (required);
}
public Class getType()
{
return (clazz);
}
}
private static Verifier getDefaultVerifier(final Class clazz)
{
final Verifier verifier;
if(clazz.equals(Boolean.class))
{
// should use a singleton to save space...
verifier = new BooleanVerifier();
}
else
{
throw new IllegalArgumentException("Unknown property type: " +
clazz.getCanonicalName());
}
return (verifier);
}
public static interface Verifier
{
boolean verify(final String value);
}
public static class BooleanVerifier
implements Verifier
{
public boolean verify(final String value)
{
final boolean retVal;
if(value.equalsIgnoreCase("true") ||
value.equalsIgnoreCase("false"))
{
retVal = true;
}
else
{
retVal = false;
}
return (retVal);
}
}
}
And a simple test for it:
import java.util.Properties;
public class Main
{
public static void main(String[] args)
{
final Properties properties;
final PropertiesVerifier verifier;
properties = new Properties();
properties.put("property.one", "true");
properties.put("property.two", "false");
// properties.put("property.three", "5");
verifier = new PropertiesVerifier(
new PropertiesVerifier.PropertyInfo[]
{
new PropertiesVerifier.PropertyInfo("property.one",
true,
Boolean.class),
new PropertiesVerifier.PropertyInfo("property.two",
false,
Boolean.class),
// new PropertiesVerifier.PropertyInfo("property.three",
// true,
// Boolean.class),
});
System.out.println(verifier.isKnown("property.one"));
System.out.println(verifier.isKnown("property.two"));
System.out.println(verifier.isKnown("property.three"));
System.out.println(verifier.isRequired("property.one"));
System.out.println(verifier.isRequired("property.two"));
System.out.println(verifier.isRequired("property.three"));
System.out.println(verifier.isOptional("property.one"));
System.out.println(verifier.isOptional("property.two"));
System.out.println(verifier.isOptional("property.three"));
System.out.println(verifier.getType("property.one"));
System.out.println(verifier.getType("property.two"));
// System.out.println(verifier.getType("property.tthree"));
System.out.println(verifier.isValid("property.one", "true"));
System.out.println(verifier.isValid("property.two", "false"));
// System.out.println(verifier.isValid("property.tthree", "5"));
verifier.verifyProperties(properties);
}
}

One easy way is to distribute your project with a sample properties file, e.g. my project has a "build.properties.example" in svn, with properties commented as necessary. The locally correct properties don't go into svn.
Since you mention "getopt", though, I'm wondering if you're really thinking of command-line arguments? If there's a "main" that needs specific properties, I usually put the relevant instructions in a "usage" message that prints out if the arguments are incorrect or if "-h" is given.
