I have multiple threads which are serializing my 'Data' objects to files. The filename is based on 2 fields from the Object
class Data {
org.joda.time.DateTime time;
String title;
public String getFilename() {
return time.toString() + '_' + title + ".xml";
}
}
It is possible that 2 Data objects will have the same 'time' and 'title', and so the same filename.
This is acceptable, and I'm happy for either to be saved. (They're probably the same Data object anyway if those are the same)
My problem is that two (or more) threads are writing to a file AT THE SAME TIME, causing malformed XML.
I had a look at java.nio.channels.FileLock, but it's for locking between processes (VM-wide), and specifically NOT suitable for locking between threads within the same JVM.
I could synchronize on DataIO.class (but that would serialize every write, a huge bottleneck, since I really only want to synchronize on the individual file).
Synchronizing on the File object will be useless, as multiple File objects can represent the same System-File.
Code Follows:
class DataIO {
public void writeDataToFile(Data data, String filename, boolean overwrite) throws IOException {
File file = new File(filename);
writeDataToFile(data, file, overwrite);
}
public void writeDataToFile(Data data, File file, boolean overwrite) throws IOException {
if (file.exists()) {
if (overwrite) {
if (!file.delete()) {
throw new IOException("Failed to delete the file, for overwriting: " + file);
}
} else {
throw new IOException("File " + file + " already exists, and overwrite flag is set to false.");
}
}
File parentFile = file.getParentFile();
if (parentFile != null) {
parentFile.mkdirs();
}
file.createNewFile();
if (!file.canWrite()) {
throw new IOException("You do not have permission to write to the file: " + file);
}
FileOutputStream fos = new FileOutputStream(file, false);
try {
writeDataToStream(data, fos);
logger.debug("Successfully wrote Data to file: " + file.getAbsolutePath());
} finally {
fos.close();
}
}
}
If I am reading this correctly you have a Data object that represents a single file.
You can consider creating a striped set of locks based on the Data object. Possibly having a ConcurrentHashMap of
ConcurrentMap<Data,Lock> lockMap = new ConcurrentHashMap<Data,Lock>();
Now when you want to write to this object you can do the following (note that a plain get() would return null for a Data object that has no lock yet, so the lock has to be created atomically on first use):
Lock lock = lockMap.computeIfAbsent(someMyDataObject, key -> new ReentrantLock());
lock.lock();
try{
//write object here
}finally{
lock.unlock();
}
Keep in mind you would have to write the hashCode and equals methods based on the title and DateTime.
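A self-contained sketch of that idea, keyed by the filename for brevity (the class and method names here are illustrative, not from the original code):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class StripedFileLocks {
    // One lock per logical filename; computeIfAbsent creates it atomically on first use.
    private final ConcurrentMap<String, Lock> lockMap = new ConcurrentHashMap<>();

    public void withLock(String filename, Runnable writeAction) {
        Lock lock = lockMap.computeIfAbsent(filename, key -> new ReentrantLock());
        lock.lock();
        try {
            writeAction.run(); // e.g. serialize the Data object to this file
        } finally {
            lock.unlock();
        }
    }
}
```

Two threads asking for the same filename get the same Lock instance, while writes to different filenames proceed in parallel.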
You could intern() the string that is the filename. Then synchronise on the interned string.
class DataIO {
public void writeDataToFile(Data data, String filename, boolean overwrite) throws IOException {
synchronized(filename.intern()) {
File file = new File(filename);
writeDataToFile(data, file, overwrite);
}
}
}
I agree that using synchronization is the technique you should use. What you need is a distinct object for each file permutation, and more importantly the same object each time. One option might be to create a class called FileLock:
public class FileLock {
DateTime time;
String title;
public FileLock(DateTime time, String title) {
this.time = time;
this.title = title;
}
// override equals/hashCode based on those two properties
static Hashtable<FileLock, FileLock> uniqueLocks = new Hashtable<FileLock, FileLock>();
static final Object lockObject = new Object();
public static FileLock getLock(DateTime time, String title) {
synchronized (lockObject) {
FileLock lock = new FileLock(time, title);
if (uniqueLocks.containsKey(lock)) {
return uniqueLocks.get(lock);
}
else {
uniqueLocks.put(lock, lock);
return lock;
}
}
}
}
}
}
Then callers would use it like:
synchronized (FileLock.getLock(time, title)) {
...
}
Bear in mind this has a memory leak since the Hashtable keeps growing with new file/time permutations. If you need to, you could modify this technique so that callers of getLock also invoke a releaseLock method that you use to keep the Hashtable clean.
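The releaseLock idea mentioned above could be sketched with a reference count, so an entry is removed once the last holder releases it (the names and structure here are my own, not from the answer):

```java
import java.util.HashMap;
import java.util.Map;

class RefCountedLocks {
    private static class Entry {
        int refs = 0;
    }

    private final Map<String, Entry> entries = new HashMap<>();

    // Return the canonical monitor object for a key, bumping its reference count.
    public synchronized Object acquire(String key) {
        Entry e = entries.computeIfAbsent(key, k -> new Entry());
        e.refs++;
        return e;
    }

    // Drop a reference; remove the entry when the last holder releases it,
    // keeping the map from growing with every file/time permutation.
    public synchronized void release(String key) {
        Entry e = entries.get(key);
        if (e != null && --e.refs == 0) {
            entries.remove(key);
        }
    }
}
```

Callers would then do `Object monitor = locks.acquire(key); try { synchronized (monitor) { /* write */ } } finally { locks.release(key); }`.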
I want to process files with a flink stream in which two lines belong together. In the first line there is a header and in the second line a corresponding text.
The files are located on my local file system. I am using the readFile(fileInputFormat, path, watchType, interval, pathFilter, typeInfo) method with a custom FileInputFormat.
My streaming job class looks like this:
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<Read> inputStream = env.readFile(new ReadInputFormatTest("path/to/monitored/folder"), "path/to/monitored/folder", FileProcessingMode.PROCESS_CONTINUOUSLY, 100);
inputStream.print();
env.execute("Flink Streaming Java API Skeleton");
and my ReadInputFormatTest like this:
public class ReadInputFormatTest extends FileInputFormat<Read> {
private transient FileSystem fileSystem;
private transient BufferedReader reader;
private final String inputPath;
private String headerLine;
private String readLine;
public ReadInputFormatTest(String inputPath) {
this.inputPath = inputPath;
}
@Override
public void open(FileInputSplit inputSplit) throws IOException {
FileSystem fileSystem = getFileSystem();
this.reader = new BufferedReader(new InputStreamReader(fileSystem.open(inputSplit.getPath())));
this.headerLine = reader.readLine();
this.readLine = reader.readLine();
}
private FileSystem getFileSystem() {
if (fileSystem == null) {
try {
fileSystem = FileSystem.get(new URI(inputPath));
} catch (URISyntaxException | IOException e) {
throw new RuntimeException(e);
}
}
return fileSystem;
}
@Override
public boolean reachedEnd() throws IOException {
return headerLine == null;
}
@Override
public Read nextRecord(Read r) throws IOException {
r.setHeader(headerLine);
r.setSequence(readLine);
headerLine = reader.readLine();
readLine = reader.readLine();
return r;
}
}
As expected, the headers and the text are stored together in one object. However, the file is read eight times. So the problem is the parallelization. Where and how can I specify that a file is processed only once, but several files in parallel?
Or do I have to change my custom FileInputFormat even further?
I would modify your source to emit the available filenames (instead of the actual file contents) and then add a new processor to read a name from the input stream and then emit pairs of lines. In other words, split the current source into a source followed by a processor. The processor can be made to run at any degree of parallelism and the source would be a single instance.
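The pairing step the new processor would perform can be illustrated independently of the Flink API (the class and method names below are made up for the sketch; in the real job this logic would live in the parallel operator that receives a filename and reads that file):

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class LinePairer {
    // Groups a file's lines two at a time: even index = header line,
    // odd index = the corresponding text line.
    static List<Map.Entry<String, String>> pairLines(List<String> lines) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += 2) {
            String header = lines.get(i);
            // May be null if the file has an odd number of lines.
            String text = i + 1 < lines.size() ? lines.get(i + 1) : null;
            pairs.add(new AbstractMap.SimpleEntry<>(header, text));
        }
        return pairs;
    }
}
```

With the filename source running at parallelism 1 and this pairing done downstream, each file is read exactly once while several files can still be processed in parallel.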
I have one method which write to a file. I need to synchronize file object
class MessageFile{
public static final String fileName = "Main.html";
@Autowired
AppConfig appConfig;
public boolean writeToFile(String fileContent) throws Exception{
String path = appConfig.getNewsPath() + File.separator + fileName; // getNewsPath is non-static method
final File alertFile= new File(path);
FileOutputStream out = null;
synchronized (alertFile) {
if (!alertFile.exists()) {
alertFile.createNewFile();
}
try {
out = new FileOutputStream(alertFile, false);
out.write(fileContent.getBytes());
out.flush();
} finally {
if (out != null) {
out.close();
}
}
}
return true;
}
}
But the above code won't take an exclusive lock on the file object, because another instance of this class can hold its own lock and write to the file at the same time.
So I want to know how to handle this case.
I found one workaround: create a temporary file with a timestamp appended to its name (so the temporary file name is always unique), write the content to it, then delete the original file and rename the temporary file to the original file name.
You can try synchronizing on MessageFile.class, if it is the only class accessing the file.
Your program does not get an exclusive lock on the file because you are synchronizing on a local variable, alertFile, that is not shared between instances of the class MessageFile (each object has its own alertFile). You have two possibilities to solve this:
1- Create some static object and synchronize on it (you may use fileName as it is already there).
2- Have a reference in all the objects that points to the same shared object (passed in the constructor, for example) and synchronize on it.
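A minimal sketch of option 1, a single static lock object shared by all instances (the class name and the Path-based signature are assumptions for illustration, not the question's exact API):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class MessageFileWriter {
    // One monitor shared by every instance of the class, so all writers
    // serialize on the same lock no matter which instance they go through.
    private static final Object FILE_LOCK = new Object();

    public void writeToFile(Path file, String content) {
        synchronized (FILE_LOCK) {
            try {
                Files.writeString(file, content); // creates or truncates the file
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```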
You are creating a new File object (alertFile) every time the method is run, so the lock does nothing, as it is a different object on each call; you need a static File instance shared across all method calls.
If the path can be different every time the method is run, you could create a static Map<String, File> instance and use it like this:
Get path of the file.
If there is no File associated with this path, create it.
Otherwise, recover existing File instance from map.
Use this File as a lock and do operations on it.
Example based on modified answer:
class MessageFile{
public static final String fileName = "Main.html";
@Autowired
AppConfig appConfig;
private static final Map<String, File> filesMap = new HashMap<>();
public boolean writeToFile(String fileContent) throws Exception{
String path = appConfig.getNewsPath() + File.separator + fileName; // getNewsPath is non-static method
final File alertFile;
synchronized(filesMap) {
if (filesMap.containsKey(path)) {
alertFile = filesMap.get(path);
}
else {
alertFile = new File(path);
filesMap.put(path, alertFile);
}
}
FileOutputStream out = null;
synchronized (alertFile) {
if (!alertFile.exists()) {
alertFile.createNewFile();
}
try {
out = new FileOutputStream(alertFile, false);
out.write(fileContent.getBytes());
out.flush();
} finally {
if (out != null) {
out.close();
}
}
}
return true;
}
}
Synchronize on the class-level object (MessageFile.class) or use a static synchronized method writeToFile(). That will make sure only one thread writes to the file at a time, and it guarantees the lock is released once a thread has written all of its data to the file.
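As a sketch, the static synchronized variant might look like this (names and signature are illustrative; a static synchronized method locks on the class object itself, so every caller serializes on it):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class AlertFileWriter {
    // static + synchronized: the implicit lock is AlertFileWriter.class,
    // shared by all instances and all threads.
    public static synchronized void writeToFile(Path file, String content) {
        try {
            Files.writeString(file, content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```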
Is there any easy way to return File object from NodeRef? I'm converting nodes to temporary files and it works but it doesn't seem to be very practical. Here is my code:
public File getTempCopyAsFile() throws IOException{
File tempFile = TempFileProvider.createTempFile("temp_"+this.getDocName(), this.getDocExtension());
try (InputStream is = this.getReader().getContentInputStream()){
FileUtils.copyInputStreamToFile(is, tempFile);
}
return tempFile;
}
public ContentReader getReader() {
return contentService.getReader(nodeRef, ContentModel.PROP_CONTENT);
}
public String getName() {
return (String) nodeService.getProperty(nodeRef, ContentModel.PROP_NAME);
}
public String getDocExtension() {
return "." + FilenameUtils.getExtension(this.getName());
}
public String getDocName() {
return FilenameUtils.removeExtension(this.getName());
}
The easiest way is to use ContentService to get a ContentReader for the node. Once you have that, call ContentReader.getContent(File), which from the javadocs:
Gets content from the repository direct to file
All resources will be closed automatically.
Parameters:
file - the file to write the content to - it will be overwritten
While the content will be stored somewhere on disk within the content repo, accessing the raw file isn't recommended. The safe way is to create a temp file, then have the ContentReader send the contents of the node into it.
Is it possible to force Properties not to add the date comment in front? I mean something like the first line here:
#Thu May 26 09:43:52 CEST 2011
main=pkg.ClientMain
args=myargs
I would like to get rid of it altogether. I need my config files to be diff-identical unless there is a meaningful change.
Guess not. This timestamp is printed in a private method of Properties and there is no property to control that behaviour.
The only idea that comes to my mind: subclass Properties, override store and copy/paste the content of the store0 method so that the date comment is not printed.
Or provide a custom BufferedWriter that prints all but the first line (this will fail if you add real comments, because custom comments are printed before the timestamp...).
Given the source code of Properties: no, it's not possible. BTW, since Properties is in fact a hash table, and since its keys are thus not sorted, you can't rely on the properties always being written in the same order anyway.
I would use a custom algorithm to store the properties if I had this requirement. Use the source code of Properties as a starter.
Based on https://stackoverflow.com/a/6184414/242042 here is the implementation I have written that strips out the first line and sorts the keys.
public class CleanProperties extends Properties {
private static class StripFirstLineStream extends FilterOutputStream {
private boolean firstlineseen = false;
public StripFirstLineStream(final OutputStream out) {
super(out);
}
@Override
public void write(final int b) throws IOException {
if (firstlineseen) {
super.write(b);
} else if (b == '\n') {
firstlineseen = true;
}
}
}
private static final long serialVersionUID = 7567765340218227372L;
@Override
public synchronized Enumeration<Object> keys() {
return Collections.enumeration(new TreeSet<>(super.keySet()));
}
@Override
public void store(final OutputStream out, final String comments) throws IOException {
super.store(new StripFirstLineStream(out), null);
}
}
Cleaning looks like this
final Properties props = new CleanProperties();
try (final Reader inStream = Files.newBufferedReader(file, Charset.forName("ISO-8859-1"))) {
props.load(inStream);
} catch (final MalformedInputException mie) {
throw new IOException("Malformed on " + file, mie);
}
if (props.isEmpty()) {
Files.delete(file);
return;
}
try (final OutputStream os = Files.newOutputStream(file)) {
props.store(os, "");
}
This variant is useful if you modify the given xxx.conf file in place.
The write method skips the content of the first line (the timestamp, e.g. #Thu May 26 09:43:52 CEST 2011) in the store method, but keeps its line break; after the first line it writes normally.
public class CleanProperties extends Properties {
private static class StripFirstLineStream extends FilterOutputStream {
private boolean firstlineseen = false;
public StripFirstLineStream(final OutputStream out) {
super(out);
}
@Override
public void write(final int b) throws IOException {
if (firstlineseen) {
super.write(b);
} else if (b == '\n') {
// Keep the line break, so the remaining lines are not run together.
super.write('\n');
firstlineseen = true;
}
}
}
private static final long serialVersionUID = 7567765340218227372L;
@Override
public synchronized Enumeration<java.lang.Object> keys() {
return Collections.enumeration(new TreeSet<>(super.keySet()));
}
@Override
public void store(final OutputStream out, final String comments)
throws IOException {
super.store(new StripFirstLineStream(out), null);
}
}
Can you not just flag up in your application somewhere when a meaningful configuration change takes place and only write the file if that is set?
You might want to look into Commons Configuration which has a bit more flexibility when it comes to writing and reading things like properties files. In particular, it has methods which attempt to write the exact same properties file (including spacing, comments etc) as the existing properties file.
You can handle this question by following this Stack Overflow post to retain order:
Write in a standard order:
How can I write Java properties in a defined order?
Then write the properties to a string and remove the comments as needed. Finally write to a file.
ByteArrayOutputStream baos = new ByteArrayOutputStream();
properties.store(baos,null);
String propertiesData = baos.toString(StandardCharsets.UTF_8.name());
propertiesData = propertiesData.replaceAll("^#.*(\r|\n)+",""); // remove the leading comment lines (the timestamp)
FileUtils.writeStringToFile(fileTarget,propertiesData,StandardCharsets.UTF_8);
// you may want to validate the file is readable by reloading and doing tests to validate the expected number of keys matches
InputStream is = new FileInputStream(fileTarget);
Properties testResult = new Properties();
testResult.load(is);
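If commons-io's FileUtils isn't available, the same store-then-strip idea works with the standard library alone; a sketch (the class and method names are mine, and byte-identical output across JVM runs would still need sorted keys as in the linked answer):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class StableProperties {
    // Serializes a Properties object with the leading date comment removed,
    // so repeated saves of unchanged data diff clean.
    static String storeWithoutDate(Properties props) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        props.store(baos, null); // null comments: only the timestamp line is emitted
        String data = baos.toString(StandardCharsets.ISO_8859_1.name());
        // Properties.store writes the timestamp as the first '#' comment line.
        return data.replaceAll("^#.*(\r|\n)+", "");
    }
}
```

The result can then be written to the target file with Files.writeString or similar.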
I'm trying to extend my library for integrating Swing and JPA by making JPA config as automatic (and portable) as can be done, and it means programmatically adding <class> elements. (I know it can be done via Hibernate's AnnotationConfiguration or EclipseLInk's ServerSession, but - portability). I'd also like to avoid using Spring just for this single purpose.
I can create a persistence.xml on the fly, and fill it with <class> elements from specified packages (via the Reflections library). The problem starts when I try to feed this persistence.xml to a JPA provider. The only way I can think of is setting up a URLClassLoader, but I can't think of a way that wouldn't make me write the file to disk somewhere first, for the sole purpose of obtaining a valid URL. Setting up a socket for serving the file via a URL (localhost:xxxx) seems... I don't know, evil?
Does anyone have an idea how I could solve this problem? I know it sounds like a lot of work to avoid using one library, but I'd just like to know if it can be done.
EDIT (a try at being more clear):
Dynamically generated XML is kept in a String object. I don't know how to make it available to a persistence provider. Also, I want to avoid writing the file to disk.
For purpose of my problem, a persistence provider is just a class which scans the classpath for META-INF/persistence.xml. Some implementations can be made to accept dynamic creation of XML, but there is no common interface (especially for a crucial part of the file, the <class> tags).
My idea is to set up a custom ClassLoader - if you have any other I'd be grateful, I'm not set on this one.
The only easily extendable/configurable one I could find was a URLClassLoader. It works on URL objects, and I don't know if I can create one without actually writing XML to disk first.
That's how I'm setting things up, but it only works by first writing persistenceXmlFile = new File("META-INF/persistence.xml") to disk:
Thread.currentThread().setContextClassLoader(
new URLResourceClassLoader(
new URL[] { persistenceXmlFile.toURI().toURL() },
Thread.currentThread().getContextClassLoader()
)
);
URLResourceClassLoader is URLCLassLoader's subclass, which allows for looking up resources as well as classes, by overriding public Enumeration<URL> findResources(String name).
Maybe a bit late (after 4 years), but for others who are looking for a similar solution, you may be able to use the URL factory I created:
public class InMemoryURLFactory {
public static void main(String... args) throws Exception {
URL url = InMemoryURLFactory.getInstance().build("/this/is/a/test.txt", "This is a test!");
byte[] data = IOUtils.toByteArray(url.openConnection().getInputStream());
// Prints out: This is a test!
System.out.println(new String(data));
}
private final Map<URL, byte[]> contents = new WeakHashMap<>();
private final URLStreamHandler handler = new InMemoryStreamHandler();
private static InMemoryURLFactory instance = null;
public static synchronized InMemoryURLFactory getInstance() {
if(instance == null)
instance = new InMemoryURLFactory();
return instance;
}
private InMemoryURLFactory() {
}
public URL build(String path, String data) {
try {
return build(path, data.getBytes("UTF-8"));
} catch (UnsupportedEncodingException ex) {
throw new RuntimeException(ex);
}
}
public URL build(String path, byte[] data) {
try {
URL url = new URL("memory", "", -1, path, handler);
contents.put(url, data);
return url;
} catch (MalformedURLException ex) {
throw new RuntimeException(ex);
}
}
private class InMemoryStreamHandler extends URLStreamHandler {
@Override
protected URLConnection openConnection(URL u) throws IOException {
if(!u.getProtocol().equals("memory")) {
throw new IOException("Cannot handle protocol: " + u.getProtocol());
}
return new URLConnection(u) {
private byte[] data = null;
@Override
public void connect() throws IOException {
initDataIfNeeded();
checkDataAvailability();
// Protected field from superclass
connected = true;
}
@Override
public long getContentLengthLong() {
initDataIfNeeded();
if(data == null)
return 0;
return data.length;
}
@Override
public InputStream getInputStream() throws IOException {
initDataIfNeeded();
checkDataAvailability();
return new ByteArrayInputStream(data);
}
private void initDataIfNeeded() {
if(data == null)
data = contents.get(u);
}
private void checkDataAvailability() throws IOException {
if(data == null)
throw new IOException("In-memory data cannot be found for: " + u.getPath());
}
};
}
}
}
We can use Google's Jimfs library for that.
First, we need to add the maven dependency to our project:
<dependency>
<groupId>com.google.jimfs</groupId>
<artifactId>jimfs</artifactId>
<version>1.2</version>
</dependency>
After that, we need to configure our filesystem behavior, and write our String content to the in-memory file, like this:
public static final String INPUT =
"\n"
+ "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
+ "<note>\n"
+ " <to>Tove</to>\n"
+ " <from>Jani</from>\n"
+ " <heading>Reminder</heading>\n"
+ " <body>Don't forget me this weekend!</body>\n"
+ "</note>";
@Test
void usingJIMFS() throws IOException {
try (var fs = Jimfs.newFileSystem(Configuration.unix())) {
var path = fs.getPath(UUID.randomUUID().toString());
Files.writeString(path, INPUT);
var url = path.toUri().toURL();
assertThat(url.getProtocol()).isEqualTo("jimfs");
assertThat(Resources.asCharSource(url, UTF_8).read()).isEqualTo(INPUT);
}
}
We can find more examples in the official repository.
If we look inside the Jimfs source code, we will find that the implementation is similar to @NSV's answer.