This question is a little complicated but I will do my best to make it simple.
I have a program which I want to run multithreaded.
This is what the program does:
initializes an executable (commandline utility)
loads a file into the executable (files are provided from a data provider method)
sends commands to that executable based on the file which was loaded
parses the received responses from executable
writes results to a csv file
All this takes place in a single method.
However, when running in multithreaded mode, everything runs fine except that all the results written to the CSV file are wrong and out of order.
When I added the keyword synchronized to the method declaration and ran the program with multiple threads, the program worked just fine.
public synchronized void run(Dataprovider data) {
...
}
However, the program then runs at the same speed as if I were running it in single-threaded mode. How can I fix this? This is driving me nuts...
How can I run this program properly multithreaded?
I'm looking for ideas and/or guidance.
Edit:
However when running in multithreaded mode, everything runs fine
except, all the results written to the csv file are wrong and out of
order.
I load a file into the executable, run some calculations on that file, then save it. I then get the size in bytes (file.length) of the newly generated file. When I compare the new file with the old file (the one that was loaded), I see that the new file is smaller than the old file, which is totally wrong. The size of the new file is consistently 12263 bytes, which is incorrect.
Edit:
I removed the partial code example that wrote to the CSV file, for simplicity.
However when running in multithreaded mode, everything runs fine
except, all the results written to the csv file are wrong and out of
order.
I can make some guesses as to what you mean by this statement, but it would help if it were more specific.
Is it the case that the results are wrong because outputs from different threads are being jumbled together into the same line or even the same token within a line?
In a csv file, the records are typically separated by newline characters. Can you refactor your solution so that a thread produces a complete line before writing to the output, and writes that line all in one go to the output?
Does your solution already do it that way? (It's not clear... there is no code in the question.)
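To illustrate the "complete line, written in one go" idea, here is a minimal sketch. CsvLineWriter and writeRecord are hypothetical names, not anything from the question, and the sketch assumes each thread works with its own executable process and its own temporary files, so that only the CSV output is shared. Only the final write is synchronized, so the slow work can still run in parallel.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: serializes only the write of a finished record, not the whole job.
class CsvLineWriter {
    private final Writer out;

    CsvLineWriter(Writer out) {
        this.out = out;
    }

    // Builds the whole record first, then writes it in a single call while holding the lock.
    // Note: no CSV quoting or escaping is done here; fields are assumed to be plain values.
    synchronized void writeRecord(String... fields) throws IOException {
        out.write(String.join(",", fields) + "\n");
        out.flush();
    }

    public static void main(String[] args) throws Exception {
        StringWriter sink = new StringWriter(); // stand-in for a FileWriter on the real CSV file
        CsvLineWriter csv = new CsvLineWriter(sink);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 8; i++) {
            final int id = i;
            pool.submit(() -> {
                // The slow part (running the executable, parsing replies) would go here,
                // outside any lock, using per-thread state only.
                String result = "result-" + id; // placeholder value
                csv.writeRecord("file" + id, result);
                return null;
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.print(sink); // records are intact, but their order depends on thread timing
    }
}
```

If the rows must also appear in a fixed order, one option is to collect the results and sort them before writing. A lock only keeps each line intact; it does not impose an order.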
I have an application that opens a chosen file, reads the data inside it (text in my case), does some calculations, then updates the text and rewrites it to the file.
The application works perfectly well in terms of accessing files and rewriting them, except in one case.
Here is the case: if the user is on a desktop computer and triggers an event that leads to file operations (reading and writing), and the power is cut at the same moment, then after the desktop is started again the file still exists but has no data in it (it has been emptied entirely).
Keep in mind that this only happens on a desktop (it doesn't occur on laptops), and only if a file operation was triggered a split second before the power outage.
I was wondering whether there is any workaround in Java to prevent this from happening. I know the chances of it happening are very low, but it did happen, and that is why I am here.
If the power goes off while you are writing a file, there are no guarantees [1] that the file will be written completely.
You can use FileDescriptor.sync() to reduce the window of vulnerability (by forcing the OS to flush writes to physical disk), but you can't eliminate it completely.
But as @creinig says, the normal way to do this is to write the data into a temporary file and then rename the temporary file. Many OSes can perform a file rename operation atomically.
Incidentally, you say this:
This is only happening when using a desktop computer (it doesn't occur on laptops) ...
A laptop typically has a battery and is not vulnerable to physical power cuts. When you press the power-off button on a laptop, the OS probably gets a chance to flush the file system before the laptop powers down.
1 - Certainly not with a conventional file system. A transactional file system (e.g. Transactional NTFS) may offer some guarantees, but I'm not sure if the Java runtime understands this.
The classical solution to this would be to write the changed contents to a temp file and afterwards rename the temp file to the correct name. That way an interruption (a) leaves the temp file empty and the original file untouched or (b) causes the temp file to not be renamed, leaving the original file untouched.
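A minimal sketch of the temp-file-then-rename approach, using java.nio.file. SafeFileWriter and replaceContents are hypothetical names chosen for illustration. FileChannel.force(true) plays the same role here as FileDescriptor.sync(). Whether an atomic move replaces an existing target is implementation specific according to the Files.move documentation, so treat this as a sketch to test on the target OS rather than a guarantee.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: never writes into the original file directly.
class SafeFileWriter {
    static void replaceContents(Path target, String text) throws IOException {
        // Create the temp file in the same directory so the rename stays on one file system.
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, target.getFileName().toString(), ".tmp");
        try {
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                ByteBuffer buf = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
                // Ask the OS to flush data and metadata to the storage device before renaming.
                ch.force(true);
            }
            // Swap the finished temp file into place; the old contents stay intact until this step.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            Files.deleteIfExists(tmp); // do not leave a half-written temp file behind
            throw e;
        }
    }
}
```

Usage would look like SafeFileWriter.replaceContents(Paths.get("data.txt"), updatedText). If the power fails before the move, the original file is untouched and at worst a stray .tmp file remains.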
Essentially the question boils down to 'Is there a way to run a class/jar file using the console for input/output without having JDK'. I've written a program in Java for a colleague to use in his work, but he doesn't have/use JDK. I'm aware there are online compilers, but at this point my question has become focused on running a program directly for interest reasons.
Essentially the program is designed to output instructions to the user, the user makes input in accordance with the instructions, and this goes back and forth for a while before the program does some calculations based on the input and returns a final result.
I should mention that he has JRE.
My program doesn't use a GUI; it just prints instructions to the console and gathers input from responses typed in the console. It works fine on my computer when I run it from the command line ('java myprogram'), but without the JDK the java command isn't available to him, which seems to mean he can't run a class file.
My next attempt was to turn the program into an executable jar file, but using the command 'myprogram.jar' from the command line doesn't really do anything. The jar file does include a manifest. I added a blank screen to the code and running the jar file did create the screen, but still no i/o on the console. From what I've read, I think this is because jar files aren't automatically associated with a console, so there is nowhere for the program's output to go or input to come from. He can't use the 'java -jar myprogram.jar' command because he hasn't got the JDK, so even though that command runs the program the way I want it to, it's not an option.
Basically, I'm wondering if it's possible for someone without JDK to run a program and interact with it entirely using the console/command line and no other interface?
I have a spark cluster with 2 machines say mach-1 and mach-2.
I write the code on my local machine, export it as a JAR, and copy it to mach-1.
Then I run the code on mach-1 using spark-submit.
The code tries to read a local file, which exists on mach-1.
It works well most of the time, but sometimes it gives me errors like "File does not exist". So I copied the file to mach-2 as well, and now the code works.
Similarly, when writing the output to a local file, it sometimes worked when the output folder was only available on mach-1, but then it gave an error, so I created the output folder on mach-2 as well. Now it creates the output on both mach-1 and mach-2 (some part on mach-1 and some part on mach-2).
Is this expected behavior? Are there any pointers to texts explaining this?
P.S.: I do not collect my RDDs before writing to the local file (I write inside foreach). If I do collect them, the code works well with the output folder present only on mach-1.
Your input data has to exist on every node. You can achieve this by copying the data to the nodes, or by using NFS or HDFS.
For your output, you can write to NFS or HDFS. Alternatively, you can call collect(), but only do so when your dataset fits into the memory of the driver. When it doesn't fit, you should call rdd.toLocalIterator() or take(n).
Is it possible that you are running your code in cluster mode and not in client mode?
Let me explain my graduation project first, then my question.
My graduation project is about compiling Java code online.
To compile the user's code, I use the Java Compiler API in a backing bean class. After compilation finishes, a .class file is generated in the D:\eclipse-jee-mars-R-win32-x86_64\eclipse directory. Then the program saves the input file, whose contents are written in a text area (in the JSP), to the D:\eclipse-jee-mars-R-win32-x86_64\eclipse directory. After creating the input file, the program runs the code with a DOS command at runtime and reads the process's input stream (getInputStream), which returns the output of the code. If the entered code (the user's code) creates an output file, that file is generated as well.
We suggest to users that "your code's output file name should be 'output.txt'".
If the user's code creates an output file, "output.txt" is generated in the D:\eclipse-jee-mars-R-win32-x86_64\eclipse directory. Then the program loads the contents of 'output.txt' into the JSP and shows users the "output" of their code.
Thus far, there is no problem. The problem starts here. For one user everything works, but consider 2 or more users. If 2 or more users compile code at the same time, problems will occur because their .class, .java and "output.txt" files will be the same files, since we create files with the same names for all users in the D:\eclipse-jee-mars-R-win32-x86_64\eclipse directory. Creating files with unique names for each user might help, but it creates another problem. Our project also analyses code by measuring its run time. The unique-name solution lets code run in parallel, so the analysis results would be unfair, because sometimes 1 program is running on the website and sometimes 100 are.
I need to compile, execute and analyse the code one submission at a time, not in parallel.
How can I solve this problem? What solution can be used in this situation?
"Accepting random code from the internet to compile/run it server-side" - this sounds like a particularly bad idea, opening you up for all kinds of nasty attacks. DOS (Denial of Service) being only one of them.
Further, the architecture of executing code externally and sharing a common, well-known file, as you state, makes it impossible to execute more than one of these programs at the same time.
Storing this external code in some IDE's directory is another weird thing to me.
In fact, I think that this whole architecture is such a bad idea that I have a hard time suggesting a solution other than to rethink it: If you have a very good reason for accepting random code from the internet (other than researching exactly this question), please state it here.
Under protest (and with the suggestion not to follow this tip), I'd point to the option of just piping the application's console output (e.g. System.out) back to the client instead of using that one single well-known file. At least then you can safely be attacked by many users coming in parallel, without mixing up the replies.
I solved my problem by using a synchronized block. It is enough for my project.
In the project's JSP file:
synchronized(this){
//my compilation, running and analysis part is here
}
Now only one thread at a time can process this part. This means that, even with thousands of users, all submissions are processed one by one, with no code running in parallel.
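For comparison, here is a sketch of a different way to get the same one-at-a-time behaviour: a single-thread executor used as a job queue. This is not what the project uses. SerialJobRunner and runExclusively are hypothetical names, and the sketch assumes each compile, run and analyse step can be wrapped in a Callable. Unlike synchronized(this), it does not depend on all requests sharing the same object instance.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: every job goes through one worker thread, so jobs never overlap.
class SerialJobRunner {
    // Daemon thread so that an idle queue does not keep the JVM alive.
    private static final ExecutorService QUEUE = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "serial-job-runner");
        t.setDaemon(true);
        return t;
    });

    // Queues the job and blocks the calling (request) thread until the job has finished.
    static <T> T runExclusively(Callable<T> job) throws Exception {
        Future<T> pending = QUEUE.submit(job);
        return pending.get();
    }
}
```

A request handler would call something like String output = SerialJobRunner.runExclusively(() -> compileRunAndAnalyse(source)), where compileRunAndAnalyse stands for the existing compilation, execution and analysis code.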
I've got a huge csv file that keeps increasing forever [although sometimes it gets reset], I know that's not good but unfortunately I can't change the design since it's another application that keeps adding stuff there.
I have to split this file into new, smaller files, bearing in mind that new data keeps being appended to that CSV file. For example, one CSV file for every 1000 values or something like that.
I'm thinking about writing a small program to do this and running it periodically via Windows Scheduled Tasks. Is that the best way of fixing this problem? If so, can you help me with the code [Java, VB, C#...]? If it's not the best solution, which path should I follow?
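As a starting point, the splitting step could look roughly like the sketch below. CsvSplitter, split and the part-N.csv naming are hypothetical choices, and the sketch assumes one record per line with no embedded newlines. It re-splits the whole file on every run; remembering how far a previous run got, and detecting when the source file has been reset, is left out.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: copies the source CSV into numbered chunks of at most linesPerFile lines.
class CsvSplitter {
    // Returns the number of chunk files written.
    static int split(Path source, Path outDir, int linesPerFile) throws IOException {
        Files.createDirectories(outDir);
        int fileCount = 0;
        int linesInCurrent = 0;
        BufferedWriter out = null;
        try (BufferedReader in = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (out == null) {
                    // Open the next chunk lazily so no empty file is created at the end.
                    fileCount++;
                    Path chunk = outDir.resolve("part-" + fileCount + ".csv");
                    out = Files.newBufferedWriter(chunk, StandardCharsets.UTF_8);
                }
                out.write(line);
                out.newLine();
                if (++linesInCurrent == linesPerFile) {
                    out.close();
                    out = null;
                    linesInCurrent = 0;
                }
            }
        } finally {
            if (out != null) {
                out.close();
            }
        }
        return fileCount;
    }
}
```

A scheduled task could then run a small main method that calls CsvSplitter.split(Paths.get("big.csv"), Paths.get("chunks"), 1000), with both paths being placeholders.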