I'm using org.apache.commons.net.ftp.FTPClient and seeing behavior that is, well... perplexing.
The method below is meant to iterate over a list of FTPFiles, read each one in, and then do something with the contents. That part works. What is not (really) working is that the FTPClient object does the following:
1) Properly retrieves and stores the FIRST file in the list
2) List item evaluates to NULL for x successive iterations of the loop (x varies between attempts)
3) Manages to retrieve exactly 1 more file in the list
4) Reports null for exactly 1 more file in the list
5) Hangs indefinitely, reporting no further activity
public static String mergeXMLFiles(List<FTPFile> files, String rootElementNodeName, FTPClient ftp) {
    String ret = null;
    String fileAsString = null;
    //InputStream inStream;
    int c;
    if (files == null || rootElementNodeName == null)
        return null;
    try {
        System.out.println("GETTING " + files.size() + " files");
        for (FTPFile file : files) {
            fileAsString = "";
            InputStream inStream = ftp.retrieveFileStream(file.getName());
            if (inStream == null) {
                System.out.println("FtpUtil.mergeXMLFiles() couldn't initialize inStream for file:" + file.getName());
                continue; //THIS IS THE PART THAT I SEE FOR files [1 - arbitrary number (usually around 20)] and then 1 more time for [x + 2] after [x + 1] passes successfully.
            }
            while ((c = inStream.read()) != -1) {
                fileAsString += Character.valueOf((char) c);
            }
            inStream.close();
            System.out.println("FILE:" + file.getName() + "\n" + fileAsString);
        }
    } catch (Exception e) {
        System.out.println("FtpUtil.mergeXMLFiles() failed:" + e);
    }
    return ret;
}
Has anyone seen anything like this? I'm new to FTPClient; am I doing something wrong with it?
According to the API docs for FTPClient.retrieveFileStream(), the method returns null when it cannot open the data connection, in which case you should check the reply code (e.g. getReplyCode(), getReplyString(), getReplyStrings()) to see why it failed. Also, you are supposed to finalize file transfers by calling completePendingCommand() and verifying that the transfer was indeed successful.
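For illustration, here is a sketch of how the loop could look with both of those checks in place (the class and variable names are mine, and it is not a drop-in replacement for the method above):

import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import org.apache.commons.net.ftp.FTPClient;
import org.apache.commons.net.ftp.FTPFile;

public class FtpReadSketch {
    // Reads each file, checking the reply code when the stream is null and
    // finalizing every transfer with completePendingCommand().
    static void readAll(FTPClient ftp, List<FTPFile> files) throws IOException {
        for (FTPFile file : files) {
            InputStream in = ftp.retrieveFileStream(file.getName());
            if (in == null) {
                // No data connection: the reply code/string explains why.
                System.out.println("Skipping " + file.getName() + ": "
                        + ftp.getReplyCode() + " " + ftp.getReplyString());
                continue;
            }
            StringBuilder content = new StringBuilder();
            int c;
            while ((c = in.read()) != -1) {
                content.append((char) c);
            }
            in.close();
            // Without this call the client never leaves the transfer state,
            // which is what makes later retrieveFileStream() calls return null.
            if (!ftp.completePendingCommand()) {
                System.out.println("Transfer of " + file.getName() + " failed: "
                        + ftp.getReplyString());
            }
            System.out.println("FILE:" + file.getName() + "\n" + content);
        }
    }
}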
It works OK when I add the following after the "retrieve" command:
int response = client.getReply();
if (response != FTPReply.CLOSING_DATA_CONNECTION) {
    //TODO
}
I have two files; assume both are already sorted.
This is just example data; in reality I'll have around 30-40 million records per file, each file 7-10 GB in size, as the row length is large and fixed.
They are simple text files; once the searched record is found, I'll do some updating and write it out to a file.
File A may contain 0 or more records matching an ID from File B.
The goal is to complete this processing in the least amount of time possible.
I am able to do it, but it's a time-consuming process...
Suggestions are welcome.
File A
1000000001,A
1000000002,B
1000000002,C
1000000002,D
1000000002,D
1000000003,E
1000000004,E
1000000004,E
1000000004,E
1000000004,E
1000000005,E
1000000006,A
1000000007,A
1000000008,B
1000000009,B
1000000010,C
1000000011,C
1000000012,C
File B
1000000002
1000000004
1000000006
1000000008
1000000010
1000000012
1000000014
1000000016
1000000018
// Not working as of now, because the logic is wrong.
private static void readAndWriteFile() {
    System.out.println("Read Write File Started.");
    long time = System.currentTimeMillis();
    try (
        BufferedReader in = new BufferedReader(new FileReader(Commons.ROOT_PATH + "input.txt"));
        BufferedReader search = new BufferedReader(new FileReader(Commons.ROOT_PATH + "search.txt"));
        FileWriter myWriter = new FileWriter(Commons.ROOT_PATH + "output.txt");
    ) {
        String inLine = in.readLine();
        String searchLine = search.readLine();
        boolean isLoopEnd = true;
        while (isLoopEnd) {
            if (searchLine == null || inLine == null) {
                isLoopEnd = false;
                break;
            }
            if (searchLine.substring(0, 10).equalsIgnoreCase(inLine.substring(0, 10))) {
                System.out.println("Record Found - " + inLine.substring(0, 10) + " | " + searchLine.substring(0, 10));
                myWriter.write(inLine + System.lineSeparator());
                inLine = in.readLine();
            } else {
                inLine = in.readLine();
            }
        }
        in.close();
        myWriter.close();
        search.close();
    } catch (Exception e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    System.out.println("Read and Write to File done in - " + (System.currentTimeMillis() - time));
}
My suggestion would be to use a database, as said in this answer. Using txt files has a big disadvantage over DBs, mostly because of the lack of indexes and the other points mentioned in that answer.
So what I would do is create a database (there are lots of good ones out there, such as MySQL, PostgreSQL, etc.), create the tables that are needed, and read the file afterward, inserting each line into the DB. Then use the DB to search and update the records.
Maybe this is not an answer to your concrete question of
The goal is to complete this processing in the least amount of time possible.
But this would be a worthy suggestion. Good luck.
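If you go down that road, a rough sketch of loading File A with plain JDBC could look like this (the connection URL, table name and column names are all assumptions):

import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class LoadFileA {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                     "jdbc:postgresql://localhost/records", "user", "pass");
             BufferedReader in = new BufferedReader(new FileReader("fileA.txt"))) {
            con.createStatement().execute(
                    "CREATE TABLE IF NOT EXISTS file_a (id BIGINT, val VARCHAR(10))");
            PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO file_a (id, val) VALUES (?, ?)");
            String line;
            int n = 0;
            while ((line = in.readLine()) != null) {
                String[] parts = line.split(",");
                ps.setLong(1, Long.parseLong(parts[0]));
                ps.setString(2, parts[1]);
                ps.addBatch();
                if (++n % 10_000 == 0) {
                    ps.executeBatch(); // flush in chunks so the batch doesn't grow unbounded
                }
            }
            ps.executeBatch();
            // An index on id is what makes the lookups from File B fast.
            con.createStatement().execute("CREATE INDEX idx_file_a_id ON file_a (id)");
        }
    }
}

Once the data is loaded, the matching and updating can be expressed as set-based SQL (e.g. an UPDATE with a join on the IDs from File B) instead of a hand-written merge loop.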
With this approach I am able to process 50M records in 150 seconds on an i3 with 4 GB RAM and an SSD hard drive.
private static void readAndWriteFile() {
    System.out.println("Read Write File Started.");
    long time = System.currentTimeMillis();
    try (
        BufferedReader in = new BufferedReader(new FileReader(Commons.ROOT_PATH + "input.txt"));
        BufferedReader search = new BufferedReader(new FileReader(Commons.ROOT_PATH + "search.txt"));
        FileWriter myWriter = new FileWriter(Commons.ROOT_PATH + "output.txt");
    ) {
        String inLine = in.readLine();
        String searchLine = search.readLine();
        boolean isLoopEnd = true;
        while (isLoopEnd) {
            if (searchLine == null || inLine == null) {
                isLoopEnd = false;
                break;
            }
            // Since both files are already sorted, compare the numeric keys
            // and advance whichever pointer is behind (a classic merge join).
            long searchInt = Long.parseLong(searchLine.substring(0, 10));
            long inInt = Long.parseLong(inLine.substring(0, 10));
            if (searchLine.substring(0, 10).equalsIgnoreCase(inLine.substring(0, 10))) {
                System.out.println("Record Found - " + inLine.substring(0, 10) + " | " + searchLine.substring(0, 10));
                myWriter.write(inLine + System.lineSeparator());
            }
            // Which pointer to move..
            if (searchInt < inInt) {
                searchLine = search.readLine();
            } else {
                inLine = in.readLine();
            }
        }
        in.close();
        myWriter.close();
        search.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
    System.out.println("Read and Write to File done in - " + (System.currentTimeMillis() - time));
}
Currently I am trying to update an old project.
The problem is that in one of my sources (BungeeCord) they have changed two fields (see the enum case "PROTOCOL") from public final to just final. To make the project work again I need to access these two fields.
Because of this I try to "inject" into the project. This works great, so the modifier changes, but I am currently not able to save the change back to the jar file, and this is necessary.
The process of saving works perfectly for "USERCONNECTION" (see the enum below). In that case I edit a class modifier.
If you need any more information please let me know.
When the "injection" (enum: PROTOCOL) is done and I check the modifier type of these fields, I see that there have been some changes.
But when I restart the system and check the field modifiers again before the "injection", it is as if there were no changes.
public static int inject(InjectionType type) {
    try {
        System.out.println("Starting injection.");
        System.out.println(type.getInfo());
        ClassPool cp = ClassPool.getDefault();
        CtClass clazz = cp.getCtClass(type.getClazz().getName());
        switch (type) {
            case USERCONNECTION:
                int modifier = UserConnection.class.getModifiers();
                if (!Modifier.isFinal(modifier) && Modifier.isPublic(modifier)) {
                    return -1;
                }
                clazz.setModifiers(Modifier.PUBLIC);
                break;
            case PROTOCOL:
                CtField field = clazz.getField("TO_CLIENT");
                field.setModifiers(Modifier.PUBLIC + Modifier.FINAL);
                field = clazz.getField("TO_SERVER");
                field.setModifiers(Modifier.PUBLIC + Modifier.FINAL);
                break;
            default:
                return -1; // no data
        }
        ByteArrayOutputStream bout;
        DataOutputStream out = new DataOutputStream(bout = new ByteArrayOutputStream());
        clazz.getClassFile().write(out);
        InputStream[] streams = { new ByteArrayInputStream(bout.toByteArray()) };
        File bungee_file = new File(BungeeCord.class.getProtectionDomain().getCodeSource().getLocation().toURI().getPath());
        updateZipFile(bungee_file, type, streams);
        return 1;
    } catch (Exception e) {
        e.printStackTrace();
    }
    return 0;
}
private static void updateZipFile(File zipFile, InjectionType type, InputStream[] ins) throws IOException {
    File tempFile = File.createTempFile(zipFile.getName(), null);
    if (!tempFile.delete()) {
        System.out.println("Warn: Can't delete temp file.");
    }
    if (tempFile.exists()) {
        System.out.println("Warn: Temp target file already exists!");
    }
    if (!zipFile.exists()) {
        throw new RuntimeException("Could not rename the file " + zipFile.getAbsolutePath() + " to " + tempFile.getAbsolutePath() + " (Src. not found!)");
    }
    int renameOk = zipFile.renameTo(tempFile) ? 1 : 0;
    if (renameOk == 0) {
        tempFile = new File(zipFile.toString() + ".copy");
        com.google.common.io.Files.copy(zipFile, tempFile);
        renameOk = 2;
        if (zipFile.delete()) {
            System.out.println("Warn: Src file can't be deleted.");
            renameOk = -1;
        }
    }
    if (renameOk == 0) {
        throw new RuntimeException("Could not rename the file " + zipFile.getAbsolutePath() + " to " + tempFile.getAbsolutePath() + " (Directory read only? (Temp:[R:" + (tempFile.canRead() ? 1 : 0) + ";W:" + (tempFile.canWrite() ? 1 : 0) + ",D:" + (tempFile.canExecute() ? 1 : 0) + "],Src:[R:" + (zipFile.canRead() ? 1 : 0) + ";W:" + (zipFile.canWrite() ? 1 : 0) + ",D:" + (zipFile.canExecute() ? 1 : 0) + "]))");
    }
    if (renameOk != 1) {
        System.out.println("Warn: Can't create temp file. Using .copy file");
    }
    byte[] buf = new byte[Configuration.getLoadingBufferSize()];
    System.out.println("Buffer size: " + buf.length);
    ZipInputStream zin = new ZipInputStream(new FileInputStream(tempFile));
    ZipOutputStream out = new ZipOutputStream(new FileOutputStream(zipFile));
    ZipEntry entry = zin.getNextEntry();
    while (entry != null) {
        String path_name = entry.getName().replaceAll("/", "\\.");
        boolean notReplace = true;
        for (String f : type.getNames()) {
            if (f.equals(path_name)) {
                notReplace = false;
                break;
            }
        }
        if (notReplace) {
            out.putNextEntry(new ZipEntry(entry.getName()));
            int len;
            while ((len = zin.read(buf)) > 0) {
                out.write(buf, 0, len);
            }
        }
        entry = zin.getNextEntry();
    }
    zin.close();
    for (int i = 0; i < type.getNames().length; i++) {
        InputStream in = ins[i];
        int index = type.getNames()[i].lastIndexOf('.');
        out.putNextEntry(new ZipEntry(type.getNames()[i].substring(0, index).replaceAll("\\.", "/") + type.getNames()[i].substring(index)));
        int len;
        while ((len = in.read(buf)) > 0) {
            out.write(buf, 0, len);
        }
        out.closeEntry();
        in.close();
    }
    out.close();
    tempFile.delete();
    if (renameOk == -1) {
        System.exit(-1);
    }
}
@Getter
public enum InjectionType {
    USERCONNECTION(UserConnection.class, new String[] {"net.md_5.bungee.UserConnection.class"}, "Set modifiers for class UserConnection.class to \"public\""),
    PROTOCOL(Protocol.class, new String[] {"net.md_5.bungee.protocol.Protocol"}, "Set modifiers for class Protocol.class to \"public\"");

    private Class<?> clazz;
    private String[] names;
    private String info;

    InjectionType(Class<?> clazz, String[] names, String info) {
        this.clazz = clazz;
        this.names = names;
        this.info = info;
    }
}
When the "injection" (enum: protocol) is done and I check the modifier type of these fileds I see that there have been some changes. But when I restart the system and check the filed modifiers again before the "injection" they are as there were no changes.
What you're trying to do is permanently modify a field's access in a jar file using Java reflection. This cannot work, as reflection modifies things at runtime only:
Reflection is an API which is used to examine or modify the behavior of methods, classes, interfaces at runtime.
Excerpt taken from this page.
What you need to do is physically edit the jar itself if you want the changes to be permanent. I know you said that you are not able to do that, but as far as I know that is the only possible way. The file itself has to be physically changed if you want the changes to stick after the application has terminated and be applied before the program has started.
Read the official documentation about Java reflection here.
However, I don't really understand why it is important that the changes persist after you've restarted the system. The reason you need to change the access is so that you can access and perhaps manipulate the class in some way at runtime. What you are doing is correct; one of the more important aspects of reflection is to manipulate data without actually having to modify the physical files themselves and end up shipping custom distributions.
EDIT: Read this question, its comments and the accepted answer. They pretty much say the same thing: you can't edit a jar file that is currently being used by the JVM, as it's locked in a read-only state.
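To illustrate why runtime-only changes cannot survive a restart, here is a tiny sketch (the class and field names are mine): setAccessible() affects only the Field object inside the running JVM, while the class file on disk stays untouched.

import java.lang.reflect.Field;

public class ReflectionIsRuntimeOnly {

    static class Holder {
        private static String SECRET = "hidden"; // private in the class file
    }

    public static void main(String[] args) throws Exception {
        Field f = Holder.class.getDeclaredField("SECRET");
        f.setAccessible(true);            // lifts the access check for this Field object only
        System.out.println(f.get(null));  // prints "hidden" while this JVM runs
        // Nothing in Holder.class on disk was modified: inspecting it with javap
        // (or rerunning without setAccessible) shows the field is still private.
    }
}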
In my Java application I am using a text file (size ~300 MB) which is kept in HDFS. Each line of the file contains a string and an integer ID separated by a comma. I am reading the file line by line and creating HashMaps (String, ID) from it.
The file looks like this:
String1,Integer1
String2,Integer2
...
Now, I am currently reading the file from HDFS directly, using the Apache Hadoop Configuration and FileSystem objects:
Configuration conf = new Configuration();
conf.addResource("core-site.xml");
conf.addResource("hdfs-site.xml");
conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
String path = "<some location in HDFS>";
FileSystem fs = FileSystem.get(URI.create(path), conf);
InputStream in = fs.open(new Path(path));
The input Stream "in" is passed to another function called read(InputStream in) for reading the file.
public void init(InputStream is) throws Exception {
    ConcurrentMap<String, String> pageToId = new ConcurrentHashMap<>();
    ConcurrentMap<String, String> idToPage = new ConcurrentHashMap<>();
    logger.info("Free memory: " + Runtime.getRuntime().freeMemory());
    InputStreamReader stream = new InputStreamReader(is, StandardCharsets.UTF_8);
    BufferedReader reader = new BufferedReader(stream);
    List<String> pageIdMappingColumns = ServerProperties.getInstance().getIdMappingColumns();
    String line;
    int line_no = 0;
    while (true) {
        try {
            line = reader.readLine();
            if (line == null) {
                break;
            }
            line_no++;
            //System.out.println("Free memory: " + Runtime.getRuntime().freeMemory());
            String[] values = line.split(COMMA);
            //System.out.println("Free memory: " + Runtime.getRuntime().freeMemory());
            if (values.length < pageIdMappingColumns.size()) {
                throw new RuntimeException(PAGEMAPPER_INVALID_MAPPING_FILE_FORMAT);
            }
            String id = EMPTY_STR;
            String page = EMPTY_STR;
            for (int i = 0; i < values.length; i++) {
                String s = values[i].trim();
                if (PAGEID.equals(pageIdMappingColumns.get(i))) {
                    id = s;
                    continue;
                }
                if (PAGENAME.equals(pageIdMappingColumns.get(i))) {
                    page = s;
                }
            }
            pageToId.put(page, id);
            idToPage.put(id, page);
        } catch (Exception e) {
            logger.error(PAGEMAPPER_INIT + e.toString() + " on line " + line_no);
        }
    }
    logger.info("Free memory: " + Runtime.getRuntime().freeMemory());
    logger.info("Total number of lines: " + line_no);
    reader.close();
    ConcurrentMap<String, String> oldPageToId = pageToIdRef.get();
    ConcurrentMap<String, String> oldIdToPage = idToPageRef.get();
    idToPage.put(MINUS_1, START);
    idToPage.put(MINUS_2, EXIT);
    pageToId.put(START, MINUS_1);
    pageToId.put(EXIT, MINUS_2);
    /* Update the AtomicReference hashmaps in memory in two conditions:
       1. If there was no map in memory (first iteration)
       2. If the number of page-name/page-id pairs in the mappings.txt file is greater than in the previous iteration
    */
    if (oldPageToId == null || oldIdToPage != null && oldIdToPage.size() <= idToPage.size() && oldPageToId.size() <= pageToId.size()) {
        idToPageRef.set(idToPage);
        pageToIdRef.set(pageToId);
        logger.info(PAGEMAPPER_INIT + " " + PAGEMAPPER_UPDATE_MAPPING);
    } else {
        logger.info(PAGEMAPPER_INIT + " " + PAGEMAPPER_LOG_MSZ);
    }
}
I am closing the stream when the work is done like this:
IOUtils.closeQuietly(is);
I am executing the above code every hour, since the file changes in HDFS during that time. Now I am getting java.lang.OutOfMemoryError: Java heap space.
My question is: Is it better to copy the file to local disk and then use it, rather than accessing it directly from HDFS, as far as memory requirements are concerned?
Note: The file has > 3200000 lines.
A stream is always the way to go.
You're getting OutOfMemoryError because you never close your stream, hence a memory leak.
Either close your stream manually or use try-with-resources.
Edit
pageToId.put(page, id);
idToPage.put(id, page);
You're storing at least 2x your file size in memory, which is roughly 600 MB.
After that, you assign that value to some ref variable:
idToPageRef.set(idToPage);
pageToIdRef.set(pageToId);
I guess that you're still holding a reference to the old ref data somewhere, hence the internal map data is not released.
You also have a resource leak at:
throw new RuntimeException(PAGEMAPPER_INVALID_MAPPING_FILE_FORMAT);
You should use try-with-resources or manually close your stream in a finally block.
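For example, the reading part of init() could be wrapped like this; a minimal sketch reusing the names from the question, with the per-line parsing elided:

public void init(InputStream is) throws Exception {
    // try-with-resources closes the reader (and the wrapped HDFS stream) even if
    // parsing throws, so the connection and buffers are always released.
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(is, StandardCharsets.UTF_8))) {
        String line;
        while ((line = reader.readLine()) != null) {
            // ... split the line and fill pageToId / idToPage as before ...
        }
    }
}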
In Java, is there any functionality equivalent to the C# code below for getting a stream's length?
StreamWriter.BaseStream.Length
I have searched on the internet and also checked the properties of BufferedWriter, OutputStreamWriter and FileOutputStream, but I did not find anything. Any information is appreciated.
Thank you so much.
An OutputStream ultimately has the length of the content which YOU write into the stream, so you can keep track of it yourself.
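If you want that number without going to the file system, one option is a small counting wrapper around the stream you write to. This is a sketch of a hypothetical helper class, not part of the JDK (Apache Commons IO ships a similar CountingOutputStream):

import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Wraps any OutputStream and counts the bytes written through it.
class CountingOutputStream extends FilterOutputStream {
    private long count;

    CountingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        count++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        count += len;
    }

    long getCount() {
        return count;
    }
}

You would then build your writer on top of it, e.g. new BufferedWriter(new OutputStreamWriter(counting, StandardCharsets.UTF_8)), and read getCount() after flushing.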
In the end I had to use File.length(), as I found no way to get the length from a stream like in C#.
Here is how it was done:
Keep track (using a flag, etc.) of which file the stream is associated with.
When you need the length of the stream, just get File.length() for the file you associated with the stream, like below.
The reason I needed to check the length was to prevent writing more than a defined maximum length to the file.
String sFilePath = this.m_sLogFolderPath + File.separator;
if (this.m_File2Active == true) {
    sFilePath += Def.DEF_FILE2;
} else {
    sFilePath += Def.DEF_FILE1;
}
File file = new File(sFilePath);
if (file.length() > this.m_lMaxSize) {
    this.m_bwWriter.flush();
    this.m_bwWriter.close();
    this.m_bwWriter = null;
    sFilePath = this.m_sLogFolderPath + File.separator;
    if (this.m_File2Active == true) {
        sFilePath += Def.DEF_FILE1;
        this.m_File2Active = false;
    } else {
        sFilePath += Def.DEF_FILE2;
        this.m_File2Active = true;
    }
    this.m_bwWriter = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(sFilePath, true), Def.DEF_ENCODING_UTF8));
}
Could you please suggest how to deal with these situations? I understand that in the second example it would very rarely happen on Unix, wouldn't it, if the access rights are all right; the file just wouldn't even be created. I don't understand why the IOException is there: either the file is created or it is not, so why do we have to bother with an IOException?
But in the first example there will be a corrupted zombie file. Now, if you tell the user to upload it again, the same thing may happen; and if you can't do that, and the InputStream has no marker, you lose your data? I really don't like how this is done in Java; I hope the new IO in Java 7 is better.
Is it usual to delete it?
public void inputStreamToFile(InputStream in, File file) throws SystemException {
    OutputStream out;
    try {
        out = new FileOutputStream(file);
    } catch (FileNotFoundException e) {
        throw new SystemException("Temporary file created : " + file.getAbsolutePath() + " but not found to be populated", e);
    }
    boolean fileCorrupted = false;
    int read = 0;
    byte[] bytes = new byte[1024];
    try {
        while ((read = in.read(bytes)) != -1) {
            out.write(bytes, 0, read);
        }
    } catch (IOException e) {
        fileCorrupted = true;
        logger.fatal("IO went wrong for file : " + file.getAbsolutePath(), e);
    } finally {
        IOUtils.closeQuietly(in);
        IOUtils.closeQuietly(out);
        if (fileCorrupted) {
            ???
        }
    }
}
public File createTempFile(String fileId, String ext, String root) throws SystemException {
    String fileName = fileId + "." + ext;
    File dir = new File(root);
    if (!dir.exists()) {
        if (!dir.mkdirs())
            throw new SystemException("Directory " + dir.getAbsolutePath() + " already exists most probably");
    }
    File file = new File(dir, fileName);
    boolean fileCreated = false;
    boolean fileCorrupted = false;
    try {
        fileCreated = file.createNewFile();
    } catch (IOException e) {
        fileCorrupted = true;
        logger.error("Temp file " + file.getAbsolutePath() + " creation fail", e);
    } finally {
        if (fileCreated)
            return file;
        else if (!fileCreated && !fileCorrupted)
            throw new SystemException("File " + file.getAbsolutePath() + " already exists most probably");
        else if (!fileCreated && fileCorrupted) {
        }
    }
}
I really don't like how this is done in Java, I hope the new IO in Java 7 is better
I'm not sure how Java is different from any other programming language/environment in the way you are using it:
a client sends some data to you over the wire
as you read it, you write it to a local file
Regardless of the language/tools/environment, it's possible for the connection to be interrupted or lost, for the client to go away, for the disk to die, or for any other error to occur. I/O errors can occur in any and all environments.
What you can do is highly dependent on the situation and the error that occurred. For example, is the data structured in some way such that you could ask the user to resume uploading from record 1000? There is no single solution that fits all here.
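For the first example, if a partially written file is of no use to you, one reasonable option (a sketch reusing the logger, IOUtils and SystemException from your code; not the only answer) is to delete the zombie file and rethrow, so the caller can decide whether to ask the client to resend:

public void inputStreamToFile(InputStream in, File file) throws SystemException {
    boolean fileCorrupted = false;
    OutputStream out = null;
    try {
        out = new FileOutputStream(file); // FileNotFoundException is an IOException, handled below
        byte[] bytes = new byte[1024];
        int read;
        while ((read = in.read(bytes)) != -1) {
            out.write(bytes, 0, read);
        }
    } catch (IOException e) {
        fileCorrupted = true;
        logger.fatal("IO went wrong for file : " + file.getAbsolutePath(), e);
    } finally {
        IOUtils.closeQuietly(in);
        IOUtils.closeQuietly(out);
        if (fileCorrupted) {
            // The partial file is useless, so remove it before reporting the failure.
            if (!file.delete()) {
                logger.warn("Could not delete corrupted file " + file.getAbsolutePath());
            }
            throw new SystemException("Transfer to " + file.getAbsolutePath() + " was interrupted");
        }
    }
}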