Binary files differ but not to JVM?

I'm having an issue where org.apache.commons.io.FileUtils.copyFile(File, File) is producing slightly different files. When I compare these files with bsdiff or in an editor, I can tell they're different: certain bytes are being copied as question marks. For example, 0200 (octal) is being copied as ? (077 octal).
So, I create a test case to include in a bug report. I make a copy of the executable and then compare the two using FileUtils.checksumCRC32(File). Unexpectedly, the files have the same checksum. I then compare them by iterating through a FileInputStream of each file. This also indicates that the files are identical.
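In outline, the comparison test looked something like this (a minimal sketch, assuming commons-io is on the classpath; the file names are placeholders):

import java.io.File;
import org.apache.commons.io.FileUtils;

public class CompareCopies {
    public static void main(String[] args) throws Exception {
        File original = new File("app.exe");   // placeholder paths
        File copy = new File("app-copy.exe");
        FileUtils.copyFile(original, copy);

        // Both checks report the files as identical.
        System.out.println("CRC32 match: "
                + (FileUtils.checksumCRC32(original) == FileUtils.checksumCRC32(copy)));
        System.out.println("byte-for-byte match: "
                + FileUtils.contentEquals(original, copy));
    }
}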
The files certainly differ: one runs, the other doesn't, and bsdiff produces a diff of the two files. I can tell that certain bytes are being copied wrong just by inspecting the files by eye.
However, to the JVM these files are the same. Any ideas why I'm observing this behavior?
System info:
Windows 7, 64 bit; JVM 1.6.0_22, 32 bit

Eh, sorry everyone. Maven was 'filtering' the executable, which changed its encoding before copying it to Maven's 'target' directory. FileUtils was then correctly copying the already-mangled executable from 'target' to the destination. I was comparing the version in my source directory to the one at the destination.
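For anyone else who hits this: the usual fix is to keep binaries out of Maven's resource filtering. A minimal pom.xml sketch, assuming the executable lives under src/main/resources (the directory and file pattern here are placeholders):

<build>
  <resources>
    <resource>
      <directory>src/main/resources</directory>
      <filtering>true</filtering>
      <excludes>
        <exclude>**/*.exe</exclude> <!-- keep binaries out of filtering -->
      </excludes>
    </resource>
    <resource>
      <directory>src/main/resources</directory>
      <filtering>false</filtering>
      <includes>
        <include>**/*.exe</include> <!-- copy them verbatim instead -->
      </includes>
    </resource>
  </resources>
</build>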

This program writes every possible byte value and reads them back in again. If the files were being corrupted, how would Java turn those bytes back into their original values? That is, how could it tell that a 077 byte was originally 0200 and not 077?
// Requires org.apache.commons.io.FileUtils and java.util.Arrays.
byte[] bytes = new byte[256];
for (int i = 0; i < 256; i++) {
    bytes[i] = (byte) i;
}
FileUtils.writeByteArrayToFile(new File("tmp.dat"), bytes);
byte[] bytes2 = FileUtils.readFileToByteArray(new File("tmp.dat"));
System.out.println("equals " + Arrays.equals(bytes, bytes2));
A dump of the file shows:
od -x tmp.dat
0000000 0100 0302 0504 0706 0908 0b0a 0d0c 0f0e
0000020 1110 1312 1514 1716 1918 1b1a 1d1c 1f1e
0000040 2120 2322 2524 2726 2928 2b2a 2d2c 2f2e
0000060 3130 3332 3534 3736 3938 3b3a 3d3c 3f3e
0000100 4140 4342 4544 4746 4948 4b4a 4d4c 4f4e
0000120 5150 5352 5554 5756 5958 5b5a 5d5c 5f5e
0000140 6160 6362 6564 6766 6968 6b6a 6d6c 6f6e
0000160 7170 7372 7574 7776 7978 7b7a 7d7c 7f7e
0000200 8180 8382 8584 8786 8988 8b8a 8d8c 8f8e
0000220 9190 9392 9594 9796 9998 9b9a 9d9c 9f9e
0000240 a1a0 a3a2 a5a4 a7a6 a9a8 abaa adac afae
0000260 b1b0 b3b2 b5b4 b7b6 b9b8 bbba bdbc bfbe
0000300 c1c0 c3c2 c5c4 c7c6 c9c8 cbca cdcc cfce
0000320 d1d0 d3d2 d5d4 d7d6 d9d8 dbda dddc dfde
0000340 e1e0 e3e2 e5e4 e7e6 e9e8 ebea edec efee
0000360 f1f0 f3f2 f5f4 f7f6 f9f8 fbfa fdfc fffe

Related

JDK11 getFreeSpace and getTotalSpace from File is not matching df

I am seeing df -h give output like the following:
root@vrni-platform:/var/lib# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg-var 110G 94G 11G 91% /var
root@vrni-platform:/var/lib# df -k
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/vg-var 114756168 98318504 10585300 91% /var
But if I do the same from Java like below:
final File dataPath = new File("/var");
final long totalBytes = dataPath.getTotalSpace();
final long usedBytes = totalBytes - dataPath.getFreeSpace();
System.out.printf("Disk utilization: %.2f, Total bytes: %d, Used Bytes: %d", ((double)usedBytes/totalBytes * 100), totalBytes, usedBytes);```
It prints the following:
Disk utilization: 85.68, Total bytes: 117510316032, Used Bytes: 100678909952
Can someone let me know why there is this discrepancy in disk utilization?
Environment
Ubuntu 18.04
Java - Zulu OpenJDK 11.0.11
As I also mentioned in the comments, the primary reason is that getFreeSpace reports something other than df's 'Avail' or 'Available'. Going by df's '1K-blocks' and 'Used' you also get 85.68% (98318504 / 114756168), while going by '1K-blocks' and 'Available' yields roughly 91% (1 - 10585300 / 114756168 = 90.78%). Also observe how df's 'Used' and 'Available' (and 'Used' and 'Avail') do not add up to '1K-blocks' (or 'Size').
As suggested by user16320675, getUsableSpace might be a better method to use than getFreeSpace. As to the reason for the difference between '1K-blocks' - 'Used' and 'Available' in df (commonly, blocks reserved for root on ext filesystems), it might be better to ask that on https://unix.stackexchange.com/.
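A minimal sketch of that variation on the question's snippet (same /var mount as above; the numbers will of course differ per system):

import java.io.File;

public class DiskUsage {
    public static void main(String[] args) {
        final File dataPath = new File("/var");
        final long totalBytes = dataPath.getTotalSpace();
        // getUsableSpace() excludes space this process cannot actually use
        // (for example, blocks reserved for root), so it tracks df's
        // 'Available' column more closely than getFreeSpace() does.
        final long usedBytes = totalBytes - dataPath.getUsableSpace();
        System.out.printf("Disk utilization: %.2f%%%n",
                (double) usedBytes / totalBytes * 100);
    }
}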

python tarfile.py "file could not be opened successfully"

I have a tarball that I can't open using python:
>>> import tarfile
>>> tarfile.open('/tmp/bad.tar.gz')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "tarfile.py", line 1672, in open
    raise ReadError("file could not be opened successfully")
tarfile.ReadError: file could not be opened successfully
but I'm able to extract the file with no problem on the command line.
$ tar -xzvf /tmp/bad.tar.gz
I've traced the Python tarfile code, and there's a function "nti" that converts header fields from bytes. It gets to this line:
obj.uid = nti(buf[108:116])
and blows up. These bytes (the UID field) are coming through as eight spaces. Not sure where to go from here...
Honestly it looks like the bug is in tarfile.py's nti function:
n = int(nts(s) or "0", 8)
The fall-through logic (or "0") does not kick in because nts(s) returns a string of spaces, which is truthy rather than empty, so int() blows up on it.
I copied tarfile.py from /var/lib/python2.7/ and wrapped that particular line in a try/except, which fixed me up:
try:
    obj.uid = nti(buf[108:116])
except InvalidHeaderError:
    obj.uid = 0
It's a hack solution, though. Really I'd prefer that the Python folks took a look at it and fixed the or "0" fallback logic.
Update
Turns out the tarball was created by the maven-assembly-plugin in a Java 6 project that had just been upgraded to Java 7. The issue was resolved by upgrading the maven-assembly-plugin to 2.5.3.
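If it helps, pinning the plugin version in the POM is the usual way to pick up such a fix. A minimal sketch of the relevant pom.xml fragment (the surrounding <build>/<plugins> section is assumed):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <version>2.5.3</version> <!-- the version that resolved the bad UID headers, per the update above -->
</plugin>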

ENCOG values in output file incorrectly denormalized?

The following was produced using the most recent version of encog-workbench (3.2.0).
I was wondering if this is a bug or if I do not grasp the purpose of the output file.
When I run the sunspot example in the Encog Workbench, without segregation, I expect the output file to contain the fitted values from the model. When I create the validation chart it presents me with the figure found in the tutorial, so this seems correct.
But when I go to the sunspots_output.csv output file I get the following output:
ssn(t-29) ssn(t+1) Output:ssn(t+1)
... first thirty values have output Null ...
-0.600472813 -0.947202522 null
-0.477541371 -1 8.349050184
-0.528762805 -0.976359338 8.334476431
-0.814814815 -0.986603625 8.314903157
-0.817178881 -0.892040977 8.292847897
...
All the output values are around 8 for the rest of the file.
Now when I go back to the validation chart, there is a Data tab, which contains the following columns:
Ideal Result
-0.477541371 -0.52449577
-0.528762805 -0.526507195
-0.814814815 -0.535029097
-0.817178881 -0.653884012
If I denormalize the values in these columns, I get the following.
66.3 60.3414868
59.8 60.08623701
23.5 59.00480764
23.2 43.92211894
These seem to be correct values for the actuals (if I compare them with the original data), and thus these should be the predicted values in the output column.
Is this a bug, or do the values in the Output:ssn(t+1) column mean something else?
I copied these values to Excel and denormalized them by typing in the formula for the (-1, 1) range.
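For reference, a minimal sketch of that (-1, 1) denormalization; the helper name is mine, and the bounds are inferred by working backwards from the values above:

public class Denormalize {
    // Map a value from the normalized range (-1, 1) back to (min, max).
    static double denormalize(double n, double min, double max) {
        return (n + 1.0) / 2.0 * (max - min) + min;
    }

    public static void main(String[] args) {
        // With min = 0 and max = 253.8, -0.52449577 maps back to about 60.34,
        // matching the first Result row above.
        System.out.println(denormalize(-0.52449577, 0.0, 253.8));
    }
}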
I was hoping not to have to do this every time I run an experiment.
I am going to move to code eventually; I just wanted to get some preliminary results with the workbench. Using segregation results in the same problem, by the way.
If it's a bug I'll report it on the Encog website.
Thanks for your answers,
Florian
UPDATE
Hey Jef, I downloaded your zip and reproduced the problem using my workbench.
The problem only arises when I do not segregate, which I do not want to do.
There are some clear differences in the .ega file created by workbench-executable3.2.0.
When I use your .ega file and remove the segregate section, it works.
When I use mine it doesn't. That's why I uploaded my project here:
Maybe you can discover if something new interferes with outputting the correct values.
Hope it helps!
Update 3:
My actual goal is to build a forecaster; the project can be found here:
http://wikisend.com/download/477372/Myproject.rar
I was wondering if you could tell me whether I am doing something obviously wrong, because currently my output is total rubbish.
Thanks again.
I tried to reproduce the error, but when I ran my own sunspots prediction I got predicted values closer to the expected range. You might try running the zipped version of the example, found here:
http://www.heatonresearch.com/dload/encog/example/workbench/SunspotExample.zip
You should be able to run the EGA file and it will produce an output file. Some of my data are as follows:
"year" "mon" "ssn" "dev" "Output:ssn(t+1)"
1948 5 174.0 69.3 156.3030108771
1948 6 167.8 26.6 168.4791037592
1948 7 142.2 28.3 208.1090604116
1948 8 157.9 35.3 186.0234029962
1948 9 143.3 55.9 131.5008296846
1948 10 136.3 44.9 93.0720770479
1948 11 95.8 21.8 89.8269594386
Perhaps compare the EGA file from the above zip to your EGA file; the difference may point to the cause.

Why do I see scrambled output when using JSch?

I am trying to use JSch. I tried the example here.
Although I can connect, the output is weird.
I get the following:
Last login: Thu Jan 31 19:44:25 2013 from 10.2.251.77
[1mcli:~ # [m
And if I do e.g. an ls I get:
[0m[01;34m.InstallAnywhere[0m [00m.bash_history [00m.bash_profile[0m
[01;34mbin[0m [00msles11-patched[0m
[01;34m.kbd[0m [00mindex.html[0m [00mtest.sql[0m
[00m.viminfo[0m [00;31mipvsadm-1.26-1.src.rpm[0m
[m[1mcli:~ # [m
These are the directory contents, but why are they displayed like that?
I am running this from Eclipse and this is what I see in the Eclipse console. If I run it from the Windows CMD it gets stuck.
Update:
I noticed that if I connect to a different Linux machine the output is fine!
Only when I connect to one specific Linux installation do I see these weird characters. Any idea what is causing this?
Update 2:
Following the link from @PeterMmm I did printf "äöü" | xxd. Both the "bad" and the "good" system give:
0000000: e4f6 fc
I also did locale.
In the "bad" case:
# locale
LANG=POSIX
LC_CTYPE=en_US.UTF-8
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
On the good system:
LANG=POSIX
LC_CTYPE=en_US.UTF-8
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
Configuration seems to be the same. So what could be causing this?
Please check: Funny Shell Output: [01;32mtestfile.txt[00m instead of testfile.txt
((ChannelShell) channel).setPtyType("dumb");
Does the trick.
They are escape sequences for the terminal emulation, most likely the color codes that the remote prompt and ls emit when they believe the terminal supports them. I guess that there is no relation to the character encoding.
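For completeness, a minimal sketch of that workaround in context (host, user, and password are placeholders; error handling omitted):

import com.jcraft.jsch.Channel;
import com.jcraft.jsch.ChannelShell;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;

public class DumbPtyDemo {
    public static void main(String[] args) throws Exception {
        JSch jsch = new JSch();
        Session session = jsch.getSession("user", "host", 22); // placeholders
        session.setPassword("password");                       // demo only
        session.setConfig("StrictHostKeyChecking", "no");      // demo only
        session.connect();

        Channel channel = session.openChannel("shell");
        // Advertise a terminal type without color support, so the remote
        // shell and ls do not emit the escape sequences seen above.
        ((ChannelShell) channel).setPtyType("dumb");
        channel.setInputStream(System.in);
        channel.setOutputStream(System.out);
        channel.connect();
    }
}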
Update:
If ChannelShell#setPty(false) is invoked, a pseudo-terminal will not be allocated and the escape sequences will not appear.
Channel channel = session.openChannel("shell");
((ChannelShell) channel).setPty(false); // !!
...
channel.connect();

Java: why are multiple objects showing up with runhprof output?

I am curious about the runhprof output. I am mainly concerned with the memory section. It looks like there are multiple entries for the same class. Why would that be?
Is there a way to get hprof to print how much memory a particular class (that is, all instances of that class) takes up in memory, as a single value per class?
Also, what tools do you use besides 'hat' to analyze the output?
I ran the java command with the JVM arg:
-Xrunhprof:heap=sites,depth=4,format=a,file=prof/hprof_dump.txt
Here is a brief snippet of the output. Some classes (byte[], for example) are listed multiple times, each under a different trace ID.
SITES BEGIN (ordered by live bytes) Tue Jul 28 19:33:41 2009
percent live alloc'ed stack class
rank self accum bytes objs bytes objs trace name
1 29.75% 29.75% 700080 43755 576000016 36000001 307483 java.lang.Double
2 7.13% 36.88% 167840 5245 370432 11576 300993 clojure.lang.PersistentHashMap$LeafNode
3 2.09% 38.98% 49296 2054 60048 2502 301295 clojure.lang.Symbol
4 2.09% 41.07% 49200 3 49200 3 301071 char[]
5 1.33% 42.40% 31344 1306 68088 2837 300998 clojure.lang.PersistentHashMap$BitmapIndexedNode
6 1.10% 43.50% 25800 645 25800 645 301050 clojure.lang.Var
7 1.05% 44.54% 24624 3 24624 3 301069 byte[]
8 0.86% 45.40% 20184 841 49608 2067 301003 clojure.lang.PersistentHashMap$INode[]
9 0.78% 46.18% 18304 572 58720 1835 301308 clojure.lang.PersistentList
10 0.75% 46.93% 17568 549 17568 549 308832 java.lang.String[]
11 0.70% 47.62% 16416 2 16416 2 301036 byte[]
Eclipse Memory Analyzer is excellent. It loads the dump file very quickly, produces lots of nice reports about the heap dump, and lets you query the dump for objects/classes using a SQL-like language (OQL). Love it.
