This question came up in Spring class, which has some rather long class names. Is there a limit in the language for class name lengths?
The Java Language Specification states that identifiers are unlimited in length.
In practice though, the filesystem will limit the length of the resulting file name.
65535 characters, I believe (strictly speaking, 65535 bytes). From the Java Virtual Machine Specification:
The length of field and method names,
field and method descriptors, and
other constant string values is
limited to 65535 characters by the
16-bit unsigned length item of the
CONSTANT_Utf8_info structure (§4.4.7).
Note that the limit is on the number
of bytes in the encoding and not on
the number of encoded characters.
UTF-8 encodes some characters using
two or three bytes. Thus, strings
incorporating multibyte characters are
further constrained.
here:
https://docs.oracle.com/javase/specs/jvms/se6/html/ClassFile.doc.html#88659
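To make the bytes-versus-characters point concrete, here is a minimal sketch that counts how many bytes a name would occupy in a CONSTANT_Utf8_info entry. The class and method names are made up for illustration. It assumes the class-file "modified UTF-8" rules: 1 byte for U+0001 to U+007F, 2 bytes for U+0000 and U+0080 to U+07FF, and 3 bytes for everything else, with each half of a surrogate pair stored separately.

```java
final class ConstantUtf8Length {
    // Largest value the 16-bit unsigned length item can hold.
    static final int MAX_BYTES = 65535;

    // Number of bytes needed to store s in the class-file (modified UTF-8) encoding.
    static int encodedLength(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                bytes += 1;
            } else if (c <= 0x07FF) {
                bytes += 2; // U+0000 and U+0080..U+07FF
            } else {
                bytes += 3; // everything else, including each surrogate half
            }
        }
        return bytes;
    }

    // True if the name would fit into a single CONSTANT_Utf8_info entry.
    static boolean fitsInConstantPool(String s) {
        return encodedLength(s) <= MAX_BYTES;
    }

    public static void main(String[] args) {
        String ascii = "AbstractSingletonProxyFactoryBean";
        String accented = "Gr\u00F6\u00DFe"; // two characters outside ASCII
        System.out.println(ascii.length() + " chars, " + encodedLength(ascii) + " bytes");
        System.out.println(accented.length() + " chars, " + encodedLength(accented) + " bytes");
    }
}
```

A name made only of ASCII characters uses one byte per character, so the 65535 limit applies to it directly; names with accented or non-Latin characters hit the limit earlier.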
With JDK 1.5, the practical limit for class names on Windows XP was 255 characters; longer names gave errors in the file system. This was the full name (directory + package + class).
I have not tried JDK 1.6 on Vista or Windows 7; hopefully Sun fixed it to be the NTFS limit of 8000 or so.
No. Java doesn't impose any limit on the class name. But if you are interfacing with other systems (e.g. JNI), it's better to be on the safe side.
The Java Class File Specification states that:
The code array gives the actual bytes of Java Virtual Machine code that implement the method.
When the code array is read into memory on a byte-addressable machine, if the first byte of the array is aligned on a 4-byte boundary, the tableswitch and lookupswitch 32-bit offsets will be 4-byte aligned. (Refer to the descriptions of those instructions for more information on the consequences of code array alignment.)
(https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.7.3)
How would I interpret this statement?
The Wikipedia page for those two instructions mentions this: (https://en.wikipedia.org/wiki/Java_bytecode_instruction_listings)
Tableswitch additional bytes:
4+: <0–3 bytes padding>, defaultbyte1, defaultbyte2, defaultbyte3, defaultbyte4, lowbyte1, lowbyte2, lowbyte3, lowbyte4, highbyte1, highbyte2, highbyte3, highbyte4, jump offsets...
Lookupswitch additional bytes:
4+: <0–3 bytes padding>, defaultbyte1, defaultbyte2, defaultbyte3, defaultbyte4, npairs1, npairs2, npairs3, npairs4, match-offset pairs...
I think the <0–3 bytes padding> is relevant to the Class File Specification statement, I just don't know how exactly.
The tableswitch and lookupswitch instructions are defined to have between 0 and 3 bytes of padding, depending on their offset within the method's bytecode. The actual definition of the padding can be found in section 6.5 where the formats of each instruction are listed.
Immediately after the lookupswitch opcode, between zero and three
bytes must act as padding, such that defaultbyte1 begins at an address
that is a multiple of four bytes from the start of the current method
(the opcode of its first instruction).
The statement you highlighted explains the motivation for this design choice, which might otherwise seem odd or pointless.
This allows for a more efficient implementation of a Java interpreter, because if the code is loaded at a 4-byte aligned address, the offsets and keys in the switch tables can be read with aligned access.
Of course, it isn't that important nowadays, because we have fancy JITs, but back in the early days of Java, the JVM probably was implemented as a naive interpreter where this would make a big difference in performance.
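The padding rule can be sketched in a few lines. This is only an illustration with hypothetical names: given the offset of the switch opcode within the method's code array, it computes how many padding bytes are needed so that defaultbyte1 lands on a multiple of four.

```java
final class SwitchPadding {
    // opcodeOffset: index of the tableswitch/lookupswitch opcode in the code array.
    // Returns the number of padding bytes (0 to 3) that follow the opcode.
    static int paddingBytes(int opcodeOffset) {
        int firstByteAfterOpcode = opcodeOffset + 1;
        return (4 - (firstByteAfterOpcode % 4)) % 4;
    }

    // Offset of defaultbyte1, which is always a multiple of four.
    static int defaultByteOffset(int opcodeOffset) {
        return opcodeOffset + 1 + paddingBytes(opcodeOffset);
    }

    public static void main(String[] args) {
        for (int pc = 0; pc < 8; pc++) {
            System.out.println("opcode at " + pc
                    + ": padding " + paddingBytes(pc)
                    + ", defaultbyte1 at " + defaultByteOffset(pc));
        }
    }
}
```

Because these offsets are relative to the start of the method, the operands are only aligned in memory if the code array itself is loaded at a 4-byte aligned address, which is exactly the condition the quoted sentence describes.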
I know that a method cannot be larger than 64 KB with Java. The limitation causes us problems with generated code from a JavaCC grammar. We had problems with Java 6 and were able to fix this by changing the grammar. Has the limit been changed for Java 7 or is it planned for Java 8?
Just to make it clear. I don't need a method larger than 64 KB by myself. But I wrote a grammar which compiles to a very large method.
According to JVMS7:
The fact that end_pc is exclusive is a historical mistake in the
design of the Java virtual machine: if the Java virtual machine code
for a method is exactly 65535 bytes long and ends with an instruction
that is 1 byte long, then that instruction cannot be protected by an
exception handler. A compiler writer can work around this bug by
limiting the maximum size of the generated Java virtual machine code
for any method, instance initialization method, or static initializer
(the size of any code array) to 65534 bytes.
But this is about Java 7. There are no final specs for Java 8 yet, so nobody (except its developers) could answer this question.
UPD (2015-04-06): According to JVMS8, this is also true for Java 8.
Good question. As always, we should go to the source to find the answer ("The Java® Virtual Machine Specification"). The section does not state a limit explicitly (as the Java 6 VM spec did), only somewhat circumspectly:
The greatest number of local variables in the local variables array of a frame created upon invocation of a method (§2.6) is limited to 65535 by the size of the max_locals item of the Code attribute (§4.7.3) giving the code of the method, and by the 16-bit local variable indexing of the Java Virtual Machine instruction set.
It has not changed. The limit of code in methods is still 64 KB in both Java 7 and Java 8.
References:
From the Java 7 Virtual Machine Specification (4.9.1 Static Constraints):
The static constraints on the Java Virtual Machine code in a class file specify how
Java Virtual Machine instructions must be laid out in the code array and what the
operands of individual instructions must be.
The static constraints on the instructions in the code array are as follows:
The code array must not be empty, so the code_length item cannot have the
value 0.
The value of the code_length item must be less than 65536.
From the Java 8 Virtual Machine Specification (4.7.3 The Code Attribute):
The value of the code_length item gives the number of bytes in the code array
for this method.
The value of code_length must be greater than zero (as the code array must
not be empty) and less than 65536.
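The two numbers from the specification can be written down as a small check. This is a sketch with hypothetical names, not something a compiler exposes: 65535 is the largest code_length the class-file format accepts, and 65534 is the more conservative bound the specification suggests to compiler writers because of the exclusive end_pc.

```java
final class MethodSizeLimits {
    // code_length must be greater than zero and less than 65536.
    static final int MAX_CODE_LENGTH = 65535;

    // Bound suggested in the JVMS note on end_pc, so that the last
    // instruction can still be covered by an exception handler.
    static final int HANDLER_SAFE_CODE_LENGTH = 65534;

    static boolean isValidCodeLength(long codeLength) {
        return codeLength > 0 && codeLength <= MAX_CODE_LENGTH;
    }

    static boolean isHandlerSafeCodeLength(long codeLength) {
        return codeLength > 0 && codeLength <= HANDLER_SAFE_CODE_LENGTH;
    }

    public static void main(String[] args) {
        long[] samples = {0, 1, 65534, 65535, 65536};
        for (long n : samples) {
            System.out.println(n + ": valid=" + isValidCodeLength(n)
                    + ", handlerSafe=" + isHandlerSafeCodeLength(n));
        }
    }
}
```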
Andremoniy has already answered the Java 7 part of this question, but at that time it was too soon to say anything about Java 8, so I am completing the answer to cover that part.
Quoting from the JVMS:
The fact that end_pc is exclusive is a historical mistake in the design of the Java Virtual Machine: if the Java Virtual Machine code for a method is exactly 65535 bytes long and ends with an instruction that is 1 byte long, then that instruction cannot be protected by an exception handler. A compiler writer can work around this bug by limiting the maximum size of the generated Java Virtual Machine code for any method, instance initialization method, or static initializer (the size of any code array) to 65534 bytes.
As you can see, this historical problem has not been remedied, at least in this version (Java 8).
As a workaround, and if you have access to the parser's code, you could modify it to work within whatever limits are imposed by the JVM compiler ...
(Assuming it doesn't take forever to find the portions of the parser code to modify)
As far as I know, when the JRE executes a Java application,
a string is represented internally as a UCS-2 byte array.
On Wikipedia, the following statement can be found:
Java originally used UCS-2, and added UTF-16 supplementary character support in J2SE 5.0.
With the new release of Java (Java 7),
what is its internal character encoding?
Is there any possibility that Java will start to use UCS-4 internally?
Java 7 still uses UTF-16 internally (read the last section of the Charset Javadoc), and it's very unlikely that this will change to UCS-4. I'll give you two reasons for that:
Changing from UCS-2 to UCS-4 would most likely mean changing the char primitive from a 16-bit type to a 32-bit type. Given how highly Sun/Oracle have valued backwards compatibility in the past, a change like this is very unlikely.
A UCS-4 string takes a lot more memory than a UTF-16 encoded string for most use cases.
Q: As far as I know, when the JRE executes a Java application, the string
will be seen as a (16-bit Unicode) byte array
A: Yes
Q: With the new release of Java (Java 7), what is its
internal character encoding?
A: Same
Q: Is there any possibility that Java start to use UCS-4 internally?
A: I haven't heard anything of the kind
However, you can use "code-points" to implement UTF-32 characters in Java 5 and higher:
http://www.ibm.com/developerworks/java/library/j-unicode/
http://jcp.org/en/jsr/detail?id=204
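As a small illustration of the code-point API mentioned above, the following sketch builds a string containing one supplementary character (U+1F600, chosen arbitrarily and written as a surrogate pair) and compares the number of char units with the number of code points. The class name is made up for the example.

```java
final class CodePointDemo {
    // Number of 16-bit char units in the string (what length() reports).
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points, counting a surrogate pair as one.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // 'a' followed by U+1F600, encoded in UTF-16 as a surrogate pair.
        String s = "a\uD83D\uDE00";

        System.out.println("char units:  " + charCount(s));
        System.out.println("code points: " + codePointCount(s));

        // Walk the string one code point at a time.
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase());
            i += Character.charCount(cp); // 1 for BMP characters, 2 for supplementary ones
        }
    }
}
```

The string is still stored as UTF-16; the code-point methods only give a 32-bit view of it when one is needed.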
How can null byte injection be done on a Java webapp, or rather, how does one protect against it?
Should I look at each byte of the request parameter and check whether its byte value is 0? I can't imagine a 0 byte sneaking into a request parameter... can it?
My main aim is to make sure the filename used for saving the file is safe enough. And for now, I am not looking for answers that recommend (for example) replacing ALL non-word characters with an underscore.
Allowing the user to store files with arbitrary names is dangerous. What happens if the user provides "../../../WINDOWS/explorer.exe"? You should restrict filenames to only contain characters known to be harmless.
'\0' is not known to be harmless. As far as Java is concerned, '\0' is a character like any other. However, the operating system is likely to interpret '\0' as the end of a string. If a string is passed from Java to the operating system, that different interpretation could result in exploitable bugs. Consider:
    if (filename.endsWith(".txt")) {
        store(filename, data);
    }
where filename is "C:\Windows\explorer.exe\0.txt", which ends with ".txt" to Java, but with ".exe" to the operating system.
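One way to guard against both problems is to reject suspicious names before they ever reach the file system. The following is a minimal sketch under stated assumptions, not a complete defense: the class and method names are hypothetical, and the rule set (no NUL character, no path separators, no "." or "..") is just one possible policy.

```java
final class FileNameCheck {
    // Returns true only for a plain file name that contains no NUL
    // character and cannot navigate to another directory.
    static boolean isSafeFileName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        if (name.indexOf('\0') >= 0) {
            return false; // the OS may treat NUL as the end of the string
        }
        if (name.indexOf('/') >= 0 || name.indexOf('\\') >= 0) {
            return false; // no directory components allowed
        }
        return !name.equals(".") && !name.equals("..");
    }

    public static void main(String[] args) {
        System.out.println(isSafeFileName("report.txt"));
        System.out.println(isSafeFileName("explorer.exe\0.txt"));
        System.out.println(isSafeFileName("../../../WINDOWS/explorer.exe"));
    }
}
```

A check like this would run before the endsWith(".txt") test, so that a name with an embedded NUL is rejected outright instead of being handed to the operating system.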
I'm not sure why you're concerned with null byte injection. Java isn't like C/C++, where strings are null-terminated character arrays.
You ought to bind and validate parameters and values coming in from the web tier. How do you define "safe enough"?
You have 2 choices:
1. Scan the string (convert it to a char array first) for null bytes.
2. Upgrade to Java 8 or Java 7u40 and you are protected. (Yes, I tested it, and it works!)
In May 2013 Oracle fixed the problem: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8014846
Null byte injection in filenames was fixed in Java 7 update 40 (released around Sept. 2013). So it's been fixed for a while now, but it WAS a problem for over a decade, and it was a NASTY vulnerability in Java. The fix is documented here: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8014846
-Dave Wichers
I want to print the number of bytes of free space on a partition using File.getUsableSpace(), but it returns a different value than the df command (on Linux) does.
The difference is not large, but it amounts to several megabytes.
What could be the cause?
Is the file that your File instance represents on the same partition as the one you're running df on?
From the docs: "Returns the number of bytes available to this virtual machine on the partition named by this abstract pathname"
Emphasis is mine.
getUsableSpace() only gives the space available to this virtual machine. Try getFreeSpace(), which returns the free (unallocated) space rather than the space available for use. This works as of Java 1.6; for more information, look at the File class documentation.
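To compare the numbers side by side with df, a small sketch like the following may help. The class name is made up, and the path defaults to the current directory unless one is passed as an argument; make sure it lies on the same partition you check with df.

```java
import java.io.File;

final class DiskSpaceReport {
    // Returns { total, free, usable } in bytes for the partition holding path.
    static long[] spaceInBytes(File path) {
        return new long[] {
            path.getTotalSpace(),  // size of the partition
            path.getFreeSpace(),   // unallocated bytes
            path.getUsableSpace()  // bytes available to this virtual machine
        };
    }

    public static void main(String[] args) {
        File path = new File(args.length > 0 ? args[0] : ".");
        long[] space = spaceInBytes(path);
        System.out.println("total:  " + space[0]);
        System.out.println("free:   " + space[1]);
        System.out.println("usable: " + space[2]);
    }
}
```

On many Linux file systems the free and usable values differ because some blocks are reserved for the superuser, and df can also report in different units depending on its options, so small mismatches are not unusual.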