Related
Why in java (I dont know any other programming languages) can an identifier not start with a number and why are the following declarations also not allowed?
int :b;
int -d;
int e#;
int .f;
int 7g;
Generally you put that kind of limitation in for two reasons:
It's a pain to parse electronically.
It's a pain for humans to parse.
Consider the following code snippet:
int d, -d;
d = 3;
-d = 2;
d = -d;
If -d is a legal identifier, then which value does d have at the end? -3 or 2? It's ambiguous.
Also consider:
int 2e10f, f;
2e10f = 20;
f = 2e10f;
What value does f have at the end? This is also ambiguous.
Also, it's a pain to read either way. If someone declares 2ex10, is that a typo for two million or a variable name?
Making sure that identifiers start with letters means that the only language items they can conflict with are reserved keywords.
That's because section 3.8 of the Java Language Specification says so.
An identifier is an unlimited-length
sequence of Java letters and Java
digits, the first of which must be a
Java letter. An identifier cannot have
the same spelling (Unicode character
sequence) as a keyword (§3.9), boolean
literal (§3.10.3), or the null literal
(§3.10.7).
As for why this decision was made: probably because this simplifies parsing, avoids ambiguous grammar, allows introduction of special syntax in a later version of the language and/or for historical reasons (i.e. because most other languages have the same restrictionsimilar restrictions). Note that your examples example with -d is especially clear:
int -d = 7;
System.out.println("Some number: " + (8 + -d));
Is the minus the first part of an identifier, or the unary minus?
Furthermore, if you had both -d and d as variables, it would be completely ambiguous:
int -d = 7;
int d = 2;
System.out.println("Some number: " + (8 + -d));
Is the result 15 or 6?
I don't know exactly but i think that's because numbers are used to represent literal values, so when the compiler find a token that starts with a number, it knows it is dealing with a literal. if an identifier could start with a number, the compiler would need to use a look ahead to find the next character in the token to find out if it is an identifier or a literal.
Such things aren't allowed in just about any language (I can't think of one right now), mostly to prevent confusion.
Your example -d is an excellent example. How does the compiler know if you meant "the variable named -d" or "the negative of the number in the variable d"? Since it can't tell (or worse yet, it could so you couldn't be sure what would happened when you typed that without reading the rest of the file), it's not allowed.
The example 7g is the same thing. You can specify numbers as certain bases or types by adding letters to the end. The number 8357 is an int in Java, where as 8357L is a long (since there is an 'L' on the end). If variables could start with numbers, there would be cases where you couldn't tell if it was supposed to be a variable name or just a literal.
I would assume the others you listed have similar reasons behind them, some of which may be historical (i.e. C couldn't do it for reason X, and Java is designed to look like C so they kept the rule).
In practice, they are almost never a problem. It's very rare you find a situation where such things are annoying. The one you'll run into the most is variables starting with numbers, but you can always just spell them out (i.e. oneThing, twoThing, threeThing, etc.).
Languages could allow some of these things, but this simplifying assumption makes it easier on the compiler writer, and on you, the programmer, to read the program.
Parsers are (usually) written to break the source text into "tokens" first. An identifier that starts with a number looks like a number. Besides 5e3, is a valid number (5000.0) in some languages.
Meanwhile : and . are tokenized as operators. In some contexts an identifier that starts with one of these would lead to ambiguous code. And so forth.
Every language needs to define what is a valid character for an identifier and what is not. Part of the consideration is going to be ease of parsing, part is going to be to avoid ambiguity (in other words even a perfect parsing algorithm couldn't be sure all the time), part is going to be the preference of the language design (in Java's case similarity with C, C++) and some is just going to be arbitrary.
The point is it has to be something, so this is what it is.
For example, are there not numerous times we wish to have objects with these names?
2ndInning
3rdBase
4thDim
7thDay
But imagine when someone might try to have a variable with the name 666:
int 666 = 777;
float 666F = 777F;
char 0xFF = 0xFF;
int a = 666; // is it 666 the variable or the literal value?
float b = 666F // is it 666F the variable or the literal value?
Perhaps, one way we might think is that variables that begin with a numeral must end with an alphabet - so long as
it does not start with 0x and end with a letter used as a hexadeciamal digit, or
it does not end with characters such as L or F,
etc, etc.
But such rules would make it really difficult for programmers as Yogi Berra had quipped about - how could you think and hit at the same time? You are trying to write a computer programme as quickly and error free as possible and then you would have to bother with all these little bits and pieces of rules. I would rather, as a programmer, have a simple rule on how variables should be named.
In my efforts using lexers and regexp to parse data logs and data streams for insertion into databases, I have not found having a keyword or variable beginning with a numeral would make it anymore difficult to parse - so long there are as short a path as possible to remove ambiguity.
Therefore, it is not so much as making it easier for the compiler but for the programmer.
I have a .toUpperCase() happening in a tight loop and have profiled and shown it is impacting application performance. Annoying thing is it's being called on strings already in capital letters. I'm considering just dropping the call to .toUpperCase() but this makes my code less safe for future use.
This level of Java performance optimization is past my experience thus far. Is there any way to do a pre-compilation, set an annotation, etc. to skip the call to toUpperCase on already upper case strings?
What you need to do if you can is call .toUpperCase() on the string once, and store it so that when you go through the loop you won't have to do it each time.
I don't believe there is a pre-compilation situation - you can't know in advance what data the code will be handling. If anyone can correct me on this, it's be pretty awesome.
If you post some example code, I'd be able to help a lot more - it really depends on what kind of access you have to the data before you get to the loop. If your loop is actually doing the data access (e.g., reading from a file) and you don't have control over where those files come from, your hands are a lot more tied than if the data is hardcoded.
Any many cases there's an easy answer, but in some, there's not much you can do.
You can try equalsIgnoreCase, too. It doesn't make a new string.
No you cannot do this using an annotation or pre-compilation because your input is given during the runtime and the annotation and pre-compilation are compile time constructions.
If you would have known the input in advance then you could simply convert it to uppercase before running the application, i.e. before you compile your application.
Note that there are many ways to optimize string handling but without more information we cannot give you any tailor made solution.
You can write a simple function isUpperCase(String) and call it before calling toUpperCase():
if (!isUpperCase(s)) {
s = s.toUpperCase()
}
It might be not significantly faster but at least this way less garbage will be created. If a majority of the strings in your input are already upper case this is very valid optimization.
isUpperCase function will look roughly like this:
boolean isUpperCase(String s) {
for (int i = 0; i < s.length; i++) {
if (Character.isLowerCase(s.charAt(i)) {
return false;
}
}
return true;
}
you need to do an if statement that conditions those letters out of it. the ideas good just have a condition. Then work with ascii codes so convert it using (int) then find the ascii numbers for uppercase which i have no idea what it is, and then continue saying if ascii whatever is true then ignore this section or if its for specific letters in a line then ignore it for charAt(i)
sorry its a rough explanation
Users submit code (mainly java) on my site to solve simple programming challenges, but sending the code to a server to compile and execute it can sometimes take more than 10 seconds.
To speed up this process, I plan to first check the submissions database to see if equivalent code has been submitted before. I realize this will cause Random methods to always return the same result, but that doesn't matter much. Is there any other potential problem that could be caused by not running the code?
To find matches, I remove comments and whitespace when comparing code. However, the same code can still be written in different ways, such as with different variable names. Is there a way to compare code that will find more equivalent code?
You could store a SHA1 hash of the code to compare with a previous submission. You are right that different variable names would give different hashes. Try running the code through a minifier or obfuscator. That way, variable cat and dog will both end up like a1, then you could see if they are unique. The only other way would be to actually compile it into bytecode, but then it's too late.
Instead of analyzing the source code, why not speed up the compilation? Try having a servlet container always running with a custom ClassLoader, and use the JDK tools.jar to compile on the fly. You could even submit the code via AJAX REST and get the results back the same way.
Consider how Eclipse compiles your files in the background.
Also, consider how http://ideone.com implements their online compiler.
FYI It is a big security risk to allow random code execution. You have to be very careful about hackers.
Variable names:
You can write code to match variable names in one file with the variable names in the other, then you can replace both sets with a consistent variable name.
File 1:
var1 += this(var1 - 1);
File 2:
sum += this(sum - 1);
After you read File 1, you look for what variable name File 2 is using in the place of sum, then make the variable names the same across both files.
*Note, if variables are used in similar ways you may get incorrect substitutions. This is most likely when variables are being declared. To help mitigate this, you can start searching for variable names at the bottom of the file and work up.
Short hands:
Force {} and () braces into each if/else/for/while/etc...
rewrite operations like "i+=..." as "i=i+..."
Functions:
In cases where function order doesn't matter, you can make sure functions are equivalent and then ignore them.
Operator precedence:
"3 + (2 * 4)" is usually equivalent to "2 * 4 + 3"
A way around this could be by determining the precedence of each operation and then matching it to an operation of the same precedence in the other set of code. Once a set of operations have been matched, you can replace them with a variable to represent them.
Ex.
(2+4) * 3 + (2+6) * 5 == someotherequation
//substitute most precedent: (2+4) and (2+6) for a and b
... a * 3 + b * 5
//substitute most precedent: (a*3) and (b*5) for c and d
... c + d
//substitute most precedent....
These are just a couple ways I could think of. If you do it this way, it'll end up being quite a big project... especially if you're working with multiple languages.
What do parenthesis do in Java other than type casting.
I've seen them used in a number of confusing situations, here's one from the Java Tutorials:
//convert strings to numbers
float a = (Float.valueOf(args[0]) ).floatValue();
float b = (Float.valueOf(args[1]) ).floatValue();
I only know only two uses for parenthesis, calls, and grouping expressions. I have searched the web but I can't find any more information.
In the example above I know Float.valueOF(arg) returns an object. What effect does parenthesize-ing the object have?
Absolutely nothing. In this case they are not necessary and can be removed. They are most likely there to make it more clear that floatValue() is called after Float.valueOf().
So this is a case of parenthesis used to group expressions. Here it's grouping a single expression (which does obviously nothing).
It can be shortened to:
float a = Float.valueOf(args[0]).floatValue();
float b = Float.valueOf(args[1]).floatValue();
which can then be logically shortened to
float a = Float.parseFloat(args[0]);
float b = Float.parseFloat(args[1]);
I dont believe they serve any purpose here. Maybe left over after some refactoring
None other than to confuse you. It's as good as saying
float a = Float.valueOf(args[0]).floatValue();
directly.
I suspect the programmer just found it more readable. I don't agree with him in this particular case, but often use parentheses to make it clearer. For example, I find
int i = 3 + (2 * 4);
clearer than
int i = 3 + 2 * 4;
The extra parentheses in your code sample do not add anything.
//Your example:
float a = (Float.valueOf(args[0]) ).floatValue();
// equivalent:
float a = Float.valueOf(args[0]).floatValue();
It could be that the original programmer had done something more elaborate within the parentheses and so had them for grouping, and neglected to remove them when simplifying the code. But trying to read intent into old source is an exercise in futility.
The extra space in args[0]) ) is pretty odd looking, too, as it is unmatched in the opening paren.
Here they are used for grouping. What's inside one of those expression if of type Float, so you can apply the method floatValue() to the whole content of the parenthesis.
They could be removed here as there is no ambiguity. They would have been mandatory with an expression using another operator of higher preseance order. But according to the docs, there is no such operator, the dot/projector has highest priority. So they are really useless here.
Regards,
Stéphane
I've been writing Java for the last couple of years , and now I've started to write in python (in addition).
The problem is that when I look at my Python code it looks like someone tried to hammer Java code into a python format , and it comes out crappy because - well , python ain't Java.
Any tips on how to escape this pattern of "Writing Java in Python"?
Thanks!
You might consider immersing yourself in the Python paradigms. The best way is to first know what they are then explore the best practices by reading some literature and reviewing some code samples. I recommend Learning Python by Mark Lutz; great for beginners and advanced users.
You'll do yourself a great injustice if you program with Python and fail to leverage all of the built-in, developer-friendly, Pythonic syntax.
As my French teacher used to say, "French isn't just English with different words."
If you are new to Python and coming from Java (or C#, or other similar statically typed OO language), these classic articles from PJ Eby and Ryan Tomayko are necessary reading:
Python Is Not Java (PJE)
Java is not Python, either (PJE)
Python Interfaces are not Java Interfaces (PJE)
The Static Method Thing (Tomayko)
Getters/Setters/Fuxors (Tomayko)
You could start by reading The Zen of Python. It'll give you some insight into how Python code is supposed to be written, provided you understand the language enough to understand what it's talking about. :-)
Some of the major ways in which Python differs from C/Java-like languages are:
List comprehensions.
Support for functional programming.
The use of certain Pythonic constructs instead of similar C-like constructs although both seem to work (list comprehensions can be argued to be a part of this, but there are others).
There are others, but these are the main ones that bugged me when I first started Python (and I had come from years of Java like you).
Before using any of these, it is helpful to understand why you should go for pythonic code rather than the usual C/Java way in Python, although both give you the same output.
For starters, Python provides some powerful features not available in C/Java that makes your code much clearer and simpler (although this is subjective, and might not look any better to someone coming from Java at first). The first two points fall into this category.
For example, support for functions as first class objects and closures makes it easy to do things that would need all kinds of weird acrobatics with inner classes in Java.
But a major reason is that Python is an interpreted language, and certain constructs are much faster than the equivalent C/Java-like code. For example, list comprehensions are usually a lot faster than an equivalent for-loop that iterates over the indices of a list and accesses each item by index. This is a very objective benefit, and IMHO a lot of the "Python in way too slow" way of thinking comes from using Java-style code shoe-horned into Python.
One of the best ways to learn about pythonic code is to read other people's code. I actually learnt a lot by looking at Python code posted in answers to SO questions. These often come with explanations and it is usually obvious why it is better than non-pythonic code (speed, clarity, etc.).
Edit:
Of course, there are other ways of getting other people's code. You can also download and look through the code of any good open source Python project. Books are also a good resource, I would recommend O'Reilly Python Cookbook. It has lots of useful code examples and very detailed explanations.
1) Python supports many (but not all) aspects of
object-oriented programming; but it is
possible to write a Python program without
making any use of OO concepts.
1) Java supports only object-oriented
programming.
2) Python is designed to be used interpretively.
A Python statement may be entered at the
interpreter prompt
(>>>)
, and will be executed
immediately. (Implementations make some
use of automatic compilation into bytecodes
(.pyc files).
2) Programs written in Java must be explicitly
compiled into bytecodes (.class files),
though an IDE may do this automatically in a
way that is transparent to the user. Java does
not support direct execution of statements -
though there are tools like Dr. Java that
support this.
3) Python is dynamically typed:
• A variable is introduced by assigning a
value to it. Example:
someVariable = 42
• A variable that has been assigned a value of
a given type may later be assigned a value
of a different type. Example:
someVariable = 42
someVariable = 'Hello, world'
3) Java is
statically typed
:
• A variable must be explicitly declared to be
of some type before assigning a value to it,
though declaration and assignment may be
done at the same time. Examples:
int someVariable;
int someVariable = 42;
• A variable that has been declared to be of a
particular type may not be assigned a value
of a different type.
4) Python supports the following built-in data
types:
Plain integers (normally 32-bit integers in
the range -2147483648 through
2147483647).
• Long integers (size limited only by memory
size of the machine running on)
• Booleans (False and True).
• Real numbers.
• Complex numbers.
In addition, Python supports a number of
types that represent a collection of values -
including strings, lists, and dictionaries.
4) Java has two kinds of data types: primitive
types and reference types. Java supports the
following primitive data types:
• byte - 8-bit integers
• short - 16-bit integers
• int - 32-bit integers
• long - 64-bit integers (Java also supports a
class java.math.BigInteger to represent
integers whose size is limited only by
memory)
• float - 32-bit real numbers.
• double - 32-bit real numbers.
• boolean - (false and true).
• char - a single character.
In addition, Java supports arrays of any type
as the reference types, and the API includes
the class String and a large number of classes
used for collections of values.
5)
Python is line-oriented:
statements end at the
end of a line unless the line break is explicitly
escaped with . There is no way to put more
than one statement on a single line.
Examples:
this is a statement
this is another statement
this is a long statement that extends over more \
than one line
5)
Statements in Java always end with a
semicolon (;)
. It is possible for a statement to
run over more than one line, or to have
multiple statements on a single line.
Examples:
this is a statement;
this is another statement;
this is a long statement that extends over more
than one line;
a statement; another; another;
6)
Python comments begin with #
and extend to
the end of the line. Example:
This is a comment
A new statement starts here
6) Java has two kinds of comments. A comment
beginning with // extend to the end of the
line (like Python comments). Comments can
also begin with /* and end with */. These
can extend over multiple lines or be
embedded within a single line. Examples:
// This is a comment
A new statement starts here
/* This is also a comment */
/* And this is also a comment, which is
long enough to require several lines
to say it. */
Statement starts /* comment */ then continues
7) Python strings can be enclosed in either single
or double quotes (' or ""). A character is
represented by a string of length 1. Examples:
'This is a string'
"This is also a string" # Equivalent
'c' # A string
"c" # An equivalent string
Python uses the following operators for
constructing compound boolean expressions:
and, or and not. Example:
not(x > 0 and y > 0) or z > 0
7) Java strings must be enclosed in double
quotes (""). A character is a different type of
object and is enclosed in single quotes (').
Examples:
"This is a String"
'c' // A character, but not a String
Java uses the following operators for
constructing compound boolean expressions:
&&, ||, ! and ^ (meaning exclusive or)
Example:
! (x > 0 && y > 0) || z > 0 ^ w > 0
8) In Python, the comparison operators
(>, <, >=, <=, == and !=) can be applied to numbers
,
strings, and other types of objects), and
compare values in some appropriate way (e.g.
numeric order, lexical order) where possible.
8) In Java, most of the comparison operators
( >, <, >=, and <=) can be applied only to
primitive types. Two (== and !=) can be
applied to any object, but when applied to
reference types they test for same (different)
object rather than same (different) value.
9) There is no universally-accepted Python
convention for naming classes, variables,
functions etc.
9) By convention, most names in Java use mixed
case. Class names begin with an uppercase
letter; variable and function names begin with
a lowercase letter. Class constants are named
using all uppercase letters with underscores.
Examples:
AClassName
aVariableName
aFunctionName()
A_CLASS_CONSTANT
10) Python definite looping statements have the
form for variable in expression: Example:
for p in pixels:
something
10) Java has two kinds of definite looping
statements. One has the form
for (variable in collection) Example:
for (p in pixels)
something;
11) Python uses the built-in function range() with
for to loop over a range of integers.
Examples:
for i in range(1, 10)
something
(i takes on values 1, 2, 3, 4, 5, 6, 7, 8, 9)
for i in range(1, 10, 2)
something
(i takes on values 1, 3, 5, 7, 9)
11) Java uses a different form of the for to loop
over a range of integers. Examples:
for (int i = 1; i < 10; i ++)
something;
(i takes on values 1, 2, 3, 4, 5, 6, 7, 8, 9)
for (int i = 1; i < 10; i += 2)
something;
(i takes on values 1, 3, 5, 7, 9)
12) Python conditional statements have the form
if condition: and an optional else part has the
form else:. The form elif condition: is
allowed as an alternative to an else:
immediately followed by an if. Examples:
if x < 0:
something
if x < 0:
something
else:
something different
if x < 0:
something
elif x > 0:
something different
else:
yet another thing
12) Java conditional statements have the form
if (condition) and an optional else part has
the form else (no colon) There is no elif
form - else if is used directly. Examples:
if (x < 0)
something;
if (x < 0)
something;
else
something different;
if (x < 0)
something;
else if (x > 0)
something different;
else
yet another thing;
13) The scope of a Python conditional or looping
statement is denoted by indentation. (If
multiple lines are to be included, care must be
used to be sure every line is indented
identically). Examples:
if x < 0:
do something
do another thing regardless of the value of x
if x < 0:
do something
do something else
do yet a third thing
do another thing regardless of the value of x
13) The scope of a Java conditional or looping
statement is normally just the next statement.
Indentation is ignored by the compiler
(though stylistically it is still highly desirable
for the benefit of a human reader). If
multiple lines are to be included, the scope
must be delimited by curly braces ({ , }).
(Optionally, these can be used even if the
scope is a single line.) Examples:
if (x < 0)
do something;
do another thing regardless of the value of x;
if (x < 0)
do something; // Bad style-don't do this!
do another thing regardless of the value of x;
if (x < 0)
{
do something;
do something else;
do yet a third thing;
}
do another thing regardless of the value of x;
if (x < 0)
{
do something;
}
do another thing regardless of the value of x;
If you want to see some fairly idiomatic Python that does non-trivial stuff, there's Dive Into Python, although Dive Into Python 3 is newer and might be a better source of style tips. If you're looking more for some points to review, there's Code Like a Pythonista.
You could post your code at Refactor my code to see if someone can show you a more Pythonic way to do it.
Definitely not a panacea but I think you should try some code golf in Python. Obviously nobody should write "golfed" code IRL, but finding the most terse way to express something really forces you to exploit the built in functionality of the language.
Someone provided me with this list of how "Python is not Java" when I started Python after Java, and it was very helpful.
Also, check out this similar SO question that I posted a short time ago when in a similar position.
Try to find algorithms that you understand well and see how they are implemented in python standard libraries.
Persist. :)
Learn a few other languages. It will help you make the difference between algorithms (the structure of processing, unchanged between languages) and the local syntaxic features of the language. Then you can "write Foo in Bar" for any combination of languages "Foo" and "Bar".
Eat Python, Sleep Python and Drink Python. That is the only way........
This is useful if you want to understand how to code to python in a more pythonic or correct way: http://www.python.org/dev/peps/pep-0008/