I am trying to detect buffer overrun problems in a C program using Java, with the help of Eclipse CDT.
When the array subscript is a constant value, it works fine, as I expected.
See the sample code:
CASTArraySubscriptExpression exprsn = (CASTArraySubscriptExpression)astName.getParent().getParent();
String size = exprsn.getSubscriptExpression().toString();
System.out.println("Size : " + size);
Using this code, I am able to detect the array subscript value in the code below:
int a[10];
a[12] = 4; // Here it detects the buffer overrun problem.
But if I give like this:
int a[10];
int i = 21;
a[i] = 4;
Here, I am not able to detect the value of the index i.
How can I detect the value using CDT?
In the second case it is not enough to just look at the AST of a subscript expression to detect an error, you need at least some basic data flow analysis.
However, according to CDT/designs/StaticAnalysis building a data flow graph is planned as future work, so you either have to do it yourself or wait until it is implemented.
As a simple solution for the special case like your second example, when you have a local variable reference in the subscript, you can check if the variable is not used anywhere in the AST between the initialization and its use in subscript.
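A minimal sketch of that special-case check in Java. It deliberately does not use the CDT API: the statement records (ArrayDecl, VarDecl, Use, SubscriptWrite) and the class ConstantIndexCheck are hypothetical stand-ins for AST nodes, and the sketch assumes straight-line code with no loops or branches. A variable keeps its constant initializer value only until it appears in any other statement, which mirrors the conservative "not used anywhere in between" rule above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model of the check; NOT the CDT API. Straight-line code only. */
final class ConstantIndexCheck {
    interface Stmt {}
    /** int name[size]; */
    record ArrayDecl(String name, int size) implements Stmt {}
    /** int name = init;  (init is null when the initializer is not a literal) */
    record VarDecl(String name, Integer init) implements Stmt {}
    /** Any other appearance of the variable: i = ..., i++, &i, f(i), ... */
    record Use(String name) implements Stmt {}
    /** array[index] = ...;  index is an int literal or a variable name */
    record SubscriptWrite(String array, String index) implements Stmt {}

    static List<String> check(List<Stmt> body) {
        Map<String, Integer> arraySizes = new HashMap<>();
        Map<String, Integer> knownValues = new HashMap<>(); // still equal to their constant initializer
        List<String> warnings = new ArrayList<>();
        for (Stmt s : body) {
            if (s instanceof ArrayDecl d) {
                arraySizes.put(d.name(), d.size());
            } else if (s instanceof VarDecl v) {
                if (v.init() != null) knownValues.put(v.name(), v.init());
                else knownValues.remove(v.name());
            } else if (s instanceof Use u) {
                // Conservative: any use between initialization and subscript drops the fact.
                knownValues.remove(u.name());
            } else if (s instanceof SubscriptWrite w) {
                Integer size = arraySizes.get(w.array());
                Integer index = resolve(w.index(), knownValues);
                if (size != null && index != null && (index < 0 || index >= size)) {
                    warnings.add("possible buffer overrun: " + w.array() + "[" + w.index()
                            + "] uses index " + index + " but size is " + size);
                }
            }
        }
        return warnings;
    }

    /** A literal is parsed directly; a variable is looked up; otherwise the value is unknown (null). */
    private static Integer resolve(String index, Map<String, Integer> knownValues) {
        try {
            return Integer.parseInt(index);
        } catch (NumberFormatException e) {
            return knownValues.get(index);
        }
    }

    public static void main(String[] args) {
        // int a[10]; int i = 21; a[i] = 4;
        List<Stmt> body = List.of(new ArrayDecl("a", 10), new VarDecl("i", 21),
                new SubscriptWrite("a", "i"));
        System.out.println(check(body));
    }
}
```

With the question's second example, main reports one warning; putting a Use("i") statement between the declaration and the subscript makes the check give up, as intended.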
I am writing my first app in Android Studio, and I am a self-taught novice. When I write data into an element of an array whose element type is a user-defined class, the value is written into the adjacent element as well! I have traced it to some code where I move data down one position in the array. I thought I could do this in one operation, but it seems this messes something up and I need to copy each member of the class individually.
Here is my class
class LeaderboardClass
{
    public String DateTime;
    public String UserName;
    public long Milliseconds; //0 denotes not in use
}
Here is my array declaration
LeaderboardClass[] LeaderboardData = new LeaderboardClass[LeaderboardEntries];
I want to move some data from subscript j to subscript j+1
I tried
LeaderboardData[j + 1] = LeaderboardData[j];
I thought this would copy all the data from subscript j to j+1
Then, when I subsequently write to the array at subscript i, I get the correct entry I made, plus a duplicate entry in the adjacent subscript i+1.
When I rewrite the above to:
LeaderboardData[j + 1].UserName = LeaderboardData[j].UserName;
LeaderboardData[j + 1].DateTime = LeaderboardData[j].DateTime;
LeaderboardData[j + 1].Milliseconds = LeaderboardData[j].Milliseconds;
Everything else behaves as expected. So I was wondering exactly what is happening with my first (presumably incorrect) code?
Thanks.
In Java, there's a difference between primitive values and objects (instances of classes): primitives are stored by value, whereas objects are stored by reference. This means that your code would work as you expect if you were using integers. However, since you are using a class, the array merely stores references to those objects. Hence, when you do LeaderboardData[j + 1] = LeaderboardData[j]; you are merely copying the reference to that object. Therefore, LeaderboardData[j + 1] and LeaderboardData[j] will point to the same object, and any later write through one of them is visible through the other.
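A small Java sketch of both halves of this explanation, assuming the question's LeaderboardClass (field names kept as in the question). The first part of main reproduces the aliasing; the second shows that shifting references is fine as long as the vacated slot then gets its own new object. The openSlot helper and the LeaderboardShift class are hypothetical names for this illustration.

```java
/** Same fields as the question's class (names kept as in the question). */
class LeaderboardClass {
    public String DateTime;
    public String UserName;
    public long Milliseconds; // 0 denotes not in use
}

final class LeaderboardShift {
    /**
     * Opens slot j by moving the references at j..length-2 one position down
     * (the last entry falls off), then gives slot j its own fresh object so it
     * no longer shares an object with slot j + 1.
     */
    static void openSlot(LeaderboardClass[] data, int j) {
        System.arraycopy(data, j, data, j + 1, data.length - j - 1);
        data[j] = new LeaderboardClass();
    }

    public static void main(String[] args) {
        // 1) What `LeaderboardData[j + 1] = LeaderboardData[j];` does.
        LeaderboardClass[] shared = { new LeaderboardClass(), new LeaderboardClass() };
        shared[1] = shared[0];      // copies the reference only
        shared[1].UserName = "bob"; // writes into the single shared object
        System.out.println(shared[0].UserName + " " + (shared[0] == shared[1])); // bob true

        // 2) Shifting references is fine once slot j gets a fresh object.
        LeaderboardClass[] data = new LeaderboardClass[3];
        for (int k = 0; k < data.length; k++) {
            data[k] = new LeaderboardClass();
            data[k].UserName = "user" + k;
        }
        openSlot(data, 0);
        data[0].UserName = "newcomer";
        for (LeaderboardClass e : data) System.out.print(e.UserName + " "); // newcomer user0 user1
        System.out.println();
    }
}
```

System.arraycopy is documented to behave as if the source range were first copied to a temporary array, so an overlapping shift within the same array is safe.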
Sidenote: If you run your program with a debugger, you can actually see this in action:
The number after the # denotes the object's reference number, and if you look closely, you can see that the objects at indices 8 and 9 both have the reference #716.
To fix this, I would suggest that you use lists instead of arrays as they allow you to remove and add new entries. The standard list implementation is an ArrayList but in your use-case, a LinkedList might be more efficient.
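A sketch of the list-based approach, assuming a cap on the number of entries (maxEntries, a hypothetical stand-in for LeaderboardEntries) and a convenience constructor added to the question's class for brevity. List.add(index, element) shifts the later entries itself, and because every insert passes a freshly created object, no two positions share one. The same helper accepts a LinkedList, since it only uses the List interface.

```java
import java.util.ArrayList;
import java.util.List;

/** The question's fields, plus a convenience constructor added for this sketch. */
class LeaderboardClass {
    public String DateTime;
    public String UserName;
    public long Milliseconds;

    LeaderboardClass(String userName, long milliseconds) {
        this.UserName = userName;
        this.Milliseconds = milliseconds;
    }
}

final class LeaderboardList {
    /** Inserts entry at position j and drops whatever is pushed past maxEntries. */
    static void insertEntry(List<LeaderboardClass> board, int j, LeaderboardClass entry, int maxEntries) {
        board.add(j, entry);                // later elements move one index up
        while (board.size() > maxEntries) { // trim the overflow from the end
            board.remove(board.size() - 1);
        }
    }

    public static void main(String[] args) {
        List<LeaderboardClass> board = new ArrayList<>();
        insertEntry(board, 0, new LeaderboardClass("alice", 5200), 3);
        insertEntry(board, 0, new LeaderboardClass("bob", 4100), 3);
        insertEntry(board, 1, new LeaderboardClass("carol", 4800), 3);
        insertEntry(board, 3, new LeaderboardClass("dave", 6000), 3); // falls off immediately
        for (LeaderboardClass e : board) {
            System.out.println(e.UserName + " " + e.Milliseconds); // bob, carol, alice
        }
    }
}
```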
Lastly, a closing note on your code: variable names (like DateTime, UserName or LeaderboardData) should always start with a lowercase letter to distinguish them from classes. That way, you can avoid lots of confusion, especially because class names such as DateTime are common in Java libraries (for example, Joda-Time's DateTime).
In my book there is an example which explains the differences between arrays in Java and C.
In Java we can create an array by writing:
int[] a = new int[5];
In C, on the other hand, the following just allocates storage space on the stack for five integers, and we can access them exactly as we would have done in Java:
#include <stdio.h>

int main(void) {
    int a[5] = {0};
    int i;
    for (i = 0; i < 5; i++) {
        printf("%2d: %7d\n", i, a[i]);
    }
    return 0;
}
Then the author says the following:
Of course our program should not use a number 5 as we did on several places in the example, instead we use a constant. We can use the C preprocessor to do this:
#define SIZE 5
What are the advantages of defining the constant SIZE as 5 with #define?
Using a named constant is generally considered good practice because if it is used in multiple places, you only need to change the definition to change the value, rather than change every occurrence - which is error prone.
For example, as mentioned by stark in the comments, it is likely that you'll want to loop over an array. If the size of the array is defined by a named constant called SIZE, then you can use that in the loop bounds. Changing the size of the array then only requires changing the definition of SIZE.
There is also the question of whether #define is really the right solution.
To borrow another comment, from Jonathan Leffler: see "static const vs #define vs enum" for a discussion of different ways of naming constants. While modern C does allow using a variable as an array size specifier, this technically results in a variable-length array, which may incur a small overhead.
You should use a constant, because embedding magic numbers in code makes it harder to read and maintain. For instance, if you see 52 in some code, you don't know what it is. However, if you write #define DECKSIZE 52, then whenever you see DECKSIZE, you know exactly what it means. In addition, if you want to change the deck size, say to 36 for durak, you can simply change one line instead of changing every instance throughout the code base.
Well, imagine that you create a static array of 5 integers just like you did, int my_arr[5];, and you code a whole program with it, but suddenly you realise that you need more space. Imagine that you have written 600-700 lines of code: you MUST replace every occurrence of the array's fixed size with the new number, in every for loop and everywhere else related to the size of this array. You can avoid all of this by using the preprocessor directive #define, which replaces every occurrence of a "keyword" with the content you want; it's like a synonym for something. E.g. #define SIZE 5 will replace every occurrence of the word SIZE in your code with the value 5.
I find the comments here to be superfluous. As long as you use your constant (5 in this case) only once, it doesn't matter where it is. Moreover, having it in place improves readability. And you certainly do not need to use the constant in more than one place; after all, you should infer the size of the array through the sizeof operator anyway (for example, sizeof a / sizeof a[0]). The benefit of the sizeof approach is that it works seamlessly with VLAs.
The drawback of a global #define (or any other global name) is that it pollutes the global namespace. One should understand that global names are a resource to be used conservatively.
#define SIZE 5
This looks like an old, outdated way of declaring constants in C code that was popular in the dinosaur era. I suppose some lovers of this style are still alive.
The preferred way to declare constants in C languages nowadays is:
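const int kSize = 5;
Note, however, that in C (unlike C++) kSize is not an integer constant expression: it cannot be used as the size of an array at file scope, and int a[kSize]; inside a function creates a variable-length array.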
We've tried using the MillerUpdatingRegression class in one of our projects and ran into an issue. After creating an instance of the class, providing the number of variables to expect and adding observations from the entire sample set, we call the "regress(int[])" method, informing the regression process which variables we'd like to include (a subset of the entire predictor set).
When we do this, we receive an ArrayIndexOutOfBoundsException during the process, because the number of variables to expect (nvars, provided when the MillerUpdatingRegression class was instantiated) is less than the number of variables passed to the "regress(int[])" method. Our understanding was that this array of integers could be a subset of the predictor indices from all observations.
Does anyone know what we're missing here?
==== Updated with Code ====
double predictorData[][] = new double[n][125];
double predictions[] = new double[n];
// predictorData is an [n x 125] two-dimensional array of
// features/predictors with n samples and 125 predictors
// predictions is an n-length array of predictions
// for the sample set
int numberOfPredictorVariables = 125;
boolean includeConstantWhenBuildingModel = true;
MillerUpdatingRegression regression = new MillerUpdatingRegression(numberOfPredictorVariables, includeConstantWhenBuildingModel);
regression.addObservations(predictorData, predictions);
int predictorsToIncludeInRegression[] = {0,3,9,11};
regression.regress(predictorsToIncludeInRegression);
//this is where the ArrayIndexOutOfBounds exception is generated
I can only guess here without a complete code example, but the number of observations must be larger than the number of variables (which is 125 in your example).
More precisely, the n in your code must be larger than 125 for the regression to work. The number of predictors passed into the regress method can be less than that.
I'm currently using a program called KNIME, which is used for analysing data. For some of my data, I want each row in a column to be averaged with the value in the previous row. The 'Java Snippet' option requires a 'global value declaration' and a 'method body'. The column name is 'new acc'.
I understand that to use this program more efficiently I'll probably need to learn simple Java (and it's on my to-do list), but just for this evening I would like a quick check on some of the data used.
Any help is really appreciated. I've attached an image of the layout.
Thanks!
If you aren't required to use the Java Snippet, I'd recommend the Math Formula node.
There's a Moving Average Node which might be suitable for the task.
What about putting
double acc = Double.NaN;
in the global declaration area, and something like this in the method body:
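// $z$ is the column being averaged (in the question, 'new acc');
// acc keeps the previous row's value between rows
if (Double.isNaN(acc)) {
    // first row: nothing to average with yet
    acc = $z$;
    return $z$;
} else {
    double avg = (acc + $z$) / 2;
    acc = $z$;
    return avg;
}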
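For checking the logic outside KNIME, here is a plain-Java sketch of the same snippet, assuming rows are processed one at a time in table order (which the snippet above relies on): the field acc plays the role of the global declaration and next() the role of the method body. PairwiseAverage is a hypothetical name.

```java
import java.util.Arrays;

/** Plain-Java model of the snippet: acc survives between rows, next() runs once per row. */
final class PairwiseAverage {
    private double acc = Double.NaN; // previous row's value; NaN means "no previous row yet"

    double next(double z) {
        if (Double.isNaN(acc)) {    // first row: nothing to average with
            acc = z;
            return z;
        }
        double avg = (acc + z) / 2; // average with the previous row
        acc = z;                    // remember the raw value, not the average
        return avg;
    }

    static double[] apply(double[] column) {
        PairwiseAverage p = new PairwiseAverage();
        double[] out = new double[column.length];
        for (int r = 0; r < column.length; r++) out[r] = p.next(column[r]);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(apply(new double[] {2, 4, 8, 10}))); // [2.0, 3.0, 6.0, 9.0]
    }
}
```

Note that acc stores the raw previous value rather than the previous average, so each output is the mean of exactly two consecutive rows.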
As a partial answer to the one from Sylvansight, it should be noted that the Java Snippet node is executed on a per-row basis, so it cannot directly access the values in the previous or subsequent rows.
The Math Formula node fits your problem better, but if you want to use the Java Snippet (simple) node, just put the formula in the return statement using normal Java syntax, for example: return 1+9;
I have updated this question (the previous version was not clear; see the revision history if you want to refer to it). The current answers do not work because I failed to explain my question clearly (sorry, second attempt).
Goal:
Take a set of numbers (positive or negative, so bounds are needed to limit the growth of each coefficient) and find the linear combinations of them that reach a specific sum. For example, to get to a sum of 10 using [2,4,5] we get:
5*2 + 0*4 + 0*5 = 10
3*2 + 1*4 + 0*5 = 10
1*2 + 2*4 + 0*5 = 10
0*2 + 0*4 + 2*5 = 10
How can I create an algorithm that scales to a large number of variables and large target sums? I can write the code on my own if an algorithm is given, but if there's a library available, I'm fine with any library, though I prefer Java.
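One straightforward way to enumerate the combinations exactly is backtracking with bound-based pruning. This is a sketch, not a scalable algorithm in the worst case: the lower and upper coefficient bounds (lo, hi) are assumptions standing in for the bounds mentioned above, and the pruning only skips branches whose remaining terms provably cannot reach the target. Negative values in x are allowed. BoundedCombinations is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Enumerates every c with lo[i] <= c[i] <= hi[i] and sum(c[i] * x[i]) == target. */
final class BoundedCombinations {
    static List<int[]> solve(int[] x, int[] lo, int[] hi, long target) {
        int n = x.length;
        // minRest[i] / maxRest[i]: smallest / largest sum reachable by terms i..n-1.
        long[] minRest = new long[n + 1];
        long[] maxRest = new long[n + 1];
        for (int i = n - 1; i >= 0; i--) {
            long a = (long) lo[i] * x[i];
            long b = (long) hi[i] * x[i];
            minRest[i] = minRest[i + 1] + Math.min(a, b);
            maxRest[i] = maxRest[i + 1] + Math.max(a, b);
        }
        List<int[]> out = new ArrayList<>();
        search(0, target, x, lo, hi, minRest, maxRest, new int[n], out);
        return out;
    }

    private static void search(int i, long remaining, int[] x, int[] lo, int[] hi,
                               long[] minRest, long[] maxRest, int[] c, List<int[]> out) {
        // Prune: the remaining terms cannot reach the remaining target.
        if (remaining < minRest[i] || remaining > maxRest[i]) return;
        if (i == x.length) { // here remaining == 0, because minRest[n] == maxRest[n] == 0
            out.add(c.clone());
            return;
        }
        for (int v = lo[i]; v <= hi[i]; v++) {
            c[i] = v;
            search(i + 1, remaining - (long) v * x[i], x, lo, hi, minRest, maxRest, c, out);
        }
    }

    public static void main(String[] args) {
        int[] x = {2, 4, 5};
        for (int[] c : solve(x, new int[] {0, 0, 0}, new int[] {5, 5, 5}, 10)) {
            System.out.println(Arrays.toString(c));
        }
    }
}
```

For x = [2,4,5], bounds 0..5 and target 10, main prints the same four combinations listed above.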
One idea would be to break out of the loop once you set T[z][i] to true, since you are only basically modifying T[z][i] here, and if it does become true, it won't ever be modified again.
for i = 1 to k:
    for z = 0 to sum:
        for j = z-x_i down to 0:
            if (T[j][i-1]):
                T[z][i] = true
                break
EDIT2: Additionally, if I am getting it right, T[z][i] depends on the array T[z-x_i..0][i-1], and T[z+1][i] depends on T[z+1-x_i..0][i-1]. So once you know whether T[z][i] is true, you only need to check one additional element (T[z+1-x_i][i-1]) to know whether T[z+1][i] will be true.
Let's say you store whether that one additional element T[z-x_i][i-1] is true in a variable changed. Then you can simply say that T[z][i] = changed || T[z-1][i]. So you should be done in two loops instead of three, which should make it much faster.
Now, to scale it: since T[z][i] depends only on T[z-1][i] and T[z-x_i][i-1], you do not need to wait until the whole (i-1)th column is populated to compute T[z][i]. You can start working on T[z][i] as soon as the required values are populated. I can't implement it without knowing the details, but you can try this approach.
I take it this is something like unbounded knapsack? You can dispense with the loop over c entirely.
for i = 1 to k
    for z = 0 to sum
        T[z][i] = z >= x_i cand (T[z - x_i][i - 1] or T[z - x_i][i])
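If the coefficients are non-negative and unbounded and every x_i is a positive integer, a closely related standard formulation (the coin-change counting recurrence, whose reachability form is T[z][i] = T[z][i-1] or (z >= x_i and T[z - x_i][i])) collapses the table to one dimension. The sketch below counts coefficient vectors instead of only recording reachability, so ways[z] > 0 is the reachability table. CombinationCount is a hypothetical name, and the counts can overflow long for large inputs.

```java
/** Unbounded-knapsack style counting; assumes every x[i] > 0 and coefficients >= 0. */
final class CombinationCount {
    static long[] countWays(int[] x, int sum) {
        long[] ways = new long[sum + 1];
        ways[0] = 1; // the all-zero coefficient vector
        for (int xi : x) {                    // one pass per term, as in "for i = 1 to k"
            for (int z = xi; z <= sum; z++) { // ascending z lets x_i be reused any number of times
                ways[z] += ways[z - xi];
            }
        }
        return ways;
    }

    public static void main(String[] args) {
        long[] ways = countWays(new int[] {2, 4, 5}, 10);
        System.out.println(ways[10]); // 4, matching the four combinations in the question
    }
}
```

Putting the loop over terms outside the loop over z is what makes each coefficient vector count once; swapping the loops would count ordered sequences instead.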
Based on the original example data you gave (linear combination of terms) and your answer to my question in the comments section (there are bounds), would a brute force approach not work?
$$c_0 x_0 + c_1 x_1 + c_2 x_2 + \cdots + c_n x_n = \mathrm{SUM}$$
I'm guessing I'm missing something important but here it is anyway:
Brute Force Divide and Conquer:
- The main controller generates coefficients for, say, half of the terms (or however many makes sense).
- It then sends each partial set of fixed coefficients to a work queue.
- A worker picks up a partial set of fixed coefficients and brute-forces its own way through the remaining combinations.
- It doesn't use much memory at all, as it works sequentially on each valid set of coefficients.
- It could be optimized to ignore equivalent combinations, and probably in many other ways.
Pseudocode for Multiprocessing
class Controller
    work_queue = Queue
    solution_queue = Queue
    solution_sets = []
    create x number of workers with access to work_queue and solution_queue

    # say for 2000 terms:
    for partial_set in coefficient_generator(start_term=0, end_term=999):
        wait until worker_available()   # generate just in time
        push partial_set onto work_queue
    push one stop token per worker onto work_queue
    while solution_queue:
        add any solutions to solution_sets
        # there is an efficient way to do this type of polling but I forget

class Worker
    while true:   # stops when a stop token is received
        get partial_set from the work queue
        for remaining_set in coefficient_generator(start_term=1000, end_term=1999):
            full_set = partial_set + remaining_set   # combine without modifying partial_set
            if is_solution(full_set):
                push full_set onto the solution queue
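A Java sketch of the controller/worker pseudocode above, using an ExecutorService in place of the hand-written queues. Assumptions: every coefficient ranges over the same bounds lo..hi, the controller fixes the first splitAt coefficients (splitAt <= x.length), and each task brute-forces the rest. ParallelBruteForce, enumerate and nextOdometer are hypothetical names. Unlike the "just in time" pseudocode, Executors.newFixedThreadPool uses an unbounded queue, so all partial sets are queued up front; a very large prefix space would call for a bounded queue instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ParallelBruteForce {
    /** Controller: fixes c[0..splitAt-1] and hands each partial set to the pool. */
    static List<int[]> solve(int[] x, int lo, int hi, long target, int splitAt, int threads) {
        ConcurrentLinkedQueue<int[]> solutions = new ConcurrentLinkedQueue<>(); // "solution queue"
        ExecutorService pool = Executors.newFixedThreadPool(threads);           // the workers
        int[] prefix = new int[splitAt];
        Arrays.fill(prefix, lo);
        do {
            int[] partial = prefix.clone(); // each task owns its copy of the fixed coefficients
            pool.execute(() -> {
                int[] full = Arrays.copyOf(partial, x.length); // fixed part + room for the rest
                enumerate(full, partial.length, x, lo, hi, target, solutions);
            });
        } while (nextOdometer(prefix, 0, lo, hi));
        pool.shutdown(); // no more work: plays the role of the stop token
        try {
            pool.awaitTermination(1, TimeUnit.DAYS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return new ArrayList<>(solutions);
    }

    /** Worker: walks through every value of c[from..n-1], keeping only exact solutions. */
    private static void enumerate(int[] c, int from, int[] x, int lo, int hi, long target,
                                  ConcurrentLinkedQueue<int[]> solutions) {
        for (int i = from; i < c.length; i++) c[i] = lo;
        do {
            long sum = 0;
            for (int i = 0; i < c.length; i++) sum += (long) c[i] * x[i];
            if (sum == target) solutions.add(c.clone());
        } while (nextOdometer(c, from, lo, hi));
    }

    /** Advances c[from..] like an odometer; returns false once every combination was visited. */
    private static boolean nextOdometer(int[] c, int from, int lo, int hi) {
        for (int i = c.length - 1; i >= from; i--) {
            if (c[i] < hi) {
                c[i]++;
                return true;
            }
            c[i] = lo;
        }
        return false;
    }

    public static void main(String[] args) {
        // x = [2, 4, 5], every coefficient in 0..5, target 10, split after the first term.
        for (int[] c : solve(new int[] {2, 4, 5}, 0, 5, 10, 1, 4)) {
            System.out.println(Arrays.toString(c)); // order depends on thread scheduling
        }
    }
}
```

The cost is still (hi - lo + 1) raised to the number of terms; the split only spreads that work across threads, as the pseudocode intends.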