String naming convention in java - java

I am currently trying to make a naming convention. The idea behind this is parsing.
Lets say I obtain an xml doc. Everything can be used once, but these 2 in the code below can be submitted several times within the xml document. It could be 1, or simply 100.
This states that ItemNumber and ReceiptType will be grabbed for the first element.
ItemNumber1 = eElement.getElementsByTagName("ItemNumber").item(0).getTextContent();
ReceiptType1 = eElement.getElementsByTagName("ReceiptType").item(0).getTextContent();
This one states that it will grab the second submission if they were in their twice.
ItemNumber2 = eElement.getElementsByTagName("ItemNumber").item(1).getTextContent();
ReceiptType2 = eElement.getElementsByTagName("ReceiptType").item(1).getTextContent();
ItemNumber and ReceiptType must both be submitted together. So if there is 30 ItemNumbers, there must be 30 Receipt Types.
However now I would like to set this in an IF statement to create variables.
I was thinking something along the lines of:
int cnt = 2;
if (eElement.getElementsByTagName("ItemNumber").item(cnt).getTextContent();)
**MAKE VARIABLE**
Then make a loop which adds one to count to see if their is a third or 4th. Now here comes the tricky part..I need them set to a generated variable. Example if ItemNumber 2 existed, it would set it to
String ItemNumber2 = eElement.getElementsByTagName("ItemNumber").item(cnt).getTextContent();
I do not wish to make pre-made variable names as I don't want to code a possible 1000 variables if that 1000 were to happen.
KUDOS for anyone who can help or give tips on just small parts of this as in the naming convention etc. Thanks!

You don't know beforehand how many ItemNumbers and ReceiptTypes you'll get ? Maybe consider using two Lists (java.util.List). Here is an example.
boolean finished = ... ; // true if there is no more item to process
List<String> listItemNumbers = new ArrayList<>();
List<String> listReceiptTypes = new ArrayList<>();
int cnt = 0;
while(!finished) {
String itemNumber = eElement.getElementsByTagName("ItemNumber").item(cnt).getTextContent();
String receiptType = eElement.getElementsByTagName("ReceiptType").item(cnt).getTextContent();
listItemNumbers.add(itemNumber);
listReceiptTypes.add(receiptType);
++cnt;
// update 'finished' (to test if there are remaining itemNumbers to process)
}
// use them :
int indexYouNeed = 32; // for example
String itemNumber = listItemNumbers.get(indexYouNeed); // index start from 0
String receiptType = listReceiptTypes.get(indexYouNeed);

Related

How to generate 1000 unique email-ids using java

My requirement is to generate 1000 unique email-ids in Java. I have already generated random Text and using for loop I'm limiting the number of email-ids to be generated. Problem is when I execute 10 email-ids are generated but all are same.
Below is the code and output:
public static void main() {
first fr = new first();
String n = fr.genText()+"#mail.com";
for (int i = 0; i<=9; i++) {
System.out.println(n);
}
}
public String genText() {
String randomText = "abcdefghijklmnopqrstuvwxyz";
int length = 4;
String temp = RandomStringUtils.random(length, randomText);
return temp;
}
and output is:
myqo#mail.com
myqo#mail.com
...
myqo#mail.com
When I execute the same above program I get another set of mail-ids. Example: instead of 'myqo' it will be 'bfta'. But my requirement is to generate different unique ids.
For Example:
myqo#mail.com
bfta#mail.com
kjuy#mail.com
Put your String initialization in the for statement:
for (int i = 0; i<=9; i++) {
String n = fr.genText()+"#mail.com";
System.out.println(n);
}
I would like to rewrite your method a little bit:
public String generateEmail(String domain, int length) {
return RandomStringUtils.random(length, "abcdefghijklmnopqrstuvwxyz") + "#" + domain;
}
And it would be possible to call like:
generateEmail("gmail.com", 4);
As I understood, you want to generate unique 1000 emails, then you would be able to do this in a convenient way by Stream API:
Stream.generate(() -> generateEmail("gmail.com", 4))
.limit(1000)
.collect(Collectors.toSet())
But the problem still exists. I purposely collected a Stream<String> to a Set<String> (which removes duplicates) to find out its size(). As you may see, the size is not always equals 1000
999
1000
997
that means your algorithm returns duplicated values even for such small range.
Therefore, you'd better research already written email generators for Java or improve your own (for example, by adding numbers, some special characters that, in turn, will generate a plenty of exceptions).
If you are planning to use MockNeat, the feature for implementing email strings is already implemented.
Example 1:
String corpEmail = mock.emails().domain("startup.io").val();
// Possible Output: tiptoplunge#startup.io
Example 2:
String domsEmail = mock.emails().domains("abc.com", "corp.org").val();
// Possible Output: funjulius#corp.org
Note: mock is the default "mocking" object.
To guarantee uniqueness you could use a counter as part of the email address:
myqo0000#mail.com
bfta0001#mail.com
kjuy0002#mail.com
If you want to stick to letters only then convert the counter to base 26 representation using 'a' to 'z' as the digits.

Finding the index of a string with specific requirements?

I'm a bit of a newbie to programming. :P
I'm working with Processing right now to create a table of subjects with their ID, Title and Availability.
I have a 2D array that contains information like this:
units[0][0] = "CAKE100"; //Subject ID
units[1][0] = "Eating Cake And Baking Too"; //Subject Title
units[2][0] = "November"; //Subject Availability
units[0][1] = "TACO204"; //Subject ID
units[1][1] = "Tacos And Other Delicious Things"; //Subject Title
units[2][1] = "April"; //Subject Availability
units[0][2] = "KITC102"; //Subject ID
units[1][2] = "Kitchen Safety"; //Subject Title
units[2][2] = "June"; //Subject Availability
I'm trying to filter through the unit[0][x] section to find the index location of every Subject ID that has "1" in the fourth position of the string.
For example, I want to return [0] [0] and [0] [2], because "CAKE100" and "KITC102" both have "1" in the fourth position.
I've tried to use indexOf or .substring but for some reason I can't figure it out.
EDIT:
Not sure how much help it will be but here's my butchered code:
void checkLevel100() {
for ( int j = 0; j < units.length; j++) {
position = 0;
// position = units[0][j].indexOf("1"); //This returns 4;
if (units[0][j].substring(4) == "1") { //This doesn't run at all, and so it returns 0.
position = j;
}
fill(0);
text(position, width/2, height/2);
}
}
I also did what Kevin Workman suggested. Here is the code for that:
for (int i = 0; i < units.length; i++) {
if(units[0][i] == "TACO204"){ //This results in 1, as expected
location[i] = i;
println(i);
}
Once again, thank you for your time :)
You need to break your problem down into smaller steps.
Step 1: Loop over your 2D array. You might use a nested for loop for this.
To test that this step works, you might just print every element in your 2d array before worrying about any logic.
Step 2: Write an if statement that checks whether the value at that index passes your test.
To test that this step works, you might want to create a separate program that just tests a hardcoded value instead of using arrays.
Step 3: Once you have the first two steps working, then you can save the indexes that pass your test into some kind of data structure. You might us an ArrayList<Integer> for this.
Create a separate example program that just loops over an array without worrying about any logic. Create another separate program that just uses an if statement to test your condition against a hard-coded value. Then if you get stuck on a specific step, you can post a more specific question along with an MCVE. Good luck.
I think it would be simpler for you to store your subject's values as an object of a class. Perhaps store the numerical portion of the ID as an int too? Using a class will enable you to write more human readable code.
public class FoodSubject {
String stringID;
int numID;
String title;
String availability;
public FoodSubject(String stringID, int numID,
String title, String availability) {
this.stringID = stringID;
this.numID = numID;
this.title = title;
this.availability = availability;
}
}
To make a new array of your FoodSubject objects use this code:
FoodSubject[] subjectArray = new FoodSubject[3];
subjectArray[0] = new FoodSubject("CAKE", 100, "Eating Cake And Baking Too", "November");
Access the values like this subjectArray[0].numID
Look into overriding the toString() method in your class to print out all the values of your subject easily. This will help you identify problems in the rest of your code much easier!
If you still want to store the entire ID as a String use the charAt(int index) function to test if the character is a '1'.
This if (units[0][j].substring(4) == "1") { code is incorrect. Look at the String Java documentation for information on what substring does.

Ektorp CouchDb: Query for pattern with multiple contains

I want to query multiple candidates for a search string which could look like "My sear foo".
Now I want to look for documents which have a field that contains one (or more) of the entered strings (seen as splitted by whitespaces).
I found some code which allows me to do a search by pattern:
#View(name = "find_by_serial_pattern", map = "function(doc) { var i; if(doc.serialNumber) { for(i=0; i < doc.serialNumber.length; i+=1) { emit(doc.serialNumber.slice(i), doc);}}}")
public List<DeviceEntityCouch> findBySerialPattern(String serialNumber) {
String trim = serialNumber.trim();
if (StringUtils.isEmpty(trim)) {
return new ArrayList<>();
}
ViewQuery viewQuery = createQuery("find_by_serial_pattern").startKey(trim).endKey(trim + "\u9999");
return db.queryView(viewQuery, DeviceEntityCouch.class);
}
which works quite nice for looking just for one pattern. But how do I have to modify my code to get a multiple contains on doc.serialNumber?
EDIT:
This is the current workaround, but there must be a better way i guess.
Also there is only an OR logic. So an entry fits term1 or term2 to be in the list.
#View(name = "find_by_serial_pattern", map = "function(doc) { var i; if(doc.serialNumber) { for(i=0; i < doc.serialNumber.length; i+=1) { emit(doc.serialNumber.slice(i), doc);}}}")
public List<DeviceEntityCouch> findBySerialPattern(String serialNumber) {
String trim = serialNumber.trim();
if (StringUtils.isEmpty(trim)) {
return new ArrayList<>();
}
String[] split = trim.split(" ");
List<DeviceEntityCouch> list = new ArrayList<>();
for (String s : split) {
ViewQuery viewQuery = createQuery("find_by_serial_pattern").startKey(s).endKey(s + "\u9999");
list.addAll(db.queryView(viewQuery, DeviceEntityCouch.class));
}
return list;
}
Looks like you are implementing a full text search here. That's not going to be very efficient in CouchDB (I guess same applies to other databases).
Correct me if I am wrong but from looking at your code looks like you are trying to search a list of serial numbers for a pattern. CouchDB (or any other database) is quite efficient if you can somehow index the data you will be searching for.
Otherwise you must fetch every single record and perform a string comparison on it.
The only way I can think of to optimize this in CouchDB would be the something like the following (with assumptions):
Your serial numbers are not very long (say 20 chars?)
You force the search to be always 5 characters
Generate view that emits every single 5 char long substring from your serial number - more or less this (could be optimized and not sure if I got the in):
...
for (var i = 0; doc.serialNo.length > 5 && i < doc.serialNo.length - 5; i++) {
emit([doc.serialNo.substring(i, i + 5), doc._id]);
}
...
Use _count reduce function
Now the following url:
http://localhost:5984/test/_design/serial/_view/complex-key?startkey=["01234"]&endkey=["01234",{}]&group=true
Will return a list of documents with a hit count for a key of 01234.
If you don't group and set the reduce option to be false, you will get a list of all matches, including duplicates if a single doc has multiple hits.
Refer to http://ryankirkman.com/2011/03/30/advanced-filtering-with-couchdb-views.html for the information about complex keys lookups.
I am not sure how efficient couchdb is in terms of updating that view. It depends on how many records you will have and how many new entries appear between view is being queried (I understand couchdb rebuilds the view's b-tree on demand).
I have generated a view like that that splits doc ids into 5 char long keys. Out of over 1K docs it generated over 30K results - id being 32 char long, simple maths really: (serialNo.length - searchablekey.length + 1) * docscount).
Generating the view took a while but the lookups where fast.
You could generate keys of multiple lengths, etc. All comes down to your records count vs speed of lookups.

Null Pointer that makes no sense to me?

Im currently working on a program and any time i call Products[1] there is no null pointer error however, when i call Products[0] or Products[2] i get a null pointer error. However i am still getting 2 different outputs almost like there is a [0] and 1 or 1 and 2 in the array. Here is my code
FileReader file = new FileReader(location);
BufferedReader reader = new BufferedReader(file);
int numberOfLines = readLines();
String [] data = new String[numberOfLines];
Products = new Product[numberOfLines];
calc = new Calculator();
int prod_count = 0;
for(int i = 0; i < numberOfLines; i++)
{
data = reader.readLine().split("(?<=\\d)\\s+|\\s+at\\s+");
if(data[i].contains("input"))
{
continue;
}
Products[prod_count] = new Product();
Products[prod_count].setName(data[1]);
System.out.println(Products[prod_count].getName());
BigDecimal price = new BigDecimal(data[2]);
Products[prod_count].setPrice(price);
for(String dataSt : data)
{
if(dataSt.toLowerCase().contains("imported"))
{
Products[prod_count].setImported(true);
}
else{
Products[prod_count].setImported(false);
}
}
calc.calculateTax(Products[prod_count]);
calc.calculateItemTotal(Products[prod_count]);
prod_count++;
This is the output :
imported box of chocolates
1.50
11.50
imported bottle of perfume
7.12
54.62
This print works System.out.println(Products[1].getProductTotal());
This becomes a null pointer System.out.println(Products[2].getProductTotal());
This also becomes a null pointer System.out.println(Products[0].getProductTotal());
You're skipping lines containing "input".
if(data[i].contains("input")) {
continue; // Products[i] will be null
}
Probably it would be better to make products an ArrayList, and add only the meaningful rows to it.
products should also start with lowercase to follow Java conventions. Types start with uppercase, parameters & variables start with lowercase. Not all Java coding conventions are perfect -- but this one's very useful.
The code is otherwise structured fine, but arrays are not a very flexible type to build from program logic (since the length has to be pre-determined, skipping requires you to keep track of the index, and it can't track the size as you build it).
Generally you should build List (ArrayList). Map (HashMap, LinkedHashMap, TreeMap) and Set (HashSet) can be useful too.
Second bug: as Bohemian says: in data[] you've confused the concepts of a list of all lines, and data[] being the tokens parsed/ split from a single line.
"data" is generally a meaningless term. Use meaningful terms/names & your programs are far less likely to have bugs in them.
You should probably just use tokens for the line tokens, not declare it outside/ before it is needed, and not try to index it by line -- because, quite simply, there should be absolutely no need to.
for(int i = 0; i < numberOfLines; i++) {
// we shouldn't need data[] for all lines, and we weren't using it as such.
String line = reader.readLine();
String[] tokens = line.split("(?<=\\d)\\s+|\\s+at\\s+");
//
if (tokens[0].equals("input")) { // unclear which you actually mean.
/* if (line.contains("input")) { */
continue;
}
When you offer sample input for a question, edit it into the body of the question so it's readable. Putting it in the comments, where it can't be read properly, is just wasting the time of people who are trying to help you.
Bug alert: You are overwriting data:
String [] data = new String[numberOfLines];
then in the loop:
data = reader.readLine().split("(?<=\\d)\\s+|\\s+at\\s+");
So who knows how large it is - depends on the success of the split - but your code relies on it being numberOfLines long.
You need to use different indexes for the line number and the new product objects. If you have 20 lines but 5 of them are "input" then you only have 15 new product objects.
For example:
int prod_count = 0;
for (int i = 0; i < numberOfLines; i++)
{
data = reader.readLine().split("(?<=\\d)\\s+|\\s+at\\s+");
if (data[i].contains("input"))
{
continue;
}
Products[prod_count] = new Product();
Products[prod_count].setName(data[1]);
// etc.
prod_count++; // last thing to do
}

Tips optimizing Java code

So, I've written a spellchecker in Java and things work as they should. The only problem is that if I use a word where the max allowed distance of edits is too large (like say, 9) then my code runs out of memory. I've profiled my code and dumped the heap into a file, but I don't know how to use it to optimize my code.
Can anyone offer any help? I'm more than willing to put up the file/use any other approach that people might have.
-Edit-
Many people asked for more details in the comments. I figured that other people would find them useful, and they might get buried in the comments. Here they are:
I'm using a Trie to store the words themselves.
In order to improve time efficiency, I don't compute the Levenshtein Distance upfront, but I calculate it as I go. What I mean by this is that I keep only two rows of the LD table in memory. Since a Trie is a prefix tree, it means that every time I recurse down a node, the previous letters of the word (and therefore the distance for those words) remains the same. Therefore, I only calculate the distance with that new letter included, with the previous row remaining unchanged.
The suggestions that I generate are stored in a HashMap. The rows of the LD table are stored in ArrayLists.
Here's the code of the function in the Trie that leads to the problem. Building the Trie is pretty straight forward, and I haven't included the code for the same here.
/*
* #param letter: the letter that is currently being looked at in the trie
* word: the word that we are trying to find matches for
* previousRow: the previous row of the Levenshtein Distance table
* suggestions: all the suggestions for the given word
* maxd: max distance a word can be from th query and still be returned as suggestion
* suggestion: the current suggestion being constructed
*/
public void get(char letter, ArrayList<Character> word, ArrayList<Integer> previousRow, HashSet<String> suggestions, int maxd, String suggestion){
// the new row of the trie that is to be computed.
ArrayList<Integer> currentRow = new ArrayList<Integer>(word.size()+1);
currentRow.add(previousRow.get(0)+1);
int insert = 0;
int delete = 0;
int swap = 0;
int d = 0;
for(int i=1;i<word.size()+1;i++){
delete = currentRow.get(i-1)+1;
insert = previousRow.get(i)+1;
if(word.get(i-1)==letter)
swap = previousRow.get(i-1);
else
swap = previousRow.get(i-1)+1;
d = Math.min(delete, Math.min(insert, swap));
currentRow.add(d);
}
// if this node represents a word and the distance so far is <= maxd, then add this word as a suggestion
if(isWord==true && d<=maxd){
suggestions.add(suggestion);
}
// if any of the entries in the current row are <=maxd, it means we can still find possible solutions.
// recursively search all the branches of the trie
for(int i=0;i<currentRow.size();i++){
if(currentRow.get(i)<=maxd){
for(int j=0;j<26;j++){
if(children[j]!=null){
children[j].get((char)(j+97), word, currentRow, suggestions, maxd, suggestion+String.valueOf((char)(j+97)));
}
}
break;
}
}
}
Here's some code I quickly crafted showing one way to generate the candidates and to then "rank" them.
The trick is: you never "test" a non-valid candidate.
To me your: "I run out of memory when I've got an edit distance of 9" screams "combinatorial explosion".
Of course to dodge a combinatorial explosion you don't do thing like trying to generate yourself all words that are at a distance from '9' from your misspelled work. You start from the misspelled word and generate (quite a lot) of possible candidates, but you refrain from creating too many candidates, for then you'd run into trouble.
(also note that it doesn't make much sense to compute up to a Levenhstein Edit Distance of 9, because technically any word less than 10 letters can be transformed into any other word less than 10 letters in max 9 transformations)
Here's why you simply cannot test all words up to a distance of 9 without either having an OutOfMemory error or simply a program never terminating:
generating all the LED up to 1 for the word "ptmizing", by only adding one letter (from a to z) generates already 9*26 variations (i.e. 324 variations) [there are 9 positions where you can insert one out of 26 letters)
generating all the LED up to 2, by only adding one letter to what we know have generates already 10*26*324 variations (60 840)
generating all the LED up to 3 gives: 17 400 240 variations
And that is only by considering the case where we add one, add two or add three letters (we're not counting deletion, swaps, etc.). And that is on a misspelled word that is only nine characters long. On "real" words, it explodes even faster.
Sure, you could get "smart" and generate this in a way not to have too many dupes etc. but the point stays: it's a combinatorial explosion that explodes fastly.
Anyway... Here's an example. I'm simply passing the dictionary of valid words (containing only four words in this case) to the corresponding method to keep this short.
You'll obviously want to replace the call to the LED with your own LED implementation.
The double-metaphone is just an example: in a real spellchecker words that do "sound alike"
despite further LED should be considered as "more correct" and hence often suggest first. For example "optimizing" and "aupteemising" are quite far from a LED point of view, but using the double-metaphone you should get "optimizing" as one of the first suggestion.
(disclaimer: following was cranked in a few minutes, it doesn't take into account uppercase, non-english words, etc.: it's not a real spell-checker, just an example)
#Test
public void spellCheck() {
final String src = "misspeled";
final Set<String> validWords = new HashSet<String>();
validWords.add("boing");
validWords.add("Yahoo!");
validWords.add("misspelled");
validWords.add("stackoverflow");
final List<String> candidates = findNonSortedCandidates( src, validWords );
final SortedMap<Integer,String> res = computeLevenhsteinEditDistanceForEveryCandidate(candidates, src);
for ( final Map.Entry<Integer,String> entry : res.entrySet() ) {
System.out.println( entry.getValue() + " # LED: " + entry.getKey() );
}
}
private SortedMap<Integer, String> computeLevenhsteinEditDistanceForEveryCandidate(
final List<String> candidates,
final String mispelledWord
) {
final SortedMap<Integer, String> res = new TreeMap<Integer, String>();
for ( final String candidate : candidates ) {
res.put( dynamicProgrammingLED(candidate, mispelledWord), candidate );
}
return res;
}
private int dynamicProgrammingLED( final String candidate, final String misspelledWord ) {
return Levenhstein.getLevenshteinDistance(candidate,misspelledWord);
}
Here you generate all possible candidates using several methods. I've only implemented one such method (and quickly so it may be bogus but that's not the point ; )
private List<String> findNonSortedCandidates( final String src, final Set<String> validWords ) {
final List<String> res = new ArrayList<String>();
res.addAll( allCombinationAddingOneLetter(src, validWords) );
// res.addAll( allCombinationRemovingOneLetter(src) );
// res.addAll( allCombinationInvertingLetters(src) );
return res;
}
private List<String> allCombinationAddingOneLetter( final String src, final Set<String> validWords ) {
final List<String> res = new ArrayList<String>();
for (char c = 'a'; c < 'z'; c++) {
for (int i = 0; i < src.length(); i++) {
final String candidate = src.substring(0, i) + c + src.substring(i, src.length());
if ( validWords.contains(candidate) ) {
res.add(candidate); // only adding candidates we know are valid words
}
}
if ( validWords.contains(src+c) ) {
res.add( src + c );
}
}
return res;
}
One thing you could try is, increase the Java's heap size, in order to overcome "out of memory error".
Following article will help you in order to understand how to increase heap size in Java
http://viralpatel.net/blogs/2009/01/jvm-java-increase-heap-size-setting-heap-size-jvm-heap.html
But I think the better approach to address your problem is, find out a better algorithm than the current algorithm
Well without more Information on the topic there is not much the community could do for you... You can start with the following:
Look at what your Profiler says (after it has run a little while): Does anything pile up? Are there a lot of Objects - this should normally give you a hint on what is wrong with your code.
Publish your saved dump somewhere and link it in your question, so someone else could take a look at it.
Tell us which profiler you are using, then somebody can give you hints on where to look for valuable information.
After you have narrowed down your problem to a specific part of your Code, and you cannot figure out why there are so many objects of $FOO in your memory, post a snippet of the relevant part.

Categories

Resources