I'm having trouble finding a way to parse out a particular part of each item in an ArrayList.
The ArrayList contains strings in both of these formats:
http://some-url.com/that/goes-to-some-place-abc-defg/api/xml
http://some-url.com/that/goes-to-somewhere-zyxw-vut/api/xml
The key point is that the rest of the string won't change; the only things that will differ are the "abc-defg" and "zyxw-vut" parts. Note that they may be anything of varying length. This is the part I need to parse out to use elsewhere.
The only idea I've had is to write something that parses out everything after the 5th hyphen up to the next "/" for the former, and after the 4th hyphen for the latter.
I'm not sure how to do this though and there's likely a better method I haven't thought of.
Does anyone have any ideas on how to go about doing this?
You can use split("/");
This will return an array of String objects split along the / separator.
String.split(String regex)
For example for your String : http://some-url.com/that/goes-to-some-place-abc-defg/api/xml
This will return an array of size 7; the contents of the array would be:
http:
(empty string)
some-url.com
that
goes-to-some-place-abc-defg
api
xml
What you want is index 4 of this array.
Note that index 1 is an empty string; this comes from splitting the // part.
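For example, a quick sketch of that (the wrapping class is just for illustration):
public class SplitExample {
    public static void main(String[] args) {
        String url = "http://some-url.com/that/goes-to-some-place-abc-defg/api/xml";
        String[] parts = url.split("/");
        // parts: ["http:", "", "some-url.com", "that", "goes-to-some-place-abc-defg", "api", "xml"]
        System.out.println(parts[4]); // goes-to-some-place-abc-defg
    }
}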
You could just save the "random" parts in the ArrayList. This would save memory since the rest of the string is always the same:
ArrayList<String> list = new ArrayList<String>();
//Add your random parts
list.add("abc-defg");
list.add("zyxw-vut");
/* ... */
And when you want to get a full link from the list, you use a helper method:
private String getLinkFromList(int index) {
    if (list == null || index < 0 || index >= list.size())
        return null;
    // rebuild the full link around the stored "random" part
    return "http://some-url.com/that/goes-to-somewhere-" + list.get(index) + "/api/xml";
}
A few notes:
This solution also has a few downsides: memory is allocated every time getLinkFromList(int) is called, so storing the full links in the ArrayList and using get(int) directly gives a slight performance gain. If the list is no larger than a few megabytes and your code only runs on (modern) computers, you should prefer saving the full links.
But when your code runs on an Android phone (or when your list is a few gigabytes large), where memory is still an important consideration, you should keep the list as small as possible and use the method shown above.
Related
I have:
String[] Value={"Available to trade at 12 30","I love sherlock"}
and I want to check if sherlock is present in the list without using a for-each loop.
Java streams are handy for this
String[] value = {"Available to trade at 12 30", "I love sherlock"};
Stream.of(value).anyMatch(s -> s.contains("sherlock"));
If you want to get the string that has sherlock:
String[] value = {"Available to trade at 12 30", "I love sherlock"};
Stream.of(value).filter(s -> s.contains("sherlock")).findFirst().get();
Or use findAny() if you don't care about order. Both findAny() and findFirst() return an Optional, which will be empty if there are no matches, in which case .get() will throw.
You can do something like this:
Arrays.asList(Value).contains("string to be searched");
Converting the array to a list is the better option, since it gives you access to more methods. Note that contains checks for an exact element match, so it will only find "sherlock" if it is an element on its own, not a substring of another element.
The problem inherently requires you to iterate through all the elements in the array, essentially performing a for each. However, you can choose whether you want good memory performance or good execution time for the lookup.
If you want good memory performance, leave it as is and iterate through the list every time you perform the check. For good execution time you could build a HashSet and populate it with every substring present in the list. This is a bit time-consuming and memory-intensive, but once you have built the set you can keep it and reuse it, making each check take constant time on average.
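A minimal sketch of that idea (the class and method names here are my own); it indexes every substring up front so each later check is just a set lookup:
import java.util.HashSet;
import java.util.Set;

public class SubstringIndex {
    // Builds a set of every substring of every element, as described above.
    // This trades memory and build time for constant-time lookups afterwards.
    static Set<String> buildIndex(String[] values) {
        Set<String> index = new HashSet<>();
        for (String value : values) {
            for (int start = 0; start < value.length(); start++) {
                for (int end = start + 1; end <= value.length(); end++) {
                    index.add(value.substring(start, end));
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        String[] value = {"Available to trade at 12 30", "I love sherlock"};
        Set<String> index = buildIndex(value);
        System.out.println(index.contains("sherlock")); // true
    }
}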
You could convert the array into a single String and then use the String .contains method.
String entireArray = Arrays.toString(Value);
boolean sherlockPresent = entireArray.contains("sherlock");
I am a relatively new programmer and am working on my first project to build a portfolio. In my project I have 2 rather large lists of strings (about 3.1 million elements each) and I need to "associate" the elements in each one in a 1-to-1 relationship using predetermined values (elements are selected according to a set method), not just linearly (from top to bottom). For example:
lista(0) = list1(5);
listb(0) = list2(2);
lista(1) = list1(1);
listb(1) = list2(4);
lista(2) = list1(3);
listb(2) = list2(1);
The point of this is to reorder the lists in a manner that can be recreated at a later time or by a different program by "remembering" a set of values. I am using 2 lists because I need to be able to search one list for a String and then pull the value from the corresponding element in the other list.
I have tried many different methods, like storing each list in an ArrayList, accessing the elements in the preset order, storing them in new ArrayLists in the new order, and then removing the elements from the old ArrayLists. This would be ideal, but it didn't work because removing elements from a really large ArrayList was very slow. I figured that removing an element from the lists would prevent it from being used again.
I also tried storing them in String arrays, accessing each element in the predefined order, storing them in another array, and then nulling out the elements so that they won't be used again. But creating null spaces made searching a nightmare: if the program hit a null element during the predefined "move" value, I had to add checks for nulls and then more movement, which made things more complicated and harder to reproduce later.
I need an easy and efficient way to create these associations between these 2 lists, and ANY ideas are welcome.
This is my first post to Stack Overflow and I apologize if it's formatted improperly or confusing, but please be gentle.
If you need to pull one value for a given string, why not use a Map? The key would be the value from the first list and the value would be the corresponding value from the second list.
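For example, a minimal sketch of that idea (the list contents and the pairing order below are placeholders standing in for the question's lists and its predetermined selection method):
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AssociationExample {
    public static void main(String[] args) {
        // Small stand-ins for the two large lists in the question.
        List<String> list1 = List.of("a0", "a1", "a2", "a3", "a4", "a5");
        List<String> list2 = List.of("b0", "b1", "b2", "b3", "b4");

        // Pair the elements in the predetermined order from the question's example.
        Map<String, String> associations = new HashMap<>();
        associations.put(list1.get(5), list2.get(2));
        associations.put(list1.get(1), list2.get(4));
        associations.put(list1.get(3), list2.get(1));

        // Search with a string from the first list and pull the paired value directly.
        System.out.println(associations.get("a1")); // b4
    }
}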
Use a Map<String, String>, which stores both the key and the value as strings. The best part is that the time complexity of removing an element is O(1).
As mentioned before, a Map is an option, more specifically a HashMap; another option could be a Hashtable. Make sure you look at what each has to offer. Some major differences: HashMap allows nulls but is not synchronized, whereas Hashtable is synchronized and does not accept null keys (or values).
I have a collection of around 1500 documents. I parsed through each document and extracted tokens. These tokens are stored in a HashMap (as keys) and the total number of times they occur in the collection (i.e. their frequency) is stored as the value.
I have to extend this to build an inverted index. That is, the term (key) | number of documents it occurs in --> DocNo | frequency in that document. For example,
Term       DocFreq   DocNum   TermFreq
data       3         1        12
                     23       31
                     100      17
customer   2         22       43
                     19       2
Currently, I have the following in Java,
Map<String, Integer> frequencies = new HashMap<>();
for (each document) {
    extract line
    for (each line) {
        extract word
        for (each word) {
            perform some operations
            get the value for the word from the map and increment it by one
        }
    }
}
I have to build on this code. I can't really think of a good way to implement an inverted index.
So far, I have thought of making the value a 2D array, so the term would be the key and the value (i.e. the 2D array) would store the docId and termFreq.
Please let me know if my logic is correct.
I would do it by using a Map<String, TermFrequencies>. This map would maintain a TermFrequencies object for each term found. The TermFrequencies object would have the following methods:
void addOccurrence(String documentId);
int getTotalNumberOfOccurrences();
Set<String> getDocumentIds();
int getNumberOfOccurrencesInDocument(String documentId);
It would use a Map<String, Integer> internally to associate each document the term occurs in with the number of occurrences of the term in the document.
The algorithm would be extremely simple:
for (each document) {
    extract line
    for (each line) {
        extract word
        for (each word) {
            TermFrequencies termFrequencies = map.get(word);
            if (termFrequencies == null) {
                termFrequencies = new TermFrequencies(word);
                map.put(word, termFrequencies);
            }
            termFrequencies.addOccurrence(document);
        }
    }
}
The addOccurrence() method would simply increment a counter for the total number of occurrences, and would insert or update the number of occurrences in the internal map.
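A minimal sketch of what such a TermFrequencies class could look like (the internal HashMap and the field names are my own choices, not a fixed API):
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class TermFrequencies {
    private final String term;
    private final Map<String, Integer> occurrencesByDocument = new HashMap<>();
    private int totalOccurrences = 0;

    TermFrequencies(String term) {
        this.term = term;
    }

    void addOccurrence(String documentId) {
        totalOccurrences++;
        occurrencesByDocument.merge(documentId, 1, Integer::sum);
    }

    int getTotalNumberOfOccurrences() {
        return totalOccurrences;
    }

    Set<String> getDocumentIds() {
        return occurrencesByDocument.keySet();
    }

    int getNumberOfOccurrencesInDocument(String documentId) {
        return occurrencesByDocument.getOrDefault(documentId, 0);
    }
}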
I think it is best to have two structures: a Map<docnum, Map<term,termFreq>> and a Map<term, Set<docnum>>. Your docFreqs can be read off as set.size in the values of the second map. This solution involves no custom classes and allows a quick retrieval of everything needed.
The first map contains all the information and the second one is a derivative that allows quick lookup by term. As you process a document, you fill the first map. You can derive the second map afterwards, but it is also easy to do both in one pass.
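A minimal sketch of that two-map approach (the class and method names are assumptions; docNum and word would come from the question's document/line/word loops):
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class TwoMapIndex {
    // docnum -> (term -> term frequency in that document)
    private final Map<Integer, Map<String, Integer>> termFreqByDoc = new HashMap<>();
    // term -> set of docnums containing it (quick lookup by term)
    private final Map<String, Set<Integer>> docsByTerm = new HashMap<>();

    // Call this once per word occurrence while scanning a document.
    void record(int docNum, String word) {
        termFreqByDoc.computeIfAbsent(docNum, d -> new HashMap<>())
                     .merge(word, 1, Integer::sum);
        docsByTerm.computeIfAbsent(word, w -> new HashSet<>())
                  .add(docNum);
    }

    // The document frequency of a term is just the size of its posting set.
    int docFreq(String term) {
        return docsByTerm.getOrDefault(term, Collections.emptySet()).size();
    }
}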
I once implemented what you're asking for. The problem with your approach is that it is not abstract enough. You should model Terms, Documents and their relationships using objects. In a first run, create the term index and document objects and iterate over all terms in the documents while populating the term index. Afterwards, you have a representation in memory that you can easily transform into the desired output.
Do not start by thinking about 2D arrays in an object-oriented language. Unless you want to solve a mathematical problem or optimize something, it's not the right approach most of the time.
I don't know if this is still a hot question, but I would recommend doing it like this:
You run over all your documents and give them an id in increasing order. For each document you run over all the words.
Now you have a HashMap that maps Strings (your words) to an array of DocTermObjects. A DocTermObject contains a docId and a TermFrequency.
Now, for each word in a document, you look it up in your HashMap. If it doesn't map to an array of DocTermObjects yet, you create one; otherwise you look at its very LAST element only (this is important for runtime, think about it). If this element has the docId of the document you are treating at the moment, you increase the TermFrequency. Otherwise, or if the array is empty, you add a new DocTermObject with your current docId and set the TermFrequency to 1.
Later you can use this data structure to compute scores, for example. The scores could also be saved in the DocTermObjects, of course.
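A rough sketch of that structure (the class and field names are my own guesses):
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DocTermObject {
    int docId;
    int termFrequency;

    DocTermObject(int docId) {
        this.docId = docId;
        this.termFrequency = 1;
    }
}

class Indexer {
    private final Map<String, List<DocTermObject>> index = new HashMap<>();

    // Call this for every word while processing documents in increasing docId order.
    void addWord(String word, int docId) {
        List<DocTermObject> postings = index.computeIfAbsent(word, w -> new ArrayList<>());
        // Only the last element can belong to the current document,
        // because documents are processed in increasing docId order.
        if (!postings.isEmpty() && postings.get(postings.size() - 1).docId == docId) {
            postings.get(postings.size() - 1).termFrequency++;
        } else {
            postings.add(new DocTermObject(docId));
        }
    }
}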
Hope it helped :)
I keep getting random java.lang.IndexOutOfBoundsException errors in my program.
What am I doing wrong?
The program runs fine. It's a really long for loop, but for some elements I seem to get that error, and then it continues on to the next element and works fine.
for (int i = 0; i < response.getSegments().getSegmentInfo().size() - 1; i++) {
    reservedSeats = response.getSegments().getSegmentInfo().get(i).getCabinSummary().getCabinClass().get(i).getAmountOfResSeat();
    usedSeats = response.getSegments().getSegmentInfo().get(i).getCabinSummary().getCabinClass().get(i).getAmountOfUsedSeat();
    System.out.println("Reserved Seats: " + reservedSeats);
    System.out.println("Used Seats : " + usedSeats);
}
How can I prevent these errors?
For those thinking this is an array, it is more likely a list.
Let me guess: you used to be getting ConcurrentModificationExceptions, so you rewrote the loop to use indexed lookup of elements (avoiding the iterator). Congratulations, you fixed the exception, but not the issue.
You are changing your List while this loop is running. Every now and then, you remove an element. Every now and then, you look at the element at index size() - 1. When the order of operations looks like:
(some thread)
remove an element from response.getSegments().getSegmentInfo()
(some possibly other thread)
look up the size() - 1 element of the above
you access an element that no longer exists, which raises an IndexOutOfBoundsException.
You need to fix the logic around this List by controlling access to it: if you need to check all elements, don't assume the list will stay the same while you cross all its elements, or (the much better solution) freeze the list for the loop.
A simple way of doing the latter is to do a copy of the List (but not the list's elements) and iterate over the copy.
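For example (type names are guesses, like elsewhere in this thread):
// Take a snapshot of the list so removals elsewhere cannot change its size mid-loop.
List<SegmentInfo> snapshot = new ArrayList<>(response.getSegments().getSegmentInfo());
for (SegmentInfo segmentInfo : snapshot) {
    // work with segmentInfo here
}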
--- Edited as the problem dramatically changed in an edit after the above was written ---
You added a lot of extra code, including a few extra list lookups. You are using the same index for all the list lookups, but there is nothing to indicate that all of the lists are the same size.
Also, you probably don't want to skip across elements; odds are you really want to access all of the cabin classes in a segmentInfo, not just the 3rd cabinClass of the 3rd segmentInfo, etc.
You seem to be using i to index into two entirely separate List objects:
response.getSegments().getSegmentInfo().get(i) // indexing into response.getSegments().getSegmentInfo()
.getCabinSummary().getCabinClass().get(i) // indexing into getCabinSummary().getCabinClass()
.getAmountOfResSeat();
This looks wrong to me. Is this supposed to happen this way? And is the list returned by getCabinClass() guaranteed to be at least as long as the one returned by getSegmentInfo()?
You're using i both as an index for the list of segment infos and for the list of cabin classes. This smells like the source of your problem.
I don't know your domain model but I'd expect that we need two different counters here.
Refactored code to show the problem (I guessed the types; replace them with the correct class names):
List<SegmentInfo> segmentInfos = response.getSegments().getSegmentInfo();
for (int i = 0; i < segmentInfos.size() - 1; i++) {
    // use i to get the actual segmentInfo
    SegmentInfo segmentInfo = segmentInfos.get(i);
    List<CabinClass> cabinClasses = segmentInfo.getCabinSummary().getCabinClass();
    // use i again to get the actual cabin class ???
    CabinClass cabinClass = cabinClasses.get(i);
    reservedSeats = cabinClass.getAmountOfResSeat();
    usedSeats = cabinClass.getAmountOfUsedSeat();
    System.out.println("Reserved Seats: " + reservedSeats);
    System.out.println("Used Seats : " + usedSeats);
}
Assuming that response.getSegments().getSegmentInfo() always returns a list of the same size, calling .get(i) on it should be safe, given the loop header (but are you aware that you are skipping the last element?). However, are you sure that .getCabinSummary().getCabinClass() will return a list that is as large as the getSegmentInfo() list? It looks suspicious that you are using i to perform lookups in two different lists.
You could split the first line in the loop body into two separate lines (I'm only guessing the type names here):
SegmentInfo segmentInfo = response.getSegments().getSegmentInfo().get(i);
reservedSeats = segmentInfo.getCabinSummary().getCabinClass().get(i).getAmountOfResSeat();
Then you'll see which lookup causes the crash.
How can duplicate elements be determined in an array that consists of 10,000,000,00 unordered elements? How can they be listed?
Please make sure performance is taken care of when writing the Java logic.
What are the space complexity and time complexity of the logic?
Consider an example array, DuplicateArray[], as shown below.
String DuplicateArray[] = {"tom","wipro","hcl","Ibm","rachael","tom","wipro","hcl","Ibm","rachael",
"Bill","HP","hcl","Ibm","rachael","tom","wipro","hcl","Ibm","rachael","tom","wipro","hcl","Ibm","rachael",
"Bill","HP","hcl","Ibm","rachael","tom","wipro","hcl","Ibm","rachael","tom","wipro","hcl","Ibm","rachael",
"Agnus","wipro","hcl","Ibm","rachael","tom","wipro","hcl","Ibm","rachael","tom","wipro","hcl","Ibm","rachael",
"Obama","wipro","hcl","Ibm","rachael","tom","wipro","hcl","Ibm","rachael","rachael","tom","wipro","hcl","Ibm","rachael",
"Obama","HP","TCS","CTS","rachael","tom","wipro","hcl","Ibm","rachael","rachael","tom","wipro","hcl","Ibm","rachael"}
I suggest you use a Set. The best one for you will be a HashSet. Put your elements into it one by one, and check for existence on every insert operation.
Something like this:
HashSet<String> hs = new HashSet<String>();
HashSet<String> answer = new HashSet<String>();
for (String s : DuplicateArray) {
    if (!hs.contains(s))
        hs.add(s);
    else
        answer.add(s);
}
The code depends on the assumption that the elements of your array are of type String.
Here you go
class MyValues {
    public int i = 1;                // occurrence count
    private final String value;

    public MyValues(String v) {
        value = v;
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof MyValues && ((MyValues) obj).value.equals(value);
    }
}
Now iterate and count the duplicates:
Map<String, MyValues> values = new HashMap<>();
for (String s : DuplicateArray) {
    MyValues existing = values.get(s);
    if (existing == null) {
        values.put(s, new MyValues(s));
    } else {
        existing.i++;                // seen before, so bump its count
    }
}
// every entry whose count i is greater than 1 is a duplicate
Time and space are both linear.
How many duplicates are expected? A few, a number comparable to the number of entries, or something in between?
Do you know anything else about the values? E.g. are they from some specific dictionary?
If not, iterate over the array and build a HashSet, noting when you are about to add an entry that's already there and keeping those in a list. I can't see anything else being faster.
Firstly, do you mean 10,000,000,00 as one billion or ten billion? If you mean the latter, you cannot have more than 2 billion elements in an array or a Set, so the suggestions you have so far will not work in this situation. To have 10 billion Strings in memory you will need at least 640 GB, and AFAIK there is no server available which will allow this volume of memory in a single JVM.
For a task this large, you may have to consider a solution which breaks up the work, either across multiple machines or by putting the work into files to be processed later.
You have to either:
Assume you have a relatively small number of unique Strings. In this case, you can build a Set in memory of the words you have seen so far. These will fit into memory (or you might assume they do).
Or break up the work into manageable sizes. A simple way to do this would be to write to a few hundred work files based on hashcode. The hashcode for the same string is always the same, so as you process each file in memory you know it will contain all the duplicates of its strings, if there are any; a sketch of this is below.
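A rough sketch of that hashcode-partitioning idea (the file layout, names, and method signature are my own assumptions):
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class HashPartitioner {
    // Writes each input line to one of numBuckets work files chosen by its hashCode,
    // so every copy of the same string lands in the same file and each file can then
    // be de-duplicated in memory on its own.
    public static void partition(Path input, Path outputDir, int numBuckets) throws IOException {
        BufferedWriter[] writers = new BufferedWriter[numBuckets];
        for (int i = 0; i < numBuckets; i++) {
            writers[i] = Files.newBufferedWriter(outputDir.resolve("bucket-" + i + ".txt"));
        }
        try (Stream<String> lines = Files.lines(input)) {
            for (String line : (Iterable<String>) lines::iterator) {
                int bucket = Math.floorMod(line.hashCode(), numBuckets);
                writers[bucket].write(line);
                writers[bucket].newLine();
            }
        } finally {
            for (BufferedWriter w : writers) {
                w.close();
            }
        }
    }
}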