How can I group strings based on their length? - java

import java.util.Arrays;

public class sortingtext {
public static void main(String[] args) {
String readline="i have a sentence with words";
String[] words=readline.split(" ");
// sort the words by length, longest first
Arrays.sort(words, (a, b)->Integer.compare(b.length(), a.length()));
for (int i=0;i<words.length;i++)
{
int len = words[i].length();
System.out.println(len +"-"+words[i]);
}
}
}
input:
i have a sentence with words
My code splits a string and then prints each word together with its length.
The output I get looks like:
8-sentence
5-words
4-have
4-with
1-i
1-a
I want to group the words of the same length to get this:
8- sentence
5- words
4- have ,with
1- i ,a
But I don't get how to group them.

Easy with the stream API:
final Map<Integer, List<String>> lengthToWords = new TreeMap<>(
Arrays.stream(words)
.collect(Collectors.groupingBy(String::length))
);
The stream groups the words by length into a map (an implementation detail, but it will be a HashMap). The TreeMap then sorts this map by key (the word length) in ascending order.
Alternatively, you can write it like this, which is more efficient but in my opinion less readable.
final Map<Integer, List<String>> lengthToWords = Arrays.stream(words)
.collect(Collectors.groupingBy(String::length, TreeMap::new, Collectors.toList()));
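To get the output format from the question (longest words first, one line per length), the grouping above can be combined with a reversed comparator. This is only a sketch: the class name GroupByLength and the helper groupByLength are made up for illustration, and the " ," separator simply mirrors the desired output in the question.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupByLength {

    // Groups the words of a sentence by length. The TreeMap is created with a
    // reversed comparator, so the longest words come first when iterating.
    static Map<Integer, List<String>> groupByLength(String sentence) {
        return Arrays.stream(sentence.split(" "))
                .collect(Collectors.groupingBy(
                        String::length,
                        () -> new TreeMap<Integer, List<String>>(Comparator.reverseOrder()),
                        Collectors.toList()));
    }

    public static void main(String[] args) {
        // Prints one line per length, e.g. "4- have ,with"
        groupByLength("i have a sentence with words")
                .forEach((len, group) ->
                        System.out.println(len + "- " + String.join(" ,", group)));
    }
}
```

Within each group the words keep the order in which they appear in the sentence.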

If you are a beginner or not familiar with stream API:
public static void main(String[] args) {
String readline= "i have a sentence with words";
String[] words = readline.split(" ");
Arrays.sort(words, (a, b)->Integer.compare(b.length(), a.length()));
// declare a variable to hold the current string length
int currLength = -1;
for(int i = 0; i<words.length; i++){
if(currLength == words[i].length()){
// if currLength is equal to current word length just append a comma and this word
System.out.print(", "+words[i]);
}
else{
// if not, update currLength, start a new line (except before the first group) and print the new length with the current word
currLength = words[i].length();
if (i > 0) {
System.out.println();
}
System.out.print(currLength+ " - "+words[i]);
}
}
}
Note: The println("...") method prints the string "..." and moves the cursor to a new line. The print("...") method instead prints just the string "...", but does not move the cursor to a new line. Hence, subsequent printing instructions will print on the same line. The println() method can also be used without parameters, to position the cursor on the next line.

Related

How do I put indexes and words together to make 1 final string/array? (Java)

Background to question
I have been working on developing code that is able to read a string, compress it and send it to a new file in a compressed form (e.g. "hello, hello" -> "hello[0, 1]"), and this works great at the moment. Here is the link for anyone who wants to use it: https://pastebin.com/v6YF34mU . The next stage is being able to re-create the file using the indexes and the words.
Question
I currently have got this code:
public static void main(String[] args) {
// TODO code application logic here
Pattern seaPattern = Pattern.compile("(.*?)\\[(.*?)\\],");
String compressedSea = "see[2, 4, 5],sea[0, 1, 3],";
Matcher seaMatcher = seaPattern.matcher(compressedSea);
while (seaMatcher.find()) {
String seaWords = seaMatcher.group(1);
String[] seaIndexes = seaMatcher.group(2).split(", ");
for (String str : seaIndexes) {
int seaIndex = Integer.parseInt(str);
System.out.print(seaIndex);
}
}
}
In short, it reads the string and splits it up into two parts: one holds the indexes and the other holds the words. The next stage is putting these together and creating 1 string that is able to be compressed. I am relatively new to Java so I am not completely sure how I would go about doing this.
If anyone has any ideas on how to do this, it would be much appreciated!
I would suggest you to create a class for combining your word and indexes.
I have posted a suggestion below:
Updated answer after OP clarified what he wanted to output
public static void main(String[] args) throws FileNotFoundException {
Map<Integer, String> wordMap = new HashMap<Integer, String>();
Pattern seaPattern = Pattern.compile("(.*?)\\[(.*?)\\],");
String compressedSea = "see[2, 4, 5],sea[0, 1, 3],";
Matcher seaMatcher = seaPattern.matcher(compressedSea);
while (seaMatcher.find()) {
String word = seaMatcher.group(1);
String[] seaIndexes= seaMatcher.group(2).split(", ");
for(String s : seaIndexes){
wordMap.put(Integer.valueOf(s), word);
}
}
//Here the HashMap happens to print in key order:
//the hash of an Integer key is the key value itself and the keys are small.
//This ordering is not guaranteed; use a TreeMap if the order matters.
System.out.println(wordMap);
}
Output
{0=sea, 1=sea, 2=see, 3=sea, 4=see, 5=see}
Bonus
If you want to iterate through the HashMap, you can do the following:
Iterator<Map.Entry<Integer, String>> it = wordMap.entrySet().iterator();
while(it.hasNext()){
Map.Entry<Integer, String> pair = it.next();
System.out.println(pair.getValue());
}
Output
sea
sea
see
sea
see
see
If I understood it correctly:
Create a new array whose size is the highest index in the index array plus one.
String[] finalString = new String[highestIndex + 1];
Now loop through your words array:
for(String str: wordsArray){
//grab the indexes of each str
//put your str in the new array at each of those indexes
finalString[index] = str;
}
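The array idea above can be sketched as a complete program. It reuses the regex and the sample input from the question; the class name Decompress and the method decompress are hypothetical, and the sketch assumes that every index from 0 up to the highest one occurs exactly once in the input and that words are joined with a single space.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Decompress {

    // Rebuilds the word sequence from text such as "see[2, 4, 5],sea[0, 1, 3],".
    static String decompress(String compressed) {
        Pattern entry = Pattern.compile("(.*?)\\[(.*?)\\],");

        // First pass: find the highest index so the array can be sized.
        int highest = -1;
        Matcher m = entry.matcher(compressed);
        while (m.find()) {
            for (String s : m.group(2).split(", ")) {
                highest = Math.max(highest, Integer.parseInt(s));
            }
        }

        // Second pass: store each word at every index listed for it.
        String[] words = new String[highest + 1];
        m = entry.matcher(compressed);
        while (m.find()) {
            for (String s : m.group(2).split(", ")) {
                words[Integer.parseInt(s)] = m.group(1);
            }
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        System.out.println(decompress("see[2, 4, 5],sea[0, 1, 3],"));
        // prints: sea sea see sea see see
    }
}
```

If an index were missing from the input, the corresponding array slot would stay null and show up as the text "null" in the joined string, so real input may need a check for that.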
Are you just asking how to put two strings into one string? Or am I understanding wrong?
If it's just adding two strings, that can be done by using the + sign.
String newString = string1 + string2;
If you're asking how to put them into a HashMap (as key-value pairs), that would be:
HashMap<Integer, String> map = new HashMap<>();
map.put(index, word);

What is the most efficient way to add 3 characters at a time to an ArrayList from a text file?

Say you have a text file with "abcdefghijklmnop" and you have to add 3 characters at a time to an array list of type string. So the first cell of the array list would have "abc", the second would have "def" and so on until all the characters are inputted.
public ArrayList<String> returnArray() throws FileNotFoundException
{
int i = 0;
ArrayList<String> list = new ArrayList<String>();
Scanner scanCharacters = new Scanner(file);
while (scanCharacters.hasNext())
{
list.add(scanCharacters.next().substring(i, i + 3));
i += 3;
}
scanCharacters.close();
return list;
}
Please use the below code,
ArrayList<String> list = new ArrayList<String>();
int i = 0;
int x = 0;
Scanner scanCharacters = new Scanner(file);
scanCharacters.useDelimiter(System.getProperty("line.separator"));
String finalString = "";
while (scanCharacters.hasNext()) {
String[] tokens = scanCharacters.next().split("\t");
for (String str : tokens) {
finalString = StringUtils.deleteWhitespace(str);
for (i = 0; i < finalString.length(); i = i + 3) {
x = i + 3;
if (x < finalString.length()) {
list.add(finalString.substring(i, i + 3));
} else {
list.add(finalString.substring(i, finalString.length()));
}
}
}
}
System.out.println("list" + list);
Here I have used StringUtils.deleteWhitespace(str) from Apache Commons Lang to delete the whitespace from the file tokens. The if condition inside the for loop checks whether a full three-character substring is still available; if it is not, whatever characters are left go into the list. My text file contains the following two lines:
asdfcshgfser ajsnsdxs
sasdsd fghfdgfd
After executing the program, the result is:
list[asd, fcs, hgf, ser, ajs, nsd, xs, sas, dsd, fgh, fdg, fd]
public ArrayList<String> returnArray()throws FileNotFoundException
{
ArrayList<String> list = new ArrayList<String>();
Scanner scanCharacters = new Scanner(file);
String temp = "";
while (scanCharacters.hasNext())
{
temp+=scanCharacters.next();
}
while(temp.length() > 2){
list.add(temp.substring(0,3));
temp = temp.substring(3);
}
if(temp.length()>0){
list.add(temp);
}
scanCharacters.close();
return list;
}
In this example I read in all of the data from the file, and then parse it in groups of three. Scanner can never backtrack so using next will leave out some of the data the way you're using it. You are going to get groups of words (which are separated by spaces, Java's default delimiter) and then sub-stringing the first 3 letters off.
IE:
ALEXCY WOWZAMAN
Would give you:
ALE and WOW
The way my example works is it gets all of the letters in one string and continuously sub strings off letters of three until there are no more, and finally, it adds the remainders. Like the others have said, it would be good to read up on a different data parser such as BufferedReader. In addition, I suggest you research substrings and Scanner if you want to continue to use your current method.
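The chunking step on its own can be written without repeatedly shortening the string. The following is a minimal sketch that assumes the file content has already been read into a single String; the class name Chunker and the method chunk are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class Chunker {

    // Splits text into pieces of `size` characters; the last piece may be shorter.
    static List<String> chunk(String text, int size) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < text.length(); i += size) {
            // Math.min keeps the end index inside the string for the final piece
            parts.add(text.substring(i, Math.min(i + size, text.length())));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(chunk("abcdefghijklmnop", 3));
        // prints: [abc, def, ghi, jkl, mno, p]
    }
}
```

Walking an index over the string avoids creating a new, shorter copy of the remaining text on every step.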

Most efficient data structure for storing an alphabetically ordered word list

My program will read in a paragraph of words (stored in a text file). It will then need to do the following:
Print out a list of all the words (alphabetical order). For each word, print the frequency count (how many times the word appears in the entire paragraph) and the line numbers in which the word appears on (does not need to be ordered). If a word appears on a line multiple times, the line number does not need to be stored twice (the frequency count of this word will still be updated)
Display a list of words ordered from most frequent to least frequent.
The user will input a specific word. If the word is found, print out its frequency count.
Limitations: I cannot use the Collections class and I cannot store data multiple times. (e.g. Reading in words from the paragraph and storing them into a Set and an ArrayList)
Coding this won't be hard, but I can't figure out what would be the most efficient implementation since the data size could be a few paragraphs from a Wikipedia article or something. Here's my idea for now:
Have a Word class. This Word class will contain methods to return the word's frequency count and the lines in which the word appears on (and other relevant data).
The paragraph will be stored in a text file. The program will read the data line by line. Split the line into an array and read in words one by one.
As words are being read in from the text file, put the words into some sort of structure. If the structure does not contain the word, create a new word object.
If the structure already contains the word, update the frequency counter for that word.
I will also have a int to record down the line number. These line numbers will be updated accordingly.
This is somewhat incomplete, but it is what I'm thinking for now. The whole 'Word' class may probably be completely unnecessary, too.
First, you could create a class that holds the data for the occurrences and the row numbers (along with the word). This class could implement the Comparable interface, providing easy comparisons based on the word frequencies:
public class WordOccurrence implements Comparable<WordOccurrence> {
private final String word;
private int totalCount = 0;
private Set<Integer> lineNumbers = new TreeSet<>();
public WordOccurrence(String word, int firstLineNumber) {
this.word = word;
addOccurrence(firstLineNumber);
}
public final void addOccurrence(int lineNumber) {
totalCount++;
lineNumbers.add(lineNumber);
}
@Override
public int compareTo(WordOccurrence o) {
return totalCount - o.totalCount;
}
@Override
public String toString() {
StringBuilder lineNumberInfo = new StringBuilder("[");
for (int line : lineNumbers) {
if (lineNumberInfo.length() > 1) {
lineNumberInfo.append(", ");
}
lineNumberInfo.append(line);
}
lineNumberInfo.append("]");
return word + ", occurrences: " + totalCount + ", on rows "
+ lineNumberInfo.toString();
}
}
When reading the words from the file, it's useful to return the data in a Map<String, WordOccurrence>, mapping words into WordOccurrences. Using a TreeMap, you'll get alphabetical ordering "for free". Also, you may want to remove punctuation from the lines (e.g. using a regexp like \\p{P}) and ignore the case of the words:
public TreeMap<String, WordOccurrence> countOccurrences(String filePath)
throws IOException {
TreeMap<String, WordOccurrence> words = new TreeMap<>();
File file = new File(filePath);
// try-with-resources closes the reader even if reading fails
try (BufferedReader reader = new BufferedReader(new InputStreamReader(
new FileInputStream(file)))) {
String line = null;
int lineNumber = 0;
while ((line = reader.readLine()) != null) {
// remove punctuation and normalize to lower-case
line = line.replaceAll("\\p{P}", "").toLowerCase();
lineNumber++;
String[] tokens = line.split("\\s+");
for (String token : tokens) {
// skip the empty token produced by blank lines or leading whitespace
if (token.isEmpty()) {
continue;
}
if (words.containsKey(token)) {
words.get(token).addOccurrence(lineNumber);
} else {
words.put(token, new WordOccurrence(token, lineNumber));
}
}
}
}
return words;
}
Displaying the occurrences in alphabetical order using the above code is as simple as
for (Map.Entry<String, WordOccurrence> entry :
countOccurrences("path/to/file").entrySet()) {
System.out.println(entry.getValue());
}
If you cannot use Collections.sort() (and a Comparator<WordOccurrence>) for sorting by occurrences, you need to write the sorting yourself. Something like this should do it:
public static void displayInOrderOfOccurrence(
Map<String, WordOccurrence> words) {
List<WordOccurrence> orderedByOccurrence = new ArrayList<>();
// sort
for (Map.Entry<String, WordOccurrence> entry : words.entrySet()) {
WordOccurrence wo = entry.getValue();
// initialize the list on the first round
if (orderedByOccurrence.isEmpty()) {
orderedByOccurrence.add(wo);
} else {
for (int i = 0; i < orderedByOccurrence.size(); i++) {
if (wo.compareTo(orderedByOccurrence.get(i)) > 0) {
orderedByOccurrence.add(i, wo);
break;
} else if (i == orderedByOccurrence.size() - 1) {
orderedByOccurrence.add(wo);
break;
}
}
}
}
// display
for (WordOccurrence wo : orderedByOccurrence) {
System.out.println(wo);
}
}
Running the above code using the following test data:
Potato; orange.
Banana; apple, apple; potato.
Potato.
will produce this output:
apple, occurrences: 2, on rows [2]
banana, occurrences: 1, on rows [2]
orange, occurrences: 1, on rows [1]
potato, occurrences: 3, on rows [1, 2, 3]
potato, occurrences: 3, on rows [1, 2, 3]
apple, occurrences: 2, on rows [2]
banana, occurrences: 1, on rows [2]
orange, occurrences: 1, on rows [1]
You can use a simple TreeMap<String, Integer> for frequency lookups.
Lookups in a TreeMap take O(log n) comparisons for n distinct words, which is cheap when the words are short (i.e. what you would find in normal text). If you expect lots of unsuccessful lookups (lots of searches for words that don't exist), you could prefilter using a Bloom filter.
I'd start with a straightforward implementation, and optimize further if needed (parse the stream directly, instead of splitting each line with a separator and reiterating).
You can use a TreeMap; it is very suitable for keeping the data ordered. Use your word as the key and the frequency as the value. For example, let the following be your paragraph:
Java is good language Java is object oriented
So I will do the following in order to store each word and its frequency:
String s = "Java is good language Java is object oriented" ;
String strArr [] = s.split(" ") ;
TreeMap<String, Integer> tm = new TreeMap<String, Integer>();
for(String str : strArr){
if(tm.get(str) == null){
tm.put(str, 1) ;
}else{
// read the old count and store the incremented value back into the map
int count = tm.get(str) ;
tm.put(str, count + 1) ;
}
}
hopefully this will help you
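The counting loop can also be written with Map.merge, which inserts 1 for a new word and adds 1 to an existing count. This is a sketch only: the class name WordFrequency and the method count are hypothetical, and lower-casing the text is an extra assumption that the answer above does not make.

```java
import java.util.Map;
import java.util.TreeMap;

public class WordFrequency {

    // Counts how often each word occurs; the TreeMap keeps the words sorted.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> freq = new TreeMap<>();
        for (String w : text.toLowerCase().split("\\s+")) {
            if (!w.isEmpty()) {
                freq.merge(w, 1, Integer::sum); // insert 1, or add 1 to the old count
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        Map<String, Integer> freq = count("Java is good language Java is object oriented");
        System.out.println(freq);
        // prints: {good=1, is=2, java=2, language=1, object=1, oriented=1}

        // Looking up a word the user typed; 0 means it was not found.
        System.out.println(freq.getOrDefault("java", 0));   // prints: 2
        System.out.println(freq.getOrDefault("python", 0)); // prints: 0
    }
}
```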
you can have a structure like this one :
https://gist.github.com/jeorfevre/946ede55ad93cc811cf8
import java.util.HashMap;
/**
*
* @author Jean-Emmanuel je@Rizze.com
*
*/
public class WordsIndex{
// static so that the static put() method below can access it
static HashMap<String, Word> words = new HashMap<String, Word>();
public static void put(String word, int line, int paragraph){
word=word.toLowerCase();
if(words.containsKey(word)){
Word w=words.get(word);
w.count++;
}else{
//new word
Word w = new Word();
w.count=1;
w.line=line;
w.paragraph=paragraph;
w.word=word;
words.put(word, w);
}
}
}
public class Word{
String word;
int count;
int line;
int paragraph;
}
enjoy

Find the length of longest chain formed using given words in String

OK. As programmers we love getting involved in logic building, but sometimes we go blank on a certain type of puzzle, like the one below. Let me state that this is not homework or a job task; it is simply a logic and performance practice puzzle. The puzzle is: given a String of comma-separated words like
String S= peas,sugar,rice,soup
Now the crux is to find the length of the longest chain of words, where the last character of a word must be the first character of the next word, and so on, to create the longest possible chain, and finally to calculate the length of that chain.
I have tried to figure out some sort of solution like
split the string with comma
add them in list
sort that list
etc
but I don't know how to develop the logic further, as I am a little weak at logic development. Help is appreciated, and if the half-finished logic above is not right, what would be a simple and correct way to get the length of the longest chain of words?
Summary
input: String S= peas,sugar,rice,soup.
output: 4, the number of words in the chain (soup->peas->sugar->rice)
Once you have the list (or array), you can iterate over it checking your condition (the last letter of the n-th word equals the first letter of the next word) and increase a counter each time. Once the condition is false, just exit the loop. Your counter will hold the value you need. Note that this only follows the words in their given order.
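OK friends, here is the logic and core part that I wrote, which solved my puzzle:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Stack;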
public class CandidateCode
{
public static int chainLength=0;
public static void main(String[] args) {
String s= "peas,sugar,rice,soup";
int chainLengthfinal=wordChain(s);
System.out.println("final length:"+chainLengthfinal);
}
public static int wordChain(String input1)
{
List<String> stringList = new ArrayList<String>();
stringList= Arrays.asList(input1.split(","));
boolean ischain = new CandidateCode().hasChain(stringList);
if (ischain) {
return chainLength;
}
return 0;
}
Map<Character, List<String>> startsWith = new HashMap<Character, List<String>>();
Map<Character, List<String>> endsWith = new HashMap<Character, List<String>>();
private Character getFirstChar(String str) {
return str.charAt(0);
}
private Character getLastChar(String str) {
return str.charAt(str.length() - 1);
}
boolean hasChain(List<String> stringList) {
for (String str : stringList) {
Character start = getFirstChar(str);
Character end = getLastChar(str);
List<String> startsWithList;
List<String> endsWithList;
if (startsWith.containsKey(start)) {
startsWithList = startsWith.get(start);
} else {
startsWithList = new ArrayList<String>();
startsWith.put(start, startsWithList);
}
if (endsWith.containsKey(end)) {
endsWithList = endsWith.get(end);
} else {
endsWithList = new ArrayList<String>();
endsWith.put(end, endsWithList);
}
startsWithList.add(str);
endsWithList.add(str);
}
Stack<String> stringStack = new Stack<String>();
for (String str : stringList) {
if (hasChain(stringList.size(), str, stringStack)) {
System.out.println(stringStack);
System.out.println("size "+stringStack.size());
chainLength= stringStack.size();
return true;
}
}
return false;
}
private boolean hasChain(int size, String startString, Stack<String> stringStack) {
if (size == stringStack.size()) return true;
Character last = getLastChar(startString);
if (startsWith.containsKey(last)) {
List<String> stringList = startsWith.get(last);
for (int i = 0; i < stringList.size(); i++) {
String candidate = stringList.remove(i--);
stringStack.push(candidate);
if (hasChain(size, candidate, stringStack)) {
return true;
}
stringStack.pop();
stringList.add(++i, candidate);
}
}
return false;
}
}
The output of the above program will be
[soup, peas, sugar, rice]
size 4
final length:4
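The program above reports a length only when every word fits into one chain, and 0 otherwise. A plain backtracking search that returns the longest chain even when some words are left over could look like the sketch below. The class name LongestChain and both method names are hypothetical, every word is assumed to be non-empty, and the search is exponential in the worst case, so it is only meant for short word lists.

```java
import java.util.Arrays;
import java.util.List;

public class LongestChain {

    // Tries every word as the start of a chain and returns the best length found.
    static int longestChain(String input) {
        List<String> words = Arrays.asList(input.split(","));
        boolean[] used = new boolean[words.size()];
        int best = 0;
        for (int i = 0; i < words.size(); i++) {
            used[i] = true;
            best = Math.max(best, 1 + extend(words, used, words.get(i)));
            used[i] = false;
        }
        return best;
    }

    // Returns how many further unused words can be chained after `current`.
    private static int extend(List<String> words, boolean[] used, String current) {
        char last = current.charAt(current.length() - 1);
        int best = 0;
        for (int i = 0; i < words.size(); i++) {
            if (!used[i] && words.get(i).charAt(0) == last) {
                used[i] = true;                 // take the word ...
                best = Math.max(best, 1 + extend(words, used, words.get(i)));
                used[i] = false;                // ... and put it back (backtrack)
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestChain("peas,sugar,rice,soup"));  // prints: 4
        System.out.println(longestChain("peas,sugar,rice,apple")); // prints: 3
    }
}
```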
initialize a " " string named last(String last=" ")
get the first string by splitting with comma
substring the last char of the string and store it to last
boolean brokenchain=false;
length=0;
while(more string to split with comma)&&(!brokenchain){
split string with comma
substring to get first char
if(first char!=last){
brokenchain=true;
}else{
length++;
get last char of this string with substring and store it to last
}
}
If the input contains a sequence of length 5 that then breaks, followed by a sequence of length 6 that you want to count and print, you have to store the count, for example in a map, as a key associated with the sequence so far. Then you continue the loop (setting brokenchain=false again) until the input ends. Finally, take the largest key from the map and print it with its associated value (the longest sequence).
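The pseudocode above only follows the words in their given order. A runnable sketch of that idea is shown below; it keeps a running maximum instead of the map suggested above, which is a simplification on my part, and the names ChainRun and longestRun are made up. Words are assumed to be non-empty.

```java
public class ChainRun {

    // Length of the longest run of consecutive words (in input order) in which
    // each word starts with the last character of the word before it.
    static int longestRun(String input) {
        String[] words = input.split(",");
        int best = 0;
        int current = 0;
        for (int i = 0; i < words.length; i++) {
            boolean continues = i > 0
                    && words[i].charAt(0) == words[i - 1].charAt(words[i - 1].length() - 1);
            current = continues ? current + 1 : 1; // a broken chain restarts at this word
            best = Math.max(best, current);
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestRun("peas,sugar,rice,soup")); // prints: 3
        System.out.println(longestRun("soup,peas,sugar,rice")); // prints: 4
    }
}
```

Because the words are never reordered, the first input gives 3 rather than 4; finding the chain soup, peas, sugar, rice in that input needs a search over different orderings.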
I think you need to find the largest and smallest number.
split the string with comma
add them as list_item
compare list_item1 and list_item2, the largest value becomes list_item_X
compare list_item3 and list_item4, the largest value becomes list_item_Y
Now compare list_item_X and list_item_Y, the largest value becomes list_item_Z
So the largest value is list_item_Z. Here is an implementation in code (PHP):
$s = 'peas,sugar,rice,soup';
$list_items = explode(',', $s);
$lengths = array_map('strlen', $list_items);
echo "The shortest is " . min($lengths) .
". The longest is " . max($lengths);

How can I extract specific terms from each string line?

I have a serious problem with extracting terms from each line of a file. To be more specific, I have a csv-formatted file which is not actually in csv format (it saves all terms into line[0] only).
Here are a few example lines among thousands:
test.csv
line1 : "31451    CID005319044   15939353   C8H14O3S2    beta-lipoic acid   C1CS#S[C##H]1CCCCC(=O)O "
line2 : "12232 COD05374044 23439353  C924O3S2    saponin   CCCC(=O)O "
line3 : "9048   CTD042032 23241  C3HO4O3S2 Berberine  [C##H]1CCCCC(=O)O "
I want to extract only "beta-lipoic acid", "saponin" and "Berberine", which are located in the 5th position.
You can see there are big gaps between terms, which is why I said 5th position.
In this case, how can I extract the term located in the 5th position of each line?
One more thing:
the length of the whitespace between the six terms is not always equal.
It could be one, two, three, four or five spaces, and so on.
Another try:
import java.io.File;
import java.util.Scanner;
public class HelloWorld {
// The number of columns per row, where each column is separated by an arbitrary number
// of spaces or tabs
final static int COLS = 7;
public static void main(String[] args) {
System.out.println("Tokens:");
try (Scanner scanner = new Scanner(new File("input.txt")).useDelimiter("\\s+")) {
// Counts the current column id
int n = 0;
String tmp = "";
StringBuilder item = new StringBuilder();
// Operating on a stream of tokens
while (scanner.hasNext()) {
tmp = scanner.next();
n += 1;
// If we have reached the fifth column, take its content and append the
// sixth column too, as the name we want consists of space-separated
// expressions. Feel free to customize this if your name layout varies.
if (n % COLS == 5) {
item.setLength(0);
item.append(tmp);
item.append(" ");
item.append(scanner.next());
n += 1;
System.out.println(item.toString()); // Doing some stuff with that
//expression we got
}
}
}
catch(java.io.IOException e){
System.out.println(e.getMessage());
}
}
}
If your line[]'s type is String:
String s = line[0];
String[] split = s.split(" ");
return split[4]; //which is the fifth item
Note that split(" ") produces empty strings wherever two or more spaces occur in a row, so for this data the delimiter should be a regular expression.
How is the column separated? For example, if the columns are separated by tab character, I believe you can use the split method. Try using the below:
String[] parts = str.split("\\t");
Your expected result will be in parts[4].
Just use String.split() with a regex for at least 2 whitespace characters:
String foo = "31451    CID005319044   15939353   C8H14O3S2    beta-lipoic acid   C1CS#S[C##H]1CCCCC(=O)O";
String[] bar = foo.split("\\s{2,}");
String name = bar[4]; // beta-lipoic acid
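Wrapped into a small helper, the split on two or more whitespace characters might look like the sketch below. The class name FifthColumn and the method fifthField are hypothetical, and the approach assumes that columns are always separated by at least two whitespace characters while the words inside a name are separated by exactly one.

```java
public class FifthColumn {

    // Returns the fifth field of a line whose columns are separated by two or
    // more whitespace characters, or null if the line has fewer than five fields.
    static String fifthField(String line) {
        String[] fields = line.trim().split("\\s{2,}");
        return fields.length > 4 ? fields[4] : null;
    }

    public static void main(String[] args) {
        String line = "31451    CID005319044   15939353   C8H14O3S2    beta-lipoic acid   C1CS#S[C##H]1CCCCC(=O)O ";
        System.out.println(fifthField(line)); // prints: beta-lipoic acid
    }
}
```

Lines in which some columns are separated by only a single space would be split into fewer fields, so they would need a different rule, for example fixed column positions.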
