I'm trying to count occurrences per line in a text file containing a large number of codes (numbers).
Example of text file content:
9045,9107,2376,9017
2387,4405,4499,7120
9107,2376,3559,3488
9045,4405,3559,4499
I want to compare a similar set of numbers that I get from a text field, for example:
9107,4405,2387,4499
The only result I'm looking for is whether the input contains more than 2 of the numbers on any one line of the text file. In this case it will be true, because:
9045,9107,2376,9017 - false (1)
2387,4405,4499,7120 - true (3)
9107,2387,3559,3488 - false (2)
9045,4425,3559,4490 - false (0)
From what I understand, the best way to do this is with a 2D array, and I've managed to import the file successfully:
Scanner in = null;
try {
    in = new Scanner(new File("areas.txt"));
} catch (FileNotFoundException ex) {
    Logger.getLogger(NewJFrame.class.getName()).log(Level.SEVERE, null, ex);
}
List<String[]> lines = new ArrayList<>();
while (in.hasNextLine()) {
    String line = in.nextLine().trim();
    String[] splitted = line.split(", ");
    lines.add(splitted);
}
String[][] result = new String[lines.size()][];
for (int i = 0; i < result.length; i++) {
    result[i] = lines.get(i);
}
System.out.println(Arrays.deepToString(result));
The result I get:
[[9045,9107,2376,9017], [2387,4405,4499,7120], [9107,2376,3559,3488], [9045,4405,3559,4499], [], []]
From here I'm a bit stuck on checking the codes individually per line. Any suggestions or advice? Is a 2D array the best way to do this, or is there an easier or better way?
The expected number of inputs determines the type of search algorithm you should use.
If you aren't searching through thousands of lines, a simple algorithm will do just fine. When in doubt, favour simplicity over complex, hard-to-understand algorithms.
While it is not an efficient algorithm, in most cases a simple nested for-loop will do the trick.
A simple implementation would look like this:
final int FOUND_THRESHOLD = 3; // "more than 2" matches means at least 3
String[] comparedCodes = {"9107", "4405", "2387", "4499"};
String[][] allInputs = {
    {"9045", "9107", "2376", "9017"}, // 1 match: should not match
    {"2387", "4405", "4499", "7120"}, // 3 matches: should match
    {"9107", "2376", "3559", "3488"}, // 1 match: should not match
    {"9045", "4405", "3559", "4499"}, // 2 matches: should not match
};
List<String[]> results = new ArrayList<>();
for (String[] input : allInputs) {
    int numFound = 0;
    // Compare the codes
    for (String code : input) {
        for (String c : comparedCodes) {
            if (code.equals(c)) {
                numFound++;
                break; // Breaking out here prevents unnecessary work
            }
        }
        if (numFound >= FOUND_THRESHOLD) {
            results.add(input);
            break; // Breaking out here prevents unnecessary work
        }
    }
}
for (String[] result : results) {
    System.out.println(Arrays.toString(result));
}
which produces the output:
[2387, 4405, 4499, 7120]
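One thing worth checking in the question's loader: the sample file separates codes with a bare comma, but the code splits on ", " (comma plus space), so each line comes back as a single unsplit element, which matches the bracketed output shown in the question (the trailing [] entries are blank lines). A minimal sketch of a parser that splits on a comma with optional surrounding whitespace and skips blank lines, assuming the file has already been read into a list of strings; the class and method names are illustrative only:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CodeLineParser {

    // Splits each non-blank line on commas, tolerating spaces around them,
    // so "9045,9107" and "9045, 9107" both yield ["9045", "9107"].
    static String[][] parseLines(List<String> rawLines) {
        List<String[]> rows = new ArrayList<>();
        for (String raw : rawLines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines instead of storing empty rows
            }
            rows.add(line.split("\\s*,\\s*"));
        }
        return rows.toArray(new String[0][]);
    }

    public static void main(String[] args) {
        // Stand-in for the lines of areas.txt (e.g. from Files.readAllLines).
        List<String> raw = Arrays.asList(
                "9045,9107,2376,9017",
                "2387,4405,4499,7120",
                "",
                "");
        System.out.println(Arrays.deepToString(parseLines(raw)));
    }
}
```

Each row can then be fed to the nested loop above in place of the hard-coded allInputs array.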
To expand on my comment, here's a rough outline of what you could do:
String textFieldContents = ... //get it
//build a set of the user input by splitting at commas
//a stream is used to be able to trim the elements before collecting them into a set
Set<String> userInput = Arrays.stream(textFieldContents.split(","))
        .map(String::trim)
        .collect(Collectors.toSet());
//stream the lines in the file; try-with-resources closes the file afterwards
//(Files.lines throws IOException, which the enclosing method must handle or declare)
List<Boolean> matchResults;
try (Stream<String> fileLines = Files.lines(Path.of("areas.txt"))) {
    matchResults = fileLines
            //map each line to true/false
            .map(line -> {
                //split the line and stream the parts
                return Arrays.stream(line.split(","))
                        //trim each part
                        .map(String::trim)
                        //select only those contained in the user input set
                        .filter(part -> userInput.contains(part))
                        //count matching elements and return whether there are more than 2 or not
                        .count() > 2L;
            })
            //collect the results into a list; each element's position corresponds to the zero-based line number
            .collect(Collectors.toList());
}
If you need the matching lines rather than one flag per line, replace map() with filter() (same lambda body) and change the result type to List<String>.
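A minimal sketch of that filter() variant, with the user input and the file lines passed in as plain strings so it can be tried without a text field or a file. The class and method names are illustrative, and the "more than 2" threshold comes from the question:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class MatchingLines {

    // Returns the lines that share more than `threshold` codes with the user input.
    static List<String> matchingLines(String userText, List<String> lines, long threshold) {
        // Build the lookup set once; contains() is then a constant-time check.
        Set<String> userInput = Arrays.stream(userText.split(","))
                .map(String::trim)
                .collect(Collectors.toSet());

        return lines.stream()
                .filter(line -> Arrays.stream(line.split(","))
                        .map(String::trim)
                        .filter(userInput::contains)
                        .count() > threshold)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> fileLines = Arrays.asList(
                "9045,9107,2376,9017",
                "2387,4405,4499,7120",
                "9107,2376,3559,3488",
                "9045,4405,3559,4499");
        // Only the second line has more than 2 codes in common with the input.
        System.out.println(matchingLines("9107,4405,2387,4499", fileLines, 2));
    }
}
```

To run it against the real file, the list could come from Files.readAllLines(Path.of("areas.txt")), or the filter could be applied directly to Files.lines(...) inside a try-with-resources block.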
import java.io.IOException;
import java.util.Arrays;

public class sortingtext {
    public static void main(String[] args) throws IOException {
        String readline = "i have a sentence with words";
        String[] words = readline.split(" ");
        // sort by length, longest first
        Arrays.sort(words, (a, b) -> Integer.compare(b.length(), a.length()));
        for (int i = 0; i < words.length; i++) {
            int len = words[i].length();
            System.out.println(len + "-" + words[i]);
        }
    }
}
Input:
i have a sentence with words
My code splits the string and prints each word with its length.
The output I get looks like:
8-sentence
5-words
4-have
4-with
1-i
1-a
I want to group the words of the same length to get this:
8 - sentence
5 - words
4 - have, with
1 - i, a
But I don't get how to group them.
Easy with the stream API:
final Map<Integer, List<String>> lengthToWords = new TreeMap<>(
Arrays.stream(words)
.collect(Collectors.groupingBy(String::length))
);
The stream groups the words by length into a map (currently a HashMap, although the type of the returned map is not guaranteed), and the TreeMap copy then sorts the entries by key (the word length).
Alternatively, you can write it like this, which is more efficient (no intermediate map) but in my opinion less readable:
final Map<Integer, List<String>> lengthToWords = Arrays.stream(words)
.collect(Collectors.groupingBy(String::length, TreeMap::new, Collectors.toList()));
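Both versions above produce a TreeMap in ascending key order, while the question lists the longest words first. A minimal sketch, assuming descending order is what's wanted, that gives the TreeMap a reversed comparator and prints each group in the "length - word, word" format from the question (the class and method names are illustrative):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupByLength {

    // Groups words by length, longest first; each list keeps the words' original order.
    static Map<Integer, List<String>> groupByLength(String[] words) {
        return Arrays.stream(words)
                .collect(Collectors.groupingBy(
                        String::length,
                        () -> new TreeMap<Integer, List<String>>(Comparator.reverseOrder()),
                        Collectors.toList()));
    }

    public static void main(String[] args) {
        String[] words = "i have a sentence with words".split(" ");
        for (Map.Entry<Integer, List<String>> entry : groupByLength(words).entrySet()) {
            // String.join builds "have, with" from the list
            System.out.println(entry.getKey() + " - " + String.join(", ", entry.getValue()));
        }
    }
}
```

This prints 8 - sentence, 5 - words, 4 - have, with and 1 - i, a on separate lines. No pre-sorting of the array is needed, because the map does the ordering.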
If you are a beginner or not familiar with the stream API:
public static void main(String[] args) {
    String readline = "i have a sentence with words";
    String[] words = readline.split(" ");
    Arrays.sort(words, (a, b) -> Integer.compare(b.length(), a.length()));
    // declare a variable to hold the current string length
    int currLength = -1;
    for (int i = 0; i < words.length; i++) {
        if (currLength == words[i].length()) {
            // if currLength is equal to the current word length, just append a comma and this word
            System.out.print(", " + words[i]);
        } else {
            // if not, start a new line (except before the first group),
            // update currLength and print the new length with the current word
            if (currLength != -1) {
                System.out.println();
            }
            currLength = words[i].length();
            System.out.print(currLength + " - " + words[i]);
        }
    }
    System.out.println(); // finish the last line
}
Note: The println("...") method prints the string "..." and moves the cursor to a new line. The print("...") method instead prints just the string "...", but does not move the cursor to a new line. Hence, subsequent printing instructions will print on the same line. The println() method can also be used without parameters, to position the cursor on the next line.
Say you have a text file containing "abcdefghijklmnop" and you have to add 3 characters at a time to an ArrayList<String>. So the first element of the list would be "abc", the second "def", and so on until all the characters have been added.
public ArrayList<String> returnArray() throws FileNotFoundException
{
    int i = 0;
    ArrayList<String> list = new ArrayList<String>();
    Scanner scanCharacters = new Scanner(file);
    while (scanCharacters.hasNext())
    {
        list.add(scanCharacters.next().substring(i, i + 3));
        i += 3;
    }
    scanCharacters.close();
    return list;
}
Please use the code below:
ArrayList<String> list = new ArrayList<String>();
int i = 0;
int x = 0;
Scanner scanCharacters = new Scanner(file);
scanCharacters.useDelimiter(System.getProperty("line.separator"));
String finalString = "";
while (scanCharacters.hasNext()) {
    String[] tokens = scanCharacters.next().split("\t");
    for (String str : tokens) {
        // StringUtils here is Apache Commons Lang's StringUtils
        finalString = StringUtils.deleteWhitespace(str);
        for (i = 0; i < finalString.length(); i = i + 3) {
            x = i + 3;
            if (x < finalString.length()) {
                list.add(finalString.substring(i, i + 3));
            } else {
                // fewer than 3 characters may remain: take whatever is left
                list.add(finalString.substring(i, finalString.length()));
            }
        }
    }
}
scanCharacters.close();
System.out.println("list" + list);
Here I have used StringUtils.deleteWhitespace(str) from Apache Commons Lang to delete the blank spaces from the file tokens, and the if condition inside the for loop checks whether three more characters are available in the string; if not, whatever characters are left go into the list. My text file contains "asdfcshgfser ajsnsdxs" in the first line and "sasdsd fghfdgfd" in the second line.
After running the program, the result is:
list[asd, fcs, hgf, ser, ajs, nsd, xs, sas, dsd, fgh, fdg, fd]
public ArrayList<String> returnArray() throws FileNotFoundException
{
    ArrayList<String> list = new ArrayList<String>();
    Scanner scanCharacters = new Scanner(file);
    String temp = "";
    // join every whitespace-separated token into one string
    while (scanCharacters.hasNext())
    {
        temp += scanCharacters.next();
    }
    // cut off 3 characters at a time while at least 3 remain
    while (temp.length() > 2) {
        list.add(temp.substring(0, 3));
        temp = temp.substring(3);
    }
    // add the 1 or 2 leftover characters, if any
    if (temp.length() > 0) {
        list.add(temp);
    }
    scanCharacters.close();
    return list;
}
In this example I read all of the data from the file first, and then split it into groups of three. Scanner can never backtrack, so calling next() the way you do leaves out some of the data: each call returns the next word (tokens are separated by whitespace, Scanner's default delimiter), and you then take the three letters starting at position i, which moves 3 further along with every word (and throws a StringIndexOutOfBoundsException once i runs past the end of a shorter word).
For example:
ALEXCY WOWZAMAN
would give you:
ALE and ZAM
My example collects all of the letters in one string, repeatedly cuts off the first three letters until fewer than three remain, and finally adds the remainder. As the others have said, it would be worth reading up on a different way of reading the file, such as BufferedReader. If you want to keep your current approach, I also suggest reading up on substring and Scanner.
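The same read-everything-then-split idea can be written without Apache Commons. A minimal sketch, assuming all whitespace should be dropped (as in the examples above) and that the file content is passed in as one string, for example from Files.readString in Java 11+; the class and method names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

public class ThreeCharChunks {

    // Removes all whitespace, then cuts the text into pieces of `size` characters;
    // the last piece holds whatever is left over, so it may be shorter.
    static List<String> chunk(String text, int size) {
        String letters = text.replaceAll("\\s+", "");
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < letters.length(); start += size) {
            int end = Math.min(start + size, letters.length());
            chunks.add(letters.substring(start, end));
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(chunk("abcdefghijklmnop", 3)); // [abc, def, ghi, jkl, mno, p]
    }
}
```

Unlike the line-by-line answer above, this joins all lines before chunking, so a group can span a line break; chunk each line separately if that matters.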
I'm trying to write a program that takes a text file as input and adds the words in it as keys of a map; the value associated with each word should be the page numbers it appears on. The text looks like this:
Page1
blah bla bl
Page2
some blah
So for the word "blah" the output must be
blah : [1, 2]
So far I have only managed to insert the keys; I can't figure out how to add the associated values. Here's what I have so far:
BufferedReader reader = new BufferedReader(input);
try {
    Map<String, List<Integer>> library
            = new TreeMap<String, List<Integer>>();
    String line = reader.readLine();
    while (line != null) {
        String[] tokens = line.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            String word = tokens[i];
            if (!library.containsKey(word)
                    && !word.startsWith("Page")) {
                library.put(word, new LinkedList<Integer>());
                if (tokens[0].startsWith("Page")
                        && library.containsKey(word)) {
                    List<Integer> pages = library.get(word);
                    int page = getNum(tokens[0]);
                    pages.add(page);
                    page++;
                }
            }
        }
        line = reader.readLine();
    }
} finally {
    reader.close();
}
To get the page number I use this method:
// Reads the digits at the end of s, e.g. "Page12" -> 12
private static int getNum(String s) {
    int result = 0;
    int p = 1;
    int i = s.length() - 1;
    while (i >= 0) {
        int d = s.charAt(i) - '0';
        if (d >= 0 && d <= 9) {
            result += d * p;
        } else {
            break;
        }
        i--;
        p *= 10;
    }
    return result;
}
Thanks for all your ideas!
The pages variable is declared inside the scope of your inner if statement. Once that block ends, the variable goes out of scope. If you want to use the list of pages later, it needs to be declared outside that block, for example as a field.
I assume you are using pages to generate a table of contents later. But it's not strictly necessary, as you can generate that from your word index; I'll demonstrate how below.
You also need to declare a currentPage variable which holds the number from the latest 'PageN' line you have seen. There's no need to increment it manually: just store the number from the text (which copes with blank pages).
Page numbers seem to always be on their own line, so page detection should be done on the whole line, not on individual words (this copes with situations where a line reads 'for more information see Page72').
It's also worth checking that there's a valid page number before your first word.
So putting that all together your code should be structured something like the following:
Map<String, Set<Integer>> index = new TreeMap<>();
int currentPage = -1;
String currentLine;
while ((currentLine = reader.readLine()) != null) {
    if (isPage(currentLine)) {
        currentPage = getPageNum(currentLine);
    } else {
        assert currentPage > 0;
        for (String word : words(currentLine)) {
            if (!index.containsKey(word))
                index.put(word, new TreeSet<>());
            index.get(word).add(currentPage);
        }
    }
}
I've factored out the methods words, isPage and getPageNum; you seem to have working code for all of those.
I've also changed the List of pages to a Set to reflect the fact that you only want a word-page reference once in the index.
To get an ordered list of all pages from the index use:
List<Integer> allPages = index.values().stream()
        .flatMap(Set::stream).distinct().sorted()
        .collect(Collectors.toList());
That assumes Java 8, but it's not too hard to convert if you don't have streams.
If you are going to generate a reverse index (pages to words) then for efficiency reasons you should probably create the reverse map (Map<Integer, List<String>>) as you are processing the words.
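Filling in the pieces, here is a self-contained sketch of the structure above. The helpers isPage, getPageNum and words are hypothetical stand-ins (a whole line such as "Page12" counts as a page marker, and words are split on whitespace), and the text is read from a string so the sketch runs as-is:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class PageIndex {

    // Hypothetical helper: a page marker is a whole line such as "Page12".
    static boolean isPage(String line) {
        return line.trim().matches("Page\\d+");
    }

    // Hypothetical helper: the number after "Page" on a marker line.
    static int getPageNum(String line) {
        return Integer.parseInt(line.trim().substring("Page".length()));
    }

    // Hypothetical helper: whitespace-separated words; a blank line has none.
    static String[] words(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    static Map<String, Set<Integer>> buildIndex(BufferedReader reader) throws IOException {
        Map<String, Set<Integer>> index = new TreeMap<>(); // words in alphabetical order
        int currentPage = -1; // no page marker seen yet
        String currentLine;
        while ((currentLine = reader.readLine()) != null) {
            if (isPage(currentLine)) {
                currentPage = getPageNum(currentLine);
            } else {
                for (String word : words(currentLine)) {
                    if (currentPage < 0) {
                        throw new IllegalStateException("Word before the first page marker: " + word);
                    }
                    // Create the word's page set on first sight, then record the page.
                    index.computeIfAbsent(word, k -> new TreeSet<>()).add(currentPage);
                }
            }
        }
        return index;
    }

    // Convenience overload for in-memory text; a StringReader does not actually fail.
    static Map<String, Set<Integer>> buildIndex(String text) {
        try {
            return buildIndex(new BufferedReader(new StringReader(text)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildIndex("Page1\nblah bla bl\nPage2\nsome blah\n"));
        // prints {bl=[1], bla=[1], blah=[1, 2], some=[2]}
    }
}
```

computeIfAbsent does the same job as the containsKey/put pair above in one call. The explicit exception replaces the assert, since assertions are disabled unless the JVM runs with -ea.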
You should try something like this. I'm not totally sure how you're using the pages, but this code checks whether library already contains the word (like you already do); if it doesn't, it adds an empty list for the word, and in either case it then adds the page number (page, assumed to hold the current page) to that word's list.
if (!word.startsWith("Page")) {
    if (!library.containsKey(word)) {
        library.put(word, new LinkedList<Integer>());
    }
    library.get(word).add(page);
}
Your problem seems to be in this piece of logic:
if (tokens[0].startsWith("Page")
&& library.containsKey(word)) {
Clearly you only add page numbers when the line starts with Page; otherwise the code inside the if block is never executed. Since a Page line contains no other words, no word ever gets a page number.
I have a serious problem extracting terms from each line. To be more specific, I have a file with a .csv extension that is not really in CSV format: when I read it, all the terms end up in line[0].
Here are a few example lines out of thousands:
test.csv
line1 : "31451 CID005319044 15939353 C8H14O3S2 beta-lipoic acid C1CS#S[C##H]1CCCCC(=O)O "
line2 : "12232 COD05374044 23439353 C924O3S2 saponin CCCC(=O)O "
line3 : "9048 CTD042032 23241 C3HO4O3S2 Berberine [C##H]1CCCCC(=O)O "
I want to extract only "beta-lipoic acid", "saponin" and "Berberine", which are in the 5th position.
As you can see, there are big spaces between the terms, which is why I say 5th position.
In this case, how can I extract the term in the 5th position of each line?
One more thing:
the amount of whitespace between the six terms is not always the same;
it could be one, two, three, four, five... characters wide.
Another try:
import java.io.File;
import java.util.Scanner;

public class HelloWorld {
    // The number of whitespace-separated tokens per row, where tokens are separated
    // by an arbitrary number of spaces or tabs (a two-word name counts as two)
    final static int COLS = 7;

    public static void main(String[] args) {
        System.out.println("Tokens:");
        try (Scanner scanner = new Scanner(new File("input.txt")).useDelimiter("\\s+")) {
            // Counts the current column id
            int n = 0;
            String tmp = "";
            StringBuilder item = new StringBuilder();
            // Operating on a stream of tokens
            while (scanner.hasNext()) {
                tmp = scanner.next();
                n += 1;
                // If we have reached the fifth column, take its content and append the
                // sixth column too, as the name we want consists of space-separated
                // expressions. Feel free to customize this if your name layout varies.
                if (n % COLS == 5) {
                    item.setLength(0);
                    item.append(tmp);
                    item.append(" ");
                    item.append(scanner.next());
                    n += 1;
                    System.out.println(item.toString()); // Doing some stuff with that
                                                         // expression we got
                }
            }
        }
        catch (java.io.IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
If line is a String[] whose first element holds the whole row:
String s = line[0];
String[] split = s.split(" ");
return split[4]; // the fifth item
For the delimiter, if you want to be more precise, you can use a regular expression.
How are the columns separated? For example, if they are separated by tab characters, I believe you can use the split method. Try the following:
String[] parts = str.split("\\t");
Your expected result will be in parts[4].
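Just use String.split() with a regex for at least 2 whitespace characters (this assumes every gap between columns is at least two characters wide, while the name itself only contains single spaces):
String foo = "31451  CID005319044  15939353  C8H14O3S2  beta-lipoic acid  C1CS#S[C##H]1CCCCC(=O)O";
String[] bar = foo.split("\\s{2,}");
String name = bar[4]; // beta-lipoic acid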
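If a gap can be a single space (as the question says) but only the name column ever contains spaces, a different approach avoids depending on gap width: split on any run of whitespace, treat the first four tokens and the last token as the fixed columns, and join whatever lies between them. A minimal sketch under that assumption (it breaks if the ID, number or formula columns can themselves contain spaces); the class and method names are illustrative:

```java
import java.util.Arrays;

public class NameColumn {

    // Assumes 4 single-token columns, then the name (one or more words),
    // then one final single-token column.
    static String extractName(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 6) {
            throw new IllegalArgumentException("Unexpected line: " + line);
        }
        return String.join(" ", Arrays.copyOfRange(tokens, 4, tokens.length - 1));
    }

    public static void main(String[] args) {
        // Gap widths deliberately vary to show they don't matter.
        String[] lines = {
            "31451 CID005319044 15939353 C8H14O3S2 beta-lipoic acid C1CS#S[C##H]1CCCCC(=O)O ",
            "12232   COD05374044 23439353  C924O3S2 saponin CCCC(=O)O ",
            "9048 CTD042032 23241 C3HO4O3S2    Berberine [C##H]1CCCCC(=O)O "
        };
        for (String line : lines) {
            System.out.println(extractName(line)); // beta-lipoic acid, saponin, Berberine
        }
    }
}
```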
I want to read a file and detect whether what follows the % symbol is a number or a word. If it is a number, I want to remove the symbol in front of it, convert the number to binary and replace it in the file. If it is a word, I want to assign it the number 16 at first; then, each time a different word is used, I want to add 1 to that number. Here's what I want:
If the file reads as follows (... represents a string that does not need to be translated):
%10
...
%firststring
...
%secondstring
...
%firststring
...
%11
...
and so on...
I want it to look like this:
0000000000001010 (10 in binary)
...
0000000000010000 (16 in binary)
...
0000000000010001 (another word was used, so 16+1 = 17 in binary)
...
0000000000010000 (16 in binary)
...
0000000000001011 (11 in binary)
And here's what I tried:
anyLines is just a string array containing the contents of the file (if I were to print anyLines[i] with System.out.println, I would see the file's contents).
UPDATED!
try {
    ReadFile files = new ReadFile(file.getPath());
    String[] anyLines = files.OpenFile();
    int wordValue = 16;
    // to keep track of words that are already used
    Map<String, Integer> wordValueMap = new HashMap<String, Integer>();
    for (String line : anyLines) {
        // if the line doesn't begin with %, ignore it
        if (!line.startsWith("%")) {
            continue;
        }
        // remove the leading %
        line = line.substring(1);
        Integer binaryValue = null;
        if (line.matches("\\d+")) {
            binaryValue = Integer.parseInt(line);
        }
        else if (line.matches("\\w+")) {
            binaryValue = wordValueMap.get(line);
            // if the map doesn't contain the word yet, assign a value and store it
            if (binaryValue == null) {
                binaryValue = wordValue;
                wordValueMap.put(line, binaryValue);
                ++wordValue;
            }
        }
        // Commons Lang's StringUtils.leftPad(..) could zero-pad this to 16 digits
        System.out.println(Integer.toBinaryString(binaryValue));
    }
Now I only have to replace the symbols (%10, %firststring, etc.) with their binary values.
After running this code, the output I get is:
1010
10000
10001
10000
1011
%10
...
%firststring
...
%secondstring
...
%firststring
...
%11
...
Now I just need to replace %10 with 1010, %firststring with 10000 and so on, so that the file reads like this:
0000000000001010 (10 in binary)
...
0000000000010000 (16 in binary)
...
0000000000010001 (another word was used, so 16+1 = 17 in binary)
...
0000000000010000 (16 in binary)
...
0000000000001011 (11 in binary)
Do you have any suggestions on how to make this work?
This may not be doing what you think it's doing:
int binaryValue = wordValue++;
Because you are using the post-increment operator, binaryValue is assigned the old wordValue value, and only then is wordValue incremented. I'd do this on two separate lines, with the increment done first:
wordValue++;
int binaryValue = wordValue; // binaryValue now gets the new value for wordValue
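A quick way to see the difference between the two forms. The values assume wordValue starts at 16, as in the question; note that the question wants the first word to get 16, which is what the post-increment form yields from that starting value, so which form fits depends on the starting value you choose. The class and method names are illustrative:

```java
public class IncrementDemo {

    // Returns {value assigned, variable afterwards} for the post-increment form.
    static int[] postIncrement(int start) {
        int wordValue = start;
        int binaryValue = wordValue++; // assigned the OLD value, then wordValue grows
        return new int[] {binaryValue, wordValue};
    }

    // Returns {value assigned, variable afterwards} for the pre-increment form.
    static int[] preIncrement(int start) {
        int wordValue = start;
        int binaryValue = ++wordValue; // wordValue grows first, then it is assigned
        return new int[] {binaryValue, wordValue};
    }

    public static void main(String[] args) {
        int[] post = postIncrement(16);
        int[] pre = preIncrement(16);
        System.out.println("wordValue++: " + post[0] + ", then " + post[1]); // 16, then 17
        System.out.println("++wordValue: " + pre[0] + ", then " + pre[1]);   // 17, then 17
    }
}
```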
EDIT 1
OK, if you still need our help, I suggest you do the following:
Show us a sample of the data file so we can see what it actually looks like.
Explain the difference between the anyLines array and the lines array and how they relate to the data file. They both hold Strings, and lines is obviously the result of splitting anyLines on "\n", but what exactly is anyLines? You state that the file is a text file, but how do you get the initial array of Strings from it? Is there another delimiter that you use to build this array? Have you tried debugging the code by printing out the contents of anyLines and lines?
If you need wordValue to persist with each iteration of a loop through anyLines (again, knowing what this is would help), you will need to declare and initialize it before the loop.
If you can't create and post an SSCCE, at least make your code formatting consistent and readable, something like the code below.
Have a look at the link on how to ask smart questions for more tips on information that you could give us that would help us to help you.
Sample code formatting:
try {
    ReadFile files = new ReadFile(file.getPath());
    String[] anyLines = files.OpenFile();
    // String[] anyLines = {}; // placeholder for testing without ReadFile
    int i;
    // test if the program actually read the file
    for (i = 0; i < anyLines.length; i++) {
        String[] lines = anyLines[i].split("\n");
        int wordValue = 76;
        Map<String, Integer> wordValueMap = new HashMap<String, Integer>();
        for (String currentLine : lines) {
            if (!currentLine.startsWith("%")) {
                continue;
            }
            currentLine = currentLine.substring(1);
            Integer value;
            if (currentLine.matches("\\d+")) {
                value = Integer.parseInt(currentLine);
            } else if (currentLine.matches("\\w+")) {
                value = wordValueMap.get(currentLine);
                if (value == null) {
                    int binaryValue = wordValue++;
                    wordValueMap.put(currentLine, binaryValue);
                    // TODO: fix below
                    // !! currentLine.replace(currentLine, binaryValue);
                    value = binaryValue;
                }
            } else {
                System.out.println("Invalid input");
                break;
            }
            System.out.println(Integer.toBinaryString(value));
        }
    }
} finally {
    // Do we need a catch block? If so, catch what?
    // What's supposed to go in here?
}
Good luck!
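For the remaining step of writing the binary values back, one option is to build a new list of lines and write it over the file. A minimal sketch, assuming each symbol sits on its own line starting with %, word numbering starts at 16 with a repeated word reusing its number, and 16-digit zero-padding is wanted as in the question's target output. String.format stands in for Commons Lang's leftPad here, and the class and method names are illustrative:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SymbolTranslator {

    // Formats a value as a 16-digit binary string, e.g. 10 -> "0000000000001010".
    static String toBinary16(int value) {
        return String.format("%16s", Integer.toBinaryString(value)).replace(' ', '0');
    }

    // Replaces every "%number" or "%word" line with its binary value; other lines pass through.
    static List<String> translate(List<String> lines) {
        Map<String, Integer> wordValues = new HashMap<>(); // words already given a number
        int nextWordValue = 16;                             // the first new word gets 16
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (!line.startsWith("%")) {
                out.add(line);
                continue;
            }
            String symbol = line.substring(1);
            int value;
            if (symbol.matches("\\d+")) {
                value = Integer.parseInt(symbol);
            } else if (symbol.matches("\\w+")) {
                Integer known = wordValues.get(symbol);
                if (known == null) {
                    known = nextWordValue++; // use the current value, then advance for the next new word
                    wordValues.put(symbol, known);
                }
                value = known;
            } else {
                out.add(line); // not a recognised symbol: keep it unchanged
                continue;
            }
            out.add(toBinary16(value));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "%10", "...", "%firststring", "...", "%secondstring", "...",
                "%firststring", "...", "%11", "...");
        translate(lines).forEach(System.out::println);
        // To update a file in place (the path is only an example):
        // Path p = Paths.get("input.txt");
        // Files.write(p, translate(Files.readAllLines(p)));
    }
}
```

Keeping the translation separate from the file I/O makes it easy to check the output before overwriting anything.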