How to disregard numbers when reading from a text file? - java

Right now I want to store a text file that goes like this:
1 apple
2 banana
3 orange
4 lynx
5 cappuccino
and so on into a data structure. Would the best way of doing this be mapping the int to the string somehow, or should I make an arraylist? I'm supposed to, when I store the words themselves, disregard the int and any whitespace, and keep only the word itself. How do I disregard the int when reading in lines? Here is my hacked together code right now:
public Dictionary(String filename) throws IOException {
if (filename==null)
throw new IllegalArgumentException("Null filename");
else{
try {
BufferedReader in = new BufferedReader(new FileReader(filename));
String str;
int numLines=0;
while ((str = in.readLine()) != null) {
numLines++;
}
String[] words=new String[numLines];
for (int i=0; i<words.length;i++){
words[i]=in.readLine();
}
in.close();
} catch (IOException e) {
}
}
}
Thank you in advance for the help!!

Just use the power of regular expressions:
List<String> texts = new ArrayList<String>();
Pattern pattern = Pattern.compile("[^0-9\\s]+");
String text = "1 apple 2 oranges 3 carrots";
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
texts.add(matcher.group(0));
}
Regular expressions are very popular these days. The compile method compiles your search pattern; the character class [^0-9\s] excludes digits and whitespace, so the numbers never show up in the matches and only the words are collected. To convert a text file to a String you can use Apache Commons IO (for example its IOUtils class).

This won't work because after the first loop you are already at the end of the file, so every later call to in.readLine() returns null.
I would use a Map to store the name and the number, something like this:
HashMap<String, Integer> map = new HashMap<String, Integer>();
String line;
while ((line = br.readLine()) != null) {
// also check that the array is not null and has the right size, trim, etc.
String[] tmp = line.split(" ");
map.put(tmp[1], Integer.parseInt(tmp[0]));
}
Otherwise you can try it with the Scanner class. Good luck.
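The Scanner suggestion above can be sketched as follows. This is only a minimal sketch, assuming every entry is a whole number followed by a single word with no spaces in it; the class name NumberedWords and the method read are made up for illustration and are not part of the question's Dictionary class.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;

// Hypothetical helper; the class and method names are illustrative only.
class NumberedWords {
    // Reads "<number> <word>" pairs token by token and maps number -> word.
    // Assumes words contain no spaces (Scanner splits on whitespace).
    static Map<Integer, String> read(Readable source) {
        Map<Integer, String> entries = new LinkedHashMap<Integer, String>();
        try (Scanner sc = new Scanner(source)) {
            while (sc.hasNextInt()) {
                int number = sc.nextInt();
                if (!sc.hasNext()) {
                    break; // a number without a word: stop instead of failing
                }
                entries.put(number, sc.next());
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        Map<Integer, String> entries = read(new StringReader("1 apple\n2 banana\n3 orange\n"));
        System.out.println(entries.values()); // prints [apple, banana, orange]
    }
}
```

To read from a real file, pass a FileReader instead of the StringReader. Because the file is read in a single pass, the end-of-file problem described above does not come up.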

You can give regular expressions a try.
Pattern p = Pattern.compile("[^0-9\\s]+");
String s = "1 apple 2 oranges";
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group(0));
}
Output =
apple
oranges
To get an idea about regular expressions, see the Java regex tutorial.

I suggest you use a List of items to store the results parsed from the file. One way to parse every text line is to use the String.split(String) method. Also note that you should handle exceptions in the code properly and do not forget to close the Reader when you are done (no matter whether flawlessly or with an exception => use a finally block). The following example should put you on track... Hope this helps.
package test;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
public class Main {
public static void main(String[] args) throws IOException {
Main m = new Main();
m.start("test.txt");
}
private void start(String filename) throws IOException {
System.out.println(readFromFile(filename));
}
private final class Item {
private String name;
private int id;
public Item(String name, int id) {
this.name = name;
this.id = id;
}
public int getId() {
return id;
}
public String getName() {
return name;
}
@Override
public String toString() {
return "Item [name=" + name + ", id=" + id + "]";
}
}
private List<Item> readFromFile(String filename) throws IOException {
List<Item> items = new ArrayList<Item>();
Reader r = null;
try {
r = new FileReader(filename);
BufferedReader br = new BufferedReader(r);
String line = null;
while ((line = br.readLine()) != null) {
String[] lineItems = line.split(" ");
if (lineItems.length != 2) {
throw new IOException("Incorrect input file data format! Two space separated items expected on every line!");
}
try {
int id = Integer.parseInt(lineItems[0]);
Item i = new Item(lineItems[1], id);
items.add(i);
} catch (NumberFormatException ex) {
throw new IOException("Incorrect input file data format!", ex); // JDK6+
}
}
} finally {
if (r != null) {
r.close();
}
}
return items;
}
}

If your words don't contain spaces, you could use String.split( " " ) to split up the String into an array of Strings delimited by spaces.
Then just take the second element of the array (the first will be the number).
Also, the String.trim( ) method will remove any whitespace before or after the String.
Note: there's probably some error checking that you'd want to perform (what if the String isn't formatted as you expect). But this code snippet gives the basic idea:
...
String s = in.readLine( );
String[] tokens = s.split( " " );
words[i] = tokens[1].trim( );
...
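The error checking mentioned in the note could look roughly like this. It is a sketch under the assumption that a valid line is a run of digits, some whitespace, and then the word; the names LineParser and extractWord are hypothetical.

```java
// Hypothetical helper showing one way to validate a "<number> <word>" line.
class LineParser {
    // Returns the word part of the line, or throws if the line is malformed.
    static String extractWord(String line) {
        if (line == null) {
            throw new IllegalArgumentException("line is null");
        }
        // Limit 2: split only at the first run of whitespace.
        String[] tokens = line.trim().split("\\s+", 2);
        if (tokens.length != 2 || !tokens[0].matches("\\d+")) {
            throw new IllegalArgumentException("Expected \"<number> <word>\" but got: " + line);
        }
        return tokens[1].trim();
    }

    public static void main(String[] args) {
        System.out.println(extractWord("  4   lynx ")); // prints lynx
    }
}
```

Splitting on "\\s+" rather than a single space also copes with tabs or repeated spaces between the number and the word.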

If you want to do something easy, just take a substring of the original word after counting the leading digits:
int t = 0;
while (t < word.length() && word.charAt(t) >= '0' && word.charAt(t) <= '9')
++t;
word = word.substring(t).trim(); // trim() drops the space left after the number
If words NEVER contain spaces you can also use word.split(" ")[1]

Instead of using a BufferedReader, use the Scanner class, and instead of using an array, use an ArrayList, like so:
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Scanner;
public class Dictionary {
private ArrayList<String> strings = new ArrayList<String>();
code...
public Dictionary(String fileName) throws IOException {
code...
// try-with-resources closes the file automatically
try (Scanner inFile = new Scanner(new File(fileName))) {
strings.add("Dummy"); // Dummy value to make the index start at 1
while(inFile.hasNext()) {
int n = inFile.nextInt(); // this line just reads in the int from the file and
// doesn't do anything with it
String s = inFile.nextLine().trim();
strings.add(s);
}
}
}
}
Then, since your data goes 1, 2, 3, 4, 5, you can just use the index to retrieve each item's number, like this:
for(int i = 1; i < strings.size(); i++) {
int n = i;
String s = n + " " + strings.get(i);
System.out.println(s);
}

Related

How do I Take certain elements of a String array and create a new array java

so I am currently writing a program that reads in inputs from a file. (I am a beginner in java and don't understand a lot yet, so if you guys can work with me being slow that would be great.)
The file consists of a whole bunch of information regarding country data based on sales of products. The two pieces of the file that I care about are the Country names and the profit numbers. What I'm stuck with is how do I take specific portions of the file and read them into an array and then tally up the total profits? Currently I have read in the header of the file, found the indexes of the Country and profit of the header ( I assumed that finding the index of the headers will translate to finding numbers and names for profit and country later on). The file for example has multiple countries and they repeat multiple times through the file in a random order. Ex
Any help will be useful thanks!
my code right now is:
public static void main(String[]args)throws IOException{
Scanner in = new Scanner(new File("sample-csv-file-for-testing-fixed.csv"));
PrintWriter pw = new PrintWriter(new File("Output.csv"));
// gets first line of file
String firstline = in.nextLine();
firstline.trim();
String data = firstline.replaceAll(" ","");
String[] header = data.split(",") ;
// find index of Country and Profit and store them into variables
String country = "Country";
String profit = "Profit";
int index1 =0 , index2=0;
for(int i = 0;i<header.length;i++){
if(header[i].equals(country)){
index1 = i;
}
}
for(int i = 0;i<header.length;i++){
if(header[i].equals(profit)){
index2 = i;
}
}
System.out.println(index1+" "+index2);
while( in.hasNextLine()){
String line = in.nextLine();
String nextline = line.replaceAll(" ","");
String[] values = nextline.split(",");
for(int i = 0;i< values.length;i++){
System.out.print(values[i]+ " ");
}
}
// Read in line of file into string, separate the string into an array
// keep track of country names
// find a way to get rid of all other numbers except profit
// sum the total profit for each line for each country
// create a output file and print out the table
}
If I understand correctly, you want something like this:
public static final String COUNTRY_HEADER = "country";
public static final String PROFIT_HEADER = "profit";
public static void main(String[] args) throws URISyntaxException, IOException {
final Scanner in = new Scanner(new File("src/main/resources/group-by.txt"));
final String firstLine = in.nextLine();
final String[] headers = firstLine.split(" ");
int countryIndex = -1;
int profitIndex = -1;
for (int i = 0; i < headers.length; i++) {
if (headers[i].equalsIgnoreCase(COUNTRY_HEADER)) {
countryIndex = i;
} else if (headers[i].equalsIgnoreCase(PROFIT_HEADER)) {
profitIndex = i;
}
}
final Map<String, Long> profitsByCountry = new HashMap<>();
while (in.hasNextLine()) {
final String line = in.nextLine();
final String[] values = line.split(" ");
profitsByCountry.merge(values[countryIndex], Long.valueOf(values[profitIndex]), Long::sum);
}
profitsByCountry.forEach((key, value) -> System.out.printf("Country: %s, Profit: %d%n", key, value));
// Do more stuff
}
Basically, once you have located the index of the columns you are looking for, you just need to go through the rest of the lines in the file and accumulate their values.
Note: The data example you have offered has one mistake, there is an extra 'blah' in the last line for 'USA'
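The answer above splits on spaces, while the code in the question splits on commas. A comma-separated variant of the same idea might look like the sketch below. It assumes the header contains columns named Country and Profit, that profits are whole numbers, and that no field contains a quoted comma; the class name ProfitByCountry and the sample rows are invented for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: sums the Profit column per Country for simple CSV lines.
class ProfitByCountry {
    static Map<String, Long> sum(List<String> lines) {
        String[] header = lines.get(0).split(",");
        int countryIndex = -1;
        int profitIndex = -1;
        for (int i = 0; i < header.length; i++) {
            String name = header[i].trim();
            if (name.equalsIgnoreCase("Country")) {
                countryIndex = i;
            } else if (name.equalsIgnoreCase("Profit")) {
                profitIndex = i;
            }
        }
        if (countryIndex < 0 || profitIndex < 0) {
            throw new IllegalArgumentException("Country or Profit column missing");
        }
        Map<String, Long> totals = new LinkedHashMap<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] values = line.split(",");
            if (values.length <= Math.max(countryIndex, profitIndex)) {
                continue; // skip rows that are too short to hold both columns
            }
            totals.merge(values[countryIndex].trim(),
                    Long.parseLong(values[profitIndex].trim()), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample data, not taken from the question's file.
        List<String> lines = Arrays.asList(
                "Segment, Country, Units, Profit",
                "A, USA, 3, 100",
                "B, Canada, 1, 40",
                "C, USA, 2, 60");
        System.out.println(sum(lines)); // prints {USA=160, Canada=40}
    }
}
```

For a real file the lines could come from Files.readAllLines, and the totals can then be written out with a PrintWriter as the question intends.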
A File Stream based solution. Finding the index of the header uses the same logic as @Dave and @fjvierm.
public class FileStreaming {
public static void main(String[] args) {
try (BufferedReader br = Files.newBufferedReader(Paths.get("filestreamdata.txt"))) {
int[] idx = getIndex(br.readLine());
Map<String, Integer> result = br.lines()
.map(l -> l.split(" +"))
.map(ss -> new AbstractMap.SimpleEntry<>(ss[idx[0]], Integer.parseInt(ss[idx[1]])))
.collect(Collectors.toMap(AbstractMap.SimpleEntry::getKey,
AbstractMap.SimpleEntry::getValue,
Integer::sum));
result.forEach((key, value) -> System.out.printf("%s %d\n", key, value));
} catch (IOException e) {
e.printStackTrace();
}
}
private static int[] getIndex(String line) {
String[] splits = line.split(" +");
int[] result = new int[2];
for (int i = 0; i < splits.length; i++) {
if (splits[i].equals("country")) {
result[0] = i;
}
if (splits[i].equals("profit")) {
result[1] = i;
}
}
return result;
}
}
The below code might help.
It assumes: the header is always the first line; the header record begins with "Segment"; and the profit values are always in the same field position as the “Profit” header.
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.Files;
import java.io.IOException;
import java.util.*;
import java.util.stream.*;
public class SumTheProfit {
public static void main(String[] args) throws IOException {
String fileName = "test.csv";
String firstColumnHeader = "Segment";
String profitColumnHeader = "Profit";
// put header record into an array
Path filePath = Paths.get(fileName);
String[] firstLine = Files.lines(filePath)
.map(s -> s.replaceAll(" ", ""))
.map(s -> s.split(","))
.findFirst()
.get();
// get the index of "Profit" from the header
int profitIndex = java.util.Arrays.asList(firstLine).indexOf(profitColumnHeader);
List<String> list = new ArrayList<>();
// filter out header record & collect each profit (index 5) into a list
try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
list = stream
.filter(line -> !line.startsWith(firstColumnHeader))
.map(line -> line.split("\\s*(,|\\s)\\s*")[profitIndex])
.collect(Collectors.toList());
} catch (IOException e) {
e.printStackTrace();
}
//sum each profit value in the list
Integer sum = list.stream().mapToInt(num -> Integer.parseInt(num)).sum();
System.out.println(sum);
}
}
This takes a declarative approach using the Java Streams API, as that's easier to read in comparison to an imperative for loop approach that hides application logic inside boilerplate code.

How to read a file character by character until a specific string is complete?

I'm struggling with this concept for a problem I have. I wrote a compression algorithm that compresses text files based on a list of character-to-code pairs that are stored in objects within an array. Now when decompressing I've realized I have to read the file character by character until a string is created that matches one of the codes, write the character that the code corresponds to, and keep iterating over the file until it's finished.
I'm not too certain where to go from here, but here is what I have so far:
CompressedFile compFile = new CompressedFile(args[0],"read");
TextFile outFile = new TextFile(args[0].substring(0,args[0].lastIndexOf('.') )+".dec","write");
String output = "";
String temp = "";
char charac = 0;
check = false;
while(!check) {
charac =compFile.readBit();
if(charac==(char)0) {
throw new NullPointerException();
}
temp=compressionCodes.getCharacter((compressionCodes.findCode(charac)));
There's a lot more that's missing but it shouldn't really be important, it's just this loop that I'm really struggling with.
Here's a sample complete program that illustrates how you can potentially do a match. NOTE: Using an array to store associations is a BAD IDEA if lookup performance matters. However, if you have to use an array, you simply need to iterate over the array, looking for the first lookup association that matches your search criteria.
Source
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class OpCodeLookupService {
Pair[] opCodes;
public static final class Pair {
String first;
String second;
public Pair(String first, String second) {
this.first = first;
this.second = second;
}
}
public OpCodeLookupService(Pair[] opCodes) {
this.opCodes = opCodes;
}
public Pair pairLookup(String toLookup) {
for(Pair p : this.opCodes) {
if (p.first.equals(toLookup)) {
return p;
}
}
return null;
}
public String lookup(String filePath) {
try {
// In the comments, you mentioned you cannot use BufferedReader to ingest the file. In this example, I'm showing another way via Scanner which is a very easy to use class for ingesting input streams.
Scanner s = new Scanner(new FileInputStream(filePath));
StringBuilder stringToExamine = new StringBuilder();
while (s.hasNext()) {
String nextString = s.next();
for (char c : nextString.toCharArray()) {
stringToExamine.append(c);
Pair pair = pairLookup(stringToExamine.toString());
if (pair != null) {
return pair.second;
}
}
}
return null; //Indicates string is not found.
} catch (FileNotFoundException e) {
e.printStackTrace();
throw new RuntimeException("Cannot load file");
}
}
public static void main(String...args) {
final Pair p = new Pair("thisisopcode", "12345");
Pair[] pairs = new Pair[1];
pairs[0] = p;
OpCodeLookupService opService = new OpCodeLookupService(pairs);
System.out.println(opService.lookup("/Users/liuben10/foo.txt"));
}
}
So given a text file that looks like this:
"thisisopcodeklajsdfklajdsfkljadf",
It would output:
"12345"
BufferedReader rd = new BufferedReader(new InputStreamReader(
new FileInputStream(fName), "utf-8"));
int k;
String concatString = "";
while ((k = rd.read()) != -1) {
// break out of the while loop as soon as the string matches
if (concatString.equals(specificString)) break;
else concatString += (char) k;
}
rd.close();
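As noted above, an array is a poor fit for lookups. A map-based version of the decoding loop might look like the following sketch. It assumes the codes are prefix-free (no code is the beginning of another) and that the compressed input is available as a string of '0' and '1' characters; the class PrefixDecoder and the code table in main are hypothetical and stand in for the question's own classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical decoder: grows a buffer bit by bit until it matches a code.
class PrefixDecoder {
    static String decode(String bits, Map<String, Character> codes) {
        StringBuilder out = new StringBuilder();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < bits.length(); i++) {
            current.append(bits.charAt(i));
            Character symbol = codes.get(current.toString());
            if (symbol != null) {
                out.append(symbol);   // a full code was read: emit its character
                current.setLength(0); // and start collecting the next code
            }
        }
        if (current.length() > 0) {
            throw new IllegalArgumentException("Trailing bits match no code: " + current);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Invented code table for illustration.
        Map<String, Character> codes = new HashMap<>();
        codes.put("0", 'a');
        codes.put("10", 'b');
        codes.put("11", 'c');
        System.out.println(decode("010110", codes)); // prints abca
    }
}
```

Each lookup is a single hash probe, so the cost no longer grows with the number of codes the way a linear scan over an array does.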

Read a paragraph from the user and replace specific words In java

How would we write a program in Java to read a paragraph from the user and replace specific words listed in a vector with the following format?
For example, the word Happy is reduced to H****.
Any help will be appreciated.
import java.io.*;
import java.util.*;
class replaceString {
public static String putStars(String str) {
char first_char = str.charAt(0);
String ans = new String();
ans = String.valueOf(first_char);
for(int i = 1;i < str.length(); ++i ) {
ans = ans + "*";
}
return ans;
}
public static String replaceWords(String str, Vector<String> v1) {
String[] words = str.split("\\W+"); //split across all types of punctuation
String ansString = new String();
for(String to_be_replaced : words) {
boolean found = false;
for(String to_replace_with : v1) {
if(to_be_replaced.equals(to_replace_with)) {
//System.out.println("in");
ansString = ansString +putStars(to_be_replaced) + " ";
found = true;
}
}
if(found == false) {
ansString = ansString + to_be_replaced + " ";
}
}
return ansString;
}
public static String replaceWords1(String str, Vector<String> v1) {
for(String currStr : v1) {
str.replace(str, );
}
return ansString;
}
public static void main(String args[])throws Exception {
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
System.out.println("Enter the paragraph that you would like to edit ");
String s = br.readLine();
// Let us assume some strings in our very own vector
Vector<String> v1 = new Vector<String>();
v1.addElement("Hello");
v1.addElement("Hi");
v1.addElement("Heya");
v1.addElement("Howdy");
v1.addElement("Howu");
String ans = replaceWords(s, v1);
System.out.println("Paragraph after replacement becomes\n\n"+ ans);
}
}
This is my current code, but it's not working properly.
There could be other possibilities, but here's an example I did based on this answer:
We need all the words we need / want to match, and store them in an array:
String [] words = {"appy", "eya", "dy"};
(Optional) If you really need a Vector, I suggest to create a List (ArrayList) instead, and we can do it this way:
List <String> wordsToReplace = Arrays.asList(words);
Otherwise just modify the method in the next step to receive an array...
We create a function that receives this List and the phrase we want to check for and that returns the new String with the replaced text in it
So, our whole code ends up like this:
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class WordReplacer {
public static void main(String[] args) {
String [] words = {"appy", "eya", "dy"};
List <String> wordsToReplace = Arrays.asList(words);
System.out.println(replaceWords("Happy", wordsToReplace));
System.out.println(replaceWords("Heya", wordsToReplace));
System.out.println(replaceWords("Howdy?", wordsToReplace));
System.out.println(replaceWords("Howdy? My friend lives in Pompeya and every time I see her I say \"Heya\" to her, she is very happy", wordsToReplace));
}
private static String replaceWords(String word, List <String> wordsToReplace) {
for (String s : wordsToReplace) {
Pattern p = Pattern.compile(s, Pattern.CASE_INSENSITIVE); //We create a pattern that matches each word in our list. (1)
Matcher m = p.matcher(word); //We actually check for each match against our phrase
StringBuilder sb = new StringBuilder();
if (m.find()) { //If there was a match, we're going to replace each character for an '*' (2)
for (int i = 0; i < s.length(); i++) {
sb.append("*");
}
}
word = m.replaceAll(sb.toString()); //We replace each match with '*' (3)
}
return word; //We return the modified word
}
}
I'm going to explain what each comment (1), (2), (3) do in a better and simpler way:
(1) As shown in the linked answer, they use \b regex command to match whole words, but in this case we're using it to match parts of words, not whole words, so we don't need it here...
(2) Only if we found a match we fill the StringBuilder with * characters... If we didn't do it this way, we would be getting: H* instead of H**** for the case of Happy word, this way we ensure we get the correct amount of * for every word in the List.
(3) We replace the matches for the total number of * in the StringBuilder so we get the correct output.
The program above produces the following output:
H****
H***
How**?
How**? My friend lives in Pomp*** and every time I see her I say "H***" to her, she is very h****
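The answer above matches parts of words. If whole words should be masked instead, keeping the first letter as in the question's H**** example, the \b word boundary mentioned in (1) can be used. The following is a sketch of that variant, not the answer's method; the class name WordMasker and the method mask are hypothetical, and the word list is assumed to contain no empty strings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: masks whole words only, keeping their first letter.
class WordMasker {
    static String mask(String text, List<String> words) {
        if (words.isEmpty()) {
            return text;
        }
        // Build \b(?:word1|word2|...)\b with each word quoted literally.
        StringBuilder alternation = new StringBuilder();
        for (String w : words) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(w));
        }
        Pattern p = Pattern.compile("\\b(?:" + alternation + ")\\b", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String hit = m.group();
            StringBuilder stars = new StringBuilder().append(hit.charAt(0));
            for (int i = 1; i < hit.length(); i++) {
                stars.append('*');
            }
            m.appendReplacement(out, Matcher.quoteReplacement(stars.toString()));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(mask("Hi Hill, Howdy!", Arrays.asList("Hi", "Howdy")));
        // prints H* Hill, H****!
    }
}
```

Because of the word boundaries, "Hill" is left alone even though it starts with "Hi".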
Try something like this, with a map that contains your replacement rules:
String input; //input string
Map<String,String> mapReplace = new HashMap<String,String>();
mapReplace.put("Hello","H****");
Iterator<String> keys = mapReplace.keySet().iterator();
while(keys.hasNext()){
String key = keys.next();
// replace every occurrence of the key with its masked form
input = input.replace(key, mapReplace.get(key));
}

Find word using Java

I am trying to write a Java class to find word surrounded by ( ) in text file and output the word and its occurrences in different line.
How can I write this in Java?
Input file
School (AAA) to (AAA) 10/22/2011 ssss(ffs)
(ffs) 7368 House 8/22/2011(h76yu) come 789 (AAA)
Car (h76yu) to (h76yu) extract9998790
2/3/2015 (AAA)
Output file
(AAA) 4
(ffs) 2
(h76yu) 3
This is what I got so far..
public class FindTextOccurances {
public static void main(String[] args) throws IOException {
int sum=0
String line = value.toString();
for (String word : line.split("(\\W+")) {
if (word.charAt(0) == '(‘ ) {
if (word.length() > 0) {
sum +=line.get();
}
context.write(new Text(word), new IntWritable(sum));
}
}
}
You can find the text between brackets without splitting or using regular expressions like so (assuming that all brackets are closed, and you don't have nested brackets):
int lastBracket = -1;
while (true) {
int start = line.indexOf('(', lastBracket + 1);
if (start == -1) {
break;
}
int end = line.indexOf(')', start + 1);
System.out.println(line.substring(start + 1, end)); // end index is exclusive
lastBracket = end;
}
If you split on "(\W+)" you are going to keep ALL the things that ARE NOT between parentheses (as you are splitting on the parenthesized words).
What you want is a matcher:
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
...
Map<String, Integer> occurrences = new HashMap<>();
// escaped parentheses match literal ( and ); \w+ matches the word inside
Matcher m = Pattern.compile("\\(\\w+\\)").matcher(myString);
while (m.find()) {
String matched = m.group();
String word = matched.substring(1, matched.length() - 1); // remove parentheses
occurrences.put(word, occurrences.getOrDefault(word, 0) + 1);
}
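Applied to the sample input from the question, the matcher approach could be put together as in this sketch. It keeps the parentheses in the key, since the desired output shows them, and uses a TreeMap only so that the output order is predictable; the class name ParenCounter is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: counts tokens of the form (word) across several lines.
class ParenCounter {
    private static final Pattern TOKEN = Pattern.compile("\\(\\w+\\)");

    static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = TOKEN.matcher(line);
            while (m.find()) {
                counts.merge(m.group(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "School (AAA) to (AAA) 10/22/2011 ssss(ffs)",
                "(ffs) 7368 House 8/22/2011(h76yu) come 789 (AAA)",
                "Car (h76yu) to (h76yu) extract9998790",
                "2/3/2015 (AAA)");
        for (Map.Entry<String, Integer> e : count(lines).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
        // prints:
        // (AAA) 4
        // (ffs) 2
        // (h76yu) 3
    }
}
```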
This may help. I did it with regular expressions. I did not declare all the variables (such as file), so adjust them to your needs. I hope this solves your problem.
Map<String, Integer> counts = new HashMap<String, Integer>();
Pattern token = Pattern.compile("\\([A-Za-z0-9]+\\)"); // a word wrapped in ( )
BufferedReader fr = new BufferedReader(new InputStreamReader(new FileInputStream(file), "ASCII"));
String line;
while ((line = fr.readLine()) != null)
{
Matcher m = token.matcher(line); // those are your words
while (m.find())
{
String a = m.group();
Integer old = counts.get(a);
counts.put(a, old == null ? 1 : old + 1);
}
}
fr.close();
for (Map.Entry<String, Integer> e : counts.entrySet())
{
System.out.println(e.getKey() + " " + e.getValue());
}

reading lines in from a text file and sorting them into a linked list java

I need to read in each line from a text file, and sort it according to length first, and then its position in the original document, before adding the lines to a linked list.
The contents of the list must then be printed out, line by line, with a prefix indicating which line number is being printed out, and how many non-space characters are on the line.
Below is a sample I/O:
Input (i.e. contents of text file)
this
is
just
a
test
Output
1/1: a
2/2: is
3/4: this
4/4: just
5/4: test
You'll need to use a File and a Scanner. The code would look something like this:
import java.io.*;
import java.util.Scanner;
public class ReadAndWrite {
public static void main(String[] args) throws IOException {
Scanner scan = new Scanner(new File("yourfile.txt"));
int i = 1;
while(scan.hasNextLine()) {
String s = scan.nextLine();
System.out.println("Line " + i + " says " + s + " and has " + s.length() + " characters.");
i++;
}
scan.close();
System.out.println("\nNo more lines in the file.");
}
}
"I need to read in each line from a text file": use FileReader and BufferedReader.
"and sort it according to length first, and then its position in the original document, before adding the lines to a linked list": create a HashMap of (String, lineNo) from the original document.
Use a Comparator to sort, first by length, then by line position (get it from the HashMap), for example with the ternary operator.
"how many non-space characters are on the line": split the line using "\\s+" and add up the lengths of all the resulting substrings in a for loop.
While printing from the list, print the count, the number of non-space characters in the line, and the line itself.
Hope this will be sufficient.
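The steps above can be sketched as follows. This is a minimal sketch that assumes "length" means the number of non-space characters; it relies on List.sort being stable, so lines of equal length keep their original order without a separate line-number map. The names LineSorter, nonSpaceLength and format are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch: sorts lines by non-space length, then formats them.
class LineSorter {
    // Counts the characters on the line that are not whitespace.
    static int nonSpaceLength(String line) {
        return line.replaceAll("\\s", "").length();
    }

    static List<String> format(List<String> lines) {
        LinkedList<String> sorted = new LinkedList<>(lines);
        // Stable sort: ties keep their position from the original document.
        sorted.sort(Comparator.comparingInt(LineSorter::nonSpaceLength));
        List<String> out = new ArrayList<>();
        int n = 1;
        for (String line : sorted) {
            out.add(n + "/" + nonSpaceLength(line) + ": " + line);
            n++;
        }
        return out;
    }

    public static void main(String[] args) {
        for (String row : format(Arrays.asList("this", "is", "just", "a", "test"))) {
            System.out.println(row);
        }
        // prints:
        // 1/1: a
        // 2/2: is
        // 3/4: this
        // 4/4: just
        // 5/4: test
    }
}
```

The input list could be filled from the file with a BufferedReader loop or with Files.readAllLines.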
Instead of solving it for you I will provide you with various links which will help you solve your assignment.
1) Reading a file in Java
2) Various string operations which can be performed on the string read: String operations
3) Sorting collections in Java using comparators: Collection sorting
import java.util.*;
import java.io.*;
public class HelloWorld{
public static class mystruct {
public String line;
public int number;
public mystruct(String line, int count) {
this.line = line;
this.number = count;
}
}
public static void main(String []args){
LinkedList<mystruct> list = new LinkedList<mystruct>();
mystruct temp;
int count=0;
try{
FileInputStream fstream = new FileInputStream("input.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String readline;
while ((readline = br.readLine()) != null) {
count++;
temp = new mystruct(readline, count);
list.add(temp);
}
br.close();
} catch (Exception e) {
System.err.println("Error: " + e.getMessage());
}
Collections.sort(list, new Comparator<mystruct>() {
public int compare(mystruct o1, mystruct o2) {
if (o1.line.length() != o2.line.length())
return (o1.line.length() - o2.line.length());
else {
return (o1.number - o2.number);
}
}
});
for (int i = 0; i < list.size(); i++) {
temp = list.get(i);
System.out.println(temp.line);
}
}
}
