.contains not working when reading from a text file? - java

Recently started Java and have been trying to make a database sort of program which reads from a preset text file; the user can either search for a definition using the term, or search using keywords/terms within the definition itself. Searching by term works fine, but searching by key term always outputs not found.
FileReader fr = new FileReader("text.txt");
BufferedReader br = new BufferedReader(fr);
boolean found = false;
String line = br.readLine(); // first line so the term itself
String lineTwo = br.readLine(); // second line which is the definition
do {
    if (lineTwo.toLowerCase().contains(keyterm.toLowerCase())) {
        found = true;
        System.out.println("Found " + keyterm);
        System.out.println(line);
        System.out.println(lineTwo);
    }
} while ((br.readLine() != null) & (!found));
if (!found) {
    System.out.println("Not Found");
}
br.close();
fr.close();
This is the method I use to check for the key term, and it only works partially: it seems able to read only the first two lines. That means it outputs the definition of the first term if the key term is there, but it doesn't work for any of the other terms.
edit
The text file it reads from looks something like this:
term
definition
term
definition
Each have their own line.
Edit 2
Thanks to @Matthew Kerian it now checks through the whole file, after changing the end of the do-while loop to
while (((lineTwo = br.readLine())!=null)&(!found));
It now finds the actual definition but is now outputting the wrong term with it.
Edit 3 The key term is defined by the users input
Edit 4 If it wasn't clear, the output I am looking for in the end is either the definition of the term/key term if it is in the txt file, or just "Not Found" if it isn't.
Edit 5 Tried to look at what it was outputting and noticed it was outputting "array" (the first term in the text file) after every lineTwo; it seems as though line is not updating.
Final Edit Managed to crudely solve the problem by making another text file with the order flipped: instead of term then definition, it goes definition then term. That lets me read the next line once the definition is found, so it outputs the right term.

lineTwo is not being refreshed with new data. Something like this would work better:
do {
    if (lineTwo.toLowerCase().contains(keyterm.toLowerCase())) {
        found = true;
        System.out.println("Found " + keyterm);
        System.out.println(line);
        System.out.println(lineTwo);
    }
} while (((lineTwo = br.readLine()) != null) & (!found));
We're still checking for EOF by checking for null, but by assigning the result of readLine() to lineTwo we're constantly refreshing our buffer.
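For completeness, here is a self-contained sketch that also advances line on every pass, so the term printed alongside a match is always the one the found definition belongs to (the class and method names are mine; the alternating term/definition layout is taken from the question):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class TermSearch {
    // Returns "term: definition" for the first definition containing
    // keyterm (case-insensitive), or null if no definition matches.
    static String findByKeyterm(String path, String keyterm) throws IOException {
        try (BufferedReader br = new BufferedReader(new FileReader(path))) {
            String term;
            // read two lines per pass: the term, then its definition
            while ((term = br.readLine()) != null) {
                String definition = br.readLine();
                if (definition == null) {
                    break; // trailing term with no definition
                }
                if (definition.toLowerCase().contains(keyterm.toLowerCase())) {
                    return term + ": " + definition;
                }
            }
            return null; // not found
        }
    }
}
```

Because both lines are re-read together on each pass, this avoids the stale-term problem noted in Edit 5, and the try-with-resources closes the reader even if an exception is thrown.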

Related

Modifying txt file in Java without creating temp file

I have a .txt file that will be accessed by many users, possibly at the same time (or close to it), and because of that I need a way to modify that txt file without creating a temporary file, and I haven't found an answer or solution to this. So far, I only found this approach:
Take existing file -> modify something -> write it to a new file (temp file) -> delete the old file.
But this approach is no good for me; I need something like: Take existing file -> modify it -> save it.
Is this possible? I'm really sorry if this question already exists; I tried searching Stack Overflow and I read through the Oracle docs but I haven't found a solution that suits my needs.
EDIT:
After modification, the file would stay the same size as before. For example, imagine a list of students where each student can have the value 1 or 0 (passed or failed the exam).
So in this case I would just need to update one character per row in the file (that is, per student). Example:
Lee Jackson 0 -> Lee Jackson 0
Bob White 0 -> would become -> Bob White 1
Jessica Woo 1 -> Jessica Woo 1
In the example above we have a file with 3 records one below the other, and I need to update the 2nd record while the 1st and 3rd stay the same, all without creating a new file.
Here's a potential approach using RandomAccessFile. The idea is to use readLine() to read the file as strings, but to remember the position in the file so you can go back there and write a new line. It's still risky in case anything in the text encoding changes the byte length, because that could overwrite the line break, for example.
void modifyFile(String file) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
        long beforeLine = raf.getFilePointer();
        String line;
        while ((line = raf.readLine()) != null) {
            // edit the line while keeping its length identical
            if (line.endsWith("0")) {
                line = line.substring(0, line.length() - 1) + "1";
            }
            // go back to the beginning of the line
            raf.seek(beforeLine);
            // overwrite the bytes of that line
            raf.write(line.getBytes());
            // advance past the line break
            String ignored = raf.readLine();
            // and remember that position again
            beforeLine = raf.getFilePointer();
        }
    }
}
Handling correct String encoding is tricky in this case. If the file isn't in the encoding used by readLine() and getBytes(), you could work around that by doing:
// file is in "iso-1234" encoding which is made up.
// reinterpret the byte as the correct encoding first
line = new String(line.getBytes("ISO-8859-1"), "iso-1234");
... modify line
// when writing use the expected encoding
raf.write(line.getBytes("iso-1234"));
See How to read UTF8 encoded file using RandomAccessFile?
Try storing the changes you want to make to the file in RAM (a string, or a linked list of strings). If you read the file into a linked list of strings (one per line of the file), write a function to merge the string you want to insert into that linked list of lines, and then rewrite the file entirely by putting down every line from the linked list, it should give you what you want. Here's what I mean in pseudocode; the order is important here.
By reading in the whole file first and only writing after input, we minimize interference with other users.
String lineYouWantToWrite = yourInput
LinkedList<String> list = new LinkedList<String>()
while (file has another line)
list.add(file's next line)
add your string to whatever index of list you want
write list to file line by line, file's first line = list[1]...
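A minimal Java sketch of the pseudocode above, assuming the student-list layout from the question (the class and method names are mine):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class RewriteInPlace {
    // Reads every line into memory, replaces the line at the given
    // zero-based index, and writes the whole list back over the same file.
    static void replaceLine(Path file, int index, String newLine) throws IOException {
        List<String> lines = Files.readAllLines(file); // whole file in RAM
        lines.set(index, newLine);                     // modify in memory
        Files.write(file, lines);                      // overwrite the file
    }
}
```

Note that this still rewrites the entire file contents; it just does so over the same file rather than via a temp file, so on its own it does not remove the risk of two users writing at the same time.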

I am trying to read a csv file in java. Below is my code

final BufferedReader bufferedReader = new BufferedReader(
        new InputStreamReader(file.getInputStream(entry)));
String line = "";
while ((line = bufferedReader.readLine()) != null) {
    System.out.println("line" + line);
    final String[] rows = line.split(",");
}
this is my csv file
" 9:42:43AM","Aug 20, 2015","RaceSummary","Page:1","Id","Race","Type","Rot.","District","PrideFor","ArtSeq","ReportSeq","Content","Type","Md","Bar Group","1","LINC ADAPTER SECTION 4","Content","N","A - ARLIN","1","1","1","Oscar James, Sr.","Content","0","<N.P.>"
I am trying to print the columns I mentioned in the csv, but I don't know why my output gets up to "Pride" as one line and "For" as another line; the same thing repeats for the next two values ("ArtSeq", "ReportSeq"). Can anyone suggest where I went wrong?
Thanks.
As you can see, the second value in your input contains a comma ("Aug 20, 2015"), which leads to more splits than you expect.
Example :
You would expect this " 9:42:43AM","Aug 20, 2015" to be 2 parts, but it will be three:
[0]" 9:42:43AM"
[1]"Aug 20
[2] 2015"
You can change your split to be
line.split("\",\"");
I believe that should solve your problem.
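A quick sketch of the difference (the sample line is shortened from the question):

```java
public class SplitDemo {
    public static void main(String[] args) {
        String line = "\" 9:42:43AM\",\"Aug 20, 2015\",\"Race Summary\"";
        // Naive split on every comma breaks the quoted date apart:
        String[] naive = line.split(",");
        System.out.println(naive.length); // 4 pieces instead of 3
        // Splitting on the "," sequence keeps quoted values intact:
        String[] better = line.split("\",\"");
        System.out.println(better.length); // 3 pieces
        System.out.println(better[1]);     // Aug 20, 2015
    }
}
```

The "," split still leaves a stray quote on the first and last fields, so you would strip those separately; a real CSV parser handles quoting more robustly.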
Based on the output you provided...
line" 9:42:43AM","Aug 20, 2015","Race Summary","Page: 1","Id","Race","Type","Rot.","District","Pride
lineFor","Art lineSeq","Report lineSeq","Content","Type","Md","Bar Group","1","LINC ADAPTER SECTION 4","Content","N","A - ARLIN","1","1","1","Oscar James, Sr.","Content","0","<N.P.>"
Considering it is different from your input, I'd guess there might be a special character or something in the input file (for example, a tab or a line break). This is causing your while loop to read the first line (up to the line break) and then read the next line. If you put both of these onto the same line in the file it will probably work better.
I should clarify as well: nothing in the code you posted would cause this behaviour, so it is either somewhere else in your code or in the file itself.

Scanner's nextLine(), Only fetching partial

So, using something like:
for (int i = 0; i < files.length; i++) {
    if (!files[i].isDirectory() && files[i].canRead()) {
        try {
            Scanner scan = new Scanner(files[i]);
            System.out.println("Generating Categories for " + files[i].toPath());
            while (scan.hasNextLine()) {
                count++;
                String line = scan.nextLine();
                System.out.println(" ->" + line);
                line = line.split("\t", 2)[1];
                System.out.println("!- " + line);
                JsonParser parser = new JsonParser();
                JsonObject object = parser.parse(line).getAsJsonObject();
                Set<Entry<String, JsonElement>> entrySet = object.entrySet();
                exploreSet(entrySet);
            }
            scan.close();
            // System.out.println(keyset);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}
as one goes over a Hadoop output file, one of the JSON objects in the middle is breaking... because scan.nextLine() is not fetching the whole line before handing it to split. That is, the output is:
->0 {"Flags":"0","transactions":{"totalTransactionAmount":"0","totalQuantitySold":"0"},"listingStatus":"NULL","conditionRollupId":"0","photoDisplayType":"0","title":"NULL","quantityAvailable":"0","viewItemCount":"0","visitCount":"0","itemCountryId":"0","itemAspects":{ ... "sellerSiteId":"0","siteId":"0","pictureUrl":"http://somewhere.com/45/x/AlphaNumeric/$(KGrHqR,!rgF!6n5wJSTBQO-G4k(Ww~~
!- {"Flags":"0","transactions":{"totalTransactionAmount":"0","totalQuantitySold":"0"},"listingStatus":"NULL","conditionRollupId":"0","photoDisplayType":"0","title":"NULL","quantityAvailable":"0","viewItemCount":"0","visitCount":"0","itemCountryId":"0","itemAspects":{ ... "sellerSiteId":"0","siteId":"0","pictureUrl":"http://somewhere.com/45/x/AlphaNumeric/$(KGrHqR,!rgF!6n5wJSTBQO-G4k(Ww~~
Most of the above data has been sanitized (not the URL (for the most part) however... )
and the URL continues as:
$(KGrHqZHJCgFBsO4dC3MBQdC2)Y4Tg~~60_1.JPG?set_id=8800005007
in the file....
So it's slightly miffing.
This is also entry #112, and I have had other files parse without errors... but this one is screwing with my mind, mostly because I don't see how scan.nextLine() isn't working...
By debug output, the JSON error is caused by the string not being split properly.
And almost forgot, it also works JUST FINE if I attempt to put the offending line in its own file and parse just that.
EDIT:
Also blows up if I remove the offending line in about the same place.
Attempted with JVM 1.6 and 1.7
Workaround Solution:
BufferedReader scan = new BufferedReader(new FileReader(files[i]));
instead of scanner....
Based on your code, the best explanation I can come up with is that the line really does end after the "~~" according to the criteria used by Scanner.nextLine().
The criteria for an end-of-line are:
Something that matches this regex: "\r\n|[\n\r\u2028\u2029\u0085]" or
The end of the input stream
You say that the file continues after the "~~", so let's put EOF aside and look at the regex. That will match any of the following:
The usual line separators:
<CR>
<NL>
<CR><NL>
... and three unusual forms of line separator that Scanner also recognizes.
0x0085 is the <NEL> or "next line" control code in the "ISO C1 Control" group
0x2028 is the Unicode "line separator" character
0x2029 is the Unicode "paragraph separator" character
My theory is that you've got one of the "unusual" forms in your input file, and this is not showing up in .... whatever tool it is that you are using to examine the files.
I suggest that you examine the input file using a tool that can show you the actual bytes of the file; e.g. the od utility on a Linux / Unix system. Also, check that this isn't caused by some kind of character encoding mismatch ... or trying to read or write binary data as text.
If these don't help, then the next step should be to run your application using your IDE's Java debugger, and single-step it through the Scanner.hasNextLine() and nextLine() calls to find out what the code is actually doing.
And almost forgot, it also works JUST FINE if I attempt to put the offending line in its own file and parse just that.
That's interesting. But if the tool you are using to extract the line is the same one that is not showing the (hypothesized) unusual line separator, then this evidence is not reliable. The process of extraction may be altering the "stuff" that is causing the problems.
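The theory is easy to test directly: Scanner's line pattern recognizes U+2028 and friends, while BufferedReader.readLine() only breaks on \n, \r, and \r\n, which is also why the BufferedReader workaround keeps the line intact. A minimal sketch:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Scanner;

public class LineSeparatorDemo {
    public static void main(String[] args) throws IOException {
        // U+2028 (LINE SEPARATOR) hidden in what looks like one line:
        String data = "first\u2028second\n";
        // Scanner treats U+2028 as a line break...
        Scanner scan = new Scanner(data);
        System.out.println(scan.nextLine()); // first
        System.out.println(scan.nextLine()); // second
        // ...while BufferedReader only breaks on \n, \r, and \r\n:
        BufferedReader br = new BufferedReader(new StringReader(data));
        System.out.println(br.readLine().length()); // 12 ("first" + U+2028 + "second")
    }
}
```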

How to skip rest of line using BufferedReader

In my program users frequently search a .txt file for certain information. To know if the right bit of data has been found I first check each line to see if it starts with a special character signalling the start of a group of data, something like this:
//one character has so far been read
if(character == '#'){
//continue to examine data
}else{
//skip the rest of the line
}
The problem I'm having is how to actually "skip the rest of the line", if the line did not start with my special character of choice.
As per complaints about insufficient information: I am indeed using a while loop to read each line
You can just do the action inside the if:
BufferedReader csvFile = new BufferedReader(
        new InputStreamReader(inputStream));
String csvLine;
while ((csvLine = csvFile.readLine()) != null) {
    if (!csvLine.isEmpty() && csvLine.charAt(0) == '#') {
        // do # data action here
    }
    // any other line is skipped automatically, because
    // readLine() has already consumed the rest of it
}
Use the Scanner class and the nextLine() method; it will help you a lot.
In case that seems a bit difficult, read the file line by line and then use a regex pattern to check for your required pattern in that line of the file.

StringBuilders ending with mass nul characters

I'm having a very difficult time debugging a problem with an application I've been building. I cannot seem to reproduce the problem with a representative test program, which makes it difficult to demonstrate. Unfortunately I cannot share my actual source because of security; however, the following test represents fairly well what I am doing: the files and data are unix-style EOL, writing to a zip file with a PrintWriter, and the use of StringBuilders:
public class Tester {
    public static void main(String[] args) {
        // variables
        File target = new File("TESTSAVE.zip");
        PrintWriter printout1;
        ZipOutputStream zipStream;
        ZipEntry ent1;
        StringBuilder testtext1 = new StringBuilder();
        StringBuilder replacetext = new StringBuilder();
        // ensure file replace
        if (target.exists()) {
            target.delete();
        }
        try {
            // open the streams
            zipStream = new ZipOutputStream(new FileOutputStream(target, true));
            printout1 = new PrintWriter(zipStream);
            ent1 = new ZipEntry("testfile.txt");
            zipStream.putNextEntry(ent1);
            // construct the data
            for (int i = 0; i < 30; i++) {
                testtext1.append("Testing 1 2 3 Many! \n");
            }
            replacetext.append("Testing 4 5 6 LOTS! \n");
            replacetext.append("Testing 4 5 6 LOTS! \n");
            // the replace operation
            testtext1.replace(21, 42, replacetext.toString());
            // write it
            printout1 = new PrintWriter(zipStream);
            printout1.println(testtext1);
            // save it
            printout1.flush();
            zipStream.closeEntry();
            printout1.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The heart of the problem is that the file I see on my side is 16.3k characters. My friend, whether he uses the app on his PC or looks at exactly the same file as me, sees a file of 19.999k characters, the extra characters being a CRLF followed by a massive number of NUL characters. No matter what application, encoding or view I use, I cannot see these NUL characters at all; I only see a single LF at the last line, but I do see a file of 20k. In all cases there is a difference between what is seen with the exact same files on the two machines, even though both are Windows machines and both are using the same editing software to view them.
I've not yet been able to reproduce this behaviour with any number of dummy programs. I have, however, been able to trace the final line's stray CRLF to my use of println on the PrintWriter. When I replaced the println(s) with print(s + '\n') the problem appeared to go away (the file size was 16.3k). However, when I returned the program to println(s), the problem did not reappear. I'm currently having the files verified by a friend in France to see if the problem really did go away (since I cannot see the NULs but he can), but this behaviour has me thoroughly confused.
I've also noticed that StringBuilder's replace function states "This sequence will be lengthened to accommodate the specified String if necessary". Given that StringBuilder's setLength function pads with NUL characters, and that the ensureCapacity function sets capacity to the greater of the input or (currentCapacity*2)+2, I suspected a relation somewhere. However, only once while testing this idea have I been able to get a result that matched what I've seen, and I have not been able to reproduce it since.
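The setLength padding behaviour mentioned above is easy to confirm in isolation:

```java
public class PadDemo {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("abc");
        // Growing the length pads the new positions with '\u0000' (NUL):
        sb.setLength(6);
        System.out.println(sb.length());              // 6
        System.out.println(sb.charAt(5) == '\u0000'); // true
    }
}
```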
Does anyone have any idea what could be causing this error or at least have a suggestion on what direction to take the testing?
Edit since the comments section is broken for me:
Just to clarify, the output is required to be in unix format regardless of the OS, hence the use of '\n' directly rather than through a formatter. The original StringBuilder that is inserted into is not in fact generated to me but is the contents of a file read in by the program. I'm happy the reading process works, as the information in it is used heavily throughout the application. I've done a little probing too and found that directly prior to saving, the buffer IS the correct capacity and that the output when toString() is invoked is the correct length (i.e. it contains no null characters and is 16,363 long, not 19,999). This would put the cause of the error somewhere between generating the string and saving the zip file.
Finally found the cause. Managed to reproduce the problem a few times and traced the cause down not to the output side of the code but the input side. My file reading function was essentially this:
char[] buf;
int charcount = 0;
StringBuilder line = new StringBuilder(2048);
InputStreamReader reader = new InputStreamReader(stream);
BufferedReader file = new BufferedReader(reader);
do { // capture loop
    try {
        buf = new char[2048];
        charcount = file.read(buf, 0, 2048);
    } catch (IOException e) {
        return null; // unknown IO error
    }
    line.append(buf);
} while (charcount != -1);
// close and output
// close and output
The problem was appending a buffer that wasn't full, so the later values were still at their initial value of NUL. The reason I couldn't reproduce it was that some data filled the buffers nicely and some didn't.
Why I couldn't see the problem in my text editors I still have no idea, but I should be able to resolve this now. Any suggestions on the best way to do so are welcome; as this is part of one of my long-term utility libraries, I want to keep it as generic and optimised as possible.
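For reference, a corrected version of the read loop: the fix is to use the three-argument append so only the characters actually read on each pass are kept (the wrapper class and method name are mine):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

public class ReadAll {
    // Reads the entire stream into a String, appending only the
    // characters actually read instead of the whole (possibly
    // half-empty) buffer, so no stray NULs are introduced.
    static String readAll(Reader reader) throws IOException {
        BufferedReader file = new BufferedReader(reader);
        StringBuilder line = new StringBuilder(2048);
        char[] buf = new char[2048];
        int charcount;
        while ((charcount = file.read(buf, 0, 2048)) != -1) {
            line.append(buf, 0, charcount); // never append unread NULs
        }
        return line.toString();
    }
}
```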
