JLine, how to get the full filename in Windows - java

Dear all JLine users,
I am currently developing a console application where I use JLine to provide command and file name completion.
It works pretty well with FileNameCompleter, but I cannot get the full file name right.
My code is like below:
List<Completer> loadCompleter =
    Arrays.asList(
        new StringsCompleter(commands),
        new FileNameCompleter(),
        new NullCompleter());
console.addCompleter(new ArgumentCompleter(loadCompleter));
while ((line = console.readLine()) != null) {
    line = line.trim();
    // here I print out the line character by character
    char[] result = line.toCharArray();
    for (int i = 0; i < result.length; i++) {
        System.out.println(result[i] + " : " + (int) result[i]);
    }
}
In the last part of my code I print out the line I got from the console. If, for example, I have entered
myCommand test\new\test.txt
the output is myCommand testnewtest.txt
The backslashes are gone for some reason, so I never get the right file path.
This is not an issue when I test on a Unix-like system, since forward slashes work fine.
Can anyone help me on the right way of getting the full filename? Many thanks.
Si.

JLine eats backslashes because they are used to escape special characters such as !. You can disable special-character expansion (and the loss of backslashes) by adding the following to your ConsoleReader initialization:
console.setExpandEvents(false);
Alternatively, if you do want to retain special characters, you need to double up your backslashes (so instead of foo\bar, input foo\\bar).

It works for me now; I am using jline-2.11.jar.
See https://github.com/Qatar-Computing-Research-Institute/NADEEF/blob/master/console/src/qa/qcri/nadeef/console/Console.java


Java String contains/indexof fails due to wrong encoding from local file

EDIT:
I have a semi-working solution at the bottom.
Or, the original text:
I have a local CSV file. The file is encoded in utf16le. I want to read the file into memory in java, modify it, then write it out. I have been having incredibly strange problems for hours.
The source of the file is Facebook lead generation. It is a CSV. Each line of the file contains the text "2022-08-08". However, when I read in the line with a BufferedReader, all String methods fail. contains("2022-08-08") returns false. I print out the line directly after checking, and it indeed contains the text "2022-08-08". So the String methods are totally failing.
I think it's possibly due to encoding but I'm not sure. I tried pasting the code into this website for help, but any part of the code that includes copy pasted strings from the CSV file refuses to paste into my browser.
int i = s.indexOf("2022");
if (i < 0) {
    System.out.println(s.contains("2022") + ", " + s);
    continue;
}
Prints: false, 2022-08-08T19:57:51+07:00
There are tons of invisible characters in the CSV file and in my IDE everywhere I have copy pasted from the file. I know the characters are there because when I backspace them it deletes the invisible character instead of the actual character I would expect it to delete.
Please help me.
EDIT:
This code appears to fix the problem. I think the problem is partially Facebook's encoding of the file, and partially that the file comes from user-generated inputs containing a few very strange entries. If anyone has more to add or a better solution I will award it. Not sure exactly why it works; combined from different sources that had sparse explanations.
Is there a way to determine the encoding automatically? Windows Notepad is able to do it.
BufferedReader fr = new BufferedReader(new InputStreamReader(
        new FileInputStream(new File("C:\\New folder\\form.csv")), "UTF-16LE"));
BufferedWriter fw = Files.newBufferedWriter(Paths.get("C:\\New folder", "form3.txt"));
String s;
while ((s = fr.readLine()) != null) {
    s = s.replaceAll("\\p{C}", "?")
         .replaceAll("[^A-Za-z0-9],", "")
         .replaceAll("[^\\x00-\\x7F]", "");
    // do stuff with s normally
}
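On the question of whether the encoding can be determined automatically: Notepad guesses by sniffing the first few bytes of the file. A minimal sketch of the same idea, checking for a byte-order mark, follows; the class and method names are my own, and note that files without a BOM cannot be reliably detected this way.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomSniffer {
    // Best-guess charset from the byte-order mark (BOM) at the start of
    // the file. Purely a heuristic: no BOM means we fall back to a default.
    static Charset guessCharset(byte[] head, Charset fallback) {
        if (head.length >= 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        return fallback;
    }

    public static void main(String[] args) {
        // First bytes of a UTF-16LE file exported with a BOM: FF FE ...
        byte[] utf16le = {(byte) 0xFF, (byte) 0xFE, '2', 0};
        System.out.println(guessCharset(utf16le, StandardCharsets.UTF_8));
    }
}
```

In practice you would read the first few bytes of the CSV with a FileInputStream, call guessCharset, and pass the result to the InputStreamReader instead of hardcoding "UTF-16LE".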
You can verify what you're getting from the stream by
byte[] b = s.getBytes(StandardCharsets.UTF_16BE);
System.out.println(Arrays.toString(b));
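To see how an invisible character defeats contains(), here is a small self-contained demonstration; the embedded U+FEFF is a stand-in I chose for whatever format characters the exported CSV actually contains.

```java
public class InvisibleChars {
    public static void main(String[] args) {
        // U+FEFF (zero-width no-break space) hidden inside the date makes
        // contains() fail even though the text looks identical when printed.
        String s = "2\uFEFF022-08-08T19:57:51+07:00";
        System.out.println(s.contains("2022"));
        // Stripping invisible characters (Unicode category C) fixes it,
        // which is what the \p{C} replaceAll above is doing.
        System.out.println(s.replaceAll("\\p{C}", "").contains("2022"));
    }
}
```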
I think the search condition for indexOf could be wrong:
int i = s.indexOf("2022");
if (i < 0) {
    System.out.println(s.contains("2022") + ", " + s);
    continue;
}
Maybe the condition should be (i != -1), if I'm not wrong too much.
It's a little tricky, because with (i < 0) the branch only runs when the string does not contain "2022".

Strings act weirdly when reading them from a file with the java.util.Scanner and using linebreaks as delimiter

I try to read data from a file using java.util.Scanner. When I try to use \n as a delimiter, the resulting Strings react weirdly when I try to add more text to them.
I have a file called "test.txt" and try to read data from it. I then want to add more text to each String, similar to how this would print Hello World!:
String helloWorld = "Hello " + "World!";
System.out.println(helloWorld);
I tried combining data with +, I tried += and I tried String.concat(), this has worked for me before and usually still works.
I also tried to use different delimiters, or no delimiter at all, both of those work as I expect, but I need the Strings to be separated at line breaks.
The test.txt file for the minimal reproducible example contains this text (there is a space at the end of each line):
zero:
one:
two:
three:
void minimalReproducibleExample() throws Exception { // may throw an exception if the test.txt file can't be found
    String[] data = new String[4];
    java.io.File file = new java.io.File("test.txt");
    java.util.Scanner scanner = new java.util.Scanner(file).useDelimiter("\n");
    for (int i = 0; i < 4; i++) {
        data[i] = scanner.next();    // read the next line
        data[i] += i;                // add a number at the end of the String
        System.out.println(data[i]); // print the String with the number
    }
    scanner.close();
}
I expect this code to print out these lines:
zero: 0
one: 1
two: 2
three: 3
I get this output instead:
0ero:
1ne:
2wo:
three: 3
Why do I not get the expected output when using \n as a delimiter?
The test.txt is most likely using the Windows end-of-line representation \r\n, which results in the carriage return \r still being present in the String after it's read.
Ensure that test.txt uses \n as the line delimiter, or use the Windows \r\n in Scanner.useDelimiter().
The problem is most likely a wrong delimiter. If you use Windows 10, the newline separator is \r\n.
To be platform-independent, use System.getProperty("line.separator") instead of a hardcoded \n.
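Another option is a delimiter pattern that accepts both endings, so neither has to be hardcoded. A small sketch of the idea, using an in-memory string in place of test.txt:

```java
import java.util.Scanner;

public class DelimiterDemo {
    public static void main(String[] args) {
        // Simulate a Windows-style file: each line ends with \r\n.
        String windowsText = "zero: \r\none: \r\ntwo: \r\nthree: ";
        // The regex \r?\n matches both Unix (\n) and Windows (\r\n) endings,
        // so no stray carriage return is left at the end of each token.
        Scanner scanner = new Scanner(windowsText).useDelimiter("\\r?\\n");
        int i = 0;
        while (scanner.hasNext()) {
            System.out.println(scanner.next() + i++);
        }
        scanner.close();
    }
}
```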

How to remove single quotes with double quotes in java

I don't have any idea about regex pattern matching, and I am having a problem with single quotes in a file path. When I run a batch file through the exec() command, I get the following error:
Windows cannot find 'C:\Program'.
I am having trouble with single quotes when CMD tries to get into the desired directory.
So, could anyone tell me what to do here?
I created a batch file to compile and run Java programs. I have a function called createrunbat(String, String) with the following code:
private File createrunbat(String str, String par)
{
    if (str.startsWith("Text Editor-", 0))
    {
        str = str.replaceFirst("Text Editor-", "");
    }
    String sng, s2;
    File fe;
    try {
        FileOutputStream fos;
        DataOutputStream dos;
        sng = str;
        int a = sng.indexOf(".");
        sng = sng.substring(0, a);
        file = new File(jfc.getSelectedFile().getParent(), sng + ".bat");
        fd = file.getAbsoluteFile();
        str = fd.getParent().substring(0, 2);
        fos = new FileOutputStream(file);
        dos = new DataOutputStream(fos);
        dos.writeBytes("@echo off \n");
        dos.writeBytes("cd\\" + "\n");
        if (fd.getParentFile().isDirectory())
        {
            dos.writeBytes(str + "\n");
        }
        s2 = jfc.getSelectedFile().getParent(); // I am having the single quote problem from here
        dos.writeBytes("cd " + s2 + "\\" + "\n");
        dos.writeBytes("javac " + sng + ".java" + "\n");
        dos.writeBytes("java " + sng + " " + par + "\n");
        dos.writeBytes("pause \n");
        dos.writeBytes("exit \n");
        dos.close();
    }
    catch (FileNotFoundException ex)
    {
    }
    catch (IOException ex2)
    {
        JOptionPane.showMessageDialog(this, ex2.toString());
    }
    return fd;
}
I think this is rather a case of the blank in the path name causing trouble; you will need to wrap quotes around the path:
dos.writeBytes("cd \"" + s2 + "\"" + "\n");
You are perhaps confusing the error output with the input.
Windows cannot find 'C:\Program'.
The single quotes there are used to wrap the problematic data so you, the developer, know the boundaries of the input causing issues. The single quotes are not part of what your program interprets.
As others have suggested, I imagine the real issue is the whitespace in your path. Your command line is reading the path as two separate arguments instead of one. Ironically, wrapping the path in quotes should fix the issue.
'C:\Program Files\SomePlace\...'
^ gets cut on whitespace and becomes two arguments instead of one:
'C:\Program' and 'Files\SomePlace\...'
'"C:\Program Files\SomePlace\..."'
^ quotes will keep the path together as a single argument
Edit: How to wrap the path.
Java1 has a good solution in their answer, so I'll offer an alternative that uses String formatting.
String safePath = String.format("\"%s\"", jfc.getSelectedFile().getParent().getAbsolutePath());
In this instance, the first argument to the String.format() method is the pattern to use, the second is the variable to substitute.
The actual quotes that will be around the path must be escaped (\") as they have a special meaning in Java, denoting the start or end of a String. You must escape them to use them inside a String. The placeholder (%s) is where your path will be placed.
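Put together, a runnable sketch of the quoting; the path and class name here are made up for illustration:

```java
public class QuotePath {
    // Wrap a path in double quotes so cmd.exe treats it as a single
    // argument even when it contains spaces.
    static String quote(String path) {
        return String.format("\"%s\"", path);
    }

    public static void main(String[] args) {
        String raw = "C:\\Program Files\\SomePlace\\run.bat";
        // Without the quotes, cmd.exe would split this at "C:\Program".
        System.out.println("cd " + quote(raw));
    }
}
```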
Side note:
You should really use much more descriptive variable names in your code. It is quite bad practice to use names like s2, sng, fe, fd and so on. Be descriptive and exact with your naming, and following, debugging and writing your code will become easier.

Java BufferedWriter Creating Null Characters

I've been using Java's BufferedWriter to write to a file to parse out some input. When I open the file after, however, there seems to be added null characters. I tried specifying the encoding as "US-ASCII" and "UTF8" but I get the same result. Here's my code snippet:
Scanner fileScanner = new Scanner(original);
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file), "US-ASCII"));
while (fileScanner.hasNextLine())
{
    String next = fileScanner.nextLine();
    next = next.replaceAll(".*\\x0C", ""); // remove up to ^L
    out.write(next);
    out.newLine();
}
out.flush();
out.close();
Maybe the issue isn't even with the BufferedWriter?
I've narrowed it down to this code block because if I comment it out, there are no null-characters in the output file. If I do a regex replace in VIM the file is null-character free (:%s/.*^L//g).
Let me know if you need more information.
Thanks!
EDIT:
hexdump of a normal line looks like:
0000000 5349 2a41 3030 202a
But when this code is run the hexdump looks like:
0000000 5330 2a49 4130 202a
I'm not sure why things are getting mixed up.
EDIT:
Also, even if the file doesn't match the regex and runs through that block of code, it comes out with null characters.
EDIT:
Here's a hexdump of the first few lines of a diff:
http://pastie.org/pastes/8964701/text
command was: diff -y testfile.hexdump expectedoutput.hexdump
The rest of the lines are different like the last two.
EDIT: Looking at the hexdump diff you gave, the only difference is that one has LF line endings (0A) and the other has CRLF line endings (0D 0A). All the other data in your diff is shifted ahead to accommodate the extra byte.
The CRLF is the default line ending on the OS you're using. If you want a specific line ending in your output, write the string "\n" or "\r\n".
Previously I noted that the Scanner doesn't specify a charset. It should specify the appropriate one that the input is known to be encoded in. However, this isn't the source of the unexpected output.
Scanner.nextLine() is eating the existing line endings.
The javadoc for nextLine states:
This method returns the rest of the current line, excluding any line separator at the end.
The javadoc for BufferedWriter.newLine explains:
Writes a line separator. The line separator string is defined by the system property line.separator, and is not necessarily a single newline ('\n') character.
In your case your system's default newline separator is "\n". The EDI file you are parsing uses "\r\n".
Using the system-defined newline separator isn't the appropriate thing to do in this case. The newline separator to use is dictated by the file format and should be put in a format-specific static constant somewhere.
Change "out.newLine();" to "out.write("\r\n");"
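To illustrate the difference, here is a minimal sketch that writes an explicit CRLF and dumps the resulting bytes; the "ISA*00 *" fragment mimics the EDI-style header visible in the hexdump above, and the class name is mine:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

public class LineEndingDemo {
    public static void main(String[] args) throws IOException {
        StringWriter buf = new StringWriter();
        BufferedWriter out = new BufferedWriter(buf);
        out.write("ISA*00 *");
        // Write the CRLF the file format dictates, instead of relying on
        // the platform-dependent out.newLine().
        out.write("\r\n");
        out.flush();
        // Dump each character as hex, like a byte-level hexdump would show.
        for (char c : buf.toString().toCharArray()) {
            System.out.printf("%02X ", (int) c);
        }
    }
}
```

On a Unix machine, replacing out.write("\r\n") with out.newLine() would drop the 0D byte, which is exactly the one-byte shift seen in the diff.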
I think what is going on is the following:
1. All lines that contain ^L (FF) get modified to remove everything before the ^L.
2. In addition, you have the side effect in 1 that all \r (CR) also get removed.
3. However, if a CR appears before the ^L, nextLine() treats that as a line too.
Note how the number of CR + NL is 6 in the input file below and the number of CR + NL is also 6 in the output file, but they're all NL, so the line with c gets preserved because it's treated as being on a different line than the ^L. Probably not what you want. See below.
Some observations:
The source file is being generated on a system that uses \r\n to define a new line, and your program is being run on a system that does not. Because of this, all occurrences of 0x0d are going to be removed. This will make the two files different sizes even if there are no ^L.
But you probably overlooked #1 because vim will operate in DOS mode (recognize \r\n as a newline separator) or non-DOS mode (only \n) depending on what it reads when it opens the file, and it hides the fact from the user if it can. In fact, to test, I had to brute-force in a \r using ^v^m because I was editing on Linux using vim (more here).
Your means to test is probably od -x (for hex, right)? But that outputs shorts, which is not what you want. Consider the following input file and output file after your program runs, as viewed in vi.
Input file
a
b^M
c^M^M ^L
d^L
Output file
a
b
c
Well, maybe that's right. Let's see what od has to say.
od -x of input File
0a61 0d62 630a 0d0d 0c20 640a 0a0c
od -x of output File
0a61 0a62 0a63 0a0a 000a
Huh, what? Where did that null come from? But wait, from the man page of od:
-t type   Specify the output format. type is a string containing one or more of the following kinds of type specifiers:
          a   Named characters (ASCII). Control characters are displayed using the following names:
-h, -x    Output hexadecimal shorts. Equivalent to -t x2.
-a        Output named characters. Equivalent to -t a.
Oh, OK, so instead use the -a option:
od -a of input
a nl b cr nl c cr cr sp ff nl d ff nl
od -a of output
a nl b nl c nl nl nl nl
Forcing Java to ignore \r
And finally, all that being said, you really have to overcome Java's implicit understanding that \r delimits a line, even contrary to the documentation. Even when explicitly setting the Scanner to use a \r-ignoring pattern, it still operates contrary to the documentation, and you must override that again by setting the delimiter (see below). I've found that the following will probably do what you want by insisting on Unix line semantics. I also added some logic to not output a blank line.
public static void repl(File original, File file) throws IOException
{
    Scanner fileScanner = new Scanner(original);
    Pattern pattern1 = Pattern.compile("(?d).*");
    fileScanner.useDelimiter("(?d)\\n");
    BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file), "UTF8"));
    while (fileScanner.hasNext(pattern1))
    {
        String next = fileScanner.next(pattern1);
        next = next.replaceAll("(?d)(.*\\x0C)|(\\x0D)", "");
        if (next.length() != 0)
        {
            out.write(next);
            out.newLine();
        }
    }
    out.flush();
    out.close();
}
With this change, the output above changes to.
od -a of input
a nl b cr nl c cr cr sp ff nl d ff nl
od -a of output
a nl b nl
Stuart Caie provided the answer; this is for anyone looking for code to avoid these characters.
The basic issue is that the original file uses a different line separator than the new file.
One easy way is to find the original file's separator character and use the same one in the new file.
try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file)));
        Scanner fileScanner = new Scanner(original)) {
    String lineSep = null;
    boolean lineSepFound = false;
    while (fileScanner.hasNextLine())
    {
        if (!lineSepFound) {
            MatchResult matchResult = fileScanner.match();
            if (matchResult != null) {
                lineSep = matchResult.group(1);
                if (lineSep != null) {
                    lineSepFound = true;
                }
            }
        } else {
            out.write(lineSep);
        }
        String next = fileScanner.nextLine();
        next = next.replaceAll(".*\\x0C", ""); // remove up to ^L
        out.write(next);
    }
} catch (IOException e) {
    e.printStackTrace();
}
Note: MatchResult matchResult = fileScanner.match(); provides the MatchResult for the last match performed. In our case we have used hasNextLine(); the Scanner uses its linePattern to find the next line, and the Scanner.hasNextLine() source code does find the line separator,
but unfortunately there is no direct way to get the line separator back. So I have used their code to get the lineSep only once, and used that lineSep when creating the new file.
Also, per your code, you would have an extra line separator at the end of the file. That is corrected here.
Let me know if that works.

Scanner's nextLine(), Only fetching partial

So, using something like:
for (int i = 0; i < files.length; i++) {
    if (!files[i].isDirectory() && files[i].canRead()) {
        try {
            Scanner scan = new Scanner(files[i]);
            System.out.println("Generating Categories for " + files[i].toPath());
            while (scan.hasNextLine()) {
                count++;
                String line = scan.nextLine();
                System.out.println(" ->" + line);
                line = line.split("\t", 2)[1];
                System.out.println("!- " + line);
                JsonParser parser = new JsonParser();
                JsonObject object = parser.parse(line).getAsJsonObject();
                Set<Entry<String, JsonElement>> entrySet = object.entrySet();
                exploreSet(entrySet);
            }
            scan.close();
            // System.out.println(keyset);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}
as one goes over a Hadoop output file, one of the JSON objects in the middle is breaking... because scan.nextLine() is not fetching the whole line before it brings it to split. I.e., the output is:
->0 {"Flags":"0","transactions":{"totalTransactionAmount":"0","totalQuantitySold":"0"},"listingStatus":"NULL","conditionRollupId":"0","photoDisplayType":"0","title":"NULL","quantityAvailable":"0","viewItemCount":"0","visitCount":"0","itemCountryId":"0","itemAspects":{ ... "sellerSiteId":"0","siteId":"0","pictureUrl":"http://somewhere.com/45/x/AlphaNumeric/$(KGrHqR,!rgF!6n5wJSTBQO-G4k(Ww~~
!- {"Flags":"0","transactions":{"totalTransactionAmount":"0","totalQuantitySold":"0"},"listingStatus":"NULL","conditionRollupId":"0","photoDisplayType":"0","title":"NULL","quantityAvailable":"0","viewItemCount":"0","visitCount":"0","itemCountryId":"0","itemAspects":{ ... "sellerSiteId":"0","siteId":"0","pictureUrl":"http://somewhere.com/45/x/AlphaNumeric/$(KGrHqR,!rgF!6n5wJSTBQO-G4k(Ww~~
Most of the above data has been sanitized (though not, for the most part, the URL).
and the URL continues as:
$(KGrHqZHJCgFBsO4dC3MBQdC2)Y4Tg~~60_1.JPG?set_id=8800005007
in the file....
So it's slightly miffing.
This is also entry #112, and I have had other files parse without errors... but this one is screwing with my mind, mostly because I don't see how scan.nextLine() isn't working...
By debug output, the JSON error is caused by the string not being split properly.
And almost forgot, it also works JUST FINE if I attempt to put the offending line in its own file and parse just that.
EDIT:
It also blows up, in about the same place, if I remove the offending line.
Attempted with JVM 1.6 and 1.7
Workaround solution:
BufferedReader scan = new BufferedReader(new FileReader(files[i]));
instead of the Scanner...
Based on your code, the best explanation I can come up with is that the line really does end after the "~~" according to the criteria used by Scanner.nextLine().
The criteria for an end-of-line are:
Something that matches this regex: "\r\n|[\n\r\u2028\u2029\u0085]" or
The end of the input stream
You say that the file continues after the "~~", so let's put EOF aside and look at the regex. That will match any of the following:
The usual line separators:
<CR>
<NL>
<CR><NL>
... and three unusual forms of line separator that Scanner also recognizes.
0x0085 is the <NEL> or "next line" control code in the "ISO C1 Control" group
0x2028 is the Unicode "line separator" character
0x2029 is the Unicode "paragraph separator" character
My theory is that you've got one of the "unusual" forms in your input file, and this is not showing up in .... whatever tool it is that you are using to examine the files.
I suggest that you examine the input file using a tool that can show you the actual bytes of the file; e.g. the od utility on a Linux / Unix system. Also, check that this isn't caused by some kind of character encoding mismatch ... or trying to read or write binary data as text.
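If od isn't available, a few lines of Java will also reveal an unusual separator. This sketch plants a U+2028 in a string to show what to look for when dumping the suspect line's characters:

```java
public class ShowSeparators {
    public static void main(String[] args) {
        // U+2028 (the Unicode "line separator") is one of the unusual
        // endings that Scanner.nextLine() treats as an end-of-line.
        String s = "abc\u2028def";
        // Print each char as a code point; invisible separators that a
        // text editor hides become obvious here.
        for (int i = 0; i < s.length(); i++) {
            System.out.printf("U+%04X%n", (int) s.charAt(i));
        }
    }
}
```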
If these don't help, then the next step should be to run your application using your IDE's Java debugger, and single-step it through the Scanner.hasNextLine() and nextLine() calls to find out what the code is actually doing.
And almost forgot, it also works JUST FINE if I attempt to put the offending line in its own file and parse just that.
That's interesting. But if the tool you are using to extract the line is the same one that is not showing the (hypothesized) unusual line separator, then this evidence is not reliable. The process of extraction may be altering the "stuff" that is causing the problems.
