I am reading a 4 GB CSV file in Java. I need to extract 100,000 records from it and write them to a separate file. The problem is that when I read a line with
line = br.readLine() and split it with String[] record = line.split(cvsSplitBy); every string ends up with an extra pair of quotes. When I inspect the record array it looks like
""abc"",""bcd"",""cef"",""dgh"",""elk"" when it should look like "abc","bcd","cef","dgh","elk"
Please let me know why it is adding extra quotes around every string.
Post your code so we can investigate. In the meantime, you can remove those extra quotes yourself or do something like:
line.split("\"" + cvsSplitBy + "\"")
Post your code and I'll edit this reply.
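Until the code is posted, here is a minimal sketch of the "remove those extra quotes" option. It assumes the separator is a plain comma and that no field contains the separator itself; the class and method names (QuoteStripDemo, stripQuotes, splitAndStrip) are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class QuoteStripDemo {
    // Removes every leading and trailing double quote from a single field.
    static String stripQuotes(String field) {
        return field.replaceAll("^\"+|\"+$", "");
    }

    // Splits on the literal separator, then strips the quotes around each field.
    static String[] splitAndStrip(String line, String separator) {
        String[] fields = line.split(Pattern.quote(separator), -1);
        for (int i = 0; i < fields.length; i++) {
            fields[i] = stripQuotes(fields[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        // The doubled quotes from the question: ""abc"",""bcd""
        String line = "\"\"abc\"\",\"\"bcd\"\"";
        System.out.println(Arrays.toString(splitAndStrip(line, ",")));
    }
}
```

This only cleans up the symptom. If the quotes are doubled in the file itself, the real fix belongs in whatever wrote the file.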
This is my first time asking for help here, so I apologize if my post is formatted wrong. My problem is that I can't create a new line in Java JDA (Java Discord API).
When I use:
eb.setDescription("For \n Example");
That works just fine, but this is what I am actually trying to do.
First I get a string from the config file:
String Message = plugin.getConfig().getString("Discord." + "PrivateMessage");
It reads the string fine, but when I call eb.setDescription(Message);
the message arrives without line breaks. The message contains newline escapes ("\n"), but they do not produce new lines. It is displayed as:
"For \n Example", with no line break.
I'm assuming your actual string is stored in a file and you just read it out. A \n written in a file is two literal characters (a backslash and an n), and it is not automatically converted to a newline when you read it. You can do string = string.replace("\\n", "\n") to convert it.
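A minimal sketch of that conversion, with the config value simulated by a string literal (the class and method names are made up for illustration):

```java
public class NewlineUnescapeDemo {
    // Turns the two-character sequence backslash + n into a real line break.
    static String unescapeNewlines(String raw) {
        return raw.replace("\\n", "\n");
    }

    public static void main(String[] args) {
        // What a config file containing: For \n Example  looks like once it is read.
        String fromConfig = "For \\n Example";
        System.out.println(unescapeNewlines(fromConfig));
    }
}
```

String.replace works on literal text, not regular expressions, so the backslash only needs the usual Java string escaping.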
I'm trying to make a simple dictionary program based on client-server socket communication. I want to save each word and meaning the user enters to a JSON file (the dictionary data to search later on), but when I run an add query the file ends up with duplicated JSON objects.
For example, if I add happy, then weather, then hello, the result written to the JSON file looks
like this:
{"hello":"greeting"}{"happy":"joy","hello":"greeting"}
{"happy":"joy","weather":"cold","hello":"greeting"}
instead of
{"hello":"greeting"}{"happy":"joy"}{"weather":"cold"}, which is what I wanted.
How can I fix this problem?
My code for that function is:
case "add": {
    FileWriter dictionaryWriter = new FileWriter("dictionary.json", true);
    // split the command again into 2 parts, now using the delimiter ","
    String[] break2 = msgBreak[1].split(",");
    String word = break2[0];
    String meaning = break2[1];
    dictionary.put(word, meaning);
    System.out.println("Writing... " + word + ":" + meaning);
    dictionaryWriter.write(dictionary.toString());
    // flush the remaining bytes
    dictionaryWriter.flush();
    // close the writer
    dictionaryWriter.close();
    break;
}
This code runs inside a while(true) loop together with the other dictionary functions.
I tried removing the append flag. Without the (,true) argument the duplication stops, but then a new dictionary file is created for every new connection instead of all the data being kept.
If anyone can help me solve this problem, I would appreciate it a lot!
Thank you in advance.
You can try creating a new dictionary for every add instead of reusing the existing one. The existing map still holds all the earlier entries, so each append writes them out again:
Map<String, String> dictionary = new HashMap<>();
dictionary.put(word, meaning);
...
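The same idea can also be sketched without any map at all: format only the new entry and append it. This is an assumption-heavy illustration, not the asker's code. The names (AppendEntryDemo, quote, toJsonEntry, appendEntry) are hypothetical, the escaping covers only backslashes and double quotes, and a StringWriter stands in for the FileWriter opened in append mode.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class AppendEntryDemo {
    // Minimal escaping (backslash and double quote only); a JSON library is safer for real data.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds a one-entry JSON object such as {"happy":"joy"}.
    static String toJsonEntry(String word, String meaning) {
        return "{" + quote(word) + ":" + quote(meaning) + "}";
    }

    // Appends only the new entry, so earlier entries are never written twice.
    static void appendEntry(Writer out, String word, String meaning) throws IOException {
        out.write(toJsonEntry(word, meaning));
    }

    public static void main(String[] args) throws IOException {
        // StringWriter stands in for new FileWriter("dictionary.json", true).
        StringWriter out = new StringWriter();
        appendEntry(out, "hello", "greeting");
        appendEntry(out, "happy", "joy");
        appendEntry(out, "weather", "cold");
        System.out.println(out);
    }
}
```

Note that a file of concatenated objects is not a single valid JSON document. If the file has to be parsed back as one object, overwriting it with the full map (no append flag) after loading the existing data at startup may be the simpler design.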
I'm writing a program that needs to write data to a CSV file. However, the string I'm writing contains commas, so its values end up spread across different fields, which I don't want.
E.g. my string looks like this: 165328,1234582,21346
Each value is written to its own field, but I want the whole string, as it is, in a single field.
while ((s = lnr.readLine()) != null) {
    str = s.split(" ");
    br.write(str[7]);
}
Please show me what I need to add to get the desired output.
E.g. the string I'm writing will look like this: 165328,123482,123414...
Wrap the string that contains commas in double quotes, then write it to the CSV file with a FileWriter or any other writer you like.
value="\"" +value + "\"";
fos.append(value);
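A slightly fuller sketch of the quoting step. It also doubles any double quotes inside the value, which is the usual CSV convention (RFC 4180) and goes beyond what the answer above shows; the class and method names are made up for illustration.

```java
public class CsvQuoteDemo {
    // Wraps a value in double quotes and doubles any quotes it already contains.
    static String quoteField(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // The comma-containing value stays in one field; "next" becomes the second field.
        String row = quoteField("165328,1234582,21346") + "," + quoteField("next");
        System.out.println(row);
    }
}
```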
final BufferedReader bufferedReader = new BufferedReader(
        new InputStreamReader(file.getInputStream(entry)));
String line = "";
while ((line = bufferedReader.readLine()) != null) {
    System.out.println("line" + line);
    final String[] rows = line.split(",");
}
This is my CSV file:
" 9:42:43AM","Aug 20, 2015","RaceSummary","Page:1","Id","Race","Type","Rot.","District","PrideFor","ArtSeq","ReportSeq","Content","Type","Md","Bar Group","1","LINC ADAPTER SECTION 4","Content","N","A - ARLIN","1","1","1","Oscar James, Sr.","Content","0","<N.P.>"
I am trying to print the columns listed in the CSV, but I don't know why my output shows everything up to "Pride" on one line and "For" on the next. The same thing happens with the next two values ("ArtSeq", "ReportSeq"). Can anyone suggest where I went wrong?
Thanks.
As you can see, the second value in your input, "Aug 20, 2015", contains a comma, which leads to more splits than you expect.
Example:
You would expect " 9:42:43AM","Aug 20, 2015" to be split into two parts, but it is split into three:
[0]" 9:42:43AM"
[1]"Aug 20
[2] 2015"
You can change your split to:
line.split("\",\"");
I believe that should solve your problem.
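A minimal sketch of that split. Splitting on "," (quote, comma, quote) leaves the opening quote on the first field and the closing quote on the last, so this version strips them first. It assumes every field is quoted and that no field contains that three-character sequence; the class and method names are made up for illustration.

```java
import java.util.Arrays;

public class QuotedCsvSplitDemo {
    // Splits a line whose fields are all wrapped in double quotes.
    static String[] splitQuoted(String line) {
        String trimmed = line.trim();
        // Drop the opening quote of the first field and the closing quote of the last one.
        if (trimmed.length() >= 2 && trimmed.startsWith("\"") && trimmed.endsWith("\"")) {
            trimmed = trimmed.substring(1, trimmed.length() - 1);
        }
        // Split on the quote-comma-quote sequence that separates two fields.
        return trimmed.split("\",\"", -1);
    }

    public static void main(String[] args) {
        String line = "\" 9:42:43AM\",\"Aug 20, 2015\",\"Page:1\"";
        System.out.println(Arrays.toString(splitQuoted(line)));
    }
}
```

For files with escaped quotes or unquoted fields, a dedicated CSV parsing library is likely the more robust choice.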
Based on the output you provided...
line" 9:42:43AM","Aug 20, 2015","Race Summary","Page: 1","Id","Race","Type","Rot.","District","Pride
lineFor","Art lineSeq","Report lineSeq","Content","Type","Md","Bar Group","1","LINC ADAPTER SECTION 4","Content","N","A - ARLIN","1","1","1","Oscar James, Sr.","Content","0","<N.P.>"
Since this differs from your input, I'd guess there is a special character in the input file (for example, a tab or a line break). That would cause your while loop to read the first line (up to the line break) and then read the rest as the next line. If you put both parts onto the same line in the file, it will probably work better.
I should also clarify that nothing in the code you posted would cause this behaviour. The cause is either somewhere else in your code or in the file itself.
So, using something like:
for (int i = 0; i < files.length; i++) {
    if (!files[i].isDirectory() && files[i].canRead()) {
        try {
            Scanner scan = new Scanner(files[i]);
            System.out.println("Generating Categories for " + files[i].toPath());
            while (scan.hasNextLine()) {
                count++;
                String line = scan.nextLine();
                System.out.println(" ->" + line);
                line = line.split("\t", 2)[1];
                System.out.println("!- " + line);
                JsonParser parser = new JsonParser();
                JsonObject object = parser.parse(line).getAsJsonObject();
                Set<Entry<String, JsonElement>> entrySet = object.entrySet();
                exploreSet(entrySet);
            }
            scan.close();
            // System.out.println(keyset);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}
When I go over a Hadoop output file this way, one of the JSON objects in the middle breaks, because scan.nextLine() does not fetch the whole line before handing it to split. That is, the output is:
->0 {"Flags":"0","transactions":{"totalTransactionAmount":"0","totalQuantitySold":"0"},"listingStatus":"NULL","conditionRollupId":"0","photoDisplayType":"0","title":"NULL","quantityAvailable":"0","viewItemCount":"0","visitCount":"0","itemCountryId":"0","itemAspects":{ ... "sellerSiteId":"0","siteId":"0","pictureUrl":"http://somewhere.com/45/x/AlphaNumeric/$(KGrHqR,!rgF!6n5wJSTBQO-G4k(Ww~~
!- {"Flags":"0","transactions":{"totalTransactionAmount":"0","totalQuantitySold":"0"},"listingStatus":"NULL","conditionRollupId":"0","photoDisplayType":"0","title":"NULL","quantityAvailable":"0","viewItemCount":"0","visitCount":"0","itemCountryId":"0","itemAspects":{ ... "sellerSiteId":"0","siteId":"0","pictureUrl":"http://somewhere.com/45/x/AlphaNumeric/$(KGrHqR,!rgF!6n5wJSTBQO-G4k(Ww~~
Most of the above data has been sanitized (except, for the most part, the URL),
and the URL continues as:
$(KGrHqZHJCgFBsO4dC3MBQdC2)Y4Tg~~60_1.JPG?set_id=8800005007
in the file.
So it's slightly annoying.
This is also entry #112, and I have had other files parse without errors. This one is messing with my mind, mostly because I don't see how scan.nextLine() could fail to work.
By debug output, the JSON error is caused by the string not being split properly.
And almost forgot, it also works JUST FINE if I attempt to put the offending line in its own file and parse just that.
EDIT:
It also blows up in about the same place if I remove the offending line.
Attempted with JVM 1.6 and 1.7
Workaround Solution:
BufferedReader scan = new BufferedReader(new FileReader(files[i]));
instead of Scanner.
Based on your code, the best explanation I can come up with is that the line really does end after the "~~" according to the criteria used by Scanner.nextLine().
The criteria for an end-of-line are:
Something that matches this regex: "\r\n|[\n\r\u2028\u2029\u0085]" or
The end of the input stream
You say that the file continues after the "~~", so let's put EOF aside and look at the regex. It will match any of the following:
The usual line separators:
<CR>
<NL>
<CR><NL>
... and three unusual forms of line separator that Scanner also recognizes.
0x0085 is the <NEL> or "next line" control code in the "ISO C1 Control" group
0x2028 is the Unicode "line separator" character
0x2029 is the Unicode "paragraph separator" character
My theory is that you've got one of the "unusual" forms in your input file, and that it is not showing up in whatever tool you are using to examine the files.
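This theory would also explain why the BufferedReader workaround helps: BufferedReader.readLine() only treats <CR>, <NL> and <CR><NL> as line terminators. A minimal sketch comparing the two on an in-memory string (the sample text and class name are made up for illustration):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class LineSeparatorDemo {
    // Collects lines the way the original loop does, with Scanner.nextLine().
    static List<String> linesViaScanner(String text) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    // Collects lines with BufferedReader.readLine(), as in the workaround.
    static List<String> linesViaBufferedReader(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        char nel = (char) 0x0085; // the <NEL> control code
        String text = "first~~" + nel + "rest\nsecond\n";
        System.out.println("Scanner:        " + linesViaScanner(text).size() + " lines");
        System.out.println("BufferedReader: " + linesViaBufferedReader(text).size() + " lines");
    }
}
```

With this input, Scanner reports three lines and BufferedReader reports two, because only Scanner breaks at the <NEL> character.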
I suggest that you examine the input file using a tool that can show you the actual bytes of the file; e.g. the od utility on a Linux / Unix system. Also, check that this isn't caused by some kind of character encoding mismatch ... or trying to read or write binary data as text.
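If a byte-level tool is not at hand, a rough check can be done in Java itself. This sketch looks for the three unusual separators in text that has already been decoded; the class and method names are hypothetical, and the result is only meaningful if the file was read with the correct character encoding.

```java
public class UnusualSeparatorFinder {
    // Returns the index of the first NEL, LINE SEPARATOR or PARAGRAPH SEPARATOR, or -1 if none.
    static int firstUnusualSeparator(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == 0x0085 || c == 0x2028 || c == 0x2029) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        char lineSeparator = (char) 0x2028;
        String suspect = "Ww~~" + lineSeparator + "60_1.JPG";
        System.out.println("First unusual separator at index " + firstUnusualSeparator(suspect));
    }
}
```

Running it over each line returned by BufferedReader.readLine() would show whether line 112 of the file contains one of these characters.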
If these don't help, then the next step should be to run your application using your IDE's Java debugger, and single-step it through the Scanner.hasNextLine() and nextLine() calls to find out what the code is actually doing.
And almost forgot, it also works JUST FINE if I attempt to put the offending line in its own file and parse just that.
That's interesting. But if the tool you are using to extract the line is the same one that is not showing the (hypothesized) unusual line separator, then this evidence is not reliable. The process of extraction may be altering the "stuff" that is causing the problems.