I'm trying to write a simple output to a file but I'm getting the wrong output. This is my code:
Map<Integer,List<Client>> hashMapClients = new HashMap<>();
hashMapClients = clients.stream().collect(Collectors.groupingBy(Client::getDay));
Map<Integer,List<Transaction>> hasMapTransactions = new HashMap<>();
hasMapTransactions = transactions.stream().collect(Collectors.groupingBy(Transaction::getDay));
//DAYS
String data;
for (Integer key: hashMapClients.keySet()) {
data = key + " | ";
for (int i = 0; i <hashMapClients.get(key).size();i++) {
data += hashMapClients.get(key).get(i).getType() + " | " + hashMapClients.get(key).get(i).getAmountOfClients() + ", ";
writer.println(data);
}
}
I get this output
1 | Individual | 0,
1 | Individual | 0, Corporation | 0,
2 | Individual | 0,
2 | Individual | 0, Corporation | 0,
But it should look like this; also, a line should not end with ", " after its last entry:
1 | Individual | 0, Corporation | 0
2 | Individual | 0, Corporation | 0
What am I doing wrong?
It sounds like you only want to write data to the output in the outer loop, not the inner loop. The inner loop is just for building the data value to write. Something like this:
String data;
for (Integer key: hashMapClients.keySet()) {
// initialize the value
data = key + " | ";
// build the value
for (int i = 0; i <hashMapClients.get(key).size();i++) {
data += hashMapClients.get(key).get(i).getType() + " | " + hashMapClients.get(key).get(i).getAmountOfClients() + ", ";
}
// write the value
writer.println(data);
}
Edit: Thanks for pointing out that the trailing separator also still needs to be removed. Each entry is followed by ", " (a comma and a space), so the last two characters have to go. Without more error checking, that could be as simple as:
data = data.substring(0, data.length() - 2);
You can add error checking as your logic requires, perhaps confirming that the value does indeed end with ", " or confirming that the inner loop executes at least once, etc.
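If you would rather not trim anything afterwards, one alternative is to let java.util.StringJoiner place the separators, so a trailing ", " is never produced. This is only a minimal sketch: JoinDemo and formatLine are hypothetical names, and plain strings stand in for the type/amount pairs of your Client objects.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

public class JoinDemo {
    // Hypothetical helper: builds "key | a, b, c" without a trailing separator.
    static String formatLine(int key, List<String> entries) {
        // delimiter ", ", prefix "key | ", no suffix
        StringJoiner joiner = new StringJoiner(", ", key + " | ", "");
        for (String entry : entries) {
            joiner.add(entry);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // prints "1 | Individual | 0, Corporation | 0"
        System.out.println(formatLine(1, Arrays.asList("Individual | 0", "Corporation | 0")));
    }
}
```

With an empty list this sketch returns just the prefix ("1 | "), so the "inner loop ran at least once" check is still up to you.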
One problem is that you are calling println after every Client, rather than waiting until the whole list is built. Then, to fix the problem with the trailing comma, you can use a joining collector.
Map<Integer,List<Client>> clientsByDay = clients.stream()
.collect(Collectors.groupingBy(Client::getDay));
/* Iterate over key-value pairs */
for (Map.Entry<Integer, List<Client>> e : clientsByDay.entrySet()) {
/* Print the key */
writer.print(e.getKey());
/* Print a separator */
writer.print(" | ");
/* Print the value */
writer.println(e.getValue().stream()
/* Convert each Client to a String in the desired format */
.map(c -> c.getType() + " | " + c.getAmountOfClients())
/* Join the clients together in a comma-separated list */
.collect(Collectors.joining(", ")));
}
I would like to extract some info from a .lnk file in Java, specifically the entire target (with command line parameters after the initial .exe) and the working directory as well.
In the question "Windows shortcut (.lnk) parser in Java?" by user Zarkonnen we can find the WindowsShortcut library created by multiple community users. See Code Bling's answer here.
However, as of now, this library provides only access to the file path itself, but not to command line arguments or working directory (or any other additional info that might be inside a shortcut file).
I tried to figure out a way to get the additional info using the WindowsShortcut library, but didn't succeed. The library only provides me with a getRealFilename() method:
public static void main(String[] args) throws Exception
{
// Backslashes must be escaped in Java string literals; "\t" would otherwise be read as a tab.
WindowsShortcut windowsShortcut = new WindowsShortcut(new File("C:\\test\\test.lnk"));
System.out.println(windowsShortcut.getRealFilename());
}
Does anyone know of a way to do this?
Your question is really good. As of the date of your question the WindowsShortcut class you refer to only implements code to get the path to the file pointed to by a shortcut, but doesn't provide any further data inside a shortcut file. But it's open source, so let's extend it!
Let's do some research first
In the unofficial documentation by Jesse Hager we find this:
The flags

Bit | Meaning when 1                     | Meaning when 0
----+------------------------------------+----------------------------------
0   | The shell item id list is present. | The shell item id list is absent.
1   | Points to a file or directory.     | Points to something else.
2   | Has a description string.          | No description string.
3   | Has a relative path string.        | No relative path.
4   | Has a working directory.           | No working directory.
5   | Has command line arguments.        | No command line arguments.
6   | Has a custom icon.                 | Has the default icon.
So we know that we can check the flags byte for the existence of these additional strings. And we already have access to the flags byte prepared in our WindowsShortcut class.
Now we only need to know where those strings are stored in the shortcut file. In the unofficial documentation we also find this structure:
File header
Shell item ID list
    Item 1
    Item 2
    etc..
File locator info
    Local path
    Network path
Description string
Relative path string
Working directory string
Command line string
Icon filename string
Extra stuff
So the strings we are interested in come directly after the File locator info block, which is neat, because the existing WindowsShortcut class already parses the File locator info to get the file path.
The docs also say that each string consists of a length given as an unsigned short, followed by ASCII characters. However, at least under Windows 10, I encountered UTF-16 strings and implemented my code accordingly.
Let's implement!
We can simply add a few more lines at the end of the parseLink method.
First we get the offset directly after the File locator info block and call it next_string_start, as it now points to the first additional string:
final int file_location_size = bytesToDword(link, file_start);
int next_string_start = file_start + file_location_size;
We then check the flags for each of the strings in order, and if it exists, we parse it:
final byte has_description = (byte)0b00000100;
final byte has_relative_path = (byte)0b00001000;
final byte has_working_directory = (byte)0b00010000;
final byte has_command_line_arguments = (byte)0b00100000;
// if description is present, parse it
if ((flags & has_description) > 0) {
final int string_len = bytesToWord(link, next_string_start) * 2; // times 2 because UTF-16
description = getUTF16String(link, next_string_start + 2, string_len);
next_string_start = next_string_start + string_len + 2;
}
// if relative path is present, parse it
if ((flags & has_relative_path) > 0) {
final int string_len = bytesToWord(link, next_string_start) * 2; // times 2 because UTF-16
relative_path = getUTF16String(link, next_string_start + 2, string_len);
next_string_start = next_string_start + string_len + 2;
}
// if working directory is present, parse it
if ((flags & has_working_directory) > 0) {
final int string_len = bytesToWord(link, next_string_start) * 2; // times 2 because UTF-16
working_directory = getUTF16String(link, next_string_start + 2, string_len);
next_string_start = next_string_start + string_len + 2;
}
// if command line arguments are present, parse them
if ((flags & has_command_line_arguments) > 0) {
final int string_len = bytesToWord(link, next_string_start) * 2; // times 2 because UTF-16
command_line_arguments = getUTF16String(link, next_string_start + 2, string_len);
next_string_start = next_string_start + string_len + 2;
}
The getUTF16String method is simply:
private static String getUTF16String(final byte[] bytes, final int off, final int len) {
return new String(bytes, off, len, StandardCharsets.UTF_16LE);
}
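The four parsing blocks above all follow the same pattern: read a 16-bit length, decode that many UTF-16 characters, then advance the offset. As a rough sketch, this can be folded into one helper. It assumes the layout observed above (a little-endian unsigned 16-bit character count followed by UTF-16LE data); LnkStringDemo, readCountedUtf16 and encodedLength are hypothetical names and not part of the WindowsShortcut class.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class LnkStringDemo {
    // Reads the unsigned 16-bit little-endian character count stored at 'offset'.
    static int readCharCount(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes, offset, 2)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getShort() & 0xFFFF; // mask so the short is treated as unsigned
    }

    // Hypothetical helper: decodes the counted UTF-16LE string that starts at 'offset'.
    static String readCountedUtf16(byte[] bytes, int offset) {
        int charCount = readCharCount(bytes, offset);
        // Each UTF-16 code unit takes two bytes; the data starts after the 2-byte count.
        return new String(bytes, offset + 2, charCount * 2, StandardCharsets.UTF_16LE);
    }

    // Total number of bytes the string occupies, i.e. how far to advance the offset.
    static int encodedLength(byte[] bytes, int offset) {
        return 2 + readCharCount(bytes, offset) * 2;
    }

    public static void main(String[] args) {
        // Count = 2, followed by "hi" in UTF-16LE.
        byte[] sample = {2, 0, 'h', 0, 'i', 0};
        System.out.println(readCountedUtf16(sample, 0)); // prints "hi"
        System.out.println(encodedLength(sample, 0));    // prints 6
    }
}
```

Each of the four blocks would then reduce to one call that reads the string and one that advances next_string_start by the encoded length.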
And finally we need members and getters for those new Strings:
private String description;
private String relative_path;
private String working_directory;
private String command_line_arguments;
public String getDescription() {
return description;
}
public String getRelativePath() {
return relative_path;
}
public String getWorkingDirectory() {
return working_directory;
}
public String getCommandLineArguments() {
return command_line_arguments;
}
I tested this under Windows 10 and it worked like a charm.
I made a pull request with my changes on the original repo; until it is merged, you can also find the complete code here.
I came across a problem with regex parsing columns in ASCII tables.
Imagine an ASCII table like:
COL1 | COL2 | COL3
======================
ONE | APPLE | PIE
----------------------
TWO | APPLE | PIES
----------------------
THREE | PLUM- | PIES
| APRICOT |
For the first two entries, a trivial capture regex does the job:
(?:(?<COL1>\w+)\s*\|\s*(?<COL2>\w+)\s*\|\s*(?<COL3>\w+)\s*)
However, this regex also captures the header, and it doesn't capture the third entry.
I can't solve the following two problems:
How to exclude the header?
How to extend the COL2 capture group to capture the multiline entry PLUM-APRICOT?
Thanks for your help!
Some people, when confronted with a problem, think
“I know, I'll use regular expressions.” Now they have two problems. (http://regex.info/blog/2006-09-15/247)
I've assumed an input string like:
String input = ""
+ "\n" + "COL1 | COL2 | COL3"
+ "\n" + "======================"
+ "\n" + "ONE | APPLE | PIE "
+ "\n" + "----------------------"
+ "\n" + "TWO | APPLE | PIES"
+ "\n" + "----------------------"
+ "\n" + "THREE | PLUM- | PIES"
+ "\n" + " | APRICOT | ";
To split the header and the table you can use input.split("={2,}"). This returns an array of strings of the header and the table.
After trimming the table you can use table.split("-{2,}") to get the rows of the table.
All rows can be converted to arrays of cells by using row.split("\\|").
Dealing with multiline rows: Before converting the rows to cells, you can call row.split("\n") to split multiline rows.
When this split operation returns an array with more than one element, each element should be split on pipes (split("\\|")) and the resulting cells should be merged column by column.
From here it's just element manipulation to get it into the format you desire.
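Put together, the steps above could look roughly like the following sketch. AsciiTableDemo, parseRows and SAMPLE are hypothetical names, and it assumes that no cell contains two or more consecutive '-' or '=' characters (a single hyphen as in PLUM- is fine).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AsciiTableDemo {
    static final String SAMPLE = ""
            + "COL1 | COL2 | COL3\n"
            + "======================\n"
            + "ONE | APPLE | PIE\n"
            + "----------------------\n"
            + "TWO | APPLE | PIES\n"
            + "----------------------\n"
            + "THREE | PLUM- | PIES\n"
            + " | APRICOT | ";

    // Hypothetical helper: returns one String[] of merged cells per table row.
    static List<String[]> parseRows(String input) {
        // Everything after the "====" line is the table body; the header is dropped.
        String[] headerAndBody = input.split("={2,}");
        String body = headerAndBody[headerAndBody.length - 1];

        List<String[]> rows = new ArrayList<>();
        for (String row : body.split("-{2,}")) {
            String[] merged = null;
            // A row may span several physical lines; merge them column by column.
            for (String line : row.split("\n")) {
                if (line.trim().isEmpty()) {
                    continue; // skip the blank pieces left over from splitting
                }
                String[] cells = line.split("\\|", -1); // -1 keeps trailing empty cells
                if (merged == null) {
                    merged = new String[cells.length];
                    Arrays.fill(merged, "");
                }
                for (int i = 0; i < cells.length && i < merged.length; i++) {
                    merged[i] += cells[i].trim();
                }
            }
            if (merged != null) {
                rows.add(merged);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String[] row : parseRows(SAMPLE)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

For the sample table, the third row comes out as THREE, PLUM-APRICOT, PIES, because the continuation line contributes an empty first and last cell.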
I have a string array of student info:
StudentNumber (integer), Subject (string), Marks (integer)
What would be the best way to use Java collections to find the topper in each subject?
ArrayList<String> strings = new ArrayList<String>();
strings.add("1 | Computers | 48");
strings.add("2 | Data Structures | 89");
strings.add("33 | English | 35");
strings.add("24 | Maths | 70");
strings.add("15 | Computers | 58");
strings.add("6 | Data Structures | 55");
strings.add("7 | English | 40");
strings.add("18 | Maths | 73");
for (String str : strings) {
String [] strArray = str.split("\\|");
// incomplete code
for (int i = 0; i < str.length(); i++) {
sam.put(strArray[0], strArray[2]);
s.add(strArray[1]);
}
}
Expected output
15 Computers
2 Data structures
7 English
18 Maths
Create a Result class to store the information:
class Result {
private int studentNumber;
private String subject;
private int mark;
// constructor, getters and setters go here ...
}
Now convert your List<String> to a List<Result>:
List<Result> results = new ArrayList<>();
for (String s : strings){
String[] sa = s.split(" \\| ");
results.add(new Result(Integer.parseInt(sa[0]), sa[1], Integer.parseInt(sa[2])));
}
Create a stream from the results list, group by subject, and find the student with the highest mark:
Map<String, Integer> map = results.stream()
.collect(Collectors.groupingBy(Result::getSubject,
Collectors.collectingAndThen(Collectors.maxBy(Comparator.comparing(Result::getMark)), r -> r.get().getStudentNumber())));
Print the result:
map.forEach((k,v) -> System.out.println(v + " " + k));
15 Computers
18 Maths
7 English
2 Data Structures
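For reference, here are the steps above combined into one self-contained sketch. TopperDemo and toppers are hypothetical names; the TreeMap is an addition that is only there to get the subjects in alphabetical order, and ties between equal marks are resolved however maxBy happens to resolve them.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class TopperDemo {
    static final class Result {
        private final int studentNumber;
        private final String subject;
        private final int mark;

        Result(int studentNumber, String subject, int mark) {
            this.studentNumber = studentNumber;
            this.subject = subject;
            this.mark = mark;
        }

        int getStudentNumber() { return studentNumber; }
        String getSubject() { return subject; }
        int getMark() { return mark; }
    }

    // Hypothetical helper: maps each subject to the student number with the highest mark.
    static Map<String, Integer> toppers(List<String> lines) {
        return lines.stream()
                .map(line -> line.split("\\|"))
                .map(p -> new Result(Integer.parseInt(p[0].trim()), p[1].trim(),
                        Integer.parseInt(p[2].trim())))
                .collect(Collectors.groupingBy(
                        Result::getSubject,
                        TreeMap::new, // sorted by subject name
                        Collectors.collectingAndThen(
                                Collectors.maxBy(Comparator.comparingInt(Result::getMark)),
                                best -> best.get().getStudentNumber())));
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList(
                "1 | Computers | 48", "2 | Data Structures | 89",
                "33 | English | 35", "24 | Maths | 70",
                "15 | Computers | 58", "6 | Data Structures | 55",
                "7 | English | 40", "18 | Maths | 73");
        toppers(strings).forEach((subject, student) -> System.out.println(student + " " + subject));
    }
}
```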
I have the following part of string:
{{Infobox musical artist
|honorific-prefix = [[The Honourable]]
| name = Bob Marley
| image = Bob-Marley.jpg
| alt = Black and white image of Bob Marley on stage with a guitar
| caption = Bob Marley in concert, 1980.
| background = solo_singer
| birth_name = Robert Nesta Marley
| alias = Tuff Gong
| birth_date = {{birth date|df=yes|1945|2|6}}
| birth_place = [[Nine Mile, Jamaica|Nine Mile]], [[Jamaica]]
| death_date = {{death date and age|df=yes|1981|5|11|1945|2|6}}
| death_place = [[Miami]], [[Florida]]
| instrument = Vocals, guitar, percussion
| genre = [[Reggae]], [[ska]], [[rocksteady]]
| occupation = [[Singer-songwriter]], [[musician]], [[guitarist]]
| years_active = 1962–1981
| label = [[Beverley's]], [[Studio One (record label)|Studio One]],
| associated_acts = [[Bob Marley and the Wailers]]
| website = {{URL|bobmarley.com}}
}}
And I'd like to remove all of it. If I try the regex \{\{(.*?)\}\}, it catches {{birth date|df=yes|1945|2|6}}, which makes sense. So I tried \{\{([^\}]*?)\}\}, which then matches from the start but ends on the same line, which also makes sense, as it has encountered }}. I've also tried it without the ? (i.e. greedy), with the same results. My question is: how can I remove everything that's inside a {{}}, no matter how many of the same characters are nested inside?
Edit: If you want my entire input, it's this:
https://en.wikipedia.org/w/index.php?maxlag=5&title=Bob+Marley&action=raw
Here's a solution with a DOTALL Pattern and a greedy quantifier for an input that contains only one instance of the fragment you wish to remove (i.e. replace with an empty String):
String input = "Foo {{Infobox musical artist\n"
+ "|honorific-prefix = [[The Honourable]]\n"
+ "| name = Bob Marley\n"
+ "| image = Bob-Marley.jpg\n"
+ "| alt = Black and white image of Bob Marley on stage with a guitar\n"
+ "| caption = Bob Marley in concert, 1980.\n"
+ "| background = solo_singer\n"
+ "| birth_name = Robert Nesta Marley\n"
+ "| alias = Tuff Gong\n"
+ "| birth_date = {{birth date|df=yes|1945|2|6}}\n"
+ "| birth_place = [[Nine Mile, Jamaica|Nine Mile]], [[Jamaica]]\n"
+ "| death_date = {{death date and age|df=yes|1981|5|11|1945|2|6}}\n"
+ "| death_place = [[Miami]], [[Florida]]\n"
+ "| instrument = Vocals, guitar, percussion\n"
+ "| genre = [[Reggae]], [[ska]], [[rocksteady]]\n"
+ "| occupation = [[Singer-songwriter]], [[musician]], [[guitarist]] \n"
+ "| years_active = 1962–1981\n"
+ "| label = [[Beverley's]], [[Studio One (record label)|Studio One]],\n"
+ "| associated_acts = [[Bob Marley and the Wailers]]\n"
+ "| website = {{URL|bobmarley.com}}\n" + "}} Bar";
// (?s)     DOTALL flag
// \\{\\{   first two curly brackets
// .+       multi-line dot (greedy)
// \\}\\}   last two curly brackets
// ""       replace with empty
System.out.println(input.replaceAll("(?s)\\{\\{.+\\}\\}", ""));
Output
Foo Bar
Notes after comments
This case implies using regular expressions to manipulate markup language.
Regular expressions are not made to parse hierarchical markup entities and would not serve well here, so this answer is only a stub for what would be an ugly workaround at best.
See here for a famous SO thread on parsing markup with regex.
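If regex is not a hard requirement, one alternative (a different technique from the pattern above) is to scan the text once and keep track of how deeply nested inside {{ }} the current position is. This is a minimal sketch under simple assumptions: TemplateStripper and stripTemplates are hypothetical names, braces are assumed to be balanced, and other wiki constructs such as {{{...}}} parameters are not handled.

```java
public class TemplateStripper {
    // Hypothetical helper: removes every {{...}} block, including nested ones.
    static String stripTemplates(String text) {
        StringBuilder out = new StringBuilder();
        int depth = 0; // how many "{{" are currently open
        int i = 0;
        while (i < text.length()) {
            if (text.startsWith("{{", i)) {
                depth++;
                i += 2;
            } else if (depth > 0 && text.startsWith("}}", i)) {
                depth--;
                i += 2;
            } else {
                if (depth == 0) {
                    out.append(text.charAt(i)); // keep only text outside all blocks
                }
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "Foo  Bar" (the two spaces around the removed block remain)
        System.out.println(stripTemplates("Foo {{Infobox | date = {{birth date|1945|2|6}} }} Bar"));
    }
}
```

Unlike the greedy pattern, this also copes with several separate blocks in one input. If the braces are unbalanced, everything after the unclosed {{ is dropped.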
Use a greedy quantifier instead of the reluctant one you're using.
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
Edit: spoonfeeding: "\{\{.*\}\}"
Try this pattern, it should take care of everything:
"\\D\\{\\{I.+[\\P{M}\\p{M}*+].+\\}\\}\\D"
specify: DOTALL
code:
String result = searchText.replaceAll("\\D\\{\\{I.+[\\P{M}\\p{M}*+].+\\}\\}\\D", "");
example: http://fiddle.re/5n4zg
This regex matches a single such block (only):
\{\{([^{}]*?\{\{.*?\}\})*.*?\}\}
See a live demo.
In java, to remove all such blocks:
str = str.replaceAll("(?s)\\{\\{([^{}]*?\\{\\{.*?\\}\\})*.*?\\}\\}", "");
I am working on a COBOL Parser using JavaCC. The COBOL file usually will have columns 1 to 6 as Line/Column numbers. If Line/Column numbers are not present it will have spaces.
I need to know how to handle comments and Sequence Area in a COBOL file and parse only Main Area.
I have tried many expressions but none is working. I created a special token that will check for new line and then six occurrences of spaces or any character except space and carriage return and after that seventh character will be "*" for comments and " " for normal lines.
I am using the Cobol.jj file available here http://java.net/downloads/javacc/contrib/grammars/cobol.jj
Can anyone suggest what grammar I should use?
Here is a sample of my grammar file:
PARSER_END(CblParser)
////////////////////////////////////////////////////////////////////////////////
// Lexical structure
////////////////////////////////////////////////////////////////////////////////
SPECIAL_TOKEN :
{
< EOL: "\n" > : LINE_START
| < SPACECHAR: ( " " | "\t" | "\f" | ";" | "\r" )+ >
}
SPECIAL_TOKEN :
{
< COMMENT: ( ~["\n","\r"," "] ~["\n","\r"," "] ~["\n","\r"," "] ~["\n","\r"," "] ~["\n","\r"," "] ~["\n","\r"," "] ) ( "*" | "|" ) (~["\n","\r"])* >
| < PREPROC_COMMENT: "*|" (~["\n","\r"])* >
| < SPACE_SEPARATOR : ( <SPACECHAR> | <EOL> )+ >
| < COMMA_SEPARATOR : "," <SPACE_SEPARATOR> >
}
<LINE_START> SKIP :
{
< ((~[])(~[])(~[])(~[])(~[])(~[])) (" ") >
}
Since the parser starts at the start of a line, you should use the DEFAULT state to represent the start of a line. I would do something like the following [untested code follows].
// At the start of each line, the first 6 characters are ignored and the 7th is used
// to determine whether this is a code line or a comment line.
// (Continuation lines are handled elsewhere.)
// If there are fewer than 7 characters on the line, it is ignored.
// Note that there will be a TokenManagerError if a line has at least 7 characters and
// the 7th character is other than a "*", a "/", or a space.
<DEFAULT> SKIP :
{
< (~[]){0,6} ("\n" | "\r" | "\r\n") > :DEFAULT
|
< (~[]){6} (" ") > :CODE
|
< (~[]){6} ("*"|"/") > :COMMENT
}
<COMMENT> SKIP :
{ // At the end of a comment line, return to the DEFAULT state.
< "\n" | "\r" | "\r\n" > : DEFAULT
| // All non-end-of-line characters on a comment line are ignored.
< ~["\n","\r"] > : COMMENT
}
<CODE> SKIP :
{ // At the end of a code line, return to the DEFAULT state.
< "\n" | "\r" | "\r\n" > : DEFAULT
| // White space is skipped, as are semicolons.
< ( " " | "\t" | "\f" | ";" )+ >
}
<CODE> TOKEN :
{
< ACCEPT: "accept" >
|
... // all rules for tokens should be in the CODE state.
}
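To sanity-check the column rule independently of JavaCC, the same classification of a single line can be sketched in plain Java. This is not part of the grammar; CobolLineKind, Kind and classify are hypothetical names, and OTHER simply marks the case where the grammar above would raise a TokenManagerError (continuation lines included, since they are handled elsewhere).

```java
public class CobolLineKind {
    enum Kind { IGNORED, CODE, COMMENT, OTHER }

    // Hypothetical helper: classifies one source line by its 7th character (index 6).
    static Kind classify(String line) {
        if (line.length() < 7) {
            return Kind.IGNORED; // fewer than 7 characters: the line is skipped
        }
        char indicator = line.charAt(6); // columns 1 to 6 are the sequence area
        if (indicator == ' ') {
            return Kind.CODE;
        }
        if (indicator == '*' || indicator == '/') {
            return Kind.COMMENT;
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        System.out.println(classify("000100 MOVE A TO B."));   // prints CODE
        System.out.println(classify("000200* a comment line")); // prints COMMENT
        System.out.println(classify("      / page break"));     // prints COMMENT
        System.out.println(classify("0003"));                   // prints IGNORED
    }
}
```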