How can I parse the given string? - java

Hi guys, I have been given a task to parse a string that will be coming from the server.
The string looks like:
<first name=$Jon$ last name=$Doe$/><first name=$Doe$ last name=$Jon$/><first name=$r$ last name=$k$/>
and the output needed is:
first name: Jon
last name: Doe
-------------------
first name: Doe
last name: Jon
-------------------
first name: r
last name: k
-------------------
i.e.,
key: value
I have done some simple text parsing before, with a single delimiter such as $ or %,
but in this case I don't understand how to parse the text. Any help would be appreciated.

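// Keys run from '<' or whitespace up to '='; values sit between two '$'.
Matcher keys = Pattern.compile("[<\\s](.*?)=").matcher(string);
Matcher values = Pattern.compile("[$](.*?)[$]").matcher(string);
// Advance both matchers in lockstep so each key is paired with its value.
while (keys.find() && values.find()) {
    System.out.println(keys.group(1) + " : " + values.group(1));
}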
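
A variation on the same idea, as a minimal sketch: match each <.../> element first, then pull the key=$value$ pairs out of it, so the separator line can be printed after every element. The class name TagParser and the method parse are hypothetical, made up for illustration; the sketch assumes values never contain a $ and keys never contain =.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class TagParser {
    // One element: everything between '<' and the closing "/>".
    private static final Pattern ELEMENT = Pattern.compile("<([^>]*)/>");
    // One pair: a key (no '<', '=', '$', '/', '>') followed by =$value$.
    private static final Pattern PAIR = Pattern.compile("([^<=$/>]+)=\\$([^$]*)\\$");

    // Returns the output lines: "key: value" per pair, then a separator per element.
    static List<String> parse(String input) {
        List<String> lines = new ArrayList<>();
        Matcher element = ELEMENT.matcher(input);
        while (element.find()) {
            Matcher pair = PAIR.matcher(element.group(1));
            while (pair.find()) {
                // trim() drops the space that separates one pair from the next
                lines.add(pair.group(1).trim() + ": " + pair.group(2));
            }
            lines.add("-------------------");
        }
        return lines;
    }

    public static void main(String[] args) {
        String s = "<first name=$Jon$ last name=$Doe$/><first name=$r$ last name=$k$/>";
        parse(s).forEach(System.out::println);
    }
}
```

Because the pairs are read per element, a missing value in one element cannot shift the pairing in the elements that follow, which can happen when two independent matchers are advanced in lockstep.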

Replace each $ in the string from the server with ", load the result as an XML document, and use XPath or some other mechanism to extract the information you need.

There are multiple ways to reach a solution:
You can use XSLT with Java. (Java provides APIs such as TransformerFactory and Transformer.)
You can use XSLT in an IDE like Eclipse; several plugins are available.
See www.vogella.com/articles/XSLT/article.html
You can use a Unix shell script to do the same; see
"How to convert xml file in to a property file using unix shell script".
This is not an exact solution to your problem, but these are approaches you can try. There are surely many other ways as well.

Related

Regex to capture #Facilitator:"Full Name <mail#mail.domain>" tags

Please help with the regex; it can be in any language. I'll translate it to Python later.
I'm trying to build a regex to capture the tag below:
#Facilitator:"Full Name <mail#mail.domain>"
The full name can contain accented characters, as in José or Pâmela, or any character available in the ASCII table.
The full name can have 1, 2 or n family names. It may or may not have a '(company name)' at the end of the name, like #Facilitator:"Name1 Name2 Name3 (Company Inc) <mail#domain>"
The tag can appear 0, 1 or n times in a string.
The tag can appear anywhere in the string.
So far I have tried this (Python), without success:
import re
notes = 'Verbal confirmation #Facilitator:"Fernas P. Loyola (YARDA LTDA) <ope#yahoo.com>"from ATUX with Melanie. Waiting for scheduling#Facilitator:"Fernandes <v-rrlo#stttr.de>" #Facilitator:"Pablito Ferdinandes <papa#gmail.com>"'
facilitator_regex = '^.*((#Facilitator:".*"){1,}).*$'
regex_replace = '\\1'
print(re.sub(facilitator_regex, regex_replace, notes))
The output I expect is a list of 0, 1 or more #tags separated by a space.
Any help, in any language? I mostly need help with the regex itself. Thanks so much.
You can find all the facilitators using re.findall with this regex:
'#Facilitator:"[^"]*"'
e.g.
facilitator_regex = '#Facilitator:"[^"]*"'
facilitators = re.findall(facilitator_regex, notes)
For your sample data this gives
[
'#Facilitator:"Fernas P. Loyola (YARDA LTDA) <ope#yahoo.com>"',
'#Facilitator:"Fernandes <v-rrlo#stttr.de>"',
'#Facilitator:"Pablito Ferdinandes <papa#gmail.com>"'
]
You could then use str.join to make a space-separated list:
print(' '.join(facilitators))
Output:
#Facilitator:"Fernas P. Loyola (YARDA LTDA) <ope#yahoo.com>" #Facilitator:"Fernandes <v-rrlo#stttr.de>" #Facilitator:"Pablito Ferdinandes <papa#gmail.com>"

Java Regex FileName not repeated

I'm trying to add validation so that file names are not repeated. If a file name is repeated, it should be written this way:
FILENAME.PDF
FILENAME_1.PDF
FILENAME_2.PDF
I can handle the first one, but how can I handle the rest?
I will have a list of all document file names beforehand, so I'm trying to loop over each document, but I don't know how to write a regex to get the version number.
FILENAME_1.PDF -> REGEX() -> GET RETURN -> 1
UPDATE
If I have these files in my database: [filename.pdf, filename_1.pdf, filename_2.pdf].
When someone uploads a new file, I need the new name to be based on those files: filename_3.pdf if filename_2.pdf exists.
FILENAME_VERSION.EXTENSION
Based on this, I need to get the version of the last file name. Thanks!
Note: there will be other, different file names, for example FILANEM_FILENAM_VERSION.pdf
Thank you!
I would use the following regex:
_(\d+)\.[^.]+$
Or in Java:
_(\\d+)\\.[^.]+$
The regex captures a number between an underscore and the last dot of the filename.
The number you seek is captured in the first capturing group and needs to be extracted using Matcher.group.
Results on a few sample filenames:
No version found in filename.pdf
No version found in filename1.pdf
filename_1.pdf - version found : 1
filename1_2.pdf - version found : 2
No version found in filename_1.test.pdf
filename_1.test_2.pdf - version found : 2
No version found in filename_1
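
To connect the regex to the upload case from the question, here is a minimal sketch. The class FileVersions and the methods versionOf and nextName are hypothetical names chosen for illustration. It assumes the base name and extension of the uploaded file are already known and that names are compared case-sensitively.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class FileVersions {
    // Same regex as above: digits between an underscore and the last dot.
    private static final Pattern VERSION = Pattern.compile("_(\\d+)\\.[^.]+$");

    // Returns the version in the name, or 0 when there is no "_N" suffix.
    static int versionOf(String fileName) {
        Matcher m = VERSION.matcher(fileName);
        return m.find() ? Integer.parseInt(m.group(1)) : 0;
    }

    // Returns the next free name for base + "." + ext given the existing names.
    static String nextName(String base, String ext, List<String> existing) {
        Pattern sameFile = Pattern.compile(
                Pattern.quote(base) + "(?:_(\\d+))?\\." + Pattern.quote(ext));
        int max = -1; // -1 means no file with this base name exists yet
        for (String name : existing) {
            Matcher m = sameFile.matcher(name);
            if (m.matches()) {
                int version = m.group(1) == null ? 0 : Integer.parseInt(m.group(1));
                max = Math.max(max, version);
            }
        }
        return max < 0 ? base + "." + ext : base + "_" + (max + 1) + "." + ext;
    }

    public static void main(String[] args) {
        List<String> stored = Arrays.asList("filename.pdf", "filename_1.pdf", "filename_2.pdf");
        System.out.println(nextName("filename", "pdf", stored));
    }
}
```

Taking the maximum version rather than counting the files keeps the result correct when an intermediate version has been deleted.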

How to replace a searched line with new contents in eclipse?

I am working on a big project which has thousands of Java files. I have to replace all the System.out.println("Argument") lines in the code with log4j logging. I can find the lines that use System.out.println(...) with the following regex: System\.out.*; Is there a way to replace the println call with
LGR.info( LGR.isInfoEnabled() ? "Argument": null);
This is how it should look:
Before:
System.out.println("Argument")
After:
LGR.info( LGR.isInfoEnabled() ? "Argument": null);
You can use File search:
Check Case sensitive and Regular expression
Containing text: System\.out\.println\((.+)\);
File name patterns: *.java
Click Replace...
Check Regular expression
With: LGR.info( LGR.isInfoEnabled() ? \1 : null);
I think this should work using the search tool (CTRL+F) or File Search (CTRL+H):
search: (System\.out\.println\((.+)\));
replace: LGR.info( LGR.isInfoEnabled() ? $2: null);
If you right-click on a search result in the Search view, there is an option called Replace All .... A dialog opens that lets you enter the replacement text. If a regular expression was used during the search, it also lets you use matcher groups from the regex in the replacement.
There is also an option Replace selected ... if you don't want to replace all occurrences.
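
To check the search and replace expressions before running them over thousands of files, the same substitution can be tried in plain Java with String.replaceAll, where $1 refers to the first capturing group. This is only a sketch: the class PrintlnRewrite and its method rewrite are hypothetical names, and LGR is simply the logger name used in the question.

```java
// Hypothetical helper for trying out the regex; not an Eclipse API.
class PrintlnRewrite {
    // Rewrites System.out.println(<arg>); into the LGR.info(...) form from the question.
    static String rewrite(String source) {
        return source.replaceAll(
                "System\\.out\\.println\\((.+)\\);",
                "LGR.info( LGR.isInfoEnabled() ? $1 : null);");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("System.out.println(\"Argument\");"));
    }
}
```

Note that (.+) is greedy, so a line that contains two println calls would be captured as one argument; such lines are worth reviewing by hand.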

Special character while using Files.readAllLines

I'm trying to use Files.readAllLines to read a file and edit it.
List<String> l = Files.readAllLines(manejador.getArchivo().toPath(), StandardCharsets.UTF_8);
The file has a list of games and its players:
ID: Fm550.0
Federico Schmidt
Iván Petrini
Germán Gómez
Tomás Perotti
ID: VO101000.0
Alex Morgan
I then want to check every position in the list to see whether it equals some ID.
The problem is that when I use Files.readAllLines, I get this:
?ID: Fm550.0
Federico Schmidt
Iván Petrini
Germán Gómez
Tomás Perotti
ID: VO101000.0
Alex Morgan
How can I get rid of that ? at the beginning?
Your question is slightly different from, and so not a duplicate of, "Reading UTF-8 - BOM marker", because it appears you may not have known about the byte order mark (BOM) in UTF-8 files. That question does, however, have an answer for dealing with a BOM in UTF-8 files in Java.
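
If the leading ? is indeed a UTF-8 byte order mark, it arrives in Java as the single character U+FEFF at the start of the first line, and it can be dropped after reading. A minimal sketch, with hypothetical names (BomAwareReader, stripBom, readAllLines) that are not part of the JDK:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the JDK.
class BomAwareReader {
    // The byte order mark decodes to this one character in UTF-8 input.
    private static final String BOM = "\uFEFF";

    // Returns a copy of the lines with a leading BOM removed from the first line.
    static List<String> stripBom(List<String> lines) {
        List<String> result = new ArrayList<>(lines);
        if (!result.isEmpty() && result.get(0).startsWith(BOM)) {
            result.set(0, result.get(0).substring(BOM.length()));
        }
        return result;
    }

    // Reads the file as UTF-8 and strips the BOM if one is present.
    static List<String> readAllLines(Path path) throws IOException {
        return stripBom(Files.readAllLines(path, StandardCharsets.UTF_8));
    }
}
```

With this in place, the call from the question would become BomAwareReader.readAllLines(manejador.getArchivo().toPath()), and the first entry can be compared against "ID: Fm550.0" directly.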

java regex matcher results != to notepad++ regex find result

I am trying to extract data out of a website access log as part of a java program. Every entry in the log has a url. I have successfully extracted the url out of each record.
Within the url, there is a parameter that I want to capture so that I can use it to query a database. Unfortunately, it doesn't seem that the web developers used any one standard to write the parameter's name.
The parameter is usually called "course_id", but I have also seen "courseId", "course%3DId", "course%253Did", etc. The format for the parameter name and value is usually course_id=_22222_1, where the number I want is between the "_" and "_1". (The value is always the same, even if the parameter name varies.)
So, my idea was to use the regex /^.*course_id[^_]*_(\d*)_1.*$/i to find and extract the number.
In java, my code is
java.util.regex.Pattern courseIDPattern = java.util.regex.Pattern.compile(".*course[^i]*id[^_]*_(\\d*)_1.*", java.util.regex.Pattern.CASE_INSENSITIVE);
java.util.regex.Matcher courseIDMatcher = courseIDPattern.matcher(_url);
_courseID = "";
if(courseIDMatcher.matches())
{
_courseID = retrieveCourseID(courseIDMatcher.group(1));
return;
}
This works for a lot of the records. However, some records do not record the course_id, even though the parameter is in the url. One such example is the record:
/webapps/contentDetail?course_id=_223629_1&content_id=_3641164_1&rich_content_level=RICH&language=en_US&v=1&ver=4.1.2
However, I used notepad++ to do a regex replace on this (in fact, every) url using the regex above, and the url was successfully replaced by the course ID, implying that the regex is not incorrect.
Am I doing something wrong in the java code, or is the java matcher broken?
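
One way to narrow this down is to run the pattern against the failing URL in isolation. The sketch below does that with the pattern from the question; the class CourseIdCheck and its methods are hypothetical names. It also shows one difference between the two tools: Matcher.matches() must consume the entire input, and . does not match line terminators by default, so a stray line break at the end of _url would make matches() fail while find() still succeeds. Whether that is what happens with these log records is an assumption, not something the question confirms.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical test harness for the pattern in the question.
class CourseIdCheck {
    // Pattern exactly as in the question, used with matches().
    private static final Pattern WHOLE = Pattern.compile(
            ".*course[^i]*id[^_]*_(\\d*)_1.*", Pattern.CASE_INSENSITIVE);
    // Same core without the surrounding ".*", used with find().
    private static final Pattern PART = Pattern.compile(
            "course[^i]*id[^_]*_(\\d*)_1", Pattern.CASE_INSENSITIVE);

    // True when the whole input matches, as in the original code.
    static boolean matchesWhole(String url) {
        return WHOLE.matcher(url).matches();
    }

    // Returns the captured digits, or "" when the parameter is not found.
    static String extract(String url) {
        Matcher m = PART.matcher(url);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        String url = "/webapps/contentDetail?course_id=_223629_1&content_id=_3641164_1"
                + "&rich_content_level=RICH&language=en_US&v=1&ver=4.1.2";
        System.out.println(matchesWhole(url));        // clean input
        System.out.println(matchesWhole(url + "\n")); // input with a trailing line break
        System.out.println(extract(url + "\n"));      // find() ignores the trailing line break
    }
}
```

If the clean URL matches here but not in the program, logging the exact contents and length of _url just before the match would be the next thing to check.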
