Special character while using Files.readAllLines

I'm trying to use Files.readAllLines to read a file and edit it.
List<String> l = Files.readAllLines(manejador.getArchivo().toPath(), StandardCharsets.UTF_8);
The file has a list of games and their players:
ID: Fm550.0
Federico Schmidt
Iván Petrini
Germán Gómez
Tomás Perotti
ID: VO101000.0
Alex Morgan
I then want to check every position in the list to see whether it equals some ID.
The problem is that when I use Files.readAllLines, I get this:
?ID: Fm550.0
Federico Schmidt
Iván Petrini
Germán Gómez
Tomás Perotti
ID: VO101000.0
Alex Morgan
How can I get rid of that ? at the beginning?

Your question is not strictly a duplicate of "Reading UTF-8 - BOM marker", since it appears you may not have known about BOMs in UTF-8 files, but the answers to that question show how to deal with a BOM in UTF-8 files in Java. The ? you see is the byte order mark (U+FEFF) at the start of the file, which Files.readAllLines does not strip.
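A minimal sketch of one way to drop a leading BOM after reading, assuming the file is UTF-8 and that the BOM, if present, is the first character of the first line. The class and method names (BomAwareReader, readAllLinesWithoutBom) are hypothetical and not part of any library:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class BomAwareReader {
    // U+FEFF is the byte order mark; in UTF-8 it is encoded as EF BB BF.
    private static final String BOM = "\uFEFF";

    // Hypothetical helper: reads all lines and removes a BOM from the first one.
    static List<String> readAllLinesWithoutBom(Path path) throws IOException {
        // Copy into an ArrayList so the list is guaranteed to be modifiable.
        List<String> lines = new ArrayList<>(Files.readAllLines(path, StandardCharsets.UTF_8));
        if (!lines.isEmpty() && lines.get(0).startsWith(BOM)) {
            lines.set(0, lines.get(0).substring(1));
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Build a small sample file that starts with a BOM.
        Path tmp = Files.createTempFile("games", ".txt");
        Files.write(tmp, (BOM + "ID: Fm550.0\nFederico Schmidt\n").getBytes(StandardCharsets.UTF_8));
        System.out.println(readAllLinesWithoutBom(tmp));
        Files.delete(tmp);
    }
}
```

With the BOM removed, a comparison such as lines.get(0).equals("ID: Fm550.0") behaves as expected.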

Related

Regex to capture #Facilitator:"Full Name <mail#mail.domain>" tags

Please help with the regex; any language is fine. I'll translate it to Python later.
I'm trying to build a regex to capture the tag below:
#Facilitator:"Full Name <mail#mail.domain>"
The full name can contain accented characters such as José or Pâmela, or any character available in the ASCII table.
The full name can have 1, 2 or n family names. It may or may not have a '(company name)' at the end of the name, like #Facilitator:"Name1 Name2 Name3 (Company Inc) <mail#domain>"
The tag can appear 0, 1 or n times in a string.
The tag can appear anywhere in the string.
So far I have tried this (Python), without success:
import re
notes = 'Verbal confirmation #Facilitator:"Fernas P. Loyola (YARDA LTDA) <ope#yahoo.com>"from ATUX with Melanie. Waiting for scheduling#Facilitator:"Fernandes <v-rrlo#stttr.de>" #Facilitator:"Pablito Ferdinandes <papa#gmail.com>"'
facilitator_regex = '^.*((#Facilitator:".*"){1,}).*$'
regex_replace = '\\1'
print(re.sub(facilitator_regex, regex_replace, notes))
The output I expect is a list of 0, 1 or more #tags separated by a space.
Any help in any language? I mostly need help with the regex itself. Thanks so much.

You can find all the facilitators using re.findall with this regex:
'#Facilitator:"[^"]*"'
e.g.
facilitator_regex = '#Facilitator:"[^"]*"'
facilitators = re.findall(facilitator_regex, notes)
For your sample data this gives
[
'#Facilitator:"Fernas P. Loyola (YARDA LTDA) <ope#yahoo.com>"',
'#Facilitator:"Fernandes <v-rrlo#stttr.de>"',
'#Facilitator:"Pablito Ferdinandes <papa#gmail.com>"'
]
You could then use str.join to make a space-separated list:
print(' '.join(facilitators))
Output:
#Facilitator:"Fernas P. Loyola (YARDA LTDA) <ope#yahoo.com>" #Facilitator:"Fernandes <v-rrlo#stttr.de>" #Facilitator:"Pablito Ferdinandes <papa#gmail.com>"
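Since the question allows any language, the same pattern can be sketched in Java. This is only an illustration of the approach above; the class and method names (FacilitatorTags, findTags) are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FacilitatorTags {
    // Same regex as in the answer: the tag prefix, then everything up to the closing quote.
    private static final Pattern TAG = Pattern.compile("#Facilitator:\"[^\"]*\"");

    // Hypothetical helper: returns every tag found, in order of appearance.
    static List<String> findTags(String notes) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(notes);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        String notes = "Verbal confirmation #Facilitator:\"Fernas P. Loyola (YARDA LTDA) <ope#yahoo.com>\""
                + "from ATUX with Melanie. Waiting for scheduling#Facilitator:\"Fernandes <v-rrlo#stttr.de>\""
                + " #Facilitator:\"Pablito Ferdinandes <papa#gmail.com>\"";
        // Space-separated list of tags, as requested in the question.
        System.out.println(String.join(" ", findTags(notes)));
    }
}
```

A string without any tag yields an empty list, which covers the "0 times" case.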

Two Java strings with the same value but different Eclipse ids don't work right in a HashMap

Here is the code (class names are from Knime):
HashMap<String,DataColumnSpec> rcols = new HashMap<String, DataColumnSpec>();
rightSpec.forEach(rs -> { rcols.put(rs.getName(), rs); });
DataColumnSpec[] jcols = leftSpec.stream()
.filter(s -> rcols.containsKey(s.getName()))
.toArray(DataColumnSpec[]::new);
The result is empty, but it should not be! There really is one matching column!
The debugger screenshot (not reproduced here) shows P# in the first instance with id=14978 and in the second with id=666.
What is going on here? What do I do to fix it?

The answer, sad to admit, was a non-printing character in one of the strings. The source of the data is the FileReader node on Knime, and it has a bug handling UTF-8-BOM data files. It injects a NUL character into the first string it reads, which is invisible in the debugger but throws off all the comparisons.
Full credit to @Ole V.V. It just didn't occur to me. Lesson learned!
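A small sketch of how such a hidden character can be made visible and removed, assuming the stray character is a control character (such as NUL) or a BOM. The class and helper names (InvisibleCharDemo, describe, clean) are hypothetical, and the key "P#" is taken from the question:

```java
import java.util.HashMap;
import java.util.Map;

class InvisibleCharDemo {
    // Renders each code point as U+XXXX so non-printing characters show up.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    // Hypothetical cleanup: drops ASCII control characters and U+FEFF.
    // Note that this also removes tabs and line breaks.
    static String clean(String s) {
        return s.replaceAll("[\\p{Cntrl}\\uFEFF]", "");
    }

    public static void main(String[] args) {
        String fromFile = "\u0000P#"; // looks like "P#" in many debuggers
        Map<String, Integer> rcols = new HashMap<>();
        rcols.put("P#", 1);

        System.out.println(rcols.containsKey(fromFile));        // lookup with the raw string
        System.out.println(describe(fromFile));                 // reveals the extra code point
        System.out.println(rcols.containsKey(clean(fromFile))); // lookup after cleaning
    }
}
```

Comparing the lengths of the two strings is an even quicker first check: the polluted one is one character longer.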

How can I write a _FillValue parameter in a NetCDF CHAR Variable using Java?

I am trying to create a NetCDF file using Java (the Unidata library). One of the requirements is to include the _FillValue attribute in all the Variables. One of them is of type CHAR, and I cannot get it to work.
The Attribute constructor only accepts Strings or numbers (or arrays of them), not chars. I have tried both anyway, but the final NetCDF file does not show the attribute.
Other languages let you do it (we have seen this working in MATLAB), but I don't know how to do it using Java.
I see in the documentation that the _FillValue should be of the same type as the Variable itself, but Attribute values do not accept chars, only Strings or numbers.
For example, when I try
Nc4Chunking chunker = Nc4ChunkingStrategy.factory(Nc4Chunking.Strategy.standard, 6, true);
NetcdfFileWriter dataFile = NetcdfFileWriter.createNew(NetcdfFileWriter.Version.netcdf4_classic, fileName, chunker);
....
Variable varid_scdr = dataFile.addVariable(null, "SCDR", DataType.CHAR, dimsTMS15);
varid_scdr.addAttribute(new Attribute("_FillValue", " "));
....
dataFile.write(varid_scdr, scodData);
dataFile.close();
The resulting NetCDF file has no _FillValue; it is not written to the file.
But if I change the attribute name and do this
varid_scdr.addAttribute(new Attribute("FillValue", " "));
the attribute is present in the output file.
I have no problems with other data types or other attribute names. I am pretty sure that the problem is specific to the _FillValue attribute on a variable of type CHAR. I don't know how to write it, and I need the _FillValue attribute to be explicitly present in the variable's attribute list.
Update (5th July 2019):
I realized that the problem only affects netcdf4 and netcdf4_classic files, so perhaps it is related to chunking or something like that. If I create netcdf3 files instead, it works.
Any help with this issue? What am I missing?

I think this is due to a bug that has been addressed in the latest version of netcdf-java (v5.0.0). v5.0.0 has been released and is available for download; my hope is that the announcement will go out today.
If you want to be explicit about writing a CHAR-valued attribute, one way to do it would be:
String fillValue = " ";
Array charArrayFillValue = ArrayChar.makeFromString(fillValue, 1);
Attribute charAttrFillValue = new Attribute("_FillValue", charArrayFillValue);
varid_scdr.addAttribute(charAttrFillValue);
Another way would be:
String fillValue = " ";
Array charArrayFillValue = ArrayChar.makeFromString(fillValue, 1);
Attribute charAttrFillValue = new Attribute("_FillValue", DataType.CHAR);
charAttrFillValue.setValues(charArrayFillValue);
varid_scdr.addAttribute(charAttrFillValue);
Both of those are a bit verbose, though. I just checked using version 5, and your one-liner works:
varid_scdr.addAttribute(new Attribute("_FillValue", " "));
However, if you try to pass in a value for _FillValue that isn't a string of length 1, the netCDF-C library will throw an error. So this:
varid_scdr.addAttribute(new Attribute("_FillValue", "ab"));
will result in:
-36 (NetCDF: Invalid argument) on attribute ':_FillValue = "ab"' on var varid_scdr
netCDF-Java will make sure the string you pass in gets converted to CHARs, but it won't truncate the resulting set of CHARs to fit into the single character limit on the _FillValue attribute.

Java Regex FileName not repeated

I'm trying to add a validation that prevents repeated file names. If a file name is repeated, it should be written this way:
FILENAME.PDF
FILENAME_1.PDF
FILENAME_2.PDF
I can produce the first name, but how should I derive the rest?
I will already have a list of all existing document filenames, so I am trying to loop over each document, but I don't know how to write a regex that extracts the version:
FILENAME_1.PDF -> REGEX() -> GET RETURN -> 1
UPDATE
If I have these files in my database: [filename.pdf, filename_1.pdf, filename_2.pdf].
When someone uploads a new file, its name should be based on those files: filename_3.pdf if filename_2.pdf exists.
FILENAME_VERSION.EXTENSION
Based on this, I need to get the version of the last filename. Thanks!
Note: There will be other, different filenames as well, for example FILANEM_FILENAM_VERSION.pdf
Thank you!

I would use the following regex:
_(\d+)\.[^.]+$
Or, written as a Java string literal:
_(\\d+)\\.[^.]+$
The regex captures a number between an underscore and the last dot of the filename.
The number you seek is captured in the first capturing group and needs to be extracted using Matcher.group.
Results for some sample filenames:
No version found in filename.pdf
No version found in filename1.pdf
filename_1.pdf - version found : 1
filename1_2.pdf - version found : 2
No version found in filename_1.test.pdf
filename_1.test_2.pdf - version found : 2
No version found in filename_1
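A possible way to put the regex to work, sketched under the assumptions that names are compared case-sensitively, that the base name and extension of the upload are already known, and that versions fit in an int. FileVersions, versionOf and nextName are hypothetical names, not an existing API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FileVersions {
    // Digits between an underscore and the last dot of the filename.
    private static final Pattern VERSION = Pattern.compile("_(\\d+)\\.[^.]+$");

    // Returns the version captured by group 1, or empty if the name has none.
    static OptionalInt versionOf(String fileName) {
        Matcher m = VERSION.matcher(fileName);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    // Hypothetical helper: picks the next free name for base.ext among existing names.
    static String nextName(String base, String ext, List<String> existing) {
        int highest = -1; // -1 means no file with this base name exists yet
        for (String name : existing) {
            if (name.equals(base + "." + ext)) {
                highest = Math.max(highest, 0); // the unversioned file counts as version 0
                continue;
            }
            OptionalInt v = versionOf(name);
            // Only count names of the exact form base_VERSION.ext.
            if (v.isPresent() && name.equals(base + "_" + v.getAsInt() + "." + ext)) {
                highest = Math.max(highest, v.getAsInt());
            }
        }
        return highest < 0 ? base + "." + ext : base + "_" + (highest + 1) + "." + ext;
    }

    public static void main(String[] args) {
        List<String> stored = Arrays.asList("filename.pdf", "filename_1.pdf", "filename_2.pdf");
        System.out.println(nextName("filename", "pdf", stored));
    }
}
```

For the three stored files in the question's update, nextName returns filename_3.pdf; with no stored files it returns filename.pdf.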

How can I parse the given string?

Hi, I have been given the task of parsing a string that will come from the server.
The string looks like:
<first name=$Jon$ last name=$Doe$/><first name=$Doe$ last name=$Jon$/><first name=$r$ last name=$k$/>
and the output needed is:
first name: Jon
last name: Doe
-------------------
first name: Doe
last name: Jon
-------------------
first name: r
last name: k
-------------------
i.e.,
key: value
I have done some simple text parsing before, with a single delimiter like $ or %,
but in this case I don't understand how to parse the text. Any help would be much appreciated.

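// Keys: the text between '<' or whitespace and the next '='
Matcher keys = Pattern.compile("[<\\s](.*?)=").matcher(string);
// Values: the text between a pair of '$' characters
Matcher values = Pattern.compile("[$](.*?)[$]").matcher(string);
while (keys.find() && values.find()) {
    System.out.println(keys.group(1) + " : " + values.group(1));
}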
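A fuller sketch that also prints the separator line after each element, assuming every element has the form <key=$value$ key=$value$/> and that values never contain $. The names TagParser, format and SEPARATOR are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagParser {
    // One element: everything between '<' and the closing '/>'.
    private static final Pattern ELEMENT = Pattern.compile("<(.*?)/>");
    // One pair inside an element: key=$value$ (the key may contain spaces).
    private static final Pattern PAIR = Pattern.compile("([^=$]+?)=\\$(.*?)\\$");
    private static final String SEPARATOR = "-------------------";

    // Hypothetical helper: returns the output lines for the whole input string.
    static List<String> format(String input) {
        List<String> lines = new ArrayList<>();
        Matcher element = ELEMENT.matcher(input);
        while (element.find()) {
            Matcher pair = PAIR.matcher(element.group(1));
            while (pair.find()) {
                // trim() removes the space that separates one pair from the next
                lines.add(pair.group(1).trim() + ": " + pair.group(2));
            }
            lines.add(SEPARATOR);
        }
        return lines;
    }

    public static void main(String[] args) {
        String s = "<first name=$Jon$ last name=$Doe$/><first name=$Doe$ last name=$Jon$/>"
                + "<first name=$r$ last name=$k$/>";
        format(s).forEach(System.out::println);
    }
}
```

Matching each element first keeps keys and values paired per element, so a malformed pair in one element cannot shift the pairing in the next.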

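Replace $ in the XML string from the server with ", load it as an XML document, and use XPath or some other mechanism to extract the information you need. Note that the sample string is still not well-formed XML after that replacement, because attribute names such as "first name" contain spaces and there is no single root element, so further preprocessing would be needed.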

There are multiple ways to reach a solution:
You can use XSLT with Java (Java provides APIs such as TransformerFactory and Transformer).
You can use XSLT in an IDE like Eclipse; several plugins are available.
You can check out www.vogella.com/articles/XSLT/article.html
You can use a Unix shell script to do the same; see "How to convert xml file in to a property file using unix shell script".
These are not exact solutions to your problem, but approaches you can try. There are surely many other ways as well.
