Java Regex FileName not repeated - java

I'm trying to validate file names so that none is repeated. If a file name already exists, the new file should be named like this:
FILENAME.PDF
FILENAME_1.PDF
FILENAME_2.PDF
I can produce the first one, but how do I get the rest?
I will already have a list of all existing document file names, so I plan to loop over each document, but I don't know how to write a regex that extracts the number.
FILENAME_1.PDF -> REGEX() -> GET RETURN -> 1
UPDATE
Suppose I have these files in my database: [filename.pdf, filename_1.pdf, filename_2.pdf].
When someone uploads a new file with the same name, the new name should be based on those files: filename_3.pdf if filename_2.pdf already exists.
FILENAME_VERSION.EXTENSION
Based on this, I need to get the version of the last file name.
Note: there will be other, different file names, for example FILANEM_FILENAM_VERSION.pdf
Thank you!

I would use the following regex:
_(\d+)\.[^.]+$
Or, written as a Java string literal:
_(\\d+)\\.[^.]+$
The regex captures the digits that directly follow an underscore and directly precede the last dot of the file name.
The number you want ends up in the first capturing group and can be read with Matcher.group(1).
Sample results:
No version found in filename.pdf
No version found in filename1.pdf
filename_1.pdf - version found : 1
filename1_2.pdf - version found : 2
No version found in filename_1.test.pdf
filename_1.test_2.pdf - version found : 2
No version found in filename_1
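To get from the captured number to the next free name, something along these lines might work. This is only a sketch: the class name FileVersions and the helper nextName are made up for illustration, and it assumes the existing names are available as a List<String> and that the base name and extension of the upload are already known.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FileVersions {
    // Regex from the answer: digits between an underscore and the last dot.
    private static final Pattern VERSION = Pattern.compile("_(\\d+)\\.[^.]+$");

    /** Returns the version number found in the name, or -1 if there is none. */
    static int versionOf(String fileName) {
        Matcher m = VERSION.matcher(fileName);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** Hypothetical helper: picks the next free name for baseName.extension. */
    static String nextName(String baseName, String extension, List<String> existing) {
        String plain = baseName + "." + extension;
        if (!existing.contains(plain)) {
            return plain; // the unversioned name is still free
        }
        // Only names of the exact form baseName_N.extension count as versions.
        Pattern sameFile = Pattern.compile(
                Pattern.quote(baseName) + "_(\\d+)\\." + Pattern.quote(extension));
        int highest = 0;
        for (String name : existing) {
            Matcher m = sameFile.matcher(name);
            if (m.matches()) {
                highest = Math.max(highest, Integer.parseInt(m.group(1)));
            }
        }
        return baseName + "_" + (highest + 1) + "." + extension;
    }

    public static void main(String[] args) {
        List<String> stored = Arrays.asList("filename.pdf", "filename_1.pdf", "filename_2.pdf");
        System.out.println(versionOf("filename_2.pdf"));
        System.out.println(nextName("filename", "pdf", stored));
    }
}
```

Building a stricter pattern per base name with Pattern.quote keeps names such as FILANEM_FILENAM_1.pdf from being counted as versions of a different file.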

Related

Java - xgboost DMatrix input

When creating a DMatrix in Java with the xgboost4j package, I first managed to create the matrix from a file path:
DMatrix trainMat = new DMatrix("...\\xgb_training_input.csv");
But when I try to train the model:
Booster booster = XGBoost.train(trainMat, params, round, watches, null, null);
I get the following error:
...regression_obj.cc:108: label must be in [0,1] for logistic regression
My data is sound: I have checked it with an XGBoost model built in Python.
I'm guessing the problem is somehow the data format.
Currently the format is as follows:
x1,x2,x3,x4,x5,y
where x1-x5 are real numbers and y is either 0 or 1. The file extension is .csv.
Maybe the separator shouldn't be ','?
DMatrix expects a file in libsvm format, which can easily be created with Python.
The libsvm format looks like this:
target 0:column1 1:column2 2:column3 ... and so on
So the target comes first, and every other column (predictor) follows as index:value, with the index increasing from column to column.
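If you would rather stay in Java, the conversion is small enough to sketch by hand. The class and method names below are made up for illustration, and the code assumes the layout from the question (label in the last column, comma separator, no header row, no missing values):

```java
import java.util.StringJoiner;

class CsvToLibsvm {
    /** Converts one CSV row "x1,...,xn,y" into "y 0:x1 1:x2 ...". */
    static String toLibsvmLine(String csvLine) {
        String[] cells = csvLine.trim().split(",");
        int last = cells.length - 1;          // assumed position of the label y
        StringJoiner out = new StringJoiner(" ");
        out.add(cells[last].trim());          // target goes first
        for (int i = 0; i < last; i++) {
            // zero-based feature indices, as in the format shown above
            out.add(i + ":" + cells[i].trim());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toLibsvmLine("0.5,1.2,3,4.0,5.5,1"));
    }
}
```

Applying toLibsvmLine to every data line of the CSV and writing the results to a new file gives the input described above.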

How to replace a searched line with new contents in eclipse?

I am working on a big project with thousands of Java files. I have to replace every System.out.println("Argument") line in the code with log4j logging. I can find the lines that use System.out.println(...) with the regex System\.out.*;. Is there a way to replace the println call with
LGR.info( LGR.isInfoEnabled() ? "Argument": null);
This is how it should look:
Before:
System.out.println("Argument")
After:
LGR.info( LGR.isInfoEnabled() ? "Argument": null);
You can use File search:
Check Case sensitive and Regular expression
Containing text: System\.out\.println\((.+)\);
File name patterns: *.java
Click Replace...
Check Regular expression
With: LGR.info( LGR.isInfoEnabled() ? \1 : null);
I think this should work with either the Find/Replace tool (Ctrl+F) or File Search (Ctrl+H):
search: (System\.out\.println\((.+)\));
replace: LGR.info( LGR.isInfoEnabled() ? $2: null);
If you right-click a search result in the Search view, there is an option called Replace All.... It opens a dialog where you can enter the replacement text. If the search used a regular expression, the replacement text can also refer to the regex's capturing groups.
There is also an option Replace Selected... if you don't want to replace all occurrences.
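To check the pattern before running it over thousands of files, the same search and replace can be tried in plain Java. This is only a sketch: the class name is made up, LGR is the logger name from the question, and String.replaceAll uses $1 for the first capturing group.

```java
class PrintlnToLogger {
    /** Rewrites System.out.println(...) calls using the regex from the answers above. */
    static String rewrite(String source) {
        return source.replaceAll(
                "System\\.out\\.println\\((.+)\\);",
                "LGR.info( LGR.isInfoEnabled() ? $1 : null);");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("System.out.println(\"Argument\");"));
    }
}
```

Because (.+) is greedy and . does not match line breaks, this handles one println call per line. Two calls on the same line would be merged into one match, so those cases need a manual look.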

Special character while using Files.readAllLines

I'm trying to use Files.readAllLines to read a file and edit it.
List<String> l = Files.readAllLines(manejador.getArchivo().toPath(), StandardCharsets.UTF_8);
The file has a list of games and its players:
ID: Fm550.0
Federico Schmidt
Iván Petrini
Germán Gómez
Tomás Perotti
ID: VO101000.0
Alex Morgan
Then I want to check every position in the list to see whether it equals some ID.
The problem is that when I use Files.readAllLines, I get this:
?ID: Fm550.0
Federico Schmidt
Iván Petrini
Germán Gómez
Tomás Perotti
ID: VO101000.0
Alex Morgan
How can I get rid of that ? at the beginning?
The ? is a byte order mark (BOM, U+FEFF) at the start of the file, which Java's UTF-8 decoder keeps as a character. Your question is slightly different from Reading UTF-8 - BOM marker, and so not a duplicate, because you may not have known about the BOM issue in UTF-8 files. Still, the answers to that question show how to deal with a BOM in UTF-8 files in Java.
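One simple way to handle it, assuming the stray character really is a BOM, is to drop U+FEFF from the first line after reading. The class and method names here are made up for illustration:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class BomStripper {
    private static final String BOM = "\uFEFF";

    /** Returns a copy of the lines with a leading BOM removed from the first line. */
    static List<String> stripBom(List<String> lines) {
        List<String> result = new ArrayList<>(lines);
        if (!result.isEmpty() && result.get(0).startsWith(BOM)) {
            result.set(0, result.get(0).substring(1));
        }
        return result;
    }

    /** Reads a UTF-8 file and strips the BOM, if there is one. */
    static List<String> readAllLinesWithoutBom(Path file) throws IOException {
        return stripBom(Files.readAllLines(file, StandardCharsets.UTF_8));
    }
}
```

With this, readAllLinesWithoutBom(manejador.getArchivo().toPath()) could replace the Files.readAllLines call from the question, and the first entry would compare equal to "ID: Fm550.0".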

Convert OpenIE triplet to N-Triplet (NT)

I downloaded the OpenIE 4.1 jar file (available from http://knowitall.github.io/openie/) and used it to process some free-text documents. It produced triplet-like output along with the text and a confidence score, for instance:
The rail launchers are conceptually similar to the underslung SM-1
0.93 (The rail launchers; are; conceptually similar to the underslung SM-1)
I wrote a Java parser to extract the OpenIE triplets whose confidence score is >= 0.85, and I need to know how to convert them to N-Triples (NT) format.
Not sure if I need to be familiar with the ontology that I'm trying to map to.
After discussing this with my colleagues, this is what I should do to create N-Triples (NT). Detailed Java code can be found in another question: Use RDF API (Jena, OpenRDF or Protege) to convert OpenIE outputs
Create a blank node identifier for each distinct :subject in the file (call it node_s)
Create a blank node identifier for each distinct :object in the file (call it node_o)
Define a URI for each distinct predicate
Create these triples:
1. node_s rdf:type <http://mypage.org/vocab#Corpus>
2. node_s dc:title “The rail launchers”
3. node_s dc:source “Sample File”
4. node_s rdf:predicate <http://mypage.org/vocab#are>
5. node_o rdf:type <http://mypage.org/vocab#Corpus>
6. node_o dc:title “conceptually similar to the underslung SM-1”
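For a rough idea of what the resulting N-Triples lines look like, the six triples above can be written out with plain string handling. This is a sketch, not the code from the linked question: the class and method names are made up, it uses one blank node pair per extraction rather than per distinct subject or object, and it assumes rdf: and dc: stand for the usual RDF and Dublin Core element namespaces.

```java
import java.util.ArrayList;
import java.util.List;

class NTriplesSketch {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String DC = "http://purl.org/dc/elements/1.1/";
    static final String VOCAB = "http://mypage.org/vocab#"; // namespace used in the steps above

    /** Wraps text as an N-Triples string literal, escaping special characters. */
    static String literal(String text) {
        return "\"" + text.replace("\\", "\\\\").replace("\"", "\\\"")
                .replace("\n", "\\n").replace("\r", "\\r") + "\"";
    }

    static String iri(String value) {
        return "<" + value + ">";
    }

    static String triple(String subject, String predicate, String object) {
        return subject + " " + predicate + " " + object + " .";
    }

    /** Emits the six triples listed above for one OpenIE extraction. */
    static List<String> toNTriples(int id, String subject, String predicate,
                                   String object, String source) {
        String nodeS = "_:s" + id;
        String nodeO = "_:o" + id;
        // Assumption: whitespace in a predicate becomes '_' so the IRI stays valid.
        String predicateIri = iri(VOCAB + predicate.trim().replaceAll("\\s+", "_"));
        List<String> out = new ArrayList<>();
        out.add(triple(nodeS, iri(RDF + "type"), iri(VOCAB + "Corpus")));
        out.add(triple(nodeS, iri(DC + "title"), literal(subject)));
        out.add(triple(nodeS, iri(DC + "source"), literal(source)));
        out.add(triple(nodeS, iri(RDF + "predicate"), predicateIri));
        out.add(triple(nodeO, iri(RDF + "type"), iri(VOCAB + "Corpus")));
        out.add(triple(nodeO, iri(DC + "title"), literal(object)));
        return out;
    }
}
```

Calling toNTriples(1, "The rail launchers", "are", "conceptually similar to the underslung SM-1", "Sample File") yields six lines, each ending in " ." as N-Triples requires. For anything beyond a quick experiment, an RDF library such as the ones named in the linked question is probably the safer route.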

How can i parse the given string?

Hi, I have been given the task of parsing a string that comes from the server.
The string looks like:
<first name=$Jon$ last name=$Doe$/><first name=$Doe$ last name=$Jon$/><first name=$r$ last name=$k$/>
and the output needed is:
first name: Jon
last name: Doe
-------------------
first name: Doe
last name: Jon
-------------------
first name: r
last name: k
-------------------
i.e.,
key: value
I have done some simple text parsing before, with a single delimiter such as $ or %,
but in this case I don't understand how to parse the text. Any help would be appreciated.
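import java.util.regex.Matcher;
import java.util.regex.Pattern;
// A key starts after '<' or whitespace and ends at '='; a value sits between two '$'.
Matcher keys = Pattern.compile("[<\\s](.*?)=").matcher(string);
Matcher values = Pattern.compile("\\$(.*?)\\$").matcher(string);
// Keys and values appear in the same order, so advance both matchers together.
while (keys.find() && values.find()) {
    System.out.println(keys.group(1) + ": " + values.group(1));
}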
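A variant that also prints the separator line after each element could look like the sketch below. The class name is made up, and it assumes that keys never contain '=', '$', '<', '/' or '>' and that values never contain '$' or the sequence "/>".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagParser {
    // Group 1: the key before '='; group 2: the value between two '$'.
    private static final Pattern PAIR = Pattern.compile("([^<=$/>]+)=\\$([^$]*)\\$");

    /** Formats every key/value pair as "key: value", one block per element. */
    static String format(String input) {
        StringBuilder out = new StringBuilder();
        for (String element : input.split("/>")) {
            Matcher m = PAIR.matcher(element);
            boolean found = false;
            while (m.find()) {
                out.append(m.group(1).trim()).append(": ").append(m.group(2)).append('\n');
                found = true;
            }
            if (found) {
                out.append("-------------------\n");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(format(
                "<first name=$Jon$ last name=$Doe$/><first name=$Doe$ last name=$Jon$/>"));
    }
}
```

Matching key and value with a single pattern keeps each pair together, so a key without a value cannot shift the pairing the way two independent matchers could.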
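Replace each $ in the string from the server with ", load the result as an XML document, and use XPath or some other mechanism to extract the information you need. Note that the string is not well-formed XML as it stands: it has no single root element, and attribute names cannot contain spaces, so it would need to be normalized first (for example first name to first_name).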
There are several ways to reach a solution:
You can use XSLT with Java (Java provides APIs such as TransformerFactory and Transformer).
You can use XSLT in an IDE such as Eclipse; several plugins are available.
You can check this out: www.vogella.com/articles/XSLT/article.html
You can use a Unix shell script to do the same; see How to convert xml file in to a property file using unix shell script.
These are not exact solutions to your problem, but approaches you can try. There are surely many other ways as well.
