Parsing CSV files using Regex in Java

I'm trying to write a program that reads CSV files from a directory, parses each line with a regex, and displays the parsed fields.
For instance, if this is the first line of my CSV file:
1997,Ford,E350,"ac, abs, moon",3000.00
my output should be:
1997 Ford E350 ac, abs, moon 3000.00
I don't want to use any existing CSV libraries. I'm not good at regex; I used one I found on the net, but it's not working in my program.
This is my source code. I'd be grateful if anyone could tell me where and what I have to modify to make it work. Please explain.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class RegexParser {
private static Charset charset = Charset.forName("UTF-8");
private static CharsetDecoder decoder = charset.newDecoder();
String pattern = "\"([^\"]*)\"|(?<=,|^)([^,]*)(?=,|$)";
void regexparser( CharBuffer cb)
{
Pattern linePattern = Pattern.compile(".*\r?\n");
Pattern csvpat = Pattern.compile(pattern);
Matcher lm = linePattern.matcher(cb);
Matcher pm = null;
while(lm.find())
{
CharSequence cs = lm.group();
if (pm==null)
pm = csvpat.matcher(cs);
else
pm.reset(cs);
if(pm.find())
{
System.out.println( cs);
}
if (lm.end() == cb.limit())
break;
}
}
public static void main(String[] args) throws IOException {
RegexParser rp = new RegexParser();
String folder = "Desktop/sample";
File dir = new File(folder);
File[] files = dir.listFiles();
for( File entry: files)
{
FileInputStream fin = new FileInputStream(entry);
FileChannel channel = fin.getChannel();
int cs = (int) channel.size();
MappedByteBuffer mbb = channel.map(FileChannel.MapMode.READ_ONLY, 0, cs);
CharBuffer cb = decoder.decode(mbb);
rp.regexparser(cb);
fin.close();
}
}
}
This is my input file
Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00
My output is identical to the input. Where is the problem in my code? Why doesn't my regex have any effect on the output?

Using a regex seems "fancy", but for CSV files it is (at least in my opinion) not worth it. For my parsing I use http://commons.apache.org/csv/. It has never let me down. :)

Anyway, I've found the fix myself; thanks, guys, for your suggestions and help.
This was my initial code:
if(pm.find())
System.out.println( cs);
I changed it to:
while(pm.find())
{
CharSequence css = pm.group();
//print css
}
I also used a different regex. I'm getting the desired output now.
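The replacement regex is not shown above, so the following is only a minimal sketch of the same loop-over-find() idea, not the poster's actual code. The class name CsvLineSplitter, the pattern, and the split method are all assumptions made for illustration. It handles a single line without its line terminator and does not cover quoted fields that span several lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CsvLineSplitter {
    // Group 1: body of a quoted field (may contain commas and doubled quotes).
    // Group 2: an unquoted field.
    // Group 3: the delimiter after the field ("," or empty at end of line).
    private static final Pattern FIELD =
            Pattern.compile("(?:\"((?:[^\"]|\"\")*)\"|([^,\"]*))(,|$)");

    public static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        int pos = 0;
        while (m.find()) {
            // Every match must start where the previous one ended;
            // otherwise find() skipped over text that is not valid CSV.
            if (m.start() != pos) {
                throw new IllegalArgumentException("Malformed CSV near index " + pos);
            }
            String quoted = m.group(1);
            // Inside quotes, a doubled quote stands for one literal quote.
            fields.add(quoted != null ? quoted.replace("\"\"", "\"") : m.group(2));
            pos = m.end();
            if (m.group(3).isEmpty()) {
                break; // reached end of line
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "1997,Ford,E350,\"ac, abs, moon\",3000.00";
        // prints: 1997 Ford E350 ac, abs, moon 3000.00
        System.out.println(String.join(" ", split(line)));
    }
}
```

The key difference from the original code is that each call to find() yields one field, and the field text comes from the capture groups rather than from the whole line.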

You can try this regex: [ \t]*+"[^"\r\n]*+"[ \t]*+|[^,\r\n]*+ with this code:
try {
Pattern regex = Pattern.compile("[ \t]*+\"[^\"\r\n]*+\"[ \t]*+|[^,\r\n]*+", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.MULTILINE);
Matcher matcher = regex.matcher(subjectString);
while (matcher.find()) {
// Do actions
}
} catch (PatternSyntaxException ex) {
// Take care of errors
}
But yeah, if it's not a very critical requirement, do try to use something that already works : )

Take the advice offered and do not use regular expressions to parse a CSV file. The format is deceptively complicated in the ways it can be used.
The following answer contains links to Wikipedia and the RFC describing the CSV file format:
field size limitation of csv file
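One concrete example of that complexity is in the question's own input: the last record has a quoted field containing a line break, so reading the file line by line splits one record in two. A rough sketch of one way to rejoin such records before splitting them into fields follows. The class and method names are hypothetical, and the quote-counting rule assumes quotes inside a field are escaped by doubling them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CsvRecordJoiner {
    // Joins physical lines into logical CSV records. A record is complete
    // once every opening quote has been closed. A doubled quote toggles
    // the flag twice, so it leaves the state unchanged.
    public static List<String> joinRecords(List<String> physicalLines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean insideQuotes = false;
        for (String line : physicalLines) {
            if (insideQuotes) {
                current.append('\n'); // the line break belongs to the field
            }
            current.append(line);
            for (int i = 0; i < line.length(); i++) {
                if (line.charAt(i) == '"') {
                    insideQuotes = !insideQuotes;
                }
            }
            if (!insideQuotes) {
                records.add(current.toString());
                current.setLength(0);
            }
        }
        if (insideQuotes) {
            throw new IllegalArgumentException("Unterminated quoted field");
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1996,Jeep,Grand Cherokee,\"MUST SELL!",
                "air, moon roof, loaded\",4799.00");
        System.out.println(joinRecords(lines).size()); // prints: 1
    }
}
```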

Related

Using Google Translate Java Library, Languages with special chars return question marks

I have set up a Java program for my apprenticeship project that takes a JSON file of English strings and outputs a JSON file in a different language, which is specified in the console. Some languages, like French and Italian, are output with the correct translations, whereas Russian or Japanese are output as question marks.
I searched around and saw that I needed to get the bytes of my string and then encode them as UTF-8. I did this but was still getting question marks, so I started using the standard charsets built into Java and tried different ways of encoding/decoding the string. The code below is what I tried;
it gave me a different output: Ð?Ñ?Ð¸Ð²ÐµÑ?
package com.bis.propertyfiletranslator;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.List;
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.googleapis.json.GoogleJsonResponseException;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.translate.Translate;
import com.google.api.services.translate.model.TranslationsListResponse;
import com.google.api.services.translate.model.TranslationsResource;
public class Translator {
public static Translate.Translations.List list;
private static final Charset UTF_8 = Charset.forName("UTF-8");
private static final Charset ISO = Charset.forName("ISO-8859-1");
public static void translateJSONMapThroughGoogle(String input, String output, String API, String language,
List<String> subLists) throws IOException, GeneralSecurityException {
Translate t = new Translate.Builder(GoogleNetHttpTransport.newTrustedTransport(),
JacksonFactory.getDefaultInstance(), null).setApplicationName("PhoenUX-Google-Translate").build();
try {
list = t.new Translations().list(subLists, language).setFormat("text");
list.setKey(API);
} catch (GoogleJsonResponseException e) {
if (e.getDetails().getMessage().equals("Invalid Value")) {
System.err.println(
"\n Language not currently supported, check the accepted language codes and try again.\n\n Language Requested: "
+ language);
} else {
System.out.println(e.getDetails().getMessage());
}
}
for (TranslationsResource translationsResource : response.getTranslations()) {
for (String key : JSONFunctions.jsonHashMap.keySet()) {
JSONFunctions.jsonHashMap.remove(key);
String value = translationsResource.getTranslatedText();
String encoded = new String(value.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
JSONFunctions.jsonHashMap.put(key, encoded);
System.out.println(encoded);
break;
}
}
JSONFunctions.outputTranslationsBackToJson(output);
}
}
So this is using the Google Cloud library. I added a println so I could see the results of what I had tried, so this code should be all you need to replicate it.
I expect the output for "Hello" to be "Привет" (Russian); the actual output is ???? or Ð?Ñ?Ð¸Ð²ÐµÑ?, depending on the encoding I use.

String encoded = new String(...) is dead wrong. Just use
put(key, value);
Note that System.out.println may still show problems, as the OS encoding might be some Windows ANSI encoding. Such an encoding is likely not Unicode-capable, whereas a String contains Unicode.
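To illustrate why the re-encoding line does harm, here is a small sketch. The class name MojibakeDemo and the reDecode helper are made up for this example. The helper repeats the question's conversion, which reinterprets each UTF-8 byte as a separate ISO-8859-1 character, so the six-letter Russian word turns into twelve unrelated characters. The last lines show one way to print through a stream with an explicit UTF-8 encoding; whether the console then displays it correctly still depends on the terminal.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // The conversion from the question: UTF-8 bytes read back as ISO-8859-1.
    public static String reDecode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // "Привет" written with Unicode escapes so the source file encoding does not matter.
        String value = "\u041f\u0440\u0438\u0432\u0435\u0442";
        String broken = reDecode(value);

        System.out.println(value.length());  // prints: 6
        System.out.println(broken.length()); // prints: 12

        // The String itself was fine all along; only the output encoding needs care.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(value);
    }
}
```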

A simple stemming algorithm with String for input

I've been looking at word stemming algorithms such as the porter algorithm, but everything I've found so far has dealt with files as input.
Are there any existing algorithms which would let me simply pass the stemmer a string, and have it return the stemmed string?
Something like:
String toBeStemmed = "The man worked tirelessly";
Stemmer s = new Stemmer();
String stemmed = s.stem(toBeStemmed);

The algorithms themselves don't take files. The code probably takes the file and reads it in as a series of Strings, which are fed to the algorithm. You just need to look at the part of the code that reads the Strings in from the file, and pass the Strings in yourself in a similar way.
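As a rough sketch of feeding strings in directly, the following splits a sentence on whitespace and hands each word to a stemming function. The toyStem method is a deliberately crude suffix stripper invented for this example; it is not the Porter algorithm, and a real stemmer would be plugged in through the same UnaryOperator parameter.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

public class SentenceStemmer {
    // Applies the given word-level stemmer to every whitespace-separated token.
    public static String stemSentence(String sentence, UnaryOperator<String> stemmer) {
        StringBuilder out = new StringBuilder();
        for (String token : sentence.trim().split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(stemmer.apply(token));
        }
        return out.toString();
    }

    // Placeholder stemmer (assumption): lower-cases the word and strips one
    // common suffix, keeping at least three characters of the word.
    public static String toyStem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : new String[] {"ing", "ed", "ly", "s"}) {
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    public static void main(String[] args) {
        String toBeStemmed = "The man worked tirelessly";
        // prints: the man work tireless
        System.out.println(stemSentence(toBeStemmed, SentenceStemmer::toyStem));
    }
}
```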

In your example, toBeStemmed is a sentence, which you want to tokenize first. Then you would stem the individual tokens/words, like 'worked' or 'tirelessly'.
Here's a fine morphological analyzer I use as a stemmer in some of my projects.
stemmer JAR: https://code.google.com/p/hunglish-webapp/source/browse/trunk/#trunk%2Flib%2Fnet%2Fsf%2Fjhunlang%2Fjmorph%2F1.0
stemmer source: https://code.google.com/p/j-morph/source/checkout
language resource files: https://code.google.com/p/hunglish-webapp/source/browse/trunk/#trunk%2Fsrc%2Fmain%2Fresources%2Fresources-lang%2Fjmorph
How I use it with Lucene: https://code.google.com/p/hunglish-webapp/source/browse/trunk/src/main/java/hu/mokk/hunglish/jmorph/
properties file: https://code.google.com/p/hunglish-webapp/source/browse/trunk/src/main/resources/META-INF/spring/stemmer.properties
Example usage:
import java.util.List;
import net.sf.jhunlang.jmorph.lemma.Lemma;
import net.sf.jhunlang.jmorph.lemma.Lemmatizer;
import net.sf.jhunlang.jmorph.analysis.Analyser;
import net.sf.jhunlang.jmorph.analysis.AnalyserContext;
import net.sf.jhunlang.jmorph.analysis.AnalyserControl;
import net.sf.jhunlang.jmorph.factory.Definition;
import net.sf.jhunlang.jmorph.factory.JMorphFactory;
import net.sf.jhunlang.jmorph.parser.ParseException;
import net.sf.jhunlang.jmorph.sample.AnalyserConfig;
import net.sf.jhunlang.jmorph.sword.parser.EnglishAffixReader;
import net.sf.jhunlang.jmorph.sword.parser.EnglishReader;
import net.sf.jhunlang.jmorph.sword.parser.SwordAffixReader;
import net.sf.jhunlang.jmorph.sword.parser.SwordReader;
AnalyserConfig acEn = new AnalyserConfig();
//TODO: set path to the English affix file
String enAff = "src/main/resources/resources-lang/jmorph/en.aff";
Definition affixDef = acEn.createDefinition(enAff, "utf-8", EnglishAffixReader.class);
//TODO set path to the English dict file
String enDic = "src/main/resources/resources-lang/jmorph/en.dic";
Definition dicDef = acEn.createDefinition(enDic, "utf-8", EnglishReader.class);
int enRecursionDepth = 3;
acEn.setRecursionDepth(affixDef, enRecursionDepth);
JMorphFactory jf = new JMorphFactory();
Analyser enAnalyser = jf.build(new Definition[] { affixDef, dicDef });
AnalyserControl analyserControlEn = new AnalyserControl(AnalyserControl.ALL_COMPOUNDS);
AnalyserContext analyserContextEn = new AnalyserContext(analyserControlEn);
boolean enStripDerivates = true;
Lemmatizer enLemmatizer = new net.sf.jhunlang.jmorph.lemma.LemmatizerImpl(enAnalyser, enStripDerivates, analyserContextEn);
//After somewhat complex initializing, here we go
List<Lemma> lemmas = enLemmatizer.lemmatize("worked");
for (Lemma lemma : lemmas) {
System.out.println(lemma.getWord());
}

Using boilerpipe to extract non-english articles

I am trying to use the boilerpipe Java library to extract news articles from a set of websites.
It works great for texts in English, but for text with special characters, for example words with accent marks (história), these special characters are not extracted correctly. I think it is an encoding problem.
In the boilerpipe FAQ, it says "If you extract non-English text you might need to change some parameters" and then refers to a paper. I found no solution in this paper.
My question is: are there any parameters in boilerpipe where I can specify the encoding? Is there any way to work around this and get the text correctly?
How I'm using the library:
(first attempt, based on the URL):
URL url = new URL(link);
String article = ArticleExtractor.INSTANCE.getText(url);
(second attempt, based on the HTML source code):
String article = ArticleExtractor.INSTANCE.getText(html_page_as_string);

You don't have to modify the inner Boilerpipe classes.
Just pass an InputSource object to the ArticleExtractor.INSTANCE.getText() method and force the encoding on that object. For example:
URL url = new URL("http://some-page-with-utf8-encodeing.tld");
InputSource is = new InputSource();
is.setEncoding("UTF-8");
is.setByteStream(url.openStream());
String text = ArticleExtractor.INSTANCE.getText(is);
Regards!

Well, from what I see, when you use it like that, the library will automatically choose which encoding to use. From the HTMLFetcher source:
public static HTMLDocument fetch(final URL url) throws IOException {
final URLConnection conn = url.openConnection();
final String ct = conn.getContentType();
Charset cs = Charset.forName("Cp1252");
if (ct != null) {
Matcher m = PAT_CHARSET.matcher(ct);
if(m.find()) {
final String charset = m.group(1);
try {
cs = Charset.forName(charset);
} catch (UnsupportedCharsetException e) {
// keep default
}
}
}
Try debugging their code a bit, starting with ArticleExtractor.getText(URL), and see if you can override the encoding
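The excerpt above picks the charset from the Content-Type header and otherwise keeps a default. As a standalone sketch of that logic, the following can be run without boilerpipe. The class name, the regex (boilerpipe's actual PAT_CHARSET is not shown in the excerpt), and the fallback used in main are assumptions.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContentTypeCharset {
    // Assumed pattern: matches charset=UTF-8 as well as charset="utf-8".
    private static final Pattern CHARSET_PARAM =
            Pattern.compile("charset\\s*=\\s*\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    // Returns the charset named in a Content-Type value, or the fallback
    // when the header is missing, has no charset, or names an unknown one.
    public static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        Matcher m = CHARSET_PARAM.matcher(contentType);
        if (!m.find()) {
            return fallback;
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Covers both unsupported and syntactically illegal charset names.
            return fallback;
        }
    }

    public static void main(String[] args) {
        Charset fallback = StandardCharsets.ISO_8859_1;
        System.out.println(charsetOf("text/html; charset=UTF-8", fallback)); // prints: UTF-8
        System.out.println(charsetOf("text/html", fallback));                // prints: ISO-8859-1
    }
}
```

When a page declares no charset in its header, the default wins, which is one plausible reason accented characters come out wrong for UTF-8 pages.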

OK, got a solution.
As Andrei said, I had to change the class HTMLFetcher, which is in the package de.l3s.boilerpipe.sax.
What I did was convert all the fetched text to UTF-8.
At the end of the fetch function, I had to add two lines and change the last one:
final byte[] data = bos.toByteArray(); // stays the same
byte[] utf8 = new String(data, cs).getBytes("UTF-8"); // new line (conversion), decoding with the detected Charset directly
cs = Charset.forName("UTF-8"); // set the charset to UTF-8
return new HTMLDocument(utf8, cs); // edited line
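The conversion step on its own can be tried outside boilerpipe. This is only a sketch with a hypothetical class name; it decodes bytes using their source charset and re-encodes them as UTF-8, as the two added lines above do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8Transcoder {
    // Decodes the bytes with their original charset, then encodes as UTF-8.
    public static byte[] toUtf8(byte[] data, Charset source) {
        return new String(data, source).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "história" with the accented letter as a Unicode escape.
        String word = "hist\u00f3ria";
        byte[] latin1 = word.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = toUtf8(latin1, StandardCharsets.ISO_8859_1);

        System.out.println(latin1.length); // prints: 8
        System.out.println(utf8.length);   // prints: 9 (the accented letter takes two bytes)
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(word)); // prints: true
    }
}
```

This only helps if the source charset was detected correctly; decoding with the wrong charset loses information before the re-encoding happens.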

Boilerpipe's ArticleExtractor uses some algorithms that have been specifically tailored to English - measuring number of words in average phrases, etc. In any language that is more or less verbose than English (ie: every other language) these algorithms will be less accurate.
Additionally, the library uses some English phrases to try and find the end of the article (comments, post a comment, have your say, etc) which will clearly not work in other languages.
This is not to say that the library will outright fail - just be aware that some modification is likely needed for good results in non-English languages.

Java:
import java.net.URL;
import org.xml.sax.InputSource;
import de.l3s.boilerpipe.extractors.ArticleExtractor;
public class Boilerpipe {
public static void main(String[] args) {
try{
URL url = new URL("http://www.azeri.ru/az/traditions/kuraj_pehlevanov/");
InputSource is = new InputSource();
is.setEncoding("UTF-8");
is.setByteStream(url.openStream());
String text = ArticleExtractor.INSTANCE.getText(is);
System.out.println(text);
}catch(Exception e){
e.printStackTrace();
}
}
}
Eclipse:
Run > Run Configurations > Common Tab. Set Encoding to Other(UTF-8), then click Run.

I had the same problem; the cnr solution works great. Just change the UTF-8 encoding to ISO-8859-1. Thanks!
URL url = new URL("http://some-page-with-utf8-encodeing.tld");
InputSource is = new InputSource();
is.setEncoding("ISO-8859-1");
is.setByteStream(url.openStream());
String text = ArticleExtractor.INSTANCE.getText(is);

Extracting anchor tag from html using Java

I have several anchor tags in a text.
Input: <a href="http://stackoverflow.com" >Take me to StackOverflow</a>
Output:
http://stackoverflow.com
How can I find all those input strings and convert them to the output string in Java, without using a 3rd-party API?

There are classes in the core API that you can use to get all href attributes from anchor tags (if present!):
import java.io.*;
import java.util.*;
import javax.swing.text.*;
import javax.swing.text.html.*;
import javax.swing.text.html.parser.*;
public class HtmlParseDemo {
public static void main(String [] args) throws Exception {
String html =
"<a href=\"http://stackoverflow.com\" >Take me to StackOverflow</a> " +
"<!-- " +
"<a href=\"http://ignoreme.com\" >...</a> " +
"--> " +
"<a href=\"http://www.google.com\" >Take me to Google</a> " +
"<a>NOOOoooo!</a> ";
Reader reader = new StringReader(html);
HTMLEditorKit.Parser parser = new ParserDelegator();
final List<String> links = new ArrayList<String>();
parser.parse(reader, new HTMLEditorKit.ParserCallback(){
public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
if(t == HTML.Tag.A) {
Object link = a.getAttribute(HTML.Attribute.HREF);
if(link != null) {
links.add(String.valueOf(link));
}
}
}
}, true);
reader.close();
System.out.println(links);
}
}
which will print:
[http://stackoverflow.com, http://www.google.com]

public static void main(String[] args) {
String test = "qazwsxTake me to StackOverflowfdgfdhgfd"
+ "Take me to StackOverflow2dcgdf";
String regex = "<a href=(\"[^\"]*\")[^<]*</a>";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(test);
System.out.println(m.replaceAll("$1"));
}
NOTE: All of Andrzej Doyle's points are valid. If your input contains more than a simple <a href="X">Y</a>, and you are sure that it is parsable HTML, then you are better off with an HTML parser.
To summarize:
The regex I posted doesn't work if you have an <a> in a comment (you can treat that as a special case).
It doesn't work if you have other attributes in the <a> tag (again, you can treat that as a special case).
There are many other cases where the regex won't work, and you cannot cover all of them with a regex, since HTML is not a regular language.
However, if your requirement is always to replace <a href="X">Y</a> with "X" without considering the context, then the code I've posted will work.
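For the simple case described above, a sketch along these lines collects the href values instead of replacing the tags. The class name and the pattern are assumptions, not the poster's regex. The pattern tolerates extra attributes, but it shares the other limitations listed above: it will also match anchors inside comments, and it only handles double-quoted attribute values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefRegexDemo {
    // Matches an opening <a ...> tag and captures the double-quoted href value.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\s+[^>]*?href\\s*=\\s*\"([^\"]*)\"[^>]*>", Pattern.CASE_INSENSITIVE);

    public static List<String> hrefs(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://stackoverflow.com\" >Take me to StackOverflow</a> "
                + "<a>NOOOoooo!</a>";
        System.out.println(hrefs(html)); // prints: [http://stackoverflow.com]
    }
}
```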

You can use JSoup
String html = "<p>An <a href=\"http://stackoverflow.com\" >Take me to StackOverflow</a> link.</p>";
Document doc = Jsoup.parse(html);
Element link = doc.select("a").first();
String linkHref = link.attr("href"); // "http://stackoverflow.com"

The above example works perfectly. If you want to parse an HTML document rather than concatenated strings, write something like this to complement the code above.
The existing code above (HtmlParseDemo.java) is modified here as HtmlParser.java,
with HtmlPage.java as the complementing code below.
The main.url property in the HtmlPage.properties file is:
main.url=http://www.whatever.com/
That way you can just parse the URL that you're after. :-)
Happy coding :-D
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
public class HtmlParser
{
public static void main(String[] args) throws Exception
{
String html = HtmlPage.getPage();
Reader reader = new StringReader(html);
HTMLEditorKit.Parser parser = new ParserDelegator();
final List<String> links = new ArrayList<String>();
parser.parse(reader, new HTMLEditorKit.ParserCallback()
{
public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos)
{
if (t == HTML.Tag.A)
{
Object link = a.getAttribute(HTML.Attribute.HREF);
if (link != null)
{
links.add(String.valueOf(link));
}
}
}
}, true);
reader.close();
// create the header
System.out.println("<html>\n<head>\n <title>Link City</title>\n</head>\n<body>");
// spit out the links and create href
for (String l : links)
{
System.out.print(" " + l + "\n");
}
// create footer
System.out.println("</body>\n</html>");
}
}
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.StringWriter;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ResourceBundle;
public class HtmlPage
{
public static String getPage()
{
StringWriter sw = new StringWriter();
ResourceBundle bundle = ResourceBundle.getBundle(HtmlPage.class.getName().toString());
try
{
URL url = new URL(bundle.getString("main.url"));
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("GET");
connection.setDoOutput(true);
InputStream content = (InputStream) connection.getInputStream();
BufferedReader in = new BufferedReader(new InputStreamReader(content));
String line;
while ((line = in.readLine()) != null)
{
sw.append(line).append("\n");
}
} catch (Exception e)
{
e.printStackTrace();
}
return sw.getBuffer().toString();
}
}
For example, this will output links from http://ebay.com.au/ if viewed in a browser.
This is a subset, as there are a lot of links
Link City
#mainContent
http://realestate.ebay.com.au/

The most robust way (as has been suggested already) is to use regular expressions (java.util.regex), if you are required to build this without using 3rd-party libs.
The alternative is to parse the HTML as XML, either using a SAX parser to capture and handle each instance of an "a" element, or as a DOM Document that you then search using XPath (see http://download.oracle.com/javase/6/docs/api/javax/xml/parsers/package-summary.html). This is problematic though, since it requires the HTML page to be fully XML-compliant in its markup, a very dangerous assumption and not an approach I would recommend, since most "real" HTML pages are not XML-compliant.
Still, I would recommend also looking at existing frameworks out there built for this purpose (like JSoup, also mentioned above). No need to reinvent the wheel.
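For completeness, the DOM-plus-XPath alternative mentioned above might look roughly like this using only JDK classes. The class name is hypothetical, and the sketch assumes the input is well-formed XML; ordinary HTML pages usually are not, in which case the method throws.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathHrefDemo {
    // Parses well-formed XHTML and returns the href of every <a> element.
    public static List<String> hrefs(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//a/@href", doc, XPathConstants.NODESET);
            List<String> links = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                links.add(nodes.item(i).getNodeValue());
            }
            return links;
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<p><a href=\"http://stackoverflow.com\">Take me to StackOverflow</a>"
                + "<!-- <a href=\"http://ignoreme.com\">...</a> -->"
                + "<a>NOOOoooo!</a></p>";
        System.out.println(hrefs(xhtml)); // prints: [http://stackoverflow.com]
    }
}
```

Unlike a plain regex, this skips anchors inside comments, because the parser never turns comment text into elements.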

'Un'-externalize strings from Eclipse or Intellij

I have a bunch of strings in a properties file which I want to 'un-externalize', i.e. inline into my code.
I see that both Eclipse and IntelliJ have great support for 'externalizing' strings from within code; however, does either of them support inlining strings from a properties file back into code?
For example, if I have code like:
My.java
System.out.println(myResourceBundle.getString("key"));
My.properties
key=a whole bunch of text
I want my Java code to be replaced with:
My.java
System.out.println("a whole bunch of text");

I wrote a simple Java program that you can use to do this.
Deexternalize.java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;
import java.util.Properties;
import java.util.Set;
import java.util.Stack;
import java.util.logging.Level;
import java.util.logging.Logger;
public class Deexternalize {
public static final Logger logger = Logger.getLogger(Deexternalize.class.toString());
public static void main(String[] args) throws IOException {
if(args.length != 2) {
System.out.println("Deexternalize props_file java_file_to_create");
return;
}
Properties defaultProps = new Properties();
FileInputStream in = new FileInputStream(args[0]);
defaultProps.load(in);
in.close();
File javaFile = new File(args[1]);
List<String> data = process(defaultProps,javaFile);
buildFile(javaFile,data);
}
public static List<String> process(Properties propsFile, File javaFile) {
List<String> data = new ArrayList<String>();
Set<Entry<Object,Object>> setOfProps = propsFile.entrySet();
int indexOf = javaFile.getName().indexOf(".");
String javaClassName = javaFile.getName().substring(0,indexOf);
data.add("public class " + javaClassName + " {\n");
StringBuilder sb = null;
// for some reason it's adding them in reverse order, so putting them on a stack
Stack<String> aStack = new Stack<String>();
for(Entry<Object,Object> anEntry : setOfProps) {
sb = new StringBuilder("\tpublic static final String ");
sb.append(anEntry.getKey().toString());
sb.append(" = \"");
sb.append(anEntry.getValue().toString());
sb.append("\";\n");
aStack.push(sb.toString());
}
while(!aStack.empty()) {
data.add(aStack.pop());
}
if(sb != null) {
data.add("}");
}
return data;
}
public static final void buildFile(File fileToBuild, List<String> lines) {
BufferedWriter theWriter = null;
try {
// Check to make sure if the file exists already.
if(!fileToBuild.exists()) {
fileToBuild.createNewFile();
}
theWriter = new BufferedWriter(new FileWriter(fileToBuild));
// Write the lines to the file.
for(String theLine : lines) {
// DO NOT ADD windows carriage return.
if(theLine.endsWith("\r\n")){
theWriter.write(theLine.substring(0, theLine.length()-2));
theWriter.write("\n");
} else if(theLine.endsWith("\n")) {
// This case is UNIX format already since we checked for
// the carriage return already.
theWriter.write(theLine);
} else {
theWriter.write(theLine);
theWriter.write("\n");
}
}
} catch(IOException ex) {
logger.log(Level.SEVERE, null, ex);
} finally {
try {
// theWriter is still null if opening the file failed
if(theWriter != null) {
theWriter.close();
}
} catch(IOException ex) {
logger.log(Level.SEVERE, null, ex);
}
}
}
}
Basically, all you need to do is call this java program with the location of the property file and the name of the java file you want to create that will contain the properties.
For instance this property file:
test.properties
TEST_1=test test test
TEST_2=test 2456
TEST_3=123456
will become:
java_test.java
public class java_test {
public static final String TEST_1 = "test test test";
public static final String TEST_2 = "test 2456";
public static final String TEST_3 = "123456";
}
Hope this is what you need!
EDIT:
I understand what you requested now. You can use my code to do what you want if you sprinkle in a bit of regex magic. Let's say you have the java_test file from above. Copy the inlined properties into the file in which you want to replace the myResourceBundle code.
For example,
TestFile.java
public class TestFile {
public static final String TEST_1 = "test test test";
public static final String TEST_2 = "test 2456";
public static final String TEST_3 = "123456";
public static void regexTest() {
System.out.println(myResourceBundle.getString("TEST_1"));
System.out.println(myResourceBundle.getString("TEST_1"));
System.out.println(myResourceBundle.getString("TEST_3"));
}
}
OK, now if you are using Eclipse (any modern IDE should be able to do this), go to the Edit menu -> Find/Replace. In the window, you should see a "Regular Expressions" checkbox; check that. Now enter the following text in the Find text area:
myResourceBundle\.getString\(\"(.+)\"\)
And the back reference
\1
into the replace.
Now click "Replace all" and voila! The code should have been inlined to your needs.
Now TestFile.java will become:
TestFile.java
public class TestFile {
public static final String TEST_1 = "test test test";
public static final String TEST_2 = "test 2456";
public static final String TEST_3 = "123456";
public static void regexTest() {
System.out.println(TEST_1);
System.out.println(TEST_1);
System.out.println(TEST_3);
}
}

You may use the Eclipse "Externalize Strings" wizard. It can also be used for un-externalization. Select the required string(s) and press the "Internalize" button. If the string was externalized before, it will be put back into the code and removed from the messages.properties file.

Maybe if you explain how you need to do this, you could get the correct answer.
The short answer to your question is no, especially in IntelliJ (I do not know enough about Eclipse). Of course, the slightly longer but still not very useful answer is to write a plugin. (It would take a list of property files, read the keys and values into a map, and then do a regular-expression replace of ResourceBundle.getValue("Key") with the value from the map for that key. I will write this plugin myself if you can convince me that there are more people like you who have this requirement.)
The more elaborate answer is this.
1_ First, I will refactor all the code that performs property file reading into a single class (or module, called PropertyFileReader).
2_ I will create a property file reader module that iterates across all the keys in the property file(s) and then stores that information in a map.
3_ I can either create static map objects with the populated values or create a constants class out of them. Then I will replace the logic in the property file reader module to use a get on the map or static class rather than reading the property file.
4_ Once I am sure that the application performs OK (by checking that all my unit tests pass), I will remove my property files.
Note: If you are using Spring, there is an easy way to split out all property key-value pairs from a list of property files. Let me know if you use Spring.
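The map-plus-regex replacement described above can be sketched in plain Java without writing a plugin. Everything here is illustrative: the class name ResourceInliner is made up, and the pattern assumes calls of the exact form myResourceBundle.getString("key") as in the question. Keys that are missing from the properties are left untouched.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResourceInliner {
    // Assumed call shape, taken from the question: myResourceBundle.getString("key")
    private static final Pattern CALL =
            Pattern.compile("myResourceBundle\\.getString\\(\"([^\"]+)\"\\)");

    // Replaces each bundle lookup in the source text with a string literal.
    public static String inline(String source, Properties props) {
        Matcher m = CALL.matcher(source);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = props.getProperty(m.group(1));
            String replacement = (value == null)
                    ? m.group()                        // unknown key: keep the call
                    : "\"" + escape(value) + "\"";     // known key: emit a literal
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Escapes the characters that would break a Java string literal.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("key", "a whole bunch of text");
        String before = "System.out.println(myResourceBundle.getString(\"key\"));";
        // prints: System.out.println("a whole bunch of text");
        System.out.println(inline(before, props));
    }
}
```

Applying this to real files would mean loading the properties with Properties.load and running inline over each source file's contents; that part is left out to keep the sketch short.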

I would recommend something else: split the externalized strings into localizable and non-localizable properties files. It would probably be easier to move some strings to another file than to move them back into the source code (which would hurt maintainability, by the way).
Of course, you can write a simple (to some extent) Perl (or whatever) script which searches for calls to resource bundles and introduces a constant in their place...
In other words, I haven't heard of a de-externalizing mechanism; you need to do it by hand (or write an automated script yourself).

An awesome one-liner from #potong:
sed 's|^\([^=]*\)=\(.*\)|s#Messages.getString("\1")#"\2"#g|;s/\\/\\\\/g' messages.properties | sed -i -f - *.java
Run this inside your src dir, and see the magic.
