Splitting a String without spaces - java

I have the following string which is generated by an external program (OpenVAS) and returned to my program successfully as a string.
<create_target_response id="b4c8de55-94d8-4e08-b20e-955f97a714f1" status_text="OK, resource created" status="201"></create_target_response>
I am trying to split the string to give me the "b4c8d....14f1" without the inverted commas. I have tried all sorts of escape methods and keep ending up in the else branch, "String does not contain a Target ID". I have tried removing the if statement checking for the string, but continue to have the same issue. The goal is to get my id string into jTextField6. String Lob contains the full string as above.
if (Lob.contains("id=\"")){
// put the split here
String[] parts = Lob.split("id=\"");
String cut1 = parts[1];
String[] part2 = cut1.split("\"");
String TaskFinal = part2[0];
jTextField6.setText(TaskFinal);
}
else {
throw new IllegalArgumentException("String does not contain a Target ID");
}
} catch (IOException e) {
e.printStackTrace();
}
It seems I only need to escape the " and not the = (Java kicks up an error if I do).
Thanks in advance
EDIT: Code as it stands now using jSoup lib - The 'id' string won't display. Any ideas?
Thanks
private void jButton1ActionPerformed(java.awt.event.ActionEvent evt) {
// TODO add your handling code here:
String TargIP = jTextField1.getText(); // Get IP Address
String TargName = jTextField5.getText(); // Get Target Name
String Vag = "8d32ad99-ac84-4fdc-b196-2b379f861def";
String Lob = "";
final String dosCommand = "cmd /c omp -u admin -w admin --xml=\"<create_target><name>" + TargName + "</name><hosts>" + TargIP + "</hosts></create_target>\"";
3</comment><config id='daba56c8-73ec-11df-a475-002264764cea'/><target id='" + Vag + "'/></create_task>\"";
final String location = "C:\\";
try {
final Process process = Runtime.getRuntime().exec(
dosCommand + " " + location);
final InputStream in = process.getInputStream();
int ch;
while((ch = in.read()) != -1) {
System.out.print((char)ch);
Lob = String.valueOf((char)ch);
jTextArea2.append(Lob);
}
} catch (IOException e) {
e.printStackTrace();
}
String id = Jsoup.parse(Lob).getAllElements().attr("id");
System.out.println(id); // This doesn't output?
}

Split on the basis of ". You can get all the key values.
String str = "<create_target_response id=\"b4c8de55-94d8-4e08-b20e-955f97a714f1\" status_text=\"OK, resource created\" status=\"201\"></create_target_response>";
String[] tokens = str.split("\\\"");
System.out.println(tokens[1]);
System.out.println(tokens[5]);
output:
b4c8de55-94d8-4e08-b20e-955f97a714f1
201

This will get you your job id more easily:
int idStart = Lob.indexOf("id=")+("id=\"").length();
System.out.println(Lob.substring(idStart,Lob.indexOf("\"",idStart)));

Everyone's telling you to use an XML parser (and they're right) but no one's showing you how.
Here goes:
String lob = ...
Using Jsoup from http://jsoup.org, actually an HTML parser but also handles XML neatly:
String id = Jsoup.parse(lob).getAllElements().attr("id");
// b4c8de55-94d8-4e08-b20e-955f97a714f1
With built-in Java XML APIs, less concise but no additional libraries:
Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
.parse(new InputSource(new StringReader(lob)));
String id = dom.getDocumentElement().getAttribute("id");
// b4c8de55-94d8-4e08-b20e-955f97a714f1
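A note on the EDIT in the question: inside the read loop, Lob = String.valueOf((char)ch) replaces Lob with the latest character on every pass, so after the loop it holds only the last character and there is nothing left to parse. Below is a minimal sketch of collecting the whole output first and then parsing it with the built-in DOM API. The class name TargetIdReader, the helper names and the UTF-8 charset are my assumptions, not something taken from the question:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class TargetIdReader {
    // Accumulates every character of the stream instead of keeping only the last one.
    // Assumption: the external program writes UTF-8.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int ch;
            while ((ch = reader.read()) != -1) {
                sb.append((char) ch);
            }
        }
        return sb.toString();
    }

    // Parses the XML text and returns the id attribute of the root element.
    static String extractId(String xml) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml.trim())));
        return dom.getDocumentElement().getAttribute("id");
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for process.getInputStream(), using the response from the question.
        String sample = "<create_target_response id=\"b4c8de55-94d8-4e08-b20e-955f97a714f1\" "
                + "status_text=\"OK, resource created\" status=\"201\"></create_target_response>";
        InputStream fake = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(extractId(readAll(fake)));
    }
}
```

In the real handler you would pass process.getInputStream() to readAll() and put the result of extractId() into the text field.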

This is a lot simpler than you're making it, to my mind. First, split on space, then check if an = is present. If it is, split on the =, and finally remove the " from the second token.
The tricky bit is the spaces inside of the "". This will require some regular expressions, which you can work out from this question.
Example
String input; // Assume this contains the whole string.
String pattern; // Have fun working out the regex.
String[] values = input.split(pattern);
for(String value : values)
{
if(value.contains("=")) {
String[] pair = value.split("=", 2);
String key = pair[0];
// Named val because "value" is already the loop variable; replace() strips the quotes.
String val = pair[1].replace("\"", "");
// Do something with the values.
}
}
Advantage of my approach
Is that provided the input follows the format of key="value" key="value", you can parse anything that comes through, rather than hard coding the name of the attributes.
And if this is XML..
Then use an XML parser. There is a good (awesome) answer that explains why you shouldn't be using string manipulation to parse XML/HTML. Here is the answer.
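For input that really is just a run of key="value" pairs, here is one possible sketch. Note that it swaps the split-on-space idea for a Matcher.find() loop, which sidesteps the problem of spaces inside the quotes. The class name AttributePairs and the exact pattern are my own assumptions (keys made of word characters or hyphens, values without embedded double quotes):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttributePairs {
    // A key, then '=', then a double-quoted value that contains no double quote.
    private static final Pattern PAIR = Pattern.compile("([\\w-]+)=\"([^\"]*)\"");

    // Collects every key="value" pair in order of appearance.
    static Map<String, String> parse(String input) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            pairs.put(m.group(1), m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String input = "<create_target_response id=\"b4c8de55-94d8-4e08-b20e-955f97a714f1\" "
                + "status_text=\"OK, resource created\" status=\"201\"></create_target_response>";
        Map<String, String> pairs = parse(input);
        System.out.println(pairs.get("id"));
        System.out.println(pairs.get("status_text"));
    }
}
```

This keeps the advantage described above: no attribute name is hard coded, so any key can be looked up afterwards.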

You can use a regex to extract what is needed; what is more, it looks like the value of id is a UUID. Therefore:
private static final Pattern PATTERN
= Pattern.compile("\\bid=\"([^\"]+)\"");
// In code...
public String getId(final String input)
{
final Matcher m = PATTERN.matcher(input);
if (!m.find())
throw new IllegalArgumentException("String does not contain a Target ID");
final String uuid = m.group(1);
try {
UUID.fromString(uuid);
} catch (IllegalArgumentException ignored) {
throw new IllegalArgumentException("String does not contain a Target ID");
}
return uuid;
}

Related

How to determine the delimiter in CSV file

I have a scenario in which I have to parse CSV files from different sources; the parsing code is very simple and straightforward.
String csvFile = "/Users/csv/country.csv";
String line = "";
String cvsSplitBy = ",";
try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
while ((line = br.readLine()) != null) {
// use comma as separator
String[] country = line.split(cvsSplitBy);
System.out.println("Country [code= " + country[4] + " , name=" + country[5] + "]");
}
} catch (IOException e) {
e.printStackTrace();
}
My problem comes from the CSV delimiter character: I have many different formats; sometimes it is a , and sometimes it is a ;
Is there any way to determine the delimiter character before parsing the file?
univocity-parsers supports automatic detection of the delimiter (also line endings and quotes). Just use it instead of fighting with your code:
CsvParserSettings settings = new CsvParserSettings();
settings.detectFormatAutomatically();
CsvParser parser = new CsvParser(settings);
List<String[]> rows = parser.parseAll(new File("/path/to/your.csv"));
// if you want to see what it detected
CsvFormat format = parser.getDetectedFormat();
Disclaimer: I'm the author of this library and I made sure all sorts of corner cases are covered. It's open source and free (Apache 2.0 license)
Hope this helps.
Yes, but only if the delimiter characters are not allowed to exist as regular text
The simplest answer is to have a list of all the candidate delimiter characters and try to identify which one is being used. Even so, you have to place some limitations on the files or on the people who created them. Look at the following two scenarios:
Case 1 - Contents of file.csv
test1,test2,test3
Case 2 - Contents of file.csv
test1|test2,3|test4
If you have prior knowledge of the delimiter characters, then you would split the first string using , and the second one using |, getting the same result. But, if you try to identify the delimiter by parsing the file, both strings can be split using the , character, and you would end up with this:
Case 1 - Result of split using ,
test1
test2
test3
Case 2 - Result of split using ,
test1|test2
3|test4
Without prior knowledge of which delimiter character is being used, you cannot create a "magical" algorithm that will parse every combination of text; even regular expressions or counting the number of appearances of a character will not save you.
Worst case
test1,2|test3,4|test5
By looking at the text, one can tokenize it by using | as the delimiter. But , and | appear equally often. So, from an algorithm's perspective, both results are accurate:
Correct result
test1,2
test3,4
test5
Wrong result
test1
2|test3
4|test5
If you pose a set of guidelines or you can somehow control the generation of the CSV files, then you could just try to find the delimiter used with String.contains() method, employing the aforementioned list of characters. For example:
public class MyClass {
// static, because determineDelimiter() is static; needs java.util.Arrays and java.util.List
private static final List<String> delimiterList = Arrays.asList(
",",
";",
"\t"
// etc...
);
private static String determineDelimiter(String text) {
for (String delimiter : delimiterList) {
if(text.contains(delimiter)) {
return delimiter;
}
}
return "";
}
public static void main(String[] args) {
String csvFile = "/Users/csv/country.csv";
String line = "";
String cvsSplitBy = ",";
String delimiter = "";
boolean firstLine = true;
try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
while ((line = br.readLine()) != null) {
if(firstLine) {
delimiter = determineDelimiter(line);
if(delimiter.isEmpty()) {
System.out.println("No supported delimiter found in: " + line);
return;
}
firstLine = false;
}
// split on the detected delimiter
String[] country = line.split(delimiter);
System.out.println("Country [code= " + country[4] + " , name=" + country[5] + "]");
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
Update
For a more optimized way, in determineDelimiter() method instead of the for-each loop, you can employ regular expressions.
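A possible sketch of that regular-expression variant, with the candidate characters folded into one character class. The class name RegexDelimiter and the candidate set are assumptions. Be aware that the behaviour is not identical to the loop: the loop returns the first candidate in list order, while this returns whichever candidate appears first in the text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexDelimiter {
    // Candidate delimiters as a single character class; extend as needed.
    private static final Pattern CANDIDATES = Pattern.compile("[,;\\t]");

    // Returns the first candidate delimiter occurring in the text, or "" if none occurs.
    static String determineDelimiter(String text) {
        Matcher m = CANDIDATES.matcher(text);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(determineDelimiter("code;name;population")); // prints ";"
    }
}
```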
If the delimiter can appear in a data column, then you are asking for the impossible. For example, consider this first line of a CSV file:
one,two:three
This could be either a comma-separated or a colon-separated file. You can't tell which type it is.
If you can guarantee that the first line has all its columns surrounded by quotes, for example if it's always this format:
"one","two","three"
then you may be able to use this logic (although it's not 100% bullet-proof):
if (line.contains("\",\""))
delimiter = ',';
else if (line.contains("\";\""))
delimiter = ';';
If you can't guarantee a restricted format like that, then it would be better to pass the delimiter character as a parameter.
Then you can read the file using a widely-known open-source CSV parser such as Apache Commons CSV.
While I agree with Lefteris008 that it is not possible to write a function that correctly determines the delimiter in all cases, we can have a function that is both efficient and gives a mostly correct result in practice.
def head(filename: str, n: int):
    # Read at most the first n lines; fall back to the whole file if it is shorter.
    try:
        with open(filename) as f:
            head_lines = [next(f).rstrip() for x in range(n)]
    except StopIteration:
        with open(filename) as f:
            head_lines = f.read().splitlines()
    return head_lines
def detect_delimiter(filename: str, n=2):
    sample_lines = head(filename, n)
    if not sample_lines:
        return ','
    common_delimiters= [',',';','\t',' ','|',':']
    for d in common_delimiters:
        ref = sample_lines[0].count(d)
        if ref > 0:
            # Compare against the lines actually read; the file may have fewer than n.
            if all([ ref == sample_lines[i].count(d) for i in range(1,len(sample_lines))]):
                return d
    return ','
My efficient implementation is based on
Prior knowledge, such as the list of common delimiters you often work with (',;\t |:'), or even the likelihood of each delimiter being used, which is why I put the regular ',' at the top of the list
The delimiter appearing the same number of times in each line of the text file. This resolves the problem where, reading a single line, two candidates appear equally often (the false detection Lefteris008 describes), or where the right delimiter even appears less often than the wrong one in the first line
An efficient head function that reads only the first n lines from the file
As you increase the number of sample lines n, the likelihood of a false answer drops drastically. I have often found n=2 to be adequate
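Since the question is about Java, here is a rough port of the same idea, as a sketch rather than a drop-in implementation. The class and method names (DelimiterDetector, head, detect) are my own, and the comma fallback mirrors the Python version:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class DelimiterDetector {
    // Candidates in priority order, most likely first.
    private static final char[] COMMON = {',', ';', '\t', ' ', '|', ':'};

    // Reads at most n lines, so large files are never loaded completely.
    static List<String> head(BufferedReader reader, int n) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while (lines.size() < n && (line = reader.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    // Counts how often the character d occurs in the line.
    static long count(String line, char d) {
        return line.chars().filter(c -> c == d).count();
    }

    // Returns the first candidate that occurs in the first line and occurs
    // equally often in every sampled line; falls back to ','.
    static char detect(List<String> sample) {
        if (sample.isEmpty()) {
            return ',';
        }
        for (char d : COMMON) {
            long ref = count(sample.get(0), d);
            if (ref == 0) {
                continue;
            }
            boolean consistent = true;
            for (String line : sample) {
                if (count(line, d) != ref) {
                    consistent = false;
                    break;
                }
            }
            if (consistent) {
                return d;
            }
        }
        return ',';
    }

    public static void main(String[] args) throws IOException {
        // In real use, wrap a FileReader for the CSV file instead of a StringReader.
        BufferedReader reader = new BufferedReader(new StringReader("test1,2|test3,4|test5\na|b|c\n"));
        System.out.println(detect(head(reader, 2))); // prints "|"
    }
}
```

In the demo the comma count differs between the two lines (2 versus 0), so the comma is rejected and | is chosen, which is exactly the "worst case" from the earlier answer resolved by a second sample line.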
Add a condition like this,
String [] country;
if(line.contains(","))
country = line.split(",");
else if(line.contains(";"))
country=line.split(";");
That depends....
If your datasets are always the same length and/or the separator NEVER occurs in your datacolumns, you could just read the first line of the file, look at it for the longed for separator, set it and then read the rest of the file using that separator.
Something like
String csvFile = "/Users/csv/country.csv";
String line = "";
String cvsSplitBy = ",";
try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
while ((line = br.readLine()) != null) {
// use comma as separator
if (line.contains(",")) {
cvsSplitBy = ",";
} else if (line.contains(";")) {
cvsSplitBy = ";";
} else {
System.out.println("Wrong separator!");
}
String[] country = line.split(cvsSplitBy);
System.out.println("Country [code= " + country[4] + " , name=" + country[5] + "]");
}
} catch (IOException e) {
e.printStackTrace();
}
Greetz Kai

swapping column from a text file in Java

I am having some problems with this code. I need to write the swapFields method so that the program displays the columns from a text file with the two given columns swapped, for example column 2 becoming column 1.
public static void main(String[] args) {
int lineNum = 0;
String delimiter = " ";
if (args.length != 3) {
System.out.println("USAGE: java SwapColumn fileName column# column#");
System.exit(-1);
}
String dataFileName = args[0];
String columnAText = args[1];
String columnBText = args[2];
int columnA = Integer.parseInt(columnAText);
int columnB = Integer.parseInt(columnBText);
File dataFile = new File(dataFileName);
Scanner input;
String outputText = null;
System.out.printf("dataFileName=%s, columnA=%d, columnB=%d\n",
dataFileName, columnA, columnB);
try {
input = new Scanner(dataFile);
while (input.hasNextLine()) {
String inputText = input.nextLine();
lineNum++;
outputText = swapFields(inputText, columnA, columnB, delimiter);
System.out.printf("%d: %s\n", lineNum, outputText);
}
} catch (FileNotFoundException FNF) {
System.out.printf("file not found: %s\n", dataFileName);
}
}
static String swapFields(String input, int fieldA, int fieldB, String delim) {
String outputBuffer = "";
//code needed here
return outputBuffer;
}
OK, so you want the method to take in a String input delimited by delim, and swap fields fieldA and fieldB?
static String swapFields(String input, int fieldA, int fieldB, String delim) {
String[] bits = input.split(delim);
String temp = bits[fieldA];
bits[fieldA] = bits[fieldB];
bits[fieldB] = temp;
return String.join(delim, bits);
}
In this code, the .split() method breaks the input up into an array, using delim as the separator (interpreted as a regular expression; see below for the assumptions regarding this). The two relevant (zero-indexed) fields are then swapped, and the String is reconstructed using .join().
Note that the last line (the .join()) requires Java 8. If you don't have Java 8 then you can use StringUtils.join from Apache Commons Lang.
I am also assuming here that your delim is in the right format for the .split() method, which is to say that it's a string literal that doesn't contain escapes and other regex characters. This seems like a plausible enough assumption if it's a delimiter in a text file (usually a comma, space or tab). It further assumes that the delimiter doesn't occur elsewhere in the input, within quotes or something. You haven't mentioned anything about quotes; you'd need to add something to clarify if you wanted to be able to handle such things.
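If the delimiter might be a regex metacharacter such as | or ., one way to drop that assumption is to quote it before splitting. This is a sketch only; the class name SwapColumns and the choice to return the line unchanged when a field index is out of range are my assumptions:

```java
import java.util.regex.Pattern;

class SwapColumns {
    static String swapFields(String input, int fieldA, int fieldB, String delim) {
        // Pattern.quote() makes split() treat delim as literal text, not as a regex.
        // The limit -1 keeps trailing empty fields so the line round-trips unchanged.
        String[] bits = input.split(Pattern.quote(delim), -1);
        if (fieldA < 0 || fieldB < 0 || fieldA >= bits.length || fieldB >= bits.length) {
            return input; // assumption: leave lines with too few fields untouched
        }
        String temp = bits[fieldA];
        bits[fieldA] = bits[fieldB];
        bits[fieldB] = temp;
        return String.join(delim, bits);
    }

    public static void main(String[] args) {
        System.out.println(swapFields("a|b|c", 0, 1, "|")); // prints "b|a|c"
    }
}
```

As in the version above, the field numbers are zero-indexed, and quoted fields containing the delimiter are still not handled.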

Java String - See if a string contains only numbers and characters not words?

I have an array of strings that I load throughout my application, and it contains different words. I have a simple if statement to see if a string contains letters or numbers but not words.
I mean I only want those words which are like AB2CD5X, and I want to remove all other words like Hello 3, 3 word, and any other word that is a word in English. Is it possible to keep only alphanumeric words, excluding those which contain a real dictionary word?
I know how to check whether a string contains alphanumeric words
Pattern p = Pattern.compile("[\\p{Alnum},.']*");
also know
if(string.contains("[a-zA-Z]+") || string.contains([0-9]+])
What you need is a dictionary of English words. Then you basically scan your input and check if each token exists in your dictionary.
You can find text files of dictionary entries online, such as in Jazzy spellchecker. You might also check Dictionary text file.
Here is a sample code that assumes your dictionary is a simple text file in UTF-8 encoding with exactly one (lower case) word per line:
public static void main(String[] args) throws IOException {
final Set<String> dictionary = loadDictionary();
final String text = loadInput();
final List<String> output = new ArrayList<>();
// by default splits on whitespace
final Scanner scanner = new Scanner(text);
while(scanner.hasNext()) {
final String token = scanner.next().toLowerCase();
if (!dictionary.contains(token)) output.add(token);
}
System.out.println(output);
}
private static String loadInput() {
return "This is a 5gse5qs sample f5qzd fbswx test";
}
private static Set<String> loadDictionary() throws IOException {
final File dicFile = new File("path_to_your_flat_dic_file");
final Set<String> dictionaryWords = new HashSet<>();
String line;
final LineNumberReader reader = new LineNumberReader(new BufferedReader(new InputStreamReader(new FileInputStream(dicFile), "UTF-8")));
try {
while ((line = reader.readLine()) != null) dictionaryWords.add(line);
return dictionaryWords;
}
finally {
reader.close();
}
}
If you need more accurate results, you need to extract stems of your words. See Apache's Lucene and EnglishStemmer
You can use Cambridge Dictionaries to verify human words. In this case, if you find a "human valid" word you can skip it.
As the documentation says, to use the library, you need to initialize a request handler and an API object:
DefaultHttpClient httpClient = new DefaultHttpClient(new ThreadSafeClientConnManager());
SkPublishAPI api = new SkPublishAPI(baseUrl + "/api/v1", accessKey, httpClient);
api.setRequestHandler(new SkPublishAPI.RequestHandler() {
public void prepareGetRequest(HttpGet request) {
System.out.println(request.getURI());
request.setHeader("Accept", "application/json");
}
});
To use the "api" object:
try {
System.out.println("*** Dictionaries");
JSONArray dictionaries = new JSONArray(api.getDictionaries());
System.out.println(dictionaries);
JSONObject dict = dictionaries.getJSONObject(0);
System.out.println(dict);
String dictCode = dict.getString("dictionaryCode");
System.out.println("*** Search");
System.out.println("*** Result list");
JSONObject results = new JSONObject(api.search(dictCode, "ca", 1, 1));
System.out.println(results);
System.out.println("*** Spell checking");
JSONObject spellResults = new JSONObject(api.didYouMean(dictCode, "dorg", 3));
System.out.println(spellResults);
System.out.println("*** Best matching");
JSONObject bestMatch = new JSONObject(api.searchFirst(dictCode, "ca", "html"));
System.out.println(bestMatch);
System.out.println("*** Nearby Entries");
JSONObject nearbyEntries = new JSONObject(api.getNearbyEntries(dictCode,
bestMatch.getString("entryId"), 3));
System.out.println(nearbyEntries);
} catch (Exception e) {
e.printStackTrace();
}
Antlr might help you.
Antlr stands for ANother Tool for Language Recognition
Hibernate uses ANTLR to parse its query language HQL(like SELECT,FROM).
if(string.contains("[a-zA-Z]+") || string.contains([0-9]+])
I think this is a good starting point, but note that String.contains() looks for a literal substring and does not interpret regular expressions, so use matches() instead. Since you're looking for strings that contain both letters and numbers you might want:
if(string.matches(".*[a-zA-Z].*") && string.matches(".*[0-9].*"))
I guess you might also want to check if there are spaces? Right? Because that could indicate that there are separate words or some sequence like 3 word. So maybe in the end you could use:
if(string.matches(".*[a-zA-Z].*") && string.matches(".*[0-9].*") && !string.contains(" "))
Hope this helps
You may try this,
First tokenize the string using StringTokenizer with the default delimiters. For each token, if it contains only digits or only letters, discard it; what remains are the words that combine both digits and letters. You can use regular expressions to identify the digits-only and letters-only tokens.
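A minimal sketch of that approach. The class and method names are hypothetical, and I additionally assume that a token must be purely alphanumeric to count, so that something like "word," with trailing punctuation is dropped as well:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class MixedTokens {
    // Keeps only tokens that mix ASCII letters and digits, e.g. AB2CD5X.
    static List<String> mixedTokens(String text) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text); // default delimiters: whitespace
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            boolean alphanumeric = token.matches("[a-zA-Z0-9]+");
            boolean onlyDigits = token.matches("[0-9]+");
            boolean onlyLetters = token.matches("[a-zA-Z]+");
            if (alphanumeric && !onlyDigits && !onlyLetters) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(mixedTokens("Hello 3 AB2CD5X word 5gse5qs")); // prints [AB2CD5X, 5gse5qs]
    }
}
```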

Java - Reading CSV and Converting Object[] values to String values

I am fairly new to Java and am having issue with the code below. The doAggregate method below is reading through the columns of data from a csv file. I want to be able to check the value of fields[i] for special characters and non-escaped quotes and remove them from the value before they are appended. The code as is errors at the point noted below: java.lang.Double cannot be cast to java.lang.String. So it sounds like not all the types of fields will convert to a String value. Is there a good way to test if the String is a String? Or is there a better way of going about this?
public String doAggregate(Object[] fields) {
if (ObjectUtils.isEmpty(fields)) {
return "";
}
if (fields.length == 1) {
return "\"" + ObjectUtils.nullSafeToString(fields[0]) + "\"";
}
String tmp_field_value;
StringBuilder sb = new StringBuilder();
for (int i = 0; i < fields.length; i++) {
if (i > 0) {
sb.append(getDelimiter());
}
sb.append("\"");
//start new code
tmp_field_value = (String) fields[i];
//error on line below
sb.append(getCleanUTF8String(tmp_field_value));
//end new code
//start old code
//sb.append(fields[i]);
//end old code
sb.append("\"");
}
return sb.toString();
}
public String getCleanUTF8String(String dirtyString){
String cleanString = "";
try {
byte[] cleanBytes = dirtyString.getBytes("UTF-8");
cleanString = new String(cleanBytes, "UTF-8");
cleanString = cleanString.replace("\"", "\\\"");
cleanString = cleanString.replace("'", "\\'");
} catch (UnsupportedEncodingException uee){
System.out.println("*******ERROR********: Unable to remove non UTF-8 characters in string: |" + dirtyString + "| -- Java Error Message:" + uee.getMessage());
//TODO - may need to revisit this next line, some additional character checks may need to take place if the character set exclusion fails.
cleanString = dirtyString;
}
return cleanString;
}
Instead of casting with tmp_field_value = (String) fields[i], call toString(), guarding against null so the variable is always assigned:
if(fields[i]!=null){
tmp_field_value = fields[i].toString();
} else {
tmp_field_value = "";
}
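The cast fails because the array holds objects of different runtime types (here a Double), and a cast never converts, whereas toString() is available on every object. A small self-contained sketch of the idea; FieldText, toText and joinQuoted are hypothetical names, and mapping null to an empty string is an assumption about what the CSV output should contain:

```java
class FieldText {
    // Converts any field to text; null becomes an empty string (assumption).
    static String toText(Object field) {
        return field == null ? "" : field.toString();
    }

    // Quotes each converted field and joins the results with the delimiter.
    static String joinQuoted(Object[] fields, String delimiter) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(delimiter);
            }
            sb.append('"').append(toText(fields[i])).append('"');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Object[] fields = {"a", 2.5, null};
        System.out.println(joinQuoted(fields, ",")); // prints "a","2.5",""
    }
}
```

In doAggregate the converted value would then be passed through getCleanUTF8String before being appended.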

How to match a url from a list of patterns in a textfile?

I have a text file that contains meta-urls in the following form:
http://www.xyz.com/.*services/
http://www.xyz.com/.*/wireless
I want to compare all the patterns from that file with my URL, and execute an action if I find a match. This matching process is hard to understand for me.
Assuming splitarray[0] contains the first line of text file:
String url = page.getWebURL().getURL();
URL url1 = new URL(url);
how can we compare url1 with splitarray[0]?
UPDATED
BufferedReader readbuffer = null;
try {
readbuffer = new BufferedReader(new FileReader("filters.txt"));
} catch (FileNotFoundException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
String strRead;
try {
while ((strRead=readbuffer.readLine())!=null){
String splitarray[] = strRead.split(",");
String firstentry = splitarray[0];
String secondentry = splitarray[1];
String thirdentry = splitarray[2];
//String fourthentry = splitarray[3];
//String fifthentry = splitarray[4];
System.out.println(firstentry + " " + secondentry+ " " +thirdentry);
URL url1 = new URL("http://www.xyz.com/ship/reach/news-and");
Pattern p = Pattern.compile("http://www.xyz.com/.*/reach");
Matcher m = p.matcher(url1.toString());
if (m.matches()) {
//Do whatever
System.out.println("Yes Done");
}
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
Matching is working fine. But I want to run the action for any URL that starts with the pattern given in splitarray[0]; how can I implement this? In the case above nothing matches, although I consider the URL http://www.xyz.com/ship/w to belong to the pattern http://www.xyz.com/.*/reach. So for any URL that starts with this pattern, the code in the if block should run. Any suggestions will be appreciated!
You are missing a step here. You first need to translate your URLs to a regular expression, or design a method to use those URLs, then only can you compare your URL url1 to those patterns.
Based on the patterns you have shown, I assume you are designing software for a xyz solution, like their routers. Therefore, your URLs probably fall in a simple pattern style, like
http://www.xyz.com/regular-expression-here
I'm confused as to where the regexes are coming from. The text file? In any case, you'll have a hard time comparing url1 to any regexes because it's a URL object, and regex compares strings. So you'll want to stick with your String url instead.
Try this:
Pattern p = Pattern.compile(splitarray[0]);
Matcher m = p.matcher(url);
if (m.matches()) {
//Do whatever
}
The m.matches() method checks whether the entire String you provide matches the pattern, which is probably what you want here. If you need to check whether part of your String matches, use m.find() instead.
Update
Since you're only looking to match the pattern at the beginning of the String, you'll want to use m.find() instead. The special character ^ only matches at the beginning of a String, so add that to the front of your regex, e.g.:
Pattern p = Pattern.compile("^" + splitarray[0]);
etc.
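As a sketch of the prefix check: Matcher.lookingAt() is a standard alternative that anchors the match at the start of the input without requiring the whole input to match, which gives the same result as "^" plus find() here. The class and method names are hypothetical, and the pattern is taken as-is from the file, so its unescaped dots still match any character:

```java
import java.util.regex.Pattern;

class UrlPrefixMatcher {
    // True when the URL begins with text matching the regex;
    // anything after the matched prefix is ignored.
    static boolean startsWithPattern(String regex, String url) {
        return Pattern.compile(regex).matcher(url).lookingAt();
    }

    public static void main(String[] args) {
        String regex = "http://www.xyz.com/.*/reach";
        System.out.println(startsWithPattern(regex, "http://www.xyz.com/ship/reach/news-and")); // prints true
        System.out.println(startsWithPattern(regex, "http://www.xyz.com/ship/w"));              // prints false
    }
}
```

If the same patterns are checked against many URLs, compiling each Pattern once when the file is read avoids recompiling on every call.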
