Escape rogue backslashes in a string - java

I have a JSON string like this in the database:
{"name" : "DROCK", "details" : [{"age" : "23", "degree" : "BE", "Experience" : "worked in a company\6as an intern"}]}
I want to remove any \ character that is not part of a valid escape sequence. I tried using StringEscapeUtils, but it does not help here.
The code I've used here:
import org.apache.commons.lang.xwork.StringEscapeUtils;
public class JSONEscape {
public static void main(String args[]) {
String testString = "{\"name\" : \"DROCK\", \"details\" : [{\"age\" : \"23\", \"degree\" : \"BE\", \"Experience\" : \"worked in a company\6as an intern\"}]}";
System.out.println(StringEscapeUtils.escapeJava(testString));
}
}
How should I keep the valid \ escapes and remove rogue ones like this? I would like to retain valid escape sequences such as \n, \t, etc.
Finally, I would like a result like this:
{"name" : "DROCK", "details" : [{"age" : "23", "degree" : "BE", "Experience" : "worked in a company6as an intern"}]}

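If you want to remove bad escape sequences from a JSON string, a regular expression will do it:
String testString = "{\"name\" : \"DROCK\", \"details\" : [{\"age\" : \"23\", \"degree\" : \"BE\", \"Experience\" : \"worked in a company\\6as an intern\"}]}";
System.out.println(testString.replaceAll("\\\\([^bfnrtu\"\\\\/])", "$1"));
That regex means "wherever a backslash is followed by a character that does not form a valid JSON escape, drop the backslash and keep the character". Both the Java string literal and the regex treat the backslash as an escape character, so one literal backslash in the pattern is written as four backslashes in Java source.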
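A single replaceAll that looks at one backslash at a time can misread an escaped backslash followed by another character (for example \\6 in the JSON text), because the second backslash of the pair looks like the start of a bad escape. The sketch below walks the string in backslash-plus-character pairs instead. The class and method names are made up for illustration, and it does not check that \u is followed by four hex digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JsonBackslashCleaner {
    // A backslash plus the character after it; an escaped backslash is consumed as one pair.
    private static final Pattern ESCAPE = Pattern.compile("\\\\(.)", Pattern.DOTALL);
    // Characters that may legally follow a backslash in JSON: " \ / b f n r t u
    private static final String VALID = "\"\\/bfnrtu";

    public static String dropInvalidEscapes(String json) {
        Matcher m = ESCAPE.matcher(json);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String next = m.group(1);
            // Keep valid escapes untouched; otherwise keep only the character.
            String replacement = VALID.contains(next) ? m.group() : next;
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // In memory this is: {"Experience" : "worked in a company\6as an intern\n"}
        String input = "{\"Experience\" : \"worked in a company\\6as an intern\\n\"}";
        // prints {"Experience" : "worked in a company6as an intern\n"}
        System.out.println(dropInvalidEscapes(input));
    }
}
```

The valid \n escape survives, while the backslash before 6 is dropped.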

Related

Why double slashes front of JSON String

I have a JSON string from Google that I cannot use because of the two slashes in front of it; parsing it throws a JSON exception.
How can I remove them before parsing?
https://www.google.com/finance/info?q=asels
// [ { "id": "530630671033488" ,"t" : "ASELS" ,"e" : "IST" ,"l" :
"13.72" ,"l_fix" : "13.72" ,"l_cur" : "TRY13.72" ,"s": "0"
,"ltt":"6:10PM GMT+3" ,"lt" : "Jan 20, 6:10PM GMT+3" ,"lt_dts" :
"2017-01-20T18:10:00Z" ,"c" : "+0.04" ,"c_fix" : "0.04" ,"cp" : "0.29"
,"cp_fix" : "0.29" ,"ccol" : "chg" ,"pcls_fix" : "13.68" } ]
You need to remove the two leading slashes; what remains is a regular JSON string that you can parse.
You can do it like this:
String yourJsonStringFromGoogle = ...
String fixedJsonString = yourJsonStringFromGoogle.substring(2);
Here, substring(n) returns the string without its first n characters.
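If the exact length of the prefix is not guaranteed (for instance, the response might start with a newline before the slashes, which is an assumption here, not something the question shows), a slightly more defensive sketch is to cut at the first [ or {. The class and method names below are hypothetical.

```java
class JsonPrefixStripper {
    // Returns the text starting at the first '[' or '{',
    // or the input unchanged if neither character occurs.
    public static String stripLeadingJunk(String raw) {
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '[' || c == '{') {
                return raw.substring(i);
            }
        }
        return raw;
    }

    public static void main(String[] args) {
        String raw = "\n// [ { \"t\" : \"ASELS\" } ]";
        // prints [ { "t" : "ASELS" } ]
        System.out.println(stripLeadingJunk(raw));
    }
}
```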

How to search for a JSON field in elasticsearch using java?

This is JSON that I want to use for a search:
{
"_index" : "test", "_type" : "insert", "_id" : "3",
"_version" : 2, "found" : true,
"_source" : {
"ACCOUNT_ID" : "123",
"CONTACT_ID" : "ABC"
}
}
How do I search for all the documents whose ACCOUNT_ID starts with 1?
You can use a wildcard query in Elasticsearch to search for ACCOUNT_ID values that start with 1:
GET index/_search
{
"query": {
"wildcard": {
"ACCOUNT_ID": {
"value": "1*"
}
}
}
}
In Java, you can try something like this:
QueryBuilders.wildcardQuery("ACCOUNT_ID", "1*");
From what I see in your comments, you are trying to find IDs starting with 1. If your analyzer is the standard one, the ID "123" is indexed as the single token "123", so a wildcard search for '1*' will match it. Be careful with wildcard queries, because they can be expensive in memory and time.
See here: QueryString - Wildcard

OR Query mongodb from java with "like" and "line break" and "case insensitive" at the same time

This is sample of one document in my mongodb collection page_link_titles:
{
"_id" : ObjectId("553b11f30b81511d64152416"),
"id" : 36470831,
"linkTitles" : [
"Syrian civil war",
"Damascus",
"Geographic coordinate system",
"Bashar al-Assad",
"Al Jazeera English",
"Free Syrian Army",
...
"February 2012 Aleppo bombings",
"2012 Deir ez-Zor bombing",
"Aleppo University bombings"
]
}
I want to find all documents whose linkTitles contain a phrase like '%term1%' or '%term2%' (and so on). term1 and term2 must have a break on both sides, that is, they must match whole words only. For example, take "Syrian civil war": if term1 = "war", I want this document to be returned by the query; however, if term1 = "yria", which is only part of a word in this document, it should not be returned.
This is my java code:
for (String term : segment.terms) {
DBObject clause1 = new BasicDBObject("linkTitles",
java.util.regex.Pattern.compile("\\b"
+ stprocess.singularize(term) + "\\b"));
or.add(clause1);
}
DBObject mongoQuery = new BasicDBObject("$or", or);
DBCursor cursor = pageLinks.find(mongoQuery);
In the line java.util.regex.Pattern.compile("\\b" + stprocess.singularize(term) + "\\b"), I only handled the line break. I don't know how to write the regex so that it covers all my conditions: line break, case insensitivity, and "like" matching.
Any ideas?
It is possible to write a regular expression that achieves what you want. You can also use a single regular expression rather than using $or.
I'm using the shell for a quick example, searching for boxer or cat. First insert the test data:
db.test.drop()
db.test.insert([
{ "a" : "Boxer One" },
{ "a" : "A boxer dog" },
{ "a" : "A box shouldn't match" },
{ "a" : "should match BOXER" },
{ "a" : "wont match as this it the plural BOXERs" },
{ "a" : "also match on cat" }])
Using the following regular expression we can search for all our terms:
                                       
      /(^|\b)(boxer|cat)(\b|$)/i       
       +---+ +-------+  +---+         
          |       |        |           
          |       |        |           
   Start or space |       Space or end 
                  |                    
              Search terms
                      
And do a find like so:
db.test.find({a: /(^|\b)(boxer|cat)(\b|$)/i})
That query will return the following results:
{ "_id" : ObjectId("555f18eee7b6d1b7e622de36"), "a" : "Boxer One" }
{ "_id" : ObjectId("555f18eee7b6d1b7e622de37"), "a" : "A boxer dog" }
{ "_id" : ObjectId("555f18eee7b6d1b7e622de39"), "a" : "should match BOXER" }
{ "_id" : ObjectId("555f18eee7b6d1b7e622de3b"), "a" : "also match on cat" }
In Java you might build this query up like so:
StringBuilder singularizedTerms = new StringBuilder();
for (String term : terms) {
singularizedTerms.append("|").append(stprocess.singularize(term));
}
String regexPattern = format("(^|\\b)(%s)(\\b|$)", singularizedTerms.substring(1));
Pattern regex = Pattern.compile(regexPattern, Pattern.CASE_INSENSITIVE);
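To try the pattern outside MongoDB, here is a minimal self-contained sketch using only java.util.regex. The class name is made up, the singularize step is left out, and Pattern.quote is an addition of this sketch so that terms containing regex metacharacters are treated literally; it has only been checked against java.util.regex, not against the MongoDB driver.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class WholeWordPattern {
    // Builds a case-insensitive pattern that matches any of the terms as a whole word.
    public static Pattern build(List<String> terms) {
        List<String> quoted = new ArrayList<>();
        for (String term : terms) {
            quoted.add(Pattern.quote(term)); // treat metacharacters in a term literally
        }
        String alternation = String.join("|", quoted);
        return Pattern.compile("\\b(?:" + alternation + ")\\b", Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        Pattern p = build(Arrays.asList("boxer", "cat"));
        String[] samples = {
            "Boxer One",
            "A box shouldn't match",
            "should match BOXER",
            "wont match as this it the plural BOXERs"
        };
        for (String s : samples) {
            // find() looks for the pattern anywhere in the string
            System.out.println(p.matcher(s).find() + "  " + s);
        }
    }
}
```

Since \b already matches at the start and end of the input, the (^|\b) and (\b|$) groups can be reduced to a plain \b on each side.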
There are two problems with this approach.
It will be slow.
It can't use an index, so it will do a full scan of the collection; if you have 10 million documents, it will check each one!
It won't match plurals.
For example, it won't match the document containing "BOXERs", because the regular expression explicitly disallows partial matches!
Text indexes address both problems. Using an index makes the operation faster and also matches plural and singular forms, for example:
db.test.createIndex( { a: "text" } )
db.test.find({ $text: { $search: "boxer cat"}})
{ "_id" : ObjectId("555f18eee7b6d1b7e622de3b"), "a" : "also match on cat" }
{ "_id" : ObjectId("555f18eee7b6d1b7e622de3a"), "a" : "wont match as this it the plural BOXERs" }
{ "_id" : ObjectId("555f18eee7b6d1b7e622de36"), "a" : "Boxer One" }
{ "_id" : ObjectId("555f18eee7b6d1b7e622de37"), "a" : "A boxer dog" }
{ "_id" : ObjectId("555f18eee7b6d1b7e622de39"), "a" : "should match BOXER" }

Regular expression to eliminate some data from string in java?

I have string data that contains some unwanted text, which I want to delete using a regular expression. In the example data below, I want to eliminate the key meth and its corresponding data. The regular expression I created for this purpose is "meth(S+)([\\},])", but it's not working.
I used the following code to use this regular expression:
json = json.replaceAll("meth(S+)([\\},])","");
Below is the string data that is present in json :
{
data: [
{
"city": "barcelona",
"Date": "4 Apr 2014",
"Name": "A-B",
"meth": function(){
return_LANG=="en-us"?"A-T": "A-T "
},
"fo": "null",
},
{
"city": "Newyork",
"Date": "4 Apr 2014",
"Name": "B-C",
"meth": function(){
return_LANG=="en-us"?"S-E": "शक्तिपुंजएक्स."
},
"fo": "null",
}
]
}
The result I get is the unchanged string, because my regular expression does not match anything in it.
Please help me correct my regex.
Your pattern fails because S+ matches the literal letter S (the non-whitespace class would be written \\S in a Java string), and even \\S+ could not span the spaces and line breaks inside the function body. This should do the trick:
json = json.replaceAll("(?s)\"meth\".*?\\},", "");
Explanation:
(?s) enables DOTALL mode, so the dot . also matches newline characters (equivalent to the flag Pattern#DOTALL).
.*? matches any characters in a non-greedy way until it reaches the first },
Try this:
s = s.replaceAll("(?s)\"meth\":.*?},\\s+","");
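As a quick check, the following sketch runs a variant of these patterns on a shortened copy of the sample data. The class and method names are hypothetical, and it assumes the function body never contains the sequence }, itself, since the non-greedy match stops at the first one.

```java
class MethRemover {
    // Removes each "meth": function(){...}, entry together with trailing whitespace.
    // (?s) lets '.' match line breaks; .*? stops at the first "},".
    public static String removeMeth(String json) {
        return json.replaceAll("(?s)\"meth\"\\s*:.*?\\},\\s*", "");
    }

    public static void main(String[] args) {
        String json =
            "{\n" +
            "\"Name\": \"A-B\",\n" +
            "\"meth\": function(){\n" +
            "return_LANG==\"en-us\"?\"A-T\": \"A-T \"\n" +
            "},\n" +
            "\"fo\": \"null\",\n" +
            "}";
        System.out.println(removeMeth(json));
    }
}
```

The Name and fo entries remain, and the meth entry is gone, including its opening quote.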

Elastic Search - get records by starting character of a field

I am using Elastic Search Server. I need to get records based on starting character of a field value in source JSON.
JSON:
Index JSON1 : "{\"id\":\"1\",\"message\":\"welcome to elastic search\"}"
Index JSON2 : "{\"id\":\"1\",\"message\":\"Hellow world\"}"
Code:
String selectedCharacter = "w";
PrefixQueryBuilder queryBuilder = QueryBuilders.prefixQuery("message", selectedCharacter);
builder.setQuery(queryBuilder);
With the above code I get both records, but I need only 'Index JSON1'. How can I achieve this? Thanks in advance.
By default, Elasticsearch "tokenizes" string fields.
This means your message fields are indexed as multiple terms: ["welcome", "to", "elastic", "search"] for JSON1 and ["hellow", "world"] for JSON2.
When you run your query, Elasticsearch tries to match any one of the terms; that's why you get JSON1 for the term "welcome" and JSON2 for the term "world".
If you want your message field to be "untokenized" (treated as a single string), you have to explicitly map this field with the keyword analyzer. This is done with the Mapping API.
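The difference can be illustrated in plain Java, without Elasticsearch. This is only a simulation under simplifying assumptions: the analyzed case is approximated by lowercasing and splitting on whitespace, and the untokenized case by comparing against the whole string. The class and method names are made up.

```java
import java.util.Locale;

class PrefixDemo {
    // Analyzed field (approximation): the prefix is checked against every lowercased term.
    public static boolean matchesAnalyzed(String message, String prefix) {
        for (String term : message.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Untokenized field: the whole value is one term, so only its first characters count.
    public static boolean matchesUntokenized(String message, String prefix) {
        return message.startsWith(prefix);
    }

    public static void main(String[] args) {
        String[] messages = {"welcome to elastic search", "Hellow world"};
        for (String m : messages) {
            System.out.println(m + " -> analyzed: " + matchesAnalyzed(m, "w")
                    + ", untokenized: " + matchesUntokenized(m, "w"));
        }
    }
}
```

With the prefix "w", both messages match in the analyzed case (via "welcome" and "world"), but only the first matches when the field is treated as a single string.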
You can look at:
the keyword analyzer doc: http://www.elasticsearch.org/guide/reference/index-modules/analysis/keyword-analyzer/
the mapping API doc: http://www.elasticsearch.org/guide/reference/api/admin-indices-put-mapping/
If you need a keyword analyzer that is case-insensitive, define a custom analyzer with a lowercase filter (you will probably need to delete and recreate your index for that). For example:
$ curl -XPUT 'localhost:9200/test/_settings' -d '
{
"index": {
"analysis" : {
"analyzer" : {
"lowercaseAnalyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": ["lowercase"]
}
}
}
}
}'
Then define your mapping with this custom analyzer instead of keyword:
"message" : {"type" : "string", "analyzer" : "lowercaseAnalyzer"}
You can also test your analyzer using the analyze API. For example:
$ curl -XGET 'localhost:9200/test/_analyze?analyzer=lowercaseAnalyzer&pretty=true' -d 'Hello world'
{
"tokens" : [ {
"token" : "hello world",
"start_offset" : 0,
"end_offset" : 11,
"type" : "word",
"position" : 1
} ]
}
You can see all the available tokenizers and filters in the analysis documentation : http://www.elasticsearch.org/guide/reference/index-modules/analysis/
